A comparison of study designs for estimating overdiagnosis in cancer screening

Overdiagnosis is defined as the diagnosis of an asymptomatic cancer that would not have become clinically evident during the person’s lifetime in the absence of screening or similar activities, such as diagnostic imaging tests that reveal “incidentalomas.”

The whole concept of overdiagnosis seems counterintuitive to the public and even to many clinicians, because the traditional view, encouraged by public health messages and by medical training, is that cancer is lethal unless detected early. This core belief system has driven the quest for increasingly sensitive screening tests that can detect as many asymptomatic cancers as possible.

However, the natural history of such asymptomatic tumors is unknown, and some may be very indolent or even non-progressive. Because cancer overdiagnosis and the overtreatment it can trigger are important harms of screening, recent research has sought to estimate how often overdiagnosis occurs with various screening tests.

The aim of this commentary is to describe and contrast the methods used to estimate the amount of overdiagnosis incurred by several types of cancer screening. The primary approaches are pathology or imaging studies, mathematical or simulation modeling, examination of trends in population data, and randomized trials.⁽¹⁾

Pathology or imaging studies aim to predict the future behavior of a cancer from pathologic or anatomic features observed at a single point in time. These features are assumed to reflect the natural history of the disease and to predict whether the cancer will ultimately progress, and what its outcome will be.

It seems clear that the key assumption, which rests on a static picture of an evolving process, is difficult, if not impossible, to verify, and that this approach is therefore of limited utility because it misses the dynamic aspects of lesion progression.

In a typical modeling approach, a simulation process is used to generate the time of clinical detection in the absence of screening for each screen-detected cancer, based on assumptions about the lead time distribution (lead time is the time between screen detection of a preclinical cancer and the time of clinical diagnosis had the screening not occurred). A time to death from other causes is also generated.

The cancer is counted as overdiagnosed if death from other causes occurs before the simulated time of clinical detection. The resulting estimate of overdiagnosis depends on the choice of lead time distribution, which is virtually never known because lead time cannot be observed for any given individual.
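A minimal sketch of this kind of simulation follows. It assumes, purely for illustration, that every screen-detected cancer is progressive and that lead time and time to death from other causes are independent and exponentially distributed; the function name and parameter values are hypothetical and are not taken from any published model.

```python
import random


def simulate_overdiagnosis(n_cases, mean_lead_time, mean_residual_life, seed=0):
    """Estimate the overdiagnosed fraction of screen-detected cancers by simulation.

    Hypothetical assumptions (illustration only): every screen-detected cancer is
    progressive, and lead time and time to death from other causes are independent
    exponential variables with the given means (in years).
    """
    rng = random.Random(seed)
    overdiagnosed = 0
    for _ in range(n_cases):
        # Time from screen detection to clinical detection had there been no screening.
        lead_time = rng.expovariate(1.0 / mean_lead_time)
        # Time from screen detection to death from other causes.
        other_cause_death = rng.expovariate(1.0 / mean_residual_life)
        # Overdiagnosed: death from other causes precedes simulated clinical detection.
        if other_cause_death < lead_time:
            overdiagnosed += 1
    return overdiagnosed / n_cases


if __name__ == "__main__":
    for mean_lead in (2.0, 5.0, 10.0):
        fraction = simulate_overdiagnosis(100_000, mean_lead, mean_residual_life=20.0)
        print(f"mean lead time {mean_lead:4.1f} y -> overdiagnosed fraction {fraction:.3f}")
```

Under these particular assumptions the overdiagnosed fraction has a closed form, L / (L + R), where L is the mean lead time and R the mean time to death from other causes, so the simulated values should land near 0.09, 0.20, and 0.33. The sharp dependence on L is the point: the estimate is only as good as the assumed lead time distribution.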

The challenge of this modeling approach, therefore, is obtaining a valid estimate of the lead time distribution. Compounding the challenge is the likelihood that for many cancers the preclinical cancers in a population are a mixture of progressive cancers (that would eventually become clinically manifest in the absence of screening) and non-progressive cancers (that would never become clinically manifest).

In practice, the lead time distribution is typically derived from the distribution of time with progressive cancer, and this is obtained from the distribution of the duration of preclinical cancer. Ideally, this preclinical duration distribution would include the probability of non-progressive cancer.
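How the lead time distribution follows from the duration of preclinical cancer can be sketched with a small simulation. It assumes hypothetical conditions: preclinical onsets spread uniformly over time, exponentially distributed preclinical durations, a single screen with perfect sensitivity, and no non-progressive cancers. Two features appear: the screen preferentially catches long-duration cancers (length bias), and, because the exponential distribution is memoryless, the lead times of screen-detected cancers have the same mean as the preclinical duration itself.

```python
import random


def lead_times_at_single_screen(n_people, mean_duration, screen_time, seed=0):
    """Lead times and preclinical durations of cancers found by one screen.

    Hypothetical setup (illustration only): each simulated person develops a
    preclinical cancer with onset uniform over [0, screen_time], preclinical
    durations are exponential with the given mean, the screen at `screen_time`
    has perfect sensitivity, and every cancer is progressive.
    """
    rng = random.Random(seed)
    lead_times, durations = [], []
    for _ in range(n_people):
        onset = rng.uniform(0.0, screen_time)
        duration = rng.expovariate(1.0 / mean_duration)
        clinical = onset + duration
        # The screen finds the cancer only if it is still preclinical at screen_time.
        if clinical > screen_time:
            lead_times.append(clinical - screen_time)
            durations.append(duration)
    return lead_times, durations


if __name__ == "__main__":
    leads, durations = lead_times_at_single_screen(200_000, mean_duration=4.0, screen_time=50.0)
    print(f"screen-detected cancers: {len(leads)}")
    # Length bias: detected cancers have longer durations than the population mean of 4.
    print(f"mean preclinical duration of detected cancers: {sum(durations) / len(durations):.2f}")
    # Memorylessness: the mean lead time matches the population mean duration.
    print(f"mean lead time: {sum(leads) / len(leads):.2f}")
```

With other duration distributions the mean lead time generally differs from the mean duration, which is one reason the exponential assumption discussed next matters.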


This becomes circular, since non-progressive cancers are an important component of the very entity being estimated. In practice, however, it is often assumed that there are no non-progressive cancers and, further, that the duration of preclinical cancer used to estimate the mean lead time follows an exponential distribution.

Together, these overly simplistic assumptions could lead to a substantially biased estimate of overdiagnosis. A more realistic distribution of the duration of preclinical cancer would be desirable, but it is then statistically difficult, if not impossible, to separate the distribution of time to progressive cancer from the probability of non-progressive cancer; and obtaining a reliable estimate of overdiagnosis is problematic.⁽²⁾ Assumptions about the distribution may explain why microsimulation models of the same screening test have yielded a wide range of estimates of overdiagnosis.⁽¹⁾
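How much such assumptions matter can be seen in a closed-form toy version of the calculation. It assumes, hypothetically, that a fraction of screen-detected cancers is non-progressive (and therefore always overdiagnosed), while progressive cancers have an exponential lead time independent of an exponential time to death from other causes. Holding the mean lead time fixed and varying only the assumed non-progressive fraction changes the answer substantially.

```python
def overdiagnosed_fraction(p_nonprogressive, mean_lead_time, mean_residual_life):
    """Closed-form overdiagnosed fraction among screen-detected cancers.

    Hypothetical toy model: a fraction `p_nonprogressive` never surfaces clinically
    and is always overdiagnosed; the remaining cancers have an exponential lead time
    (mean L) independent of an exponential time to death from other causes
    (mean R), so a progressive cancer is overdiagnosed with probability L / (L + R).
    """
    if not 0.0 <= p_nonprogressive <= 1.0:
        raise ValueError("p_nonprogressive must lie in [0, 1]")
    progressive_share = mean_lead_time / (mean_lead_time + mean_residual_life)
    return p_nonprogressive + (1.0 - p_nonprogressive) * progressive_share


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3):
        estimate = overdiagnosed_fraction(p, 5.0, 20.0)
        print(f"non-progressive fraction {p:.1f} -> overdiagnosed {estimate:.2f}")
```

With a 5-year mean lead time and a 20-year mean residual life, the estimate moves from 0.20 to 0.28 to 0.44 as the assumed non-progressive fraction rises from 0 to 0.1 to 0.3.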

Another commonly used method estimates overdiagnosis as the difference between the annual incidence of cancer in a population or cohort receiving screening and the estimated annual incidence if, counterfactually, that population had not been screened. Several estimates of the latter quantity have been used, all of which have important limitations.

Of major importance is the lack of an internal, directly comparable group, such as randomization would create. Without one, the underlying annual incidence must be estimated by extrapolating prescreening cancer trends, by using annual contemporaneous incidence in an unscreened geographic region, and/or by using annual contemporaneous incidence among persons who did not accept the screening invitation. All such comparison groups are prone to selection bias and other confounding factors. This is also why single-arm screening studies cannot provide a reliable estimate of overdiagnosis.⁽¹⁾

In principle, the most direct approach to estimating overdiagnosis is a randomized trial with a stop-screen design and a sufficient duration of follow-up.⁽³,⁴⁾ In a stop-screen randomized trial, participants in the screened arm(s) receive periodic screening until the start of a follow-up period. In this design, the number of persons overdiagnosed is estimated by the excess cumulative incidence: the difference in the cumulative number of incident cancers between the screened arm and the control arm, measured over a follow-up period that extends well beyond the active screening period.
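As an illustration of how the excess cumulative incidence evolves, the sketch below accumulates hypothetical annual case counts for a 10-year trial with equal-sized arms and five screening years. The counts are invented, chosen only to show the typical shape: the excess rises while screening advances diagnoses, falls as control-arm cases surface clinically, and then levels off.

```python
from itertools import accumulate


def excess_cumulative_incidence(screened_by_year, control_by_year):
    """Cumulative cases in the screened arm minus the control arm, year by year.

    Assumes equal-sized arms, so raw counts are directly comparable.
    """
    if len(screened_by_year) != len(control_by_year):
        raise ValueError("both arms need the same number of follow-up years")
    return [s - c for s, c in zip(accumulate(screened_by_year), accumulate(control_by_year))]


if __name__ == "__main__":
    # Invented annual case counts: screening in years 1-5, follow-up to year 10.
    screened = [60, 55, 55, 54, 54, 40, 42, 45, 48, 50]
    control = [40, 45, 47, 48, 50, 50, 50, 50, 50, 50]
    for year, excess in enumerate(excess_cumulative_incidence(screened, control), start=1):
        print(f"year {year:2d}: excess cumulative incidence {excess}")
```

With these invented counts the excess peaks at 48 in year 5 and settles at 23 from year 9 onward.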

Unbiased estimation requires that the length of follow-up exceed the longest lead time. Ideally, the design involves multiple screening rounds and nearly perfect compliance with screening, neither study group is screened after the prescribed screening period ends, and all participants are followed to death. The ideal is seldom, if ever, achieved, but some randomized trials have achieved useful approximations of it.
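Why follow-up must exceed the longest lead time can be shown with a toy simulation. It assumes, hypothetically, that a fixed set of progressive cancers is screen-detected at the end of screening, that their counterparts in an equal-sized control arm surface after a lead time drawn uniformly up to a stated maximum, and that a counterpart never surfaces if the person first dies of other causes. No other cases are modeled, and all names and parameter values are illustrative.

```python
import random


def excess_at_followup(n_cases, followup_times, max_lead_time, mean_residual_life, seed=0):
    """Screened-minus-control case excess at each follow-up time, plus the true n_O.

    Hypothetical toy model (illustration only): `n_cases` progressive cancers are
    screen-detected at time 0, the end of screening. Each has a counterpart in an
    equal-sized control arm that surfaces clinically after a lead time drawn
    uniformly from [0, max_lead_time], unless that person first dies of other
    causes (exponential, mean `mean_residual_life`). No other cases are modeled.
    """
    rng = random.Random(seed)
    surfacing_times = []  # None marks a counterpart that never surfaces.
    for _ in range(n_cases):
        lead = rng.uniform(0.0, max_lead_time)
        death = rng.expovariate(1.0 / mean_residual_life)
        surfacing_times.append(lead if lead < death else None)
    n_overdiagnosed = sum(1 for s in surfacing_times if s is None)
    excess = [
        n_cases - sum(1 for s in surfacing_times if s is not None and s <= t)
        for t in followup_times
    ]
    return excess, n_overdiagnosed


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5, 6, 8, 10]
    excess, n_o = excess_at_followup(5_000, times, max_lead_time=6.0, mean_residual_life=20.0)
    for t, e in zip(times, excess):
        print(f"follow-up {t:2d} y: excess {e}")
    print(f"simulated overdiagnosed cases: {n_o}")
```

Early in follow-up the excess overstates overdiagnosis, because it still contains cases whose diagnosis was merely advanced; once follow-up reaches the maximum lead time (6 years here), the excess equals the simulated number of overdiagnosed cases exactly.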

In a trial in which both study arms are of equal size, the number of overdiagnosed cases can be determined as the difference between the total numbers of cases in the two arms. Let $n_s$ be the total number of cases in the screened arm, $n_c$ the total number of cases in the control arm, and $n_O$ the number of overdiagnosed cases. Then $n_O = n_s - n_c$.

In actual trials, not all study participants are followed to death. However, if follow-up continues well after screening stops, the cumulative numbers of cases in the two study arms may become the same. In that case there is no overdiagnosis: $n_O = 0$. If instead the difference in total cases between the two arms becomes constant, that constant difference is $n_O$, as above. A useful design in practice, then, is a two-arm stop-screen randomized trial with follow-up long enough to determine $n_O$.
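One simple way to operationalize "the difference becomes constant" is to check whether the last few yearly values of the cumulative excess have stopped changing. The helper below is a hypothetical sketch: its `window` and `tolerance` parameters are ad hoc stand-ins for what would, in a real analysis, be a formal statistical assessment that allows for chance variation in case counts.

```python
def plateau_estimate(cumulative_excess, window=3, tolerance=0):
    """Return n_O if the last `window` yearly values of the excess are stable.

    Hypothetical helper: "stable" means max - min <= tolerance. A plateau at zero
    means no detectable overdiagnosis; a plateau above zero estimates n_O. None
    means follow-up is not yet long enough to tell.
    """
    if window < 1 or len(cumulative_excess) < window:
        return None
    tail = cumulative_excess[-window:]
    if max(tail) - min(tail) <= tolerance:
        return tail[-1]
    return None


if __name__ == "__main__":
    series = [20, 30, 38, 44, 48, 38, 30, 25, 23, 23]
    print(plateau_estimate(series, window=2))
    print(plateau_estimate(series, window=3))
```

For the invented series shown, a two-year window returns 23, while a three-year window returns None, a signal that more follow-up would be needed to confirm the plateau.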

For example, in the ovarian component of the PLCO trial, as described in the initial endpoint report, screening was conducted over a five-year period and follow-up continued through year 13. Compliance with the screening tests was about 80%, and contamination was very low, at less than 5%. A persistent excess of cumulative ovarian cancer cases in the screened arm was observed from year 7 through year 13, the telltale hallmark of overdiagnosis.⁽⁵⁾

When two screening tests for the same cancer are to be compared, and benefit has not been demonstrated for either test, an unscreened control group is ethical and desirable. The preferred design is the three-arm stop-screen randomized trial. Participants in one arm are screened with one test, participants in a second arm are screened with the other test, and participants in the third arm serve as controls.

We assume equal numbers of participants in each arm and equal periods of screening in the two screened arms. Ideally, all participants would be followed to death. In practice, follow-up must be sufficiently long to establish either equivalence or a constant difference in total cases between the control arm and each screened arm.

From the control arm, the total number of cases is $n_c$. Let $i = 1, 2$ index the two tests and the corresponding trial arms. The total number of cases in screened arm $i$ is $n_{si}$. Each test arm is then compared to the control arm to determine the individual numbers of overdiagnosed cases as $n_{Oi} = n_{si} - n_c$. This design thus allows one to observe the number of cases overdiagnosed by each test.
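In code, the three-arm calculation is a subtraction per screened arm. The sketch assumes equal-sized arms, as the text does; the function name, arm labels, and counts in the example are hypothetical.

```python
def overdiagnosis_by_test(n_control, n_screened_by_test):
    """n_Oi = n_si - n_c for each screened arm; equal-sized arms are assumed."""
    return {test: n_si - n_control for test, n_si in n_screened_by_test.items()}


if __name__ == "__main__":
    # Hypothetical total case counts after long follow-up.
    print(overdiagnosis_by_test(500, {"test 1": 560, "test 2": 530}))
```

With these invented totals, test 1 overdiagnoses 60 cases and test 2 overdiagnoses 30.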

If an unscreened control arm is not ethical or practical, a design that still has utility for guiding screening policy is the two-arm stop-screen randomized trial with follow-up, with each arm screened by a different test. Because there is no control arm, it is not possible to determine the number of cases overdiagnosed by either test. However, it is possible to determine the difference in overdiagnosed cases, $\Delta n_O = n_{s2} - n_{s1}$. This can help rank the two screening programs by the harm each does through overdiagnosis.
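Assuming equal-sized arms and the same follow-up conditions as in the three-arm design, the reason the difference between the two screened arms measures the difference in overdiagnosis can be written out: the control-arm total $n_c$, although unobserved in this design, cancels.

```latex
\begin{aligned}
\Delta n_O &= n_{O2} - n_{O1} \\
           &= (n_{s2} - n_c) - (n_{s1} - n_c) \\
           &= n_{s2} - n_{s1}
\end{aligned}
```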

It is worth noting a concern inherent in the often-used single-cohort paired design for comparing two screening tests, in which all participants are in one cohort and each person receives both tests. Importantly, overdiagnosed cases cannot be determined from this design, because there is no control arm.

Cases will be identified by screening and during post-screening follow-up. The totality of cases arising during follow-up is a mixture of cases detected by one test or the other that would have been diagnosed clinically in the absence of screening, plus cases missed by both tests, plus cases newly developing after screening ceases, plus cases overdiagnosed by one test, the other, or both. However, in this design the overdiagnosed cases can be neither identified nor linked to one test or the other.
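A small accounting example makes the problem concrete. The two hypothetical scenarios below break the same cohort's cases into the components just listed. They produce identical screen-detected and total case counts, which is what the paired design observes, yet one contains no overdiagnosis and the other contains 50 overdiagnosed cases. The class name and all counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairedCohortScenario:
    """Hypothetical breakdown of the cases seen in a single-cohort paired study."""

    detected_would_surface: int  # screen-detected, would have surfaced clinically anyway
    missed_by_both: int          # missed by both tests, surface during follow-up
    new_after_screening: int     # develop after screening ceases
    overdiagnosed: int           # by one test, the other, or both: not observable here

    def screen_detected(self) -> int:
        """Cases found by either test, which the design does observe."""
        return self.detected_would_surface + self.overdiagnosed

    def observed_total(self) -> int:
        """All cases seen during screening and follow-up, which the design observes."""
        return self.screen_detected() + self.missed_by_both + self.new_after_screening


if __name__ == "__main__":
    no_overdiagnosis = PairedCohortScenario(300, 40, 160, 0)
    some_overdiagnosis = PairedCohortScenario(250, 40, 160, 50)
    for scenario in (no_overdiagnosis, some_overdiagnosis):
        print(scenario.screen_detected(), scenario.observed_total(), scenario.overdiagnosed)
```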

The extent of overdiagnosis can depend upon various features of a screening program. In the randomized trial designs, near-100% attendance at screening is often achievable, but this is very unlikely to occur in a population screening setting. Any reduction in attendance will likely reduce the estimate of potential overdiagnosis.

Further, in the designs with a control arm, screening contamination in that arm will bias the estimate of overdiagnosis downward. Similarly, when comparing two tests, differential compliance in the screening arms will bias the comparison of overdiagnosis. The number of screening rounds and the length of the interval between screens will also influence the amount of overdiagnosis.
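The direction of these biases can be illustrated with a deliberately simple dilution model. It assumes, hypothetically, that the overdiagnosis produced in an arm is proportional to the fraction of that arm actually screened with the trial regimen, so the observed excess is the full-compliance excess scaled by compliance in the screened arm minus contamination in the control arm. Real screening behavior need not be linear like this, and the function name is illustrative.

```python
def attenuated_excess(n_overdiagnosed_full, compliance, contamination):
    """Expected screened-minus-control excess under a linear dilution model.

    Hypothetical assumption: overdiagnosis in an arm is proportional to the
    fraction of that arm actually screened with the trial regimen.
    """
    for name, value in (("compliance", compliance), ("contamination", contamination)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return n_overdiagnosed_full * (compliance - contamination)


if __name__ == "__main__":
    print(attenuated_excess(100, compliance=1.0, contamination=0.0))
    print(attenuated_excess(100, compliance=0.8, contamination=0.05))
    # Two tests with equal true overdiagnosis but different compliance in their arms.
    print(attenuated_excess(100, 0.9, 0.0) - attenuated_excess(100, 0.7, 0.0))
```

With about 80% compliance and 5% contamination, roughly the PLCO ovarian figures, this model would shrink a full-compliance excess of 100 cases to about 75. The last line shows how differential compliance alone can make two equally overdiagnosing tests appear to differ.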

In summary, estimates of the amount of overdiagnosis associated with a screening test have important implications for people making an informed decision based on the benefits and harms of the test, and for developing guidelines for the public and health professionals.

Although understanding the level of harm is important, obtaining accurate estimates can present a challenge; and commonly used study designs can yield seriously biased results. The gold standard, if practical, is a large-scale randomized trial with a stop-screen design and sufficient follow-up.

References

1. Carter JL, Coletti RJ, Harris RP. Quantifying and monitoring overdiagnosis in cancer screening: a systematic review of methods. BMJ 2015; 350: g7773.
2. Baker SG, Prorok PC, Kramer BS. Challenges in quantifying overdiagnosis. JNCI 2017; 109(10): djx064.
3. Etzioni RD, Connor RJ, Prorok PC, Self SG. Design and analysis of cancer screening trials. Statistical Methods in Medical Research 1995; 4(1): 3-17.
4. Prorok PC, Kramer BS, Miller AB. Study designs for determining and comparing sensitivities of disease screening tests. J Med Screening 2015; 22(4): 213-220.
5. Buys SS, Partridge E, Black A, et al. Effect of screening on ovarian cancer mortality: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. JAMA 2011; 305(22): 2295-2303.

Byline

Barry Kramer

Contractor, NCI Division of Cancer Control and Population Sciences, Office of the DCCPS Director; Former director of the NCI Division of Cancer Prevention

Philip C. Prorok

Contractor, NCI Division of Cancer Prevention, Office of the Director, Division of Cancer Prevention; Former chief of the Biometry Research Group at DCP
