publication date: May 10, 2019

Trials & Tribulations

Estimating overdiagnosis is hard—we need all the help we can get

Ruth Etzioni

Division of Public Health Sciences,
Fred Hutchinson Cancer Research Center,
Center for Early Detection Advanced Research,
Knight Cancer Institute


Roman Gulati

Division of Public Health Sciences,
Fred Hutchinson Cancer Research Center


Over the past decade there has been much consternation about overdiagnosis, the detection of cancers that would not have been diagnosed without screening.

The growing awareness of the problem of overdiagnosis has gone hand in hand with a collective sobering up about early detection.

The reality is that the size of the population that stands to benefit is much smaller than the population exposed to potential harm. Only a small minority of those screened for any given cancer have their lives extended by the test, since only a small minority would have died of that cancer.

At the same time, every person screened is exposed to potential harms from overdiagnoses and false-positive tests.

This does not automatically imply that screening should be abandoned, but it does lead to a nuanced perspective regarding benefit vs. harm. This, in turn, creates a need to quantify harm-benefit tradeoffs and brings us to the challenge of estimating overdiagnosis.

The frequency of overdiagnosis is extraordinarily difficult to assess. To know whether any screen-detected case has been overdiagnosed would require ignoring the diagnosis, withholding treatment, and waiting to determine whether other-cause death precedes clinical (symptomatic) detection.

Since this practically never happens, we are left with two methods for estimating overdiagnosis: excess incidence and lead-time modeling. Both were covered in a recent commentary (Kramer BS and Prorok PC, “A comparison of study designs for estimating overdiagnosis in cancer screening,” The Clinical Cancer Letter (CCL), May 3).

The first method recognizes that an overdiagnosed case is a true excess diagnosis—a cancer detected only because of the screening test. This reasoning leads to an excess-incidence formulation for capturing the extent of overdiagnosis: subtract the incidence without screening from the incidence with screening to yield the incidence of overdiagnosis.
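As a minimal sketch of this subtraction, the Python snippet below computes excess incidence per 100,000 person-years. The function name and all counts are hypothetical, chosen only for illustration; they are not taken from any trial or registry.

```python
def excess_incidence(cases_screened, person_years_screened,
                     cases_unscreened, person_years_unscreened,
                     per=100_000):
    """Incidence with screening minus incidence without screening.

    Both rates are expressed per `per` person-years.
    """
    rate_screened = cases_screened / person_years_screened * per
    rate_unscreened = cases_unscreened / person_years_unscreened * per
    return rate_screened - rate_unscreened


# Hypothetical inputs: 660 cases with screening and 500 without,
# each observed over 400,000 person-years.
excess = excess_incidence(660, 400_000, 500, 400_000)
print(f"Excess incidence: {excess:.1f} per 100,000 person-years")
```

The arithmetic itself is simple; the hard part is obtaining a credible figure for the unscreened rate and a comparison that is free of lead-time artifacts.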

The excess-incidence method is simple in principle, but studies (e.g. Duffy SW and Parmar D, “Overdiagnosis in breast cancer screening: the importance of length of observation period and lead time,” Breast Cancer Research, May 2013) have shown that it is difficult to get right in practice.

Indeed, many published excess-incidence estimates are inflated and have contributed to alarmist media headlines about overdiagnosis such as “The Great Prostate Mistake” (New York Times, April 2010) and “It’s Time to Rethink Cancer Early Detection,” with a table titled “Fatal Retraction” (Wall Street Journal, Sept. 2014).

Why is it so hard to get excess incidence right in practice?

There are several reasons. In the population setting (breast or prostate cancer in the U.S., for example), incidence with screening is generally available, but the background incidence without screening is not.

Attempts to impute it may ultimately amount to guesstimates that cannot be verified or defended. In the trial setting, the control group provides the background incidence, but the trial design, measure of excess incidence, and follow-up duration must satisfy very specific conditions in order to avoid a result that is provably biased (Gulati R, Feuer EJ, Etzioni R, “Conditions for valid empirical estimates of cancer overdiagnosis in randomized trials and population studies,” American Journal of Epidemiology, July 2016).

The recent CCL commentary concurred that the best chance of a valid result comes from cumulative excess incidence in a trial with a stop-screen design and adequate follow-up after screening stops.

In other settings, corrections have been proposed that may reduce bias (e.g. Ripping TM et al, “Quantifying overdiagnosis in cancer screening: A systematic review to evaluate the methodology,” JNCI, Oct. 2017).

We agree with the CCL commentary that excess incidence is a useful idea and can, in ideal settings, produce a valid result. But we do not agree with its dismissal of the second method for overdiagnosis estimation, lead-time modeling. Before explaining why we believe that lead-time modeling has value, we provide some background about this approach.

Lead time is defined as the time by which diagnosis is advanced by screening. This, in turn, is inextricably linked to the underlying disease latency. Lead-time modeling first estimates the lead-time distribution. Overdiagnosis is then estimated as the proportion of individuals whose lead time exceeds their time to death from other causes; the latter can be estimated from life tables.
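The second step can be sketched as a small simulation. The example below assumes, purely for illustration, that lead times and times to other-cause death are both exponentially distributed; an actual analysis would estimate the lead-time distribution from data and draw other-cause death times from life tables. The function names and the means of 5 and 15 years are hypothetical.

```python
import random


def simulated_overdiagnosis_fraction(mean_lead_time, mean_years_to_other_death,
                                     n=200_000, seed=1):
    """Share of simulated screen-detected cases whose lead time exceeds
    their time to other-cause death (both exponential: an assumption)."""
    rng = random.Random(seed)
    overdiagnosed = 0
    for _ in range(n):
        lead_time = rng.expovariate(1.0 / mean_lead_time)
        other_death = rng.expovariate(1.0 / mean_years_to_other_death)
        if lead_time > other_death:
            overdiagnosed += 1
    return overdiagnosed / n


def exact_overdiagnosis_fraction(mean_lead_time, mean_years_to_other_death):
    """Closed form for P(lead time > other-cause death time) when both
    are independent exponentials; used here only as a cross-check."""
    return mean_lead_time / (mean_lead_time + mean_years_to_other_death)


simulated = simulated_overdiagnosis_fraction(5.0, 15.0)
exact = exact_overdiagnosis_fraction(5.0, 15.0)
print(f"simulated: {simulated:.3f}, exact: {exact:.3f}")
```

Under these toy assumptions, a longer mean lead time or a shorter remaining life expectancy both raise the estimated fraction, which is why the shape of the lead-time distribution matters so much.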

Lead-time modeling uses excess incidence, but indirectly. A seminal study published in the early 1990s (Feuer EJ and Wun LM, “How much of the recent rise in breast cancer incidence can be explained by increases in mammography utilization?” American Journal of Epidemiology, Dec. 1992) showed that the increase in incidence expected under screening ties directly to the lead time—given a specific pattern of screening dissemination, longer lead times produce a more pronounced rise in incidence than shorter lead times. In principle, therefore, we should be able to learn about lead time and disease latency from excess incidence under screening.
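The link between lead time and the rise in incidence can be illustrated with a toy bookkeeping exercise. This is not the Feuer and Wun model; it is a sketch under strong assumptions that are ours alone: a constant number of clinical cases per year, a single fixed lead time, a perfectly sensitive screen, no overdiagnosis, and a hypothetical rollout in which given shares of the population begin screening in each year. All names and numbers are illustrative.

```python
from collections import Counter


def incidence_by_year(start_share, lead_time, cases_per_year=1000.0, years=8):
    """Diagnoses per calendar year under a toy screening rollout.

    start_share[y] is the share of the population that begins screening in
    year y (len(start_share) must not exceed `years`); any remaining share
    is never screened. `lead_time` is a whole number of years.
    """
    counts = Counter()
    never_screened = 1.0 - sum(start_share)
    # Include clinical years past the reporting window so that cases pulled
    # forward by screening are counted in the years we report.
    for clinical_year in range(years + lead_time):
        counts[clinical_year] += cases_per_year * never_screened
        for start_year, share in enumerate(start_share):
            if start_year > clinical_year:
                # The cancer surfaces clinically before screening begins.
                diagnosis_year = clinical_year
            else:
                # Detected at preclinical onset, or at the first screen if
                # the cancer was already detectable when screening began.
                diagnosis_year = max(clinical_year - lead_time, start_year)
            counts[diagnosis_year] += cases_per_year * share
    return [counts[y] for y in range(years)]


rollout = [0.0, 0.0, 0.2, 0.3, 0.2]  # hypothetical uptake in years 0 to 4
short_lead = incidence_by_year(rollout, lead_time=2)
long_lead = incidence_by_year(rollout, lead_time=5)
print("peak with 2-year lead time:", round(max(short_lead)))
print("peak with 5-year lead time:", round(max(long_lead)))
```

With the same rollout, the longer lead time produces the higher peak, because each newly screened group brings forward more years' worth of future cases at its first screen.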

In fact, the connection between excess incidence and disease latency has a rich history in both the cancer and HIV literatures, and was harnessed in many studies of HIV-tested cohorts to predict the size of the AIDS epidemic in the 1980s and 1990s (e.g., Brookmeyer R, “Reconstruction and future trends of the AIDS epidemic in the United States,” Science, July 1991).

So, the notion of using excess incidence to inform about disease latency is well established. Why, then, the dismissiveness of lead-time modeling for estimating overdiagnosis? As the CCL commentary explains, this arises from concerns about the assumptions made by the models.

The majority of published modeling studies make a simplifying assumption about the shape of the disease latency distribution. They do not explicitly include a non-progressive or indolent fraction of cases that have infinite latency. This leads to a concern that lead-time estimates may be substantially biased and becomes grounds for some to dismiss the entire approach.

It is true that models inevitably simplify disease biology and that the estimated lead-time distribution will be different if a fraction of non-progressive cancers is allowed. But the model-based procedure for estimating overdiagnosis is agnostic to whether the lead time is infinite or simply very long.

If the assumed family of distributions allows for a fraction of lengthy lead times, and the models are identifiable (they can be uniquely estimated on the basis of the available data), the results should permit approximation of the overdiagnosis frequency or at least provide a sense of whether it is likely to be non-trivial.
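One way to see this insensitivity is a toy calculation, sketched below under strong assumptions: other-cause death times are exponential, a hypothetical share of cases is slow-growing with a single fixed lead time, and the remaining cases have exponential lead times. Setting the slow lead time to infinity (truly non-progressive) or to 80 years (merely very long) gives nearly the same overdiagnosis fraction, because almost no one outlives either. All names and numbers are illustrative.

```python
import math


def overdiagnosis_with_slow_fraction(slow_share, slow_lead_time,
                                     mean_fast_lead_time,
                                     mean_years_to_other_death):
    """Overdiagnosis fraction for a two-component lead-time mixture.

    Slow cases have a fixed lead time (possibly math.inf); fast cases have
    exponential lead times. Other-cause death is exponential (an assumption).
    """
    # P(other-cause death occurs before a fixed slow lead time elapses).
    p_slow = 1.0 - math.exp(-slow_lead_time / mean_years_to_other_death)
    # P(exponential lead time exceeds exponential other-cause death time).
    p_fast = mean_fast_lead_time / (mean_fast_lead_time + mean_years_to_other_death)
    return slow_share * p_slow + (1.0 - slow_share) * p_fast


infinite = overdiagnosis_with_slow_fraction(0.2, math.inf, 3.0, 15.0)
very_long = overdiagnosis_with_slow_fraction(0.2, 80.0, 3.0, 15.0)
print(f"infinite lead time:  {infinite:.4f}")
print(f"80-year lead time:   {very_long:.4f}")
```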

In practice, modeling studies should examine multiple shapes for the lead-time distribution as part of a thorough sensitivity analysis. Identifiability of the estimates should ideally be confirmed, even though this can be quite challenging and dataset-dependent (Ryser M et al, “Identification of the Fraction of Indolent Tumors and Associated Overdiagnosis in Breast Cancer Screening Trials,” American Journal of Epidemiology, Jan. 2019). Uncertainty in the estimates should be quantified and potential sources of bias considered.

Not least, model descriptions should be transparent and methods described should be reproducible. It is understandable that skepticism about models surfaces when these steps are not followed. But there are thoughtful modeling analyses that have provided useful insights about overdiagnosis in both trial and population settings.

In conclusion, both excess incidence and lead-time modeling have the potential to produce misleading results. But today we have a better understanding than ever before of the circumstances under which the two methods can be trusted. We need to bring this understanding to how we estimate and report the extent of overdiagnosis. Regardless of the estimation approach, high-quality efforts should be recognized. Where feasible, it can be illuminating to compare multiple approaches (Etzioni R et al, “A reality check for overdiagnosis estimates associated with breast cancer screening,” JNCI, Dec. 2014).

Well-founded cancer screening policies and properly informed patient decisions are at stake.

Copyright (c) 2020 The Cancer Letter Inc.