An analysis by researchers from the International Agency for Research on Cancer casts doubt on the validity of an endpoint used in key studies of multi-cancer detection (MCD) tests.
Published in JAMA April 7, the IARC meta-analysis compares the standard endpoint of cancer-related mortality to the alternative endpoint of reduction of incidence of late-stage cancer.
The study examined the relationship between these two endpoints in 41 randomized clinical trials evaluating single-disease screening tests.
The IARC team extracted data on the numbers of participants, cancer diagnoses, and cancer deaths in the intervention and comparison groups.
The study concludes that the correlation between reductions in late-stage cancer and cancer-specific mortality varies widely by cancer type, making it impossible to uniformly infer associations between reduced late-stage cancer and reduced cancer mortality.
Not everyone has bought into the reduced late-stage cancer endpoint. NCI, for example, appears to be committed to using cancer-related mortality, the “gold standard” for screening trials, as the primary endpoint when studying the clinical benefit of MCDs.
The JAMA paper appears to be relevant to three large trials of MCDs:
- NHS-Galleri is a prospective, randomized, controlled trial conducted by GRAIL LLC, Cancer Research UK, and the UK’s National Health Service to determine whether screening with the Galleri MCD test reduces the primary outcome of the number of stage 3 and 4 cancer diagnoses. The study has enrolled 140,000 participants.
- Galleri-Medicare Real-world Evidence to Advance Multi-Cancer Early Detection Health Equity (REACH) aims to enroll up to 50,000 Medicare beneficiaries and will measure reduction in diagnosed stage 4 cancers as a result of screening with GRAIL’s Galleri test. Medicare will cover the costs of the Galleri test and related and routine items and services for study participants.
- PREVENT, a prospective, multicenter, interventional study of 12,500 participants, plans to evaluate the performance of the OverC multi-cancer detection blood test in asymptomatic individuals with cancer risk in Sichuan, China. The primary outcome is the clinical stages of cancer patients diagnosed via the OverC multi-cancer detection blood test.
The results from the just-published analysis suggest that these trials, after years of study and millions of dollars in investment, may not determine with any certainty whether these MCDs save lives.
The Galleri test is available on the market.
“The work by Feng et al showed that cancer-related mortality remains the most appropriate endpoint for clinical evaluation of the new blood-based tests that aim to detect many cancers for which there is no evidence that screening is beneficial,” Peter Bach, chief medical officer of DELFI Diagnostics, said in an editorial published alongside the IARC paper in JAMA. “Studies might take a little longer, but will yield a more reliable answer.”
GRAIL officials said that cancer screening trials need “a major reconsideration with regards to endpoints (including surrogates), design, and analytical methods to keep up with rapidly advancing technology.”
Responding to questions from The Cancer Letter, the company reiterated its stance that “absolute incidence of late-stage cancer as a directly measured primary endpoint in clinical studies, followed by computational disease modeling to understand potential impacts on cancer-specific mortality, represents a promising strategy for robust yet accelerated evidence regarding the clinical utility of MCED and other technologies that detect cancer in average-risk populations.”
The NCI Vanguard pilot study, scheduled to launch in the fall, is designed to address the feasibility of using MCD tests in future randomized controlled trials.
The results from the Vanguard study would inform the design of a much larger randomized controlled trial that would be powered to assess a mortality endpoint.
“Looking at cancer mortality as an important endpoint is one of the reasons that NCI is setting up the Cancer Screening Research Network, intended to be agnostic of the technology and studying a wide variety of screening approaches, but starting with MCD assays,” Lori Minasian, deputy director for NCI’s Division of Cancer Prevention, said to The Cancer Letter.
NCI has been clear about its stance: the only way to know if MCDs benefit public health is to measure cancer-related mortality (The Cancer Letter, Jan. 12, 2024).
“One of the basic underlying principles for cancer screening is that you screen a population because, ultimately, that screening process will lead to a reduction in mortality from that disease,” Minasian said at a meeting of the NCI Council of Research Advocates March 5. “That’s why you invest in population-based screening. The question that we don’t know is, ‘Will screening people with these drastically different and new technologies actually reduce deaths from cancer?’”
Strong positive correlations between late-stage cancer endpoints and cancer-specific mortality endpoints were observed only in studies of lung cancer screening, and poor correlations were found across studies of colorectal cancer and prostate cancer. The correlation statistic appeared strongly positive across studies of ovarian cancer, but the finding relies on a small dataset, the authors said.
The results suggest that while late-stage cancer may be an appropriate alternative endpoint to cancer-specific mortality for screening for some types of cancer, it is not suitable for others.
“What we’re thinking about with these results is if we have multi-cancer screening trials and we measure the ability of these tests to reduce late-stage cancer, can we then use that information to draw conclusions about their ability to reduce cancer deaths?” Hilary A. Robbins, an epidemiologist at the International Agency for Research on Cancer and senior author of the study, said during a presentation of the research at the American Association for Cancer Research annual meeting.
“Unfortunately, the answer is, ‘Probably not.’ Because here, we were able to study five different cancer types which we’ve screened for in the past, and we learned something about how these two endpoints relate to each other. But multi-cancer tests, depending on the test, are detecting many other types of cancer for which we cannot examine this relationship, and we don’t know what it is. Then, at the end, we put all of these different cancer types together in the same test, and we measure an aggregate reduction in late-stage cancer.
“So, it becomes difficult to predict one from the other, to draw a line from reduced late-stage cancer to reduced cancer mortality.”
GRAIL stands by its chosen endpoint:
The meta-analysis by Feng et al is centered on the observed mortality benefits of single cancer screening approaches. The multi-cancer early detection (MCED) approach differs from the existing single cancer screening approaches, as GRAIL’s MCED test screens for a signal in circulating DNA that is specific for cancer that is shared by many cancers, so the historic single-cancer approach simply does not apply.
This novel MCED approach screens simultaneously for many cancer types to increase overall cancer detection in the population, which can help drive important public health benefits, as most cancer deaths occur from cancers we are not currently screening for. The efficiency, cost-effectiveness, and potential public health impact of a cancer screening test are directly tied to the prevalence of cancer in the target population, and these would likely be improved by testing for multiple cancers at the same time.
In the same way that existing single-cancer screening programs do not address performance or utility for subtypes of breast, lung, colorectal, cervical or prostate cancers, it’s challenging to think about those metrics for MCED tests cancer type by cancer type.
Patients and their physicians should not have to wait a decade or two for access to new technologies like MCED when we know that there is potential to reduce the overall burden of late stage cancer. Screening programs for breast, cervical, and colon cancer were implemented years—if not decades—before studies with observed mortality outcomes were completed, in part because the goal of screening is to reduce suffering and premature death from cancer.
Large population-based randomized controlled trials (RCTs) with mortality endpoints are the traditional gold standard for assessing clinical utility of screening. However, even when looking at single cancers, these studies take a decade or more to observe an effect, focusing on cancer-specific mortality.
This emphasis overlooks treatment advances that would be expected to occur in that timeframe, as well as other crucial—and potentially more timely—clinical outcomes. These include a reduction in cancers diagnosed at late stages, as well as morbidity, quality of life, and physical function, all of which can be positively impacted by earlier detection and treatment, with more treatment options available for earlier stage disease.
Moreover, single cancer screening tests have historically been imaging-based or tissue visualization-based, techniques that may improve over time but not fundamentally change. In contrast, current MCED tests combine advanced genomics and machine learning, and are likely to advance more quickly. Thus trials of current MCED tests that take 10-20 years to read out will likely be irrelevant due to technological obsolescence.
Given the high and growing number of cancer deaths still occurring under current screening recommendations, GRAIL believes there is substantial opportunity cost in deferring use of screening tests with the potential to reduce cancer morbidity and mortality. The relevance of trial results reported after a decade or more will be diminished by the rapid evolution of MCED technologies and cancer treatments during that time.
In the same way that modern drug trials are being redesigned to fit with modern therapies (e.g. basket and umbrella trials), cancer screening trials also need a major reconsideration with regards to endpoints (including surrogates), design, and analytical methods to keep up with rapidly advancing technology.
We believe that absolute incidence of late-stage cancer as a directly measured primary endpoint in clinical studies, followed by computational disease modeling to understand potential impacts on cancer-specific mortality, represents a promising strategy for robust yet accelerated evidence regarding the clinical utility of MCED and other technologies that detect cancer in average-risk populations.
Studies geared to measure cancer-related mortality endpoints are indisputably the best way to tell whether the benefits of screening strategies outweigh their risks, but such trials are time-consuming and require huge numbers of participants to achieve the necessary statistical power.
NCI hopes to work around these problems by establishing a large platform trial after the completion of the Vanguard pilot study, NCI’s Minasian said.
“Given all the technology now and where we are in this, we’d like to really have an opportunity to study more and different MCDs over time and see where the technology takes us,” Minasian said to The Cancer Letter. “So, the goal is to set up a randomized trial that will have a control group and different arms with different technologies, with the idea that potentially a new arm can come in when a company is ready.
“But the infrastructure is in place, so that we can compare each arm to the control group, but then look cross-sectionally at the different technologies potentially and understand how that fits with a natural history of some of these cancers that are difficult to find in early stages and difficult to treat regardless of stage.”
NCI anticipates that this trial design could expedite the process without losing valuable information.
“We have our statisticians doing a deep dive into this to see really, how long might this take,” Minasian said. “I don’t think we’re talking about a trial that’s not going to have answers for 20 years. The trial itself may last 20 years, if we have more and more different assays coming in, but we should be able to answer very specific questions in considerably less time than that.”
Moreover, in a divergence from the pattern of oncologic drug development, novel MCD assays are being produced very quickly, and these technologies are constantly being refined. Investigators run the risk of spending decades—and millions—studying a test that could be obsolete by the time the results are published.
NCI has thought of a workaround for this, too. MCDs can be broken down into two pieces: measurement and analysis.
Measurement of the chosen biochemical signals of the MCDs will likely not change much in the years to come, Minasian said. The majority of the boom in technological advances will be in the software algorithms responsible for interpreting these measurements.
A trial can store all the original, raw biochemical measurements, so that any later changes in software can be applied retroactively to the original data.
Said Minasian:
When we talk to the companies, for the most part, there are two parts to these assays. The first part is the biochemical measurement—methylation, fragmentation, whatever the technology they have chosen to measure. They’ve been very upfront about the process of measuring whatever the biochemical aspect is.
The second piece is a software algorithm that they have developed over time to help identify their cut point. What’s positive, what’s negative?
And the algorithms for those cut-points are being refined and changing. [The companies are] updating and upgrading their software algorithm. Sometimes they’ll be changing the biochemical measurements as well and refining that. But most of the change is in the software algorithm.
So, if that’s the case, we can actually design the study with an endpoint for a prospective, secondary analysis, using the new algorithm to see how well it worked.
There are also ways to accommodate some of the changes in measurement technology. If they radically change how they measure something biochemically, then we have stored blood and we can remeasure it.
NCI doesn’t have all the answers, Minasian said. The institute hopes to learn a lot from the Vanguard pilot study.
“Some of these questions are things we intentionally want to study, so that we can figure out, is this a good idea?” Minasian said. “Is this not a good idea for the large trial? You want to be able to play with a variety of ideas in a pilot setting to explore things so that you don’t make those mistakes in the large, expensive study.”
“We care about cancer mortality; we need to simply measure it.”
For each trial included in the analysis published in JAMA, the authors assessed whether the results were statistically significant in terms of reducing cancer mortality or reducing late-stage cancer.
Of the 13 trials in which screening significantly reduced late-stage cancers, 62% showed no effect of screening on cancer mortality, and 44% of clinical trials in which screening reduced cancer mortality did not find that screening reduced late-stage cancer.
“If we care about cancer mortality as the gold standard, if we are trying to reduce the frequency with which people die from cancer, it’s really difficult to use these studies to say something about that,” study author Robbins said. “I think maybe it’s overly simplistic, but my personal opinion is that we need to simply measure mortality. We care about cancer mortality; we need to simply measure it.”
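The percentages reported above are simple proportions of trials in which the two endpoints disagree. As a rough illustration only, the Python sketch below computes such a discordance rate. The function name is hypothetical, and the count of 8 trials is an inference rather than a figure reported here: it is the only whole number of trials out of 13 that rounds to 62%.

```python
def discordance_rate(discordant: int, total: int) -> float:
    """Share of trials in which the two endpoints disagree."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= discordant <= total:
        raise ValueError("discordant must be between 0 and total")
    return discordant / total


# 13 trials showed a significant reduction in late-stage cancer (from the article).
late_stage_positive = 13

# ASSUMPTION: 8 trials with no mortality effect, inferred from the reported 62%.
no_mortality_effect = 8

rate = discordance_rate(no_mortality_effect, late_stage_positive)
print(f"{rate:.1%} of trials with fewer late-stage cancers showed no mortality effect")
```

The same function could be applied to the 44% figure, but the underlying trial counts for that comparison are not given here, so they are left out rather than guessed.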
The paper addresses three important questions about the relationship between the frequency of late-stage cancer diagnosis and the accepted endpoint of cancer-related mortality, DELFI’s Bach wrote in his editorial:
First, if the alternative endpoint (ie, late stage cancer) can be substituted for the accepted endpoint (ie, cancer-related mortality), then results based on the alternative endpoint should rarely disagree with results using the accepted endpoint. There are at least 2 problems that could arise: the alternative could find a benefit that was not there when the accepted endpoint was examined or the alternative could fail to find a benefit that was present when the accepted endpoint was examined. These mismatched results bear resemblance to type I and type II errors in clinical studies, which generally have tolerance thresholds of 5% and 10% to 20%, respectively. Feng et al reported that findings based on the alternative endpoint regularly produced incorrect results. In 62% of the comparisons, the alternative endpoint of late-stage cancer incorrectly predicted that screening was beneficial when the accepted endpoint of cancer-related mortality showed that it was not. In 44% of the comparisons, the alternative endpoint failed to identify a benefit that was apparent in analyses of the accepted endpoint. The authors showed these findings were similar at statistical cutoffs of significance of α = .05 and α = .10.
Second, studies of cancer screening tests also typically quantify the benefit of screening in a specific population. Because most people screened do not have the disease they were screened for, a typical measure of population benefit is the number needed to screen to prevent a death. The calculation requires an accurate measure of the mortality benefit (ie, the accepted endpoint) associated with screening. Although changes in the frequency of the alternative endpoint need not be exactly interchangeable with those of the accepted endpoint, the relationship must be consistent. Otherwise, one cannot accurately predict the magnitude of mortality benefit from the magnitude of the reduction in advanced cancer diagnoses. However, Feng et al reported strong statistical evidence that the 2 endpoints did not reliably correspond with each other (P = .004). Therefore, studies that rely on the alternative endpoint will not produce data sufficient for basic assessments of the risk-benefit tradeoff of the screening approach.
Third, the work by Feng et al assessed the correlation between the 2 endpoints across different cancer types in their analyses. A consistent correlation would be required to assume that the relation between cancer-specific mortality and late-stage cancer diagnoses is similar for the many other cancers that multiple cancer tests aim to detect. Feng et al evaluated whether the relation between the alternative and accepted endpoints were sufficiently consistent across cancers (including lung cancer, colorectal cancer, and breast cancer) that one could reasonably assume that the relation between the 2 endpoints would be similar for other cancers, such as gastric cancer, esophageal cancer, sarcoma, and lymphoma, for which evidence is lacking that screening will reduce mortality. However, Feng et al reported meaningful differences in the correspondence between the 2 endpoints for different cancers (ie, statistically significant heterogeneity; P = .02). Therefore, one cannot assume that one could substitute the alternative endpoint for the accepted endpoint for the other cancers that these tests may detect. This finding is not surprising, because cancer staging categories are neither developed nor evaluated with an expectation that they are similar across cancer types.
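The number needed to screen that Bach mentions is conventionally calculated as the reciprocal of the absolute reduction in cancer mortality between the control and screened groups. The Python sketch below shows that arithmetic under stated assumptions: the function name and all trial counts are hypothetical and are not drawn from Feng et al or from any MCD trial.

```python
from fractions import Fraction
from typing import Optional


def number_needed_to_screen(
    deaths_control: int,
    n_control: int,
    deaths_screened: int,
    n_screened: int,
) -> Optional[Fraction]:
    """Return 1 / absolute risk reduction, or None if screening shows no benefit."""
    if n_control <= 0 or n_screened <= 0:
        raise ValueError("group sizes must be positive")
    # Absolute risk reduction: control mortality minus screened mortality.
    arr = Fraction(deaths_control, n_control) - Fraction(deaths_screened, n_screened)
    if arr <= 0:
        return None  # no mortality benefit, so the quantity is undefined
    return 1 / arr


# HYPOTHETICAL counts: 50 vs. 40 cancer deaths per 10,000 participants.
nns = number_needed_to_screen(50, 10_000, 40, 10_000)
print(f"Number needed to screen to prevent one cancer death: {nns}")
```

The calculation takes cancer deaths as its input. A trial that measures only late-stage diagnoses would have to supply those deaths through a model, which is the step the editorial argues cannot be relied on.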
NCI’s Vanguard: the details
NCI formally established a plan to evaluate the clinical utility of MCDs in 2022, when the institute’s Board of Scientific Advisors unanimously approved a concept for creating a research network that, as its first project, would take on the challenge of evaluating the clinical utility of MCED assays (The Cancer Letter, June 24, 2022).
The research network proposed in 2022 has materialized in the form of the NCI Cancer Screening Research Network, or the CSRN, announced in NCI Director Kimryn Rathmell’s March 20 report to the NCI Board of Scientific Advisors (The Cancer Letter, March 22, 2024).
The first initiative from the CSRN is the Vanguard pilot study, which is expected to enroll up to 24,000 people. The Vanguard aims to:
- Assess participant willingness for randomization,
- Determine adherence to testing and diagnostic follow-up,
- Evaluate feasibility of protocol-defined diagnostic workflows,
- Determine reliability and timeliness of blood specimen testing and return by MCD companies,
- Identify facilitators and barriers to recruitment/retention/compliance of diverse participant groups.
“This [study] is designed to be very parallel, to be a mini version, of the very large randomized control trial that we would ultimately like to run,” Minasian said at the NCRA meeting. “If we have the resources, we’re hoping to set this up in a way that we learn how best to design that trial, because the biggest challenge here is what is the difference in designing a screening trial when the screening modality can detect, possibly, many different types of cancer?”
The CSRN comprises NCI; a Statistics and Data Management Center; a Coordinating and Communication Center; and nine Accrual, Enrollment, and Screening Site, or ACCESS, Hubs. Members of all these organizations participate in the Cancer Screening Research Network steering committee.
Fred Hutchinson Cancer Center in Seattle will be both the Coordinating and Communication Center and the Statistics and Data Management Center.
The nine ACCESS Hubs are:
- Henry Ford Health + Michigan State University Health Sciences
- Kaiser Permanente Northern California, Kaiser Permanente Southern California, and Kaiser Permanente School of Medicine
- OU Health Stephenson Cancer Center at the University of Oklahoma Health Sciences Center
- University of Colorado Cancer Center, Kaiser Permanente Colorado, and Kaiser Permanente Hawaii
- University of North Carolina Lineberger Comprehensive Cancer Center
- Virginia Commonwealth University, Inova, and Sentara Health
- Washington University School of Medicine in St. Louis
- Department of Defense Uniformed Services University
- Department of Veterans Affairs
“We knew we were starting this out as a pilot and wanted to be able to see what the experience was in different healthcare settings in the U.S.,” Minasian said. “Will these participants follow through? What are the challenges? What are the barriers?”
Participants in the Vanguard study will be randomized into one of three arms: a control arm, an “MCD 1” arm, and an “MCD 2” arm. Participants in all arms will be offered standard-of-care cancer screenings.
“We’re not combining MCDs,” Minasian said. “It’s one arm per MCD assay. Everybody’s randomized because the first objective of the Vanguard study is to assess participant willingness to be randomized to this kind of a design.
“We have participants being randomized all the time. The question is, in the current environment, are people just going to go and get these assays if they’re commercially available, or are they willing to engage in the research opportunity and be willing to randomize in this particular setting?”
The study will be thorough in its data capture.
“Everybody will be offered standard of care screening,” Minasian said. “We will capture all cancers. We’ll capture their histology, their histopathology, the markers they’re associated with. It doesn’t matter whether or not those assays say they detect that cancer. We will collect all cancers across [study arms].”
Vanguard will also evaluate the feasibility of the protocol-defined diagnostic workups.
“We’re still trying to figure out how we should gather that data, how we should assess that, and determine the reliability and the timeliness of the blood specimen testing and return by the MCD companies,” Minasian said. “We’ve been very fortunate with the engagement of the MCD companies, but that’s one of the objectives of this pilot study.
“Particularly our folks in the federally qualified health centers—we actually, early on, had a conversation with a woman that runs one of the federally qualified health centers here in Maryland and found that she had quite a few challenges. The most significant challenge she had was finding the resources so that these participants could, in fact, get the follow-up for the positive test.”