In anticipation of an infusion of funds from Congress, NCI is developing a blueprint for a comprehensive cancer data federation—starting with pediatric cancer.
“I think data aggregation, data federation, is something we need throughout cancer research, but it’s a particularly pressing need in pediatric cancer research,” NCI Director Ned Sharpless said to The Cancer Letter in his first detailed comments on this issue.
The data federation that NCI has in mind would allow researchers to move seamlessly between types of data—clinical records, genomic information, pathology and outcomes data—as well as through different platforms where databases are stored, be they in Bethesda, Philadelphia or Memphis.
“We envision this to be a very high-grade dataset that will be useful for real cutting-edge translational and basic research,” Sharpless said. “This quality, this size, this scope doesn’t exist in any area of biomedical research.
“And so, this is an important first step in learning how useful radical data sharing and aggregation can be. Therefore, we really expect it to inform not just childhood cancer, but every kind of cancer.”
A conversation with Sharpless appears here.
NCI’s plans, developed over the past month with key players in childhood cancer, are a response to a pledge by President Donald Trump to dedicate more federal funds to pediatric cancer research (The Cancer Letter, Feb. 8).
“Many childhood cancers have not seen new therapies in decades,” Trump said at his State of the Union address Feb. 5. “My budget will ask the Congress for $500 million over the next 10 years to fund this critical life-saving research.”
Trump is expected to release his budget proposal on March 11. If congressional appropriators concur with his request, insiders anticipate that these funds would be added to NCI’s budget. At this writing, it’s not publicly known whether the president will seek an increase for NIH and NCI, or whether he will propose dramatic cuts, as he has in the past two years (The Cancer Letter, Feb. 16, 2018, May 26, 2017).
“The support [for pediatric cancer research] that the president suggests—$500 million over 10 years—is wonderful and appreciated, but that is not enough money to boil the ocean in terms of big data,” Sharpless said. “But $50 million a year for 10 years is a significant investment. I mean, that would help a lot. Certainly, Congress decides the appropriation, were they to give us more, we’d find a use for it. I mean, NCI could always use more support for great cancer research.
“For this to be successful, we have to leverage existing investments and make sure we use the datasets that are already out there and try and link them, and get data and pull data from them to get into this common aggregated and federated dataset that lives in the cloud.”
The pediatric cancer community is in universal agreement that the data needs for childhood cancer research are not currently met, said Adam Resnick, director of the Center for Data Driven Discovery in Biomedicine at the Children’s Hospital of Philadelphia and scientific chair for several consortia-based efforts, including the Children’s Brain Tumor Tissue Consortium and Pacific Pediatric Neuro-Oncology Consortium.
“Through such integrative efforts that look at leveraging a tremendous opportunity to think about pediatric cancer anew—in ways that leverage emerging data-science supportive technologies, cloud-based resources, and community engagement—I think this can really transform the research landscape in terms of its capacity to accelerate discovery, diagnostics, and have immediate impact in the context of clinical translation, potentially prospectively for each and every individual patient across the U.S.,” Resnick said to The Cancer Letter.
“For us, it’s extremely exciting to hear and see both the NCI and the administration and other community partners and patient groups really coming together and reconsidering, ‘What are the unmet needs that can inform in new ways through new approaches of integration?’
“This is a time of new technologies, new initiatives and efforts, the emergence of an NIH data commons landscape, the ongoing growth of efforts of the NCI in terms of establishing and developing a data commons framework, the proposed new resources and influx of funding provide a just-in-time opportunity for our community to engage in defining how all these parallel and intersecting efforts can be brought to bear in the context of the pediatric enterprise and the translation to impact.”
A conversation with Resnick appears here.
Sharpless said the data federation would build on existing initiatives at NIH and NCI, including the Cancer Research Data Commons; the Genomic Data Commons; TARGET, the pediatric version of The Cancer Genome Atlas; and the Gabriella Miller Kids First Pediatric Research Program, a trans-NIH initiative that receives $100 million over 8 years through the Kids First Research Act.
The institute would also leverage collaborations with academic institutions and research networks, including St. Jude Children’s Research Hospital, the Children’s Oncology Group, and CHOP.
“None of these existing things are perfect,” Sharpless said in an interview. “They all have some aspects of the elements we want, but by putting them all together and making them searchable—the vision is that you would just go in as a researcher and look for, say, who with neuroblastoma responds to adriamycin. And you would know if that was a St. Jude’s patient, or a COG patient, or wherever the source came from.”
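The kind of cross-source search Sharpless describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields, the source names, and the toy records are hypothetical placeholders, not the structure of any NCI, St. Jude, or COG dataset.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical, heavily simplified case record."""
    case_id: str
    diagnosis: str
    drug: str
    responded: bool


def federated_query(
    sources: Dict[str, Iterable[CaseRecord]],
    diagnosis: str,
    drug: str,
) -> List[Tuple[str, CaseRecord]]:
    """Search every source and tag each responder with where it came from."""
    hits = []
    for source_name, records in sources.items():
        for record in records:
            if (
                record.diagnosis == diagnosis
                and record.drug == drug
                and record.responded
            ):
                hits.append((source_name, record))
    return hits


# Synthetic placeholder records, not real patient data.
sources = {
    "source_a": [
        CaseRecord("a-001", "neuroblastoma", "adriamycin", True),
        CaseRecord("a-002", "neuroblastoma", "adriamycin", False),
    ],
    "source_b": [
        CaseRecord("b-001", "neuroblastoma", "adriamycin", True),
        CaseRecord("b-002", "ewing sarcoma", "adriamycin", True),
    ],
}

hits = federated_query(sources, "neuroblastoma", "adriamycin")
for source_name, record in hits:
    print(source_name, record.case_id)
```

In this toy version, each result keeps its source label, which mirrors the idea that a researcher would see whether a matching case came from one institution or another. A real federation would also have to reconcile differing vocabularies, identifiers, and access controls, none of which is modeled here.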
Private sequencing companies, including Foundation Medicine, would likely play a role in the data federation as well.
“I think everything’s on the table as to how we build this out. It is unimaginable to me, given the expertise that exists for data analysis and data aggregation in the private sector, that we wouldn’t be relying heavily on industry partners for some aspects,” Sharpless said. “Once the common structure is there, it allows everybody to contribute data to the sandbox and all things work better.”
Among international stakeholders, the World Health Organization would be a key partner. In September 2018, St. Jude and WHO formed a collaboration that aims to cure at least 60 percent of children with cancer worldwide by 2030 (The Cancer Letter, Oct. 12, 2018).
St. Jude launched its own data-sharing platform, St. Jude Cloud, in April 2018. To date, the platform is the largest public repository of pediatric cancer genomics data, with 5,000 whole-genome, 5,000 whole-exome, and 1,200 RNA-seq datasets. The Memphis, TN, hospital expects to make 10,000 whole-genome sequences available later this year.
“St. Jude Cloud is the world’s largest repository for pediatric cancer genomics data, including pediatric cancer and cancer survivorship data,” said Charles Roberts, executive vice president, director of the Comprehensive Cancer Center, and director of the Molecular Oncology Division at St. Jude. “This also reveals the thirst for data sharing as, since its launch less than a year ago, more than 800 people from over 400 institutions have registered. They get immediate access to data in the cloud that previously would have taken weeks to download.”
A conversation with Roberts appears here.
The data federation should complement other existing efforts through coordination of resources, since NCI is already dedicating over $2 billion over 10 years for pediatric cancer, including funding from the Beau Biden Cancer Moonshot, said Vincent Miller, chief medical officer of Foundation Medicine.
“One of the key things in any of these efforts is not to let perfection be the enemy of excellence,” Miller said to The Cancer Letter. “And that being said, the pediatric space is unique in that patients tend to be cared for in a much more manageable and more uniform way as far as number of institutions, number of EMRs, clinical trial participation, etc., than in the adult oncology ecosystem.
“Certainly, at Foundation Medicine, we provide genomic data on a large number of patients dealing with pediatric cancer. A couple of thousand patients are actually on our website. They’re formatted for researchers as part of a portal.”
In July 2016, Foundation contributed 18,000 cases to NCI’s Genomic Data Commons, without remuneration (The Cancer Letter, July 29, 2016).
“We’ve been big supporters of these types of initiatives on the pediatric front,” Miller said. “We’ve got the appropriate template agreements in place. We’ve got precedent for doing this. We’ve shared and worked through some of the glitches that are always common in large data transfers. So, we’re certainly excited to both contribute to the discussion but also contribute meaningfully on the data front.”
The childhood cancer community will benefit from NCI’s vision to create a broader data federation for cancer research, said Peter Adamson, chair of the Children’s Oncology Group and professor of pediatrics at CHOP.
“I don’t think there’s any question that there’s going to be some return on the investment to do that,” Adamson said to The Cancer Letter. “Part of this discussion, which is an important discussion to have, is it’s always great that the president brought childhood cancer and the problem of childhood cancer to the forefront, which is always welcome.
“But, we also need to have a robust budget for the NCI as a whole. If we’re unable to grow the NCI budget as a whole, I think childhood cancer is going to be challenged, along with other cancers. So, I don’t think you could do one without the other, and obviously doing both would be ideal. But I don’t think you can shrink the NCI budget and have as much of an impact with $50 million a year for childhood cancer.”
NCI needs to invest more in epigenetics and research on outcomes of children with cancer, Adamson said.
“There are still many cancers where we don’t know the drivers. It’s not revealed by sequencing and it may as well be in the epigenome,” Adamson said. “So, I do think there’s a need to build upon some existing infrastructures that do capture biospecimens and outcomes and making sure that we are able to learn from every child with cancer in the country by building up.
“And that’s part of the related STAR Act: Survivorship, Treatment, Access, and Research. It’s to make sure that we have a biorepository system that can help feed investigator-initiated research as well as other initiatives.”
NCI’s new data initiative will also build upon survivorship and biospecimen collection efforts funded through the STAR Act, which authorizes NCI to spend up to $30 million per year over 5 years, beginning in 2019.
“I think the STAR Act is, in some ways, a great taking-off point for this initiative,” Sharpless said. “But I think it’s also important to say that this initiative would not only facilitate and improve survivorship research and biospecimen analysis, but I think it really helps with every area of pediatric cancer research.”
The genomic characterization of pediatric cancers has allowed researchers to understand, in reasonable detail, what drives childhood malignancies both at diagnosis and at relapse.
“There’s plenty of evidence in the pediatric context that targeted-based approaches or precision-based approaches that are defined by molecular context are extremely effective,” CHOP’s Resnick said.
“What we’re now finding is that what has been pathologically described as one disease is potentially five or six or 11 different diseases, when you start looking at the detailed molecular biology,” Resnick said.
“And that presents a challenge for a whole new type of way of thinking about creating the kinds of clinical trials where the right patients are selected for the trials in ways, that with smaller numbers, are more likely to actually achieve meaningful, statistically significant results, as supported by molecular definition of disease.”
With funding support from the Moonshot, NCI is investing in research on fusion oncoproteins, which drive many childhood cancers, COG’s Adamson said.
“One of the classic ones is in Ewing sarcoma with EWSR1,” Adamson said. “We still haven’t come up with a therapeutic approach, even though we’ve known about this for well over 20 years.
“Because childhood cancers don’t undergo a long evolutionary period, many occur within a short developmental period and not from years or decades of exposure. When you do find an aberration, it’s more likely to be fundamental to the malignant process than for potentially cancers that have accumulated many, many aberrations and knowing what the drivers are is far from trivial.
“What we often argue, in part, is when we find something in a childhood cancer that is a target in driving a cancer, it’s often fundamental and can apply more broadly than to the rare childhood cancer. So, I do think defining that landscape often points to clear drivers.
“I think what we’ve also learned is that the initial sequencing efforts are not going to uncover everything we wanted to know as far as what the drivers are. And I think that’s right now where there’s increasing interest in the epigenome.”
There is likely a need for more comprehensive clinical genomics that can be linked directly to clinical care and implementation, Resnick said.
“I think what we’re finding more and more is that more comprehensive processes that essentially look at the entire genome—like a whole-genome sequencing, as opposed to a whole-exome or a panel—provide a larger amount of information that can complement current clinical efforts,” Resnick said.
“I think there’s a lot of interest, by our community, in thinking about comprehensive clinical data generation, and this is driven in part by the recognition that the cost of large-scale clinical data generation is now dropping in ways that it will indeed be feasible for clinical whole genome sequencing, for example, to occur within a short period of time. Big data is set to transition from largely occurring in the context of research to, in the very near future, be the standard of care in the clinical context.
“And so, many in our community recognize that it won’t be too long before, for example, clinical whole genome sequencing combined with RNA-seq is the starting digital footprint of an electronic health record in ways that would suddenly make big data a daily reality that right now is still largely restricted by our community to research-grade data sets.”
At St. Jude, whole-genome sequencing and whole-exome sequencing have a minimum of 30X and 100X coverage, respectively, which is the standard used for cancer genomic research, said St. Jude’s Roberts.
“The current coverage provides us with 90 percent of the power for identifying mutations present in 20 percent of the bulk tumors and deep sequencing by panel may enable discovery of additional variants present in smaller subclones,” Roberts said. “For others, provision of panel data or exomes enables new discoveries. We are in the process of learning tumor heterogeneity by performing single-cell DNA and RNA sequencing.
“While in 2019 we understand the genetics of cancer so much better than a few years ago, we still have a long way to go. Increasingly we’re learning that there are numerous cancer-driving mutations that can only be identified via the combination of RNA-seq and/or whole genome data. And additional information can come from methylation and ATAC-seq analyses.”
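The relationship Roberts describes between sequencing coverage and the power to detect a mutation can be sketched with a simple binomial model. This is an illustration only, not the method St. Jude uses: it assumes every read samples the variant independently at a fixed variant allele fraction, ignores sequencing error, and uses a hypothetical minimum number of supporting reads to call a variant.

```python
from math import comb


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` variant reads.

    Assumes the number of variant-supporting reads follows
    Binomial(depth, vaf). `min_alt_reads` is a hypothetical
    calling threshold chosen for illustration.
    """
    # Probability of observing fewer than the required number of reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Compare the two minimum coverages mentioned in the article
# at an assumed variant allele fraction and calling threshold.
for depth in (30, 100):
    power = detection_power(depth, vaf=0.2, min_alt_reads=3)
    print(f"{depth}X coverage: power = {power:.3f}")
```

Under these assumptions, power rises with depth and falls as the variant becomes rarer, which is consistent with the point that deeper panel sequencing may surface variants in smaller subclones. The specific numbers depend entirely on the assumed allele fraction and threshold.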
In his State of the Union address, Trump seemed to make a significant personal commitment to the $500 million, said Nancy Goodman, founder and executive director of Kids v Cancer.
“He brought out a beautiful girl who survived cancer, inspirational Grace,” Goodman writes in a guest editorial for The Cancer Letter. “The president asked us to be emotionally invested in Grace, as he was. He told us that ‘nurses and doctors cried when Grace finished chemo.’ He concluded: ‘Grace—you are an inspiration to us all.’”
Goodman’s guest editorial appears here.
“NCI has a terrific project for the funds—a clinical database of pediatric cancer with deep genomic sequencing, clinical records, and data federation,” Goodman writes. “The private genomic sequencing and big data industry’s expertise and resources could be used to help design, build and populate this dataset.
“If we don’t get new funds—meaning the $50 million per year above the $30 million Congress authorized under the Childhood Cancer STAR Act, then Grace was just emotional bait. That would be really lousy. The president’s offer will have been a cheap shot, an exploitation of Grace and of all of us whose children have been treated for or have died of cancer.”