Matthew Ong: You have a pretty comprehensive database at St. Jude, but overall, as a community of pediatric researchers, is there a need for something bigger, better?
Charles Roberts: I think there is a substantial unmet need.
This year, Jinghui Zhang from Computational Biology at St. Jude, working with the National Cancer Institute and Children’s Oncology Group, performed genomic analysis of 1,699 childhood cancers and found that 55 percent of the genetic mutations that drive pediatric cancer are not found in adult cancers.
Historically, cancer research and drug development have largely focused on adult cancers, with drugs trickling down to pediatric trials over time.
But we now know that 55 percent of the driver mutations are unique to childhood cancers. So, relying upon that old model is not going to serve children well.
How do we best serve children fighting cancer? First and foremost it is collaboration across institutions and across areas of expertise, spanning from basic to translational and clinical research.
The greatest impact will require a combination of laboratory investigation to reveal the mechanisms by which these unique mutations drive cancer, focused efforts to translate pediatric cancer findings into clinical trials specifically for children, and robust data sharing.
Given the many pediatric-unique mutations, and given that differential therapeutic responses are affected by heterogeneity within cancer types and driven by subclass-specific mutations, sharing samples and data is essential for advancing the field.
With respect to biopsy samples, the number of cases is smaller than for adult cancers and the biopsies are often small, so samples are limited and can be used up quickly.
To address this limitation, one of the approaches we’ve been taking is to systematically put samples into mouse models and to comprehensively characterize both the original tumor and the mouse patient-derived xenograft (PDX) samples via whole-genome and whole-exome sequencing, paired with RNA-seq and methylation profiling, and then to make all of the PDX samples and genomic data freely available.
Demand is clearly there as our Childhood Solid Tumor Network has sent out over 1,300 vials to 194 investigators at 93 institutions in 15 countries. And our PROPEL resource has more than 200 samples of leukemias to share free of charge.
The need for sharing goes beyond samples; it’s big data, too. We developed the St. Jude Cloud for this purpose. It provides researchers around the world access to the world’s largest public repository of pediatric cancer genomics data.
The thirst for data sharing is evident here, too: since its launch less than a year ago, more than 800 people from over 400 institutions have registered.
They get immediate access to data in the cloud that previously would have taken weeks to download.
How many cases do you have in the cloud thus far?
CR: The St. Jude Cloud already has more than 5,000 whole-genome, whole-exome, and 1,200 RNA-seq datasets from more than 5,000 pediatric cancer patients and survivors.
We continue to add more whole-genome sequences and expect to make 10,000 of those available at AACR this month.
At ASCO, we’re going to be announcing that comprehensively sequenced and clinically annotated patient-derived data will be made available to others in real time, rather than holding them back for months or years in order to accompany a publication.
Is the St. Jude database currently the most well-annotated, well-aggregated and most comprehensive database on childhood cancer?
CR: St. Jude Cloud is the world’s largest repository for pediatric cancer genomics data, including pediatric cancer and cancer survivorship data.
This data includes whole-genome sequencing, not just whole-exome sequencing. With whole-genome, whole-exome and RNA-seq data, we are already making novel discoveries.
A St. Jude study published in Nature Medicine last week found that whole-genome sequencing led to the discovery of gene fusions common in childhood melanoma.
St. Jude Cloud also has a collection of bioinformatics tools to help both experts and non-specialists gain novel insights from genomics data.
These tools include validated data analysis pipelines and interactive visualization tools to make it easier to make discoveries from large datasets. Data and results can be securely shared with collaborators within the platform.
One of the biggest choke points in advancing cancer research is the need for computational biologists. Often, scientists and physician scientists have good questions and ideas but don’t have access to dedicated computational biologists as they’re expensive.
What the St. Jude Cloud provides is brilliant. The tools are designed to be accessible and user-friendly, and they display results in a way that lets biologists and physicians understand the impact without needing a computational biologist. Indeed, this was a major driver behind why we created St. Jude Cloud, and I do think it’s world-leading.
Another major advance is providing all of the data in the cloud so that analyses can be performed without having to download the data.
Scott Newman, one of our Bioinformatics Group leaders, found prior to coming to St. Jude that it took over seven months just to download 10 TB of data on 92 high-grade gliomas from the Pediatric Cancer Genome Project.
With the advent of the St. Jude Cloud, investigators can analyze data directly in the cloud.
Indeed, if they choose, they can upload their own tools, choosing whether to share them, and immediately analyze all of the PCGP data.
How much has St. Jude spent on this in total, to get the cloud to where it is today? And how much do you continue to spend every year? What’s your annual budget?
CR: We spent $3.3 million to develop and launch the St. Jude Cloud, which included $500,000 in support from DNAnexus and $2 million from Microsoft.
We have budgeted $14.6 million and 15 positions to support it over the first five years, through 2023.
Additionally, on an annual basis DNAnexus has been providing $500,000 and Microsoft over a million dollars in storage and computational costs.
Equipping not just the scientists and doctors at St. Jude but, in real time, investigators at Dana-Farber, Memorial Sloan Kettering, Stanford, Seattle Children’s Hospital, the Princess Máxima Center in the Netherlands, and around the world with access to this data will facilitate so many more advances.
For something like this to be useful, it has to be research-grade; right?
CR: Yes. Absolutely. Most of the tools are published in peer-reviewed journals, including ProteinPaint, which was published in Nature Genetics in 2015.
What does it take to make a database like this research-grade? Do you need deep sequencing?
CR: There are some questions that have yes/no answers, such as whether a patient carries a particular mutation. Our whole-genome sequencing and whole-exome sequencing have a minimum of 30X and 100X coverage, respectively, which is the standard used for cancer genomic research.
The current coverage provides us with 90 percent of the power for identifying mutations present in 20 percent of the bulk tumors, and deep sequencing by panel may enable discovery of additional variants present in smaller subclones.
For others, provision of panel data or exomes enables new discoveries. We are in the process of learning about tumor heterogeneity by performing single-cell DNA and RNA sequencing.
While in 2019 we understand the genetics of cancer so much better than a few years ago, we still have a long way to go.
Increasingly, we’re learning that there are numerous cancer-driving mutations that can only be identified via the combination of RNA-seq and/or whole-genome data.
And additional information can come from methylation and ATAC-seq analyses.
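The relationship between coverage and the power to detect a subclonal mutation, mentioned in the answer above, can be illustrated with a simple binomial model. This is a minimal sketch and not the St. Jude analysis: it assumes every site is covered at exactly the nominal depth, ignores sequencing error, tumor purity and copy number, and calls a variant whenever at least `min_alt_reads` reads support it. The function name `detection_power`, the threshold of 3 reads, and the variant allele fractions (VAF) in the loop are hypothetical choices for illustration, so the printed values are not expected to reproduce the 90 percent figure quoted in the interview.

```python
from math import comb


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` variant reads.

    Assumes the number of variant reads follows Binomial(depth, vaf),
    i.e. uniform coverage and no sequencing error (illustrative only).
    """
    # Probability of observing fewer than the required number of variant reads.
    p_miss = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Hypothetical depths: 30x (WGS), 100x (WES), 500x (deep panel).
    for depth in (30, 100, 500):
        for vaf in (0.05, 0.10, 0.20):
            power = detection_power(depth, vaf, min_alt_reads=3)
            print(f"depth={depth:>3}x  VAF={vaf:.2f}  power={power:.3f}")
```

Under these assumptions, power rises steeply with depth for low-VAF variants, which is consistent with the point that deep panel sequencing may reveal variants in smaller subclones that 30X whole-genome data would miss.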
Say, for instance, if you’re looking at germline mutations and you’re looking for new, actionable targets that are unique to pediatric malignancies, do you have to do whole genome sequencing to get there? Or is this a clinical-grade question?
CR: For current, actionable clinical questions, typically targeted panels are sufficient. But we know that the list of recognized germline predisposing mutations will continue to grow.
There are interesting correlations between germline variations and genome-wide somatic alteration profiles. For example, a BRCA-like mutational signature has recently been reported to be a good predictor of sensitivity to PARP inhibitors.
Given the low mutation burden of pediatric cancer, a genome-wide approach is required to ensure robust results from mutational signature analysis.
Furthermore, germline copy number alterations and structural variations have rarely been explored, and variants in regulatory regions will also need a genome-wide approach.
For this reason, to account for both current clinical needs and continued discovery research we perform all of these on each new cancer patient at St. Jude.
As you know, the White House promised $500 million—of course it’s unclear whether that’s coming through just yet—but with $500 million over 10 years, what can we realistically achieve, and is that enough?
CR: As I mentioned, we now know that while pediatric cancer shares some features and mutations with adult cancers, for the majority of mutations pediatric cancer is different.
Any investment is welcome. There’s no question that the field of pediatric researchers can put that scope of investment to good use.
This funding will facilitate better data sharing, which is critical for researchers nationwide and around the world to understand the unique nature of pediatric cancer.
It will enable us, as a field, to better identify problems and to develop new ideas about how we can best intervene.
Do we need a federated model of data sharing in terms of infrastructure, and would NCI be in a good position to facilitate that?
CR: I think the federated model is absolutely the way to go.
Past efforts to set up central databases haven’t always worked. The reason is the field is in evolution. The type of data we need is changing. The questions we’re asking are changing.
Just like in any innovative field, whether it be Silicon Valley or similar industries, people are thinking and collaborating and competing and new ideas come up and you suddenly say, “Oh, that’s much better,” and the whole field changes.
That innovation enables a number of experiments and different approaches to develop rapidly.
St. Jude is developing new tools and new ways of thinking that we think are revolutionary for our field, and we want the community to have access to those resources.
That’s why we want to provide everything for free and enable other researchers. It’s critical, and part of our mission, to share these innovations with our peers.
And others are interested in focused sharing of pediatric cancer data too. The NCI-supported Gabriella Miller Kids First Data Resource Portal has substantial genome data.
We have been collaborating with the HudsonAlpha Institute for Biotechnology and developed a genome sequencing center for the Kids First program, with a focus on generating and uploading high-quality genomic sequencing data for pediatric cancer.
The UCSC Treehouse has RNA-seq data and we have already shared our RNA-seq data with the UCSC Treehouse team.
Additionally, we have discussed with the Kids First team on multiple occasions the feasibility of developing methods that will make the model of federated data sharing a reality.
We do think that this federated model is important in order to support advances most rapidly and we’re excited about the idea of NCI further supporting data sharing.
However, this is technically challenging and will require dedicated effort.
Who else has full capacity to be able to generate high-grade data and do the sequencing that’s required? Or, perhaps, it’s truly a team effort.
CR: Beating pediatric cancer will clearly take a team effort. At St. Jude, we’re fortunate to have the ability and capacity to make a major impact. It’s intrinsic to our mission.
Danny Thomas didn’t say, “No child should die in the dawn of life in Memphis, TN” or “No child should die in the dawn of life in the United States.”
He said, “No child should die in the dawn of life.” And I think that is a mission that everyone can support.
We know that no single institution can maximize cures and minimize toxicities alone.
In collaboration with COG and many other institutions, if our scientific expertise, our data and our analysis tools speed advances in research and answer important questions, that’s the goal.