publication date: Mar. 8, 2019
Conversation with The Cancer Letter
CHOP’s Resnick: Big data will transition from research to the standard of care in the clinic
Director, Center for Data Driven Discovery in Biomedicine, Children’s Hospital of Philadelphia; Scientific chair, Children’s Brain Tumor Tissue Consortium and Pacific Pediatric Neuro-Oncology Consortium
Is there a need for a consolidated data commons for pediatric cancer? What are you hearing from the childhood cancer community?
I think this would be a universal answer in the pediatric cancer community: the data needs are not currently met, and it’s for a variety of different reasons that are distinguished in the pediatric setting, compared to the adult cancer landscape.
Centralization is one approach, especially as supported by disease-specific efforts or NIH entities, but there is also a likely need for federation across a decentralized data commons landscape.
Can you describe what you’ve learned in your work on data-driven discovery, and how that approach could inform the creation of a larger, federated model for databases?
One of the main challenges, obviously, in the pediatric cancer landscape is that no single institution sees enough patients of a particular kind to collect enough specimens or generate enough data to analyze it independently, or fully interpret the datasets on their own, or even generate sufficient data on their own to drive the accelerated impact for patients that data-driven processes can impart.
And so, if any one entity wanted to actually undertake such an effort, it would take a very long time, just because the pediatric cancer landscape, by definition, ends up being a rare disease research context, despite the fact it’s currently the leading cause of disease-related death in children.
By definition, the pediatric community has to undertake a different form of consortia or collaborative-based efforts in order to aggregate or connect either specimens or datasets in order to empower them for meaningful analysis.
A second layer of challenge is that there are far fewer of us in the pediatric research community undertaking the analysis of such datasets, and too few of us focused on the analysis of specific cancers.
And what I mean by that is that the data can be just as complex and just as challenging to understand as a melanoma cancer dataset or a lung cancer dataset, and require similar levels of infrastructure and resources.
But by comparison, there are very few focused domain experts, let’s say for certain types of pediatric cancer, like medulloblastoma or neuroblastoma researchers, to fully explore, mine and interpret and iteratively re-contextualize the data.
And so, in the context of the modern, technologically evolving landscape of new types of data and their analysis, whether it’s a different modality from genomics to proteomics to single-cell datasets, or even how such data get intersected in the clinical trial context, there’s a need for the pediatric community not only to collaborate amongst themselves, but also to undertake engagement and recruitment of data-type specific domain expertise into our research community for communities who may not have yet defined pediatric cancers as a research objective.
I think it represents what we think of and what others have also recently contextualized as the opportunity and need for models of convergence research across the disparate knowledge domains and research efforts, meaning that as a community we must have infrastructure and mechanisms to provide access and use of the data in a way that’s non-local, that functions to bring together, one, the disease-specific community itself with their domain expertise, but two, also attracts and brings other research community members who may not even be researchers in the pediatric disease context, but have other expertise to inform, connect, and analyze data.
This is the transformational power of a data commons or data federation approach in accelerating translational impact.
You can imagine genomicists or proteomicists or computer scientists who may not necessarily have received their training or have ever had access to pediatric data in the past.
How can we bring them into the fold and provide them the capacity to inform pediatric cancer datasets partnering in non-local environments with disease-specific domain experts?
To accelerate discovery and translation, it’s clear that the data generator is not the only user who should inform analysis. This is in part because of the diversity of expertise required, but also because other data sets and their connectivity immediately impart new paths to knowledge not evident in the initial investigator-specific cohort.
Diversifying the community of expertise and its access and utilization of data will only accelerate our capacity to interpret it, just because we don’t currently have enough pediatric cancer researchers and are unlikely as a community to be able to scale our resources in a community-specific way.
And, certainly, in the data-driven research component, the NIH as a whole, I’d say, and as has been defined recently by the NIH’s strategic plans, faces a critical shortage of data scientists. Data scientists with particular expertise in pediatrics are therefore even more limited.
We have to actually think about our community in a much broader scope, and through strategic investments in pediatric cancer research, look for how such efforts can support growth via convergence and integration.
I think the challenge that the pediatric community faces is that there’s tremendous power and impact that can be harnessed by focusing, centralizing efforts in a pediatric disease entity, or perhaps even in a broader pediatric cancer-specific effort, but there’s also the risk of potentially siloing the pediatric community in doing so.
This is the critical balance data commons infrastructure and data-driven efforts must navigate. We must harness and empower community efforts through the lens of domain experts, but also recognize the design principles required to fully harness acceleration of discovery from data to information to knowledge and impact through the elimination of such domain-specific boundaries and control.
In addition to data science-specific approaches, I also think there’s a strategic space, supported by new technologies and resources, for our community to look at not only bringing other domain experts into the field in support of convergence research, across pediatric data types and modalities, but especially in cancer we also need to really think of pediatric cancer as part of the continuum of research across pediatrics, adolescents and young adults, and ultimately adult cancers, recognizing that these are indeed different and likely have different origins or causes, as kids don’t typically get cancer because they smoke or overeat, or because of any of the other major lifestyle drivers of adult cancers.
But as a community, we’ve begun to recognize that there are many opportunities for looking at cancer more broadly and integratively across this continuum of research across ages and across cancer types.
In the pediatric cancer landscape, especially, there are additional opportunities to also expand this continuum even beyond cancer, recognizing that pediatric cancer is occurring in the context of childhood development.
That is indeed the context of the efforts being undertaken by the NIH Gabriella Miller Kids First Pediatric Research Program, which is a trans-NIH initiative. As the Kids First Data Resource Center, we’re trying to make headway in looking at ways in which cross-disease analyses across cancer and structural birth defects can also support the discovery process and its acceleration towards translational impact, recognizing that many childhood diseases and syndromes have both a structural birth defect component and a cancer component connected through shared biology.
Through such integrative efforts that look at leveraging a tremendous opportunity to think about pediatric cancer anew—in ways that leverage emerging data-science supportive technologies, cloud-based resources, and community engagement—I think this can really transform the research landscape in terms of its capacity to accelerate discovery, diagnostics, and have immediate impact in the context of clinical translation, potentially prospectively for each and every individual patient across the U.S.
For us, it’s extremely exciting to hear and see both the NCI and the administration and other community partners and patient groups really coming together and reconsidering, ‘What are the unmet needs that can inform in new ways through new approaches of integration?’
This is a time of new technologies, new initiatives and efforts, the emergence of an NIH data commons landscape, and the ongoing growth of efforts of the NCI in terms of establishing and developing a data commons framework. The proposed new resources and influx of funding provide a just-in-time opportunity for our community to engage in defining how all these parallel and intersecting efforts can be brought to bear in the context of the pediatric enterprise and the translation to impact.
What’s in the works at CHOP?
At CHOP, within our center, there are largely two broad efforts underway, but intrinsic to these is the recognition that CHOP, like all pediatric research hospitals, must work in the context of a community of hospitals.
One is consortia-based initiatives, where, again, groups of investigators and hospitals have come together independent of, let’s say, NIH funding or specific initiatives, to support centralized biorepository-based efforts or data generation efforts in a disease-specific manner.
My own experiences really began with and have largely benefited from a pediatric brain tumor initiative called the Children’s Brain Tumor Tissue Consortium. In that context, we have recruited more than 3,000 patients on a longitudinal, observational study.
We have just finished sequencing and released via the Kids First Data Resource more than 1,000 whole-genomes along with RNA-seq and deep clinical and phenotypic data for such subjects. We’ll continue to generate additional data from such specimens and will be releasing such consortia-based data in near real-time without embargo.
We’re also part of new NCI-supported initiatives that, again, generate other new data types beyond bulk sequencing, including single-cell sequencing efforts that are leveraging our collaboration network amongst pediatric enterprises, but also creating new partnerships with adult efforts.
One such pilot initiative that was launched at the end of last year is called Project HOPE and Project CARE, looking at single-cell sequencing at least in one disease type. Here, it’s gliomas, in pediatrics, adolescents and young adults, and then adult GBMs.
Additionally, a separate pilot is underway with the NCI’s [Office of Cancer Clinical Proteomics Research] effort. This is a proteomics-based initiative in pediatric brain tumors across multiple histologies, with new data being released in the coming months. I think that the community is now poised to leverage some of these emerging centralized resources, evaluating existing approaches in these collaborative efforts while looking for how such efforts can be broadened and scaled.
And again, at CHOP, every one of these efforts is a partnership, and while we may be the coordinating center for some of these consortia-based or NIH-based initiatives, the reality is that it’s a shared resource across a broad community. And there’s parity of ownership and responsibility across more than 18 institutions who have partnered across consortia-based initiatives like the Children’s Brain Tumor Tissue Consortium.
The other side of the equation for data generation efforts is NIH-sponsored initiatives that are fairly recent in the context of the data sciences, and for us at CHOP this has been in the context of the Gabriella Miller Kids First Program, or Kids First DRC, or Data Resource Center, for which CHOP is the prime recipient along with several key partners.
This effort is only a year and a half in and is an NIH Common Fund-supported effort that includes the NCI and looks at creating centralized environments for cross-disease analysis integration and data empowerment, initially focused on whole-genome sequencing and germ-line contributions to disease, especially across childhood cancer and structural birth defects.
The Kids First Program also includes data generation efforts where individual investigators from a variety of institutions submit applications, essentially, on behalf of certain well-defined disease cohorts. They don’t actually get any funding themselves; they receive allocated sequencing commitments for the cohort, which will become part of public datasets on the Kids First DRC platforms.
That Kids First program is staged to have more than 30,000 whole-genomes by the end of 2019, split largely evenly between pediatric cancer and structural birth defects cohorts.
Those represent my own center’s direct efforts that are part of a much larger community of pediatric cancer efforts. And really, the key to the success for these types of efforts is also, as I mentioned before, ensuring that we’re not siloed.
We’re working very hard on partnering the Kids First DRC with efforts at the NCI, particularly as it relates to the Genomic Data Commons and the data commons framework, ensuring that users can interact between such spaces, because, as I mentioned, the pediatric cancer context is slightly unique in the context of syndromic diseases. Ensuring that we can integrate both vertically across, particularly, NIH ICs’ efforts for pediatric data, and horizontally across different institutes and centers within the NIH, is key to our community’s success.
These efforts are still at the early stages, but I think there’s tremendous momentum in the program across our community.
In creating a data federation that isn’t only clinical-grade, but also research-grade, how deep does the sequencing need to be in order for the data to be effective or useful?
I think there are a couple of different ways to think about these questions. One of the challenges of this space, and I think this is what you’re pointing to, is the difference between, let’s say, clinical-grade sequencing and research-based sequencing. One type of difference that you noted is in the depth of coverage of sequencing and its use.
But one of the key challenges that the community still faces is that the approach of using panels or targeted-based efforts is largely derived from creating such clinical platforms in ways that can be directly linked to existing actionability. And because the actionable space can be limited in pediatric cancers, clinical panels can also be limited in advancing new knowledge.
I think what we’re finding more and more is that more comprehensive processes that essentially look at the entire genome—like a whole-genome sequencing as opposed to a whole-exome or a panel—provide a larger amount of information that can complement current clinical efforts.
It’s true that not all the information and perhaps not even a majority of WGS data may necessarily be clinically actionable at that particular moment, but through the right types of infrastructure and community engagement and resources, those efforts can become a living, breathing data set that continues to grow in understanding through reanalysis, secondary use, and data-sharing practices in ways that can be immediately translatable to the patients in the clinic or in the context of either new clinical trial designs or emerging therapeutics.
I think this is why there is likely a need for more comprehensive clinical genomics that can be linked directly to clinical care and implementation. For example, particularly in the context of the emerging immunotherapy landscape, where a vaccine-based approach or neoantigen-based approaches may be less constrained than the small molecule-based targeting approaches in the drug development process, such data-driven clinical resources could be transformative.
I think there’s a lot of interest, by our community, in thinking about comprehensive clinical data generation, and this is driven in part by the recognition that the cost of large-scale clinical data generation is now dropping in ways that will indeed make it feasible for clinical whole-genome sequencing, for example, to occur within a short period of time. Big data is set to transition from largely occurring in the context of research to being, in the very near future, the standard of care in the clinical context.
And so, many in our community recognize that it won’t be too long before, for example, clinical whole-genome sequencing combined with RNA-seq is the starting digital footprint of an electronic health record in ways that would suddenly make big data a daily reality that right now is still largely restricted by our community to research-grade datasets.
But because costs are dropping, that’s going to happen, and it’s going to happen fast along timelines our community may not be fully prepared to harness, and we as a community need to think about what is the right infrastructure and workflows and standards around which we can continuously empower the use of such data on behalf of patients, and how we can support its implementation in the clinical setting in ways that, I think, right now are still going to be fairly challenging for most oncologists and clinical environments to fully harness.
So really, building the right tools and environments to iterate around multimodal data analysis, its integration with the longitudinal, clinical, phenotypic, and genotypic data collection processes is poised to be transformative.
Recognizing that layering longitudinal, clinical, EHR data along with molecular clinical grade data across time and along with imaging data like MRIs or digital pathology—that’s extremely key, but again, I think that it has been challenging historically to implement especially across a federated landscape across institutions and hospitals. But this is likely what will be required for pediatric cancer research to succeed.
However, from a data-driven and technology perspective, there is a huge amount of opportunity. And the pediatric community, I think, is itself extremely well-poised because of our historical existential need to already collaborate and partner across consortia and clinical trials.
I’m sure Peter [Adamson] talked about the COG and the unique context under which a very high percentage of pediatric patients end up on clinical trials, compared with the single-digit percentages of adult cancer patients. The community is extremely well-poised for such initiatives to be undertaken and be supported.
Have we definitively reached a point at which genomic characterization of pediatric cancer is the standard approach to thinking about research and treatment in a meaningful way?
I think for sure. Some of the best-case scenarios and use cases that the entire cancer community utilizes around molecularly driven or precision-based approaches are actually in the pediatric context.
The impact is phenomenal and measured in slightly different ways, potentially, than in the adult community, largely because the number of patients is smaller.
But being able to have precision-based approaches driven by a molecular definition of the disease has a number of different constraints that are especially important in a pediatric context.
And I’ll provide you the brain tumor context, especially, as a use case.
Non-selective treatment approaches, let’s say, radiation therapy, that target a specific cancer in the central nervous system, in the context of a young child, while it may be curative for the cancer, the approach can also damage and oftentimes does damage the central nervous system itself of the developing child.
In the context of development, where you have cell proliferation both across cancer and non-cancer contexts, non-specific approaches have severe side effects for children in ways that essentially can impart Pyrrhic victories for the child, where parents are faced with decisions of loss of IQ vs. survival.
And so, precision-based approaches are especially salient for the pediatric context where you’re trying to minimize long-term side effects, toxicities, and downstream harm to what hopefully is a very long life still ahead of a child.
And one that’s likely distinguished from an aged population who is being treated for cancer and that has somewhat different side effects independent of development.
I think many of us in the pediatric community try to enunciate what is a really high unmet need for leveraging precision-based approaches in the pediatric context to ensure that we are not only curing, but also providing for a happy and long-lived normal life for a child as a fully functional member of humankind. In the pediatric context, survival is key, but the ultimate goal of most parents is lifelong normalcy.
I think therefore the answer to your question is obviously, “Yes.” There’s plenty of evidence in the pediatric context that targeted-based approaches or precision-based approaches that are defined by molecular contexts are extremely effective.
However, what is also informative to remember is that in the pediatric cancer context, one of the biggest challenges we face is that we don’t have large numbers to support the kinds of traditional clinical trials that have been run historically in other cancer types, and so it’s extremely important that when we do run clinical trials, the patient populations are very well defined.
And so, what we’re now finding is that what has been pathologically described as one disease is potentially five or six or 11 different diseases, when you start looking at the detailed molecular biology.
And that presents a challenge for a whole new way of thinking about creating the kinds of clinical trials where the right patients are selected for the trials in ways that, with smaller numbers, are more likely to actually achieve meaningful, statistically significant results as supported by molecular definition of disease.
It’s along those two contexts, precision-based therapies and molecular definition of disease, that I think there’s tremendous opportunity for applying data generation-based efforts.
In one, guiding a path to sub-classify and better classify diseases in ways that can define the kind of clinical trials that we need to innovate around, in the context of smaller numbers of patients.
And then secondarily, in engaging targeted or precision-based approaches that can mitigate the harm and toxicities that traditional chemotherapeutic or radiation-based approaches impart in the developing context of childhood cancer.
It sounds like NCI and the community are pretty much on the same page about where this needs to go and how potential new federal funding could be used in this space.
Yes. And I think it’s a unique opportunity and time, because of the synthesis of new technologies, new ways to empower communities to come together around data commons-like environments and infrastructure that supports truly non-local, federated, convergence-based research through collaboration, and shared resources bringing new and diverse communities together.
Is there anything I’ve missed?
No. I think, by and large, hopefully most of us that you’ve interviewed, really are sounding the same message of unprecedented opportunities, clear unmet needs, and really strategic alignment between the community, hospital systems, clinical trial organizations, the NIH, and the U.S. government.