NCI’s new chief data scientist Warren Kibbe tells us about efforts to get “AI-ready”

“All research now involves data science at some level.”

Warren A. Kibbe, PhD

Deputy director for data science and strategy, NCI

What can NCI accomplish in data science?

In the informatics world, the institute’s resources would be considered paltry by comparison with private companies. 

The institute’s total fiscal year 2024 budget of $7.2 billion would constitute less than 25% of the R&D budget of Microsoft Corp. and would be $2.3 billion lower than this year’s research and development spend of the artificial intelligence giant NVIDIA Corp.

“I think maybe the better question is how can we take what is being developed by those companies for really broad use, and how do we bring that into cancer research and make cancer research happen faster, better?” said Warren A. Kibbe, NCI’s inaugural deputy director for data science and strategy.

Kibbe said his primary goals as NCI’s top data scientist include improving access to data, enhancing the scientists’ abilities to apply visualization techniques for cancer research, and using technologies, such as artificial intelligence and machine learning, to advance our understanding of the basic mechanisms and etiology of cancer. 

Over the past seven years, Kibbe served as chief for translational biomedical informatics and professor and vice chair in the Department of Biostatistics and Bioinformatics at Duke University and as the chief data officer for the Duke Cancer Institute.

Prior to that, Kibbe headed the NCI Center for Biomedical Informatics and Information Technology, known as CBIIT, for four years (The Cancer Letter, June 16, 2017; June 26, 2013). At NCI, Kibbe spearheaded development of the Genomic Data Commons, a big data project for comprehensive, raw genomics information (The Cancer Letter, April 29, 2016). 

In addition to consulting across NIH on data science issues during his first stint at NCI, Kibbe played a role in establishing partnerships with the U.S. Department of Energy and was involved in putting together collaborations in precision medicine and the Cancer Moonshot. 

Kibbe is also a co-founder, with Sorena Nadaf, of the Cancer Center Informatics Society (CI4CC). 

“No one is better suited to help the NCI innovate in cancer data science and biomedical informatics than Warren,” Nadaf said to The Cancer Letter. “His background and temperament provide an excellent balance to the needs and vision of the NCI. I look forward to continuing to collaborate with Warren on various projects, both as a long-standing colleague and as the president & CEO of the Cancer Center Informatics Society.”

CBIIT is now headed by acting director Jill Barnholtz-Sloan. 

Kibbe returns to NCI at a higher level, that of deputy director.

“As NCI’s first deputy director for data science and strategy, Dr. Kibbe will advise me and other senior leaders on the utilization, stewardship, and sharing of one of our most important tools to end cancer: data,” NCI Director Kimryn Rathmell said in her announcement of Kibbe’s appointment.

Said Rathmell:

He will also provide strategic direction to CBIIT and be responsible for management and oversight for all aspects of data science for the NCI. 

He will lead NCI’s implementation of the NIH Strategic Plan for Data Science, which will inform the development and implementation of an NCI-specific data strategy that maximizes data quality, security, sharing, privacy, and use for cancer research.

Dr. Kibbe’s counsel will be essential to guiding key NCI data science initiatives, including the Childhood Cancer Data Initiative, the Cancer Research Data Commons, and the Biomedical Data Fabric Toolbox, a collaboration with ARPA-H. 

He will also serve as senior data science liaison on a variety of NIH and other government committees. 

In a conversation with The Cancer Letter, Kibbe said NCI’s role is to bridge science that is emerging at cancer centers with technologies that are being developed broadly by the informatics industry.

Said Kibbe:

I think of it both from a standpoint of capabilities and the levers.

What are the unique capabilities that academia has? They’re really—particularly the NCI-designated cancer centers—they’re really engaged in basic research, understanding tumor biology.

They’re engaged in everything from coming up with new compounds, new treatments for cancer patients, to delivering cancer trials and trying to understand how those things can lead to the next standard of care.

The engines for innovation—that is in academia. It’s really doing an amazing job.

NCI has the opportunity to facilitate things between what the NCI-designated cancer centers can do, and then what’s available in industry.

And, again, I think the industrial pieces are incredibly important, because they’re the ones that actually make it happen at scale.

If it’s an important therapy, an important device, and there’s an ability to pay for it—they will absolutely produce it and make it available to everybody. It’s a matter of connecting all those pieces together in the right way. 

To me, it’s more of a continuum than it is that they’re distinct. 

They each have a place.

Kibbe spoke with Paul Goldberg, editor and publisher of The Cancer Letter.

Paul Goldberg: Well, first of all, congratulations on coming back to NCI. It will be great to see you there.

Warren Kibbe: Thank you.

You left CBIIT and went to Duke in 2017. And now you’re returning. What has changed in the field since you left for Duke?

WK: Oh, my goodness. 

Well, it’s been seven years, as you said.

I’ll say that when I left, I had no idea that suddenly, transformer models (that’s one way of doing AI) and large language models would become, if you will, the talk of the town.

That’s been a huge change since 2017. 

I think there’s been lots of transformation that’s happened in that time that, frankly, just makes data science that much more important.

Some of it’s been that things like [the United States Core Data for Interoperability]—so that’s the EHR data standard for being able to send clinical data from one place to another—it’s really improved and matured. It’s now becoming useful from a clinical research standpoint. So, I think we’re going to see some really major opportunities with USCDI. 

Other significant changes have been in basic science technologies. Things like spatial transcriptomics. There were a few labs that were trying that in 2017, but now it’s mainstream. 

That really changes our understanding of what an individual cell can do, because we measure a single cell. It’s phenomenal. And all data-driven. And the techniques for that are still being developed.

Also, some of the change around cloud computing has been pretty phenomenal. 

So, it’s not just about AI, it’s also about the compute horsepower. 

A lot of the current AI models need a lot of compute, so, again, these kinds of models weren’t feasible just a few years ago.

The other thing that’s happened is cloud costs themselves have really gone down. So, it’s becoming cheaper to do things in the cloud. 

At the same time, the kinds of things we want to do, actually, need more computing. So, maybe it doesn’t end up being cheaper, but the capabilities have really changed.

Seven years is a long time, actually.

What are the three top things you will do at NCI?

WK: Well, it’s hard to pick just three! 

It is not so much what I do.

It’s at least partly the questions that I want to ask. 

What’s really important for me as I rejoin NCI is to find out what everyone is doing, find out what the portfolio at NCI looks like today, and understand where the opportunities for partnership are.

And thinking about partnerships, when you listen to what Dr. Rathmell, in particular, is talking about right now, it’s about how do we build the right partnerships, both inside NCI, across the research community, across the country and across the globe.

But I guess you asked a different question, and that’s what am I going to do? 

And so, maybe that’s a little more concrete, a little less aspirational.

One of the things that I helped set up, just before I left, was the Office of Data Sharing, and I really see the importance of, and the opportunity in, getting data into people’s hands more rapidly. We aren’t done with that yet.

How do we make sure we aren’t leaving researchers behind? How do we make sure we aren’t leaving populations in the U.S. behind as we think about access to data?

I think, along with that, make sure that we are training a much more diverse and representative group of data scientists for the future.

Those are things I really care about. 

How much can I actually impact all those things? Well, we’ll see.

That’s part of why I want to go back and see how much I can actually help change some of those things.

The last point is a focus on high-quality data and data management. 

There is a transformation needed to go from data to information to knowledge. The current buzz-phrase is “getting data AI-ready,” but I think it’s almost more fundamental than that. You need to have good quality data for all kinds of things.

We need good-quality data for analysis, for visualization, and then of course for all these different AI tools. 

And part of the data being ready is you need to understand what the data’s good for (fit for use) and what it’s not good for, where the limitations are, where the biases are.

And if you don’t understand those things, all those techniques can tell you the wrong thing. They can introduce biases, they can actually reduce health equity. So, it’s really important that we think about the limitations of the data that we have, the data we generate, what it’s good for.

Good data quality, data management, understanding the utility of all NCI-generated and funded data is a critical focus for us as a research organization. 

Every cancer researcher already thinks about these issues, but mostly in terms of their own data, not the ecosystem of data.

What role can NCI realistically play in this field? This is a sector where investment by private industry can be astronomical.

In fact, it can be larger than the entire NCI budget.

So, what can you do with the amount you have?

By the way, how much money do you have within the system? How much do you give out in data science?

WK: For data science you mean?

What’s the NCI data science, or CBIIT budget?

WK: I can only speak to the total NCI budget. I don’t know how much is allocated to data science.

I think what I can say, is that when I left NCI in 2017, I went to the SPL, the Senior Program Leaders group, and I said that I think by 2030, 40% of NCI’s budget will go to support data science in one way or another.

Whether that’s data generation or data analysis, data has become more important.

When you look at what everyone actually does in cancer research, a lot of it is now data science.

So, it is a different question than how much is dedicated to data science. All research now involves data science at some level. And I can’t think of some aspect of science where we don’t want to analyze the results. 

To me, that’s data science; right?

So, it’s everywhere.

Is it 100%?

WK: But you were asking what role can NCI play? 

Well, again, I think because it’s so instrumental in everything we do, we have to do data science.

Are we going to come up with the next large language model, the ChatGPT 5.0? 

No, that’s not NCI’s role.

Frankly, that’s not something academia will do, either.

That is those big players. It’s clear that NVIDIA is an incredibly powerful player, but they’re actually not coming up with those models, either.

That’s OpenAI, it’s AWS, it’s Google, it’s Microsoft. They’re developing and investing in AI.

So, I think maybe the better question is how can we take what is being developed by those companies, that’s being developed for really broad use, and how do we bring that into cancer research and make cancer research happen faster, better?

I’ll say that it’s been interesting to me over the course of my career that I’ve always felt like we’ve generated data where the signal is just too diffuse.

You can look at the way we do things like GWAS [genome-wide association studies], there’s clearly a lot of information in looking across variants in human populations that, up until now, we didn’t have techniques that would let us understand really subtle interactions. 

We had to look for big changes. 

We had to look for drivers.

But now, I think, we can look at these much more subtle interactions, and that’s going to be really important for cancer researchers to move forward.

What convinced you to return to NCI? 

WK: As you pointed out, data science has changed a lot. 

I think the opportunities have changed.

What we can do with data has changed. 

Dr. Bertagnolli certainly brought this up when she was NCI director, and now as NIH director, there’s an opportunity to partner across HHS that I didn’t see when I was there before.

And so, I’m pretty excited about coming back in and seeing how I can contribute to that broader partnership across HHS.

And, of course, it’s not just with those federal agencies, it’s with all of the groups, all the people that are across the country that are involved in those missions. 

Making drugs safer, making sure that we get patients on the best possible treatment that we know about—that’s not just NCI’s job.

Are you moving back to Washington or staying in Durham?

WK: I have to say I’m really blessed with living in the Triangle area of North Carolina. I love it. And I can’t quite see moving back to DC, so I’m going to be commuting back and forth.

Seems a lot of that is going on now. It makes it easier for NCI to recruit.

WK: I think so. And I think that the reality of the pandemic is it showed we actually can work effectively virtually.

In my job at Duke, I go in to the office one or two days a week. But even when I’m in my office, I’m mostly on Zoom.

It’s great to see people in person. 

I think that’s really important. It’s good for our mental health to see people in person. But the jobs have changed. And I think that we can do a lot virtually. But there are times when we absolutely need to be together.

Now, what do you actually see as the biggest opportunities in data science?

WK: Well, I’ve mentioned a couple indirectly. I’ll try to be more concrete. 

When I think about what goes on in the discovery science, in basic science, we have new tools, like spatial transcriptomics.

We have a lot of new tools and technologies to look at the proteome, look at the microbiome, look at the microenvironment of a tumor, understand the complexity of it, the heterogeneity of it. And how that actually affects the tumor. 

So, really starting to understand tumor biology, not just from understanding individual pieces, but how those pieces work together to create biology.

That’s something that we are still figuring out how to do, but we are generating the kinds of data that let us build an even more integrated, systems view of biology. 

So, that’s just super-exciting—building that complex view of biology, of normal biology and of cancer biology.

And it is enabling. It lets us build models for how we think cells actually work and work together. And having those models will let us computationally perturb that model, for instance, with different kinds of drugs. And then see if the model prediction really works when we try it in the lab. So, I see a whole new level of simulation we’ll be able to do that will lead us to new kinds of drugs. And be able to do that much more accurately than we’ve ever been able to do before.

So, that’s pretty exciting. That’s on the basic science side.

Looking at it from a translational research perspective, some of those same things lead the translation. How can we do a better job of being able to screen and identify compounds that really work against specific cancers?

That’s going to be a huge opportunity.

Another opportunity is even more effectively applying data science all through the clinical trial pipeline. For instance, can we use these new tools to do a better job of designing the studies themselves?

Can we identify, looking at a given cancer type, who doesn’t have good outcomes? Can we design a study that lets us try to get better outcomes for those patients? And then design a protocol and accrue patients to that protocol, again, using data science techniques that specifically address those areas.

That’s something we’re trying to do with the Childhood Cancer Data Initiative, where every childhood cancer is a rare tumor, a rare disease.

How do we match those patients with the right therapies? How do we identify where we really need to have a new therapy, find that therapy, bring it into the clinic, and then analyze to see if those patients benefit? If there is a benefit, then use data to make sure it’s available to everyone.

Addressing data quality issues and using the capabilities of data science to enable this transformation at scale across the whole country is both really important and feasible now.

This issue of scale is particularly important for improving outcomes for rare diseases. Because you need to be able to look at everyone to see enough patients where you can really learn something from them.

Maybe I should ask you about the anatomy of your decision to return. How did that happen? Was it while Monica was there? Did she call you? How did it work?

WK: So, again, I think something that is probably important to put in context is the last four years, because of the Childhood Cancer Data Initiative, I’ve actually been on an IPA with NCI.

So, I’ve had a fair amount of connections because of that with NCI.

I’ll say that when Monica came in, she really identified the need and the desire to have a deputy director for data science. I had a lot of people that I was working with ask, “Warren, are you going to apply for that?”

And so I did.

And I will say, I wasn’t looking to change. I was very happy here at Duke. It’s been a great place. I’ve enjoyed what I do. The department I’m in has been phenomenal. I’ve been able to hire lots of faculty in my area.

It’ll be hard to leave them. But they’re going to do great things with or without me. But it wasn’t an easy choice to make, to say, “Okay, well, I really will go back to NCI.”

But the opportunity to do something again at a national level with NCI, it’s pretty compelling, at least for me.

And now Monica, with her interest in data science, is the NIH director, and you would think that that’s going to lead to a lot of trans-NIH initiatives in data science?

WK: I am absolutely sure it will. And I look forward to working with Dr. Bertagnolli on any of those things that are appropriate for NCI to be involved in.

But is there any of it that’s coming up right now? Anything that comes to mind?

WK: I’m still on the outside. So, I’m sure I don’t know everything that’s going on on the inside. And knowing Monica, I’m absolutely sure she has lots of plans.

Yes, absolutely. I’m still wondering whether there is a simple way to distinguish between the roles that NCI can play, that academia can play, and that the industry can play? And how do they actually—these components—interact in data science?

WK: I think that’s a great question. 

I think of it both from a standpoint of capabilities and the levers. What are the unique capabilities that academia has? They’re really—particularly the NCI-designated cancer centers—they’re really engaged in basic research, understanding tumor biology.

They’re engaged in everything from coming up with new compounds, new treatments for cancer patients to delivering cancer trials and trying to understand how those things can lead to the next standard of care.

The engines for innovation, that is in academia. It’s really doing an amazing job.

NCI has the opportunity to facilitate things between what the NCI-designated cancer centers can do, and then what’s available in industry.

And, again, I think the industrial pieces are incredibly important, because they’re the ones that actually make it happen at scale.

If it’s an important therapy, an important device, and there’s an ability to pay for it—they will absolutely produce it and make it available to everybody.

It’s a matter of connecting all those pieces together in the right way. To me, it’s more of a continuum than it is that they’re distinct. They each have a place.

And the government role is more of a coordinating role? Is that what you’re seeing?

WK: Well, there is the convening part that’s really important, but there is also the opportunity to say, “Well, here’s a place that industry doesn’t see an opportunity, a financial opportunity, to get into.

Can government make that more attractive, so that we make sure we help those people?”

Again, rare disease is the place where that’s worked really well. And it’s not NCI driving that alone. That’s actually more federal policy that makes those things happen. But again, I think NCI plays a role there in helping identify, “Well, where are these opportunities?”

You’re also part of the Cancer Center Informatics Society.

WK: Absolutely.

You’re one of the founders.

WK: Yes. And, in fact, I just got a text from Sorena Nadaf as we were first starting to talk. Sorena and I were the co-founders of it.

I’m just wondering what the issues are for the cancer center informatics officers and how you can help them from NCI?

WK: I think we can go back to the founding of the Informatics for Cancer Centers Society. And the reason we started it is we really wanted to have a place where the whole informatics, data science community that was part of cancer research and part of cancer centers, could come together.

And I think that’s still a really important part of what the society does, bring those people together. 

And I’m not quite answering your question…I will get there.

Oh, this is complicated. I get it.

WK: But what’s been really fascinating to me is every time we want to run a meeting, we bring different chairs.

And it is just amazing—the depth of both knowledge and activity in practically every component of cancer research we dig into.

We had a meeting recently focused on cancer imaging and radiology—all kinds of things in there that I didn’t even realize were happening. Our last meeting was on clinical trials and really thinking about everything from health equity and representation on clinical trials, to workforce development for clinical trials, to doing things like pulling data from EHRs more effectively so people don’t have to hand enter all that data into a clinical trial management system.

Again, it’s pretty exciting. 

You can see all kinds of change happening in practically every aspect of cancer research. 

And again, I’d like to think data science is part of that, both that impetus and the actuator or actualizer of that change.

What I was asking is, how is the funding model working for the cancer centers? How does a cancer center fund its data science activities? Is the model good enough—now?

WK: I see. So, you’re saying, how does an individual cancer center have an informatics program? How is it funded?

For most cancer centers, it’s a mix of the way that biostatistics and bioinformatics shared resources operate inside a cancer center.

For shared resources, it’s much more a service model where people who are doing research, they come to the folks that are doing the informatics and the statistics and say, “Here’s a problem I have. Will you partner with me? We will go jointly and get some grant funding, whatever the funding is, to do this work together.”

Now, I see more and more informaticists, data scientists, who are the PIs themselves.

They’re going out and getting their own funding and building their own labs. And they’re attacking fundamental and applied problems in cancer research. 

They are just like everybody else in cancer research.

They’re funding themselves through grants. They’re funding themselves through foundations. If it’s tied to the clinic, they’re sometimes getting clinical revenues to do that.

To me, it looks a lot like all the rest of cancer research.

Steve Rosen was telling me some years ago [at Northwestern] that when you came in, there was no funding, and he took a chance. Kind of like, “Give this kid a chance to do something fun.” And then by the time you left for NCI, it was one of the best funded pieces of Northwestern.

WK: Well, I think that Steve’s being exceptionally generous there, but I had a group of about 40 people by the time I left. So, it was not the biggest.

I think we were doing great things, and a lot of my group’s success was Steve creating the right environment so we could actually be successful.

Can we do some specifics? Like machine learning. It’s effective as a clinical support tool in pathology? Well, what about adoption in clinical research and validation?

WK: Sorry, I tried to say some of that earlier, when I was talking about some of the places where data science can really play a role, but let’s talk about pathology, although it is imaging in general.

Imaging has been an early win for machine learning techniques, and now AI techniques. These tools can identify features that are associated with a particular cancer.

What’s exciting to me is that some of those features that these algorithms are identifying—features that pathologists or radiologists haven’t really looked at because they’re not how they were trained to look—appear to be predictive.

So, I think that’s exciting, because it’s possible we’ll find features in imaging and pathology that are actually better predictors of outcomes than what we’ve been able to do historically.

A great part about these techniques is that they work really well with imaging data in particular.

So, that’s clearly been an early win for a lot of AI.

But now, both with transformer models and with LLMs, we are seeing them be able to do much more sophisticated abstraction of human language, something I think is going to be a big deal in the near future. And some people say it already is.

But I have to say that what we know how to do today isn’t equally distributed everywhere. Some people know how to use ML and AI models, but most people don’t yet. It’s not quite routine. Things like using these models to process clinicians’ notes at the time they put the notes in, and be able to prompt back and say, “Is this what you meant? Did you mean this patient has this disease? This stage, these conditions?”

However, as we do this, a major concern is how do you preserve privacy when you’re trying to train these models?

Because they work best when you give them lots and lots of data. But it’s hard to do that and also preserve everyone’s privacy.

So, that’s an interesting problem. 

I’m not, certainly, going to solve that, but I’m really hoping that all the smart people out there, particularly in academia, can help solve it.

Are we really at the point where there is a difference between machine learning and AI? Is that a meaningful difference? In other words, is there such a thing now as true artificial intelligence?

WK: Well, so again, I think there are many techniques for analyzing data.

Some of them are very, very simple: t-tests, linear regression. Then there are more and more sophisticated methods, until you get to the latest techniques like transformers, LLMs, and neural nets, which are, I think, all in the realm of AI.

But again, to me, they’re all tools. And the problem should help you define which tool you use.

The AI pieces are important. They’re new tools. But, if you will, the tried-and-true linear regression models work great for most problems. You don’t need AI to solve those problems. 

I think you want to reserve those really sophisticated methods for cases where you can’t do it any other way.

Was that your question?

Is there a difference between machine learning and AI at this point? There’s this boundary that may have been crossed or may not have been crossed to true artificial intelligence. Maybe it’s not the right way to ask this question.

WK: Maybe I’ll answer that a different way, now that I know that’s the question you’re asking.

Look at the way that large language models are trained and the way they behave, and, in fact, the way that they can fill in information they’re not given.

Once they’ve been fully trained, you can give them incomplete information and they fill it in, they make inferences.

And right now, one of the problems with that is they hallucinate. You hear all these things about AI models hallucinating. 

Well, I’d actually say that’s a pretty good analogy for the way we think.

I mean, when you and I are talking, I’m filling in what I think you mean. Sometimes I get it wrong. And if you will, that’s kind of hallucinating, right? That’s the kind of hallucination where I think that this is what you meant, but it’s not exactly what you meant.

So, it feels like large language models are starting to behave like the way we think.

Are they going to replace us?

No.

But is it a useful approximation?

I think maybe it is. We’ll see. We’re in a really exciting time for artificial intelligence.

What are some of the promising applications for AI in cancer research and drug development? Are there any that you would prioritize in your new job?

WK: Well, first, it’s not up to me to prioritize those things. That’s really something we do jointly.

Right. 

WK: You’re part of the CI4CC listserv. I hope.

I am.

WK: And you saw, when I sent out the announcement of this, that I will be reaching out to people to participate in task forces and help me identify where the opportunities are.

What should we be doing? How do we distinguish between things that are really exciting, but maybe not very useful and the things that are actually incredibly useful and we should be doing?

But back to your question about applications: I really do see this opportunity to do more with our data where we really simulate behavior of cells, simulate the behavior of preclinical models, simulate the impact of a treatment on a patient. And then evaluate if that model is accurate. And refine it, get better and better at modeling and simulation.

Can we do such a good job of predicting how a patient is going to respond to a therapy to the point where we can run clinical trials that are adaptive based on these predictions? Where we understand how patients are likely to respond to multiple therapies, to be able to give something that right now is very difficult, combinations of drugs.

If you had a chance to join the joint NCAB BSA last week, Lou Staudt and Alex Shalek were talking about the Human Cancer Models Characterization Models Initiative, where they have much more advanced models for cell lines.

And one of the things that both Lou and his colleague brought up is that by characterizing these models, they can start to really understand what drives plasticity. So, when you hit a cell with a therapy, how does it start to avoid that therapy?

And the next step of that is: how do you perhaps hit it with multiple therapies that make it impossible for that cell to escape? And to me, that’s very exciting.

We’re starting to better understand with all these different techniques what is a cell state? What does plasticity mean for a cell? 

The flip side to plasticity is homeostasis. When you hit a cell with different compounds it tries to maintain its current state. And so, if they’re cancer cells, they try to maintain being a cancer cell. And you can hit them with all kinds of compounds, they try to maintain being a cancer cell. So, that’s kind of the opposite view of plasticity.

They both play a role in how cells respond to drugs.

And again, I see the opportunity now, with these really advanced computational models and sophisticated data generation techniques, to understand that interplay with the biology of a tumor, so we can do a better job of predicting not just what single therapy we can use, but how we stop a cell from becoming resistant to that therapy.

We now have at least some of the tools and techniques necessary for this much more sophisticated analysis of cancer. So, being able to generate things like organoid models, being able to generate now all the data from single-cell sequencing, from proteomics, means that we can actually start to really understand, predict, test.

And hopefully come up with really very different kinds of therapy. So, if you’re asking me what gets me excited about all this, that’s it. Understanding the connection between human biology and why therapies both work and don’t work, that’s pretty cool.

That is pretty cool. What a difference seven years make. This wasn’t possible seven years ago.

WK: It was not possible seven years ago. I was involved in starting this thing called the ATOM Initiative, or the ATOM Consortium.

And the idea there was using high-performance computing to try to understand just the interaction between two molecules.

Things like a GTP-regulated receptor, and the molecules that turn it on and off. Now, seven years later, we can think about not just doing that pairwise, but start putting in hundreds of molecules.

People in academia and in the industry like to complain about NCI not being transparent about what’s happening.

First of all, is this fair? 

Is there anything you can do about it? Seeing your posts about looking for ideas, it sounds like you’re already doing something about it?

WK: Well, I think that when you’re talking about a big organization like NCI, it’s hard for every aspect to be equally transparent, because there’s just so much to communicate.

But transparency is incredibly important, because if you don’t have transparency, you can’t have accountability. And again, accountability has to be a priority. Transparency is important for any organization, not just a government organization. I don’t want to read too much into your question about transparency, but I think it’s always true that some people will feel like no matter what you do, you’re not quite transparent enough.

I think the hard part for NCI is there are parts of the process it cannot be transparent about.

For instance, when you’re developing a new NOFO, a new funding opportunity, you can’t talk about it until it hits the street. To compensate for that, Ned, Monica, and now Kim are all really good at talking about the priorities of the institution.

It’s not a direct line between stating a priority and saying, “We’re going to come out with these funding opportunities.”

But I think it’s important to talk about where the priorities are so people get prepared for it.

And when those opportunities hit the street, they know about it.

But it’s important that, I always say now, particularly in government, you try to be as transparent as possible, as transparent as you’re allowed to be.

One way to deal with it, probably, and I’ve seen FDA do it, is to become a part of the scientific mainstream. And to be a part of it or drive it, or some such.

For example, before Rick Pazdur at FDA, we didn’t know what FDA was thinking. Now, you can kind of know it, because he shows up at ASCO and AACR. You can ask him. He responds to email, or forwards it to the right person. That kind of stuff. Instead of just ruling by fiat.

And I think you are kind of answering it in a way by being a part of the bioinformatics society and remaining there. 

WK: I think NCI has been very involved both with ASCO and AACR. And, hopefully to your point, those are important venues for people to say, “Here’s what we’re doing.”

And we need to make sure that we’re doing as much of that as possible.

It’s also in how you solicit advice and how you accept advice. And some of it is probably kind of a reaction to yesteryear really predating you, at NCI. One final question is about the possibility that AI can be discriminatory. Can you play a role in setting up standards to make sure that it doesn’t become discriminatory?

WK: Well, I mentioned earlier paying attention both to what it means to be AI-ready and to high-quality data. It’s really important to understand what the data is good for and pay attention to how it gets applied.

Because if you don’t do that, you absolutely can create inequity. 

I think a really important example, and this isn’t necessarily just a data science example, is that most cancer centers are creating dashboards that show the demographic makeup of their catchment area. Visualizing where all the cancer cases are, and who goes on clinical trials. Here’s who comes to our organization. 

Are we doing a good job of bringing the right people in the door? And if we’re not, what barriers did we put in place? I think that’s a data-driven approach to thinking about how to create a more fair and equitable cancer treatment ecosystem.

And asking those kinds of questions is so important so that, in fact, we don’t create more health inequities.

You didn’t ask about telehealth, but I think telehealth is wonderful, particularly in rural communities that have broadband access.

But what about the rural communities that don’t have broadband? Are they going to get left behind? What about families that really can’t afford broadband? What does it do for them?

So, I think there’s some really important questions that as you roll out these technologies you absolutely have to ask. And, ideally, you have answers for how you reduce those disparities.

But what’s the NCI role in this potentially? I’m just asking. I have no idea.

WK: I think that one thing that’s happened, and this has been really important, is there’s now a plan. The Plan to Enhance Diversity.

It’s something that’s in the CCSG. The CCOE is about community engagement. But I think the PED pieces are just as important—how do we bring people into cancer research and into cancer care who look like our populations?

I mean, there’s been so much evidence, over and over again, that part of the reason we have health disparities in this country is because the healthcare workforce isn’t diverse… It’s not comfortable for some people to go to their local provider.

They do not look like them. They don’t speak the way they do. They don’t understand the problems the way they do. It’s so important to have a diverse workforce. And I think that’s going to be true for data science, too. I want to see a diverse workforce so we really understand the totality of our country.

Well, sir, I’m sure we’ll be back in touch over a lot of this, if not all of this, but is there anything I forgot to ask?

WK: Well, I’m glad you gave me a chance to talk about the modeling piece. The part about health equity is really important. We’ve hit everything that I wanted to make sure we touched on.

Well, thank you so much for taking the time to talk with me.


Matthew Bin Han Ong contributed to this story.

Paul Goldberg
Editor & Publisher