How Software and Big Data Could Eat the Next Pandemic

Doing data-driven medicine is harder than you think.
April 5, 2020
CREMONA, ITALY - APRIL 02: Nurses check medical data on a computer in a Covid ward at Cremona Hospital on April 02, 2020 in Cremona, Italy. The Italian government continues to enforce the nationwide lockdown measures to control the spread of the Coronavirus (COVID-19). (Photo by Marco Mantovani/Getty Images)

Managing uncertainty and acquiring data are hard even in the best of circumstances. In the middle of a global pandemic? Good luck.

The world is getting a crash course in the uncertainty of medicine, a discipline that is informed by science and strives to be scientific, but which operates far more often than you’d think in a data-poor zone, where even very good doctors are required to make many clinical decisions for their patients based more on intuition and hunches than robust data.

The overwhelming uncertainty experienced in daily medical practice is euphemistically celebrated as “the art of medicine” by some, and bemoaned as a need for improved, evidence-based decision making by others. It’s also what has drawn so many technologists to the sector: The profound opportunity to bring data to bear and improve outcomes. But it has also repelled many of them once they realize just how far from a “rationalized” ideal state medicine is, and the limited extent to which robust data informs—or is even available to inform—daily medical practice.

In short: While software and Big Data may have eaten the rest of the world, they have yet to digest medicine.

What clinicians have always understood is that, in the face of this uncertainty, they still need to make high-stakes decisions about patients every day. This is probably why the most definitive physicians I’ve ever met are the junior doctors (the residents) who supervise medical students and interns and manage much of the day-to-day patient care in many hospitals.

To be able to function, and to make the hundreds of decisions they face every day, residents have a tendency to become dogmatic, leaning heavily on structured algorithms and diagnostic approaches. These mental models only loosely approximate reality, but they enable the often-overwhelmed residents to function, make decisions, and get through days where the norm is something like trying to drink water from a firehose.

In contrast, I’ve always been struck by the nuanced perspective that senior, expert physicians seem to have. They almost all seem to understand that few things are as cut-and-dried as residents see them. What their experience gives them is a deep understanding about the contingency of a diagnosis, the uncertainties of a given therapeutic approach, the inherent ambiguity of illness.


The COVID-19 pandemic has put this sort of uncertainty and ambiguity front and center. What are the symptoms? What’s the prognosis? Who is really at highest risk? How is the disease acquired? Does the amount of virus you’re exposed to matter? Are tests for the virus reliable? What about tests for immunity?

What the general public may not understand, though, is that our uncertainties about COVID-19 mirror many of the questions surrounding other medical conditions. Most of the illnesses a physician attends to in a typical day actually don’t fit into a simple box, where you can unambiguously assign a name, perform a definitive test, and recommend a therapeutic approach that you’re confident will be effective for the specific patient in front of you.

As we try to sharpen our approach to COVID-19, one of the first steps is to acquire better, more complete data, so that we can understand the scope of the problem and begin to benchmark our progress. The futility—and danger—of making consequential decisions without such data has been eloquently highlighted by Stanford epidemiologist John Ioannidis, Manhattan Institute economist Allison Schrager, and the team at 538, among others.

Data are what enable you to move from uncertainty, which largely eludes quantification, to risk, which we can begin to quantify and manage, Schrager explains. But the “data collected so far on how many people are infected and how the epidemic is evolving are utterly unreliable,” Ioannidis argues, resulting in an “evidence fiasco” that “creates tremendous uncertainty.”

Adds 538: “if every individual piece of a model is wobbly,” the model is in trouble, and your ability to make meaningful forecasts is going to be compromised; that’s how you can get U.S. fatality estimates varying from 200,000 people to 2.2 million, a “freaking huge” difference as the authors, accurately, point out.


It would be enormously valuable to have a granular understanding of the journeys of the patients who are now battling COVID-19. Richer information about the patients and the nuances of their treatment could help us better understand how different people react to the virus, what patient characteristics might predispose them to better outcomes, and what therapeutic tweaks seem especially effective.

Amazingly enough, even in the midst of this outbreak we’re getting some of this essential information, which is an enormous credit to the contributing physicians. But gathering data eats up resources, and in many cases it’s asking the impossible to have clinicians spend time gathering research data while their care teams are overwhelmed and doing everything they can just to manage the devastating situation in front of them.

Again, this is an exaggerated version of the challenge healthcare workers have every day: How to balance providing care in the moment with the potential value of gathering and documenting richer data to better inform care in the future.

But more than anything else, the challenge of COVID-19 data collection highlights the need to make information collected during routine patient care every day more adaptable for robust research analysis.

One of the lessons of COVID-19 is that we’ll be better able to manage the next crisis if we can build a system that allows front-line providers to gather more, and better, data with a minimal burden. Efforts to achieve this are underway in both academic settings and the private sector.

Because even if software and Big Data aren’t ever able to fully “eat” medicine, they should be able to take a big bite out of the next epidemic.

David Shaywitz

David Shaywitz is a physician-scientist at a biopharmaceutical company, an adjunct scholar at the American Enterprise Institute, and a lecturer in the Department of Biomedical Informatics at Harvard Medical School.