A much misunderstood component of modern life is the role of data in decision-making. Many people have a sense that hard numbers are authoritative and provide a perfect (or at least fairly accurate) representation of reality. When a number is applied to a concept, that concept seems to become real: the number is a count of real things, and it has an objective source; someone had to gather it and verify it.
It is less commonly understood that data are a representation of reality under duress. Data are gathered, entered, processed, queried, verified, and presented. Every step of that journey can generate mistakes or represent an incomplete picture in one way or another. Anyone can make a chart. It takes thought and care to understand what is happening between the chart and the reality.
This is nowhere more clearly seen than in the dreadful daily drumbeat of COVID-19 data. Every day, more tests are done, more reports come in, more deaths are tallied, more hospitalizations reported. One state sees cases rise while another sees them fall. The numbers are quickly reported, but poorly understood. Why are there multiple national death counts, often differing by the thousands? Do we look at the number from the Centers for Disease Control? Or from the New York Times? Or the Johns Hopkins data dashboard? Where does all this information come from, and why is it changing like this?
In a time before COVID, ordinary people would almost never see data in these raw forms, and certainly not in real time. For one thing, most government-collected health data are released yearly or monthly. Most of those data are cleaned, sorted, and verified before they go into a report the public can access. There is time to ask questions, to remove anomalies, to refine definitions. We are used to seeing data that have been subjected to a careful review process before they go out into the wild.
That system failed when it met a fast-spreading pandemic. When COVID infections started showing up across the country, the Centers for Disease Control was not only slow to respond with working tests and functioning policies but also slow to pull together a comprehensive data operation that would give a clear view into how the pandemic was spreading. In some regions, identified cases were doubling in days, not weeks, which meant that daily data needed to be assembled so we could clearly understand the nature of the threat.
The bulk of the work of reporting cases fell to the various state health departments, many of which came to this crisis woefully unprepared. Many states did not initially release their “positivity rate” for COVID, which is the percentage of tests that come back positive. To compute it, we need to know how many positive and how many negative test results a given state has. This is an important metric because the higher the positivity rate, the more prevalent the disease is in the community. But people usually don’t come in for testing unless they have some symptoms. A positivity rate under 10 percent is good, as it means fewer than 1 in 10 people who are feeling sick and getting tested have the disease. Over 20 percent is very worrying; 30 percent means that the outbreak is very serious.
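The arithmetic itself is trivial; the hard part, as the next paragraph shows, is having both numbers. Here is a minimal sketch of the calculation in Python, with invented figures rather than any state’s actual counts:

```python
def positivity_rate(positive_tests: int, negative_tests: int) -> float:
    """Share of reported test results that came back positive."""
    total = positive_tests + negative_tests
    if total == 0:
        raise ValueError("no test results reported")
    return positive_tests / total

# A hypothetical day of reporting: 1,200 positives among 15,000 results.
rate = positivity_rate(1_200, 13_800)
print(f"{rate:.1%}")  # 8.0%, under the 10 percent threshold
```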
Many states were initially unable to deliver those data because they didn’t have the data infrastructure to report negative results. And, to be fair, why should they? No one reports negative HIV tests. No one signed up for a job at the state Department of Health expecting to deliver much more than a monthly report on some key health statistics. What they are expected to deliver now is far beyond what they were initially hired and trained to deliver. That most states have actually risen to this challenge over the course of the past few months is an example of nearly heroic ingenuity and competence that has gone largely unremarked upon.
Even with these enormous efforts, data quality in the early stages of this crisis was all over the map. California had a debacle in which 60,000 COVID tests were “pending” for months as labs that had promised to deliver results were overwhelmed with tests and tried to bluff their way through. Washington State had a nearly week-long stretch during which its health department reported nothing at all while it tried to roll out a new data dashboard. Connecticut stopped reporting all non-COVID deaths to the CDC for two months and had to play catch-up in early July. Several states have combined results from antigen tests (does this person have an active case?) and antibody tests (did this person recover from COVID?). There have been mistakes and errors everywhere as we stumble toward a stable, reliable, accurate data-delivery system.
And then there is Florida.
_____________
If you look at the country’s most controversial COVID state strictly from a data and reporting perspective, you’ll note that Florida has a transparent, high-functioning, well-oiled data machine. It has delivered clear, quality, up-to-date information and made it accessible dating back to the early days of this crisis. Its small team of dedicated data professionals has been flexible when the situation called for it and has wrangled a complicated and involved process.
Are you interested in long-term care (nursing home) cases? Florida has that. Antibody reports? They’re there, too. Prison-facility reports? Yep. Every day Florida provides data on positive, negative, and inconclusive tests; how many tests are still awaiting results; and how many cases involved travel or contact with another known case. Florida’s fact-gatherers do the statistical work to make sure a person who takes multiple tests isn’t counted multiple times in the final report. Florida is the only state that provides open access to its case-line data, which is to say, every individual case that has tested positive, along with vital statistics, resident status, and known travel or contact with another known case. You could go download it right now; it’s not hard to find. It’s all out in the open.
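That deduplication step is worth pausing on. Here is a hedged sketch of the idea in Python with pandas; the column names and figures are invented for illustration, not Florida’s actual schema or method:

```python
import pandas as pd

# Invented example: six test results from four people.
tests = pd.DataFrame({
    "person_id": [101, 101, 102, 103, 103, 104],
    "result":    ["positive", "positive", "negative",
                  "negative", "positive", "negative"],
})

# Count a person as one case if any of his or her tests was positive,
# rather than counting every repeat test as a new case.
unique_positive_people = (
    tests.assign(is_positive=tests["result"].eq("positive"))
         .groupby("person_id")["is_positive"]
         .any()
         .sum()
)
print(int(unique_positive_people), "unique positive people")  # 2
```

Without that step, person 101’s two positive tests would show up as two separate cases.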
When researchers want to test a theory, they often start with Florida’s data because the numbers are comprehensive and easy to access. When we started hearing in June that new cases were largely younger patients, the most accessible dataset with which we could test this theory was Florida’s. The theory proved true, and it was Florida’s data that gave us a level of granularity all but unprecedented in this country and around the world.
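For the curious, the sort of check those researchers ran is not complicated. A minimal sketch, assuming a hypothetical file name and column names (“Age”, “EventDate”) rather than the real export’s schema:

```python
import pandas as pd

# Hypothetical file and columns; the actual case-line export may differ.
cases = pd.read_csv("florida_case_line_data.csv", parse_dates=["EventDate"])

# Median age of newly reported cases, week by week. If new cases really
# skew younger, the weekly median age should drift downward over June.
weekly_median_age = (
    cases.set_index("EventDate")["Age"]
         .resample("W")
         .median()
)
print(weekly_median_age.tail(8))
```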
Unfortunately, that is not the story that most people know because most people do not make a practice of checking data at the source. Instead they rely on others to help them understand what is happening in their state. In an ideal world, this would be a task delegated to local and regional news organizations, but these have fallen desperately short. News organizations have been reporting half-true and poorly understood metrics, in the process misreading data and seeing conscious villainy in the very normal irregularities that come along with real-time data reporting. That has been the case with Florida.
According to news reports over the past few months:

- Florida has twice as many deaths as reported (a claim based on a faulty reading of a CDC report).
- Governor Ron DeSantis is, the Guardian claimed, “cooking the books” (a false representation of a data review requested by the official state epidemiologist).
- Florida is “blocking COVID information” (a misunderstanding of an effort to protect patient privacy, which saw the state requesting the redaction of patient names and identifying details).
- Florida told county coroners to stop releasing COVID death data (which it did, but not to suppress anything; rather, because they were not releasing the data in line with CDC recommendations).
As someone who has been closely watching the data from the beginning, I feel as if I’m living in another world altogether as I watch these conspiracy theories develop in real time. News stories and analyses are produced by tendentious people who cannot seem to grasp that data gathering and reporting are difficult and uneven tasks, prone to revisions and delays, or who are just trying to score naked partisan points.
Dozens of Florida reporters, egged on by clicks and Twitter, have put out so much terrible, inaccurate information that the people of Florida simply don’t know what to believe and have lost all faith in a data operation rightly touted by Deborah Birx, the coordinator of the executive branch’s response to the virus, as an example to emulate.
The most blatant example is the complaint that Florida is deleting deaths from its database to make its death count seem smaller and thereby circumvent arguments that the state should be locked down. On the surface, the claim that Florida has been deleting deaths is true, but only to the extent that every state has done this. Washington State has made multiple revisions to its death data as officials determined that cases previously marked as COVID deaths were suicides, homicides, overdoses, or accidents. Massachusetts and Pennsylvania have done this as well. Those changes have been reported as straight news, a simple revision of the data based on evolving information. Only in Florida is the very boring but vital work of data management treated as a task undertaken with dark intent, with every move scrutinized as the malicious action of a vain governor, part of a conspiracy that would have to involve dozens of hospitals, county health departments, data scientists, epidemiologists and researchers, and both Republican and Democratic members of the Florida legislature.
The Sun-Sentinel editorial page has all but accused Governor DeSantis personally of “rigging COVID-19 data.” This is astonishingly unfair, and worse, such ideas undermine the very serious efforts Florida officials have made here. Even as Florida’s data continue to be a touchstone for analysis, rigor, and review, much of the public has lost faith in them. The narrative is now set, and there is no going back from it. In pursuit of this narrative, the press (especially the Florida press) has repeatedly misinformed, fumbled basic data concepts, and omitted vital information. It has taken any small accident in reporting, any decision to protect patient privacy, any after-the-fact data correction, and spun it into a conspiracy.
There is a stark irony in the fact that Florida’s transparency is what makes these stories possible in the first place. Reporters have free access to the raw data and end up reporting on every statistical anomaly and data revision simply because they can see it happening. In many cases, they walk back their panic-inducing headlines as they discover more and more context.
One example: On July 13, Florida listed the number of cases and the positive percentage reported by every single lab in the state. Several labs were listed as reporting 100 percent positive COVID cases. If this had been true, it would have been extremely bad news. But a careful understanding of the data would have led any sensible analyst to make the educated guess that these labs probably simply didn’t report their negative tests. This has been a known data problem across the country since the very beginning of the crisis, and understanding that context would lead one to be cautious with this information.
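The arithmetic behind the anomaly is easy to reproduce. A sketch with invented lab figures:

```python
# A hypothetical lab that ran 500 tests: 45 positive, 455 negative.
positives, negatives = 45, 455

true_rate = positives / (positives + negatives)
print(f"true positivity: {true_rate:.0%}")          # 9%

# If the lab reports only its positives, the negatives arrive as zero,
# and the computed rate becomes 45 / (45 + 0).
apparent_rate = positives / (positives + 0)
print(f"apparent positivity: {apparent_rate:.0%}")  # 100%
```

Note that the 9 percent figure here was chosen to be consistent with what the lab managers would soon tell reporters.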
Instead of caution, Fox35 Orlando led with the headline “Some Florida labs report 100% positivity rates.” The headline was picked up by NPR and ABC News and reported nationwide.
The Fox35 reporters called the labs to confirm these reports. The lab managers, sensing the panic and worried about their jobs, denied reporting 100 percent positives and replied that they were getting only 9–10 percent positives. The headline was then changed to “Hospitals Confirm Mistakes in COVID-19 Report,” suggesting that the Florida Department of Health had made the mistake, since it had chosen to include the information the labs gave it. (It’s possible, I suppose, that the DoH could have redacted those numbers as incomplete information, but then its officials would have been accused of hiding COVID numbers from the public.)
The Fox35 reporters returned to the DoH and requested clarification. Finally, four days later, they made the final update to their story, with the headline changed again, to “Some Labs Have Not Reported Negative COVID-19 Results.” Four days of panic, inaccuracy, and misrepresentative headlines had been elevated to the national level, all because it takes time to get the data right.
Gathering data is a dirty job that involves mistakes, revisions, updates, and the assumption that you’re going to get it wrong on your way to getting it right. The data we report today aren’t correct, and that isn’t a scandal. The data we gather tomorrow will give us more context about today. In late July, the CDC was still collecting, cleaning, and revising data on deaths from early May. This might not be ideal, but it is not due to incompetence or malicious intent.
That is how public health data are collected: slowly, and with an understanding that a single weird data point isn’t a scandal or a panic. It isn’t a breaking news story. It’s something we should make a mental note of and include in a monthly or yearly review of how to improve our reporting mechanisms. In any normal time, this is what would happen. But in the middle of a pandemic, we are seeing reports on the rawest forms of data collection, and people who have long assumed that numbers are a transparent representation of reality are acting on that assumption and forgetting to hold back their judgment.
In truth, every new data report is only a first draft of what we know. The data we see every day come from dedicated people who are working hard. They are all under stress, and their jobs are weighing on them. Understanding this is the only way to be fair to them and to the data, and it is the best way to find truth. If things go wrong, we need to think hard about how they could have gone wrong in a system where people are trying to make the best decisions.
In an ideal world, these lessons of patience, caution, and grace in a volatile situation would be ones that journalists applied to their reporting. It should not be the job of the reader to investigate every outrageous data point, every reporting error, every delay in the data. Enough time has passed, and enough embarrassing errors have piled up, that we should hope to see some reporting reforms, or at least a hint more caution. Instead, we have to assume that the news we get today is just a first draft and apply proper skepticism before making any kind of judgment, or before accepting the judgments of the media voices responsible for so much misinformation and disinformation.