As a parent of three children under 4, I was hit hard by last month's announcement that the Food and Drug Administration was delaying its review of Pfizer-BioNTech's Covid-19 vaccine for children under 5.
Like many caregivers guarding young children against the coronavirus, my winter has been full of rapid tests, mask reorders and outdoor play dates in borderline frostbite conditions. I'm able to manage this because I believe it's temporary; we just need to hold out a little longer until our children can get vaccinated.
But because I study statistics, I'm also racked with concern that if the data had been assessed in a more nuanced way, we might be putting vaccination appointments on the family calendar right now.
It's unclear why the FDA paused the review. The most recent data hasn't been shared, and reporting suggests Pfizer found that the omicron wave led to many more infections than previously seen in its clinical trial. Instead of proceeding, the agency decided to wait for data on a third dose. Perhaps the two doses were not effective enough for the full group, though earlier data had suggested the vaccine produced the desired immune response in children ages 6 months to 24 months.
The bigger issue, as I see it, lies in the statistical methods generally relied on to evaluate the effectiveness of vaccines and drugs. The standard approach, used in almost all clinical trials and endorsed by the FDA, requires new drugs to meet an arbitrary statistical threshold, the one anyone who has taken a stats class may recognize as statistical significance. This is appealing because it serves as a standardized final exam that all experimental results have to pass, unaided by preconceptions on the part of the reviewers or special pleading by the experimenters.
But the whole idea of statistical significance has been losing favor among many statisticians, for two good reasons. First, the thinking is inherently binary: after the number crunching is complete, results are classified as significant or not significant, suggesting a finality and certitude that are rarely justified. Second, like any standardized test, it's overly reductive. If relied on too heavily, it becomes a substitute for a more thoughtful, holistic analysis of the data, including important scientific context.
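To see how fragile that binary verdict can be, consider a small sketch in Python. The trial numbers here are entirely hypothetical, chosen only for illustration, not taken from any Pfizer data; the point is that a handful of infections one way or the other can flip a result across the 0.05 line even though the evidence has barely changed.

```python
import math

def two_proportion_p_value(inf_vaccine, n_vaccine, inf_placebo, n_placebo):
    """One-sided p-value for a two-proportion z-test (normal approximation)."""
    p1 = inf_vaccine / n_vaccine
    p2 = inf_placebo / n_placebo
    pooled = (inf_vaccine + inf_placebo) / (n_vaccine + n_placebo)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_vaccine + 1 / n_placebo))
    z = (p2 - p1) / se  # positive z favors the vaccine arm
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability P(Z >= z)

# Hypothetical counts: 22 infections among 1,500 placebo recipients,
# versus either 10 or 13 among 1,500 vaccine recipients.
for infections in (10, 13):
    p = two_proportion_p_value(infections, 1500, 22, 1500)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{infections} infections in vaccine arm: p = {p:.3f} -> {verdict}")
```

Three extra infections in the vaccine arm push the p-value from one side of 0.05 to the other, and with it the entire pass/fail judgment, even though both scenarios point to a vaccine that is doing real work.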
Nearly three years ago, an open letter signed by more than 800 scientists called for an end to the practice, and prominent statisticians, including the head of the American Statistical Association, put it bluntly: "Don't say 'statistically significant.' " Too often, they said, this binary labeling of results as worthy or unworthy has become "the antithesis of thoughtfulness," a shortcut around what should be the hard work of any statistical inquiry.
What we need in evaluating the under-5 vaccine trial is not a judgment of absolute safety or efficacy but an estimate of probable improvement over the next best alternative, taking into consideration all the available information. Even the concept of an emergency use authorization challenges the FDA's ordinary binary of approval and disapproval. We should take that idea and extend it.
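One way to make "probable improvement" concrete is a simple Bayesian sketch: instead of a pass/fail verdict, compute the probability that the vaccine's efficacy exceeds some benchmark. The counts, the 30 percent threshold, and the flat Beta(1, 1) priors below are all illustrative assumptions, not the FDA's actual procedure or any trial's actual data.

```python
import random

def prob_efficacy_exceeds(threshold, inf_v, n_v, inf_p, n_p,
                          draws=100_000, seed=1):
    """Monte Carlo estimate of P(vaccine efficacy > threshold), assuming
    independent Beta(1, 1) priors on each arm's infection rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Posterior draws for the infection rate in each arm.
        rate_v = rng.betavariate(1 + inf_v, 1 + n_v - inf_v)
        rate_p = rng.betavariate(1 + inf_p, 1 + n_p - inf_p)
        # Efficacy = 1 - (vaccine infection rate / placebo infection rate).
        if 1 - rate_v / rate_p > threshold:
            hits += 1
    return hits / draws

# Hypothetical counts: 13 of 1,500 vaccinated infected vs. 22 of 1,500 on placebo.
print(prob_efficacy_exceeds(0.30, 13, 1500, 22, 1500))
```

The answer is a graded probability rather than a binary verdict, and it can be weighed alongside everything else we know, which is exactly the kind of holistic judgment an emergency authorization is supposed to allow.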