Applying machine learning to identify unrecognized Covid-19 deaths in the US
| Source: HN | Original article
A research team led by Kiang has released a machine‑learning analysis that revises the United States’ COVID‑19 death toll for the first two pandemic years. The team trained a gradient‑boosted model on more than 2 million death certificates that listed COVID‑19 as a cause, so that it learned to recognise the textual and coding patterns that signal a pandemic‑related fatality. Applied to the full set of certificates from March 2020 through December 2021, the model identified 155,536 deaths that had not been attributed to COVID‑19 (95 % uncertainty interval 150,062–161,112). That is about 19 % more than the officially recorded count, which works out to 840,251 if the 995,787 COVID‑19 deaths cited for the period is read as the combined total (155,536 is roughly 18.5 % of 840,251, but under 16 % of 995,787). The estimate suggests that a substantial share of fatalities were logged under other causes such as pneumonia, heart disease or “unspecified respiratory failure.”
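The train-then-apply workflow described above can be illustrated with a minimal sketch. It is not the study's code: the three binary certificate features, the synthetic data generator and all function names are hypothetical, and the booster is a deliberately simple pure-Python version (decision stumps fitted to log-loss residuals) rather than whatever library the authors used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the binary feature whose 0/1 split best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        left = [r for x, r in zip(X, residuals) if x[j] == 0]
        right = [r for x, r in zip(X, residuals) if x[j] == 1]
        if not left or not right:
            continue  # feature is constant in this sample, no split possible
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((r - mean_l) ** 2 for r in left) + sum((r - mean_r) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, j, mean_l, mean_r)
    return best[1:]  # (feature index, value if feature == 0, value if feature == 1)


def fit_gbm(X, y, n_rounds=40, learning_rate=0.5):
    """Gradient boosting for binary labels: each round fits a stump to y - p."""
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1.0 - base_rate))  # start from the log-odds of the base rate
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, v0, v1 = fit_stump(X, residuals)
        stumps.append((j, v0, v1))
        scores = [s + learning_rate * (v1 if x[j] else v0) for x, s in zip(X, scores)]
    return f0, stumps, learning_rate


def predict_proba(model, x):
    f0, stumps, learning_rate = model
    score = f0 + sum(learning_rate * (v1 if x[j] else v0) for j, v0, v1 in stumps)
    return sigmoid(score)


def make_certificate(rng):
    """Synthetic certificate: [pneumonia mentioned, respiratory failure, aged 65+]."""
    x = [int(rng.random() < 0.3), int(rng.random() < 0.2), int(rng.random() < 0.6)]
    logit = -3.0 + 2.5 * x[0] + 2.0 * x[1] + 0.5 * x[2]  # made-up coefficients
    y = int(rng.random() < sigmoid(logit))  # 1 = COVID-19 death
    return x, y


rng = random.Random(0)

# Train on certificates whose COVID-19 status is taken as known.
train = [make_certificate(rng) for _ in range(2000)]
model = fit_gbm([x for x, _ in train], [y for _, y in train])

# Apply to certificates whose recorded cause is something else; the sum of
# predicted probabilities is the expected number of unrecognized deaths.
other = [make_certificate(rng)[0] for _ in range(500)]
estimated_unrecognized = sum(predict_proba(model, x) for x in other)
print(f"Estimated unrecognized deaths among {len(other)}: {estimated_unrecognized:.1f}")
```

Summing probabilities rather than counting certificates above a hard threshold gives an expected count, which is one reasonable way to turn per-certificate scores into a population estimate; the study's own aggregation and uncertainty method may differ.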
The finding matters because mortality statistics drive everything from federal funding allocations to public‑health preparedness assessments. Under‑counting obscures the true impact of the virus, hampers evaluation of past interventions, and may skew models that forecast future health crises. Moreover, the study demonstrates that AI can systematically audit vital‑statistics systems, exposing gaps that traditional surveillance missed.
The next thing to watch is how health agencies respond. The Centers for Disease Control and Prevention has signalled interest in integrating AI‑based cross‑checks into its National Center for Health Statistics pipeline, a move that could refine real‑time reporting in future outbreaks. Parallel work is already under way to adapt the approach for other infectious diseases and for sub‑national analyses that could reveal geographic disparities in misclassification. As we reported on April 5, 2026, the pandemic’s death toll remains a contested figure; this new evidence adds a quantitative backbone to calls for more transparent, AI‑augmented mortality tracking.