OU-Led Research Uses Artificial Intelligence to Guide the Search for the Next SARS-CoV-2

Daniel Becker

Daniel Becker, an assistant professor of biology in the Dodge Family College of Arts and Sciences at the University of Oklahoma, has been leading a proactive modeling study over the last year and a half to identify bat species that are likely to carry betacoronaviruses, including but not limited to SARS-like viruses.

The study, “Optimizing predictive models to prioritize viral discovery in zoonotic reservoirs,” published in The Lancet Microbe, was led by Becker; Greg Albery, a postdoctoral fellow at Georgetown University’s Bansal Lab; and Colin J. Carlson, an assistant research professor at Georgetown’s Center for Global Health Science and Security.

It also included collaborators from the University of Idaho, Louisiana State University, University of California, Berkeley, Colorado State University, Pacific Lutheran University, Icahn School of Medicine at Mount Sinai, University of Glasgow, Université de Montréal, University of Toronto, Ghent University, University College Dublin, Cary Institute of Ecosystem Studies, and the American Museum of Natural History.

Becker and colleagues’ study is part of the broader efforts of an international research team called the Verena Consortium, which works to predict which viruses could infect humans, which animals host them and where they could emerge. Albery and Carlson co-founded the consortium in 2020, with Becker as a founding member.

Despite global investments in disease surveillance, it remains difficult to identify and monitor wildlife reservoirs of viruses that could someday infect humans. Statistical models are increasingly used to prioritize which wildlife species to sample in the field, but the predictions from any one model can be highly uncertain. Scientists also rarely track the success or failure of their predictions after making them, which makes it hard to learn from past results and build better models. Together, these limitations leave considerable uncertainty about which models are best suited to the task.

In this study, researchers used bat hosts of betacoronaviruses, a large group of viruses that includes those responsible for SARS and COVID-19, as a case study for how to dynamically use data to compare and validate these predictive models of likely reservoir hosts. The study is the first to prove that machine learning models can optimize wildlife sampling for undiscovered viruses and illustrates how these models are best implemented through a dynamic process of prediction, data collection, validation and updating.

In the first quarter of 2020, researchers trained eight different statistical models that predicted which kinds of animals could host betacoronaviruses. Over more than a year, the team then tracked the discovery of 40 new bat hosts of betacoronaviruses to validate their initial predictions and dynamically update their models. The researchers found that models harnessing data on bat ecology and evolution performed extremely well at predicting new hosts of betacoronaviruses. In contrast, cutting-edge models from network science that used high-level mathematics – but less biological data – performed roughly as well as, or worse than, expected at random.
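To make the idea of checking predictions against later discoveries concrete, here is a minimal Python sketch. It is not the study's code: the species names, scores and labels are hypothetical, and the metric (area under the ROC curve, computed by pairwise comparison) is an illustrative choice rather than one named in this article. An AUC near 1 means newly confirmed hosts were ranked above other species; an AUC near 0.5 is what random guessing would be expected to give.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (new host, non-host) pairs in which the
    new host received the higher score; ties count as half a win.
    """
    hosts = [s for s, y in zip(scores, labels) if y == 1]
    others = [s for s, y in zip(scores, labels) if y == 0]
    if not hosts or not others:
        raise ValueError("need at least one host and one non-host")
    wins = sum(
        1.0 if h > o else 0.5 if h == o else 0.0
        for h in hosts
        for o in others
    )
    return wins / (len(hosts) * len(others))


# Hypothetical data: six placeholder species, with 1 marking those
# confirmed as hosts after the predictions were made.
species = ["bat_a", "bat_b", "bat_c", "bat_d", "bat_e", "bat_f"]
newly_confirmed = [1, 0, 1, 0, 0, 1]

# Hypothetical scores from a model that uses ecological traits,
# and from a model that cannot tell the species apart.
trait_model_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3]
uninformative_scores = [0.5] * len(species)

trait_auc = auc(trait_model_scores, newly_confirmed)
baseline_auc = auc(uninformative_scores, newly_confirmed)

print(f"trait-based model AUC: {trait_auc:.2f}")
print(f"uninformative model AUC: {baseline_auc:.2f}")
```

Repeating this comparison as new hosts are reported is one simple way to see which models hold up over time and which do no better than chance.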

Importantly, their revised models predicted over 400 bat species globally that could be undetected hosts of betacoronaviruses, not only in Southeast Asia but also in sub-Saharan Africa and the Western Hemisphere. Although 21 species of horseshoe bats (in the Rhinolophus genus) are known to be hosts of SARS-like viruses, researchers found that at least half of plausible betacoronavirus reservoirs in this bat genus might still be undetected.

“One of the most important things our study gives us is a data-driven shortlist of which bat species should be studied further,” said Becker, who added that his team is now working with field biologists and museums to put their predictions to use. “After identifying these likely hosts, the next step is then to invest in monitoring to understand where and when betacoronaviruses are likely to spill over.”

Becker added that although the origins of SARS-CoV-2 remain uncertain, the spillover of other viruses from bats has been triggered by forms of habitat disturbance, such as agriculture or urbanization.

“Bat conservation is therefore an important part of public health, and our study shows that learning more about the ecology of these animals can help us better predict future spillover events,” he said.

Rhinolophus rouxi, which inhabits parts of South Asia, was identified as a likely but undetected betacoronavirus host by the authors. Photo credit: Brock and Sherri Fenton.