UNIVERSITY PARK, Pa. — It was late January 2020 when Maciej Boni realized that the COVID-19 pandemic was about to take over his life.
Boni, associate professor of biology, is an epidemiologist with extensive expertise in viral evolution, including a recent focus on human and avian flu. When COVID-19 hit, he tapped into a network of colleagues around the world, quickly joining an international team intent on tracking the outbreak to its origins.
Coronaviruses like SARS-CoV-2, Boni knew, are highly recombinant, each a genetic mash-up of bits and pieces picked up and discarded through generations of evolution. As a graduate student, he had created the recombination detection algorithm 3SEQ, the most accurate method yet devised for identifying recombinant viruses, and his research group in Penn State’s Center for Infectious Disease Dynamics continues to maintain this important tool.
“So I thought, why not see how highly recombinant SARS-CoV-2 is?” he said.
The first reason for wanting to know the viral origins of an outbreak is to stop it. “Identify the point source and close a poultry market, close a wet market, isolate a single district before it’s gotten to thousands of people,” Boni said. In the case of SARS-CoV-2, however, the outbreak had already spread too far for that kind of intervention. Still, if he and other experts could determine where the virus had come from, they would have a better chance of predicting where it was going. Understanding its evolutionary history, moreover, would be critical for preventing future outbreaks.
To untangle the details of the SARS-CoV-2 genome, Boni and his colleagues used bioinformatics to pull out the recombinant segments.
“[That] left us with two or three major segments that, as far as we can tell, have not been broken up and pasted back together,” he said. Using these fixed elements as a kind of evolutionary backbone, they built a family tree of all the coronaviruses they could identify from what remained. Within that panoply, they calculated that SARS-CoV-2 and its closest relative, a bat virus called RaTG13, diverged from a common ancestor between 40 and 70 years ago.
That means the lineage that gave rise to SARS-CoV-2 has been circulating in bats for decades, Boni said. What’s more, one of the older traits that SARS-CoV-2 shares with RaTG13 and other close relatives is its receptor-binding site, the genetic mechanism that enables the virus to recognize and bind to receptors inside the human lung.