Starts With A Bang

No, gain of function research did not cause COVID-19

SARS-CoV-2 first emerged in humans in 2019. Despite much noise generated by lab leak proponents, the evidence indicates a natural origin.
Horseshoe bats, as seen here in their natural environment, are abundant and diverse all across southern and central Asia, and carry a wide range of coronaviruses. A close relative, RaTG13, is about a 96% match for SARS-CoV-2, but is missing some important parts of the genetic sequence that make SARS-CoV-2 infectious to humans. That genetic information, importantly, exists within viruses found in related animal populations throughout the wild.
Credit: orientalizing/flickr
Key Takeaways
  • Since the first cases of COVID-19 were identified in early 2020 and dated back to late 2019, many questions have been raised concerning the origin of the virus that causes the illness: SARS-CoV-2.
  • Many have claimed the virus originated from a lab and was created through irresponsible gain-of-function research. They claim that deceitful scientists, rather than nature and poor human management, caused the pandemic.
  • However, the truth of the virus’s origin is written in the genome of the virus itself, and the genetic sequence of SARS-CoV-2 has definitively proven, since 2021, that it’s of natural origin. Here’s how.

In late 2019, unbeknownst to all, a new virus first infected humans in Wuhan, China: SARS-CoV-2. This virus, which turned out to be airborne, highly infectious, lethal to a few but capable of causing long-term damage to many, is the cause of the COVID-19 illness that has now infected at least 700 million (and likely billions) across the globe. By January of 2020, dozens of infections had spread to several countries, and the global response was insufficient to prevent a pandemic. As a result, at least seven million people (and possibly tens of millions) died while tens-to-hundreds of millions more were left with long-term, often disabling, conditions. That novel coronavirus, SARS-CoV-2, has gone on to mutate many times and continues to be infectious: to humans and animals both.

And yet, one popular conspiracy theory continues to thrive in the media and in politics: the idea that unlike all other virus-caused pandemics, which originated naturally, this virus was instead created in the lab.

The heart of the argument is that irresponsible researchers, who began with a non-infectious, non-lethal progenitor virus, used the technique of gain-of-function research to create SARS-CoV-2, which then infected someone within the lab and spread from there, creating the COVID-19 pandemic that has killed and infected so many.

If true, this would imply a conspiracy of the highest order. Many among the world’s leading scientists, nominally engaged in pandemic research and prevention, would actually be the culprits behind the greatest pandemic of the century thus far. But if untrue, then it would mean that innocent, even heroic people are being unjustly attacked and vilified by a completely baseless, meritless accusation.

Fortunately, the science very clearly points to one conclusion: that of a natural origin for SARS-CoV-2 and the COVID-19 infections it causes in humans. Here’s how we know.

Markets, such as this one in Hong Kong, often contain fruits, vegetables, animals, and other derivative products available for purchase. Produce and animals are brought in from up to thousands of kilometers away, including adjacent provinces and even foreign or offshore sources, for sale at such markets. Wuhan is the hub city for all of central/southern China, and goods, including animals, often arrive at markets there after journeys in excess of 1000 kilometers.
Credit: Philip Fong/AFP

What is gain-of-function research?

Let’s begin by answering a very basic question: what is this “gain-of-function” research that is being discussed, why is it so controversial, and is the controversy merited at all?

In evolutionary biology, arguably the most important relationship in any living organism is the relationship between structure and function. A structure is literally a part of the organism: arms, hands, fingers, and internal organs are all examples of structures in humans, but the term also covers microscopic structures such as spike proteins, receptor sites, and cleavage sites, which appear across many coronaviruses. If an organism is going to perform a function, from humans opening a tight jar to viruses entering and infecting the cell of a host organism, it needs a structure to perform that function.

As organisms evolve over many generations, they can grow new structures, modify existing structures, or lose certain structures, all of which can lead to either a gain-of-function or a loss-of-function. In human evolution, for example, we can see that our hominid ancestors had stronger jaws but smaller brains than modern humans; the weakening of our ancestors’ jaws represented a loss-of-function, but that loss-of-function enabled the part of the skull that contains the brain to grow larger, permitting the gain-of-function of smarter hominids through the structure of larger brains. Both of these processes occur naturally over many generations, but laboratory studies are also conducted with these goals in mind.

This drawing shows a variety of human, monkey, and ape skulls from a variety of extant species. The older apes have smaller cranial capacities and smaller brains than humans, but stronger jaws, on average, by far. In order for large brains to develop, the jawbones needed to weaken: a loss-of-function adaptation. Modern humans have the greatest encephalization quotient of all known animals, followed by dolphins and then, more distantly, chimpanzees and some birds.
Credit: schinz de Visser, 1845/public domain

Certain selection pressures, for example, can be applied to an organism in the lab, and that will demand that the organism adapt — i.e., become able to perform a novel function — in order to survive. How that organism adapts to perform that function, however, is not something with a unique solution; there are many possible adaptations.

In an experiment performed almost 20 years ago, researchers did this with bacteria: they began with three identical cultures and placed them in an environment where their normal food source was scarce, while a nutrient source that the organism was ill-adapted to consume was abundant. The researchers evolved the bacteria slowly by gradually decreasing the normal food source while gradually increasing the alternate food source over tens of thousands of generations.

In all three cultures, bacteria survived. But what was remarkable is that each of the three cultures evolved different ways of performing the function of metabolizing the alternative nutrient source. They all gained function, but through different structures. Importantly, they all displayed vastly different genetic adaptations; their genomes had changed in different ways from one another. Then, the researchers took each of the three populations and did the reverse: gradually replacing the alternative nutrient source with the original nutrient source. Again, over tens of thousands of generations, each population evolved differently from the others, with different underlying genetic codes, but they all survived.

Through this type of research, both gain-of-function and loss-of-function research, we can gain insight into how organisms evolve to perform certain tasks. For viruses, in particular, this type of research holds tremendous promise for teaching us how to fight, treat, and even prevent deadly infections.

Chinese virologist Shi Zhengli (L) is seen inside the P4 laboratory in Wuhan in this 2017 photo. The P4 epidemiological laboratory, part of the Wuhan Institute of Virology, is one of the world’s leading research centers on coronaviruses. Since 2020, it has also been the target of many baseless accusations about biosafety, secret research, bioweapons development, and more.
Credit: Johannes Eisele/AFP

What is the allegation made by lab leak proponents?

The allegation is simple: that this virus didn’t occur through natural evolution in the wild, and then spill over into the human population through human-animal contact. Instead, this virus came into existence by:

  • beginning with a benign (to humans) virus,
  • that was brought into the Wuhan Institute of Virology for study,
  • that then had unreported gain-of-function research conducted on it,
  • which made it more infectious to humans,
  • and then an unreported infection event occurred where this virus actually infected a human,
  • and then that infected human went outside the lab and infected others, kicking off the COVID-19 pandemic.

That’s quite a tale, but one that you would expect the evidence to either support or refute on scientific grounds.

According to Alina Chan, writing in the New York Times, there are five pieces of evidence that lab leak proponents want everyone to consider.

  1. The SARS-like virus, SARS-CoV-2, that caused the pandemic, emerged in Wuhan: where the WIV (which researches SARS-like viruses) is located.
  2. A year prior to SARS-CoV-2’s emergence, the WIV, in collaboration with US-based partners, proposed creating viruses similar to SARS-CoV-2 through gain-of-function research.
  3. Scientists at the WIV pursued this type of work, under low biosafety conditions: conditions that could not have contained an infectious, airborne virus like SARS-CoV-2.
  4. The hypothesis of a natural spillover origin for COVID-19, from an animal at the Huanan Seafood Market in Wuhan, is not supported by the evidence.
  5. The key evidence that would be expected to have emerged from a natural spillover event, the progenitor animal host of SARS-CoV-2, has never been found.

Any type of research that’s going to be conducted with US funds or at US-based institutions has to meet both federal and institutional regulations, which are tightly enforced. There is also, importantly, additional oversight that is provided by occupational health services to make sure that research personnel are safe, as well as federal and institutional regulation of research involving animals (through IACUC) and humans (through IRB). There is no evidence that any of these standards were violated in conjunction with any research conducted at the Wuhan Institute of Virology.
Credit: F. Goodrum et al., Journal of Virology/American Society for Microbiology, 2023

But are these five things true? The first one certainly is: SARS-CoV-2 first emerged in Wuhan, and the WIV is located there too.

The second one is very misleading. Yes, there was an international collaboration to investigate the features of coronaviruses that could lead to the infection of humans, and that collaboration included US-based partners. However, they never created a potentially infectious virus, and importantly, the specific proposal to create a virus with the defining features of SARS-CoV-2 was rejected, and that research was never conducted.

The third point is simply untrue. All of the work that was conducted at WIV was pursued under standard biosafety procedures for the type of work that was done, and even passed an international inspection. The work that was pursued also was incapable of creating a novel, infectious virus such as SARS-CoV-2.

The fourth point is so thoroughly untrue that it demands pushback. The wet market origin hypothesis for SARS-CoV-2 is overwhelmingly supported by the evidence.

And the fifth point is very misleading: most pandemics never have the “progenitor animal host” that caused the spillover identified, so the fact that this hasn’t occurred for SARS-CoV-2 is expected, not evidence that it didn’t have a natural origin.

A typical example of a scene at a fur farm, showing human-animal contact. Animals are often killed en masse prior to them being skinned by hand at a pelt or fur farm. This industry is a $61 billion per year enterprise in China alone, and is a prime candidate for the zoonotic spillover of SARS-CoV-2 into humans that occurred at the Huanan Wet Market in China.
Credit: Viktor Drachev/AFP

Is there any way to know, based on the limited evidence we have, whether this virus emerged naturally or was created, via gain-of-function research, in a lab?

If we wanted absolute proof, we’d demand to know everything. To prove a natural origin, we’d ideally want to pinpoint the exact animal, how it got infected, what it was infected with, how it was transmitted to humans and when, etc. We’d want to identify “patient zero” for COVID-19, and ideally even find a natural reservoir of the original strain of SARS-CoV-2 circulating in a wild population. This is unlikely to be possible.

However, if you wanted simply to convince yourself — as a reasonable person who was following all of the available evidence — that a natural origin was enormously favored over an origin that involved gain-of-function research, the bar suddenly gets much lower. What you can do is look at the following:

  • the genome of the original SARS-CoV-2 strain that first infected humans,
  • the genome of the closest virus that existed at WIV, prior to the pandemic, that could have possibly been subjected to gain-of-function research,
  • and the genomes of viruses found in the wild that have features that are found in SARS-CoV-2 that are missing from the closest-relative virus that pre-existed at WIV.

Why would you want to look at those? Because of the way that evolution works, and the way that genomes encode structures.

This image shows the standard RNA codon table, where each of the 64 possible three-base codons built from the U, C, A, and G bases is shown. These codons encode amino acids, as well as the information to begin (⇒) or end (Stop) encoding a particular protein out of those amino acids. Note the important feature of redundancy of the table, as there are typically only 20 amino acids for 64 codons. DNA typically encodes 20 amino acids as well, with thymine replacing uracil.
Credit: DNA and RNA codon tables/English Language Wikipedia

There are only five bases that appear in the genomes of living organisms: A, C, T, and G for organisms (like humans) with DNA-based genomes, and A, C, U, and G for those (like coronaviruses) with RNA-based genomes. It takes three consecutive bases to encode an amino acid, where every “three-in-a-row” set of bases forms what’s known as a codon. There are 64 possible codons for 20-to-22 amino acids, meaning the codons are redundant. And finally, by encoding the genetic sequence of many amino acids in a row, beginning with a “start” codon and ending with a “stop” codon, organisms come with the instructions to produce proteins, including the proteins that make up their various function-performing structures.
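The redundancy of the codon table can be checked directly. The following is a minimal sketch that builds the standard genetic code for RNA and counts how many codons map to each amino acid; the variable names are illustrative choices, and the one-letter amino acid string assumes the standard table, with `*` marking a stop codon.

```python
from collections import Counter
from itertools import product

# The four RNA bases, in the conventional ordering used for codon tables.
BASES = "UCAG"

# Standard genetic code, one letter per codon, in the same UCAG ordering
# (first base varies slowest, third base fastest). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each three-base codon (e.g. "AUG") to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many different codons encode each amino acid (or stop).
codons_per_residue = Counter(CODON_TABLE.values())

print("total codons:", len(CODON_TABLE))
print("distinct amino acids:", len(codons_per_residue) - 1)  # minus stop
for residue, count in sorted(codons_per_residue.items()):
    print(residue, count)
```

Most amino acids are reachable through two, four, or six different codons, which is why two genomes can encode the same protein with different underlying sequences.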

If you are conducting gain-of-function research in a lab, the structure that you form will likely be very different from the structures that exist in the wild, even if they perform similar functions. However, even in the event that the structures encoded via wild breeding and gain-of-function research stemmed from protein sequences that were identical, the genetic sequences that encoded those structures would have to be different. These key proteins are more than 100 amino acids long. Even if there were only two ways to encode every amino acid, that would mean the odds of getting identical sequences were 1-in-2^100, or tens of thousands of times worse than your odds of winning the Powerball lottery three times in a row. Just like the initially-identical bacteria colonies subjected to the same selection pressures, evolution never works exactly the same way twice.
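The arithmetic behind that comparison can be sketched in a few lines. This assumes a protein of exactly 100 amino acids with two equally likely codons per amino acid, as in the simplified argument above, and it assumes Powerball jackpot odds of 1 in 292,201,338, a figure that does not appear in this article and is supplied here only for illustration.

```python
from fractions import Fraction

# Assumption: 100 amino acids, each with two equally likely encodings.
AMINO_ACIDS_IN_PROTEIN = 100
ENCODINGS_PER_AMINO_ACID = 2

# Assumption (not from the article): odds of a single Powerball jackpot.
POWERBALL_JACKPOT_ODDS = 292_201_338

# Probability that two independent encodings match codon for codon.
p_identical_sequence = Fraction(
    1, ENCODINGS_PER_AMINO_ACID ** AMINO_ACIDS_IN_PROTEIN
)

# Probability of winning the jackpot three times in a row.
p_three_jackpots = Fraction(1, POWERBALL_JACKPOT_ODDS ** 3)

# How many times less likely the sequence match is than three jackpots.
ratio = p_three_jackpots / p_identical_sequence

print(f"identical sequence: 1 in {float(1 / p_identical_sequence):.3e}")
print(f"three jackpots:     1 in {float(1 / p_three_jackpots):.3e}")
print(f"ratio:              about {float(ratio):,.0f} times less likely")
```

Under these assumptions the ratio comes out in the tens of thousands, consistent with the comparison in the text.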

The SARS virus (orange) has a crown-like structure, meaning that it’s part of the coronavirus family of viruses. The novel coronavirus SARS-CoV-2, also known as the virus which causes COVID-19 in humans, is the largest, most lethal and long-term detrimental new pandemic to hit planet Earth since the dawn of the 21st century. Despite having a genetic sequence of only ~30,000 bases, this virus has killed over 7 million people since 2020, with many estimates for the true number of deaths rising into the tens of millions.
Credit: NIH

What is written in the genome of SARS-CoV-2?

The closest virus known to be at the WIV prior to the start of the pandemic is a bat virus known as RaTG13, whose genome is about 96% identical to that of SARS-CoV-2. (Out of ~30,000 bases in its genome, only about 1,100 to 1,200 differ from SARS-CoV-2.) A major difference between RaTG13 and SARS-CoV-2 can be seen in the spike protein section of the viruses, where in particular the receptor-binding domains (RBDs) of the two viruses show a significant (~10% or more) difference from one another over a span of the genome that encodes around 300 amino acids.
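Percent identity, as used here, is simply the fraction of aligned positions at which two sequences carry the same base. The sketch below illustrates the idea on two short, made-up RNA strings; they are not real viral data, the function names are hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences match."""
    differences = count_differences(seq_a, seq_b)
    return 100.0 * (len(seq_a) - differences) / len(seq_a)


# Toy sequences invented for illustration only (14 bases each).
toy_a = "AUGGCUACGUUAGC"
toy_b = "AUGGCAACGUUGGC"

print(count_differences(toy_a, toy_b), "differing positions")
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

Applied to a genome of roughly 30,000 bases, the same calculation shows why a difference of a few percent corresponds to on the order of a thousand differing positions.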

In September 2021, a study was posted (and then published in Nature in February of 2022) in which researchers identified 46 new bat viruses collected from sites in Laos near the Mekong River (between July 2020 and January 2021, well after the start of the pandemic), three of which are now known as BANAL-20-52, BANAL-20-103, and BANAL-20-236.

All three of these contain part of the RBD, and BANAL-20-52 contains all of it, in a form that is a smoking-gun match for SARS-CoV-2 in exactly the way that RaTG13 is not. Because viruses can swap chunks of RNA with one another through the process of recombination, this teaches us that RaTG13 and BANAL-20-52 are likely cousins: of each other and also of SARS-CoV-2.

The central idea of the lab leak hypothesis, that the virus spilled over from the Wuhan Institute of Virology, is only possible if the virus from which SARS-CoV-2 originated was actually ever inside the institute itself. If the virus originated naturally, with parts of it found in animals living in a wild population in Laos, as the genetic sequencing reported in 2021 indicates, the lab leak hypothesis is ruled out as a possibility. You cannot create something through gain-of-function research that will have an identical genetic code to something that came about in the wild through natural processes such as recombination.
Credit: S. Temmam et al., Nature, 2022

In SARS-CoV-2, we have an organism with a genome that is very closely related to two groups of coronavirus strains: the RaTG13 strain and the various BANAL-20 strains. The RaTG13 strain contains many elements of SARS-CoV-2, but is missing critical sections, including the receptor binding domain (RBD) site on the spike protein. Conversely, the BANAL-20 strains also contain many elements of SARS-CoV-2, and do contain the RBD site on the spike protein, with BANAL-20-52 matching the entirety of the spike protein sequence better than RaTG13 and with BANAL-20-103 matching the first ~5000 bases in the sequence much better than RaTG13 and even better than BANAL-20-52.

Just as genetics teaches us how closely related we are to our various family members, it also teaches us how closely various virus strains are related to one another. While viruses don’t reproduce sexually like humans do, they do engage in recombination, which allows parts of one virus’s sequence to “swap” with another virus’s sequence. The fact that a pangolin virus was found to have an ACE2 receptor binding domain very similar to SARS-CoV-2 is unsurprising because viruses don’t just pass from animal to human and back, but also from animal to animal. In fact, SARS-related beta-coronaviruses are known to be highly promiscuous in this regard, undergoing recombination easily with other circulating viruses, which explains why SARS-CoV-2 appears to contain features that best match those of viruses found across several different, disconnected evolutionary lineages.

This color-coded diagram represents 15 recombinant fragments of various SARS-related beta coronaviruses compared to the original genome of SARS-CoV-2 that first infected humans. Several different strains show a “best match” for a variety of these 15 fragments, indicating a recombination-based origin for SARS-CoV-2, and refuting the feasibility of a lab creation through gain-of-function research.
Credit: S. Temmam et al., Nature, 2022

Conclusions

Put simply, if SARS-CoV-2 came about by frequent recombination through contact with common virus-carrying animals in the wild, we would expect it to display a mix of genome segments shared between it and several of its close cousins. However, if SARS-CoV-2 came about through manipulation in a lab, such as through gain-of-function research, we would expect a close match to one and only one “initial” strain, with the remainder of the genome failing to match any other wild strains. The discovery that SARS-CoV-2 does indeed have what biologists call a mosaic genome makes it abundantly clear that it could not have arisen through laboratory manipulation of an initial, single strain, through gain-of-function research or otherwise.
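The idea of a mosaic genome can be illustrated with a toy calculation: split a query sequence into windows and ask which reference sequence matches each window best. The sketch below uses short, invented sequences and hypothetical names (`ref_A`, `ref_B`); it is not the method used in the Nature study, which relies on full genome alignments and dedicated recombination analysis, but it shows what "different best matches for different fragments" means.

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of matching positions in two aligned, equal-length strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def best_match_per_window(query, references, window):
    """For each window of the query, name the reference that matches best."""
    results = []
    for start in range(0, len(query), window):
        end = start + window
        best_name = max(
            references,
            key=lambda name: identity(query[start:end], references[name][start:end]),
        )
        results.append((start, best_name))
    return results


# Invented toy sequences (24 bases each), already aligned to one another.
query = "ACGUACGUACGU" + "GGCAUUGGCAUU"
references = {
    # Matches the query's first half exactly, differs in the second half.
    "ref_A": "ACGUACGUACGU" + "GACAUAGGCUUU",
    # Differs in the first half, matches the second half exactly.
    "ref_B": "ACGAACGUUCGU" + "GGCAUUGGCAUU",
}

mosaic = best_match_per_window(query, references, window=12)
for start, name in mosaic:
    print(f"bases {start}-{start + 11}: best match is {name}")
```

In this toy case the best match switches from one reference to the other partway along the query, which is the signature of a mosaic; a sequence derived from a single starting strain would instead show the same best match in every window.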

In 2021, the same Alina Chan who penned the NYT op-ed said the following:

“I have days where I think this could be natural. And if it’s natural, then I’ve done a terrible thing because I’ve put a lot of scientists in a very dangerous spot by saying that they could be the source of an accident that resulted in millions of people dying. I would feel terrible if it’s natural and I did all this.”

By following the evidence, we have learned that is precisely the case. It is natural. The observed recombination patterns that exist in the genome of SARS-CoV-2 must have been left behind by recombination events between parental lineages in the wild: where all of these different viral strains were able to meet and interbreed. Importantly, those patterns that are written in the genome of SARS-CoV-2 cannot be produced, simulated, or faked by any means in a laboratory environment.

Given that information, and the fact that this information is now nearly three full years old, it’s long past time to move past the ever-changing conspiracy theory of the lab leak hypothesis, and embrace reality. The genome of SARS-CoV-2 demonstrates it has a natural origin, whether we ever find the original virus in a wild population of animals or not. The misinformation being spread, and the scientists being vilified, over gain-of-function research has no basis in reality. A lot of scientists are, and have been for a few years now, in a very dangerous spot due to proponents of the lab leak hypothesis, as they are being accused of creating an accident that started the COVID-19 pandemic when in fact they were the proverbial firefighters working to extinguish it. It’s time to replace our conspiratorial fears with scientific truths, and to invest resources where they belong: in scientists who work to understand the Universe as it is, and to help humanity cope with the cold, hard reality that we all face.

The author acknowledges Dr. Philipp Markolin for his illuminating writing on this topic.
