No, gain-of-function research did not cause COVID-19
- Since the first cases of COVID-19 were identified in early 2020 and traced back to late 2019, many questions have been raised concerning the origin of the virus that causes the illness: SARS-CoV-2.
- Many have claimed the virus originated from a lab and was created through irresponsible gain-of-function research. They claim that deceitful scientists, rather than nature and poor human management, caused the pandemic.
- However, the truth of the virus’s origin is written in the genome of the virus itself, and the genetic sequence of SARS-CoV-2 has definitively proven, since 2021, that it’s of natural origin. Here’s how.
In late 2019, unbeknownst to all, a new virus first infected humans in Wuhan, China: SARS-CoV-2. This virus, which turned out to be airborne, highly infectious, lethal to a few, and capable of causing long-term damage to many, is the cause of the COVID-19 illness that has now infected at least 700 million (and likely billions) across the globe. By January of 2020, dozens of infections had spread to several countries, and the global response was insufficient to prevent a pandemic. As a result, at least seven million people (and possibly tens of millions) died while tens-to-hundreds of millions more were left with long-term, often disabling, conditions. That novel coronavirus, SARS-CoV-2, has gone on to mutate many times and continues to be infectious to humans and animals alike.
And yet, one popular conspiracy theory continues to thrive in the media and in politics: the idea that unlike all other virus-caused pandemics, which originated naturally, this virus was instead created in the lab.
The heart of the argument is that irresponsible researchers, who began with a non-infectious, non-lethal progenitor virus, used the technique of gain-of-function research to create SARS-CoV-2, which then infected someone within the lab and spread from there, creating the COVID-19 pandemic that has killed and infected so many.
If true, this would imply a conspiracy of the highest order. Many among the world’s leading scientists, nominally engaged in pandemic research and prevention, would actually be the culprits behind the greatest pandemic of the century thus far. But if untrue, then it would mean that innocent, even heroic people are being unjustly attacked and vilified by a completely baseless, meritless accusation.
Fortunately, the science very clearly points to one conclusion: that of a natural origin for SARS-CoV-2 and the COVID-19 infections it causes in humans. Here’s how we know.
What is gain-of-function research?
Let’s begin by answering a very basic question: what is this “gain-of-function” research that is being discussed, why is it so controversial, and is the controversy merited at all?
In evolutionary biology, arguably the most important relationship in any living organism is the relationship between structure and function. A structure is literally a part of the organism: arms, hands, fingers, and internal organs are all examples of structures in humans, but the term also covers microscopic structures such as spike proteins, receptor sites, and cleavage sites, which appear across many coronaviruses. If an organism is going to perform a function, from humans opening a tight jar to viruses entering and infecting the cell of a host organism, it needs a structure to perform that function.
As organisms evolve over many generations, they can grow new structures, modify existing structures, or lose certain structures, all of which can lead to either a gain-of-function or a loss-of-function. In human evolution, for example, we can see that our hominid ancestors had stronger jaws but smaller brains than modern humans; the weakening of our ancestors’ jaws represented a loss-of-function, but that loss-of-function enabled the part of the skull that contains the brain to grow larger, permitting the gain-of-function of smarter hominids through the structure of larger brains. Both of these processes occur naturally over many generations, but laboratory studies are also conducted with these goals in mind.
Certain selection pressures, for example, can be applied to an organism in the lab, and that will demand that the organism adapt — i.e., become able to perform a novel function — in order to survive. How that organism adapts to perform that function, however, is not something with a unique solution; there are many possible adaptations.
In an experiment performed almost 20 years ago, researchers did exactly this with bacteria. They began with three identical cultures and placed them in an environment where their normal food source was scarce, while a nutrient source that the organism was ill-adapted to consume was abundant. The researchers evolved the bacteria slowly by gradually decreasing the normal food source while gradually increasing the alternate food source over tens of thousands of generations.
In all three cultures, bacteria survived. But what was remarkable is that each of the three cultures evolved different ways of performing the function of metabolizing the alternative nutrient source. They all gained function, but through different structures. Importantly, they all displayed vastly different genetic adaptations; their genomes had changed in different ways from one another. Then, the researchers took each of the three populations and did the reverse: gradually replacing the alternative nutrient source with the original nutrient source. Again, over tens of thousands of generations, each population evolved differently from the others, with different underlying genetic codes, but they all survived.
Through this type of research, both gain-of-function and loss-of-function research, we can gain insight into how organisms evolve to perform certain tasks. For viruses, in particular, this type of research holds tremendous promise for teaching us how to fight, treat, and even prevent deadly infections.
What is the allegation made by lab leak proponents?
The allegation is simple: that this virus didn’t occur through natural evolution in the wild, and then spill over into the human population through human-animal contact. Instead, this virus came into existence by:
- beginning with a benign (to humans) virus,
- that was brought into the Wuhan Institute of Virology (WIV) for study,
- that then had unreported gain-of-function research conducted on it,
- which made it more infectious to humans,
- and then an unreported infection event occurred where this virus actually infected a human,
- and then that infected human went outside the lab and infected others, kicking off the COVID-19 pandemic.
That’s quite a tale, but one that you would expect the evidence to either support or refute on scientific grounds.
According to Alina Chan, writing in the New York Times, there are five pieces of evidence that lab leak proponents want everyone to consider.
- The SARS-like virus, SARS-CoV-2, that caused the pandemic, emerged in Wuhan: where the WIV (which researches SARS-like viruses) is located.
- A year prior to SARS-CoV-2’s emergence, the WIV, in collaboration with US-based partners, proposed creating viruses similar to SARS-CoV-2 through gain-of-function research.
- Scientists at the WIV pursued this type of work, under low biosafety conditions: conditions that could not have contained an infectious, airborne virus like SARS-CoV-2.
- The hypothesis of a natural spillover origin for COVID-19, from an animal at the Huanan Seafood Market in Wuhan, is not supported by the evidence.
- The key evidence that would be expected to have emerged from a natural spillover event, the progenitor animal host of SARS-CoV-2, has never been found.
But are these five things true? The first one certainly is: SARS-CoV-2 first emerged in Wuhan, and the WIV is located there too.
The second one is very misleading. Yes, there was an international collaboration to investigate the features of coronaviruses that could lead to the infection of humans, and that collaboration included US-based partners. However, they never created a potentially infectious virus, and importantly, the specific proposal to create a virus with the defining features of SARS-CoV-2 was rejected, and that research was never conducted.
The third point is simply untrue. All of the work that was conducted at WIV was pursued under standard biosafety procedures for the type of work that was done, and even passed an international inspection. The work that was pursued also was incapable of creating a novel, infectious virus such as SARS-CoV-2.
The fourth point is so thoroughly untrue that it demands pushback. The wet market origin hypothesis for SARS-CoV-2 is overwhelmingly supported by the evidence. Some of that evidence includes:
- a 2023 critical analysis of the evidence for the origin of SARS-CoV-2,
- a separate 2023 study on the origins of SARS-CoV-2,
- a 2023 study noting a paucity of evidence for a lab origin and raising concerns about the way lab leak proponents have misinformed the public about the nature of gain-of-function research,
- a 2021 study critically reviewing the (strong) evidence for a zoonotic origin for SARS-CoV-2,
- a 2022 study that identifies the Huanan Seafood Wholesale Market in Wuhan (not the WIV) as the early epicenter of the COVID-19 pandemic,
- and a 2024 (free) article in the Annual Review of Virology that concludes “The available data clearly point to a natural zoonotic emergence within, or closely linked to, the Huanan Seafood Wholesale Market in Wuhan,” while further noting that no evidence links SARS-CoV-2 to laboratory work at WIV.
And the fifth point is very misleading: most pandemics never have the “progenitor animal host” that caused the spillover identified, so the fact that this hasn’t occurred for SARS-CoV-2 is expected, not evidence that it didn’t have a natural origin.
Is there any way to know, based on the limited evidence we have, whether this virus emerged naturally or was created, via gain-of-function research, in a lab?
If we wanted absolute proof, we’d demand to know everything. To prove a natural origin, we’d ideally want to pinpoint the exact animal, how it got infected, what it was infected with, how it was transmitted to humans and when, etc. We’d want to identify “patient zero” for COVID-19, and ideally even find a natural reservoir of the original strain of SARS-CoV-2 circulating in a wild population. This is unlikely to be possible.
However, if you wanted simply to convince yourself — as a reasonable person who was following all of the available evidence — that a natural origin was enormously favored over an origin that involved gain-of-function research, the bar suddenly gets much lower. What you can do is look at the following:
- the genome of the original SARS-CoV-2 strain that first infected humans,
- the genome of the closest virus that existed at WIV, prior to the pandemic, that could have possibly been subjected to gain-of-function research,
- and the genomes of viruses found in the wild that share features with SARS-CoV-2 that are missing from the closest-relative virus that pre-existed at WIV.
Why would you want to look at those? Because of the way that evolution works, and the way that genomes encode structures.
There are only five bases that appear in genomes, with any one genome using four of them: A, C, T, and G for organisms (like humans) with DNA-based genomes, and A, C, U, and G for viruses (like coronaviruses) with RNA-based genomes. It takes three consecutive bases to encode an amino acid, where every “three-in-a-row” set of bases forms what’s known as a codon. There are 64 possible codons for 20-to-22 amino acids, meaning the codons are redundant: several different codons can encode the same amino acid. And finally, by encoding the genetic sequence of many amino acids in a row, beginning with a “start” codon and ending with a “stop” codon, genomes come with the instructions to produce proteins, including the proteins that make up their various function-performing structures.
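The count of 64 codons follows from simple enumeration: four bases, taken three at a time. Here is a minimal sketch in Python that lists them; the variable names are illustrative, and the glycine example relies on the standard genetic code rather than on anything specific to this article.

```python
from itertools import product

RNA_BASES = "ACGU"  # the four bases of an RNA genome

# Every ordered triple of bases is one codon.
codons = ["".join(triple) for triple in product(RNA_BASES, repeat=3)]

print(len(codons))  # 4 ** 3 possible codons

# Redundancy in action: in the standard genetic code, these four
# different codons all encode the same amino acid, glycine.
GLYCINE_CODONS = ["GGU", "GGC", "GGA", "GGG"]
print(all(codon in codons for codon in GLYCINE_CODONS))
```

Because many amino acids can be spelled in more than one way, two genomes can encode the very same protein while differing in their underlying bases.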
If you are conducting gain-of-function research in a lab, the structure that you form will likely be very different from the structures that exist in the wild, even if they perform similar functions. However, even in the event that the structures encoded via wild breeding and gain-of-function research stemmed from protein sequences that were identical, the genetic sequences that encoded those structures would have to be different. These key proteins are more than 100 amino acids long. Even if there were only two ways to encode every amino acid, that would mean the odds of getting identical sequences were 1-in-2^100, or tens of thousands of times worse than your odds of winning the Powerball lottery three times in a row. Just like the initially-identical bacteria colonies subjected to the same selection pressures, evolution never works exactly the same way twice.
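The lottery comparison can be checked with a few lines of arithmetic. This is a rough sketch, taking the odds above to be 1 in 2 to the power of 100. The Powerball jackpot odds used here (1 in 292,201,338) are an assumption brought in from the lottery's published figures, not a number stated in this article.

```python
PROTEIN_LENGTH = 100            # amino acids, the lower bound used above
ENCODINGS_PER_AMINO_ACID = 2    # deliberately conservative, as in the text
POWERBALL_JACKPOT_ODDS = 292_201_338  # assumption: 1-in-this-many per drawing

# Number of distinct genetic spellings of one fixed 100-amino-acid protein.
sequence_odds = ENCODINGS_PER_AMINO_ACID ** PROTEIN_LENGTH

# Odds against winning the jackpot three times in a row.
three_jackpots = POWERBALL_JACKPOT_ODDS ** 3

ratio = sequence_odds / three_jackpots
print(f"identical sequence: 1 in {sequence_odds:.2e}")
print(f"three jackpots:     1 in {three_jackpots:.2e}")
print(f"ratio:              {ratio:.1e}")
```

Under these assumptions the ratio comes out in the tens of thousands, in line with the comparison above.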
What is written in the genome of SARS-CoV-2?
The closest virus known to be at the WIV prior to the start of the pandemic is a bat virus known as RaTG13, whose genome is 97% identical to that of SARS-CoV-2. (Out of ~30,000 bases in its genome, only about 1000 are different from SARS-CoV-2.) A major difference between RaTG13 and SARS-CoV-2 can be seen in the spike protein section of the viruses, where in particular the receptor-binding domains (RBDs) of the two viruses show a significant (~10% or more) difference from one another over a span of the genome that encodes around 300 amino acids.
In September 2021, a study was posted (and then published in Nature in February of 2022) that identified 46 new bat viruses collected from sites in Laos near the Mekong River (between July 2020 and January 2021, well after the start of the pandemic), three of which are now known as BANAL-20-52, BANAL-20-103, and BANAL-20-236.
All three of these contain part of the RBD, and BANAL-20-52 contains all of it, providing a smoking-gun match for SARS-CoV-2 in exactly the way that RaTG13 does not. Because viruses can swap chunks of RNA with one another through the process of recombination, this teaches us that RaTG13 and BANAL-20-52 are likely cousins: of each other and also of SARS-CoV-2.
In SARS-CoV-2, we have a virus with a genome that is very closely related to two groups of coronavirus strains: the RaTG13 strain and the various BANAL-20 strains. The RaTG13 strain contains many elements of SARS-CoV-2, but is missing critical sections, including the receptor binding domain (RBD) site on the spike protein. Conversely, the BANAL-20 strains also contain many elements of SARS-CoV-2, and do contain the RBD site on the spike protein, with BANAL-20-52 matching the entirety of the spike protein sequence better than RaTG13 and with BANAL-20-103 matching the first ~5000 bases in the sequence much better than RaTG13 and even better than BANAL-20-52.
Just as genetics teaches us how closely related we are to our various family members, it also teaches us how closely various virus strains are related to one another. While viruses don’t reproduce sexually like humans do, they do engage in recombination, which allows parts of one virus’s sequence to “swap” with another virus’s sequence. The fact that a pangolin virus was found to have an ACE2 receptor binding domain very similar to SARS-CoV-2 is unsurprising because viruses don’t just pass from animal to human and back, but also from animal to animal. In fact, SARS-related beta-coronaviruses are known to be highly promiscuous in this regard, undergoing recombination easily with other circulating viruses, which explains why SARS-CoV-2 appears to contain features that best match those of viruses found across several different, disconnected evolutionary lineages.
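The idea of different genome segments best matching different relatives can be illustrated with a toy example. The sketch below is not the method used in any of the studies mentioned here, and the sequences are made up purely for illustration (20 bases each, already aligned). It simply scores identity window by window and reports which relative matches best in each window; the function names are hypothetical.

```python
def window_identity(query, reference, window):
    """Fraction of matching bases in each consecutive, non-overlapping window."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    scores = []
    for start in range(0, len(query), window):
        q = query[start:start + window]
        r = reference[start:start + window]
        matches = sum(1 for x, y in zip(q, r) if x == y)
        scores.append(matches / len(q))
    return scores


def closest_relative_per_window(query, relatives, window):
    """Name of the best-matching relative for each window of the query."""
    per_relative = {
        name: window_identity(query, seq, window)
        for name, seq in relatives.items()
    }
    n_windows = len(next(iter(per_relative.values())))
    return [
        max(per_relative, key=lambda name: per_relative[name][i])
        for i in range(n_windows)
    ]


# Made-up toy sequences, NOT real viral genomes.
relative_a = "ACGUACGUAC" + "GGGGGGGGGG"
relative_b = "UUUUUUUUUU" + "CAGUCAGUCA"
# A "mosaic" query: first half shared with relative_a, second half with relative_b.
query = relative_a[:10] + relative_b[10:]

relatives = {"relative_a": relative_a, "relative_b": relative_b}
print(closest_relative_per_window(query, relatives, window=10))
```

In this toy case the best match switches from one relative to the other partway along the sequence, which is the kind of pattern a history of recombination leaves behind. Real analyses work on full alignments of roughly 30,000 bases and use dedicated phylogenetic tools.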
Conclusions
Put simply, if SARS-CoV-2 came about by frequent recombination through contact with common virus-carrying animals in the wild, we would expect it to display a mix of genome segments shared between it and several of its close cousins. However, if SARS-CoV-2 came about through manipulation in a lab, such as through gain-of-function research, we would expect a close match to one and only one “initial” strain, with the remainder of the genome failing to match any other wild strains. The discovery that SARS-CoV-2 does indeed have what biologists call a mosaic genome makes it abundantly clear that it could not have arisen through laboratory manipulation of an initial, single strain, through gain-of-function research or otherwise.
In 2021, the same Alina Chan who penned the NYT op-ed said the following:
“I have days where I think this could be natural. And if it’s natural, then I’ve done a terrible thing because I’ve put a lot of scientists in a very dangerous spot by saying that they could be the source of an accident that resulted in millions of people dying. I would feel terrible if it’s natural and I did all this.”
By following the evidence, we have learned that is precisely the case. It is natural. The observed recombination patterns that exist in the genome of SARS-CoV-2 must have been left behind by recombination events between parental lineages in the wild: where all of these different viral strains were able to meet and interbreed. Importantly, those patterns that are written in the genome of SARS-CoV-2 cannot be produced, simulated, or faked by any means in a laboratory environment.
Given that information, and the fact that this information is now nearly three full years old, it’s long past time to move past the ever-changing conspiracy theory of the lab leak hypothesis, and embrace reality. The genome of SARS-CoV-2 demonstrates it has a natural origin, whether we ever find the original virus in a wild population of animals or not. The misinformation being spread, and the scientists being vilified, over gain-of-function research has no basis in reality. A lot of scientists are, and have been for a few years now, in a very dangerous spot due to proponents of the lab leak hypothesis, as they are being accused of creating an accident that started the COVID-19 pandemic when in fact they were the proverbial firefighters working to extinguish it. It’s time to replace our conspiratorial fears with scientific truths, and to invest resources where they belong: in scientists who work to understand the Universe as it is, and to help humanity cope with the cold, hard reality that we all face.
The author acknowledges Dr. Philipp Markolin for his illuminating writing on this topic.