- The replication crisis refers to a longstanding pattern of researchers being unable to replicate the findings of previous studies.
- A recent study suggests that the replication crisis is overblown.
- It is often claimed that the replication crisis has a negative impact on the public's already-dwindling perception of science. However, lack of reproducibility is an inherent feature of scientific fields exploring bold ideas.
Over the last decade and a half, several reports have drawn attention to a growing crisis in science: many scientific studies, especially in the life sciences, cannot be replicated. For instance, a major reproducibility project that set out to replicate 193 experiments from 53 high-profile cancer biology papers ran into multiple barriers. In the end, it was able to repeat only 50 experiments from 23 papers.
Understandably, the replication crisis calls into question the foundations of knowledge generation. If we cannot replicate the findings of experiments, how much confidence should scientists or the public have in most scientific research?
It is possible, however, that the replication crisis is either nonexistent or greatly overstated. Here is a look at two lines of argument.
Base rate fallacy
When waves of COVID cases followed vaccination campaigns, many people on the internet claimed that COVID vaccines were ineffective. One of their main arguments was that more cases were being recorded among vaccinated individuals than among unvaccinated ones.
This is a classic example of the base rate fallacy: the tendency to ignore the general prevalence (the base rate) of a phenomenon and focus instead on data about a specific group or situation. If most people are vaccinated, even a small fraction of them can equal or outnumber a much larger fraction of the unvaccinated minority.
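To make the arithmetic concrete, here is a small sketch with entirely hypothetical numbers (a population of 10,000, 90% vaccinated, and vaccinated people carrying one-fifth the infection risk of unvaccinated people). The function name `expected_cases` is illustrative only.

```python
def expected_cases(population, share_vaccinated, risk_vaccinated, risk_unvaccinated):
    """Expected case counts in each group. All inputs are hypothetical."""
    vaccinated = population * share_vaccinated
    unvaccinated = population - vaccinated
    return vaccinated * risk_vaccinated, unvaccinated * risk_unvaccinated


# Hypothetical: 90% vaccinated; vaccinated risk (2%) is one-fifth of unvaccinated risk (10%).
vax_cases, unvax_cases = expected_cases(10_000, 0.9, 0.02, 0.10)
print(f"cases among vaccinated:   {vax_cases:.0f}")    # 180
print(f"cases among unvaccinated: {unvax_cases:.0f}")  # 100
```

Even though each vaccinated person is five times less likely to be infected in this made-up scenario, the vaccinated group still produces more cases simply because it is nine times larger.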
Why does this matter for replication in science? The British philosopher Alexander Bird argues that the base rate fallacy explains low replication rates.
When studying any phenomenon, it would not be surprising if most hypotheses turned out to be wrong. This could happen for a number of reasons, including a tendency to test bold ideas or the fact that a field of study is largely unexplored and difficult (like cancer biology). A high prior probability that a hypothesis is wrong is therefore consistent with high-quality science. Because of the inherent variability of experiments, however, some of these wrong hypotheses will appear to be confirmed by chance.
If false hypotheses make up a large share of all hypotheses tested, the number of these false positives may be comparable to the relatively few true hypotheses that are correctly confirmed. Publication bias makes things worse: scientists are more likely to report studies in which they find something, whether or not the finding is real, than studies in which their hypotheses fail. Later experiments that attempt to replicate these false positives will fail to do so.
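The base-rate reasoning above can be sketched in a few lines. This is not Bird's own model; it is a minimal illustration assuming hypothetical values: 10% of tested hypotheses are true, studies have 80% power (chance of detecting a real effect), a 5% false-positive rate, only positive results are published, and replications have the same power and false-positive rate as the originals.

```python
def base_rate_outcomes(n_hypotheses, prior_true, power, alpha):
    """Expected true and false positives among tested hypotheses.

    prior_true: share of tested hypotheses that are actually true (hypothetical)
    power:      chance a study detects an effect that is really there
    alpha:      chance a study "detects" an effect that is not there
    """
    n_true = n_hypotheses * prior_true
    n_false = n_hypotheses - n_true
    return n_true * power, n_false * alpha


def expected_replication_rate(prior_true, power, alpha):
    """Share of published (positive) findings expected to replicate,
    assuming only positives are published and replications use the same power/alpha."""
    true_pos, false_pos = base_rate_outcomes(1.0, prior_true, power, alpha)
    # True findings replicate with probability `power`; false ones only by chance (`alpha`).
    return (true_pos * power + false_pos * alpha) / (true_pos + false_pos)


tp, fp = base_rate_outcomes(1000, prior_true=0.1, power=0.8, alpha=0.05)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}")  # 80, 45
print(f"expected replication rate: {expected_replication_rate(0.1, 0.8, 0.05):.2f}")  # 0.53
```

With these made-up inputs, 45 of the 125 positive results are false, and only about half of published findings would be expected to replicate, even though no one behaved badly.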
Among scientific fields, psychology and clinical medicine are often reported to have the lowest rates of reproducibility. Unlike in physics, our knowledge of biology is still too incomplete for us to understand biological systems from first principles. Consequently, hypotheses in these fields are more likely to be false, which would explain the low reproducibility of their experiments.
Some scientists suggest that the replication crisis is a symptom of systemic challenges. According to them, one of the main reasons studies are irreproducible is the publish-or-perish pressure that researchers often face. Are alarmingly large numbers of scientists resorting to unethical practices or shortcuts to achieve publishable results that are hard to reproduce?
In a preprint posted on SocArXiv, researchers argue that low reproducibility can arise in a field even if no scientists engage in data forgery or other questionable practices.
The authors, however, do not accept the base rate argument. Turning it on its head, they point out that the average newly hired assistant professor of psychology has 16 publications. These publications test multiple hypotheses, and few of them report negative results (because journals and funding agencies reward positive results). If most hypotheses were unlikely to be true, it would be nearly impossible for most young academics to build up such a body of positive results.
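A rough way to see the authors' point is to ask how many tests a researcher would need to run to collect 16 positive results. The sketch below is a hypothetical illustration, not the authors' calculation: it assumes 80% power, a 5% false-positive rate, and, for simplicity, that each publication needs just one positive result (a lower bound, since publications usually test several hypotheses).

```python
def expected_tests_needed(positives_wanted, prior_true, power, alpha):
    """Expected number of hypothesis tests needed to collect a given
    number of positive (statistically significant) results."""
    # Chance that a single test comes out positive, whether the hypothesis is true or not.
    p_positive = prior_true * power + (1 - prior_true) * alpha
    return positives_wanted / p_positive


# Hypothetical priors: the share of tested hypotheses that are actually true.
for prior in (0.5, 0.1, 0.01):
    n = expected_tests_needed(16, prior, power=0.8, alpha=0.05)
    print(f"prior {prior}: about {n:.0f} tests for 16 positive results")
```

Under these assumptions, a prior of 0.1 already requires about 128 tests, and a prior of 0.01 requires several hundred, which is hard to square with the output of a typical early-career researcher.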
To prevent researchers from hypothesizing after the results are known (HARKing), many clinical and psychological studies are preregistered: their hypotheses and methods are documented before the studies are carried out. Comparing preregistered studies with published studies suggests that many hypotheses are indeed false. However, the imbalance between false and true hypotheses (among those tested) is not large enough for the base rate fallacy to fully explain the replication crisis.
Using low replication rates to claim that a field produces many incorrect findings assumes that effect sizes are fixed. In an experiment that measures how two variables are related, the effect size describes how strong that relationship is. Depending on the context of the experiment, however, the effect size can vary greatly.
The authors built a statistical model of publication and replication that incorporates variation in effect sizes. Their simulations produced replication rates as low as 50% without any unethical behavior built in.
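One common effect size measure is Cohen's d, the difference between group means divided by their pooled standard deviation. The article does not say which measure the authors use; the sketch below simply assumes Cohen's d and applies it to hypothetical measurements of the same intervention in two different settings.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Hypothetical data: identical treatment scores, different control baselines per setting.
context_a = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
context_b = cohens_d([5, 6, 7, 8, 9], [4.5, 5.5, 6.5, 7.5, 8.5])
print(f"context A: d = {context_a:.2f}")
print(f"context B: d = {context_b:.2f}")
```

The same intervention yields a large effect in one setting and a small one in the other, so a study run in setting B could easily fail to "replicate" a result first found in setting A without either study being wrong.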
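The toy simulation below is not the authors' model; it is a hypothetical sketch of the general idea. It assumes every tested effect is real on average (mean standardized effect 0.3), that the true effect in each study's context is drawn from a normal distribution with spread `effect_sd`, that only significant positive results are published, and that each replication runs in a fresh context. No misconduct is simulated.

```python
import random
from statistics import NormalDist


def simulate_replication_rate(effect_sd, mean_effect=0.3, n_per_group=200,
                              alpha=0.05, n_studies=20_000, seed=1):
    """Toy model: significant originals get published, then each is replicated once
    in a new context. Returns (number published, share that replicate)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # approx. standard error of a standardized mean difference
    published = replicated = 0
    for _ in range(n_studies):
        # The true effect in this study's context varies around a positive mean.
        original_effect = rng.gauss(mean_effect, effect_sd)
        original_estimate = rng.gauss(original_effect, se)
        if original_estimate / se <= z_crit:
            continue  # non-significant results stay in the file drawer
        published += 1
        # The replication happens in a new context with its own true effect.
        replication_effect = rng.gauss(mean_effect, effect_sd)
        replication_estimate = rng.gauss(replication_effect, se)
        if replication_estimate / se > z_crit:
            replicated += 1
    return published, replicated / published


for sd in (0.0, 0.3):
    published, rate = simulate_replication_rate(effect_sd=sd)
    print(f"effect_sd={sd}: {published} published, replication rate {rate:.2f}")
```

With these well-powered hypothetical settings, letting the effect vary across contexts pulls the replication rate well below the fixed-effect case even though every effect is real on average. In low-power settings the comparison can come out differently, so this is an illustration of the mechanism, not a reproduction of the authors' results.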
Rethinking the replication crisis
It is often claimed that the replication crisis has a negative impact on the public’s already-dwindling perception of science. However, lack of reproducibility is an inherent feature of scientific fields exploring bold ideas.
If a hypothesis is highly unlikely to be true, it may remain unlikely to be true even after a positive result. Results overturned by later experiments highlight the self-correcting nature of science.
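Bayes' theorem makes this concrete. As a hypothetical worked example, take a bold hypothesis with a 1% prior probability of being true, tested in a study with 80% power and a 5% false-positive rate:

```latex
P(\text{true} \mid \text{positive})
  = \frac{P(\text{positive} \mid \text{true})\,P(\text{true})}
         {P(\text{positive} \mid \text{true})\,P(\text{true})
          + P(\text{positive} \mid \text{false})\,P(\text{false})}
  = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.05 \times 0.99}
  = \frac{0.008}{0.0575}
  \approx 0.14
```

Under these assumed numbers, the positive result raises the probability from 1% to about 14%, a large update, yet the hypothesis is still more likely false than true.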
Those concerned with the replication crisis offer a range of potential solutions. However, if the crisis is merely a statistical consequence of the scale of modern science, these solutions may have unintended consequences. For example, lowering the significance threshold, which some scientists think can help, would hurt productivity without improving the replication rate.
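One way a stricter threshold costs productivity is that each study needs more participants to keep the same power. The sketch below uses the standard normal-approximation sample-size formula for a two-sided, two-sample comparison; the effect size of 0.5 and the 80% power target are hypothetical choices, and the sketch says nothing about the replication-rate side of the authors' claim.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha, power=0.8):
    """Approximate participants per group for a two-sided, two-sample test
    (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: {n_per_group(0.5, alpha)} participants per group")
```

Under these assumptions, moving from a 0.05 to a 0.005 threshold raises the required sample from roughly 63 to roughly 107 participants per group, so the same budget supports fewer studies.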
The authors also suggest that “some reforms may impose disproportionate costs to early career researchers, particularly those whose identities are underrepresented in science.”