Reproducibility, the ability to obtain similar experimental results across replicate studies, is essential if preclinical research is to translate into successful clinical trials. While the topic has garnered much attention in the popular press, it is sometimes poorly understood and at other times misrepresented to mean that all of science is in crisis and failing. Such sensationalism also tends to overlook that animal research remains foundational to the development of medicines and that these challenges do not mean the results of animal research are failing to advance scientific knowledge.
A newly published paper in the open-access journal PLOS Biology by researchers at the Universities of Bern and Edinburgh suggests that reliance on single-laboratory studies may partly explain the poorer reproducibility of experimental results in preclinical research. The researchers built computer simulations from the published results of 440 preclinical studies covering 13 different interventions for stroke, heart attack, and breast cancer. They then used these simulations to compare the reproducibility of results from single-laboratory studies, which are currently the norm in preclinical research, with that of studies combining data from multiple different laboratories.
First, the researchers selected 50 independent studies of the effect of hypothermia on stroke severity in rodent models. A meta-analysis of these studies showed that hypothermia reduced stroke severity, as assessed by the amount of dead brain tissue (infarct volume), by approximately 50% (the effect size). This meta-analytic effect size served as the benchmark against which results from the simulated studies were compared. To simulate multi-laboratory studies, the researchers pooled data from several published studies to mimic results produced by different laboratories. Fewer than 50% of the simulated single-laboratory studies accurately predicted the meta-analytic effect size of a 50% decrease in infarct volume, whereas predictive accuracy rose to 73% for two-laboratory studies, 83% for three-laboratory studies, and 87% for four-laboratory studies.
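The intuition behind this result can be sketched with a toy simulation. This is not the authors' actual model; all numbers here (between-lab variability `TAU`, within-lab noise `SIGMA`, animals per lab, the ±10% accuracy margin) are illustrative assumptions. Each simulated lab's estimate is the true effect plus a lab-specific bias plus sampling noise; pooling more labs averages out the lab biases, so the pooled estimate lands near the true (meta-analytic) effect more often.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.50   # meta-analytic 50% reduction in infarct volume
TAU = 0.15           # assumed between-lab standard deviation (lab bias)
SIGMA = 0.20         # assumed within-lab standard deviation
N_PER_LAB = 12       # assumed number of animals per laboratory
MARGIN = 0.10        # call a result "accurate" if within +/-10% of truth

def simulate_study(n_labs, n_sims=5000):
    """Fraction of simulated studies whose pooled effect estimate
    lands within MARGIN of the true (meta-analytic) effect."""
    hits = 0
    for _ in range(n_sims):
        estimates = []
        for _ in range(n_labs):
            lab_shift = random.gauss(0, TAU)   # lab-specific bias
            se = SIGMA / N_PER_LAB ** 0.5      # within-lab sampling error
            estimates.append(TRUE_EFFECT + lab_shift + random.gauss(0, se))
        pooled = sum(estimates) / n_labs
        hits += abs(pooled - TRUE_EFFECT) <= MARGIN
    return hits / n_sims

for k in (1, 2, 3, 4):
    print(f"{k} lab(s): accuracy {simulate_study(k):.2f}")
```

Under these assumed parameters the accuracy climbs steadily from one to four labs, qualitatively echoing the pattern the paper reports, though the exact percentages depend entirely on the variance values chosen here.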
The researchers then repeated the analysis for the 12 remaining interventions in animal models of stroke, heart attack, and breast cancer. In every case, single-laboratory studies predicted the meta-analytic results less accurately than multi-laboratory studies did. Moreover, accuracy did not improve with larger sample sizes in the single-laboratory simulations; in fact, increasing the sample size reduced the accuracy of single-laboratory studies relative to the average meta-analytic effect size. While a larger sample size increases the precision of a result within a single laboratory (i.e., higher statistical power), it cannot correct for that laboratory's systematic idiosyncrasies; this study demonstrates that multi-laboratory studies better approximate the average effect size obtained from meta-analyses. This indicates that more heterogeneous study samples (i.e., multi-laboratory studies) may be key to improving the reproducibility of experimental results.
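The counterintuitive finding that more animals can hurt accuracy also has a simple statistical sketch. Again, this is an illustrative toy model, not the paper's method, and the parameter values (`TAU`, `SIGMA`) are assumptions: a larger sample shrinks a single lab's confidence interval around its own biased estimate, so that interval covers the true (meta-analytic) effect less often, even as the lab's result becomes more precise.

```python
import random

random.seed(2)

TRUE_EFFECT = 0.50   # meta-analytic effect size
TAU = 0.15           # assumed between-lab standard deviation (lab bias)
SIGMA = 0.20         # assumed within-animal standard deviation

def single_lab_coverage(n, n_sims=5000):
    """Fraction of single-lab studies whose 95% confidence interval
    covers the true (meta-analytic) effect, at per-lab sample size n."""
    se = SIGMA / n ** 0.5            # standard error shrinks with n
    hits = 0
    for _ in range(n_sims):
        lab_shift = random.gauss(0, TAU)              # fixed lab bias
        est = TRUE_EFFECT + lab_shift + random.gauss(0, se)
        hits += abs(est - TRUE_EFFECT) <= 1.96 * se   # CI covers truth?
    return hits / n_sims

for n in (8, 32, 128):
    print(f"n = {n:3d}: coverage {single_lab_coverage(n):.2f}")
```

Because the lab-specific bias does not shrink with sample size, adding animals narrows the interval without moving it toward the truth, which is why heterogeneity across labs, not brute-force sample size, is what improves accuracy in this toy model.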
These results are promising for preclinical science, not only because they point to a potential explanation, or at least a contributing factor, for low reproducibility, but also because they support the conservation of resources. If larger sample sizes are not needed to improve reproducibility, then the number of animals in a given study could be reduced. Rather than adding animals to a single study, the results demonstrated in this paper indicate that involving just two different laboratories yields the greatest single gain in reproducibility. In light of these results, the authors suggest that multi-laboratory studies should replace single-laboratory studies as the gold standard for late-phase preclinical trials. Of course, consistent with the tenets of the scientific method, the first step is to replicate the results of this study and to verify that its conclusions hold for other preclinical interventions. In that way, the strongest evidence will drive scientific practices and policies. And that is the nature of science.
Voelkl, B., Vogt, L., Sena, E., & Würbel, H. (2018). Reproducibility of pre-clinical animal research improves with heterogeneity of study samples. PLoS Biology, 16(2): e2003693.