Global study reveals low reliability of research in psychology

An international AI-powered study of more than 14,100 articles in top journals has confirmed the poor reproducibility of research in psychology, especially experimental research – a blow to the discipline. Research methods and authors’ publication records and citation impact can help predict whether research is reliable, but university prestige and a paper’s citation count do not.

Replicability is a crucial aspect of research, pointed out study co-author Brian Uzzi, a professor of leadership at Northwestern University in the United States. “Replicability means that a scientific finding is a fact. The results can be counted on to hold and work as expected again and again. Findings that don’t replicate are flukes instead of facts.”

It is especially important to the field of psychology, he told University World News. “Psychology has been a leader in developing and testing methods and theory on replicability. It was a logical place to start our analysis and compare our findings to previous work.”

It is not known why experimental studies are substantially less replicable than non-experimental studies across all subfields of psychology, Uzzi said. “But, it is a significant issue since the only way to prove one thing causes something else – like a certain diet leads to weight loss or a pill cures a disease – is by experiments.”


The authors of the study, published in Proceedings of the National Academy of Sciences (PNAS) on 30 January 2023, are Dr Youyou Wu of University College London in the United Kingdom, Assistant Professor Yang Yang of the University of Notre Dame in the United States, and Brian Uzzi of Northwestern.

They found non-experimental papers to be some 1.3 times more likely to be replicable than experimental ones. Replicability is the likelihood that, if a study is conducted a second time using the same methods, the results will be the same.

“This finding is worrisome, given that psychology’s strong scientific reputation is built, in part, on its proficiency with experiments,” write Wu, Yang and Uzzi.

Another worrying finding is that social media attention is linked to research replication failure. “We can only speculate why,” Uzzi said. “Social media likes controversial and flashy findings, which by their unusual nature may be less likely to replicate. This suggests that social media may be good at raising awareness but weaker at detecting replicability.”

The study investigated the ability of a validated text-based machine learning model to predict the probability of successful replication for 14,126 psychology research articles, published since 2000 in six top journals. It encompassed 26,349 authors from 6,173 institutions with 1,222,292 total citations and 27,447 total media mentions.

The model works by pattern recognition, Uzzi explained. It finds differences in the statistical patterns in papers known to replicate or not replicate. “Once these statistical patterns are reliably identified in known papers, the machine can then tell you whether the statistical patterns in a paper it has never seen before are more likely to replicate or not.”
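The pattern-recognition idea Uzzi describes can be illustrated with a toy text classifier. The sketch below is not the authors’ model: it is a minimal naive Bayes word-count classifier, and the training texts, labels and function names are all invented for illustration. The toy data implies nothing about which words actually predict replication.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a text into lowercase words."""
    return text.lower().split()


def train(labelled_texts):
    """Count words separately for texts labelled as replicated or not.

    labelled_texts: list of (text, replicated) pairs, replicated is a bool.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, replicated in labelled_texts:
        doc_counts[replicated] += 1
        word_counts[replicated].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def replication_probability(model, text):
    """Estimate P(replicates | text) with Laplace-smoothed naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        log_scores[label] = score
    # Convert the two log scores into a probability for the "replicates" class.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Invented toy examples standing in for papers with known replication outcomes.
TOY_PAPERS = [
    ("large preregistered sample survey longitudinal", True),
    ("large sample correlational survey", True),
    ("small sample surprising priming effect", False),
    ("surprising effect small lab experiment", False),
]

model = train(TOY_PAPERS)
p_unseen_a = replication_probability(model, "large survey sample")
p_unseen_b = replication_probability(model, "surprising priming experiment")
```

Here the classifier learns word patterns from the labelled examples, then scores two texts it has never seen. The study’s model was trained on papers with known manual replication outcomes.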


The PNAS article outlines four primary findings. First, replication success rates differ widely by subfield in psychology, so replication failure from one project is unlikely to characterise all branches of the diverse discipline.

Second, replication rates are strongly linked to research methods, with experiments replicating at a significantly lower rate than non-experimental studies.

“Third, we find that authors’ cumulative publication number and citation impact are positively related to the likelihood of replication, while other proxies of research quality and rigour, such as an author’s university prestige and a paper’s citations, are unrelated to replicability,” write Wu, Yang and Uzzi. The fourth finding is about social media attention.

Uzzi told University World News: “The positive relationship between replication and publication productivity suggests that experience in doing research strengthens a researcher’s research. Conversely, citations can be high or low for researchers if they have had a blockbuster study.”

It was hard to say whether the finding on university prestige suggests that research at top universities is no more rigorous than research at other institutions, he said. “Replicability is an important aspect of rigour, but not the only component.”

Upping the scale of evidence

The authors say in the PNAS paper that using a machine learning model produced evidence that both supports and refutes speculations from a smaller sample of manual replications. The results were verified with manual replication data when possible.

Brian Uzzi said: “Prior work on replication has been based on small samples with unknown generalisability. Our study provides the first census-level analysis of expected replication rates and how they vary.”

Manual replications are very expensive and slow to conduct, he pointed out, “resulting in there being far too few manual replications to generalise from. Hence, AI to the rescue!”

University College London said in a release on 10 February that the study could help to address widespread concern about weak replicability in the social sciences, particularly psychology, and strengthen the field as a whole.

It quotes co-author Youyou Wu as saying: “Our results could help develop new strategies for testing a scientific literature’s overall replicability, self-assessing research prior to journal submission – as well as training peer reviewers.”

The authors write in the PNAS article: “The findings highlight the need for both academics and the public to be cautious when evaluating research and scholars using pre- and post-publication metrics as proxies for research quality.” Deciding a paper’s merits based on its media coverage is also unwise.

The model they developed could help to estimate replicability for studies that are difficult or impossible to replicate manually; and its predicted replication scores could “help prioritise manual replications of certain studies over others in the face of limited resources”.
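How predicted scores might feed into such prioritisation can be sketched in a few lines. The article does not say which studies should be replicated first, so the rule below is an assumption: spend a limited budget on the papers whose predicted score is closest to 0.5, where the prediction is least certain. The paper IDs, scores and function name are hypothetical.

```python
def pick_for_manual_replication(scores, budget):
    """Return up to `budget` paper IDs, most uncertain prediction first.

    scores: dict mapping a paper ID to a predicted replication score in [0, 1].
    Assumed rule (not from the study): uncertainty is distance from 0.5.
    """
    ranked = sorted(scores, key=lambda paper_id: abs(scores[paper_id] - 0.5))
    return ranked[:budget]


# Hypothetical predicted replication scores for four papers.
predicted = {"paper_a": 0.91, "paper_b": 0.48, "paper_c": 0.12, "paper_d": 0.60}
chosen = pick_for_manual_replication(predicted, budget=2)
```

Other rules are equally plausible, such as ranking by a paper’s influence or by the lowest predicted score.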

There are limitations to the findings: the papers studied came only from top-tier journals; the estimates of replicability are approximate; and the training sample used lacked direct manual replication for some psychology subfields.

But they conclude: “Machine learning methods paired with human acumen present an effective approach for developing a better understanding of replicability. The combination balances the costs of testing with the rewards of exploration in scientific discovery.”