Quantifying new threats to health and biomedical literature integrity from rapidly scaled publications and problematic research

Matt Spick; Anthony Onoja; Charlie Harrison; Stefan Stender; Jennifer Byrne; Nophar Geifman

doi:10.1016/j.jclinepi.2026.112203

Journal article

Open access

Peer reviewed

Journal of clinical epidemiology, Vol.193, p.112203

23/02/2026

DOI: https://doi.org/10.1016/j.jclinepi.2026.112203

PMID: 41740900

Keywords: Metascience; Paper mills; Artificial intelligence; FAIR guiding principles; Open science; Integrity

Abstract

The last three years have seen an explosion in published manuscripts analysing open-access health datasets, in many cases presenting misleading or biologically implausible findings. There is a growing evidence base to suggest that this is due in part to AI-assisted and formulaic workflows, and publishers are responding by discouraging submissions employing open-access health datasets. Here we employ a scientometric analysis to investigate which datasets have seen publication rates deviate from previous trends, especially where this coincides with changes to author geographical origins and increases in formulaic titles. Across 36 datasets we identify nine showing hallmarks of paper mill exploitation (FAERS, NHANES, UK Biobank, FinnGen, the Global Burden of Disease Study, MIMIC, CHARLS, CDC WONDER, and TriNetX). These nine datasets had, in 2025, a combined publication count of 23,005 indexed in the OpenAlex database. This represents an excess of 11,577 publications above the AutoRegressive Integrated Moving Average (ARIMA) forecast trend, and is a 3.0-fold change relative to the 7,655 publication count for these nine datasets in 2022. We also identified a notable difference in the fold change for China (4.2x) versus the rest of the world (1.9x) and an increase in formulaic titles. These findings highlight potential risks to research integrity in areas such as public health and drug safety, and especially to the accessibility and interoperability principles central to Open Science and FAIR data practices. We argue that permissive open-access data policies naturally facilitate exploitative workflows, and that these findings add to the case for safeguarding mechanisms to preserve the goals of Open Science.
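The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it uses just the three counts quoted above, and the function names and the derived "implied forecast" and "excess share" quantities are assumptions introduced here (the abstract reports the excess, not the forecast value itself).

```python
# Counts quoted in the abstract for the nine flagged datasets (OpenAlex).
PUBS_2022 = 7_655              # combined publication count in 2022
PUBS_2025 = 23_005             # combined publication count in 2025
EXCESS_OVER_FORECAST = 11_577  # 2025 publications above the ARIMA forecast


def fold_change(later: int, earlier: int) -> float:
    """Ratio of a later count to an earlier count (hypothetical helper)."""
    return later / earlier


def implied_forecast(observed: int, excess: int) -> int:
    """Forecast implied by observed minus excess (an assumption, not a
    figure stated in the abstract)."""
    return observed - excess


fc = fold_change(PUBS_2025, PUBS_2022)
forecast = implied_forecast(PUBS_2025, EXCESS_OVER_FORECAST)
excess_share = EXCESS_OVER_FORECAST / PUBS_2025

print(f"Fold change 2022 -> 2025: {fc:.1f}x")
print(f"Implied ARIMA forecast for 2025: {forecast:,}")
print(f"Share of 2025 output above forecast: {excess_share:.1%}")
```

Under these assumptions the fold change agrees with the 3.0 reported in the abstract, and roughly half of the 2025 output for these datasets sits above the forecast trend.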

Files and links (1)

url

https://doi.org/10.1016/j.jclinepi.2026.112203

Published (Version of record) Open

Details

Title: Quantifying new threats to health and biomedical literature integrity from rapidly scaled publications and problematic research
Creators: Matt Spick - University of Surrey
Anthony Onoja - University of Surrey
Charlie Harrison - Aberystwyth University
Stefan Stender - Copenhagen University Hospital
Jennifer Byrne - New South Wales Department of Health
Nophar Geifman - University of Surrey
Publication Details: Journal of clinical epidemiology, Vol.193, p.112203
Publisher: Elsevier Inc; NEW YORK
Number of pages: 12
Publication Date: 23/02/2026
Grant note: UK Research and Innovation: UKRI1095 Biotechnology and Biological Sciences Research Council: BB/Y006933/1
Funding: Matt Spick was supported by UK Research and Innovation (UKRI1095). Charlie Harrison was supported by the Biotechnology and Biological Sciences Research Council (BB/Y006933/1) and by UK Research and Innovation (UKRI1095). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Identifiers: 991110193502346; WOS:001720081800001
Academic Unit: School of Health Sciences
Language: English
Resource Type: Journal article
