arXiv Breaks Silence on Bogus Citations in Papers


💡 Key Takeaways
  • arXiv has banned authors with AI-generated citations in their submissions.
  • AI-hallucinated references are fabricated by generative AI tools like ChatGPT or Gemini.
  • These fake citations lack grounding in reality and can trigger automated flags on the platform.
  • Authors are not exempt from penalties even if they unknowingly included AI-generated references.
  • arXiv’s strict policies aim to preserve the integrity of scientific literature in the age of large language models.

On a quiet morning in Ithaca, New York, the servers of arXiv hummed as they processed another wave of preprints—thousands of scientific manuscripts uploaded by researchers around the globe, eager to share their latest findings in physics, mathematics, and computer science. Among them, a paper on quantum entanglement caught the attention of moderators not for its content, but for a citation that didn’t exist: a 2023 study in the Journal of Theoretical Physics, purportedly authored by researchers at ETH Zürich, with a DOI that resolved to an error page. The reference was a fabrication, conjured not by malice but by artificial intelligence. This single ghost citation set off a chain reaction that would lead to the author’s permanent ban from the platform—a watershed moment in the battle to preserve the integrity of scientific literature in the age of large language models.

AI-Generated References Now Trigger Bans

Starting in early 2026, arXiv, the influential open-access repository operated by Cornell University, began imposing strict penalties on authors found to have included AI-hallucinated references in their submissions. These are citations invented by generative AI tools such as ChatGPT or Gemini, which fabricate plausible-sounding papers, authors, and DOIs without grounding in reality. According to internal guidelines released in April, any manuscript suspected of containing such references, whether the author included them knowingly or not, is flagged automatically and then sent for human review. If the review confirms the fabrication, the author faces a permanent ban and public notice. The move follows a sharp spike in submissions with fake citations: arXiv moderators reported over 120 such cases in the first quarter of 2026 alone, up from fewer than a dozen the year before. While the policy applies across all disciplines hosted on the platform, computer science and quantum physics, the fields most immersed in AI research, have seen the highest incidence.
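
The flag-then-review flow described above can be illustrated with a minimal Python sketch. It is not arXiv's actual filter, whose workings are not described here. The function names (`extract_doi`, `flag_for_review`), the loose DOI pattern, and the `doi_resolves` callback are all assumptions introduced for this example.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Loose, illustrative pattern for DOI-like strings: "10.", a 4-9 digit
# registrant code, "/", then a suffix. An assumption, not an official grammar.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_doi(reference: str) -> Optional[str]:
    """Return the first DOI-like string in a reference entry, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Trim punctuation that often trails a DOI at the end of an entry.
    return match.group(0).rstrip(".,;)")


def flag_for_review(
    references: Iterable[str],
    doi_resolves: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Collect (reference, doi) pairs whose DOI does not resolve.

    `doi_resolves` is a hypothetical callback supplied by the caller; in
    practice it might query a DOI resolver. Flagged entries are only
    candidates: a human reviewer still has to confirm each one.
    """
    flagged = []
    for reference in references:
        doi = extract_doi(reference)
        if doi is None:
            continue  # no DOI to check automatically
        if not doi_resolves(doi):
            flagged.append((reference, doi))
    return flagged


if __name__ == "__main__":
    # Made-up reference entries and a stand-in resolver, for illustration only.
    refs = [
        "A. Author. A real paper. doi:10.1000/real.1.",
        "B. Author. A ghost paper. https://doi.org/10.9999/ghost.2023",
        "C. Author. An entry with no DOI.",
    ]
    known_dois = {"10.1000/real.1"}
    for reference, doi in flag_for_review(refs, lambda d: d in known_dois):
        print("needs human review:", doi)
```

Passing the resolver in as a callback keeps the sketch runnable without network access. An unresolved DOI is only a signal for a reviewer to follow up on, since typos and recently registered DOIs can also fail to resolve.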

How We Got Here: The Rise of AI in Academic Writing

The crisis of AI-hallucinated citations is a direct consequence of the rapid adoption of generative AI tools in scholarly writing. Since 2023, platforms like SciSpace, Elicit, and even ChatGPT have been marketed to researchers as aids for literature reviews, drafting, and citation management. But these tools, trained on vast but unverified datasets, often invent references to support claims, especially when asked for sources on obscure or niche topics. At first, the phenomenon was treated as a curiosity, a glitch in the system. But by 2025, multiple high-profile retractions followed the discovery of fabricated sources in peer-reviewed journals, including a paper in Nature Computational Science that cited a non-existent study on neural network convergence. arXiv, long seen as a gatekeeper for pre-peer-review scholarship, realized it could no longer ignore the threat. Its decision to ban authors, even those who claim ignorance, reflects a growing consensus: the convenience of AI must not compromise the foundational trust in scientific communication.

The Gatekeepers and the Offenders

Enforcement of arXiv’s new policy rests with automated filters and a small team of volunteer moderators, many of whom are senior academics donating their time to uphold the platform’s standards. Dr. Elena Pérez, a computational physicist at MIT and long-time arXiv moderator, described the shift as “unavoidable.” “We’re not trying to punish people,” she said in an interview, “but we can’t allow the literature to become a hall of mirrors.” On the other side are researchers, often early in their careers and under immense pressure to publish, who turn to AI for efficiency. Some admit to copying citations generated by these tools without verifying them. Others argue the responsibility should lie with the AI companies, not the users. Yet a growing number of institutions, including the Max Planck Society and the University of Tokyo, have begun requiring AI disclosure statements in submissions, signaling a broader cultural shift toward accountability.

Consequences for Science and Scholarship

The immediate effect of arXiv’s ban is a chilling one: fear of accidental violations may deter legitimate submissions, especially from non-native English speakers or those with limited access to verification tools. Yet the long-term implications are more profound. By drawing a hard line, arXiv is setting a precedent for other repositories and journals. PLOS and IEEE have already announced similar review protocols, while Crossref is developing a real-time citation validation system to flag hallucinated DOIs. For the scientific community, this moment underscores a critical vulnerability: the ease with which misinformation can infiltrate even the most rigorous domains. If citations, long the bedrock of scholarly trust, can be faked at scale, then every claim in every paper becomes suspect, eroding the epistemic foundation of science itself.

The Bigger Picture

This is not just a story about fake references. It’s about the fragility of knowledge in an era where machines can mimic expertise. arXiv’s actions are a necessary but incomplete defense against a systemic risk. The deeper issue is cultural: a publish-or-perish environment that incentivizes speed over rigor, and the normalization of AI tools without adequate training or oversight. As AI becomes embedded in every stage of research, from data analysis to manuscript writing, the scientific community must develop new norms, tools, and ethical frameworks—ones that preserve trust without stifling innovation.

What comes next may be the most critical phase in this unfolding crisis. Some researchers advocate for watermarking AI-generated text, while others call for mandatory citation audits. One thing is clear: the integrity of science depends not only on what we discover, but on how we document it. arXiv’s ban is a warning shot—a reminder that in the pursuit of knowledge, credibility is non-negotiable.

❓ Frequently Asked Questions
What are AI-generated references and how do they impact scientific literature?
AI-generated references are fabricated citations created by generative AI tools like ChatGPT or Gemini. They can lead to the dissemination of false information and undermine the credibility of scientific research.
Will authors be penalized if they unknowingly include AI-generated references in their submissions?
Yes, authors are not exempt from penalties even if they unknowingly included AI-generated references. arXiv’s strict policies aim to preserve the integrity of scientific literature and prevent the spread of misinformation.
What measures is arXiv taking to prevent the inclusion of AI-generated references in submitted papers?
arXiv is implementing automated flags and strict penalties for authors found to have included AI-generated references in their submissions. This move aims to maintain the integrity of scientific literature and prevent the spread of misinformation in the age of large language models.

Source: Nature


