OpenAI 2025. "Why Language Models Hallucinate"

https://openai.com/index/why-language-models-hallucinate/

Overview

[AI Summary]: This research paper from OpenAI investigates the fundamental causes of hallucinations in language models, arguing that standard training and evaluation procedures inadvertently reward guessing over acknowledging uncertainty. The authors show that accuracy-only evaluations push models to produce plausible but incorrect answers rather than abstain when uncertain, and propose that penalizing confident errors more heavily than expressions of uncertainty would reduce hallucinations. The paper also provides a statistical analysis of how hallucinations arise from next-word prediction during pretraining, particularly for arbitrary low-frequency facts (such as a specific person's birthday) that cannot be reliably predicted from patterns alone.
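
The incentive argument in the summary can be illustrated with a small, self-contained sketch (not taken from the paper; the penalty value and function names are illustrative assumptions). It compares the expected score of guessing versus abstaining under plain accuracy grading and under a scheme that penalizes confident wrong answers:

```python
# Illustrative sketch (not from the paper): expected scores for "guess" vs.
# "abstain / say I don't know" under two grading schemes, showing why
# accuracy-only grading rewards guessing even at low confidence.

def expected_score(p_correct: float, wrong_penalty: float) -> dict:
    """Expected score of guessing vs. abstaining.

    p_correct:      model's probability that its best guess is right
    wrong_penalty:  points subtracted for a wrong answer
                    (0.0 reproduces plain accuracy grading)
    """
    guess = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    abstain = 0.0  # "I don't know" earns no credit but costs nothing
    better = "guess" if guess > abstain else "abstain"
    return {"guess": guess, "abstain": abstain, "better": better}

if __name__ == "__main__":
    for p in (0.9, 0.5, 0.2):
        plain = expected_score(p, wrong_penalty=0.0)  # accuracy-only benchmark
        penal = expected_score(p, wrong_penalty=2.0)  # hypothetical 2-point penalty
        print(f"confidence={p:.1f}  accuracy-only -> {plain['better']:7s}  "
              f"penalized -> {penal['better']}")
```

Under accuracy-only grading, guessing has positive expected score for any nonzero confidence, so it always beats abstaining; with a penalty of w points for a wrong answer, abstaining wins whenever confidence falls below w / (1 + w) (2/3 in this sketch). This mirrors the paper's proposal that evaluations should make confident errors cost more than honest expressions of uncertainty.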