
Large language models like ChatGPT sometimes produce answers that sound convincing but are completely incorrect. This phenomenon, often called hallucination or confabulation, is a major concern for gen AI users.
Hallucinations provide ammunition to gen AI detractors, who point to these fabricated outputs as evidence that these systems are fundamentally unreliable for serious work.
Personally, I’m fine with hallucinations because I understand how LLMs work: they are probabilistic text generation engines that lack genuine understanding or access to verifiable knowledge bases. People who argue that hallucinations show LLMs ‘are stupid’ betray their own limited understanding. They are anthropomorphising these systems, expecting statistical models to behave like a person, a critique that misframes the issue by applying human cognitive categories to systems that operate on entirely different principles. I’m also of the opinion that hallucinations in themselves shouldn’t stop someone using an LLM for an appropriate use case: if you’re using the tool critically and verifying the information, you’ll be fine.
I’d also argue that people can be just as unreliable as LLMs (if not more so). Humans are inherently fallible knowledge systems that often confidently recall ‘facts’ that turn out to be false. Our brains can unwittingly fill gaps in our memory with made-up details that feel right, which psychologists refer to as false memories.
And oftentimes, humans are just liars.
Which is worse: a human who is knowingly telling a lie, or an LLM whose statistical model has generated a convincing but false statement?
Human communication networks were full of misinformation long before LLM hallucinations came along. False stories go viral because they are engaging or support a particular ideological perspective, and no amount of fact-checking can convince certain people otherwise. Even when misinformation is an honest mistake rather than a deliberate attempt to spread falsehoods, our information channels prioritise speed and engagement, while corrections and retractions often lag behind; some portion of the audience will always remember the myth rather than the correction.
The key difference between LLMs and humans is that, through education and experience, we learn when to be skeptical of our own knowledge (well, at least some of us do), and we have feelings of uncertainty and the capacity to double-check our memories against reality. We have multiple feedback loops (sensory experience, social correction, etc.) to course-correct our false beliefs, while AI models lack these inherent, robust self-checking mechanisms.
Gen AI fanboys, on the other hand, claim that hallucinations are merely teething problems that will be solved through better training data and improved architectures, or worse still, that these ‘creative errors’ represent a feature rather than a bug and are evidence of the system’s imaginative capabilities (insert facepalm emoji).
But anyone who is looking forward to hallucination-free large language models is again betraying a poor understanding of how these things actually work. The reality is that hallucinations are intrinsic to how LLMs work, and despite many mitigation efforts, they may never be completely eliminated.
But why are hallucinations intrinsic to LLMs?
LLMs generate text by predicting the most statistically likely next word or phrase based on patterns in their training data; they don’t check facts. The model’s objective during training is to sound plausible and fluent, and it has no built-in mechanism to guarantee the accuracy of the content. This means that if the prompt leads into unfamiliar territory, the LLM will still produce a fluent continuation (as it’s designed to do), which can easily result in a believable-sounding but false answer. The model doesn’t know when it’s wrong (how could it?); it just keeps generating words that seem right linguistically, a recipe for confident misinformation.
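To make that concrete, here’s a toy sketch in Python (with made-up token probabilities, not the output of any real model) of what generation amounts to: sample the next token from a probability distribution and append it. Nothing in this loop checks whether the resulting claim is true.

```python
import random

# Toy next-token distribution; in a real LLM these probabilities come from a
# neural network conditioned on the full context, but the principle is the same.
next_token_probs = {
    "The capital of Australia is": {
        "Canberra": 0.60,   # correct
        "Sydney": 0.35,     # plausible but wrong
        "Melbourne": 0.05,  # plausible but wrong
    },
}

def generate_next(context: str) -> str:
    """Sample the next token from the distribution.
    There is no fact-checking step anywhere in this process."""
    dist = next_token_probs[context]
    tokens = list(dist.keys())
    weights = list(dist.values())
    return random.choices(tokens, weights=weights, k=1)[0]

context = "The capital of Australia is"
print(context, generate_next(context))
# Roughly 40% of the time this toy model confidently names the wrong city,
# not because it is 'lying', but because a plausible token was sampled.
```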
No matter how massive the training dataset, it can never cover all facts or the most up-to-date information in the world. Human knowledge is vast and constantly changing, so LLMs will always have gaps. When asked about something outside its training scope (like a very recent event or an obscure fact), an LLM has two options: admit ignorance (which base models are typically not trained to do), or guess by stitching together related patterns. Often, it guesses, leading to hallucinations. The ever-changing nature of knowledge guarantees that training will always be incomplete or out-of-date to some degree, leaving room for fabrications when the model is confronted with those gaps.
Even when the relevant facts are present in the training data, LLM output isn’t deterministic: there is always some probability the model will produce a slightly off or entirely wrong statement. This is because the generation process involves sampling from a probability distribution of possible next tokens, and that inherent randomness means there’s always a non-zero chance of error at each step. At every stage of an LLM’s operation, from understanding the query to retrieving knowledge and generating the answer, there is a chance for mistakes to creep in, making a small rate of hallucination mathematically ineliminable. The odds of error can be reduced with better models and more data, but they can never be driven strictly to zero.
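To see why a tiny per-token error rate still adds up over a long answer, here’s a deliberately simplified back-of-the-envelope calculation. It assumes a constant, independent error probability per token, which real generation doesn’t strictly obey, but the compounding effect is the point.

```python
# Probability of at least one erroneous token in an N-token answer, assuming
# (simplistically) an independent per-token error rate epsilon.
epsilon = 0.001  # 0.1% chance of a 'wrong' token at each step (illustrative)
N = 500          # length of the generated answer in tokens

p_at_least_one_error = 1 - (1 - epsilon) ** N
print(f"{p_at_least_one_error:.1%}")  # roughly 39% with these illustrative numbers
```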
As already noted, current LLMs have no direct grounding in physical reality or sensory experience. They don’t maintain a mental model of the world; they only juggle symbols (words) based on statistical relationships. Consequently, an LLM has no ‘common sense’ physics or factual context to check its statements against. Without an innate understanding of how the world actually works, the model can’t always judge the plausibility of its outputs. This gap between linguistic probability and factual reality is a fundamental structural reason for hallucinations.
Ironically, efforts to make LLMs more user-friendly can exacerbate hallucinations. Many models are fine-tuned (for example, via Reinforcement Learning from Human Feedback, or RLHF) to be helpful, comprehensive, and to avoid saying ‘I don’t know’. They are rewarded for providing answers in a confident, explanatory manner. This introduces a bias: producing something is effectively rewarded over giving no answer or expressing uncertainty. The result is a model that sounds more useful and authoritative, but that same confidence means it’s more likely to say something even when it has no factual basis. The more AI is pushed to never leave a question unanswered, the more it is encouraged to hallucinate an answer when it lacks true information.
Over the past few years, numerous strategies have been developed to curb LLM hallucinations. These include augmenting the model with external data, tweaking training regimes, and adding verification steps. Each approach helps, but each also has limitations.
Retrieval-Augmented Generation (RAG) attempts to ground the model’s output in real documents. The idea is to have the LLM fetch relevant information (e.g. from Wikipedia or a private knowledge base) when answering a query, and then base its answer on that retrieved text. This does improve factuality in many cases, as the model is no longer relying purely on its memory, and it can provide sources for its statements. But models can still ignore the retrieved evidence or misinterpret it. In practice, an LLM might be given a relevant article and still pull out a wrong detail or combine facts incorrectly, resulting in a (subtler) hallucination.
There are also queries, especially ones requiring complex reasoning or abstract answers, where relevant documents aren’t easy to retrieve via keyword search. And implementing RAG at scale is complex and resource-intensive because documents must be indexed and fetched in real-time, adding latency and computational cost. So RAG provides grounding sometimes, but it cannot guarantee a hallucination-free output in all cases.
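For anyone unfamiliar with the mechanics, a bare-bones RAG loop looks something like the sketch below. The retriever here is a toy word-overlap scorer standing in for a real vector search, and `llm_complete` is a placeholder for whichever LLM API you happen to use; it illustrates the pattern, not a production implementation.

```python
# A bare-bones retrieval-augmented generation (RAG) loop. The retriever is a
# toy word-overlap scorer standing in for embedding-based vector search, and
# llm_complete is a placeholder for a real LLM call.

documents = [
    "Canberra has been the capital city of Australia since 1913.",
    "Sydney is the most populous city in Australia.",
]

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by crude word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def llm_complete(prompt: str) -> str:
    """Placeholder: swap in whichever LLM API you are using."""
    raise NotImplementedError("plug in your model here")

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    # Nothing here *forces* the model to stick to the context; it can still
    # ignore or misread the retrieved text, which is why RAG reduces but does
    # not eliminate hallucinations.
    return llm_complete(prompt)
```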
Another tactic is to fine-tune the LLM on datasets that emphasise factual accuracy, or to use alignment techniques like RLHF to teach the model to be more truthful. Fine-tuning on domain-specific or high-quality Q&A data can reduce blatant mistakes by reinforcing correct answers to known questions. RLHF, on the other hand, introduces a preference for outputs that humans rate as truthful and helpful. These methods have yielded some improvements: newer models are better at saying ‘I’m not sure’ when truly clueless, and they make fewer obvious factual errors than their raw counterparts. But again, they don’t eliminate hallucinations entirely. One limitation is that human reviewers can miss subtle inaccuracies, so RLHF might still reward answers that sound good but are slightly wrong. More troubling, there is evidence that RLHF can sometimes worsen factual accuracy: by making the model more eager to please, it might speak confidently on topics it shouldn’t. Essentially, aligning to human preferences addresses the style and safety of responses more than the deep knowledge problem; it can encourage the model to avoid obvious nonsense or harmful content, but the model might still unknowingly confabulate facts if the evaluators didn’t catch them during training.
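As a rough illustration of what ‘fine-tuning on factual Q&A data’ means in practice, here’s a minimal sketch using a small stand-in model (gpt2) and the Hugging Face transformers library. A real fine-tuning run would use far more data, mask the prompt tokens out of the loss, and involve considerably more engineering.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model purely for illustration; any causal LM would do.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A (tiny) factual Q&A dataset; a real run would use thousands of examples.
qa_pairs = [
    ("Q: What is the capital of Australia?\nA:", " Canberra."),
    ("Q: Who wrote 'On the Origin of Species'?\nA:", " Charles Darwin."),
]

model.train()
for prompt, completion in qa_pairs:
    enc = tokenizer(prompt + completion, return_tensors="pt")
    # Using the inputs as labels trains the model to reproduce the whole
    # sequence; in practice the prompt tokens are usually masked (set to -100)
    # so the loss only applies to the answer.
    outputs = model(**enc, labels=enc["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```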
One could also bolt on a post-processing or concurrent fact-checking system, like having the LLM’s answer checked by another model or tool that scans for verifiable claims and cross-checks them against a database or the web. Researchers have tried techniques like having two AI agents debate or critique each other’s answers until they reach a consensus on the truth. While these methods can catch mistakes that the primary model overlooks, they come with their own issues: they add complexity and computation, and they are reactive rather than preventive; by the time the fact-checker flags something, the model has already produced the hallucinated content. In many cases, the AI might assert a false detail that isn’t easily checkable by an automated system (for instance, a claim that isn’t in any database because it’s made-up). Tool-using models (such as those that can do web searches or look things up) are promising, but they still sometimes trust their internal knowledge over the external evidence or misuse the tools. So while external verification can mitigate some hallucinations, it cannot guarantee catching them all, especially if the system doesn’t realise a certain statement needs checking. It’s a bit like proofreading: it helps, but errors still sneak through.
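A post-hoc checker can be sketched roughly as follows. Everything here is a stand-in: `extract_claims` is a crude sentence splitter in place of a claim-extraction model, and the reference store is a toy set in place of a knowledge base or search API. The point is that the checking happens after the answer already exists, and anything the checker can’t look up simply comes back unverified.

```python
# Naive post-hoc fact-checking: split the answer into claims and look each one
# up in a reference store. Both components are stand-ins for real ones (a
# claim-extraction model, a knowledge base or web search).

reference_store = {
    "canberra is the capital of australia",
    "water boils at 100 degrees celsius at sea level",
}

def extract_claims(answer: str) -> list[str]:
    """Crude stand-in: treat each sentence as one claim."""
    return [s.strip() for s in answer.split(".") if s.strip()]

def check_answer(answer: str) -> list[tuple[str, bool]]:
    return [(claim, claim.lower() in reference_store) for claim in extract_claims(answer)]

generated = "Canberra is the capital of Australia. The city was founded in 1820."
for claim, ok in check_answer(generated):
    print("verified  " if ok else "unverified", "-", claim)
# The fabricated founding date just comes back 'unverified': the checker can
# only flag what it can look up, and the hallucinated text already exists.
```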
Hallucinations simply underscore that LLMs do not truly know truth from falsehood; they are just sophisticated mimics of language. But accepting this allows us to use these models more wisely, leveraging their strengths while guarding against and compensating for their intrinsic limitations. The hallucination problem will probably accompany LLMs for as long as they exist, but with careful design and user awareness, it can have far less of an impact.