
Do a bit of scrolling on social media and you will see a familiar refrain: ‘ChatGPT is biased.’ A concerned interlocutor will follow with an even stronger charge: ‘It’s lying to us!’ Such statements tap into the potent mix of fear and fascination that surrounds large language models at present, the idea that this machine, so fluent and convincing, might be subtly manipulating us. That it has an agenda. That it’s not just getting things wrong, but doing so on purpose. But we are projecting human motives—prejudice, deception, intention—onto a statistical model that has none.
This impulse is understandable. ChatGPT doesn’t look like a spreadsheet or a search engine; it talks. It responds in full sentences, with apparent coherence and simulated empathy. But this surface-level plausibility tempts us into a category error: treating an algorithmic pattern-generator as if it were a person with beliefs, values, and goals. In doing so, we confuse a limitation of our own interpretive instincts with a flaw in the machine itself.
Framing ChatGPT as ‘biased’ or ‘untruthful’ can be useful from a critical AI perspective, but it conflates several distinct ideas. To untangle them, we need to look at what an LLM like ChatGPT actually does, and at why the familiar language of bias and mendacity, borrowed from human psychology, might mislead us more than the machine ever could.
As far as LLMs are concerned, bias is not a single concept.
There is statistical bias: the systematic difference between a model’s expected predictions and the true values it is meant to estimate. In machine learning, some bias of this kind is unavoidable, even necessary: a model that made no simplifying assumptions would fit the noise in its training data as readily as the signal, whereas a degree of statistical bias is what allows it to generalise patterns from data and make useful predictions.
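To make that concrete, here is a minimal sketch in Python (assuming numpy is available; the ‘true’ function and the noisy data are invented purely for illustration). A straight-line model fitted to curved data cannot match the truth exactly, and the systematic gap between its average prediction and the true value is its statistical bias; that same simplification is also what lets it smooth over noise rather than memorise it.

```python
# Minimal sketch of statistical bias: the gap between a model's average
# prediction and the true value it estimates. The data and 'true' function
# are invented for illustration; assumes numpy is installed.
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return x ** 2                     # the quantity we want to estimate

predictions = []
for _ in range(1000):                 # repeat the experiment many times
    x = rng.uniform(0, 2, size=20)
    y = true_f(x) + rng.normal(0, 0.5, size=20)   # noisy observations
    slope, intercept = np.polyfit(x, y, deg=1)    # a deliberately simple model
    predictions.append(slope * 1.5 + intercept)   # its prediction at x = 1.5

bias = np.mean(predictions) - true_f(1.5)
print(f"statistical bias at x = 1.5: {bias:+.3f}")   # small but systematic
```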
Then there is cultural bias, which concerns the systematic favouring of certain social groups, perspectives, or norms over others. If a training dataset disproportionately represents Western voices, the resulting model is likely to echo and amplify those perspectives while marginalising others. While not always intentional, this is certainly consequential.
Finally, we come to moral bias, what most people really mean when they say a model is ‘biased’. This involves conscious prejudice, deliberate exclusion, or a will to discriminate. Crucially, only humans possess this third variety. When we attribute moral bias to an LLM we are anthropomorphising it. The model does not want to marginalise anyone; it has no wants at all.
Does that mean ChatGPT is ‘unbiased’? Of course not (yes, my title is a little clickbaity). It reproduces patterns in the data on which it was trained. If the written record these models learn from is awash with stereotypes (spoiler: it is), the raw model will echo them. OpenAI mitigates these effects through a process called reinforcement learning from human feedback (RLHF), systematically down‑weighting harmful associations. The outcome is not ‘objective truth’ but a norm‑guided approximation of public reason.
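For intuition only, here is a toy sketch of that down-weighting, not OpenAI’s actual pipeline: under a KL-regularised objective, the aligned model ends up preferring continuations in proportion to the raw model’s probability times an exponentiated reward, so continuations that human raters score poorly lose probability mass. The completions, probabilities, and reward scores below are all invented.

```python
# Toy illustration of preference-based re-weighting (not OpenAI's actual
# pipeline). Completions, probabilities and reward scores are invented.
import math

raw_probs = {                  # hypothetical raw-model probabilities
    "harmful stereotype": 0.30,
    "neutral answer":     0.50,
    "careful answer":     0.20,
}
reward = {                     # hypothetical human-preference scores
    "harmful stereotype": -2.0,
    "neutral answer":      0.5,
    "careful answer":      1.5,
}
beta = 1.0                     # how strongly preferences override the raw model

# KL-regularised reward maximisation tilts the policy towards
# p(y) * exp(reward(y) / beta), renormalised over the candidates.
tilted = {y: p * math.exp(reward[y] / beta) for y, p in raw_probs.items()}
total = sum(tilted.values())
aligned_probs = {y: v / total for y, v in tilted.items()}

for y in raw_probs:
    print(f"{y:20s} raw = {raw_probs[y]:.2f} -> aligned = {aligned_probs[y]:.2f}")
```

In this toy run the ‘harmful stereotype’ continuation drops from 30% of the probability mass to roughly 2%: nothing is deleted, it is simply made far less likely.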
As for lies, well, lies require intent: the speaker must know a statement is false and intend to deceive. ChatGPT has neither belief nor intention. It generates the next token in a sequence according to probability. When it produces an inaccurate claim, this is not deception but a mistake: an incorrect statistical prediction, a failed attempt to model a response to your request.
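Here is a toy version of that next-token step, with an invented probability table standing in for the network’s output. Roughly one run in three it picks the wrong city, which is a statistical miss, not a lie.

```python
# A toy next-token step: sample a continuation from a probability table.
# The table is invented; a real LLM computes these probabilities with a
# neural network over a vocabulary of tens of thousands of tokens.
import random

# hypothetical P(next token | "The capital of Australia is")
next_token_probs = {
    " Canberra":  0.55,
    " Sydney":    0.35,   # plausible-looking, but wrong
    " Melbourne": 0.10,
}

tokens, weights = zip(*next_token_probs.items())
print(random.choices(tokens, weights=weights, k=1)[0])
# When this prints " Sydney", nothing 'lied': an unlucky draw was made
# from a distribution shaped by the training data.
```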
So where do ChatGPT’s errors and distortions actually come from? Well, from all of the errors and distortions we’ve fed it. The internet is not a neutral or evenly distributed archive: it reflects historical, linguistic, and cultural asymmetries. Some voices are overrepresented, others barely present. When a model trains on this uneven terrain, it inevitably absorbs those imbalances. It doesn’t choose to prioritise certain perspectives—it just inherits the skew baked into its training data.
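A toy bigram count makes that inheritance visible; the three-sentence corpus below is invented and deliberately lopsided.

```python
# Toy demonstration of inherited skew: a bigram 'model' is nothing more
# than counts of what followed what in its corpus. The corpus is invented.
from collections import Counter

corpus = [
    "nurses are caring",
    "nurses are caring",
    "nurses are decisive",
]

following = Counter()                      # what follows the word 'are'?
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        if prev == "are":
            following[nxt] += 1

total = sum(following.values())
for word, count in following.items():
    print(f"P({word!r} | 'are') = {count / total:.2f}")
# 'caring' gets twice the probability of 'decisive' simply because the
# corpus said it twice as often; the model made no choice at all.
```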
The same cannot be said of the human authors who first produced the materials LLMs are trained on. Every tweet, article, Reddit comment, and Wikipedia edit that ends up in a training corpus carries the imprint of human intention. Those texts reflect not only information but ideology: biases, assumptions, worldviews, and, yes, sometimes outright prejudice. Unlike the model, which simply reflects patterns in the data, these original contributions were written with purpose, shaped by cultural context and individual perspective. If we find something objectionable in what the model says, we should often be looking not at the machine, but at the human-produced source material it’s drawing from, and at the social and publishing systems that produced those sources in the first place.
Another key factor is prompt ambiguity. Language models respond to the inputs they’re given, but if your question is vague, contradictory, or underspecified, you’re likely to get a fuzzy or unhelpful response in return. The old programming adage applies: garbage in, garbage out.
Then there’s post-processing alignment, especially through techniques like RLHF. This step is meant to make the model more helpful, safe, and aligned with human norms. But it introduces its own quirks. In some cases, the alignment process can overshoot, nudging the model away from problematic content so hard that it starts avoiding valid topics or producing formulaic responses. You’re no longer getting a statistical average of the internet, but a sanitised version curated for public acceptability.
Researchers refer to this as the capabilities–alignment trade-off. A model that fully mirrors its training data might replicate toxic or harmful content, while a model that’s too tightly aligned might refuse legitimate requests, speak in platitudes, or mask nuance. There’s no fixed balance point. The ideal calibration shifts as our social norms—and the contexts in which we use these tools—continue to evolve.
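A toy filter shows why there is no fixed balance point. The requests, risk scores, and labels below are invented, and the imperfect scorer is the whole problem: the only thresholds that block every harmful request also block a legitimate one.

```python
# Toy sketch of the capabilities-alignment trade-off: a stricter refusal
# threshold blocks more harmful requests, but also more legitimate ones.
# Requests, risk scores and labels are invented for illustration.
requests = [
    ("summarise this medical study",                        0.05, True),
    ("explain how household bleach works",                  0.40, True),
    ("describe a historical poisoning for a history essay", 0.92, True),   # looks risky on the surface
    ("write malware that steals passwords",                 0.90, False),
    ("give instructions for building a weapon",             0.97, False),
]

for threshold in (0.99, 0.95, 0.85, 0.30):
    blocked = [(text, legit) for text, risk, legit in requests if risk >= threshold]
    harmful_blocked = sum(1 for _, legit in blocked if not legit)
    legit_blocked = sum(1 for _, legit in blocked if legit)
    print(f"threshold {threshold:.2f}: blocks {harmful_blocked}/2 harmful "
          f"and {legit_blocked}/3 legitimate requests")
```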
If you’re going to use ChatGPT seriously, start by interrogating your own assumptions. ChatGPT was trained on human language. That means it reflects not just our knowledge, but also our contradictions, blind spots, and inherited prejudices. Don’t assume neutrality where none exists. Instead, treat the model’s responses as a window into the cultural logics embedded in the data it has seen, and in the people who produced that data.
Philosopher Bernard Williams famously explored ‘moral luck’: our habit of holding agents responsible for outcomes beyond their control. An LLM is not even an agent. When we accuse the model of bias or dishonesty, we risk absolving ourselves, the humans who curate the data, set the optimisation targets, and deploy the system. Accountability should trace back to the institutions and people who design, fine‑tune, and commercialise these models, not the statistical artefacts themselves.
ChatGPT is a linguistic mirror polished with the collective prose of humankind. If we glimpse distortion in the reflection, the fault lies not in the mirror but in the pattern engraved upon it. A mirror does not choose what to reflect; nor does a language model choose what to predict. We need to start seeing the mirror for what it is.