The AI detection delusion
AI detection software can do a lot of damage through misplaced certainty

Last week, Peter Vandermeersch, former chief executive of Mediahuis Ireland and one-time editor-in-chief of NRC, was suspended after an investigation revealed that fifteen of his fifty-three Substack newsletter posts contained fabricated quotes that were hallucinated by AI tools he had used to summarise source material. Seven of the individuals he cited confirmed they had never said the words attributed to them. The story made international headlines, and following the revelation, Vandermeersch published a candid apology on his newsletter, acknowledging that he had trusted LLM-generated summaries without verification. It was, by any measure, a serious lapse in journalistic practice, and the coverage was warranted.
But what came next was, to my mind at least, less warranted.
Within hours, journalists and commentators took to running the apology through AI detection tools, with many concluding that Vandermeersch had, to add insult to injury, used generative AI to write the apology itself.
But feeding text into AI detection software and treating the output as evidence of anything represents a misunderstanding so profound that it deserves its own reckoning.
I wrote about this last year in my ‘GPTZero is useless’ post, but let me say it louder for those down the back: AI detection software does not work. And making public claims on the basis of its outputs is almost as bad as any of Vandermeersch’s sins.
AI detection tools operate by measuring statistical properties of text, most commonly ‘perplexity’ (how predictable the text would be to a language model) and ‘burstiness’ (the variation in sentence complexity across a passage). Text that a language model would find easy to predict scores low on perplexity, while text with uniform sentence structures scores low on burstiness. Low scores on both, the logic holds, indicate machine authorship, but this logic conflates the statistical fingerprint of machine-generated text with the statistical fingerprint of clear, structured, formal prose. The two overlap considerably.
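To make that concrete, here is a rough sketch of those two statistics in Python. It assumes the Hugging Face transformers library and GPT-2 as the scoring model, and the sentence-splitting heuristic is deliberately crude; none of this is how GPTZero, Turnitin, or any other commercial detector is actually implemented, it only illustrates the kind of measurement involved.

```python
# A rough sketch of the two statistics most detectors lean on, assuming the
# Hugging Face transformers library and GPT-2 as the scoring model. The
# sentence-splitting heuristic is deliberately crude, and none of this is how
# any commercial detector is actually implemented.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def perplexity(text: str) -> float:
    """How 'surprised' GPT-2 is by the text; lower means more predictable."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return its own
        # average cross-entropy loss over the sequence.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())


def burstiness(text: str) -> float:
    """Standard deviation of sentence length, a crude proxy for structural variety."""
    sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return (sum((n - mean) ** 2 for n in lengths) / len(lengths)) ** 0.5


sample = (
    "The committee reviewed the proposal. The committee approved the budget. "
    "Work on the project begins in May."
)
print(f"perplexity: {perplexity(sample):.1f}  burstiness: {burstiness(sample):.1f}")
# A detector would read low values on both as 'machine-like', but careful,
# formal human prose produces exactly the same signature.
```

Careful, formal human prose tends to produce low values on both, which is precisely the overlap described above.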
The consequences of this overlap are well documented. A 2023 study by Stanford researchers found that seven widely used AI detectors misclassified over 61% of essays written by non-native English speakers as AI-generated, and 97% of those essays were flagged by at least one detector. The reason is straightforward: non-native speakers tend to write with simpler vocabulary, more predictable grammar, and more uniform sentence structures, the very same qualities the detectors associate with machine output.
OpenAI itself launched an AI text classifier in January 2023. It correctly identified AI-generated text just 26% of the time. By July of the same year, OpenAI pulled the tool entirely, citing its low accuracy. Not even the organisation that built GPT could build a detector that worked. The research on AI detectors could not be clearer: they do not work. One paper demonstrated that the best available classifiers performed only marginally better than chance under real-world conditions. Times Higher Education showed that they could reduce Turnitin’s detection rate from one hundred per cent to zero simply by prompting ChatGPT to ‘write like a teenager’. These are just two of countless examples demonstrating how poorly AI detection tools perform at the one job they claim to do.
Detection tools won’t get better because the task they are attempting may be intractable in principle. Language models generate text by predicting the most probable next token given the preceding context, and the better the model, the more closely its output resembles the distribution of human language (that is, of course, the entire point). The better AI writing becomes, the more it converges on the same statistical properties as human writing and the narrower the gap that detectors are trying to exploit.
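If it helps to see what ‘predicting the most probable next token’ actually means, here is a tiny illustration, again assuming GPT-2 and the transformers library; any causal language model behaves the same way, and the example prompt is mine, not anything from the Vandermeersch case.

```python
# A toy look at next-token prediction, assuming GPT-2 via transformers.
# This is an illustration, not anyone's production code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "The editor apologised for the fabricated"
inputs = tokenizer(context, return_tensors="pt")
with torch.no_grad():
    # Take the model's scores for whatever token comes next after the final word.
    next_token_logits = model(**inputs).logits[0, -1]

probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
# Generation is just repeated sampling from this distribution. The closer it
# sits to what people actually write, the less there is for a detector to measure.
```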
Remember: people don’t write like AI; large language models write like people, because they were trained on vast amounts of human writing (which was, for the most part, stolen). A human writer who happens to produce clean, well-structured prose, who writes in English as a second language, or who has a neurodivergent cognitive profile that favours repetition and regularity will continue to be flagged. AI detectors are, in practice, better at penalising certain kinds of human writers than at identifying AI output.
For all the legitimate concern about LLMs introducing fabrication, hallucination, and intellectual laziness into professional and academic life—concerns amply illustrated by the Vandermeersch case itself—detection software introduces a different category of harm in the form of false accusations.
When a journalist—or anyone—runs a public figure’s text through GPTZero and reports the result, they are making a claim about authorship on the basis of a tool whose error rates and limitations are extensively documented. The result is a kind of epistemological vigilantism, a public accusation of dishonesty grounded in technology that cannot support that accusation’s weight.
Consider the position of the accused. If you are a non-native English speaker, a neurodivergent writer, a person who writes formal or careful prose, or simply someone who happens to produce text with low perplexity on a given day, you have no reliable means of proving that you wrote your own words. The detector’s verdict, once publicised, carries an air of technological authority that human protest cannot easily overcome, and the presumption of guilt attaches with a speed that no correction can match.
The Vandermeersch case is instructive because the actual journalism worked. NRC’s investigation proceeded by verifying quotes against their purported sources, contacting the individuals cited, and checking the factual claims against the published record. This is painstaking, unglamorous work, but it produced clear evidence of fabrication. No detection tool was required, and none would have been sufficient.
Running Vandermeersch’s apology through a detector after the fact accomplished nothing except to muddy the waters with spurious, technologically derived suspicion.
None of this is to defend AI-assisted plagiarism or the uncritical use of LLMs in professional writing. What Vandermeersch did was wrong, and he has said as much himself. The problem of AI-generated fabrication in journalism, academia, and public life is real and growing.
But the solution to one form of technological recklessness cannot be another. Detection tools give their users the feeling of objective certainty while delivering probabilistic guesses, and in doing so, they can cause real harm to real people—students who lose marks or face disciplinary action, professionals whose reputations are damaged, and writers whose command of English is held against them by an algorithm that mistakes simplicity for artificiality.
If you are a journalist covering an AI fabrication scandal, verify the claims against their sources. If you are an educator concerned about student integrity, design assessments that resist mechanical completion. If you are a reader confronted with text that ‘sounds AI-ish’, interrogate your own assumptions about what human writing is supposed to sound like, and ask whether the person you are about to accuse deserves more than a percentage score from a tool that OpenAI itself could not make work.
The Vandermeersch affair deserved serious scrutiny and it got that. What it did not need was a secondary performance powered by software that does not do what it promises. Detection tools are just as bad as the misuse they were invented to prevent. Stop using them.


So much to say...so little time (and maybe even less will at this point). I fully agree that AI detectors are flawed; however, there's a missing piece here. "The reason is straightforward: non-native speakers tend to write with simpler vocabulary, more predictable grammar, and more uniform sentence structures, the very same qualities the detectors associate with machine output." These days I primarily teach non-native speakers, and the clash between the prose in submitted assignments and their other written communication is big and glaring. The same is true for my native speakers; it's not that "no human writes like that"; it's more like "this human doesn't write like that." The statement quoted above seems quite reductive and hard to apply across big, complex populations. Relying on the AI detectors alone can indeed lead to damaging error; however, without even this marginal support for our suspicions (which we cannot officially use, it's true), we are left with a he-said, she-said impasse. Students will swear up and down that they didn't use AI or just used it "to fix grammar issues." With every experiential and logical perception that we have, we know that this isn't true. Yes, it's a learning curve trying to reconcile these challenges with our deeply rooted desire that our students actually learn something in our classes. But try challenging such cases without empirical support and you'll find yourself in exhausting, useless defeat. And of course, yes, toss it back to the instructors for not offering AI-proof assignments--try that, too, and you'll be surprised how easy it is, outside of paper & pen in class (and even this is compromised by the device in the lap, consulted throughout the whole exercise), for students to bring in AI for most any assignment. If I sound exhausted with it all, I am...if I sound defeated, maybe not yet. It's still, after all, a compelling challenge, which has its own exhilaration and rewards...