AI-slop, GrantaGate and Bad Writing
AI writes poorly because it has crammed too much fanfic jargon
A few days ago, at a very fancy party in NYC, this young woman in her 20s walked up to me and asked, “Are you writing something?” “Academics are never not writing, my friend,” I said in a mildly sarcastic tone. “You know what, I’d like to read your stuff on Substack,” she said. “But that doesn’t get you tenure,” I replied. Since that exchange, I have wondered: should I write something non-academic that is also a bit academic? My hesitance can in part be attributed to George Saunders and Ottessa Moshfegh, whose brilliant Substack columns intimidated me. I am no writer. Who on earth would want to read my Substack? Yet here I am, writing my first Substack post.
The last two days have been a bit crazy, thanks to a certain short story published in Granta magazine that has now been awarded a prize by the Commonwealth Foundation. Nabeel Qureshi was the first on my feed to highlight that it was AI-generated. As I write, his tweet has 1.5M views, which is roughly 1.4M more than anything I have ever written. For us fortunate, or if I may say unfortunate, professors whose inboxes are inundated by AI-written emails, it is not exactly detective work to recognize that this short story was completely AI-generated.
Pangram’s AI detection confirmed this. For those who don’t know me, I should admit I am a bit of an AI-detection maximalist, so obviously I got into arguments on X with people who were quick to dismiss the detection results and insist detectors don’t work. Now putting that aside, a journalist from El País asked me yesterday, “Why did we suddenly decide, as in the case of this award, that a text is AI-generated when there’s no reliable detection tool available?” My reply was brief. I pointed to the stylistic tics and patterns prevalent in AI writing. Last year I won a best-paper honorable mention award for writing a paper characterizing idiosyncrasies in AI writing. It should not be surprising that I, of all people, have now been trained on an inordinate amount of AI slop.
A few days ago, my very smart PhD student came up to me with a point that stuck with me. In her words, bad AI writing, or low-quality AI writing, can be attributed to how much text it memorized from pre-training. Kind of a brilliant hypothesis. After all, LLMs are not conscious. They do not have a perfect sense of embodiment. They are autoregressive models that generate text by sampling, more or less, from a very large pile of things other people wrote. In simpler terms, it is what my colleague Najoung Kim (a linguistics professor at BU) calls word salad. Last year we wrote a paper, now accepted to ICLR, on how seemingly novel n-grams are often nonsensical or non-pragmatic. There are perfect examples of this in the short story. Consider certain phrases: “she had the kind of walking that made benches become men” or “the girl smiled like sunrise over a sink.” Like, what does it even mean?
So after much deliberation, I asked my student to run the story through Infinigram (an n-gram attribution engine, which is a fancy way of saying it tells you where a phrase was probably stolen from). Based on my anecdotal experience, LLMs don't follow genre conventions; they cherry-pick expressions from wherever, genre be damned. Say I am writing literary fiction; I would avoid tropes or hackneyed metaphors like the plague (you see what I did there :P). That kind of writing tends to live on fan fiction sites and the dustier, lower-ranked corners of the web.
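If "n-gram attribution" sounds abstract, here is a toy sketch of the idea in Python. To be clear, this is not Infinigram's code or API; the real engine searches a vastly larger corpus, and my little version just slides a window of n words over a story and checks whether each window appears verbatim in a handful of source texts. The function names and the source labels (fanfic_a, fanfic_b) are made up for illustration; the sentences themselves are ones quoted in this post.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase word tokens; punctuation is dropped, apostrophes are kept.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(sources, n):
    # Map every word n-gram to the set of source labels that contain it.
    index = defaultdict(set)
    for name, text in sources.items():
        toks = tokenize(text)
        for i in range(len(toks) - n + 1):
            index[tuple(toks[i:i + n])].add(name)
    return index


def shared_ngrams(story, sources, n=4):
    # Return (n-gram, sorted source labels) for each distinct n-gram of
    # the story that also appears word for word in at least one source.
    index = build_index(sources, n)
    toks = tokenize(story)
    hits, seen = [], set()
    for i in range(len(toks) - n + 1):
        gram = tuple(toks[i:i + n])
        if gram in index and gram not in seen:
            seen.add(gram)
            hits.append((" ".join(gram), sorted(index[gram])))
    return hits


# Hypothetical labels; the sentences are quoted elsewhere in this post.
sources = {
    "fanfic_a": "She has the patience of a reptile, lying in wait.",
    "fanfic_b": "She smelled cooking and the sour tang of fermenting grain.",
}
story = "Vishnu began to plan with the patience of a reptile."

for gram, where in shared_ngrams(story, sources, n=4):
    print(gram, "->", where)
```

A hit here only says the same words show up in both places. It cannot tell you that one text was copied from the other, and the shorter the n-gram, the more likely the overlap is plain coincidence.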
Let’s have a little fun. Below you can see a little demo. Don’t forget to click on it. This will show you how LLMs stitch together verbatim expressions from different sources. Some might say it’s not technically infringement, because these expressions are not super long, but they are not original either.
Now let’s decode them one at a time.
For instance, if you read the sentence “but a belly sound, as if the earth swallows a shout and holds it there,” it seems rather jarring. Not really what you find in good writing, is it? Don’t worry, this might have been drawn from a fan-fiction webpage, The Fall of a Phoenix. (Sorry, the story has been removed now, but the sentence was: “He swallows a shout of pain, breathing heavily, his whole body trembling.”)
I was intrigued by the very purple sentence ending in a typical ChatGPT rule of three (x, y, z). Here, “damp earth, woodsmoke and the sour tang of fermenting cocoa” seems hyperspecific and very rare. Turns out there is a possibility it could have been drawn from an AO3 webpage. The sentence being: “She smelled cooking and the sour tang of fermenting grain.”
Consider the sentence “His eyes narrowed against the glare outside and the darker glare inside him.” Here, “eyes narrowed against the glare outside” sounds a bit vain and ominous in a way that’s unnecessary. Well, the reason could be that it was drawn from jedifiction.com (that’s defunct now), or a random horror blog (the sentence being: “His eyes narrowed against the glare of the halogen strip lights”). As for “the darker glare,” while pretty common, it also appears in another fan-fiction story, Poison.
So many other instances of word salad that at this point my head hurts. Another brilliant example of bad plagiarism: “the air sweet with cane and forgetting.” What does it even mean?? Readers on X were quick to remind me that cane here refers to sugarcane in Trinidad. I am skeptical. There is a chance it was copied out of context from a Better Call Saul review (an active elderly relative suddenly having to use a cane and forgetting your name).
More bad writing?
“Vishnu began to plan with the patience of a reptile.” It could have been drawn from The Mangled Spider (“She has the patience of a reptile, lying in wait.”)
“Something coiled inside her chest.” Chance of being drawn from another fanfic webpage (“Something coiled inside her stomach as she began to give herself over to sensations [...]”).
Disclaimer: we can’t actually peek inside an LLM’s brain (which is probably for the best). But it’s entirely possible that when you prompt it to write fiction, it’s quietly scouring through every abandoned Wattpad draft and fanfic written at 2 a.m. by someone who really shouldn’t have.
So next time you cringe at AI writing, remember: LLMs are not thinking twice. There is no reflection. They are drawing words from whatever source they can. They do not conform to genre conventions. A lot of bad writing happens because AI hasn’t learned aesthetics. It has simply memorized the whole internet and called it a day. So sure, maybe you don't trust AI detectors. But you can trust your own eyes. Read closely. I am no expert, but maybe this verbatim borrowing, these tonal mismatches, and the metaphors that mean nothing are precisely what AI detectors learn to recognize?



Wait ... is the possibility of this kind of n-gram analysis not an absolutely massive deal? Does it work with other AI-generated text? If so, then I imagine it'd be of extreme interest to Pangram and also totally transform the conversation surrounding AI-generated text.
If LLMs routinely use verbatim snippets from their training data, that would presumably make AI writing even less desirable / more low-status than it is now, and would strengthen the case that it counts as plagiarism.
I'd be interested in seeing this tool used on a piece of writing from 2020. Super interesting findings here but none of the examples you've identified are actually that rare or strange, especially not 'Something coiled inside her' or 'swallows a shout'. The ones identified in the app are even more benign most of the time. I'm certainly open to the Granta piece being AI, but even most of the 10 and 11 token n-grams in the demo are phrases I've read/heard before. Not sure this works as a source of authority with the current explanation/demo.
'the air sweet with cane and forgetting' is a reference to sugarcane, which is commonly eaten in Trinidad and Tobago. While the writing may not be to your taste, I don't think a flowery description of something instantly identifies it as a 'brilliant example of bad plagiarism'.