Generative AI systems didn’t emerge from thin air; they were built on the largest unpaid labour extraction in human history. Generative AI platforms—large language models (like ChatGPT and Claude), DALL-E, Midjourney, and countless others—are built on vast datasets harvested without explicit consent from the creators whose work makes the whole enterprise possible.
Every prompt we type, every image we generate, every line of code we request participates in an unprecedented act of cultural theft. Not metaphorical or philosophical theft, but actual theft: the taking of intellectual property without consent, often for commercial gain.
Tech companies frame this as ‘training on publicly available data’, as if posting a photograph to share with friends constitutes consent for a trillion-dollar industry to commercially exploit it forever. Leaving your bicycle unlocked doesn’t mean you’ve agreed to let someone take it, repaint it, and sell it back to you.
Visibility isn’t permission—just because something is online doesn’t mean it’s fair game for extraction. When an artist or author posts their work, in any forum, they have not thereby signed up to train their replacement. Even when a piece of work carries an open licence, open licensing typically specifies permissible use cases and attribution requirements, both of which are roundly ignored by generative AI systems.
The theft operates on multiple levels.