Courts can't agree whether AI companies owe creators for training on their work
Courts can't agree whether AI companies owe creators for training on their work
Creative work already trained the AI models replacing its makers. Judges now disagree on whether that counts as theft or fair game
Alexi Rosenfeld / Getty Images
Writers and artists learned too late that their published work had trained generative AI systems. Their books had already been read and their images had already been scraped. The models had gained in a fraction of the time all the experience and knowhow that took them a lifetime to gain.
As a broader crisis unfolds for white-collar workers across the economy, creative professionals must now compete against systems built, in part, on their own labor, without their consent and often without their knowledge.
The legal fight is already splitting judges down the middle. One federal judge ruled that training AI models on copyrighted books counts as fair use. A different judge, weighing nearly identical facts, ruled the opposite and called it copyright infringement. How that divide resolves will decide whether the people who built these systems ever get paid for it.
The pipeline that fed creative work into AI systems
Researchers have traced, with increasing precision, how copyrighted work ended up training the AI models built to write, illustrate, and compete with the people who made it. The Pile, an 886-gigabyte collection of English text assembled by the AI research group EleutherAI in 2020, became one of the two most widely used training sets for large language models. One piece of it, a component called Books3, contained roughly 191,000 copyrighted book titles pulled from pirate website Bibliotik. A Danish anti-piracy group forced Books3 offline through legal takedown requests in July 2023, but by then, copies of the dataset had already spread across the internet.
Visual artists faced a parallel pipeline. LAION-5B, a dataset of 5.58 billion paired images and text captions pulled from a broad crawl of the open web, trained Stable Diffusion, Midjourney, and other image generators. The Authors Guild documented that Meta $META and other companies also copied books from........
