The fight over AI training data is coming to a head
As AI spreads into everyday life, pressure is mounting on tech firms to explain what data their models learn from — and who gets paid
By Alex Daniel
Published 15 hours ago
OpenAI was an open book back in 2020. When it launched GPT-3, it released a detailed report on how the model was built, including a public “reading list” showing the kinds of material it was trained on. (About 3% of it was Wikipedia.) That allowed researchers to see what made the AI tick.
Today, details like these are treated as trade secrets. AI companies say revealing too much about how their technology works would give competitors an advantage, so much of it is withheld from public view — even as these systems are integrated into schools, hospitals, and workplaces. That loss of transparency has become a major source of concern — and the basis of dozens of legal battles.
It’s no secret that the work of writers, artists, musicians, and publishers helps power today’s AI models. That fact has resulted in a torrent of lawsuits from copyright holders who allege that AI companies are illegally training systems on their work without permission. The dispute is becoming one of the defining battles over how the AI industry is allowed to grow.
“You cannot avoid the fact that its sheer existence is because of the songs that I wrote in the past,” said Björn Ulvaeus, the Swedish singer-songwriter and member of ABBA, speaking to Bloomberg last year about AI tools.
