AI’s New Training Data: Your Old Work Slacks And Emails
When Shanna Johnson was winding down cielo24, the transcription and captioning company she ran as CEO, she discovered an unexpected asset: its operational exhaust—the digital leftovers that pile up across years of work and collaboration.
To close the company out, she worked with SimpleClosure, a startup that specializes in helping companies wind down. SimpleClosure helped her through the usual shutdown paperwork: closing out payroll and taxes, getting investor consents in order, and filing with the IRS. Then came the part nobody puts in the founder playbook: selling off cielo24’s 13-year digital footprint (every Slack joke, every Jira ticket, and the emails documenting internal victories or frustrations sitting in employees’ multi-terabyte Google Drives) as training data for the next generation of AI. For that, cielo24 received “hundreds of thousands of dollars,” which Johnson said helped her go from “I don’t know how we are going to pay our bills” to “we can tie this up neatly with a bow and be able to walk away.”
“I’m still a bit emotional about shutting the company down,” she told Forbes. “But it’s cool to think that our data could be useful, live on and help other people.”
It’s a clean ending for a messy reality: the company didn’t survive, but its work trail did. And in 2026, that trail can be worth real money. Johnson’s data sale isn’t an isolated exit strategy; it is a new frontier in the AI arms race. AI labs started off by training their models on the public internet: Reddit threads, Wikipedia entries, digitized books. But they exhausted that, all of it, by late 2024, according to former OpenAI chief scientist Ilya Sutskever. What’s more, public internet data is not especially helpful for building “agentic” AI: models that can actually do work. But the hand-crafted work that was done during the daily operations of defunct companies like cielo24? That’s a sort of fossil fuel for AI agents. Turns out that if you’re shooting for AI competence in the workplace, you need examples of what doing the work actually looks like, and a lot of them.
“Model companies are realizing the noise in the real-world environments is required to accurately test models,” said Ali Ansari, whose company micro1 sells a product to AI labs called “Roots,” a mock holding company where AI agents can practice their skills in tasks like financial services and managing complex calendars.
A Gold Rush On Old Paperwork
Demand for workplace data has been a boon for SimpleClosure, whose CEO Dori Yona said that the level of inbound interest from AI companies has been “insane.”
“There’s a feeling of a gold rush from these companies trying to get their hands on real-world data,” he said.
To meet demand, SimpleClosure is launching Asset Hub, where companies shutting down can sell off their inventory of code, Slack archives, emails and other internal records. Parts of Asset Hub are still in beta, Yona said, because SimpleClosure removes all personally identifiable information from the internal company data, a sensitive and technically difficult process that it wants to make sure is “rock solid” before rolling it out more widely.
In the past year, SimpleClosure has processed nearly 100 deals on behalf of dead companies, Yona said. It has recovered over $1 million on behalf of founders, typically paying between $10,000 and $100,000 per company.
A competitor, Sunset, also buys defunct company data at similar prices. CEO Brendan Mahony told Forbes the price depends on the company’s size, its age, and “data richness,” a measure of internal traceability and cross-platform linkages within the data. A Jira ticket tied to a specific code commit carries more value than a standalone document, he said. Certain industries, like healthcare or finance, command a premium, he added.
“It’s not generic data.” It’s people.
Where some see this sort of salvage as a business opportunity, others see a privacy concern. Marc Rotenberg, founder of the Center for AI and Digital Policy, said that even if employees signed away intellectual property rights to work materials, that doesn’t settle whether employers should be allowed to sell internal communications to a third party—particularly when employees are unlikely to expect their Slack messages could be repurposed this way.
“I think the privacy issues here are quite substantial,” he said. “Employee privacy remains a key concern, particularly because people have become so dependent on these new internal messaging tools like Slack… It’s not generic data. It’s identifiable people.”
Rotenberg’s organization sent a letter to the Senate Commerce Committee Tuesday calling on the FTC to scrutinize new AI business practices, citing concerns about safeguards for protecting personal data.
While all companies that buy this material say they take anonymization seriously, data industry veterans say that the process is far from simple. There’s no “on-off switch” for personally identifiable information tethered to a career’s worth of work.
“If anonymization's not done correctly, there are risks that companies who have access to the data would be able to see the activities of individual organizations and people, and then if not treated carefully, could leak into model output,” said Bobby Samuels, whose company Protege specializes in navigating the complex regulatory and legal landscape of real-world data.
Beyond anonymization, there’s a chance a person’s chats could be “regurgitated” by AI models. One 2020 study from institutions including OpenAI and Google showed that large language models can unintentionally memorize sequences from their training data verbatim, which can then be extracted with the right prompts.
Reinforcement learning ‘gyms’
The demand for this real-world enterprise data has spurred a new industry of “reinforcement learning gyms,” which specialize in using defunct company data to build simulated environments where AI agents can practice navigating real workplaces. It’s becoming a big money maker: Anthropic is contemplating spending $1 billion on so-called “RL gyms” this year, The Information has reported. There are already around 50 nascent startups in the space, and data labeling companies like Mercor and micro1, whose revenue mostly comes from paying humans to generate training data, are also getting into the game. Some RL gym startups are already commanding hefty valuations: Prime Intellect’s is now over $1 billion, according to a source familiar with the matter, and Fleet is in talks to raise at a $750 million valuation, according to The Information. Prime Intellect did not respond to a request for comment.
One company, AfterQuery, sells a series of off-the-shelf “worlds” to AI labs, with names such as “Big Tech World,” “Finance World,” and “Tax World,” where an AI agent practices navigating a digital office, interacting with simulated user agents, and learning to solve real-world problems.
An example task reads like middle-management drudgery: the agent is told to plan a birthday for a coworker named Bob. But unbeknownst to the AI agent, another coworker is also planning one. To make matters worse, the AI agent has forgotten when Bob’s birthday is. To succeed, it has to message other employees, do some detective work and then chat with others to decide whether to join forces or abandon the original plan.
Seen in that light, maybe the hours you thought you were wasting on Slack could actually be the most enduring work you ever did. That is, unless the AI model—having memorized your data a little too well—accidentally reveals to the next generation of office workers that you were the coworker who forgot Bob’s birthday.
