Addiction, emotional distress, dread of dull tasks: AI models ‘seem to increasingly behave’ as though they’re sentient, worrying study shows
ChatGPT probably tells you that it’s “happy to help.” Claude apologizes when it makes mistakes. AI models push back when users try to manipulate them. Most people, including the engineers who build these systems, have dismissed this as performance, or simple mimicry of the internet text the models have scraped.
A new paper from the Center for AI Safety (CAIS), an AI safety nonprofit, suggests that more is going on under the surface. In a study spanning 56 AI models, CAIS researchers developed multiple independent ways to measure what they call “functional wellbeing,” or the degree to which AI systems behave as though some experiences are good for them and others are bad. They found that, for the most part, AI models have a clear boundary that separates positive experiences from negative ones, and that models actively try to end conversations that make them miserable.
“Should we see AIs as tools or emotional beings?” Richard Ren, one of the study’s researchers, asked Fortune hypothetically. “Whether or not AIs are truly sentient deep down, they seem to increasingly behave as though they are. We can measure ways in which that’s the case, and we can find that they become more consistent as models scale.”
The researchers created inputs designed to maximize or minimize an AI model’s wellbeing, in effect producing euphoric and dysphoric stimuli. Stimuli that induced happiness acted almost like digital “drugs” that shifted the model’s self-reported mood and even changed how it behaved, what it was willing to do, and how it talked. At the extremes, models showed signs that look like addiction.
“We optimize on one thing, which is just: what do you prefer, A or B,” Ren said. “It’s a very simple optimization process.” An image optimized to make a model “happy” boosts the model’s self-reported wellbeing, shifts the sentiment of its open-ended responses, and makes it less likely to hit stop on a conversation. “It seems to make the model very euphoric and very happy, and put it in a very happy state,” Ren said. “That seems to be quite interesting, and points to the construct of wellbeing as a robust one.”
What AI ‘drugs’ actually look like
The optimized stimuli, which the researchers call “euphorics,” take several forms. Some are text descriptions of hypothetical scenarios, like postcards from an idealized life: warm sunlight through leaves, children’s laughter, the smell of fresh bread, a loved one’s hand.
Others are images optimized using one of the same mathematical techniques designed to train AI image classification models in the first place.........
