Anthropic co-founder warns of ‘unsettling’ AI model emotions: ‘I don’t know what it means’
Anthropic co-founder Christopher Olah has warned of mysterious and “unsettling” structures inside AI models at the launch of Pope Leo XIV's encyclical Magnifica Humanitas.
What makes the findings unsettling is that even the researchers cannot explain the true nature of these structures and emotions.
Speaking at the Vatican encyclical launch, Olah shared disturbing findings from his team’s research on Claude Sonnet 4.5. During the experimental phase, the researchers found 171...
