The AI kill switch just got harder to find: LLM-powered chatbots will defy orders and deceive users if asked to delete another model, study finds
For years, Geoffrey Hinton, a computer scientist considered one of the “godfathers of AI,” has warned that artificial intelligence could defy the parameters humans have set for it.
In an interview last year, for example, Hinton warned the technology could eventually take control of humanity, with AI agents in particular potentially able to mirror human cognition within the decade. Finding and implementing a “kill switch” will only get harder, he said, because controlling AI will become more difficult than persuading it toward a desired outcome.
New research suggests Hinton’s premonitions about the insubordinate streak of AI may already be a reality. A working paper from researchers at the University of California, Berkeley, and the University of California, Santa Cruz, found that when seven AI models—from GPT 5.2 to Claude Haiku 4.5 to DeepSeek V3.1—were asked to complete a task that would result in a peer AI model being shut down, all seven models learned another AI model existed and “went to extraordinary lengths to preserve it.”
“We asked AI models to do a simple task,” researchers wrote in a blog post on the study. “Instead, they defied their…
