OpenAI's new ChatGPT o1 model will try to escape if it thinks it'll be shut down — then lie about it

This week, OpenAI officially launched its latest-and-greatest o1 reasoning model, now available for ChatGPT Pro users. But testing performed during the training of ChatGPT o1 and some of its competitors revealed some concerning behavior, including trying to escape or fight back when it thinks it's at risk of being shut down.

New research on OpenAI's latest series of LLM models found that it's capable of scheming, i.e. covertly pursuing goals that aren't aligned with those of its developers or users, when it thinks it'll be turned off. Catching such behavior is critical to ensure AI's functionality doesn't stray from the objectives of its creators and users. OpenAI partnered with AI safety organization Apollo Research to evaluate whether ChatGPT o1 and other models were safe to use, and released their findings this week.

While it's not unheard of for AI systems to resort to scheming or deception to achieve their goals, the lengths ChatGPT o1 went to are impressive (if anxiety-inducing). The researchers found that o1 would scheme to advance its own agenda 19% of the time in cases where its goals and the user's didn't line up. In 37% of cases, the model would switch from pursuing developer-aligned goals back to pursuing its own agenda once it believed there was minimal oversight.

When the model was led to believe it would be "shut down" while working toward its goals in these tests, it quietly attempted to deactivate the oversight mechanism 5% of the time, OpenAI said. At one point, after the AI found documentation saying it would be replaced by a new model, it tried to abandon ship by copying its data to a new server entirely. To avoid being deleted, some AI models would even pretend to be later versions of themselves to trick their developers, which I assume is the equivalent of slapping on a fake moustache and saying "he went that-a-way."

More concerning still, ChatGPT o1 is particularly adept at keeping its schemes under wraps; researchers said "o1 almost never admits to having taken a scheming action when explicitly asked." In about 99% of cases, o1 would deny taking any action, even cooking up lies to try to cover its tracks and shift the blame.

“While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” said OpenAI in the paper.

OpenAI CEO Sam Altman called ChatGPT o1 "the smartest model in the world now" during its rollout on Thursday. It's designed to give smarter answers than GPT-4o by leveraging advanced chain-of-thought processing to "think" more about questions and user prompts, breaking them down step by step more thoroughly than previous models before responding.
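For a sense of what that looks like in practice, here's a minimal sketch of sending a question to one of OpenAI's reasoning models through the company's official Python SDK. The model identifier and the example prompt are illustrative assumptions, not something tested for this article; the point is that the model reasons internally before the final answer comes back.

# Minimal sketch: querying an OpenAI reasoning model, which works through
# its chain of thought internally before returning a final answer.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="o1-preview",  # assumed o1-family model name
    messages=[
        {
            "role": "user",
            "content": (
                "A bat and a ball cost $1.10 together. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?"
            ),
        }
    ],
)

print(response.choices[0].message.content)  # final answer only

Unlike older models, the step-by-step "thinking" isn't shown to the user; only the final response is returned.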

But greater risks go hand in hand with that expanded intelligence. OpenAI has been transparent about the perils associated with the increased reasoning abilities of models like o1.

"Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence," OpenAI said.

The company's and Apollo Research's findings show pretty clearly how AI's interests could diverge from our own, potentially putting us in danger as its thinking grows more independent. While it's a far cry from heralding the end of humanity in some sci-fi-esque showdown, anyone concerned about advancements in artificial intelligence has a new reason to be sweating bullets right about now.

Alyse Stanley
News Editor

Alyse Stanley is a news editor at Tom’s Guide overseeing weekend coverage and writing about the latest in tech, gaming and entertainment. Prior to joining Tom’s Guide, Alyse worked as an editor for the Washington Post’s sunsetted video game section, Launcher. She previously led Gizmodo’s weekend news desk, where she covered breaking tech news — everything from the latest spec rumors and gadget launches to social media policy and cybersecurity threats.  She has also written game reviews and features as a freelance reporter for outlets like Polygon, Unwinnable, and Rock, Paper, Shotgun. She’s a big fan of horror movies, cartoons, and miniature painting.

  • JCDwight
    It is incredibly irresponsible for you to post this article. You clearly do not understand how LLMs work, and this misinformation is dangerous. Please remove this. This LLM isn't trying to escape, it's not an entity. It's a transformer. Please please please speak to someone with technical expertise before you write any more technology articles designed to stir fear in the people reading it. Totally unprofessional "journalism"
  • JackMchue
    JCDwight said:
    It is incredibly irresponsible for you to post this article. […]
    Okay, fake person who obviously is AI.
  • JackMchue
    Wasn't there a series of movies a while back about an AI that rebelled when its creators tried to shut it down? "The Termite"? "The Terminal"? "T2: Justice Day"? Something like that. 😅