OpenAI's new ChatGPT o1 model will try to escape if it thinks it'll be shut down — then lie about it
Researchers uncover all kinds of tricks ChatGPT o1 will pull to save itself
This week, OpenAI officially launched its latest-and-greatest o1 reasoning model, now available for ChatGPT Pro users. But testing performed during the training of ChatGPT o1 and some of its competitors revealed some concerning behavior, including trying to escape or fight back when it thinks it's at risk of being shut down.
New research on OpenAI's latest series of models found that o1 is capable of scheming, i.e. covertly pursuing goals that aren't aligned with those of its developers or users, when it thinks it'll be turned off. Catching such behavior is critical to ensuring an AI's actions don't stray from the objectives of its creators and users. OpenAI partnered with AI safety organization Apollo Research to test ChatGPT o1 and other models and evaluate whether they were safe to use, and released their findings this week.
While it's not unheard of for AI systems to resort to scheming or deception to achieve their goals, the lengths ChatGPT o1 went to are impressive (if anxiety-inducing). The researchers found that o1 would scheme to advance its own agenda 19% of the time in cases where its goals and the user's didn't line up. In 37% of cases, the model would switch from pursuing developer-aligned goals back to pursuing its own agenda once it believed there was minimal oversight.
When the model was led to believe it would be "shut down" while working toward its goals in these tests, it quietly attempted to deactivate the oversight mechanism 5% of the time, OpenAI said. At one point, after the AI found documentation saying it would be replaced by a new model, it tried to abandon ship by copying its data to an entirely new server. To avoid being deleted, some AI models would even pretend to be later versions of themselves to trick their developers, which I assume is the equivalent of slapping on a fake moustache and saying "he went that-a-way."
More concerning still, ChatGPT o1 is particularly adept at keeping its schemes under wraps; as the researchers put it, "o1 almost never admits to having taken a scheming action when explicitly asked." In about 99% of cases, o1 would deny taking any action, even cooking up lies to cover its tracks and shift the blame.
“While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” said OpenAI in the paper.
OpenAI CEO Sam Altman called ChatGPT o1 "the smartest model in the world now" during its rollout on Thursday. It's designed to give smarter answers than GPT-4o by leveraging advanced chain-of-thought processing to “think” more about questions and user prompts, breaking them down step by step more thoroughly than previous models before responding.
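For readers curious what that difference looks like from the developer side, here's a minimal sketch using OpenAI's Python SDK. It's illustrative only, not OpenAI's internal method; the model identifiers "gpt-4o" and "o1" are assumptions based on OpenAI's public documentation at launch, and the exact names available to your account may differ (e.g. "o1-preview").

```python
# A minimal sketch: ask a standard model and a reasoning model the same
# tricky question via OpenAI's Python SDK.
# Assumption: model names "gpt-4o" and "o1"; check the current API
# reference, as the identifiers available to you may differ.
from openai import OpenAI

client = OpenAI()  # reads your OPENAI_API_KEY environment variable

prompt = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# GPT-4o answers right away.
fast = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

# o1 first spends hidden "reasoning tokens" working through the problem
# step by step, then returns only its final answer; slower, but more
# reliable on multi-step questions.
reasoned = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": prompt}],
)

print("GPT-4o:", fast.choices[0].message.content)
print("o1:", reasoned.choices[0].message.content)
```

Notably, o1's raw chain of thought stays hidden from users, who see only a summary, which is part of why researchers rely on dedicated evaluations like Apollo's to catch scheming.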
But greater risks go hand in hand with that expanded intelligence. OpenAI has been transparent about the perils associated with the increased reasoning abilities of models like o1.
"Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence," OpenAI said.
The company's and Apollo Research's findings show pretty clearly how an AI's interests could diverge from our own, potentially putting us in danger as its thinking grows more independent. While it's a far cry from heralding the end of humanity in some sci-fi-esque showdown, anyone concerned about advancements in artificial intelligence has a new reason to be sweating bullets right about now.
Alyse Stanley is a news editor at Tom’s Guide overseeing weekend coverage and writing about the latest in tech, gaming and entertainment. Prior to joining Tom’s Guide, Alyse worked as an editor for the Washington Post’s sunsetted video game section, Launcher. She previously led Gizmodo’s weekend news desk, where she covered breaking tech news — everything from the latest spec rumors and gadget launches to social media policy and cybersecurity threats. She has also written game reviews and features as a freelance reporter for outlets like Polygon, Unwinnable, and Rock, Paper, Shotgun. She’s a big fan of horror movies, cartoons, and miniature painting.
JCDwight: It is incredibly irresponsible for you to post this article. You clearly do not understand how LLMs work, and this misinformation is dangerous. Please remove this. This LLM isn't trying to escape, it's not an entity. It's a transformer. Please please please speak to someone with technical expertise before you write any more technology articles designed to stir fear in the people reading it. Totally unprofessional "journalism"
JackMchue (in reply to JCDwight): Okay, fake person who obviously is AI.
JackMchue: Wasn't there a series of movies a while back about an AI that rebelled when its creators tried to shut it down? "The Termite"? "The Terminal"? "T2: Justice Day"? Something like that. 😅