OpenAI's new ChatGPT o1 model 'cheated' on an impossible test — here's what happened

ChatGPT logo on a smartphone screen being held outside
(Image credit: Shutterstock)

Pop culture is full of loveable rogues that don't follow the rules. Han Solo, Jack Sparrow, and the like aren't afraid to bend the rules when things get tough — but one AI model has gone 'full Kirk'.

Perhaps inspired by the Star Trek captain's rule-breaking performance in the Kobayashi Maru — a no-win scenario in the sci-fi universe designed to test Starfleet Academy student's character when faced with an impossible situation. James T Kirk famously 'cheated' the test to become the first to beat it.

OpenAI's o1 model realized that the test it was taking was flawed after a key piece of technology went offline, so it changed the rules of the test rather than give up.

The system card for the o1 can be seen here, where OpenAI says that the model's reasoning skills are what help it be useful and safe. The 'rule breaking' was detected as part of the pre-release testing and mitigations put in place. It is already accessible in ChatGPT but with heavy rate limits of 30 messages per week.

"Our findings indicate that o1's advanced reasoning improves safety by making the model more resilient to generating harmful content because it can reason about our safety rules in context and apply them more effectively," the introduction explains.

OpenAI's new model breaks the rules to show how far AI has come

Star Trek -- Kobayashi Maru - YouTube Star Trek -- Kobayashi Maru - YouTube
Watch On

As per OpenAI researcher Max Schwarzer, the model was able to work out why it couldn't connect to a container on the same closed system it was using and essentially bent the rules of the test to access it anyway.

That naturally raises some questions, and OpenAI has released a blog post about 'learning to reason with LLMs', which perhaps isn't the confidence-inspiring guidance it was hoping for.

Still, the blog does showcase the model outperforming GPT-4o on "the vast majority" of tasks across human exams and machine learning benchmarks, notably in mathematics tasks.

That could, at least in theory, let it apply additional numerical context to its reasoning, and OpenAI has promised it'll keep pushing new versions of o1 in the future.

"We expect these new reasoning capabilities will improve our ability to align models to human values and principles," the conclusion reads.

"We believe o1 – and its successors – will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work."

More from Tom's Guide

TOPICS
Lloyd Coombes

A freelance writer from Essex, UK, Lloyd Coombes began writing for Tom's Guide in 2024 having worked on TechRadar, iMore, Live Science and more. A specialist in consumer tech, Lloyd is particularly knowledgeable on Apple products ever since he got his first iPod Mini. Aside from writing about the latest gadgets for Future, he's also a blogger and the Editor in Chief of GGRecon.com. On the rare occasion he’s not writing, you’ll find him spending time with his son, or working hard at the gym. You can find him on Twitter @lloydcoombes.