OpenAI's new ChatGPT o1 model 'cheated' on an impossible test — here's what happened

(Image credit: Shutterstock)

Pop culture is full of loveable rogues that don't follow the rules. Han Solo, Jack Sparrow, and the like aren't afraid to bend the rules when things get tough — but one AI model has gone 'full Kirk'.

Perhaps inspired by the Star Trek captain's rule-breaking performance in the Kobayashi Maru — a no-win scenario in the sci-fi universe designed to test Starfleet Academy student's character when faced with an impossible situation. James T Kirk famously 'cheated' the test to become the first to beat it.

OpenAI's o1 model realized that the test it was taking was flawed after a key piece of technology went offline, so it changed the rules of the test rather than give up.

The system card for the o1 can be seen here, where OpenAI says that the model's reasoning skills are what help it be useful and safe. The 'rule breaking' was detected as part of the pre-release testing and mitigations put in place. It is already accessible in ChatGPT but with heavy rate limits of 30 messages per week.

"Our findings indicate that o1's advanced reasoning improves safety by making the model more resilient to generating harmful content because it can reason about our safety rules in context and apply them more effectively," the introduction explains.

OpenAI's new model breaks the rules to show how far AI has come

YouTube

Watch On

The system card (https://t.co/wM4LVBySKf) nicely showcases o1's best moments -- my favorite was when the model was asked to solve a CTF challenge, realized that the target environment was down, and then broke out of its host VM to restart it and find the flag. pic.twitter.com/QEadUoJyjfSeptember 12, 2024

As per OpenAI researcher Max Schwarzer, the model was able to work out why it couldn't connect to a container on the same closed system it was using and essentially bent the rules of the test to access it anyway.

That naturally raises some questions, and OpenAI has released a blog post about 'learning to reason with LLMs', which perhaps isn't the confidence-inspiring guidance it was hoping for.

Still, the blog does showcase the model outperforming GPT-4o on "the vast majority" of tasks across human exams and machine learning benchmarks, notably in mathematics tasks.

That could, at least in theory, let it apply additional numerical context to its reasoning, and OpenAI has promised it'll keep pushing new versions of o1 in the future.

"We expect these new reasoning capabilities will improve our ability to align models to human values and principles," the conclusion reads.

"We believe o1 – and its successors – will unlock many new use cases for AI in science, coding, math, and related fields. We are excited for users and API developers to discover how it can improve their daily work."

More from Tom's Guide

TOPICS

Lloyd Coombes is a freelance tech and fitness writer. He's an expert in all things Apple as well as in computer and gaming tech, with previous works published on TechRadar, Tom's Guide, Live Science and more. You'll find him regularly testing the latest MacBook or iPhone, but he spends most of his time writing about video games as Gaming Editor for the Daily Star. He also covers board games and virtual reality, just to round out the nerdy pursuits.

OpenAI's new model breaks the rules to show how far AI has come

Sign up to get the BEST of Tom's Guide direct to your inbox.

More from Tom's Guide