Nobody knows how ChatGPT thinks — but OpenAI says it’s closer to cracking the mystery

(Image credit: Shutterstock)

Creators of AI chatbots like ChatGPT can explain how they train them, and even how the underlying technology works, but they can’t fully explain what their creations do with the information they've been trained on.

It is an important problem to solve, because AI developers are often taken by surprise by what their creations can and can't do. For example, the Udio team created an AI music model and then found it could also write and perform standup comedy.

Even the leaders in the field struggle to work out what LLMs and other frontier models are doing with that information, but it seems OpenAI has taken a first step toward decoding the mystery.

What has OpenAI discovered?

Graphic to show encoding and decoding of data with AI models — (Image credit: OpenAI)

We currently don't understand how to make sense of the neural activity within language models.
OpenAI

OpenAI approached the problem with a technique called sparse autoencoders: machine learning models that learn to describe a model's internal activity using only a few ‘more important’ features at a time. Ordinary autoencoders have no such constraint, so any number of features can be active at once, which makes the individual features less useful for interpretation.

Say you’re discussing cars with a friend. You still know how to make your favorite dish, but that concept is unlikely to come up in the car discussion.

OpenAI said sparse autoencoders find the smaller set of features or concepts that matter for generating an answer to a prompt, similar to the smaller set of concepts a person relies on in any particular discussion.
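To make the idea concrete, here is a minimal sketch of the forward pass of a sparse autoencoder in plain Python. It is an illustration under stated assumptions, not OpenAI's implementation: the article does not say how sparsity is enforced, so this sketch assumes a simple "keep the k largest activations" rule, and the function names, toy sizes and random (untrained) weights are all hypothetical.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def encode(x, w_enc, b_enc):
    """Compute one non-negative activation per candidate feature."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_enc, b_enc)]

def keep_top_k(acts, k):
    """Sparsity step (assumed rule): zero everything except the k largest activations."""
    top = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]
    keep = set(top)
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]

def decode(features, w_dec, b_dec):
    """Rebuild the input as a weighted sum of the active features' directions."""
    out = list(b_dec)
    for f, direction in zip(features, w_dec):
        if f != 0.0:
            for j, d in enumerate(direction):
                out[j] += f * d
    return out

def sparse_autoencode(x, w_enc, b_enc, w_dec, b_dec, k):
    features = keep_top_k(encode(x, w_enc, b_enc), k)
    return features, decode(features, w_dec, b_dec)

# Toy, untrained example: a 4-number "activity" vector and 16 candidate features.
rng = random.Random(0)
d_model, n_features, k = 4, 16, 3
w_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
b_enc = [0.0] * n_features
w_dec = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
b_dec = [0.0] * d_model
x = [rng.gauss(0, 1) for _ in range(d_model)]

features, reconstruction = sparse_autoencode(x, w_enc, b_enc, w_dec, b_dec, k)
active = [i for i, f in enumerate(features) if f != 0.0]
print("active features:", active)
```

Because the weights here are random, the reconstruction is meaningless. In a real system the weights would be trained so that the few active features rebuild the original activity as closely as possible, which is what pushes each feature toward standing for a single concept.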

However, while sparse autoencoders can find features in a given model, that’s only one step towards interpreting it. More work is needed to understand how a model fully uses those features.

OpenAI thinks this work is important because understanding how models work means they can find better ways to approach model safety.

One part of a bigger picture

We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse autoencoders at scale, disentangling GPT-4’s internal representations into 16 million features—which often appear to correspond to understandable concepts.… pic.twitter.com/UFP0EfEKSL
June 6, 2024

Another challenge is training sparse autoencoders, which is complicated for several reasons, including the extra computational power needed to handle the necessary restrictions and the need to avoid overfitting.

However, OpenAI says it has developed new state-of-the-art methodologies that allow it to scale sparse autoencoders to tens of millions of features on frontier AI models such as GPT-4 or GPT-4o.

To check the interpretability of such features, OpenAI listed fragments of documents where these features activate. These included phrases related to price increases and rhetorical questions.
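As a rough sketch of what that kind of check can look like, the snippet below ranks the tokens of a text by how strongly one feature fires on them and prints the surrounding fragments. The function name, the example sentence and the activation numbers are all made up for illustration; they are not OpenAI's data or tooling.

```python
def top_activating_fragments(tokens, activations, n=3, window=2):
    """Return up to n (activation, fragment) pairs around the strongest tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: activations[i], reverse=True)
    fragments = []
    for i in ranked[:n]:
        if activations[i] <= 0.0:
            break  # the feature is silent on everything that remains
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        fragments.append((activations[i], " ".join(tokens[start:end])))
    return fragments

tokens = ("Rents rose sharply last year while wages stayed flat "
          "and prices climbed again").split()
# Hypothetical per-token activations for a single "price increase" feature.
activations = [0.1, 0.9, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.1]

for score, fragment in top_activating_fragments(tokens, activations, n=2):
    print(score, fragment)
```

A person reading the fragments then judges whether they share an understandable theme, which is the step the article describes as still lacking a good way to validate.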

What happens next?

Sam Altman CEO of OpenAI — (Image credit: Getty Images)

While it’s a first step that shows what large language models are focusing on, OpenAI also admits there are several limitations.

For starters, many of the features they discovered are still hard to interpret, with many activating with no clear pattern. Furthermore, they don’t yet have good ways to check the validity of interpretations.

In the short term, OpenAI hopes the features it found can help monitor and steer language model behaviors.

For the long term, OpenAI wants interpretability to provide new ways to reason about model safety and robustness. Understanding how and why an AI model works in the way it does will help people trust it when it’s making important decisions.


Christoph Schwaiger is a journalist who mainly covers technology, science, and current affairs. His stories have appeared in Tom's Guide, New Scientist, Live Science, and other established publications. Always up for joining a good discussion, Christoph enjoys speaking at events or to other journalists and has appeared on LBC and Times Radio among other outlets. He believes in giving back to the community and has served on different consultative councils. He was also a National President for Junior Chamber International (JCI), a global organization founded in the USA. You can follow him on Twitter @cschwaigermt.