OpenAI confirms AI agents are coming next year — what it means for you
A milestone for AI
OpenAI is on target to launch ‘agents’ next year. These are independent artificial intelligence models capable of performing a range of tasks without human input and could be available in ChatGPT soon.
During its DevDay event in San Francisco, CEO Sam Altman said “2025 is when agents will work,” and the company demonstrated an early example of what agents could do by having a voice assistant make a call and order strawberries on its own.
The company says there are five stages on the way to Artificial General Intelligence (AGI) and that we are currently at stage two, where AI can reason through an idea before responding. Agents are stage three: the AI can reason through an idea and, as part of planning its response, go off and perform actions independently.
Altman has previously said that the o1 family of models is powerful enough to help build agent-grade models, and that we should see the first of them emerge soon. The bigger challenge, and the likeliest cause of delay, will be ensuring they are aligned with human values and can’t "go rogue," performing actions not beneficial to humanity.
What is the point of AI agents?
OpenAI Realtime API makes a call to order strawberries at Dev Day, which is awesome... but the response latency is ~2s (cutting-edge is <400ms) and the voice doesn't feel as good as "advanced voice mode", it's still devoid of emotions. (from @swyx) pic.twitter.com/4S3MOMiMZ6 (October 1, 2024)
Building useful and functional agents is something every AI lab is working towards. For example, an agent could not only write a book but also go off and work out how to self-publish it, including signing up for an Amazon account to share it on Kindle Direct.
Agents are a necessary step on the path to AGI because an AGI will need to be able to carry out whatever tasks it decides are needed to achieve its goal. Altman said during DevDay that "if we can make an AI system that is better at AI research than OpenAI is, then that feels like a real milestone."
Getting to that stage involves continuously building on previous generations of AI. Altman said that the o1 models will be what makes agents actually happen and that when people start to use the agents it “will be a big deal,” adding that “People will ask an agent to do something that would have taken them a month, and it'll take an hour.”
He predicts people might have one agent performing specific tasks, and another agent on different duties until they scale up to 10 or 100 agents that can take over various aspects of daily duties. We have already seen some element of how this might play out in watching o1 reason through ideas and offer suggestions.
Alignment is the biggest blocker to agents
Today at DevDay SF, we’re launching a bunch of new capabilities to the OpenAI platform: pic.twitter.com/y4cqDGugju (October 1, 2024)
OpenAI puts every new model it releases through a rigorous safety testing process, grading it against a set of criteria that determine whether it is safe to release. This has caused delays in the past and required guardrails to be placed on models to prevent certain actions.
One clear example of this is the GPT-4o model, which is capable of generating images natively, producing music and even mimicking voices, but all of those features are blocked by guardrails. You know it can do these things because sometimes the guardrails break.
A broken guardrail will be a bigger issue in the case of agents, as they may have access to your bank account, the ability to go online and perform tasks, or even the ability to hire someone on Fiverr to do a task for them, using voice mode to give instructions.
In the DevDay example we saw a voice bot call a seller (played by a researcher), order 400 chocolate-covered strawberries, give a specific address and say it would pay in cash. It declared its status as an AI assistant, but at times you would struggle to tell it was AI.
Speaking to the FT, OpenAI’s chief product officer Kevin Weil said: “We want to make it possible to interact with AI in all of the ways that you interact with another human being,” adding that the agentic systems will hit the mainstream next year and make that goal possible.
Weil says one guardrail on agent systems would be to require them to always declare themselves as AI, although if you’ve ever heard Advanced Voice beatbox or seen GPT-4o generate a perfect vector graphic, you’ll know those restrictions aren’t always perfect.
I am personally looking forward to the arrival of agents. I like to code, and agents will allow me to get projects built more quickly by taking over some of the boring testing stages. They will also allow me to finally work through some of my quarter of a million unread emails. If Skynet is the price I have to pay to reach inbox zero, bring on the Terminators.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?