OpenAI reveals Swarm — a breakthrough new method for getting AI to do things on your behalf
What happens when one AI is just not enough?
OpenAI has released a new AI technology called the Swarm Framework. This open-source project marks a new milestone in the ongoing AI gold rush.
The framework offers developers a comprehensive set of tools for creating multi-agent AI systems that can complete tasks and goals while cooperating autonomously.
The launch is a surprisingly low-key release that could have profound effects on how we interact with AI in the future. OpenAI makes it clear that this is just a research and educational experiment — but they said that about ChatGPT in 2022.
OpenAI Swarm gives us a glimpse at a future version of ChatGPT where you can ask the AI a question and it can go off and search multiple sources, coming back with a comprehensive answer. It could also perform tasks on different websites or in the real world on your behalf.
A quiet revolution in AI technology
introducing swarm: an experimental framework for building, orchestrating, and deploying multi-agent systems. 🐝 https://t.co/97n4fehmtM (October 11, 2024)
There’s a quiet revolution happening in the backrooms of the AI business, and it will surprise a lot of people when it arrives. We’ve had a glimpse of it recently, but the full impact is still to come. And no, it’s not the mystical shimmer of AGI that everyone seems to be focusing on, but something heading in a different direction.
Deep in the bowels of AI Inc., researchers are racing to create cooperative AI agents: systems that work together to get tasks done over time, rather than offering instant answers as today’s chatbots do. So what’s the big deal?
Well, to understand why this matters, it helps to understand the limitations of the current AI that most of us know and love (or hate). Most AI use currently revolves around large language models (LLMs), which are trained to provide general services to users.
These services range from text translation to report writing and help with math homework. The models offer a ‘jack of all trades, master of none’ solution, and they’re perfect for everyday use by Joe Public.
Welcome to System 2 AI
The next step up from a simple LLM is the ‘fine-tuned’ model, which is focused on a specific domain, say a medical bot or a tool for providing strategic financial advice.
These specialized products are typically created in-house by large corporations, although there are a growing number of specialized AI tools reaching the general market in the form of subscription services.
A good example is Lyrebird, which is specifically trained to listen in on a doctor’s patient consultation — with permission — and afterward, transcribe it into properly structured text for the patient’s clinical notes.
The latest generation of AI, typically referred to as System 2 tech, incorporates a slew of powerful new capabilities. Most people will have heard about OpenAI’s new o1 model, previously code-named Strawberry, which is designed to spend more time ‘thinking’ about a problem before giving a solution.
This reasoning ability is seen as a crucial part of System 2 AI models. Developers are now seeing longer reasoning times as a massive benefit to the quality of AI output, a stark contrast to a year ago when fast-is-best was the order of the day.
A need for reasoning and automation
Lengthy reasoning and problem-solving are only part of the new AI equation. Alongside these new features, we are about to witness an explosion in agentic AI. These software agents will be able to perform tasks and achieve user-set goals autonomously, without needing a prompt for each step. If this sounds like science fiction, then know that agents are already in use in specific task domains.
One example is Factory.ai, which offers software engineering agents, called Droids, to automate the development and deployment of enterprise applications. The company estimates that its system can save around $18,000 a year per software engineer employed. Powerful stuff.
The new Swarm Framework aims to make this kind of tool easier to create and deploy, so we can expect to see a flood of these agent solutions come to market over the next two years or so.
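One common pattern in multi-agent systems is the handoff, where one agent passes a task to another that is better suited to it. The sketch below illustrates that idea in plain Python. It does not use Swarm's actual API or any language model: the Agent class, the run loop, the keyword routing rule, and the agent names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """Hypothetical agent: a name plus a handler function.

    The handler returns either a final answer (a string) or another
    Agent to hand the task to. A real system would call an LLM here.
    """
    name: str
    handle: Callable[[str], object]


def run(agent: Agent, task: str, max_handoffs: int = 5):
    """Follow handoffs until some agent returns a final answer."""
    trail = [agent.name]  # records which agents touched the task
    for _ in range(max_handoffs + 1):
        result = agent.handle(task)
        if isinstance(result, Agent):
            agent = result  # hand the task to the next agent
            trail.append(agent.name)
            continue
        return result, trail
    raise RuntimeError("too many handoffs")


# Two illustrative specialists with canned replies.
weather_agent = Agent("weather", lambda task: "Weather agent: looking up the forecast.")
math_agent = Agent("math", lambda task: "Math agent: working through the sum.")


def triage(task: str):
    """Toy routing rule (an assumption): keywords stand in for LLM judgment."""
    text = task.lower()
    if "weather" in text:
        return weather_agent
    if any(ch.isdigit() for ch in text):
        return math_agent
    return "Triage agent: answering directly."


triage_agent = Agent("triage", triage)

answer, trail = run(triage_agent, "What's the weather in London?")
print(trail, answer)
```

The cap on handoffs is a design choice in this sketch, so that two agents cannot pass a task back and forth forever.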
The key to agent acceptance will come from the increased power of the backend LLMs. The newer models provide the kind of autonomy needed for agents to really take off. This is going to power a revolution in software applications.
Sequoia Capital, in its recent report on the LLM sector, talked about a shift from companies renting cloud software as a service (SaaS), for example Adobe Creative Cloud or Microsoft Office, to a new paradigm which others call Outcome as a Service (OaaS). Instead of AI answering questions, it will go off and do jobs for us, only getting paid when the task is done.
For example, the new Sierra AI agentic system is a customer support bot that gets paid for each successful customer interaction, not on a monthly rental basis. It communicates by voice, in the user’s language, and can access all the information it needs to deal with everyday queries. Where it can’t, it seamlessly passes the inquiry on to a human support manager.
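The pay-per-outcome model described above can be sketched in a few lines of Python. This is an illustration only, not Sierra's system: the fee, the canned answers, and every function name here are hypothetical. The agent answers the queries it recognizes, escalates the rest to a human, and bills only for the ones it resolved.

```python
from dataclasses import dataclass

# Hypothetical price per successfully resolved query (not a real figure).
FEE_PER_RESOLUTION = 1.50

# Hypothetical knowledge base standing in for the agent's real data access.
KNOWN_ANSWERS = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "track order": "Your tracking link is in the confirmation email.",
}


@dataclass
class Outcome:
    resolved: bool  # True only if the agent handled the query itself
    reply: str


def handle_query(query: str) -> Outcome:
    """Answer a known topic, or escalate to a human support manager."""
    text = query.lower()
    for topic, answer in KNOWN_ANSWERS.items():
        if topic in text:
            return Outcome(resolved=True, reply=answer)
    return Outcome(resolved=False, reply="Passing you to a human colleague.")


def bill(outcomes: list[Outcome]) -> float:
    """Charge only for resolved queries; escalations cost nothing."""
    return FEE_PER_RESOLUTION * sum(1 for o in outcomes if o.resolved)


queries = [
    "How do I reset password?",
    "Can I track order 1234?",
    "I want to dispute a charge.",
]
outcomes = [handle_query(q) for q in queries]
print(bill(outcomes))
```

With these made-up numbers, two of the three queries are resolved by the agent, so only those two are charged for.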
Changing our lives forever
Dario Amodei, the CEO and co-founder of OpenAI rival Anthropic, sums it up best when he talks about the type of powerful AI that’s about to enter our lives.
“[It] has all the 'interfaces' available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations...it does not just passively answer questions; instead, it can be given tasks that take hours, days, or weeks to complete, and then goes off and does those tasks autonomously.”
These are not new ideas. Famed AI scientist Ilya Sutskever, a co-founder of OpenAI, was talking about this kind of functionality back in 2018, when AI was just emerging from its sci-fi, holodeck beginnings.
He talked about ‘a society of agents’ which will integrate into our daily lives using a growing set of communication skills. We’ve already seen the first signs of this vision with the arrival of OpenAI’s advanced voice mode, which is identical to chatting with a human in almost every way.
However, Amodei takes it a stage further and suggests that we could soon be witnessing millions of agents collaborating in what he calls a ‘country of geniuses in a datacenter’.
It’s an improbably grandiose concept, but taken together with the lightning speed of current AI development and the non-stop product and research releases, it’s obvious what future the scientists are aiming for. The only thing we don’t know is the exact time-frame, but all the signs are pointing towards sooner rather than later.
Final thoughts
The final word should go to Amodei, who, to be fair, does try to mitigate the hype and keep our feet a little more on the ground. While talking about the huge potential upheaval in everything from health to economics and governance, he makes it clear that there are still major impediments to the kind of progress that is possible from ‘powerful AI’ (he doesn’t like the term AGI).
“The speed at which a major project—for example developing a cancer cure—can be completed may have an irreducible minimum that cannot be decreased further even as intelligence continues to increase...some things are inherently unpredictable or chaotic and even the most powerful AI cannot predict or untangle them substantially better than a human...there are certain physical laws that appear to be unbreakable. It’s not possible to travel faster than light. Pudding does not unstir.”
Anthropic has made its name by delivering ‘safe’ AI products, whose primary goal is to deliver the benefits of artificial intelligence while minimizing the risks. It’s good to know that at least some of the people delivering this astonishing revolution, maybe the biggest ever, are taking time to consider the true ramifications of what they’re building.
Nigel Powell is an author, columnist, and consultant with over 30 years of experience in the technology industry. He produced the weekly Don't Panic technology column in the Sunday Times newspaper for 16 years and is the author of the Sunday Times book of Computer Answers, published by Harper Collins. He has been a technology pundit on Sky Television's Global Village program and a regular contributor to BBC Radio Five's Men's Hour.
He has an Honours degree in law (LLB) and a Master's Degree in Business Administration (MBA), and his work has made him an expert in all things software, AI, security, privacy, mobile, and other tech innovations. Nigel currently lives in West London and enjoys spending time meditating and listening to music.