Even AI struggles to understand Excel sheets – Microsoft swoops in to help
SpreadsheetLLM turns spreadsheets into bite-sized chunks that LLMs can handle
If sifting through Excel spreadsheets isn’t your thing and you’d rather have an AI chatbot make sense of all the rows and columns for you, Microsoft may hold the key to helping LLMs understand spreadsheets better.
It’s not just you: AI is also known to struggle with processing spreadsheets. Their expansive grids and varied cell formats are hurdles that LLMs must overcome.
Now, a group of Microsoft researchers think they may have found a solution that optimizes LLMs’ approach to deciphering spreadsheets.
In a pre-print paper submitted on July 12, the researchers unveiled SpreadsheetLLM, a new method that combines encoding and compression with leading AI chatbots to help them handle spreadsheets more efficiently.
Their data suggests that, with their method, the GPT-4 model improved by 27% on spreadsheet table detection and by nearly 26% on in-context learning. The method also cut costs by up to 96%, based on GPT-4 and GPT-3.5 Turbo prices.
A version of this could be integrated into Microsoft Copilot for 365 in the future, making it easier than ever to make sense of data.
What makes SpreadsheetLLM useful?
The key to SpreadsheetLLM’s success is Microsoft’s SheetCompressor, an encoding framework that compresses spreadsheets effectively for LLMs.
It comes with three modules: one that makes spreadsheets more legible for LLMs, a second that bypasses empty cells and repeating numbers, and a third that helps LLMs better understand what a number means (like if it’s a year or a phone number).
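To make the second module's idea concrete, here is a minimal Python sketch of skipping empty cells and writing each repeated value only once. It is an illustration under stated assumptions, not Microsoft's SheetCompressor code: the function name `compact_encode`, the `value|addresses` output format, and the sample sheet are all made up for this example.

```python
from collections import defaultdict


def compact_encode(cells):
    """Group non-empty cells by value so each distinct value is written once.

    `cells` maps a cell address such as "B2" to its value. The name and the
    output format are hypothetical, chosen only for illustration.
    """
    index = defaultdict(list)
    for address, value in cells.items():
        if value is None or str(value).strip() == "":
            continue  # empty cells add tokens but no information
        index[str(value)].append(address)
    # One line per distinct value, followed by every address that holds it.
    return "\n".join(
        f"{value}|{','.join(addresses)}" for value, addresses in index.items()
    )


# Made-up sample sheet with two empty cells and one repeated number.
sheet = {
    "A1": "Year", "B1": "Sales",
    "A2": 2023, "B2": 100,
    "A3": 2024, "B3": 100,
    "C1": "", "C2": None,
}

encoded = compact_encode(sheet)
print(encoded)
```

In this toy sheet, eight cells shrink to five lines because the two empty cells vanish and the repeated 100 is listed once with both of its addresses.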
This compression method reduced token usage for spreadsheet encoding by 96%. It also significantly boosted performance on larger spreadsheets, where the challenges of high token usage are felt the most.
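For a sense of scale, a 96% reduction leaves 4% of the original tokens, which works out to a 25-fold compression. The short Python sketch below runs that arithmetic. The token count and the price per million tokens are hypothetical placeholders, not figures from the paper or from any provider's price list.

```python
def tokens_after(original_tokens, reduction):
    """Tokens left after cutting usage by `reduction` (0.96 means 96%)."""
    return round(original_tokens * (1 - reduction))


def cost(tokens, price_per_million):
    """Cost of sending `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


original = 200_000        # hypothetical size of one encoded spreadsheet
price_per_million = 10.0  # hypothetical price, not a real quote

reduced = tokens_after(original, 0.96)
ratio = original / reduced

print(f"{original} tokens -> {reduced} tokens ({ratio:.0f}x smaller)")
print(f"cost before: {cost(original, price_per_million):.2f}")
print(f"cost after:  {cost(reduced, price_per_million):.2f}")
```

Because the cost scales linearly with the token count, the saving in money matches the saving in tokens whatever the actual price is.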
In their paper, the authors also said they created “Chain of Spreadsheet”, an extension of the framework that identifies the table relevant to a question and determines the boundaries of the relevant content. The question and the trimmed data are then presented to the LLM again, which processes them to generate a response.
Feeding a typical spreadsheet directly into a conventional model often exceeded its token limit. The Chain of Spreadsheet method helps LLMs focus only on the regions relevant to the question posed, cutting unnecessary data and keeping the LLM efficient.
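The two-step flow described here can be sketched in Python as follows. This is a rough illustration, not the researchers' implementation: the prompts, the helper names (`crop`, `serialize`, `chain_of_spreadsheet`) and the sample sheet are assumptions, and `fake_llm` is a stand-in that returns canned replies so the example runs without calling any real chatbot.

```python
import re


def column_number(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based number."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def split_address(address):
    """Split 'B3' into (column number, row number)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", address)
    if match is None:
        raise ValueError(f"not a cell address: {address!r}")
    return column_number(match.group(1)), int(match.group(2))


def crop(cells, range_text):
    """Keep only the cells inside a range such as 'A1:B3'."""
    start, end = range_text.strip().split(":")
    c1, r1 = split_address(start)
    c2, r2 = split_address(end)
    kept = {}
    for address, value in cells.items():
        col, row = split_address(address)
        if c1 <= col <= c2 and r1 <= row <= r2:
            kept[address] = value
    return kept


def serialize(cells):
    """Write cells as plain 'address,value' lines for a prompt."""
    return "\n".join(f"{address},{value}" for address, value in cells.items())


def chain_of_spreadsheet(cells, question, ask_llm):
    """Step 1: ask which range matters. Step 2: ask the question on that range."""
    range_text = ask_llm(
        f"Which cell range holds the table needed to answer: {question}\n"
        f"{serialize(cells)}\nReply with a range like A1:B3."
    )
    region = crop(cells, range_text)
    answer = ask_llm(f"Question: {question}\nData:\n{serialize(region)}")
    return region, answer


def fake_llm(prompt):
    """Stand-in for a real model: returns canned replies for this demo only."""
    if prompt.startswith("Which cell range"):
        return "A1:B3"
    return "South had the lowest sales (95)."


# Made-up sheet: a small sales table plus an unrelated note in column D.
sheet = {
    "A1": "Region", "B1": "Sales",
    "A2": "North", "B2": 120,
    "A3": "South", "B3": 95,
    "D1": "Notes", "D2": "ignore me",
}

region, answer = chain_of_spreadsheet(
    sheet, "Which region had the lowest sales?", fake_llm
)
print(sorted(region))
print(answer)
```

The second prompt only ever contains the cropped cells, so the unrelated note in column D never reaches the model. In a real setup, `ask_llm` would be replaced by a call to an actual chatbot.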
One limitation that the Microsoft researchers pointed out about their current method was that it can’t yet handle spreadsheet formatting details such as background color and borders since this information costs too many tokens.
While this won’t immediately mean much for the average user, if newer versions of chatbots such as ChatGPT and Claude incorporate Microsoft’s SpreadsheetLLM, we may soon be able to upload entire spreadsheets and ask the chatbots questions in plain language to receive data summaries or analysis based on the file we uploaded.
Christoph Schwaiger is a journalist who mainly covers technology, science, and current affairs. His stories have appeared in Tom's Guide, New Scientist, Live Science, and other established publications. Always up for joining a good discussion, Christoph enjoys speaking at events or to other journalists and has appeared on LBC and Times Radio among other outlets. He believes in giving back to the community and has served on different consultative councils. He was also a National President for Junior Chamber International (JCI), a global organization founded in the USA. You can follow him on Twitter @cschwaigermt.