Google strikes $60m deal with Reddit for AI training data — what you need to know

Reddit spent the latter half of 2023 considering whether to block the Google and Bing search engines from indexing posts on the site. According to The Washington Post, the aim was to prevent the unauthorized and uncompensated use of its posts to train AI.

Now Reddit has announced it's reached a deal with Google that will, among other things, give the company access to the Reddit Data API “to improve its products and services” which includes “more efficient ways to train models”. In Google’s words, access to said API will grant the company “real-time, structured, unique content from their large and dynamic platform.” 

The deal, which Bloomberg previously suggested would be “worth about $60 million on an annualized basis”, doesn’t stop there. As part of the agreement, Reddit will have access to Google’s Vertex AI service, which should improve internal search results, and the deal will also allow for “Reddit content to be displayed across Google products.”

Google says this will ensure “more content-forward displays of Reddit information that will make our products more helpful for our users and make it easier to participate in Reddit communities and conversations.” Given the number of people who affix the word “reddit” to searches to surface genuine user-generated insights, that could be a very good thing for the average Google user.

But for Google, the real prize is undoubtedly the vast treasure trove of training data, which will theoretically make its generative AI appear more human, thanks to the posts and comments written by millions of real people every day.

But scale isn’t everything, and in some ways Reddit is an imperfect sample for training artificial intelligence when compared to literature or magazines. Grammar is faster and looser, there are a lot of memes and inside jokes, it’s full of information that’s just plain wrong and its user base is predominantly male.

By contrast, Apple has reportedly sought multimillion-dollar deals with publishers in order to train on their more formal and factually accurate magazines and newspapers. That approach obviously has its disadvantages too: it concentrates on another small part of the human experience at the expense of how everyday people communicate, something Reddit is undoubtedly better at demonstrating.

Expect more such deals to be made public over the next few years, because people are realizing that AI means big money and that training data can’t be absorbed free of charge without consequences. In the last year, OpenAI, Meta and Stability AI have all been hit by lawsuits from authors and artists who claim that their work was used for training without permission or compensation.

Alan Martin

Freelance contributor Alan has been writing about tech for over a decade, covering phones, drones and everything in between. Previously Deputy Editor of tech site Alphr, his words are found all over the web and in the occasional magazine too. When not weighing up the pros and cons of the latest smartwatch, you'll probably find him tackling his ever-growing games backlog. Or, more likely, playing Spelunky for the millionth time.

  • slightnitpick
    But scale isn’t everything, and in some ways Reddit is an imperfect sample for training artificial intelligence when compared to literature or magazines. Grammar is faster and looser, there’s a lot of memes and inside jokes, it’s full of information that’s just plain wrong and it's predominantly male.
    Not to mention the ad hoc moderation decisions. For interactions between people this moderation already makes what people say artificial.

    Will Google have access to deleted posts and comments as well? That would be the only saving grace, and even it has its limits.