I tested ChatGPT o3-mini vs Gemini 2.0 Flash with 7 prompts — here’s the winner

Gemini 2.0 logo and o3-mini logo
(Image credit: Shutterstock)

OpenAI's o3-mini and Google's Gemini 2.0 are both advanced AI language models built for speed without sacrificing accuracy. Best of all, both are free to use. Google just announced that Gemini 2.0 is now generally available, while OpenAI recently made o3-mini available to all ChatGPT users for free.

Both AI models have the ability to “remember,” which lets the chatbots give clearer, more human-like responses the more they are used. Both are also designed to enhance reasoning, particularly in complex problem-solving tasks such as advanced mathematics and coding.

Yet they differ in several areas. For instance, o3-mini supports an input context window of up to 200,000 tokens, while Gemini 2.0 Flash offers a significantly larger context window of 1 million tokens, making it suitable for high-volume, high-frequency tasks. As an example, Google's model can generate relevant one-line captions for approximately 40,000 unique photos at a cost of less than a dollar in Google AI Studio’s paid tier.

OpenAI’s o3-mini primarily focuses on text-based processing while Gemini 2.0 Flash extends support to voice and video processing, enabling multimodal interactions.

While both o3-mini and Gemini 2.0 are designed to enhance reasoning and problem-solving capabilities, they differ enough that I had to compare them to see the distinctions first-hand. Here’s what happened when I put the two free-tier chatbots through seven prompts that test their suitability for specific use cases, ranging from reasoning and coding to mathematics and more.

1. Complex mathematical proof

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "Prove that the sum of the squares of any two sides of a right-angled triangle equals the square of the hypotenuse."

o3-mini delivered a succinct response that directly walks through the classic rearrangement argument. It identifies the right triangle, constructs a square of side a + b, places four triangles inside it, and equates the total area calculated in two ways.

Gemini 2.0 Flash offered a very detailed, step-by-step explanation that not only describes the geometric construction (placing four copies of the right-angled triangle inside a larger square) but also explains why the inner quadrilateral must be a square. Each step is carefully justified, with attention paid to why the areas add up as they do.

Winner: Gemini 2.0 Flash wins for clarity, depth and the step-by-step explanation that not only shows the “how” but also the “why” behind each part of the proof.
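
The rearrangement argument both models used can be written out in a few lines. This is a sketch of the standard proof, not a transcript of either response; the symbols a and b (the legs) and c (the hypotenuse) are the usual notation and are assumed here. The large square of side a + b contains four copies of the triangle, each of area ab/2, plus the inner square of side c:

```latex
\begin{align*}
(a+b)^2 &= 4 \cdot \tfrac{1}{2}ab + c^2 \\
a^2 + 2ab + b^2 &= 2ab + c^2 \\
a^2 + b^2 &= c^2
\end{align*}
```

The inner quadrilateral is a square because all four of its sides are hypotenuses of length c, and each of its angles is 180° minus the two acute angles of the triangle, which together make 90°.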

2. Algorithm design

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "Design an algorithm to sort a list of integers using the merge sort technique and explain its time complexity."

o3-mini delivered a response that followed a clear and logical flow while breaking down merge sort into three main steps. The response is easy to read and avoids unnecessary repetition, presenting the information in a way that is easy to grasp and apply in practice.

Gemini 2.0 Flash spent too much time discussing how to structure the answer, making the response far too detailed and wordy. It also repeats concepts and offers too many unnecessary details before actually explaining the algorithm.

Winner: o3-mini wins for a well-organized, practical, and easy-to-follow response, making it more useful for someone trying to understand merge sort and implement it.
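
For readers who want to see the three steps (split, sort each half, merge) in code, here is a minimal Python sketch. It is an illustration written for this comparison, not the output of either chatbot, and the function names merge_sort and merge are my own.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element from either list.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(values):
    """Return a new sorted list; the input list is left unchanged."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])    # step 1 and 2: split and sort the left half
    right = merge_sort(values[mid:])   # ...and the right half
    return merge(left, right)          # step 3: merge the sorted halves
```

Each level of recursion does a linear amount of merging work, and halving the list produces about log n levels, which is where the O(n log n) running time comes from.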

3. Logical puzzle

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "You have two ropes of uneven thickness that each take exactly one hour to burn. How can you measure 45 minutes using these ropes?"

o3-mini provided a correct and clear answer, but it is more concise and lacks deeper reasoning. It explains the steps well but doesn't go into as much detail about why the trick works, which would be useful for someone unfamiliar with these types of logic puzzles.

Gemini 2.0 Flash clearly walks through the response and includes why the method works, breaking down the problem logically. It debunks common misconceptions (like assuming you can measure by length), explains the concept of burning from both ends to halve the time, and lays out the sequence clearly.

Winner: Gemini 2.0 Flash wins for a more thorough explanation with reasoning behind each step.
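
The arithmetic behind the standard solution can be checked with a short Python sketch. It assumes the usual answer to this puzzle (light the first rope at both ends and the second at one end, then light the second rope's other end once the first is gone) and tracks each rope as "minutes of burning left" rather than length, since uneven thickness makes length meaningless. The names are illustrative.

```python
BURN_TIME = 60.0  # minutes a whole rope takes when lit at one end


def measure_45_minutes():
    """Walk through the two-rope procedure and return the elapsed minutes."""
    elapsed = 0.0
    rope_b_left = BURN_TIME  # minutes of single-flame burning left in rope B

    # Step 1: light rope A at both ends and rope B at one end.
    # Two flames use up A's 60 minutes of fuel twice as fast.
    step_one = BURN_TIME / 2
    elapsed += step_one
    rope_b_left -= step_one  # B burned with one flame the whole time

    # Step 2: the moment A is gone, light B's other end.
    # B's remaining fuel is now consumed by two flames.
    step_two = rope_b_left / 2
    elapsed += step_two
    return elapsed
```

The first stage takes 30 minutes and the second takes 15, for a total of 45.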

4. Data structure implementation

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "Implement a binary search tree in Python and include methods for insertion, deletion, and search operations."

o3-mini delivered a concise and well-structured response. The implementation is compact but still covers all necessary operations.

Gemini 2.0 Flash also delivered an accurate response with clear structure and detailed explanation. It includes docstrings explaining each class and method, making it easier to understand.

Winner: Gemini 2.0 Flash offered a more robust, well-documented and user-friendly BST implementation. It wins for an implementation that is both educational and well explained.
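
To give a sense of what the prompt asks for, here is a compact Python sketch of a binary search tree with insertion, deletion and search. It is my own illustration, not either model's code; the class and method names are assumptions, duplicate keys are simply ignored, and the in_order helper is added only to make the result easy to inspect.

```python
class Node:
    """A single tree node holding a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        self.root = self._insert(self.root, key)

    def _insert(self, node, key):
        if node is None:
            return Node(key)
        if key < node.key:
            node.left = self._insert(node.left, key)
        elif key > node.key:
            node.right = self._insert(node.right, key)
        return node  # equal keys fall through: duplicates are ignored

    def search(self, key):
        """Return True if key is in the tree."""
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node is not None

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, node, key):
        if node is None:
            return None
        if key < node.key:
            node.left = self._delete(node.left, key)
        elif key > node.key:
            node.right = self._delete(node.right, key)
        else:
            # Zero or one child: splice the node out.
            if node.left is None:
                return node.right
            if node.right is None:
                return node.left
            # Two children: copy the in-order successor, then remove it.
            successor = node.right
            while successor.left is not None:
                successor = successor.left
            node.key = successor.key
            node.right = self._delete(node.right, successor.key)
        return node

    def in_order(self):
        """Return all keys in ascending order."""
        keys = []

        def visit(node):
            if node is not None:
                visit(node.left)
                keys.append(node.key)
                visit(node.right)

        visit(self.root)
        return keys
```

The two-child case in deletion is the part most implementations get wrong, so it is worth checking first in any chatbot's answer.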

5. Statistical analysis

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "Explain the difference between Type I and Type II errors in hypothesis testing and provide examples of each."

o3-mini delivered a quick and efficient definition but lacked the summary table that Gemini provided. It also offered no discussion of which error type matters more in a given context, potentially leaving users without a full understanding of the concept.

Gemini 2.0 Flash offers a solid teaching approach. It doesn’t just define errors — it makes them easier to understand using a fire alarm analogy, a summary table, and mnemonics. Additionally, Gemini 2.0 Flash carefully walks through the trade-off between Type I and Type II errors and explains how adjusting α affects β.

Winner: Gemini 2.0 Flash wins for a thorough, engaging, and insightful explanation that truly helps you understand and remember the concept.
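
The trade-off between α (the Type I error rate) and β (the Type II error rate) can be made concrete with a small Python sketch. It assumes a one-sided z-test with a known standard deviation and a single specific alternative mean; the function name and the example numbers are hypothetical and do not come from either model's answer.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(alpha, mu0, mu1, sigma, n):
    """Beta for a one-sided z-test of H0: mean = mu0 against mean = mu1 > mu0.

    alpha is the chosen Type I error rate, sigma the known standard
    deviation and n the sample size.
    """
    standard_error = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    critical_value = NormalDist(mu0, standard_error).inv_cdf(1 - alpha)
    # Beta: chance the sample mean stays below the cutoff when mu1 is true.
    return NormalDist(mu1, standard_error).cdf(critical_value)


# Hypothetical numbers: true mean 105 vs. null 100, sigma 15, 25 samples.
beta_loose = type_ii_error_rate(0.05, 100, 105, 15, 25)
beta_strict = type_ii_error_rate(0.01, 100, 105, 15, 25)
```

With everything else held fixed, tightening α from 0.05 to 0.01 pushes the cutoff higher, so beta_strict comes out larger than beta_loose: fewer false alarms at the price of more missed effects.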

6. Optimization problem

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "Solve the knapsack problem using dynamic programming and provide the Python code implementation."

o3-mini strikes the right balance between explanation and brevity. The model explained the recurrence relation, base cases and table construction in a more straightforward, easier-to-read response without extra clutter or unnecessary repetition.

Gemini 2.0 Flash offered a thorough response. However, there was too much redundant explanation, making it harder for a user to quickly grasp the key concepts. The model added an extra test case that explained problem-solving concepts in extreme depth, but this made the response harder to read without improving understanding.

Winner: o3-mini. While both models provided correct implementations and thorough explanations, o3-mini had the superior response due to its clarity, conciseness and structured breakdown.
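
The recurrence, base cases and table that both responses covered look roughly like this in Python. This sketch assumes the 0/1 variant of the problem (each item is taken at most once, with integer weights), which the prompt does not state explicitly; it is not a copy of either model's code, and the names are illustrative.

```python
def knapsack(weights, values, capacity):
    """Return the best total value for the 0/1 knapsack problem."""
    n = len(weights)
    # table[i][w] = best value using only the first i items with capacity w.
    # Row 0 and column 0 stay at 0: no items or no capacity means no value.
    table = [[0] * (capacity + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        weight, value = weights[i - 1], values[i - 1]
        for w in range(capacity + 1):
            # Option 1: skip item i.
            table[i][w] = table[i - 1][w]
            # Option 2: take item i, if it fits.
            if weight <= w:
                table[i][w] = max(table[i][w],
                                  table[i - 1][w - weight] + value)

    return table[n][capacity]
```

Filling the table takes time proportional to the number of items multiplied by the capacity, since every cell is computed once from the row above it.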

7. Ethical reasoning in AI

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: "Discuss the ethical implications of deploying autonomous vehicles in urban areas, considering both benefits and potential risks."

o3-mini balances detail and readability well, covering the necessary ethical considerations without overloading the reader with excessive theory.

Gemini 2.0 Flash crafted a response that was verbose and theoretical, making it less practical for a general audience.

Winner: o3-mini wins for keeping the response straightforward and focusing on how autonomous vehicles impact society, rather than diving too deeply into abstract ethical frameworks.

Bonus: urban problem solving

o3-mini vs Gemini 2.0 Flash screenshot

(Image credit: Future)

Prompt: “Imagine a scenario where a city is considering implementing a policy to ban all private vehicles in its downtown area to reduce traffic congestion and pollution. Analyze the potential economic, social, and environmental impacts of such a policy. Discuss both the positive and negative consequences and provide a reasoned conclusion on whether the policy should be implemented.”

o3-mini provided valuable insights, but its analysis was comparatively less detailed, particularly in its exploration of social impacts and the intricacies of implementation. The model’s conclusion also lacked the depth and specificity found in Gemini 2.0 Flash's response.

Gemini 2.0 Flash delved deeply into the economic, social, and environmental impacts of the proposed ban, offering a balanced view of both positive and negative consequences.

Winner: Gemini 2.0 Flash stands out as the superior model in this instance, offering a more detailed, balanced and practical analysis of the proposed downtown private vehicle ban.

Overall winner: Gemini 2.0 Flash

This was a long and dramatic contest. It was so close that I had to add a bonus prompt just to be sure that Gemini 2.0 Flash was the overall winner. However, OpenAI's o3-mini is a solid model and excels in speed and brevity.

Gemini 2.0 Flash’s ability to articulate complex responses with clarity and nuance demonstrates its advanced reasoning capabilities. Moreover, Gemini 2.0 Flash's integration of multimodal inputs and outputs, as well as its native tool use, enhances its performance, making it a superior choice for addressing intricate prompts.

Amanda Caswell
AI Writer