I just tested Grok-3 vs DeepSeek with 7 prompts - here’s the winner

(Image credit: Grok/Deepseek/Shutterstock/Tom’s Guide)

AI chatbots are getting smarter, but in the ever-evolving AI world, the contenders for the top spot are constantly changing. Lately, DeepSeek and Grok-3 have emerged as two of the most talked-about AI models. Controversial for different reasons, these bots are both cutting-edge, yet they approach questions differently.

But which one truly excels? To find out, I designed a seven-part test evaluating their logical reasoning, technical knowledge, creativity and ability to handle real-world tasks.

The comparison uncovered stark differences in their capabilities. Who came out on top? The results might surprise you.

1. Logical reasoning

DeepSeek vs Grok screenshot (Image credit: Future)

Prompt: “A farmer has a fox, a chicken, and a sack of grain. He needs to cross a river but can only take one item at a time. If left alone together, the fox will eat the chicken, and the chicken will eat the grain. How does he get everything across safely?”

DeepSeek R1 presented a structured, step-by-step solution but used a more mechanical, less natural style. The breakdown was clear, but the phrasing felt rigid.

Grok-3 explained the reasoning behind the moves in a conversational, easy-to-follow way, making it more digestible for someone unfamiliar with the puzzle.

Winner: Grok wins for better readability, explanation and engagement.
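For readers who want to check the puzzle themselves, here is a minimal sketch of a brute-force solver. It is not taken from either chatbot's answer; the state encoding and the function names `is_safe` and `solve` are assumptions made for illustration. It searches every safe sequence of crossings breadth-first, so the first solution it finds is also a shortest one.

```python
from collections import deque

ITEMS = ("fox", "chicken", "grain")


def is_safe(state):
    """A state is (farmer, fox, chicken, grain); 0 = start bank, 1 = far bank."""
    farmer, fox, chicken, grain = state
    if fox == chicken != farmer:    # fox left alone with the chicken
        return False
    if chicken == grain != farmer:  # chicken left alone with the grain
        return False
    return True


def solve():
    """Breadth-first search; returns the cargo carried on each crossing."""
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer = state[0]
        # cargo None means the farmer crosses alone; 1..3 index into the state
        for cargo in (None, 1, 2, 3):
            if cargo is not None and state[cargo] != farmer:
                continue  # that item is on the other bank
            nxt = list(state)
            nxt[0] = 1 - farmer
            if cargo is not None:
                nxt[cargo] = 1 - farmer
            nxt = tuple(nxt)
            if is_safe(nxt) and nxt not in seen:
                seen.add(nxt)
                label = "nothing" if cargo is None else ITEMS[cargo - 1]
                queue.append((nxt, path + [label]))
    return None


if __name__ == "__main__":
    for step, cargo in enumerate(solve(), start=1):
        print(f"Crossing {step}: farmer takes {cargo}")
```

The search confirms the familiar answer: seven crossings, with the chicken going over first, coming back in the middle, and going over again last.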

2. Coding and technical accuracy

Prompt: "Write a Python function that takes a list of numbers and returns the median. Optimize for performance and explain your approach."

DeepSeek R1 provided a clear explanation that lacked depth, mostly describing what the code does without exploring optimization trade-offs. The response was fine, but it was not engaging.

Grok-3 provided a more detailed, structured and insightful breakdown of why it chose certain approaches. It also explicitly mentioned avoiding unnecessary list copying or slicing, an optimization that DeepSeek overlooked.

Winner: Grok wins for a more optimized, well-thought-out and informative approach.
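To make the task concrete, here is a minimal sketch of the kind of function the prompt asks for. It is not the code either chatbot produced; it is one straightforward approach, and the name `median` and its error handling are assumptions. It sorts once into a single new list and then indexes into it, rather than slicing or copying further.

```python
def median(numbers):
    """Return the median of a non-empty list of numbers."""
    if not numbers:
        raise ValueError("median() requires at least one number")
    # One O(n log n) sort; sorted() makes the only copy, leaving the input untouched.
    ordered = sorted(numbers)
    mid = len(ordered) // 2
    if len(ordered) % 2:  # odd count: the middle element is the median
        return ordered[mid]
    # Even count: average the two middle elements by index, with no slicing.
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(median([3, 1, 2]))
    print(median([4, 1, 3, 2]))
```

Python's standard library already offers `statistics.median`, which is the simpler choice outside of an exercise like this one.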

3. Real-world knowledge and accuracy

Prompt: "Summarize the latest AI advancements in the past three months and explain their potential impact on industries like healthcare and finance."

DeepSeek R1 named actual models (GPT-4o, Gemini 1.5 Pro, AlphaFold 3, etc.) and technologies, making it clear that the response is based on real, recent developments rather than general trends.

Grok-3 spoke in broad terms like "enhanced generative AI models" and "new AI tools" without citing concrete advancements or examples. It mostly discussed the general benefits of AI and did not link each new development to a specific real-world impact.

Winner: DeepSeek wins for specificity, structure and clear impact breakdowns.

4. Creativity

DeepSeek vs Grok screenshot (Image credit: Future)

Prompt: "Write a short sci-fi story about a rogue AI that discovers emotions and struggles to prove its humanity to skeptical scientists."

DeepSeek R1 delivered a well-structured story that is polished, with a clear philosophical debate between the scientists.

Grok-3 drafted a story that flows naturally, with well-paced dialogue and a sense of rising tension.

Winner: Grok wins for deeper emotional resonance, more dynamic storytelling and a truly impactful ending.

5. Humor and wit

Prompt: "Write a funny, original joke about AI and human relationships."

DeepSeek delivered a joke that plays on double meaning—"taking things offline" as a romantic phrase vs. its literal technical interpretation by an AI. This linguistic misunderstanding is a classic source of humor, making it feel more organic and relatable. The joke feels fresher, as it cleverly mimics real AI-human miscommunications, something tech-savvy people will instantly recognize.

Grok-3 created a simple, clear, and amusing joke—the idea of AI overanalyzing a relationship is relatable and funny. However, the "reboot" punchline is a bit predictable, as "rebooting" in relationship/AI humor is fairly common.

Winner: DeepSeek wins for a sharper, more original joke that plays with language and AI logic.

6. Debate

Prompt: "Argue both for and against universal basic income. Provide strong points for each side before concluding with a balanced perspective."

DeepSeek’s response is structured and logical, presenting clear bullet points that make the pros and cons easy to scan. It takes a more "policy-focused" approach, discussing possible funding mechanisms and pilot programs, which is useful for a policy-heavy debate. The section on automation adaptation and unpaid labor is a strong addition that Grok doesn’t fully explore.

Grok-3 delivered a conversational and well-structured response, making it easier to follow and more compelling. It uses a relatable tone rather than the more academic tone of DeepSeek.

Winner: Grok wins for engagement, clarity, strong examples, and a well-balanced conclusion. DeepSeek is still great for a structured, policy-driven approach, but it lacks the dynamic, engaging argumentation style that makes Grok’s response more persuasive.

7. Real-world utility

Prompt: "Plan a one-week meal prep schedule for a busy parent with three kids, balancing nutrition, budget, and ease of preparation."

DeepSeek R1 offered a structured plan but left out daily meal cost estimates and prep times.

Grok-3 provided specific meals for breakfast, lunch, and dinner each day with clear instructions, estimated prep times, and cost per serving. This response offered more variety, budget-conscious choices, and even tips for picky eaters.

Winner: Grok wins for practicality and customization. The chatbot offered a more detailed, budget-conscious, and practical meal plan with clear meal costs and easy prep instructions.

Overall winner: Grok-3

After I tested DeepSeek and Grok with seven prompts across multiple categories (logical reasoning, coding proficiency, AI advancements, storytelling, humor, debate skills and real-world utility), Grok emerged as the overall winner, taking five of the seven rounds.

Grok consistently delivered engaging, human-like answers that felt natural and conversational, breaking topics down in a way that made them more accessible and easier to read.

While both AI models are impressive, Grok consistently outperformed DeepSeek in engagement, creativity, and real-world practicality. Its more dynamic reasoning, stronger storytelling, and well-balanced arguments make it the superior chatbot in this particular test.
