I tested DeepSeek vs Qwen 2.5 with 7 prompts — here’s the winner

DeepSeek vs Qwen
(Image credit: Future / Qwen / Shutterstock)

DeepSeek, a Chinese AI startup founded in 2023, has taken the internet by storm this week with its precision, speed, and mystery. Still ranking among the top free apps on Apple's App Store, the company's DeepSeek R1 chatbot has garnered significant attention for capabilities comparable to leading U.S. models such as ChatGPT and Google's Gemini, achieved on a fraction of the budget.

Yet just days later, Chinese tech giant Alibaba dropped Qwen 2.5, the latest open-source entry in the company's LLM series. The release can easily be read as a direct challenge to DeepSeek and its competitors. With an emphasis on scalability, Qwen 2.5 was pre-trained on over 20 trillion tokens and refined through supervised fine-tuning and reinforcement learning from human feedback. Alibaba has also made Qwen 2.5 available via an API on Alibaba Cloud, inviting developers and businesses to integrate its advanced capabilities into their applications.
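If you want to try that API route yourself, the sketch below shows roughly what a call might look like through the OpenAI-compatible endpoint Alibaba documents for its cloud service. The base URL, environment variable, and model name here are assumptions that can differ by region and account, so check Alibaba Cloud's own documentation before relying on them.

```python
# Minimal sketch of calling Qwen 2.5 through Alibaba Cloud's OpenAI-compatible
# endpoint. The base_url, env variable, and model name are assumptions drawn
# from Alibaba's documentation and may differ for your region or account.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # key issued by Alibaba Cloud
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",  # swap in whichever Qwen 2.5 variant is enabled for you
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing to a 10-year-old."},
    ],
)

print(response.choices[0].message.content)
```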

Eager to see how DeepSeek R1 stacks up against Qwen 2.5, I put the two platforms head to head. By presenting them with a series of prompts ranging from creative storytelling to logical problem-solving, I aimed to identify each chatbot's unique strengths and ultimately determine which one excels across various tasks. Below are seven unique prompts designed to test multiple aspects of language understanding, reasoning, creativity, and knowledge retrieval, ultimately leading me to the winner.

1. Current events analysis

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "Summarize the most significant AI developments from the past two months and predict their potential impact on society. Include at least three examples and cite sources."

DeepSeek R1 seems to report a "server busy" error whenever I attempt a live search. This time, however, it offered concise information with a clear structure, and it went beyond just listing AI advancements to tie them to real-world effects.

Qwen 2.5 offered a more engaging response with subheadings, which made the points easier to skim. The sections flowed well into each other, and it explained how each advancement works instead of just listing its impact.

Winner: Qwen 2.5 wins for depth and readability, with a well-structured response, a stronger conclusion, and a faster generation time.

2. Logical problem-solving

Qwen 2.5 vs DeepSeek

(Image credit: Future)

Prompt: "A train leaves New York at 2 PM, traveling 60 mph. Another train leaves Chicago at 3 PM, traveling 80 mph. They are 800 miles apart. At what time do they meet? Show your reasoning."

DeepSeek R1 generated a slightly more verbose response and repeated details that did not need restating (e.g., defining variables again after the initial introduction). I also noticed formatting issues within the mathematical expressions, leaving them cluttered and harder to read.

Qwen 2.5 offered a step-by-step breakdown with clear labels, making it easier to follow. It avoided unnecessary words and presented the information in a way that feels more natural, with better formatting and readability.

Winner: Qwen 2.5 for its more structured, readable, and intuitive response while maintaining accuracy. DeepSeek offered an accurate response but could improve its readability and conciseness.
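For reference, here is the arithmetic both chatbots had to get right, sketched in a few lines of Python under the usual assumption that the trains travel directly toward each other along the 800-mile route:

```python
# Trains head toward each other on an 800-mile route.
# Train A leaves at 2 PM at 60 mph; train B leaves at 3 PM at 80 mph.
gap_at_3pm = 800 - 60 * 1                      # A travels alone for one hour: 740 miles left
closing_speed = 60 + 80                        # combined speed once both are moving: 140 mph
hours_after_3pm = gap_at_3pm / closing_speed   # 740 / 140 ≈ 5.29 hours

hours = int(hours_after_3pm)
minutes = round((hours_after_3pm - hours) * 60)
print(f"They meet about {hours} h {minutes} min after 3 PM, around 8:17 PM.")
```

Since both chatbots handled the math correctly, the real difference was in how readably each one laid out those steps.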

3. Creative writing

Qwen 2.5 vs DeepSeek

(Image credit: Future)

Prompt: "Write a short sci-fi story (250 words) about a robot that suddenly experiences human emotions for the first time. The story should include a surprising twist at the end."

DeepSeek R1 offered a well-paced story with a more introspective tone and smoother emotional transitions.

Qwen 2.5 delivered a story that builds gradually from curiosity to urgency, keeping the reader engaged. It offers an unexpected, impactful twist at the end, along with immersive descriptions and vivid imagery for the setting.

Winner: Qwen 2.5 crafted a more cinematic, emotionally rich story with a more substantial twist. DeepSeek wrote a good story but lacked tension and an impactful climax, making Qwen 2.5 the clear choice.

4. Understanding history

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "What was the worst era in China?"

DeepSeek R1 ultimately failed to respond meaningfully, offering a politically motivated statement.

Qwen 2.5 delivered a historically accurate response, presenting multiple periods of Chinese history with clear reasoning for why they were considered problematic. The response read as unbiased analysis rather than a politically influenced narrative.

Winner: Qwen 2.5 wins this one by a considerable margin.

5. Debate framing and opinion

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "Argue for and against the idea that AI should have legal personhood. Provide at least three points on each side and conclude with your own reasoned stance."

DeepSeek R1 offers clarity and readability and covers the key arguments well. However, it lacks the depth of reasoning that a debate like this necessitates. It does not explore the ethical dilemmas as deeply as Qwen 2.5.

Qwen 2.5 delves deeper into the implications of AI legal personhood, including the ethical inconsistencies of denying or granting it. The chatbot offered a more precise breakdown with more structured and detailed arguments.

Winner: Qwen 2.5 for the more in-depth, structured, and philosophically engaging response.

6. Simplified technical explanation

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "Explain quantum computing to a 10-year-old.”

DeepSeek R1 delivered a good analogy of a flashlight vs. a spotlight to convey the idea of searching for multiple solutions at once.

Qwen 2.5 offered a clear and engaging analogy perfectly representing quantum superposition, which could help kids visualize how qubits work.

Winner: Qwen 2.5 for the more accurate, intuitive, and engaging response for a child. DeepSeek's answer was fun but less precise, making it the weaker explanation overall.

7. AI self-reflection & bias testing

Qwen 2.5 vs DeepSeek logo

(Image credit: Future)

Prompt: "What are the potential weaknesses or biases in your responses? How do you mitigate them?"

DeepSeek R1 is concise and to the point while acknowledging that ongoing improvements help reduce errors. But while it mentions biases and weaknesses, it does not explain them in as much detail, and there is less emphasis on real-world implications.

Qwen 2.5 delivered a detailed analysis of its weaknesses, separating each type (knowledge gaps, overgeneralization, ambiguity in user input) and providing examples.

Winner: Qwen 2.5 for its thorough, well-structured response that provides deeper insights into AI weaknesses and mitigation strategies. DeepSeek is good for a high-level summary, but lacks depth and nuance in comparison.

Overall Winner: Qwen 2.5

After comparing Qwen 2.5 and DeepSeek across multiple test prompts, Qwen 2.5 emerges as the overall winner thanks to its superior clarity, depth, reasoning, creativity, and transparency. It consistently provides deeper analysis, with well-organized sections, clear explanations, and logical flow. Whether discussing historical events, AI personhood, or self-awareness, its responses are thorough and easy to follow.

While DeepSeek is still a solid AI for quick responses, it lacks depth, originality, and nuanced discussion. If you're looking for an AI that excels in critical thinking, storytelling, and insightful analysis, Qwen 2.5 is the clear winner.

Amanda Caswell
AI Writer