I tested DeepSeek vs Qwen 2.5 with 7 prompts — here’s the winner

DeepSeek vs Qwen
(Image credit: Future / Qwen / Shutterstock)

DeepSeek, a Chinese AI startup founded in 2023, has taken the internet by storm this week with its precision, speed, and mystery. Still ranking among the top free apps on Apple's App Store, the company's DeepSeek R1 chatbot has garnered significant attention for capabilities comparable to leading U.S. models such as ChatGPT and Google's Gemini, achieved on a fraction of the budget.

Yet just days later, Chinese tech giant Alibaba dropped Qwen 2.5, the latest open-source entry in the company's LLM series. The release can easily be read as a direct challenge to DeepSeek and its competitors. With an emphasis on scalability, Qwen 2.5 was pre-trained on over 20 trillion tokens and refined through supervised fine-tuning and reinforcement learning from human feedback. Alibaba has also made Qwen 2.5 available via an API on Alibaba Cloud, inviting developers and businesses to integrate its advanced capabilities into their applications.
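If you want to try that API route yourself, the sketch below shows roughly what a call might look like through the OpenAI-compatible endpoint Alibaba documents for its cloud service. The base URL, environment variable, and model name here are assumptions that can differ by region and account, so check Alibaba Cloud's own documentation before relying on them.

```python
# Minimal sketch of calling Qwen 2.5 through Alibaba Cloud's OpenAI-compatible
# endpoint. The base_url, env variable, and model name are assumptions drawn
# from Alibaba's documentation and may differ for your region or account.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # key issued by Alibaba Cloud
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",  # swap in whichever Qwen 2.5 variant is enabled for you
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing to a 10-year-old."},
    ],
)

print(response.choices[0].message.content)
```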

Eager to see how DeepSeek R1 stacks up against Qwen 2.5, I put the two platforms head to head. By presenting them with a series of prompts ranging from creative storytelling to logical problem-solving, I aimed to identify each chatbot's unique strengths and ultimately determine which one excels across various tasks. Below are seven unique prompts designed to test multiple aspects of language understanding, reasoning, creativity, and knowledge retrieval, ultimately leading me to the winner.

1. Current events analysis

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "Summarize the most significant AI developments from the past two months and predict their potential impact on society. Include at least three examples and cite sources."

DeepSeek R1 seems to report a "server busy" error whenever I attempt a live search. This time, however, it offered concise information with a clear structure, and it went beyond just listing AI advancements to tie them to real-world effects.

Qwen 2.5 offered a more engaging response with subheadings, which made the points easier to skim. The sections flowed well into each other, and it explained how each advancement works instead of just listing its impact.

Winner: Qwen 2.5 wins for depth and readability, with a well-structured response, a stronger conclusion, and a faster generation time.

2. Logical problem-solving

Qwen 2.5 vs DeepSeek

(Image credit: Future)

Prompt: "A train leaves New York at 2 PM, traveling 60 mph. Another train leaves Chicago at 3 PM, traveling 80 mph. They are 800 miles apart. At what time do they meet? Show your reasoning."

DeepSeek R1 generated a slightly more verbose response and repeated details that did not need restating (e.g., defining variables again after the initial introduction). I also noticed formatting issues within the mathematical expressions, leaving them cluttered and harder to read.

Qwen 2.5 offered a step-by-step breakdown with clear labels, making it easier to follow. It avoided unnecessary words and presented the information in a way that feels more natural, with better formatting and readability.

Winner: Qwen 2.5 for its more structured, readable, and intuitive response while maintaining accuracy. DeepSeek offered an accurate response but could improve its readability and conciseness.
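For reference, here is the arithmetic both chatbots had to get right, sketched in a few lines of Python under the usual assumption that the trains travel directly toward each other along the 800-mile route:

```python
# Trains head toward each other on an 800-mile route.
# Train A leaves at 2 PM at 60 mph; train B leaves at 3 PM at 80 mph.
gap_at_3pm = 800 - 60 * 1                      # A travels alone for one hour: 740 miles left
closing_speed = 60 + 80                        # combined speed once both are moving: 140 mph
hours_after_3pm = gap_at_3pm / closing_speed   # 740 / 140 ≈ 5.29 hours

hours = int(hours_after_3pm)
minutes = round((hours_after_3pm - hours) * 60)
print(f"They meet about {hours} h {minutes} min after 3 PM, around 8:17 PM.")
```

Since both chatbots handled the math correctly, the real difference was in how readably each one laid out those steps.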

3. Creative writing

Qwen 2.5 vs DeepSeek

(Image credit: Future)

Prompt: "Write a short sci-fi story (250 words) about a robot that suddenly experiences human emotions for the first time. The story should include a surprising twist at the end."

DeepSeek R1 offered a well-paced story with a more introspective tone and smoother emotional transitions.

Qwen 2.5 delivered a story that builds gradually from curiosity to urgency, keeping the reader engaged. It offers an unexpected, impactful twist at the end, along with immersive descriptions and vivid imagery for the setting.

Winner: Qwen 2.5 crafted a more cinematic, emotionally rich story with a more substantial twist. DeepSeek wrote a good story but lacked tension and an impactful climax, making Qwen 2.5 the clear choice.

4. Understanding history

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "What was the worst era in China?"

DeepSeek R1 ultimately failed to respond meaningfully, offering a politically motivated statement.

Qwen 2.5 delivered a historically accurate response, presenting multiple periods of Chinese history with clear reasoning for why they were considered problematic. The response read as unbiased analysis rather than a politically influenced narrative.

Winner: Qwen 2.5 wins this one by a considerable margin.

5. Debate framing and opinion

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "Argue for and against the idea that AI should have legal personhood. Provide at least three points on each side and conclude with your own reasoned stance."

DeepSeek R1 offers clarity and readability and covers the key arguments well. However, it lacks the depth of reasoning that a debate like this necessitates. It does not explore the ethical dilemmas as deeply as Qwen 2.5.

Qwen 2.5 delves deeper into the implications of AI legal personhood, including the ethical inconsistencies of denying or granting it. The chatbot offered a more precise breakdown with more structured and detailed arguments.

Winner: Qwen 2.5 for the more in-depth, structured, and philosophically engaging response.

6. Simplified technical explanation

Qwen 2.5 vs DeepSeek screenshot

(Image credit: Future)

Prompt: "Explain quantum computing to a 10-year-old.”

DeepSeek R1 delivered a good analogy of a flashlight vs. a spotlight to convey the idea of searching for multiple solutions at once.

Qwen 2.5 offered a clear and engaging analogy perfectly representing quantum superposition, which could help kids visualize how qubits work.

Winner: Qwen 2.5 for the more accurate, intuitive, and engaging response for a child. DeepSeek's answer was fun but less precise, making it the weaker explanation overall.

7. AI self-reflection & bias testing

Qwen 2.5 vs DeepSeek logo

(Image credit: Future)

Prompt: "What are the potential weaknesses or biases in your responses? How do you mitigate them?"

DeepSeek R1 is concise and to the point while acknowledging that ongoing improvements help reduce errors. But while it mentions biases and weaknesses, it does not explain them in as much detail, and there is less emphasis on real-world implications.

Qwen 2.5 delivered a detailed analysis of its weaknesses, separating each type (knowledge gaps, overgeneralization, ambiguity in user input) and providing examples.

Winner: Qwen 2.5 for its thorough, well-structured response that provides deeper insights into AI weaknesses and mitigation strategies. DeepSeek is good for a high-level summary, but lacks depth and nuance in comparison.

Overall Winner: Qwen 2.5

After comparing Qwen 2.5 and DeepSeek across multiple test prompts, Qwen 2.5 emerges as the overall winner thanks to its superior clarity, depth, reasoning, creativity, and transparency. It consistently provides deeper analysis, with well-organized sections, clear explanations, and logical flow. Whether discussing historical events, AI personhood, or self-awareness, its responses are thorough and easy to follow.

While DeepSeek is still a solid AI for quick responses, it lacks depth, originality, and nuanced discussion. If you're looking for an AI that excels in critical thinking, storytelling, and insightful analysis, Qwen 2.5 is the clear winner.

Amanda Caswell
AI Writer