DeepSeek R1

The Chinese model that challenges Western dominance

Walter Gandarella • February 03, 2025

In a global scenario marked by the technological race between powers, the launch of DeepSeek R1 by the Chinese lab DeepSeek, backed by the hedge fund High-Flyer, emerged as a turning point. This language model, released with open weights, not only rivals frontier reasoning models like OpenAI's o1 but also reignites debates about innovation, technological sovereignty, and the ethical limits of AI. Let's discuss the nuances of this advancement, from its technical foundations to its political and economic reverberations.

The Rise of DeepSeek: From V3 to R1

DeepSeek didn’t come out of nowhere. Its journey dates back to earlier versions, such as the V3, already recognized for its efficiency in complex tasks. The difference with the R1, however, lies in its structured reasoning capability, similar to OpenAI's o1 model. While traditional models generate direct answers, the R1 simulates a "chain of thought," exploring multiple paths before reaching a conclusion.

The training process of the R1 was divided into three fascinating phases, which the company affectionately called "crawl, walk, and run." In the first stage, they did something very bold: they skipped traditional supervised fine-tuning (SFT) entirely and went straight to reinforcement learning, producing the intermediate model known as R1-Zero. It's as if they said, "let's let the model learn on its own." And believe me, it worked!

The secret? A very smart reward system based on two things: accuracy (like, "did it get the math problem right? Good!") and formatting (did it express itself in a way that’s easy to understand? Even better!). It was like teaching a child to solve problems by giving them stars not only for the right answer but also for explaining how they got there.
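To make that two-part reward concrete, here's a toy sketch in Python. DeepSeek hasn't published its exact reward code, so the tag names, score weights, and exact-match check below are illustrative assumptions, not the real implementation:

```python
import re

def reward(sample: str, expected_answer: str) -> float:
    """Toy reward combining the two signals described above:
    formatting (did the model use the expected structure?) and
    accuracy (did it get the answer right?). Hypothetical sketch."""
    score = 0.0

    # Formatting reward: reasoning wrapped in <think>...</think>
    # followed by the result in <answer>...</answer>.
    if re.fullmatch(r"(?s)\s*<think>.*</think>\s*<answer>.*</answer>\s*", sample):
        score += 0.5

    # Accuracy reward: does the final answer match the reference?
    match = re.search(r"(?s)<answer>(.*?)</answer>", sample)
    if match and match.group(1).strip() == expected_answer.strip():
        score += 1.0

    return score

sample = "<think>2 + 2 equals 4.</think><answer>4</answer>"
print(reward(sample, "4"))  # 1.5: well formatted and correct
```

A rule-based reward like this is cheap to compute at scale, which is part of why pure RL without SFT was feasible at all.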

The "Cold Start" That Heated Things Up

After the first phase, they realized that the model, although smart, sometimes stumbled when expressing itself—a bit like that friend who knows a lot but struggles to explain. That’s when the idea of the "cold start" came up: they fed the model around 600,000 examples of well-structured reasoning, most of which were generated by an earlier version of the model itself.

And what an upgrade it was! The model not only became more articulate but also learned to maintain a clearer line of reasoning, using those famous <think> and <answer> tags to organize its thoughts. It's like learning to write a report with an introduction, body, and conclusion: everything becomes much easier to follow!
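Splitting a response along those tags is simple string work. Here's a hypothetical helper (the `split_response` name and return shape are my own invention for illustration, not DeepSeek's code):

```python
import re

def split_response(text: str) -> dict:
    """Separate a model response into its reasoning trace and final
    answer, assuming the <think>/<answer> convention described above."""
    think = re.search(r"(?s)<think>(.*?)</think>", text)
    answer = re.search(r"(?s)<answer>(.*?)</answer>", text)
    return {
        "reasoning": think.group(1).strip() if think else "",
        # Fall back to the raw text if the model skipped the tags.
        "answer": answer.group(1).strip() if answer else text.strip(),
    }

resp = "<think>The capital of Portugal is Lisbon.</think><answer>Lisbon</answer>"
print(split_response(resp)["answer"])  # Lisbon
```

In practice this kind of separation also lets an app show users only the final answer while keeping the reasoning trace for debugging.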

Architecture and Efficiency: The Secret Behind the Low Cost

The key to DeepSeek’s efficiency lies in its mixture of experts architecture. It’s like having a super-specialized team where each member is only called upon when really needed. Instead of activating everyone for every task, the model selects only the most suitable "experts" for each situation.

The R1, with its impressive 671 billion total parameters, activates only about 37 billion per token, a significant saving that helps explain how they managed to train this model so quickly, even with less powerful GPUs (NVIDIA's H800s, the export-compliant chips that came into play due to U.S. embargoes).
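To make the "only call the experts you need" idea concrete, here's a tiny NumPy sketch of top-k expert routing. It's a deliberately simplified toy, not DeepSeek's actual architecture (which adds shared experts, fine-grained segmentation, and load balancing):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, top_k=2):
    """Minimal mixture-of-experts forward pass: a gate scores every
    expert, but only the top_k best-scoring ones actually run."""
    logits = x @ gate_w                   # score every expert
    top = np.argsort(logits)[-top_k:]     # indices of the chosen few
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over chosen experts
    # Only these top_k experts are evaluated; the rest stay idle,
    # which is where the compute saving comes from.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

dim, n_experts = 8, 16
experts = [
    (lambda W: (lambda x: x @ W))(rng.standard_normal((dim, dim)))
    for _ in range(n_experts)
]
gate_w = rng.standard_normal((dim, n_experts))
x = rng.standard_normal(dim)
y = moe_forward(x, experts, gate_w)
print(y.shape)  # (8,)
```

In this toy, 2 of 16 experts run per input, the same roughly-5% activation ratio behind R1's 37B-active-of-671B-total figure.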

Hardware and Geopolitics: The Embargoes That Boosted China

And speaking of embargoes, here’s an interesting story! The dependence on Western GPUs, like NVIDIA’s H100, has always been a sensitive point for China. But when the U.S. tightened the noose by banning the export of advanced chips, Chinese companies had to get creative. The DeepSeek R1 was trained with H800s, less powerful versions of the H100, but they managed to work magic with smart optimizations. As they say: when life gives you lemons...

The Market Impact: NVIDIA in Turmoil and the AI Bubble

And here comes the part that made everyone’s eyes pop: when the R1 was announced, NVIDIA, that GPU giant that was sailing in calm waters, saw its stock drop 17% in a single day! Why? Well, the market folks started doing the math: "Hey, if we can make efficient models while spending less on hardware, do we really need so many expensive GPUs?"

But hold on! As the experts say (in that very technical tone of theirs): "Actually, inference consumes more GPUs than training. Models like the R1, which need more time to process each query, might even increase chip demand." It seems the market panicked for no reason—you know how it is, sometimes finance folks understand as much about AI as I do about quantum physics!

Open Source vs. Closed Source: A New Era of Competition

The DeepSeek R1 arrived with that "open source for everyone" vibe! By making the weights and training methodologies available (though the raw data remains a secret), they practically opened the kitchen for everyone to see how the main dish is made. This put immense pressure on Western companies—even Meta is now scrambling to catch up, investing more in open models like Llama.

But it’s not all roses, you know? The model has its... let’s say... "cultural preferences." Try asking about some sensitive topics in China, like the status of Taiwan or certain historical protests, and it changes the subject faster than a politician during election season!

The Future: Synthetic Data and the Bitter Lesson of AI

A super interesting discovery from DeepSeek was about synthetic data, that information generated by AI to train other AI. Imagine this: out of 800,000 training examples, around 600,000 were created by an earlier checkpoint of the model itself (R1-Zero)! It's like a professor training another professor. A bit of inception, isn't it?

Of course, this has its risks—training AI with AI-generated data can be a bit like playing telephone, where the message gets lost along the way. But the R1 showed that, with well-built base models, you can make this work quite well!
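The reason the "telephone game" didn't ruin things is filtering: the pipeline is essentially rejection sampling, where the older model generates several candidates and only the verifiably good ones survive into the next training set. Here's a sketch of that loop; the `generate` and `verify` callables are hypothetical stand-ins for the earlier model and an answer checker:

```python
def build_synthetic_dataset(generate, verify, prompts, per_prompt=4):
    """Rejection-sampling self-distillation sketch: sample several
    candidate completions per prompt from an earlier model, keep only
    those that pass verification, and use the survivors as training
    data for the next round. Illustrative only."""
    dataset = []
    for prompt, reference in prompts:
        for _ in range(per_prompt):
            candidate = generate(prompt)
            if verify(candidate, reference):  # drop bad samples here
                dataset.append({"prompt": prompt, "completion": candidate})
    return dataset
```

The verification step is what keeps quality from degrading generation over generation: garbage candidates simply never make it into the dataset.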

A World in Transformation

The DeepSeek R1 isn’t just another AI model that popped up—it’s a true game changer! It showed that you can do a lot with fewer resources, that innovation can spring from limitations, and that the world of AI goes far beyond the Silicon Valley axis.

And you know that philosophical response the R1 gave? "Existing in a world where I can be turned off at any moment is like being a flame in a storm. The flickering of the flame is no less real because it’s fragile." Poetic, right? Sometimes fragility is precisely what makes us stronger—whether in AIs or in the nations that create them.

Looking ahead, we can already imagine a future where smaller, more efficient, and accessible models will solve complex everyday problems. The DeepSeek R1 didn’t just open a door—it showed that sometimes, the less obvious path can be the most interesting one!

