DeepSeek: A Fantastic AI Breakthrough, But Not a $5 Million Miracle
The artificial intelligence world has been buzzing with excitement over DeepSeek, a company that has quickly gained attention for its advanced AI models. Many on social media and in the stock market have speculated that DeepSeek built a competitor to OpenAI for just $5 million. However, a recent report by Bernstein clarifies that while DeepSeek's achievements are impressive, the claims of developing OpenAI-level technology on such a low budget are misleading.
Breaking Down DeepSeek’s AI Models
DeepSeek has developed two main families of AI models: DeepSeek-V3 and DeepSeek-R1. The V3 model uses a Mixture-of-Experts (MoE) architecture, in which each input is routed to a small subset of specialized expert sub-networks rather than activating the entire model. This setup lets DeepSeek achieve high performance at a lower computing cost than comparable large-scale AI models. The model has 671 billion parameters in total, but only about 37 billion are active for any given token, making it significantly more efficient than traditional dense architectures.
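To make the idea concrete, here is a minimal sketch of a top-k MoE layer in PyTorch. It is illustrative only, not DeepSeek's implementation: the expert count, layer sizes, and routing rule are made-up assumptions, but the key property is visible, since the router activates only a couple of experts per token and most parameters stay idle on any given input.

```python
# Minimal sketch of a top-k Mixture-of-Experts layer (illustrative only;
# not DeepSeek's actual code -- sizes and expert count are made up).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                               # x: (batch, seq, d_model)
        scores = self.router(x)                         # (batch, seq, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)      # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so most parameters stay idle.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e              # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 4 tokens pass through the layer; each uses only 2 of the 8 experts.
layer = TopKMoELayer()
y = layer(torch.randn(1, 4, 512))
```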
DeepSeek has also integrated advanced training techniques such as Multi-head Latent Attention (MLA), which compresses the attention key-value cache to reduce memory usage, and FP8 mixed-precision training, which improves computational efficiency. These innovations allow DeepSeek to compete with some of the most powerful AI models in the industry while requiring fewer computational resources.
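The memory-saving intuition behind latent attention can be sketched in a few lines: rather than caching full-size keys and values for every past token, the layer caches a much smaller latent vector and reconstructs keys and values from it on demand. The sketch below is a hedged illustration with made-up dimensions, not DeepSeek's actual design; the FP8 side of the story depends on specialized kernels and is not shown here.

```python
# Illustrative sketch of low-rank key/value compression, the core memory trick
# behind latent attention. Dimensions are made up; this is not DeepSeek's code.
import torch
import torch.nn as nn

class CompressedKVCache(nn.Module):
    def __init__(self, d_model=512, d_latent=64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)   # compress hidden state into a small latent
        self.up_k = nn.Linear(d_latent, d_model)   # reconstruct keys from the latent
        self.up_v = nn.Linear(d_latent, d_model)   # reconstruct values from the latent

    def forward(self, h):                          # h: (batch, seq, d_model)
        latent = self.down(h)                      # only this small tensor would need to be
                                                   # cached across decoding steps
        k = self.up_k(latent)                      # keys/values are rebuilt on the fly
        v = self.up_v(latent)
        return latent, k, v

cache = CompressedKVCache()
latent, k, v = cache(torch.randn(1, 16, 512))
print(latent.shape, k.shape)   # torch.Size([1, 16, 64]) torch.Size([1, 16, 512])
```

In this toy configuration the cached tensor is 64 values per token instead of 512, an 8x reduction in cache memory.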
The Training Process and Real Costs
The Bernstein report highlights the computational requirements for training DeepSeek-V3, which involved:
2,048 NVIDIA H800 GPUs
2.7 million GPU hours for pre-training
2.8 million GPU hours including post-training
While some have estimated the cost of this process at roughly $5 million, based on an assumed rental rate of about $2 per GPU hour, the Bernstein report argues that this calculation is overly simplistic. It does not account for the years of research, experimentation, software development, infrastructure costs, and human expertise required to build these models.
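For reference, the arithmetic behind the headline figure is easy to reproduce from the numbers above; the $2-per-GPU-hour figure is an assumed rental rate, not a reported invoice.

```python
# Reproducing the back-of-the-envelope estimate: GPU hours x assumed rental rate.
gpu_hours_total = 2.8e6        # ~2.8 million GPU hours including post-training
rate_per_gpu_hour = 2.0        # assumed $2/hour H800 rental rate
estimated_cost = gpu_hours_total * rate_per_gpu_hour
print(f"${estimated_cost / 1e6:.1f} million")   # ~$5.6 million in raw GPU rental alone
```

That figure covers only the final training run's GPU time, which is exactly the report's point: everything else that precedes such a run is left out.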
DeepSeek’s second model, DeepSeek-R1, is built on the V3 foundation but adds reinforcement learning (RL) and advanced reasoning techniques to strengthen its problem-solving capabilities. R1 has shown competitive performance against OpenAI’s most advanced models, particularly on tasks that require logical reasoning. However, the report suggests that the additional resources needed to train R1 were likely substantial, though no exact figures were provided.
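As a loose illustration of the general RL pattern for reasoning models (a sketch of a common approach, not DeepSeek's actual training recipe), a trainer can score sampled answers against a verifiable ground truth and use that score as the reward signal.

```python
# Toy verifiable-reward function of the kind often used for reasoning-focused RL.
# This is only a sketch of the general pattern, not DeepSeek's reward design.
def reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known-correct answer."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

# A policy-gradient trainer would sample answers, score them with reward(),
# and update the model to make high-reward answers more likely.
print(reward("42", " 42 "))   # 1.0
```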
Why the $5 Million Claim Is Misleading
The idea that DeepSeek built a competitor to OpenAI for just $5 million has led to widespread speculation, with some seeing it as a game-changer for AI development costs. However, the Bernstein report debunks this claim, explaining that while DeepSeek’s models are efficient and cost-effective, the total investment required goes far beyond just GPU rental fees.
For example, OpenAI's top-tier models require significantly more compute power than DeepSeek-V3. The Bernstein report notes that DeepSeek used only 9% of the compute resources needed to train some of the largest models in the industry. While this efficiency is impressive, it does not mean that DeepSeek was able to achieve its success with an ultra-low budget.
What This Means for the AI Industry
Despite the exaggerated claims, DeepSeek’s innovations are still a major achievement in the AI space. The ability to build a high-performing language model using a fraction of the compute power required by competitors signals a shift in how AI models are designed and optimized.
The success of DeepSeek’s MoE-based architecture and advanced training methods could inspire other AI companies to develop more efficient, cost-effective models. However, the report cautions against both panic and hype, emphasizing that while DeepSeek’s work is fantastic, it is not a miracle.