I wondered, did some homework, and discussed this with X.ai Grok. Here's what it finally came up with:
While there's no definitive proof that DeepSeek fudged their numbers or disclosures, the skepticism around their claims largely stems from several points:
Discrepancy in Reported Costs: The large gap between DeepSeek's reported training costs and what the industry typically spends to train comparable models raises eyebrows. The claim of training a model like DeepSeek-V3 for just $5.5 million, when comparable efforts at Western companies reportedly cost far more, suggests either a genuine efficiency breakthrough or underreported costs (a back-of-the-envelope sketch of that figure follows below).
Lack of Full Disclosure: In the tech community, and particularly in AI development, there's an expectation of transparency, especially when claims are as bold as DeepSeek's. If aspects of their methodology, hardware usage, or data handling are not fully disclosed, speculation naturally follows about what might be hidden or misrepresented.
Performance vs. Resources: To some experts, the performance DeepSeek achieved with the resources they claim to have used seems disproportionately high. This raises questions about whether additional resources, or different, undisclosed methods, contributed to the model's success.
Geopolitical Context: The backdrop of U.S.-China tech rivalry and the export controls on advanced chips to China add layers of complexity. If DeepSeek managed to achieve what they claim with less powerful hardware, it might imply the use of alternative strategies, possibly involving methods or resources not fully accounted for in their public disclosures.
Market Reaction: The skepticism is also reflected in the market's reaction, with some investors and analysts questioning the claims, which indirectly pressures DeepSeek to provide more clarity or face continued doubt.
However, it's important to balance this skepticism with the acknowledgment that innovation can sometimes yield unexpected efficiencies or breakthroughs. Unless DeepSeek provides more detailed transparency or independent audits verify their claims, speculation about fudged numbers or methods will persist. The tech community often operates on a "show me" basis, where extraordinary claims require extraordinary evidence. Until such evidence is forthcoming, the possibility that aspects of the project were overstated or not fully disclosed will remain a topic of debate.
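For context on where that $5.5 million figure comes from: it is usually read as the cost of the final pre-training run alone, derived by multiplying reported GPU-hours by an assumed rental rate. Here is a minimal back-of-the-envelope sketch, assuming the roughly 2.79 million H800 GPU-hours and the notional $2 per GPU-hour price cited in DeepSeek's own technical report (figures I have not independently verified):

```python
# Back-of-the-envelope reconstruction of the headline training-cost figure.
# Assumptions (from DeepSeek's own reporting, not independently verified):
#   - ~2.788M H800 GPU-hours for the final pre-training run only
#   - a notional rental price of $2 per GPU-hour
#   - excludes research, ablations, failed runs, data, and infrastructure
gpu_hours = 2.788e6
rate_usd_per_gpu_hour = 2.0

estimated_cost = gpu_hours * rate_usd_per_gpu_hour
print(f"Estimated final-run cost: ${estimated_cost / 1e6:.2f}M")  # ~$5.58M
```

If that reading is right, the dispute is less about the arithmetic itself and more about everything the figure leaves out.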
Lennart Heim has some interesting commentary. https://blog.heim.xyz/deepseek-what-the-headlines-miss/
“DeepSeek's efficiency gains may have come from previously having access to substantial compute. Counterintuitively, the path to using fewer chips (i.e., “efficiency”) may require starting with many more. DeepSeek operated Asia's first 10,000 A100 cluster, reportedly maintains a 50,000 H800 cluster, and has additional unlimited access to Chinese and foreign cloud providers (which is not export-controlled). This extensive compute access was likely crucial for developing their efficiency techniques through trial and error and for serving their models to customers.”
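To give Heim's point some scale, here is a rough conversion of the reported training budget into cluster time. The 2,048-GPU final-run cluster and the ~2.79M GPU-hours come from DeepSeek's own reporting, and the 50,000-GPU figure is the one Heim cites above; none of these numbers are independently verified:

```python
# Rough conversion of DeepSeek's reported training budget into cluster time.
# Assumptions (from DeepSeek's reporting and Heim's commentary, unverified):
#   - ~2.788M H800 GPU-hours for the final pre-training run
#   - a 2,048-GPU cluster for that run
#   - a 50,000-GPU H800 fleet as the claimed overall capacity
reported_gpu_hours = 2.788e6
final_run_cluster = 2048
claimed_fleet = 50_000

run_days = reported_gpu_hours / final_run_cluster / 24
print(f"Final run: ~{run_days:.0f} days on {final_run_cluster} GPUs")  # ~57 days

# The same GPU-hours are a small slice of what a 50k-GPU fleet supplies per year.
fleet_gpu_hours_per_year = claimed_fleet * 24 * 365
print(f"Share of one year of the claimed fleet: "
      f"{reported_gpu_hours / fleet_gpu_hours_per_year:.1%}")          # ~0.6%
```

On those assumptions, the headline cost describes roughly two months on a small slice of the fleet, not the total compute access Heim describes.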