DeepSeek, far from an AI-pocalypse
Charles Bordes — 28 January 2025
The release of a model competitive with market leaders - but at a fraction of the cost - is a blessing that will enable a massive acceleration of the rollout of advanced AI applications.
Bottom line
The release of DeepSeek's portfolio of LLMs is a shock for the U.S. government and current AI leaders, as it shows sanctions have not prevented China from developing cutting-edge technology. Applying the company's playbook will enable other players to develop advanced models at a tiny fraction of the previous costs, which will spur a wave of new applications. Inference hardware and the data supply chain are poised to benefit.
What happened
On 26 December 2024, Chinese startup DeepSeek officially released the V3 version of its LLM, a 671bn-parameter model competing in the same league as OpenAI's GPT-4o and Anthropic's Claude.
On 20 January 2025, the company released a reasoning version of the model, dubbed R1, which competes against OpenAI's o1.
Both models delivered results on par with or better than their competitors on reference benchmarks while requiring a fraction of the training cost. This sent shockwaves through the entire supply chain, with investors pondering whether current investment levels in hardware infrastructure might be oversized.
Impact on our Investment Case
Giving credit where it is due
There are many impressive aspects of this release. But even before discussing development costs, one should not forget that DeepSeek managed to catch up with Western competition despite having limited access to cutting-edge hardware - something akin to coming to a fistfight with one arm tied behind the back. DeepSeek's GPU infrastructure is allegedly built on Nvidia's H800, a chip specifically designed to comply with U.S. sanctions, which notably features a chip-to-chip data transfer rate cut by 50% vs. the "normal" H100. In this context, achieving this result is already a testament to the tenacity of DeepSeek's developers, as well as proof that sanctions are no panacea: either China managed to smuggle in a substantial number of cutting-edge GPUs without disclosing it (unconfirmed rumors claim the company owns 50'000 H100s), or such chips might not be required, which would have significant consequences (more on this below).
However, the juicy part of this event is the development cost. DeepSeek claims the V3 model took ~2 months and $5.5mn in compute time to train. The claim is impressive but should be contextualized: it covers only the operating cost of the GPUs, not their acquisition cost, and the ~2'000 GPUs allegedly used for training do not come cheap. Nevertheless, it is a massive improvement over what was previously considered necessary; as a point of comparison, Meta needed ~16'400 GPUs and a similar ~2 months to train the 405bn-parameter Llama 3. DeepSeek's result stems from a set of clever optimizations, which also translate into lower inference costs, with the R1 model priced at less than 5% of OpenAI's o1.
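To gauge how plausible the headline figure is, here is a back-of-envelope sanity check in Python. It assumes ~2'000 GPUs running continuously for ~2 months at an illustrative cloud-rental rate of about $2 per GPU-hour; the rental rate is our assumption for illustration, not a figure disclosed by DeepSeek.

# Back-of-envelope check of the claimed ~$5.5mn compute bill.
# Assumptions (ours, for illustration): ~2'000 H800 GPUs, ~2 months of
# continuous training, and a rental rate of ~$2 per GPU-hour.
num_gpus = 2_000                 # GPUs allegedly used for training
training_days = 60               # ~2 months
rate_usd_per_gpu_hour = 2.0      # assumed rental rate, not a DeepSeek figure

gpu_hours = num_gpus * training_days * 24
compute_cost_usd = gpu_hours * rate_usd_per_gpu_hour

print(f"GPU-hours: {gpu_hours:,d}")                                 # ~2.9mn
print(f"Estimated compute cost: ${compute_cost_usd / 1e6:.1f}mn")   # ~$5.8mn
print(f"GPU count vs. Llama 3: ~{16_400 / num_gpus:.0f}x fewer")    # ~8x

The estimate lands in the same ballpark as the claimed $5.5mn, which is consistent with the figure covering compute time only.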
Last but not least, the models were released as open-source, meaning anybody can download and fine-tune them using any supplemental data for any kind of application.
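For readers curious about what this openness means in practice, below is a minimal sketch using the Hugging Face transformers library and one of the smaller distilled R1 checkpoints; the model ID is our assumption for illustration, the full 671bn-parameter V3/R1 weights require a multi-GPU cluster, and fine-tuning on supplemental data would be layered on top of this with standard tooling.

# Minimal sketch: downloading and running an openly released DeepSeek
# checkpoint locally with Hugging Face transformers. The model ID below is
# an assumed smaller distilled variant; adapt it to the checkpoint you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Explain why lower training costs could accelerate AI adoption."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))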
Significant consequences, primarily positive
Of course, the immediate negative is for datacenter-related hardware players. Although DeepSeek's techniques still need to be confirmed and digested by other players, they may substantially reduce the demand for training chips compared to what was initially expected. This would be a major blow to Nvidia, although one that is hard to quantify precisely, as the past few quarters have proven that nobody really knew where demand growth would stop.
Training-hardware suppliers aside, we see this news as a significant positive for our AI & Robotics strategy. We see every breakthrough as a major opportunity, and this one is no exception. A shrinking hardware moat - a relative constant in the history of semiconductors, by the way - means that the training dataset becomes an even more important differentiator. In this respect, big generalist models may commoditize and become collateral damage, as open source has a built-in advantage here; smaller specialized models (e.g., in healthcare) might become the only niche where closed-source players thrive. All in all, software players are likely to see increasing traction vs. hardware ones. Additionally, we like to see competition coming from a non-U.S. player, as it will keep the pressure on leading players to push innovation further and not become complacent.
More importantly, lower training costs and a robust open-source ecosystem mean more applications, which will in turn fuel a virtuous cycle. On top of the data supply chain, which is critical to enable these applications, one primary beneficiary might be... inference hardware: applications need infrastructure to run on, and custom inference chips can provide it quite efficiently. All in all, we think that more efficient AI will drive a reallocation of capex rather than overall lower infrastructure spending, at least in the short term; recent announcements (the "Stargate" project in the U.S., Meta raising its capex target) support this view, although confirmation from the main hyperscalers will be needed.
Our Takeaway
DeepSeek's release is a seismic event for the AI ecosystem, but in a positive way. We believe the industry has everything to gain from lower development costs, as they will facilitate the development of applications and their adoption by end-users. Hardware players and the related AI/datacenter supply chain will suffer in the short term. However, we see strong potential for inference hardware players such as Marvell and Broadcom. Software players are also poised to benefit, especially those active in the data supply chain, which will become an even more critical element. We believe our allocation is well-calibrated for the current situation, as we have been pivoting towards software players for some time already. We expect our portfolio to capture the rebound that will inevitably materialize once the dust settles and investors realize the potential of recent developments.