DeepSeek, far from an AI-pocalypse
28 January 2025
The release of a model competitive with market leaders - but at a fraction of the cost - is a blessing that will enable a massive acceleration of the rollout of advanced AI applications.
Bottom line
The release of DeepSeek's portfolio of LLMs is a shock for the U.S. government and current AI leaders, as it shows sanctions have not prevented China from developing cutting-edge technology. Applying the company's playbook will enable other players to develop advanced models at a tiny fraction of the previous costs, which will spur a wave of new applications. Inference hardware and the data supply chain are poised to benefit.
What happened
On 26 December 2024, Chinese startup DeepSeek officially released the V3 version of its LLM, a 671bn-parameter model competing in the same league as OpenAI's GPT-4o and Anthropic's Claude.
On 20 January 2025, the company released a reasoning version of the model, dubbed R1, which competes against OpenAI's o1.
Both these models delivered results on par with or better than their competitors on reference benchmarks, while requiring a fraction of the cost to train. This sent shockwaves through the entire supply chain, with investors pondering whether current investment levels in hardware infrastructure might be oversized.
Impact on our Investment Case
Giving credit where it is due
There are many impressive aspects of this release. But even before discussing development costs, one should not forget that DeepSeek managed to catch up with Western competition despite having limited access to cutting-edge hardware - something akin to entering a fistfight with one arm tied behind the back. DeepSeek's GPU infrastructure allegedly consists of Nvidia H800s, a chip specifically designed to comply with U.S. sanctions that notably features a chip-to-chip data transfer rate cut by 50% vs. the "normal" H100. In this context, achieving this result is already a testament to the tenacity of DeepSeek's developers, as well as proof that sanctions are no panacea: either China managed to smuggle in a substantial number of cutting-edge GPUs without disclosing it (some unconfirmed rumors claim the company owns 50'000 H100s), or such chips may simply not be required, which would have significant consequences (cf. later in this article).
However, the juicy part of this event is the development cost. DeepSeek claims the V3 model took ~2 months and $5.5mn in compute time to train. This claim is impressive but should be contextualized: it covers only the GPUs' operating costs, not their acquisition costs, and the ~2'000 GPUs allegedly used for training do not come cheap. Nevertheless, it is a massive improvement over what was previously considered necessary; as a basis for comparison, it took Meta ~16'400 GPUs and a similar ~2 months to train Llama 3 and its 405bn parameters. DeepSeek's result was obtained through a set of clever optimizations, which also translate into lower inference costs, with the R1 model priced at less than 5% of OpenAI's o1.
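As a rough sanity check, the headline budget can be reproduced with back-of-envelope arithmetic. The inputs below are assumptions drawn from public reporting (~2'048 H800 GPUs, ~2 months of training, and a cloud-rental-equivalent rate of ~$2 per GPU-hour), not figures disclosed in this note:

```python
# Back-of-envelope check of DeepSeek's claimed V3 training budget.
# All inputs are assumptions based on public reporting, not disclosed figures.
gpus = 2048                # assumed H800 cluster size
days = 60                  # ~2 months of training
rate_per_gpu_hour = 2.0    # USD, assumed cloud-rental-equivalent rate

gpu_hours = gpus * days * 24
cost_mn = gpu_hours * rate_per_gpu_hour / 1e6
print(f"{gpu_hours:,} GPU-hours -> ${cost_mn:.1f}mn")
# prints "2,949,120 GPU-hours -> $5.9mn"
```

Under these assumptions the total lands near the claimed $5.5mn, which illustrates why the figure is plausible as an operating cost while excluding the (far larger) capital cost of acquiring the GPUs themselves.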
Last but not least, the models were released as open-source, meaning anybody can download and fine-tune them using any supplemental data for any kind of application.
Significant consequences, primarily positive
Of course, the immediate negative is for datacenter-related hardware players. Although DeepSeek's techniques still need to be confirmed and digested by other players, they may substantially reduce demand for training chips compared to initial expectations. This would be a major blow to Nvidia, although one that is hard to quantify precisely, as the past few quarters have proven that nobody really knew where demand growth would stop.
Training suppliers aside, we see this news as a significant positive for our AI & Robotics strategy. We see every breakthrough as a major opportunity, and this one is no exception. A shrinking hardware moat - a recurring pattern in the history of semiconductors - means that the training dataset becomes an even more important differentiator. In this respect, big generalist models may commoditize and become collateral damage, as open-source has a built-in advantage here; smaller specialized models (e.g., in healthcare) might become the only niche where closed-source players thrive. All in all, software players are likely to see increasing traction vs. hardware ones. Additionally, we welcome competition from a non-U.S. player, as it will keep pressure on the leaders to push innovation further and not become complacent.
More importantly, lower training costs and a robust open-source ecosystem mean more applications, which will, in turn, fuel a virtuous cycle. On top of the data supply chain, which is critical to enabling these applications, one primary beneficiary might be... inference hardware: applications need infrastructure to run on, and custom inference chips can provide it quite efficiently. All in all, we think more efficient AI will drive a reallocation of capex rather than lower overall infrastructure spending, at least in the short term; recent announcements (the "Stargate" project in the U.S., Meta raising its capex target) support this view, although confirmation from the main hyperscalers will be needed.
Our Takeaway
DeepSeek's release is a seismic event for the AI ecosystem, but a positive one. We believe the industry has everything to gain from lower development costs, as they will facilitate the development of applications and their adoption by end-users. Hardware players and the related AI/datacenter supply chain will suffer in the short term. However, we see strong potential for inference hardware players such as Marvell and Broadcom. Software players are also poised to benefit, especially those active in the data supply chain, which will become an even more critical element. We believe our allocation is well-calibrated for the current situation, as we have been pivoting towards software players for some time already. We expect our portfolio to capture the rebound that will inevitably materialize once the dust settles and investors realize the potential of recent developments.
Companies mentioned in this article
Anthropic (Not listed); Broadcom (AVGO); DeepSeek (Not listed); Marvell (MRVL); Meta (META); Nvidia (NVDA); OpenAI (Not listed)
Disclaimer
This report has been produced by the organizational unit responsible for investment research (Research unit) of atonra Partners and sent to you by the company's sales representatives.
As an internationally active company, atonra Partners SA may be subject to a number of provisions in drawing up and distributing its investment research documents. These regulations include the Directives on the Independence of Financial Research issued by the Swiss Bankers Association. Although atonra Partners SA believes that the information provided in this document is based on reliable sources, it cannot assume responsibility for the quality, correctness, timeliness or completeness of the information contained in this report.
The information contained in these publications is exclusively intended for a client base consisting of professionals or qualified investors. It is sent to you by way of information and cannot be divulged to a third party without the prior consent of atonra Partners. While all reasonable effort has been made to ensure that the information contained is not untrue or misleading at the time of publication, no representation is made as to its accuracy or completeness and it should not be relied upon as such.
Past performance is not indicative of, nor a guarantee of, future results. Investment losses may occur, and investors could lose some or all of their investment. Any indices cited herein are provided only as examples of general market performance and no index is directly comparable to the past or future performance of the Certificate.
It should not be assumed that the Certificate will invest in any specific securities that comprise any index, nor should it be understood to mean that there is a correlation between the Certificate’s returns and any index returns.
Any material provided to you is intended only for discussion purposes and is not intended as an offer or solicitation with respect to the purchase or sale of any security and should not be relied upon by you in evaluating the merits of investing in any securities.