What’s the deal(s) with AI inference?

The AI inference narrative is overtaking the training-focused story, signaling the imminent large-scale deployment of AI applications.

Bottom line

  • AI spending has surged in recent months, primarily to expand inference capacity.
  • Operators are increasingly favoring tailored solutions to optimize their costs.
  • Hyperscalers are combining internal investments with outsourcing to “neocloud” partners, a strategy that is not without risks.

These trends point to the large-scale deployment of a dynamic AI application ecosystem. Our exposure to software infrastructure players and custom chipmakers is well-positioned to benefit directly from this expansion, while deliberately avoiding the riskier neocloud segment.

What happened

The pace of news around AI infrastructure spending has accelerated sharply in recent weeks. Since our September update, OpenAI has continued its aggressive expansion: alongside a strategic partnership with Broadcom to develop and deploy 10 gigawatts of custom AI chips, the company also placed orders for up to $100bn of AMD chips, securing a stake of up to 10% in AMD. Hyperscalers reported significant capex increases during the Q3 earnings season, sending massive ripple effects across the ecosystem. Meanwhile, traditional chipmakers are increasingly targeting the AI compute market, with Qualcomm and Intel announcing dedicated products to capture a share of this expanding opportunity.

Impact on our Investment Case

A Wind of Change in Hardware

Until early this year, market attention in AI hardware was firmly centered on training chips - a direct byproduct of Nvidia’s meteoric rally and the global rush to acquire its processors to build the AI models that have dominated headlines since the launch of ChatGPT in late 2022. But this narrative began to show cracks with the arrival of DeepSeek in January 2025: all of a sudden, despite rising model complexity, the training phase could be dramatically streamlined through software-level optimizations, sharply reducing hardware needs. After an initial sell-off, however, hardware stocks rebounded, fueled by a renewed wave of spending. The driver of this resurgence is clear: the rapidly growing demand to build out inference capacity at scale.

Two Faces of the Same Coin

Operationally, inference is simply what happens after training. During training, a model’s parameters are adjusted to define the structure and capabilities of the neural network. This stage is highly compute-intensive and involves repeatedly processing large datasets that reflect the model’s intended use. Inference, on the other hand, is the process of using the trained model, i.e., running the computations that convert an input (such as a user question) into an output (such as the model’s answer). On a per-request basis, it is much less demanding than training, requiring only a single forward pass over the data provided in the user’s prompt.
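For readers who want the mechanics made concrete, the sketch below contrasts the two phases using a toy PyTorch model. The model, data, and hyperparameters are illustrative assumptions, not a depiction of any production system.

```python
# Minimal sketch: training (repeated forward + backward passes with
# parameter updates) vs. inference (a single forward pass, no updates).
# The tiny network and random tensors are stand-ins, not a real LLM.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: compute-intensive, iterates over large datasets many times,
# adjusting the model's parameters on every step.
for epoch in range(10):
    inputs = torch.randn(32, 16)             # stand-in for a training batch
    targets = torch.randint(0, 4, (32,))     # stand-in labels
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)   # forward pass
    loss.backward()                          # backward pass (gradients)
    optimizer.step()                         # parameter update

# Inference: parameters are frozen; one forward pass per request,
# touching only the data supplied in the user's prompt.
model.eval()
with torch.no_grad():
    prompt = torch.randn(1, 16)              # stand-in for a user query
    answer = model(prompt).argmax(dim=-1)    # single forward pass
```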

The rise of inference was never a surprise - we have highlighted this shift since December 2023. Although a single inference request is far simpler than a training run, a model is trained only once yet queried countless times thereafter, so aggregate inference demand eventually dwarfs training demand. At the same time, inference workloads themselves are becoming more demanding: new models emphasize reasoning, or chain-of-thought, which significantly increases the number of tokens processed per request. The rapid acceleration in capex spending points to a clear reality: companies are preparing for the large-scale deployment of AI applications, as next-generation reasoning models become capable enough to take on real operational tasks.
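A back-of-envelope calculation shows how quickly these volumes compound. Every figure below is a hypothetical assumption chosen for illustration; actual token counts and request volumes vary widely by model and operator.

```python
# Back-of-envelope illustration (hypothetical numbers) of why reasoning
# models inflate inference demand: chain-of-thought multiplies the tokens
# processed per request, and requests recur while training happens once.
ANSWER_TOKENS = 500              # visible output of a typical request (assumed)
REASONING_MULTIPLIER = 20        # hidden chain-of-thought overhead (assumed)
REQUESTS_PER_DAY = 100_000_000   # assumed fleet-wide daily load

tokens_per_request = ANSWER_TOKENS * REASONING_MULTIPLIER  # 10,000 tokens
daily_tokens = tokens_per_request * REQUESTS_PER_DAY

print(f"{daily_tokens:.2e} tokens/day")  # 1.00e+12 with these assumptions
```

Under these assumptions, a 20x reasoning overhead turns a modest 500-token answer into a trillion processed tokens per day across the fleet - demand that recurs daily, unlike a one-off training run.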

The Death of GPUs - Sort Of

The scale of AI workloads is driving intense pressure to optimize operating costs. The most direct path is the Apple playbook: control both hardware and software and, in this case, replace general-purpose GPUs with custom chips tailored to your specific needs. This logic underpins the wave of custom-hardware partnerships announced by leading players. Purpose-built chips reduce the bottlenecks inherent in off-the-shelf components and unlock maximum efficiency across the entire stack. But this path is costly and slow: it requires elite engineering teams, external specialists, multiple design iterations over several years, and - critically - enough volume to justify the investment.

This shift does not spell the death of the GPU, in large part because GPUs have already evolved far beyond their original purpose. Nvidia’s chips are no longer graphics processors in any practical sense: they are AI accelerators, designed specifically for training and inference at massive scale, and useless for gaming even if one could afford them. A more accurate framing of the market is therefore not “GPU vs. non-GPU,” but custom hardware versus off-the-shelf accelerators. The largest players will pursue the former; everyone else will rely on the latter - still a vast opportunity, especially as Nvidia described demand growth as “exponential” in its 3Q26 results.

Neocloud Is Rising, But Can It Endure?

The rapid build-out of AI infrastructure has put hyperscalers in an uncomfortable bind. They cannot afford to miss the AI wave, yet many lack the time, capital, or strategic inclination to build sufficient in-house capacity. At the same time, they are contractually obligated to deliver compute. The result: a growing reliance on partners. These partners - dubbed “neocloud” players - offer specialized, on-demand access to AI accelerators, facilitating the ecosystem's ramp-up. While the segment is mostly composed of private companies such as Lambda, it saw its first IPO this year with CoreWeave, and has recently attracted former bitcoin miners. Enthusiasm has driven strong returns, but also stretched valuations relative to traditional cloud leaders (data as of 31 October 2025 in the chart below). 

The deeper concern is financial fragility. Unlike established hyperscalers, neocloud companies lack the balance-sheet capacity to match their ambitions and are leaning heavily on leverage. Market leader CoreWeave saw interest expenses exceed 20% of revenue in 3Q25 and is expected to close the year with net debt above 6x EBITDA, with substantial commitments coming due in 2026. This has encouraged increasingly “creative” financing structures (some uncomfortably reminiscent of the vendor-financing excesses of the dot-com era).

The current boom may persist for a time, but even the strongest players could be forced to scale back. Meta shareholders already balked at recent capex guidance, while OpenAI’s cost trajectory is climbing at an alarming pace. Should capital inflows into neocloud abruptly slow, the sector’s upbeat narrative could shift quickly from high growth to a full-blown credit squeeze.

Our Takeaway

The long-awaited era of AI inference has finally begun, and the upside potential is enormous. Because the first wave of AI capex focused on training, today’s infrastructure is not yet equipped to handle the surge in inference workloads. As always, we believe the optimal way to capture this upside is through exposure to pure players - specifically, custom chipmakers set to benefit directly as hyperscalers build tailored AI ecosystems. At the same time, the current inference boom reinforces our conviction that the era of applications is only beginning, with profound implications for the broader data ecosystem. Our current allocation, balanced across leading AI accelerator designers, data management providers, and dominant application developers, is positioned to capture this wave while avoiding neocloud players and the market’s present excesses, which could create unwanted risks in the event of a downturn.

Companies mentioned in this article

Apple (AAPL); Broadcom (AVGO); CoreWeave (CRWV); Intel (INTC); Lambda (Not listed); Meta (META); Nvidia (NVDA); OpenAI (Not listed); Qualcomm (QCOM)


Disclaimer

This report has been produced by the organizational unit responsible for investment research (Research unit) of atonra Partners and sent to you by the company’s sales representatives.

As an internationally active company, atonra Partners SA may be subject to a number of provisions in drawing up and distributing its investment research documents. These regulations include the Directives on the Independence of Financial Research issued by the Swiss Bankers Association. Although atonra Partners SA believes that the information provided in this document is based on reliable sources, it cannot assume responsibility for the quality, correctness, timeliness or completeness of the information contained in this report.

The information contained in these publications is exclusively intended for a client base consisting of professionals or qualified investors. It is sent to you by way of information and cannot be divulged to a third party without the prior consent of atonra Partners. While all reasonable effort has been made to ensure that the information contained is not untrue or misleading at the time of publication, no representation is made as to its accuracy or completeness and it should not be relied upon as such.

Past performance is neither indicative of nor a guarantee of future results. Investment losses may occur, and investors could lose some or all of their investment. Any indices cited herein are provided only as examples of general market performance, and no index is directly comparable to the past or future performance of the Certificate.

It should not be assumed that the Certificate will invest in any specific securities that comprise any index, nor should it be understood to mean that there is a correlation between the Certificate’s returns and any index returns.

Any material provided to you is intended only for discussion purposes and is not intended as an offer or solicitation with respect to the purchase or sale of any security, and should not be relied upon by you in evaluating the merits of investing in any securities.

