
A peek beneath the bonnet of DeepSeek’s AI

04.02.2025

Tumbling stock market values and wild claims have accompanied the release of a new AI chatbot by a small Chinese company. What makes it so different?

The release of China's new DeepSeek AI-powered chatbot app has rocked the technology industry. It quickly overtook OpenAI's ChatGPT as the most-downloaded free iOS app in the US, and caused chip-making company Nvidia to lose almost $600bn (£483bn) of its market value in one day – a new US stock market record.

The reason behind this tumult? The "large language model" (LLM) that powers the app has reasoning capabilities comparable to those of US models such as OpenAI's o1, but reportedly costs a fraction as much to train and run.

Analysis

Dr Andrew Duncan is the director of science and innovation (fundamental AI) at the Alan Turing Institute in London, UK.

DeepSeek claims to have achieved this by deploying several technical strategies that reduced both the computation time required to train its model (called R1) and the memory needed to store it. Cutting these overheads dramatically reduced costs, DeepSeek says. R1's base model, V3, reportedly required 2.788 million GPU hours to train (running across many graphics processing units – GPUs – in parallel), at an estimated cost of under $6m (£4.8m), compared with the more than $100m (£80m) that OpenAI boss Sam Altman says was required to train GPT-4.
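As a rough sanity check on those numbers: DeepSeek's V3 technical report reportedly assumes a rental rate of about $2 per GPU hour for the hardware involved, and the short Python sketch below simply multiplies that assumed rate by the reported GPU hours. It is a back-of-envelope illustration, not an audited cost.

```python
# Back-of-envelope check of DeepSeek's stated training cost for V3.
# Assumption: the ~$2/hour H800 rental rate the V3 technical report
# reportedly uses; actual procurement costs are not public.
gpu_hours = 2_788_000      # reported H800 GPU hours to train V3
rate_usd_per_gpu_hour = 2.00

estimated_cost = gpu_hours * rate_usd_per_gpu_hour
print(f"Estimated training cost: ${estimated_cost / 1e6:.2f}m")
# -> Estimated training cost: $5.58m, consistent with "under $6m"
```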

Despite the hit taken to Nvidia's market value, the DeepSeek models were trained on around 2,000 Nvidia H800 GPUs, according to one research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with export rules to China. These were likely stockpiled before restrictions were further tightened by the Biden administration in October 2023, which effectively banned Nvidia from exporting the H800s to China. It is likely that, working within these constraints, DeepSeek has been forced to find innovative ways to make the most effective use of the resources it has at its disposal.
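Those two figures – total GPU hours and cluster size – also give a feel for how long the training run took in wall-clock terms. The sketch below assumes the roughly 2,000 GPUs ran in parallel with perfect utilisation, an idealisation that no real training run achieves.

```python
# Rough wall-clock estimate: total GPU hours spread across the cluster.
# Assumptions: ~2,000 H800s (the figure from DeepSeek's paper) running
# in parallel at full utilisation, which is an idealisation.
gpu_hours = 2_788_000
num_gpus = 2_000

wall_clock_hours = gpu_hours / num_gpus
print(f"~{wall_clock_hours:,.0f} hours, about {wall_clock_hours / 24:.0f} days")
# -> ~1,394 hours, about 58 days: roughly two months of continuous training
```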

Reducing the computational cost of training and running models may also address concerns about the environmental impacts of AI. The data centres they run on have huge electricity and water demands, largely to keep the servers from overheating. While most technology companies do not disclose the carbon footprint…

© BBC