
Monday, 13 April 2026

The Chinese finance whizz whose DeepSeek AI model stunned the world


Liang Wenfeng is part of Nature’s 10, a list of people who shaped science in 2025.

In January last year, an announcement from China rocked the world of artificial intelligence. The firm DeepSeek released its powerful but cheap R1 model out of the blue — instantly demonstrating that the United States was not as far ahead in AI as many experts had thought.


Behind the bombshell announcement is Liang Wenfeng, a 40-year-old former financial analyst who is thought to have made millions of dollars applying AI algorithms to the stock market before using the cash in 2023 to establish DeepSeek, based in Hangzhou. Liang avoids the limelight and has given only a handful of interviews to the Chinese press (he declined a request to speak to Nature).


Liang’s models are as open as he is secretive. R1 is a ‘reasoning’ large language model (LLM) that excels at solving complex tasks — such as in mathematics and coding — by breaking them down into steps. It was the first of its kind to be released as open weight, meaning that the model can be downloaded and built on for free, so has been a boon for researchers who want to adapt algorithms to their own field. DeepSeek’s success seems to have prompted other companies in China and the United States to follow suit by releasing their own open models.


Despite R1 having many capabilities that are on a par with the best US models, including those powering ChatGPT, its training costs were much less than those of rival companies, say AI experts. Training costs for Meta’s Llama 3 405B model, for example, were more than ten times greater. DeepSeek’s bid for transparency extended to publishing the details of how it built and trained R1 when, in September, the model became the first major LLM to undergo the scrutiny of peer review (D. Guo et al. Nature 645, 633–638; 2025). By releasing its recipe, DeepSeek taught other AI researchers how to train a reasoning model.


In many ways, “DeepSeek has been hugely influential”, says Adina Yakefu, a researcher at the community AI platform Hugging Face, which is based in New York City.


The heights of AI are a far cry from the village in Guangdong province where Liang was raised as the child of two primary-school teachers. Higher education took him to the prestigious Zhejiang University in Hangzhou, where he graduated with a master’s in engineering in 2010; his thesis involved crafting algorithms to track objects in videos. He soon applied his love of AI to financial markets and, in 2015, co-founded the hedge fund High-Flyer, spinning off DeepSeek in 2023.


At that time, China faced a hurdle in developing LLMs. US export controls prevented Chinese firms from buying certain powerful computer chips known as graphics processing units (GPUs) made by the US chip manufacturer NVIDIA, which are suitable for training LLMs. But Liang was already well provisioned. He had spent the previous decade purchasing 10,000 NVIDIA GPUs, fuelled by curiosity about what research could be done on them. In a 2023 interview with Chinese media company 36Kr, he likened their purchase to someone buying a piano for their home: “One can afford it, and there’s a group eager to play music on it.”


Like many Western AI entrepreneurs, Liang has set his sights on achieving artificial general intelligence — AI systems as adept as humans in cognitive tasks — and he has shaped his company around this, says Benjamin Liu, a former researcher at DeepSeek. The company prioritizes a person’s potential over their level of experience when hiring (one author on the DeepSeek R1 paper is still in secondary school) and it operates with little hierarchy, with researchers deciding what to work on themselves. Liang is said to be closely involved in research, and “even interns like myself were treated as full-time employees with meaningful responsibilities”, says Liu.


Researchers from outside the company are impressed with how DeepSeek operates. Rather than exploit its popularity for commercial success, “it’s remarkable how DeepSeek has remained committed to solving pretty difficult foundational problems” in AI research, says Kwan Yee Ng, who leads international AI governance at Concordia AI, a Beijing-based consultancy that focuses on AI safety.


DeepSeek models have become deeply enmeshed in Chinese life: local governments are using them to operate chatbot hotlines and to help citizens fill out forms, and tens of millions of people use them every day as part of the country’s social-media platform, WeChat. In part, this trend is thanks to a government drive to build AI into the economy through a range of applications, from smart cities to health care.


DeepSeek has also become a symbol of a transition in the country’s reputation — from master imitators to true innovators, according to Liang and other Chinese researchers. “The shift is real, and it’s accelerating,” says Yu Wu, a researcher at DeepSeek. Now the world is eagerly awaiting the firm’s next reasoning model, R2, which is rumoured to have been delayed by issues with hardware and training data. One good bet is that Liang’s company plans to give R2 to the world for free. “We’re committed to open source forever,” says Wu. [1]


1. E. Gibney, Nature 648, 526 (2025).

