Mokslas, studijos ir ekonomika

Wednesday, January 29, 2025

Tragic Consequences of Reinforcement Learning for AI in the West

"A new A.I. model, released by a scrappy Chinese upstart, has rocked Silicon Valley and upended several fundamental assumptions about A.I. progress.

The artificial intelligence breakthrough that is sending shock waves through stock markets, spooking Silicon Valley giants, and generating breathless takes about the end of America’s technological dominance arrived with an unassuming, wonky title: “Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.”

The 22-page paper, released last week by a scrappy Chinese A.I. start-up called DeepSeek, didn’t immediately set off alarm bells. It took a few days for researchers to digest the paper’s claims, and the implications of what it described. The company had created a new A.I. model called DeepSeek-R1, built by a team of researchers who claimed to have used a modest number of second-rate A.I. chips to match the performance of leading American A.I. models at a fraction of the cost.

DeepSeek said it had done this by using clever engineering to substitute for raw computing horsepower. And it had done it in China, a country many experts thought was in a distant second place in the global A.I. race.

Some industry watchers initially reacted to DeepSeek’s breakthrough with disbelief. Surely, they thought, DeepSeek had cheated to achieve R1’s results, or fudged their numbers to make their model look more impressive than it was. Maybe the Chinese government was promoting propaganda to undermine the narrative of American A.I. dominance. Maybe DeepSeek was hiding a stash of illicit Nvidia H100 chips, banned under U.S. export controls, and lying about it. Maybe R1 was actually just a clever re-skinning of American A.I. models that didn’t represent much in the way of real progress.

Eventually, as more people dug into the details of DeepSeek-R1 — which, unlike most leading A.I. models, was released as open-source software, allowing outsiders to examine its inner workings more closely — their skepticism morphed into worry.

And late last week, when lots of Americans started to use DeepSeek’s models for themselves, and the DeepSeek mobile app hit the number one spot on Apple’s App Store, it tipped into full-blown panic.

I’m skeptical of the most dramatic takes I’ve seen over the past few days — such as the claim, made by one Silicon Valley investor, that DeepSeek is an elaborate plot by the Chinese government to destroy the American tech industry. I also think it’s plausible that the company’s shoestring budget has been badly exaggerated, or that it piggybacked on advancements made by American A.I. firms in ways it hasn’t disclosed.

But I do think that DeepSeek’s R1 breakthrough was real. Based on conversations I’ve had with industry insiders, and a week’s worth of experts poking around and testing the paper’s findings for themselves, it appears to be throwing into question several major assumptions the American tech industry has been making.

The first is the assumption that in order to build cutting-edge A.I. models, you need to spend huge amounts of money on powerful chips and data centers.

It’s hard to overstate how foundational this dogma has become. Companies like Microsoft, Meta and Google have already spent tens of billions of dollars building out the infrastructure they thought was needed to build and run next-generation A.I. models. They plan to spend tens of billions more — or, in the case of OpenAI, as much as $500 billion through a joint venture with Oracle and SoftBank that was announced last week.

DeepSeek appears to have spent a small fraction of that building R1.

We don’t know the exact cost, and there are plenty of caveats to make about the figures they’ve released so far. It’s almost certainly higher than $5.5 million, the number the company claims it spent training a previous model.

But even if R1 cost 10 times more to train than DeepSeek claims, and even if you factor in other costs they may have excluded, like engineer salaries or the costs of doing basic research, it would still be orders of magnitude less than what American A.I. companies are spending to develop their most capable models.

The obvious conclusion to draw is not that American tech giants are wasting their money. It’s still expensive to run powerful A.I. models once they’re trained, and there are reasons to think that spending hundreds of billions of dollars will still make sense for companies like OpenAI and Google, which can afford to pay dearly to stay at the head of the pack.

But DeepSeek’s breakthrough on cost challenges the “bigger is better” narrative that has driven the A.I. arms race in recent years by showing that relatively small models, when trained properly, can match or exceed the performance of much bigger models.

That, in turn, means that A.I. companies may be able to achieve very powerful capabilities with far less investment than previously thought. And it suggests that we may soon see a flood of investment into smaller A.I. start-ups, and much more competition for the giants of Silicon Valley. (Which, because of the enormous costs of training their models, have mostly been competing with each other until now.)

There are other, more technical reasons that everyone in Silicon Valley is paying attention to DeepSeek. In the research paper, the company reveals some details about how R1 was actually built, which include some cutting-edge techniques in model distillation. (Basically, that means compressing big A.I. models down into smaller ones, making them cheaper to run without losing much in the way of performance.)

DeepSeek also included details that suggested that it had not been as hard as previously thought to convert a “vanilla” A.I. language model into a more sophisticated reasoning model, by applying a technique known as reinforcement learning [1] on top of it. (Don’t worry if these terms go over your head — what matters is that methods for improving A.I. systems that were previously closely guarded by American tech companies are now out there on the web, free for anyone to take and replicate.)

Even if the stock prices of American tech giants recover in the coming days, the success of DeepSeek raises important questions about their long-term A.I. strategies. If a Chinese company is able to build cheap, open-source models that match the performance of expensive American models, why would anyone pay for ours? And if you’re Meta — the only U.S. tech giant that releases its models as free open-source software — what prevents DeepSeek or another start-up from simply taking your models, which you spent billions of dollars on, and distilling them into smaller, cheaper models that they can offer for pennies?

DeepSeek’s breakthrough also undercuts some of the geopolitical assumptions many American experts had been making about China’s position in the A.I. race.

First, it challenges the narrative that China is meaningfully behind the frontier, when it comes to building powerful A.I. models. For years, many A.I. experts (and the policymakers who listen to them) have assumed that the United States had a lead of at least several years, and that copying the advancements made by American tech firms was prohibitively hard for Chinese companies to do quickly.

But DeepSeek’s results show that China has advanced A.I. capabilities that can match or exceed models from OpenAI and other American A.I. companies, and that breakthroughs made by U.S. firms may be trivially easy for Chinese firms — or, at least, one Chinese firm — to replicate in a matter of weeks.

(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement of news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)

The results also raise questions about whether the steps the U.S. government has been taking to limit the spread of powerful A.I. systems to our adversaries — namely, the export controls used to prevent powerful A.I. chips from falling into China's hands — are working as designed, or whether those regulations need to adapt to take into account new, more efficient ways of training models.

I’m still not sure what the full impact of DeepSeek’s breakthrough will be, or whether we will consider the release of R1 a “Sputnik moment” for the A.I. industry, as some have claimed.

But it seems wise to take seriously the possibility that we are in a new era of A.I. brinkmanship now — that the biggest and richest American tech companies may no longer win by default, and that containing the spread of increasingly powerful A.I. systems may be harder than we thought.

At the very least, DeepSeek has shown that the A.I. arms race is truly on, and that after several years of dizzying progress, there are still more surprises left in store.” [2]

1. “Reinforcement learning (RL) is a machine learning (ML) technique that teaches software to make decisions that achieve the best results. It's based on the idea of rewarding correct actions and punishing incorrect ones, similar to how humans learn through trial and error.

The goal of RL is to teach an agent to learn an optimal policy that maximizes a reward function. The agent learns by interacting with an environment and receiving feedback.

Some real-world applications of RL include: Self-driving cars, Industry automation, Trading and finance, Natural language processing (NLP), and Healthcare.

Some elements of RL include:

The agent
The environment
A policy
A reward function
A value function
A model of the environment

There are three main ways to implement RL in ML: Value-based, Policy-based, and Model-based.

Q-learning is a type of RL that uses a value-based approach to determine how good an action is in a given state.”
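The terms in note 1 (agent, environment, policy, reward function, value function) can be made concrete with a minimal sketch of tabular Q-learning. Everything here is a hypothetical toy chosen for illustration: the five-cell corridor, the reward of 1.0 at the last cell, and the parameter values (alpha, gamma, epsilon) are assumptions, and nothing in it describes how DeepSeek-R1 or any other language model was actually trained.

```python
import random

# Hypothetical toy environment (an assumption for illustration only):
# a corridor of 5 cells. The agent starts in cell 0 and receives a
# reward of 1.0 when it reaches cell 4; every other step pays 0.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. Returns q, where q[state][i] estimates the value
    of taking ACTIONS[i] in that state (the value function)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: usually pick the best-known action,
            # but explore at random sometimes (and whenever values are tied).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Read a policy off the value table: the best action in each non-goal cell."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:GOAL]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the agent is the training loop, the environment is `step`, the reward function is the 1.0 paid at the goal, the value function is the table `q`, and the policy is whatever `greedy_policy` reads off that table. It is a value-based method in the sense of the note; policy-based and model-based methods are organized differently.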

2. Why DeepSeek Could Change What Silicon Valley Believes About A.I. Roose, Kevin. New York Times (Online) New York Times Company. Jan 28, 2025.

A Secret Cable and a Clue to Where U.S.-Russia Relations Went Wrong

"It was March 1994, more than two years after the Soviet Union disintegrated, and the debates within the U.S. Embassy in Moscow were heated. Diplomats in the economic section, backed by the Treasury Department in Washington, argued ardently that radical free-market reforms were the only path for post-Soviet Russia, and that democracy would surely follow. Political advisers believed, equally passionately, that such “shock therapy” would only worsen the devastating dislocation Russians were already suffering with the collapse of the Soviet Union. The Russian people, they warned, would end up blaming America — and democracy itself — for their woes.

In the heat of the debate, E. Wayne Merry, the top political analyst in the embassy and one of the most forceful critics of shock therapy, set out a detailed case against it in a long telegram provocatively titled “Whose Russia Is It Anyway? Toward a Policy of Benign Respect.”

In an essay last month about the telegram, Mr. Merry argued that America back then was falling for the old fallacy of trying to understand a foreign country “by looking in the mirror.” The push for free-market reforms in a country without any experience of a market economy or democracy was, he wrote, “an especially virulent case of Washington institutions trying to ram a foreign square peg into an American round hole.”

Mr. Merry, now a senior fellow at the American Foreign Policy Council, told me in an interview, “Why did I end up writing those 70 paragraphs? I had for two and a half years been writing about these issues, and was very frustrated that nobody in D.C. was interested in anything other than economic theory coming down from Harvard.” He was so frustrated, he added, “I decided it was my duty as head of Pol/Int” — the political/internal department of the embassy — “to tell Washington what was going on.”

But the senior staff at the embassy kept dithering on how to officially attribute the cable, so out of frustration, he said, Mr. Merry sent it over what is known as the dissent channel, a back channel to the State Department set up during the Vietnam War to allow diplomats who differed with U.S. policy to register their views. The lengthy telegram and a brief rebuttal from the State Department were duly consigned to the sealed bin of official secrets.

But the cable was not forgotten. For years, Russia experts at the National Security Archive, a nonprofit institution that publishes declassified government documents, pursued what came to be known as “Wayne Merry’s long telegram,” a nod to George Kennan’s celebrated 1946 “Long Telegram” that shaped U.S. policy toward the Soviet Union in the Cold War. The Archive finally succeeded in publishing the telegram in December. The full text of the telegram, and its fascinating back story, is available on the Archive’s website.

Revisiting those debates recalls the wrenching transition of the 1990s. Russia was in shambles. Attempts at market reforms had left much of the population destitute and the government at war with itself. In October 1993, several months before Mr. Merry wrote his telegram, President Boris Yeltsin had ordered tanks and troops to roust the contentious Parliament and ruled essentially by decree — with the approval of the Clinton administration.

Those were years when Russia was still open to the West, and Americans were pouring into the country as tourists, students, entrepreneurs and all manner of well-intentioned consultants. Vladimir Putin was an unknown former K.G.B. agent working for the mayor of St. Petersburg, still a long way from power.

Thirty years later, the U.S. relationship with Russia is at its worst since the Cold War. What went awry? American arrogance and presumptions cannot be dismissed.

I was the Times bureau chief in Moscow through those turbulent times, watching the parade of private and public advisers earnestly trying to graft Western liberal democracy onto the carcass of the Soviet Union. Few had any idea of Russia’s history or society; many made quick fortunes in the chaos. I remember one earnest official of the International Monetary Fund musing that if the fund’s prescription for freeing energy prices was adopted, half the population would freeze to death.

Mr. Merry’s cable was a cri de coeur against this approach. “Even the most progressive and sympathetic of Russian officials have lost patience with the endless procession of what they call ‘assistance tourists’ who rarely bother to ask their hosts for an appraisal of Russian needs,” he said.

American efforts, he wrote, should focus instead on a “nonaggressive Russian external policy and development of workable democratic institutions.”

The cable concluded with a prescient warning: “If the West, with the United States in the front rank, prefers the role of economic missionary to that of true partner, we will assist Russian extremists to undermine the country’s nascent democracy and will encourage a renewal of Russia’s adversarial stance toward the outside world.”

Is that what has happened? Was America’s advocacy of “shock therapy” responsible for the rise of oligarchs and the ascent of Mr. Putin?

Debating “who lost Russia” is notoriously futile. We don’t know, and never will, what other direction history might have taken. Myriad forces were at play.

The decision in 1994 to expand NATO, which prompted even more contentious disputes within the U.S. government, arguably had a greater role in turning Russians against the West than misguided advice.

My sense at the time was that the Russians’ resentment of the West and of liberal democracy grew in large part from the souring of their overblown expectations and overly romanticized image of America. The first waves of reformers and advisers came to be associated with the humiliation and poverty that followed the collapse of the Soviet empire; Mr. Putin shared this resentment and learned to exploit it.

Yet it is also true that Russians were listening to Americans in the 1990s. In fact, “we were the only ones Russians were listening to,” said James F. Collins, who served as acting ambassador and ambassador to Moscow in those years. “One can’t minimize the degree to which for a half-dozen years the U.S. was the place with the answers, though admittedly there was some skepticism.”

One reason was that “the Soviet education system taught nothing about how markets worked,” recalled Svetlana Savranskaya, who was a student in Russia in the late 1980s and later spent years trying to wrest Mr. Merry’s cable from the State Department as director of the National Security Archive’s Russia programs.

So Russians naturally turned to America, the capitalist North Star, for guidance, many visiting the United States and returning awed by the malls and the energy.

There is little point 30 years on in playing the blame game. But the story of Americans blithely pushing destructive advice onto alien lands — from Vietnam to Iraq and Afghanistan — cannot be retold too often. Those distant debates are a reminder that Americans exert enormous influence. If we are oblivious to or disdainful of the needs of other people, we are capable of enormous harm — to them, and to our own country’s interests and standing." [1]

1. A Secret Cable and a Clue to Where U.S.-Russia Relations Went Wrong: Serge Schmemann. New York Times (Online) New York Times Company. Jan 28, 2025.
