

Sunday, 11 January 2026

I told AI to make me a protein. Here’s what it came up with


“A new crop of artificial-intelligence models allows users to create, manipulate and learn about biology using ordinary language.

 

I recently used AI to design an awful protein. Following step-by-step instructions, I made a rudimentary protein language model (PLM), an artificial intelligence (AI) tool that churns out protein sequences instead of words. With a couple of lines of copied-and-pasted code, I asked the model to dream up a short sequence of amino acids.
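The article does not reproduce the instructions the author followed. Purely to illustrate the idea of a model that "churns out protein sequences instead of words", the sketch below treats a protein as a string over the 20 standard amino-acid letters and samples a new one from a toy bigram model. This is a hypothetical stand-in: real PLMs are large neural networks, and the function names and the two training fragments are illustrative assumptions, not taken from the source.

```python
import random
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one letter each


def train_bigram(sequences):
    """Count how often each residue follows another (a toy 'language model')."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = "^" + seq  # '^' marks the start of a sequence
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, length, seed=None):
    """Generate a sequence by repeatedly drawing the next residue given the previous one."""
    rng = random.Random(seed)
    prev = "^"
    out = []
    for _ in range(length):
        options = counts.get(prev)
        if not options:
            # Context never seen in training: fall back to a uniform choice.
            nxt = rng.choice(AMINO_ACIDS)
        else:
            residues = list(options)
            weights = [options[r] for r in residues]
            nxt = rng.choices(residues, weights=weights, k=1)[0]
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Hypothetical, illustrative training fragments.
training = ["MKTAYIAKQRQISFVKSHFSRQ", "MSKGEELFTGVVPILVELDGDV"]
model = train_bigram(training)
print(sample(model, 30, seed=0))
```

A model this crude only mimics local letter statistics, which is one intuition for why a naive generator can produce sequences that look protein-like but fold poorly.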

 


 

I didn’t know how bad my protein was until I asked AlphaFold, Google DeepMind’s protein-structure predictor, what it looked like. The predicted structure had helices, loops and other realistic elements. But AlphaFold had very low confidence in its prediction — a sign that my molecule probably couldn’t be made in cells in the laboratory, let alone do anything useful.
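AlphaFold expresses its confidence as a per-residue score called pLDDT (0 to 100). A minimal sketch of how one might summarize it, assuming a PDB-format prediction of the kind AlphaFold 2 writes, where the pLDDT value is stored in the B-factor field (columns 61 to 66) of each ATOM record. Other output formats, such as mmCIF, would need a different parser. The band labels follow the AlphaFold Protein Structure Database colour key; the function names are hypothetical.

```python
def mean_plddt(pdb_text):
    """Average pLDDT over residues, reading the B-factor field of each CA atom.

    Assumes AlphaFold-style PDB output, where the B-factor column holds pLDDT.
    Taking only CA atoms gives one value per residue.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))  # columns 61-66 (1-based)
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)


def confidence_band(plddt):
    """Map a pLDDT value to the AlphaFold DB confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"
```

In practice one would read the text of the predicted structure file and pass it to `mean_plddt`; a design whose average lands in the "very low" band matches the situation the author describes.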

 

Now, dabblers in computational biology like me have fresh hope. Scientists are developing a new generation of biological AI tools that take instructions in plain language and turn them into proteins and other molecules, including potential drugs. The models also allow researchers to ‘talk’ to cells in ordinary English to decipher their inner workings and glean other biological insights.

 

It is the latest turn of events in the bio-AI revolution that is transforming fields such as protein design and structural biology. PLMs and other AI tools enable scientists to design molecules such as enzymes and antibodies with relative ease. But getting the most out of these tools typically requires considerable expertise.

 


 

Models that allow users to interrogate biology using plain text could lower the barrier to joining the bio-AI revolution, say scientists. These AIs also have the potential to enable greater control over the resulting designs and other outputs.

 

“It would be useful to be able to specify precisely what we want, and have a protein be designed with those features,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City.

Text-to-protein

 

Last month, a team led by Fajie Yuan, a machine-learning scientist at Westlake University in Hangzhou, China, showed that a text-to-protein model his team developed can design functional proteins, including lab-tested enzymes and fluorescent proteins, that are original in their designs and not similar to existing molecules. “We are the first to design a functional enzyme using only text,” Yuan says. “It’s just like science fiction.”


 

‘An awful protein’: reporter Ewen Callaway created a protein language model (PLM) and used basic code instructions to generate this protein. Credit: Google DeepMind/EMBL-EBI (CC-BY-4.0)

 

The model, called Pinal, is one of several protein-design AIs that can be directed with ordinary language — as opposed to a protein sequence or the structure-guided specifications typical of most such AIs.

 

But it’s early days for these bio-AI models, says Anthony Gitter, a computational biologist at the University of Wisconsin–Madison. “I see it as a high-risk, high-reward area,” he says.

How to speak molecule

 

Teaching biological AI models to communicate in English (or any language) typically involves exposing them to text descriptions of biological data. Yuan’s team trained Pinal using 1.7 billion descriptions of the structures, functions and other characteristics of different proteins. After some extra training, the model could take a prompt and churn out hundreds of sequence designs¹. The model has a web interface, and the code and parameters needed to run it are freely available.

 


 

One prompt that the researchers used was ‘Please design a protein that is an alcohol dehydrogenase’, referring to an alcohol-metabolizing enzyme. Yuan and his colleagues then used other computational tools to identify the most promising designs and, working with a biologist collaborator, tested their enzymatic activity.

 

Two of the eight alcohol dehydrogenase designs successfully catalysed the breakdown of alcohol, albeit much less efficiently than natural enzymes. Yuan says his team has also designed working green fluorescent proteins (GFPs) and plastic-degrading enzymes, all dissimilar in sequence to natural examples.

 

Several other teams have developed similar AI models, including one called ESM-3 that can be prompted with keywords, as well as with protein sequences and structures. A start-up firm called 310.ai has developed a proprietary tool called MP4 that designed a slew of proteins from text inputs², including several that, in the lab, can bind to the cellular energy source ATP. The company is using the model to design proteins that act like GLP-1 drugs, the blockbuster obesity treatments, says its vice-president of discovery Timothy Riley.

One challenge for models such as 310.ai’s is coming up with the right text instructions for an AI to follow, says company co-founder Kathy Wei, although large language models (LLMs) can help to craft successful prompts. She likens it to the early days of image-generating AIs such as Dall-E: some prompts were more fruitful than others, and the models’ struggles to depict human hands, for example, were often a giveaway. Instead of odd-looking hands, MP4 can sometimes spew out proteins with repetitive sequences, says Wei.
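Wei’s remark about repetitive output suggests a cheap screen one might run on generated sequences before spending compute on structure prediction. The sketch below is a hypothetical filter, not 310.ai’s method: it flags a sequence if its amino-acid composition has low Shannon entropy or if a short motif repeated back to back covers much of it. The thresholds and function names are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Entropy (bits) of the residue composition; low values mean few distinct residues."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def longest_tandem_repeat(seq, max_period=6):
    """Length of the longest stretch that repeats a motif of up to max_period residues.

    Position i continues a periodic stretch of period p when seq[i] == seq[i - p].
    """
    best = 0
    for p in range(1, max_period + 1):
        run = 0
        for i in range(p, len(seq)):
            if seq[i] == seq[i - p]:
                run += 1
                best = max(best, run + p)  # add p for the first copy of the motif
            else:
                run = 0
    return best


def looks_repetitive(seq, min_entropy=3.0, max_repeat_fraction=0.5):
    """Hypothetical screen; both thresholds are illustrative, not tuned values."""
    return (shannon_entropy(seq) < min_entropy
            or longest_tandem_repeat(seq) / len(seq) > max_repeat_fraction)
```

For example, a run such as `GSGSGSGS...` is flagged on both criteria, whereas a varied sequence passes. A real pipeline would combine checks like this with structure-confidence scores rather than rely on them alone.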

Drug design

 

Protein design isn’t the only field in which scientists are wielding AIs with words. A slew of models aims to apply a similar approach to designing chemicals.

 


 

Last year, for instance, Gitter’s team released a model that designs small molecules in response to text prompts, and showed that it could design drug-like inhibitors of known protein targets³. The designs haven’t been lab-tested, but computational ‘docking’ tools widely used in drug discovery suggested that some were promising.

 

Scientists are also using bio-AIs to ‘talk’ to cells. Efforts to sequence all the RNA molecules in individual cells have become a bedrock technique in cell biology, revealing unappreciated diversity. But making sense of these data-heavy experiments usually requires intense collaboration between biologists and data scientists, says Christoph Bock, a computational biologist at the Medical University of Vienna.

 

As a shortcut, his lab developed an AI chatbot called CellWhisperer⁴. It can take plain English instructions (‘describe these cells in detail’, for example) and return a summary in plain text, or allow users to interrogate a visual representation of a population of diverse cells by ‘lassoing’ those of interest. “It becomes a partner in crime in your data analysis,” Bock says.

 


Cell sentences

 

Another effort translates single-cell sequencing data sets into long lists of the genes that the cells express, and shoves these ‘cell sentences’ into an existing LLM. The resulting model, called Cell2Sentence, can take a single-cell data set and describe its characteristics⁵, such as the kind of immune cell represented, in plain English.
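The article describes the ‘cell sentence’ idea only in outline. A minimal sketch of the core step, assuming one cell’s measurements arrive as a dictionary mapping gene names to expression counts: order the expressed genes from highest to lowest expression and join their names into text. The function name and the counts below are made up for illustration, and the actual Cell2Sentence pipeline may apply preprocessing not shown here.

```python
def cell_to_sentence(expression, top_k=None):
    """Turn one cell's expression counts into a 'cell sentence'.

    Genes with zero expression are dropped; the rest are listed from most
    to least expressed, optionally truncated to the top_k genes.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression; break ties alphabetically for determinism.
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    genes = [gene for gene, _ in expressed]
    if top_k is not None:
        genes = genes[:top_k]
    return " ".join(genes)


# Illustrative counts for a single hypothetical cell.
cell = {"CD3E": 12, "CD8A": 7, "MS4A1": 0, "GZMB": 9, "ACTB": 40}
print(cell_to_sentence(cell))  # prints: ACTB CD3E GZMB CD8A
```

Because the output is ordinary text, it can be fed to a language model alongside natural-language descriptions, which is what lets such a model move between ‘biological language and human language’.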

 

And because the model was trained on biological literature and data, it can connect the dots and do things such as predicting how a cancer immunotherapy drug will alter the genes a cell expresses. “Our model can translate between biological language and human language,” says David van Dijk, a computational biologist at Yale University in New Haven, Connecticut, who led the work together with scientists at Google Research and elsewhere.


 

Ewen’s second attempt to create a protein, this time using plain text instructions given to a biological AI model. Credit: Google DeepMind/EMBL-EBI (CC-BY-4.0)

 

Gitter periodically assesses the ability of off-the-shelf LLMs to design proteins, but hasn’t yet been impressed by the results. He asked Amazon’s shopping assistant LLM, called Rufus, to come up with a GFP, but the result lacked a key structural feature of natural GFPs.

 

The current slew of talking bio-AIs is “a little gimmicky”, says AlQuraishi. But the idea of augmenting LLMs with scientific data, such as protein sequences and chemical structures, is a promising one, he adds. “I wouldn’t be surprised if some of the large tech companies are already working on this,” he says.

 

Gimmick or not, they have made a difference to my own project. After my failed attempt, I navigated to Pinal’s web interface and typed in “Make me a good protein”. When I plugged the sequence into AlphaFold, it returned a highly confident prediction. The model resembled a mishmash between spaghetti and fusilli, so I wouldn’t expect it to catalyse a reaction, eat plastic or do much of anything. But it’s a start.” [1]

 

1. Callaway, E. I told AI to make me a protein. Here’s what it came up with. Nature 641, 1079–1080 (2025), 20 May 2025.

 
