“Tool aims to solve the mystery of non-coding sequences — but is still in its infancy.
Nearly 25 years after scientists completed a draft human genome sequence, many of its 3.1 billion letters remain a puzzle. The 98% of the genome that is not made of protein-coding genes — but which can influence their activity — is especially vexing.
An artificial intelligence (AI) model developed by Google DeepMind in London could help scientists to make sense of this ‘dark matter’, and see how it might contribute to diseases such as cancer and influence the inner workings of cells. The model, called AlphaGenome, is described in a 25 June preprint.
“This is one of the most fundamental problems not just in biology — in all of science,” said Pushmeet Kohli, the company’s head of AI for science, at a press briefing.
The ‘sequence to function’ model takes long stretches of DNA and predicts various properties, such as the expression levels of the genes they contain and how those levels could be affected by mutations.
“I think it is an exciting leap forward,” says Anshul Kundaje, a computational genomicist at Stanford University in Palo Alto, California, who has had early access to AlphaGenome. “It is a genuine improvement in pretty much all current state-of-the-art sequence-to-function models.”
An ‘all in one’ approach
When DeepMind unveiled AlphaFold 2 in 2020, it went a long way towards solving a problem that had challenged researchers for decades: determining how a protein’s sequence contributes to its 3D shape.
Working out what DNA sequences do is different, because there is no single answer comparable to the 3D structure that AlphaFold delivers. A single DNA stretch will have numerous, interconnected roles, from attracting one set of cellular machinery to latch onto a particular section of a chromosome and turn a nearby gene into an RNA molecule, to attracting proteins called transcription factors that influence where, when and to what extent gene expression occurs. Many DNA sequences, for example, influence gene activity by altering a chromosome’s 3D shape, either restricting or easing access for the machinery that does the transcription.
Biologists have been chipping away at this question for decades with various kinds of computational tool. In the past decade or so, scientists have developed dozens of AI models to make sense of the genome. Many of these have focused on an individual task, such as predicting levels of gene expression or determining how modular segments of individual genes, called exons, are cut-and-pasted into distinct proteins. But scientists are increasingly interested in ‘all in one’ tools for interpreting DNA sequences.
AlphaGenome is one such model. It can take inputs of up to one million DNA letters — a stretch that could include a gene and myriad regulatory elements — and make thousands of predictions about numerous biological properties. In many cases, AlphaGenome’s predictions are sensitive to single-DNA-letter changes, which means that scientists can predict the consequences of mutations.
In one example, DeepMind researchers applied the AlphaGenome model to diverse mutations identified in previous studies in people with a type of leukaemia. The model accurately predicted that the non-coding mutations indirectly activated a nearby gene that is a common driver of this cancer.
Still limited
AlphaGenome was trained on genomic and other experimental data from humans and mice only. It might work as well on related organisms, but the researchers didn’t test this, said Žiga Avsec, a DeepMind scientist, at the briefing. Neither was the model designed to reliably interpret an individual’s genome, or to provide a full picture of how variants influence complex diseases.
There is room for improvement in the accuracy of AlphaGenome’s predictions. For instance, the model struggles to identify sequences that alter the expression of a gene located more than 100,000 base pairs away. “This model has not yet ‘solved’ gene regulation to the same extent as AlphaFold has, for example, protein 3D-structure prediction,” adds Kundaje.
One thing that AlphaGenome — and similar models — don’t yet capture is how a cell’s changing nature can affect how DNA sequences function, says Peter Koo, a computational biologist at Cold Spring Harbor Laboratory in New York. These models are trained to make predictions in one fixed setting, but cells are dynamic. Protein levels, chemical tags on DNA and other conditions can shift over time or between cell types — and that can change how the same sequence behaves.
Koo predicts that researchers will build on AlphaGenome by using the model to design ‘regulatory’ DNA sequences that allow control over when and where a gene is active, for instance, or to run virtual experiments that simulate how cells respond to genetic changes.
For now, researchers doing non-commercial work can access the model through DeepMind’s servers using a programming interface. A fuller release — that would enable more-sophisticated applications — is planned for the future.” [1]
1. Callaway, E. Nature 643, 17–18 (2025).