"Machine learning will transform our understanding of protein folding. And it’s essential that all data be open.
The AlphaFold machine-learning tool can predict 3D structures of full protein chains for 98% of human proteins. Credit: Karen Arnott/EMBL-EBI
“I didn’t think we would get to this point in my lifetime.” That’s how one research leader in structural biology responded to last week’s publication of research in which artificial intelligence (AI) was used to predict the structure of more than 20,000 human proteins, as well as that of nearly all the known proteins produced by 20 model organisms such as Escherichia coli, fruit flies and yeast, but also soya bean and Asian rice. That is a combined total of around 365,000 predictions¹.
The data, publicly accessible for the first time (see https://alphafold.ebi.ac.uk), were released online on 22 July by researchers at DeepMind, a London-based AI company owned by Google’s parent company, Alphabet, and at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) near Cambridge, UK.
The DeepMind team developed a machine-learning tool called AlphaFold. The team trained this program on DNA sequences, including their evolutionary history, and on the already-known shapes of tens of thousands of proteins contained in a public-access database of proteins hosted by the EMBL-EBI researchers. A week earlier, DeepMind also released the source code for AlphaFold and detailed how it was constructed², at the same time that researchers from the University of Washington, Seattle, published details of another protein-structure prediction program, inspired by AlphaFold, called RoseTTAFold³.
For the human proteome, 58% of AlphaFold’s predictions for the locations of individual amino acids were good enough to be confident in the shape of the protein’s folds, says DeepMind researcher Kathryn Tunyasuvunakool. A subset of those predictions (36% of the total) is potentially precise enough to detail atomic features useful for drug design, such as the active site of an enzyme.
The unveiling of this catalogue of predicted structures would not be nearly such good news were the data and the methodology not open and freely available. Structural biologists and other researchers are already starting to use AlphaFold to obtain more-accurate models for proteins that have been difficult or impossible to characterize by current experimental methods." [1]
1. Nature 595, 625-626 (2021)