"Metagenomics researchers see our environment as a source of undiscovered
viruses and germs: how Big Data biologists are genetically rediscovering the
world.
John Dennehy and his colleagues from the biology department
in the borough of Queens don't know exactly what has happened in the New York
sewers over the past two years. One thing is clear: the pandemic has produced a
lot of new and unknown things here. New SARS-CoV-2 strains that had not yet
been described were circulating in the wastewater. Between January and June
2021, Dennehy and his team took samples every two weeks. They had been sent
underground as outposts after the first, devastating wave of infections. It was
now known that the face of the pandemic virus, which had spread from Wuhan,
China, to the east and west coasts of the United States the year before, had
long since changed. New variants with a dozen mutations and significantly
different properties—higher infectivity, faster replication—had arisen and
spread in different places: alpha, beta, gamma, delta, epsilon, kappa. The
world had pricked up its ears; even the virologists were surprised by the
astonishing evolutionary "drive", the capacity for change, of the
coronavirus.
Dennehy and his colleagues didn't have to look far for
anything unusual either. In the wastewater samples, they used their molecular
filters to fish out countless gene snippets that clearly belonged to the RNA
virus first described in Wuhan and yet did not quite match it. Although many
mutations had become known from patients in New York's Covid clinics, thanks to
the sequence analyses that had meanwhile become more and more common, there
remained a residue that the researchers called "cryptic virus lineages".
Laboratory tests designed to examine the function of the unusual gene sequences
showed that the coronavirus had evidently expanded its host range considerably
in the first year of the pandemic: with the modified spike proteins on its
surface, it could infect not only human cells but also cells from rats and
mice.
In addition, mutations were discovered that occur in the
omicron variant, which was only described many months later and is now
dominant. Laboratory viruses equipped with the appropriate surface molecules –
so-called pseudoviruses – were resistant to antibodies. A large number of
completely mysterious gene snippets whose properties have not yet been
clarified were also found and described in the wastewater samples. Whether they
come from a previously unknown animal reservoir or from Covid-19 patients whose
viruses slipped through the inevitably incomplete sequencing net remains
unanswered, even in the analysis that the New York biologists have now
published in the journal "Nature Communications".
A huge, valuable data bubble
Metagenomics researchers are only too familiar with the experiences of the
biologists from Queens. Metagenomics is the umbrella term for a set of
specialized techniques used to decode the genetic material of organisms from a
specific habitat, organisms that are largely unknown because they cannot be
grown, cultivated and examined in the laboratory. Global genome surveys on the
fly, so to speak. Recently, these techniques, which emerged in the 1990s, have
developed at breakneck speed. Huge instrument parks and genome databases have
been set up, and software tools have been written for identification.
"Big Biology": similar to Big Data in AI research,
this is a resource that is growing almost daily. A data bubble that grows as a
network between laboratories around the world. Peer Bork from the European
Molecular Biology Laboratory (EMBL) in Heidelberg is one of the pioneers in
this field. When he is asked to state the goal of metagenomics, he never
thinks small.
The aim should be nothing less than to record as many genes of organisms as
possible worldwide, and thus to record the molecular basis of all life on Earth
today. A gene project of truly planetary proportions. Maybe just a utopia. If
you consider that the human reference genome was published in complete form,
i.e. without gaps that could not be sequenced, only last year, the ambition of
the metagenomics researchers sounds all the more utopian. And yet they are
currently spreading an almost unprecedented optimism. It is expressed in huge
numbers with lots of zeros.
A few days ago, an international team of computational biologists, including
scientists from Heidelberg and Tübingen, published their database analyses with
"Serratus" in "Nature". Serratus is a new, shared data cloud that stores 5.7
million sequencing datasets from all over the world and in which the genetic
information can be searched for traces of viruses using special tools. It is an
open-science project to uncover the planetary virome, freely and openly. 10.2
petabases of genetic information have come together over the last thirteen
years. A petabase is a one followed by fifteen zeros, so this amounts to
roughly ten quadrillion gene building blocks, collected on all continents and
in the oceans, that Serratus can draw on.
Within a few days and with more than 20,000 computer processors working
in parallel, the bioinformaticians discovered no fewer than 131,957 new RNA
viruses in this jumble. Until then, only about 15,000 RNA viruses were known,
and very few could be cultivated in laboratories. Among the new viruses from
the gene library are at least nine coronaviruses that have not been described
before.
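To get a feel for the numbers quoted above, the arithmetic can be written out
in a few lines of Python. This is only an illustrative back-of-the-envelope
sketch: the figures are the ones given in the article, while the function names
are made up for this example and are not part of Serratus.

```python
# Figures as quoted in the article.
TOTAL_BASES = 10_200_000_000_000_000  # 10.2 petabases; 1 petabase = 10**15 bases
KNOWN_RNA_VIRUSES = 15_000            # RNA viruses known before the analysis
NEW_RNA_VIRUSES = 131_957             # newly reported RNA viruses


def zeros_after_leading_digit(n: int) -> int:
    """Order of magnitude of n: how many digits follow its first digit."""
    return len(str(n)) - 1


def fold_increase(known: int, new: int) -> float:
    """Factor by which a catalog grows when `new` entries are added to `known`."""
    return (known + new) / known


if __name__ == "__main__":
    print("digits after the leading one:", zeros_after_leading_digit(TOTAL_BASES))
    print("catalog growth factor:",
          round(fold_increase(KNOWN_RNA_VIRUSES, NEW_RNA_VIRUSES), 1))
```

In other words, the data set is on the order of ten quadrillion building
blocks, and the catalog of known RNA viruses grows almost tenfold.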
Hundreds of thousands of variations of RNA viruses
Three short sections of the gene responsible for building the RNA-dependent RNA
polymerase served, in a sense, as fishing hooks for the unknown RNA viruses. It
is the enzyme that RNA viruses usually need in order to replicate. Even this
essential and highly conserved
molecule can differ significantly among the extremely versatile RNA viruses.
Hundreds of thousands of variations have been identified by biologists. The
group of small hepatitis delta viruses, which was relatively manageable with 13
members up to 2018, was expanded in one fell swoop by what is believed to be
more than three hundred variants. However, many of these metagenomic
discoveries have yet to be confirmed by further laboratory analysis. And even
if that were to succeed for all of them, virus researchers would still be far
from having exhaustive knowledge of the global "virome", the worldwide realm of
viruses.
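The "fishing hook" idea can be illustrated with a deliberately simplified
Python sketch: scan a protein sequence for a few short, conserved stretches and
tolerate a small number of deviations, since even conserved regions vary
between viruses. This is not the Serratus pipeline itself. The hook strings and
the example read below are invented placeholders, and the function names are
hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_hooks(sequence: str, hooks: list[str],
               max_mismatches: int = 1) -> list[tuple[str, int, int]]:
    """Return (hook, position, mismatches) for every window of `sequence`
    that matches a hook with at most `max_mismatches` differences."""
    hits = []
    for hook in hooks:
        width = len(hook)
        for start in range(len(sequence) - width + 1):
            distance = hamming(hook, sequence[start:start + width])
            if distance <= max_mismatches:
                hits.append((hook, start, distance))
    return hits


def has_all_hooks_in_order(sequence: str, hooks: list[str],
                           max_mismatches: int = 1) -> bool:
    """True if every hook occurs, one after the other, along the sequence."""
    cursor = 0
    for hook in hooks:
        hits = find_hooks(sequence[cursor:], [hook], max_mismatches)
        if not hits:
            return False
        # Hits are listed by ascending position; continue after the first one.
        cursor += hits[0][1] + len(hook)
    return True


# Invented placeholder hooks and read, for illustration only.
HOOKS = ["DADVS", "SGQAT", "YGDDS"]
READ = "MKTDADVSLLKRSGQGTPPAWYGDDSEE"  # middle hook carries one deviation

if __name__ == "__main__":
    print(has_all_hooks_in_order(READ, HOOKS, max_mismatches=1))
    print(has_all_hooks_in_order(READ, HOOKS, max_mismatches=0))
```

Allowing one mismatch lets the slightly altered read through, while an exact
search rejects it. That trade-off between sensitivity and false hits is one
reason why such computational finds still need confirmation in the laboratory.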
So far, science has probably studied little more than 0.001
percent of the viruses on Earth in any detail.
The situation is not much different with the tiny creatures
that Peer Bork and many of his metagenomics colleagues have their eyes on
worldwide: the bacteria. Shortly after the human genome became public in its
first rough version, Bork and his colleagues traveled practically all over the
world to scan the environment for bacterial genetic material. In the years that
followed, the intestinal flora of thousands of people was analyzed, and the
individual microbial fingerprints of more and more volunteers were examined to
see how they changed in the event of illness or under the influence of
medication. Logistical and digital mammoth projects, all of which not only deal
with very practical medical questions, but also advance the overall genetic
view of the planet ever faster.
In a new preprint posted on "bioRxiv" by the biotechnologist Jaime Huerta-Cepas
from Madrid together with Bork and scientists from Berlin to Shanghai, the
known gene repertoire of bacteria multiplies at a stroke. The researchers have
identified nearly 400 million genes in five large metagenome databases covering
germs from 82 "habitats", from the human gut and vagina to marine and sewage
samples. Of course, most of these are merely thousands of variations on the
same gene. Nevertheless, the gene discoveries in the previously uncultivable
bacteria greatly expand nature's catalog of genes.
In any case, Bork and his colleagues have created
their own database for keeping track of bacterial gene diversity on
Earth: "Global Microbial Gene Catalog v1.0".
No one knows what all these mega gene projects, which are impenetrable to
laypeople, will ultimately be good for. One idea is that, for example, the
spread of antibiotic-resistant bacteria can be tracked and problematic germs
can be identified at an early stage. The same applies to viruses. However, it
cannot be predicted with certainty whether scientists will ever actually get
the chance to use mass sequencing to identify at an early stage, or even
predict, the emergence of infectious variants or of variants with the potential
to escape the immune system. In any case, the computer tools and data treasures
of metagenomics could be suitable for discovering threatening signals in the
environment earlier."