"Imagine: A book with trillions of pages - no one can read
it. One could at best search in it, but how is that supposed to work with the
masses? To make matters worse, many of the pages are constantly changing, and
more are added every day.
So how do you open up this giant book, which of
course is not a book, but the electronically stored information of mankind -
the World Wide Web? Basically the same as with an ordinary book. With a table
of contents or better: a keyword index, also called an index.
Before anyone enters a search term and presses the return key, Google, the world's leading search engine, has already done the preliminary work and compiled an index. So-called crawlers handle this on the front line - the name roughly translates as "creeper" or "caterpillar". They eat their way through the web like an insatiable caterpillar: the search programs follow all the links on a page, then in turn the links found on each new page, and so on.
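To make the caterpillar image concrete: at its core, a crawler is little more than a loop over a queue of links. The following Python sketch is a deliberately simplified toy - no politeness delays, no robots.txt handling, no duplicate-content detection - and is in no way Google's actual crawler, just the basic idea of following links outward from a seed page:

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=50):
    """Breadth-first crawl: fetch a page, queue its links, repeat."""
    frontier = deque([seed_url])   # pages still to visit
    visited = set()                # pages already fetched
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue               # unreachable pages are simply skipped
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)    # resolve relative links
            if absolute.startswith("http") and absolute not in visited:
                frontier.append(absolute)
    return visited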
The pages are downloaded. Then the system examines them - albeit differently than a human would. It looks for links and for the words that appear on the page, tries to recognize what is in pictures, and classifies everything accordingly. Each word goes into the index - that is, the keyword index - where it sits alongside occurrences of the same word found on other websites. This index alone is around one hundred million gigabytes in size; tens of thousands of hard drives are needed to store it.
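The keyword index described here is what computer science calls an inverted index: instead of mapping pages to the words on them, it maps each word to the pages it appears on, which is what makes lookup fast. A toy version, assuming the pages have already been downloaded as plain text (the URLs below are made up for the example):

from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> plain text of the page.
    Returns an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

pages = {
    "example.org/a": "the hungry caterpillar eats through the web",
    "example.org/b": "a web search engine builds a keyword index",
}
index = build_index(pages)
print(index["web"])   # both pages contain 'web'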
More than just words
As a Google user, however, you know that the search can do much more than just scour the web for individual words; that alone would produce unsatisfactory results for most queries. That is why there are algorithms - mathematical rules of procedure - that refine and ultimately weight the results. The original idea (and the cornerstone of Google's superiority over other search engines) was the PageRank algorithm.
The underlying assumption: pages that many other pages link to probably contain relevant information.
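PageRank itself can be sketched in a few lines: every page repeatedly passes its current score on to the pages it links to, in equal shares, while a damping factor models a reader who occasionally jumps to a random page instead of following links. This is the textbook formulation of the idea, not Google's production variant:

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to.
    Returns an approximate PageRank score for each page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}           # start with equal scores
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                      # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Tiny example graph: two of three pages link to 'a', so 'a' ranks highest.
print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))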
PageRank still plays a role today, but it is only one criterion among many. There are now several hundred algorithms, and they are constantly changing. Most of the time, at least as a normal user, you don't notice much of this, but sometimes there are major changes. One reason for this can be, for example, that Google is trying to thwart tricks with which site operators attempt to push themselves up in the ranking of the search results.
What do users actually want?
Google is known for automating everything - and at the scale of the imaginary book, there is no other way. Artificial intelligence (AI) has long played a major role in deciding which search results are presented to users. It starts with recognizing what a user actually wants: when does someone who types "bank" mean a financial institution, and when a park bench? Ever newer and more sophisticated AI systems ensure that the search shows good results right away - even for much more complex questions. This is also necessary, because around 15 percent of all queries have never been entered before, owing, for example, to current events such as sporting fixtures or political developments.
Thousands of servers help
In order to display results quickly, a single search query can temporarily occupy a thousand different server computers in one of Google's data centers - if only because the index is so large that it has to be divided across many machines. Algorithms then try to interpret what is being searched for and determine the order in which the results are displayed. A lot is taken into account, including whether you are searching from a mobile phone or a desktop computer, and whether the topic is current, such as the result of a soccer team's most recent game. In the settings, users can specify which of their data Google may use, for example their current location. Depending on how much data you disclose, the search can return results that are more individually tailored.
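Dividing the index across many machines typically follows a scatter-gather pattern: the query is sent to every shard, each shard answers from its own slice of the web, and a frontend merges the hits. The sketch below splits documents across shards by hashing the URL; this is an illustrative assumption about one common way to do it, not a description of Google's actual architecture:

def shard_pages(pages, num_shards):
    """Split the document set across shards by hashing the URL."""
    shards = [dict() for _ in range(num_shards)]
    for url, text in pages.items():
        shards[hash(url) % num_shards][url] = text
    return shards

def search_shard(shard, word):
    """Each shard answers the query against its local slice of the web."""
    return [url for url, text in shard.items() if word in text.lower().split()]

def search(shards, word):
    """Scatter the query to all shards, then gather and merge the hits."""
    hits = []
    for shard in shards:   # in production this fan-out runs in parallel
        hits.extend(search_shard(shard, word))
    return hits

pages = {
    "example.org/a": "the hungry caterpillar eats through the web",
    "example.org/b": "a web search engine builds a keyword index",
    "example.org/c": "the index is split across many servers",
}
shards = shard_pages(pages, num_shards=2)
print(search(shards, "index"))   # found no matter which shard holds the page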
That the search leads to good results more and more often is also because the AI algorithms, trained on masses of data, are getting better and better at guessing what is really being searched for: how the search terms a user enters relate to one another, and which context is meant - even when the decisive term does not appear in the query at all. Google has developed a kind of universal system for this that works largely independently of the language it is applied to.
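One common way such systems capture that terms are "related" is to represent words or whole queries as vectors and compare them by cosine similarity: vectors pointing in similar directions mean similar meanings, whatever the language. A toy illustration - the three-dimensional numbers below are invented for the example, whereas real systems learn vectors with hundreds of dimensions from masses of text:

import math

def cosine(u, v):
    """Cosine similarity: near 1.0 means closely related, near 0.0 unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented 'embeddings' echoing the bank example from above.
vectors = {
    "bank":    [0.9, 0.1, 0.3],
    "finance": [0.8, 0.2, 0.1],
    "bench":   [0.1, 0.9, 0.2],
}
print(cosine(vectors["bank"], vectors["finance"]))  # high: related senses
print(cosine(vectors["bank"], vectors["bench"]))    # lower: less related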
But it still doesn't work without people. Any change to an algorithm is first tried out on a test group before being rolled out to all Google users. Only the very largest providers have such masses of data, and it will be difficult for potential competitors to keep up: they lack both the data and the computing power to process it.