Monday, March 21, 2022

Thousands of servers and hundreds of algorithms: what exactly happens when you click the search button on Google?


"Imagine: A book with trillions of pages - no one can read it. One could at best search in it, but how is that supposed to work with the masses? To make matters worse, many of the pages are constantly changing, and more are added every day. 

So how do you open up this giant book, which of course is not a book at all but the electronically stored information of mankind, the World Wide Web? Basically the same way as with an ordinary book: with a table of contents or, better, a keyword index, called simply an index.

Before someone enters a search term and presses the return key, Google, the world's leading search engine, has already done the preliminary work and compiled an index. So-called crawlers do this work on the front line; the name suggests something that creeps, like a reptile or a caterpillar. They eat their way through the web like the insatiable caterpillar. These programs follow every link on a page, then every link on each page they reach that way, and so on.
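The link-following described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not Google's crawler: the TOY_WEB dictionary, the page names and the crawl function are all hypothetical, and the "web" is an in-memory table rather than real pages fetched over the network.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(start, links=TOY_WEB):
    """Return every page reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # stands in for downloading the page
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return order


print(crawl("a.example"))
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, as a.example and c.example do here.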

The pages are downloaded. Then the system looks at them, albeit differently than a human being would. It looks for links and for the words that appear on the page, tries to recognize what is in the pictures, and classifies them accordingly. Each word goes into the index, that is, the keyword index, where it sits alongside similar words found on other websites. This index alone is around one hundred million gigabytes in size; tens of thousands of hard drives are needed to hold it.
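A keyword index of this kind is usually called an inverted index: for each word, it lists the pages on which the word appears. The following is a small sketch under stated assumptions. The PAGES data and the build_index and search functions are hypothetical, and a real index would also store positions, handle images and cope with many languages.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample pages, for illustration only.
PAGES = {
    "page1": "The hungry caterpillar eats leaves",
    "page2": "A caterpillar becomes a butterfly",
    "page3": "Leaves fall in autumn",
}
INDEX = build_index(PAGES)
print(sorted(search(INDEX, "caterpillar leaves")))
```

Looking a word up in the index takes the same time no matter how many pages were indexed, which is why the search itself never has to read the pages again.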

More than just words

As a Google user, however, you know that the search can do much more than just scour the web for individual words. That alone would lead to unsatisfactory results for most queries. That is why there are algorithms, i.e. mathematical rules of procedure, that improve the results and finally weight them. The original idea (and the cornerstone of Google's superiority over other search engines) was the PageRank algorithm.

The assumption: pages that many other pages link to probably contain relevant information.
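That assumption can be made concrete with a simplified version of the PageRank iteration: each page repeatedly passes its score on to the pages it links to. This is a sketch, not the ranking Google uses today. The damping factor of 0.85 is a commonly cited value rather than something stated here, the LINKS graph is hypothetical, and the code assumes every link target also appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by repeated redistribution of rank."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "c", and "c" links to "a".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph "c" ends up with the highest score because three pages link to it, and "a" comes second because it is linked from the highly ranked "c".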

PageRank still plays a role today, but it is only one criterion among many. There are now several hundred algorithms, and they are constantly changing. Most of the time, at least as a normal user, you don't notice much of this, but sometimes there are major changes. One reason can be, for example, that Google is trying to block tricks that site operators use to cheat their way up the order of the search results.

What do users actually want?

Google is known for automating everything, and given the size of the imaginary book, there is no other way. Artificial intelligence (AI) has long played a major role in deciding which search results are presented to users. It starts with recognizing what a user actually wants. When does a user mean a financial institution by "bank", and when a park bench (the German word "Bank" covers both)? More and more sophisticated AI systems ensure that the search shows good results right away, even for much more complex questions. This is necessary, because around 15 percent of all queries have never been made before. That is due, for example, to current events such as sporting events or political developments.

Thousands of servers help

To display results quickly, a single search query can briefly occupy a thousand different servers in one of Google's data centers, if only because the index is so large that it has to be divided among many servers. Algorithms then try to interpret what is being searched for and determine the order in which the results are displayed. Many factors are taken into account, including whether you are searching from a mobile phone or a desktop computer, and whether the topic is current, such as the result of a soccer team's most recent game. In the settings, users can specify which of their data Google may use, for example their current location. The more data you disclose, the more closely the search can tailor results to you.
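Dividing an index among many servers and asking all of them at once is often called scatter-gather. The sketch below only illustrates the idea and makes no claim about how Google's data centers work: each "server" is a plain dictionary, and shard_for, build_shards and search_all are hypothetical names. A checksum decides which shard stores a page, so the same page always lands on the same shard.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard for a document using a stable checksum of its name."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(pages, num_shards):
    """Build one small word -> pages index per shard."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in pages.items():
        shard = shards[shard_for(doc_id, num_shards)]
        for word in set(text.lower().split()):
            shard.setdefault(word, set()).add(doc_id)
    return shards


def search_all(shards, word):
    """Ask every shard for the word and merge the partial answers."""
    hits = set()
    for shard in shards:  # a real system would query the shards in parallel
        hits |= shard.get(word.lower(), set())
    return sorted(hits)


# Hypothetical sample pages, for illustration only.
SAMPLE_PAGES = {
    "page1": "the hungry caterpillar eats leaves",
    "page2": "a caterpillar becomes a butterfly",
    "page3": "leaves fall in autumn",
}
print(search_all(build_shards(SAMPLE_PAGES, 3), "caterpillar"))
```

Because every shard holds only part of the index, each lookup is small, and the merged answer is the same however many shards the pages are spread across.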

The search now leads to good results more and more often partly because the AI algorithms, trained on masses of data, are getting better and better at guessing what is really being searched for. That includes how the search terms a user enters relate to one another and which context is meant, even if the actual term does not appear in the search query. Google has developed a kind of universal system for this that is largely independent of the languages to which it is applied.

But it still doesn't work without people. Every change to an algorithm is first tried out on a test group before being unleashed on all Google users. Only the very large providers have such masses of data, and it will be difficult for potential competitors to keep up. They lack both the data and the computing power to process it."

