Focused Crawler


A focused crawler restricts its indexing to websites that are relevant to a particular topic and that are particularly current. It stands in contrast to the crawler of a universal search engine, whose purpose is to cover as many websites as possible.

Focused crawler basics

It is almost impossible to record the entire Internet in an index. The reasons are, on the one hand, the naturally limited performance of any given crawler and, on the other hand, the rapid growth of the Internet. Focused crawlers therefore limit themselves to a certain area of the web and, in turn, index it in great detail.

Generally, a focused crawler gives preference to websites that have been assigned a particularly high importance or that are updated very frequently. Google, for example, uses focused crawling at least in the sense that websites that are rarely or never updated are visited less frequently than regularly updated ones. For the Google algorithm, the relevance of a website that is not updated decreases, because content that is not updated will, in most cases, become obsolete sooner or later.
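The idea of visiting frequently updated websites more often can be illustrated with a small sketch. It is not Google's actual scheduling logic, which is not described here; it assumes a simple policy in which the revisit interval is halved when a page has changed since the last visit and doubled when it has not. The function names, the starting interval of 24 hours, and the clamping bounds are all hypothetical.

```python
import hashlib


def content_fingerprint(body: str) -> str:
    """Hash the page body so two visits can be compared cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def next_interval(current_hours: float, changed: bool,
                  min_hours: float = 1.0, max_hours: float = 720.0) -> float:
    """Assumed policy: halve the interval after a change, double it otherwise."""
    factor = 0.5 if changed else 2.0
    # Clamp so the interval never collapses to zero or grows without bound.
    return max(min_hours, min(max_hours, current_hours * factor))


def simulate(snapshots: list[str], start_hours: float = 24.0) -> list[float]:
    """Return the revisit interval chosen after each visit except the first."""
    interval = start_hours
    last_fingerprint = None
    history = []
    for body in snapshots:
        fingerprint = content_fingerprint(body)
        if last_fingerprint is not None:
            changed = fingerprint != last_fingerprint
            interval = next_interval(interval, changed)
            history.append(interval)
        last_fingerprint = fingerprint
    return history


# A page that changes twice and then stays the same.
intervals = simulate(["v1", "v2", "v3", "v3", "v3"])
print(intervals)
```

In this sketch the interval shrinks while the page keeps changing and grows again once it stops, which mirrors the behavior described above: sites that are not updated are visited less and less often.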

Areas of application

A typical use of a focused crawler is the creation of digital libraries for a particular area of knowledge. Here, the number of documents recorded is less important than their quality; the principle “quality over quantity” applies. The additional time required to assess quality can, however, be offset by the smaller total volume that has to be indexed.

To this end, a focused crawler searches the web for websites that are relevant to a specific subject and disregards those that are not.
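The behavior described above can be sketched as a best-first crawl. The example below is a minimal illustration under stated assumptions, not a reference implementation: it works on a small in-memory "web" instead of fetching real pages, it scores relevance as the share of words that are topic terms (a crude stand-in for a real classifier), and it lets each link inherit the score of the page that links to it. The names `relevance`, `focused_crawl`, `PAGES`, and the threshold of 0.2 are hypothetical.

```python
import heapq


def relevance(text, topic_terms):
    """Share of words in `text` that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in topic_terms) / len(words)


def focused_crawl(pages, seeds, topic_terms, threshold=0.2, max_pages=10):
    """Best-first crawl over `pages`, a dict of url -> (text, outgoing_links).

    Pages scoring below `threshold` are neither indexed nor expanded.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    indexed = []
    while frontier and len(indexed) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # dangling link in the toy data
        text, links = pages[url]
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic: skip the page and do not follow its links
        indexed.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return indexed


# Hypothetical toy web: "b" is off-topic, the rest is about solar energy.
PAGES = {
    "a": ("solar panels and solar cells", ["b", "c"]),
    "b": ("cooking recipes for pasta", ["d"]),
    "c": ("solar energy storage", ["d"]),
    "d": ("solar inverter guide", []),
}
result = focused_crawl(PAGES, ["a"], {"solar", "energy", "panels"})
print([url for url, _ in result])
```

The off-topic page `b` is fetched but discarded, and page `d` is still reached through the relevant page `c`. A real focused crawler would replace the word-count score with a trained classifier and add politeness rules, but the priority-queue structure stays the same.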

Advantages

Since a focused crawler does not try to index the whole web but only a relatively narrow sub-region, it requires considerably less computing power and fewer network resources. At the same time, a relevant document collection of particularly high quality and timeliness can be assembled this way. The freshness of the content can be ensured through shorter visit intervals. The crawler thus reduces the proportion of useless information while bringing together thematically related knowledge.

Relevance to search engine optimization

The objective of search engine optimization is for a website to rank well in the search engine results pages (SERPs). To achieve this goal, Google has to consider the website relevant. A focused crawler treats websites that are frequently updated as particularly relevant, so SEOs have the task of regularly providing new content. Ideally, such content is unique and offers a high degree of added value in order to fulfill Google’s quality requirements. If the search engine reduces the frequency of its visits because content is rarely updated, a worse ranking in the SERPs may result.