Indexing

Generally, indexing refers to a method of information acquisition (information development) in which documents are collected and sorted based on keywords. The result is an index, comparable to a library catalog. The indexed documents, mostly text content, are provided with descriptors and thereby prepared for searches for a specific document or keyword.

If you search for a keyword and the related documents, ideally the most relevant content is displayed. In a library, descriptors may be data such as author, title, or ISBN. In principle, the same thing happens with a query on the Internet. In other words, the term indexing denotes the formation of an index in which web documents are collected and sorted using various descriptors (such as keywords) and made available for subsequent searches (information retrieval).
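The idea of sorting documents by keyword descriptors can be illustrated with a small inverted index. This is only a minimal sketch in Python: the function names `build_index` and `search` and the three sample documents are made up for illustration, and real search engine indexes are far more elaborate.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each keyword (descriptor) to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every query keyword."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample documents, not taken from any real index.
documents = {
    "doc1": "Indexing sorts documents by keywords",
    "doc2": "A library catalogue lists author, title and ISBN",
    "doc3": "Search engines build an index of web documents",
}
index = build_index(documents)
print(search(index, "documents"))      # ['doc1', 'doc3']
print(search(index, "web documents"))  # ['doc3']
```

The index is built once, before any query arrives, so a search only has to look up the keyword instead of scanning every document.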

General information

The indexing of web documents is an extensive and complex process that uses various methods from information science, computer science, and computational linguistics. In addition to information development (explained above) and information retrieval, another important term is data mining, which is the extraction of valuable content from a large amount of data.

Various processes associated with indexing occur before a search term is entered. Web documents must first be found and parsed (see Crawlers, Spiders, Bots). They are then collected, sorted, and ranked in an index before they can be displayed in the SERPs of search engines in a particular order. Search engine providers such as Google, Yahoo, or Bing are constantly working to improve the indexing of websites in order to provide the most relevant content.
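The discovery step that precedes indexing can be sketched as a breadth-first walk over links. The Python example below is a simplified illustration, not how any particular search engine works: the `pages` dictionary stands in for the web so that nothing is fetched over the network, and the names `LinkParser` and `crawl` as well as the example.com URLs are assumptions made for this sketch.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def crawl(pages, start_url):
    """Visit pages breadth-first; `pages` is a dict standing in for the web."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # page cannot be retrieved, so there is nothing to parse
        visited.append(url)
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical miniature website used only for this example.
pages = {
    "https://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "https://example.com/a.html": '<a href="/b.html">B</a> <a href="/missing.html">?</a>',
    "https://example.com/b.html": '<a href="/">Home</a>',
}
print(crawl(pages, "https://example.com/"))
# ['https://example.com/', 'https://example.com/a.html', 'https://example.com/b.html']
```

Each visited page would then be handed to the indexing step. Pages that no link points to are never reached, which is why a page has to be findable by the crawler before it can appear in the index.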

In 2010, Google fundamentally changed its index and introduced the Caffeine index. It is supposed to include web content in the index faster by continuously searching certain parts of the global Internet at the same time. Moreover, web content such as videos or podcasts is supposed to be found more easily. [1]

Practical relevance

Indexing has various consequences and opens up various possibilities for site operators and webmasters. If a web page is to be indexed and found in the index, it must first be accessible to the crawler or spider. A new website can be registered with the search engine so that it is included in the index. The website must be findable by the crawler and readable to a certain degree.

Meta tags, which can be listed in the head section of a web page, are one way to ensure this. They can also be used to restrict crawlers in order to exclude a particular page from the index. Canonical tags and directives in the robots.txt file can also be used for this purpose. The indexing status can be retrieved in the Google Search Console. URLs that can already be found in the index are displayed under the Google index and Indexing status tabs. That includes those that have been blocked by the site operator.
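How a crawler might evaluate these signals can be sketched with Python's standard library. The example is a rough illustration under stated assumptions: the helper names `is_noindex` and `may_crawl`, the user agent "ExampleBot", and the sample robots.txt and HTML are all hypothetical, and real search engines apply more rules than shown here.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_noindex(html):
    """True if the page asks not to be included in the index."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def may_crawl(robots_txt, user_agent, url):
    """True if the given robots.txt content allows the user agent to fetch the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Hypothetical robots.txt and page markup for illustration.
robots_txt = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(is_noindex(page))                                                          # True
print(may_crawl(robots_txt, "ExampleBot", "https://example.com/private/a.html"))  # False
print(may_crawl(robots_txt, "ExampleBot", "https://example.com/public/a.html"))   # True
```

Note that the two mechanisms act at different stages: robots.txt governs whether a URL is fetched at all, while the robots meta tag can only be read after the page has been fetched.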

Indexing and SEO

Indexing is very important for search engine optimization. Webmasters and site operators can control this process from the start and ensure that web pages are crawled, indexed, and then displayed in the SERPs. Their position in the SERPs, however, can only be influenced through various on-page and off-page measures and the provision of high-quality content.

You should also stay current, since Google modifies its algorithms quite regularly in order to exclude spam sites or link networks from the index.

References

[1] Our new search index: Caffeine. googleblog.blogspot.de. Accessed on 02/07/2014

Web Links

Matt Cutts on indexing and crawling in a video post
