Latent Semantic Indexing

Latent Semantic Indexing (LSI) is a technique that various search engines have used for several years to improve the quality of search results.


Latent semantic indexing differs from a pure keyword search. In a keyword search, the search engine displays only results that contain the searched word or phrase.

With latent semantic indexing, on the other hand, semantically related results are returned as well. Under certain circumstances, results that do not even contain the searched keyword may therefore be listed at the top. This happens when a search engine such as Google considers an article highly relevant because it uses many semantically related words. Accordingly, a semantically related text may be even more relevant than one that contains the search phrase several times.
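To illustrate how a text can match a query without containing the keyword, the following is a minimal sketch of the classic textbook formulation of LSI: a truncated singular value decomposition of a term-document matrix. The vocabulary, the three toy documents, and the function names are assumptions made for this example, and it requires NumPy. It is not a description of how Google or any other search engine is actually implemented.

```python
import numpy as np

# Toy vocabulary and documents (hypothetical data for illustration only).
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = [
    "car engine",         # contains the query term
    "automobile engine",  # related topic, but never mentions "car"
    "flower petal",       # unrelated topic
]

def term_document_matrix(docs, terms):
    """Rows are terms, columns are documents, entries are raw counts."""
    X = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            if word in terms:
                X[terms.index(word), j] += 1
    return X

def lsi_similarities(query, docs, terms, k=2):
    """Cosine similarity between the query and each document in a k-dimensional latent space."""
    X = term_document_matrix(docs, terms)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T
    q = np.array([query.split().count(t) for t in terms], dtype=float)
    q_hat = (U_k.T @ q) / s_k  # fold the query into the latent space
    sims = []
    for d in V_k:  # one latent vector per document
        denom = np.linalg.norm(q_hat) * np.linalg.norm(d)
        sims.append(float(q_hat @ d / denom) if denom > 0 else 0.0)
    return sims

sims = lsi_similarities("car", docs, terms)
```

In this toy data, "car" and "automobile" both co-occur with "engine", so they end up in the same latent dimension. The second document therefore scores as highly as the first even though it never contains the word "car", while the unrelated third document scores close to zero.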

Semantic approach

Search engines use semantics to link the sense and meaning of different words and content. They detect meaningful words and phrases as well as associated synonyms, antonyms, and other relationships between different word combinations. Words that can appear in any text regardless of its topic (stop words) are excluded. These include, for example, determiners, conjunctions, and prepositions (such as and, the, or, by, in). Only content words remain for the evaluation of the semantic context.
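The filtering step can be sketched as follows. The stop-word list and the function name content_words are assumptions made for this example; real systems use much longer, language-specific lists.

```python
import re

# Small illustrative stop-word list (an assumption, not an official list).
STOP_WORDS = {"a", "an", "and", "the", "or", "by", "in", "of", "to", "is"}

def content_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

words = content_words("The engine of the car is in the garage")
# words == ["engine", "car", "garage"]
```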

Review of search results

The search engine executes the search and initially identifies results based on the searched keyword or phrase. Next, it looks for additional websites that may be relevant to the topic. It judges whether a site is semantically close or distant based on the frequency of semantically related words and phrases. Semantically close websites that relate to the search request are then displayed in the SERPs according to their relevance. Robert Dorsey explains how LSI works in practice in a YouTube video.
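As a rough illustration of judging closeness by the frequency of related words, the sketch below scores each page by the share of its words that are either the query or a related term. The RELATED_TERMS table, the sample pages, and the function names are hypothetical, and the keyword and related-term steps are merged into one score for brevity. A real engine would derive such relations from data rather than from a hand-written set.

```python
import re

# Hypothetical related-term table for the query "car".
RELATED_TERMS = {"car": {"automobile", "vehicle", "engine", "driver"}}

def closeness(query, text):
    """Share of the words in `text` that are the query or a related term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    wanted = {query} | RELATED_TERMS.get(query, set())
    hits = sum(1 for t in tokens if t in wanted)
    return hits / len(tokens)

def rank(query, pages):
    """Return page names ordered from semantically closest to most distant."""
    return sorted(pages, key=lambda name: closeness(query, pages[name]), reverse=True)

pages = {
    "garden": "flower beds and petal colours",
    "dealer": "car dealer and showroom",
    "garage": "automobile engine repair for every driver",
}
order = rank("car", pages)
# order == ["garage", "dealer", "garden"]
```

Here the "garage" page ranks first although it never contains the word "car", because half of its words are related terms.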

Relevance to search engine optimization

A whole new SEO discipline has formed around latent semantic indexing. The principles of SEO remain the same in latent semantic optimization but are extended to include the field of semantics. Specifically, this means that articles should include not only the keywords in their unmodified form but also semantically related words and phrases. This can improve rankings in search engines that use latent semantic indexing.