Stemming


The term stemming refers to a procedure used in both linguistics and computer science in which words are traced back to a common word stem. Search engines such as Google use stemming algorithms to interpret search queries more accurately. Stemming is related to lemmatization, in which words are reduced to their basic or dictionary form.

Stemming algorithms in computer science

Computer scientists have been researching stemming methods since 1968. These methods are used primarily in search engines, where they make it easier to search large catalogs of documents. Library catalogs also use stemming.
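Many classic stemming algorithms work by stripping suffixes according to a fixed list of rules. The following Python sketch illustrates that idea; the suffix list and the `min_stem` length limit are hypothetical choices made for this example, and established algorithms such as the Porter stemmer apply many more rules and conditions.

```python
# Hypothetical suffix rules, checked longest-first. Real stemmers such as
# the Porter stemmer apply many more rules with extra conditions.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "ed", "e", "s")


def stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for word in ["houses", "housing", "housed", "house"]:
        print(word, "->", stem(word))
```

Run as a script, it reduces “houses,” “housing,” “housed,” and “house” to the same stem “hous.” The stem does not have to be a real word; it only has to be identical for related forms.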

Practical relevance

The word stem “house” can occur in many different words. A stemming algorithm has to recognize this stem in different search queries and deliver results that match the identified stem. Examples of words containing “house”:

  • Housing
  • Birdhouse
  • Warehouse
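Suffix stripping alone recognizes “Housing” but not compounds such as “Birdhouse.” One simple way to sketch this kind of matching in Python is to strip a suffix from both the query and the candidate word and accept the candidate if its stem ends with the query stem. The suffix rules and the `strip_suffix` and `matches_stem` helpers are hypothetical; the end-of-word check relies on English compounds usually placing their head word last.

```python
def strip_suffix(word: str) -> str:
    """Toy suffix stripper with hypothetical rules (not a real stemmer)."""
    w = word.lower()
    for suffix in ("ing", "es", "e", "s"):
        # Keep at least three letters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def matches_stem(query: str, candidate: str) -> bool:
    """True if the candidate's stem equals the query stem or ends with it.

    The end-of-word check also catches English compounds such as
    "birdhouse", whose head word usually comes last.
    """
    q = strip_suffix(query)
    c = strip_suffix(candidate)
    return c.endswith(q)


if __name__ == "__main__":
    for word in ["Housing", "Birdhouse", "Warehouse", "Horse"]:
        print(word, matches_stem("house", word))
```

Both “Birdhouse” and “Warehouse” match, while “Horse” does not. A shared stem alone says nothing about whether two words have related meanings.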

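In lemmatization, by contrast, the program identifies the basic (dictionary) form of a word, even when the inflected form looks quite different from it. Because this requires a dictionary of word forms, it is also called lexicon-based stemming.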

Basic form: go

  • he/she went
  • gone
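Lexicon-based lemmatization can be sketched as a dictionary lookup, since an irregular form like “went” shares no suffix with “go.” The tiny `LEXICON` below is a hypothetical stand-in; practical lemmatizers draw on full lexical databases such as WordNet and also take the word's part of speech into account.

```python
# Hypothetical mini-lexicon mapping inflected forms to their basic form.
# Real lemmatizers use full dictionaries (e.g. WordNet) and part-of-speech tags.
LEXICON = {
    "go": "go",
    "goes": "go",
    "going": "go",
    "went": "go",
    "gone": "go",
}


def lemmatize(word: str) -> str:
    """Return the basic form from the lexicon, or the word itself if unknown."""
    w = word.lower()
    return LEXICON.get(w, w)


if __name__ == "__main__":
    for form in ["went", "gone", "Goes"]:
        print(form, "->", lemmatize(form))
```

Words missing from the lexicon are returned unchanged, which is why lexicon-based methods depend on large amounts of data.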

Both procedures must draw on a large amount of data and a large number of rules to deliver results that are as accurate as possible, because reducing a word to its stem or basic form does not by itself filter out synonyms or words that share a stem but differ in meaning. For example, the compound words “birdhouse” and “warehouse” share the stem “house,” yet they denote completely different things. A search engine user who enters “house” in the search box would hardly expect results about both at once.

Stemming used by search engines

Google has officially been working with various stemming procedures since 2004 to improve how its search engine works. These adjustments to the search algorithm help Google interpret search queries more accurately: for example, whole sentences are analyzed more closely, and stop words are ignored. If you are searching for a specific term or an exact phrase, you can bypass stemming by enclosing it in quotation marks. How Google handles search terms can also be observed directly in practice.
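The query handling described here can be approximated in a short Python sketch: quoted phrases are kept verbatim, while the remaining terms are lowercased, filtered against a stop-word list, and stemmed. This is not Google's implementation; the stop-word list, the suffix rules, and the `parse_query` function are illustrative assumptions.

```python
import re

# Hypothetical stop-word list; real search engines use their own lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def stem(word: str) -> str:
    """Toy suffix stripper with hypothetical rules."""
    for suffix in ("ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into exact phrases and stemmed loose terms.

    Text inside double quotes is returned unchanged, so stemming is skipped
    for it; everything else is lowercased, stop words are dropped, and the
    remaining terms are stemmed.
    """
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]+"', " ", query)
    terms = [stem(t) for t in rest.lower().split() if t not in STOP_WORDS]
    return phrases, terms


if __name__ == "__main__":
    print(parse_query('cheap houses in "red brick house"'))
```

For the query `cheap houses in "red brick house"`, the quoted phrase stays intact, “in” is dropped as a stop word, and “houses” is reduced to “hous.”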

Google uses stemming, for instance, to add search terms when too few keywords were entered. If a search snippet contains part of the search query or its word stem, that word is shown in bold in the snippet. Stemming also helps Google identify duplicate content more reliably: the search engine can use it to check quickly whether a text is original or merely reworded.
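Two of these uses, bolding snippet words that share a stem with the query and comparing texts for reworded content, can be sketched in Python as follows. The `stem` rules and the `highlight` and `stem_similarity` helpers are hypothetical, and the Jaccard similarity of stemmed word sets is only a simple stand-in; production systems rely on more elaborate techniques such as shingling.

```python
import re


def stem(word: str) -> str:
    """Toy suffix stripper with hypothetical rules."""
    w = word.lower()
    for suffix in ("ing", "es", "ed", "e", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def highlight(snippet: str, query: str) -> str:
    """Wrap every snippet word whose stem matches a query stem in <b> tags."""
    query_stems = {stem(t) for t in query.split()}

    def repl(match: re.Match) -> str:
        word = match.group(0)
        return f"<b>{word}</b>" if stem(word) in query_stems else word

    return re.sub(r"[A-Za-z]+", repl, snippet)


def stem_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the two texts' stemmed word sets (0.0 to 1.0)."""
    a = {stem(w) for w in re.findall(r"[A-Za-z]+", text_a)}
    b = {stem(w) for w in re.findall(r"[A-Za-z]+", text_b)}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(highlight("Houses and housing near the lake", "house"))
    print(stem_similarity("The house was sold", "The houses were sold"))
```

Here `highlight("Houses and housing near the lake", "house")` returns `"<b>Houses</b> and <b>housing</b> near the lake"`. Because both texts are stemmed before they are compared, a rewording that only changes word endings still shares most of its stems with the original.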

Benefits for SEO

Anyone creating web content that is optimized for both search engines and users can also take advantage of stemming. This does not mean reducing the text to word stems or basic forms; rather, it means using related terms that belong to the topic of the main keyword. A text about apples could therefore cover other uses of apples and different apple varieties in addition to describing the fruit itself. That way, the text points even more clearly to its core topic, the apple, and becomes more relevant to that concept.
