Term Frequency


Term frequency (TF) is a measure used in information retrieval that indicates how often a term (a word or expression) occurs in a document.

Term frequency is taken as an indication of how significant a particular term is within a document. It is usually discussed together with inverse document frequency (IDF), and it is one of the values used to calculate keyword density.
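Counting a term in a document can be sketched in a few lines of Python. This is only an illustration: the function names `tokenize` and `term_frequency` are hypothetical, and the simple tokenization (lowercasing, splitting on anything that is not a letter, digit, or apostrophe) is an assumption, not a rule from any particular search engine.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequency(text, term):
    """Count how often `term` (one or more words) occurs in `text`."""
    tokens = tokenize(text)
    term_tokens = tokenize(term)
    n = len(term_tokens)
    if n == 0:
        return 0
    # Slide a window of n tokens over the text and compare it with the term.
    return sum(
        1
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == term_tokens
    )


sample = "Wall paint dries fast. Good wall paint lasts."
print(term_frequency(sample, "wall paint"))  # prints 2
```

Because the term is matched as a sequence of tokens, multi-word keywords are counted as a unit rather than word by word.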

Criticism

Considered in isolation, term frequency says little about how relevant a document is to a specific keyword, because it is based solely on the absolute number of times the keyword occurs. The following example illustrates this for the query "wall paint":

A 3,000-word text about building a house contains the term “wall paint” five times. A painting company explains the most important types of paint on its website in a 500-word text, but uses the term “wall paint” only twice because the copy relies on synonyms and the names of specific paint types. Judged on term frequency alone, the long text would appear more relevant than the short one, since the keyword occurs five times rather than twice, even though its content is clearly less pertinent to the query. Term frequency is therefore only useful in combination with other evaluation criteria, for example as part of keyword density.

Expanded application as keyword density

Term frequency becomes more informative when it is set in relation to the length of the text. The result is the keyword density, which is calculated with the following formula:

Keyword density (in percent) = term frequency / total number of words × 100

The example above would result in the following keyword densities:

  • Text 1: 5 / 3,000 × 100 ≈ 0.17 percent
  • Text 2: 2 / 500 × 100 = 0.4 percent

By this measure the second document is the more relevant one, since the keyword has a higher relative frequency in it than in text 1.
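The calculation above can be reproduced with a short Python sketch. The function name `keyword_density` and its parameter names are chosen here for illustration only; the arithmetic follows the formula given in this section.

```python
def keyword_density(term_frequency, total_words):
    """Return the keyword density in percent."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return term_frequency / total_words * 100


# Values taken from the "wall paint" example.
text_1 = keyword_density(5, 3000)
text_2 = keyword_density(2, 500)

print(f"Text 1: {text_1:.2f} percent")  # Text 1: 0.17 percent
print(f"Text 2: {text_2:.2f} percent")  # Text 2: 0.40 percent
```

Dividing by the text length is what lets the 500-word text overtake the 3,000-word text, even though its absolute count is lower.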

Term frequency as part of TF*IDF

The TF*IDF (term frequency-inverse document frequency) formula, which is used for text optimization, also builds on term frequency. Here the frequency of a keyword is considered in relation to the document length. At the same time, logarithmic scaling ensures that terms occurring very frequently are not weighted too heavily. In addition, the IDF component gives a lower weight to words that are very common in the language, such as conjunctions, prepositions, and articles.
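The idea can be sketched in Python using one common textbook variant of the formula. This variant is an assumption: it takes TF as the count divided by the document length and IDF as the natural logarithm of the number of documents divided by the number of documents containing the term. Other variants exist, and this section does not specify which one a given tool uses. The function name `tf_idf` and the tiny corpus are hypothetical, and only single-word terms are handled.

```python
import math


def tf_idf(term, document, corpus):
    """Score a single-word `term` for `document` within `corpus`.

    `document` is a list of tokens; `corpus` is a list of such token lists.
    Assumed variant: tf = count / length, idf = ln(N / document frequency).
    """
    if not document:
        return 0.0
    tf = document.count(term) / len(document)
    # Number of documents in the corpus that contain the term at all.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical three-document corpus, already tokenized.
corpus = [
    "the wall paint covers the wall".split(),
    "the house has a roof".split(),
    "the garden is behind the house".split(),
]

print(tf_idf("the", corpus[0], corpus))    # 0.0, the word occurs in every document
print(tf_idf("paint", corpus[0], corpus))  # roughly 0.18
```

In this sketch the article "the" receives a weight of zero because it appears in every document, while "paint", which occurs in only one document, keeps a positive weight.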

Relevance to search engine optimization

The Google search engine uses a proprietary algorithm to index web documents automatically, and the details of this algorithm are kept secret. Experts try to reverse-engineer it mathematically in order to understand how indexing works. It is assumed that search engines include measures such as term frequency (TF) or within-document frequency (WDF) when evaluating the content of a website. It is therefore advisable to determine these values for your own web documents (pages) so that the content you present to search engines is relevant to the topic.