Term frequency (TF) is a measure used in information retrieval that indicates how often an expression (term, word) occurs in a document.
Term frequency indicates the significance of a particular term within the overall document. It is often mentioned together with inverse document frequency (IDF). Among other things, the term frequency value is used to calculate keyword density.
Considered in isolation, term frequency is not conclusive with regard to the relevance of a document to a specific keyword, because it is based exclusively on the absolute number of times the keyword occurs. The following example illustrates this:
In a long text of 3,000 words about the construction of a house, the term “wall paint” appears five times. A painting company explains the most important types of paint on its website in a text of 500 words and uses the term “wall paint” only twice, because the copy relies on synonyms and the names of specific paint types. If you were to rely solely on term frequency to assess topic relevance, the long text would seem more relevant than the short one because the keyword appears five times, even though its content is far less focused on wall paint. Therefore, term frequency should only be used as one component of other evaluation criteria, such as keyword density.
Term frequency can be informative when it is set in relation to text length. This will give you keyword density. For this purpose, the following formula is used:
Keyword density = (term frequency / total number of words) × 100
For the above example, the following keyword densities result:
Text 1 (house construction): 5 / 3,000 × 100 ≈ 0.17 %
Text 2 (painting company): 2 / 500 × 100 = 0.4 %
This indicates a higher relevance of the second document, since the keyword has a higher relative frequency there than in text 1.
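The keyword density formula can be checked with a few lines of Python. This is a minimal sketch: the function name `keyword_density` and the variable names are chosen here for illustration, and the inputs are the word counts from the wall paint example.

```python
def keyword_density(term_frequency: int, total_words: int) -> float:
    """Return keyword density as a percentage of the total word count."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return term_frequency / total_words * 100


# Text 1: house construction article, 3,000 words, keyword appears 5 times.
house_text = keyword_density(5, 3000)
# Text 2: painting company page, 500 words, keyword appears 2 times.
painter_text = keyword_density(2, 500)

print(f"Text 1: {house_text:.2f} %")    # Text 1: 0.17 %
print(f"Text 2: {painter_text:.2f} %")  # Text 2: 0.40 %
```

Although the keyword occurs less often in absolute terms, the shorter text ends up with the higher density.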
The WDF*IDF formula, which is used to optimize texts, also builds on term frequency. WDF stands for within-document frequency: the frequency of a keyword is viewed in relation to the document length. At the same time, logarithms dampen the values so that terms occurring very frequently are not weighted too heavily. Moreover, words that are very common in the language (such as conjunctions, prepositions, and articles) receive a lower weight.
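The damping effect of the logarithms can be illustrated with a short Python sketch. The text above does not spell out the formula, so the variant used here is an assumption: WDF = log2(frequency + 1) / log2(document length) and IDF = log10(1 + total documents / documents containing the term), which is one commonly cited form. The function names and the corpus figures (1,000 documents, 10 or 900 of them containing the term) are hypothetical.

```python
import math


def wdf(term_count: int, doc_length: int) -> float:
    """Within-document frequency (assumed form): log2(count + 1) / log2(length)."""
    if doc_length < 2:
        raise ValueError("doc_length must be at least 2")
    return math.log2(term_count + 1) / math.log2(doc_length)


def idf(total_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency (assumed form): log10(1 + N / n)."""
    if docs_with_term <= 0:
        raise ValueError("docs_with_term must be positive")
    return math.log10(1 + total_docs / docs_with_term)


def wdf_idf(term_count: int, doc_length: int,
            total_docs: int, docs_with_term: int) -> float:
    """Combine both factors into a single term weight."""
    return wdf(term_count, doc_length) * idf(total_docs, docs_with_term)


# Doubling the raw count does not double the WDF value.
print(wdf(5, 1000), wdf(10, 1000))

# A term found in few documents outweighs one found almost everywhere
# (hypothetical corpus of 1,000 documents).
print(wdf_idf(5, 1000, 1000, 10), wdf_idf(5, 1000, 1000, 900))
```

Under these assumptions, repeating a keyword more often yields diminishing returns, and terms that occur in nearly every document contribute little to the final weight.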
The Google search engine uses a proprietary algorithm for the automatic indexing of web documents, and its details are kept secret. Experts try to reconstruct this algorithm mathematically in order to understand the indexing process. It is assumed that search engines incorporate values such as term frequency (TF) or within-document frequency (WDF) when evaluating the content of a website. It is therefore advisable to determine these values for your own web documents (pages) in order to present content-relevant pages to the search engines.