Gibberish Scores


The gibberish score is the subject of a pending Google patent from 2009 that is intended to help the search engine distinguish unique content more reliably from thin content, copied content, and webspam.

Word definition

Gibberish literally means unintelligible or meaningless speech or writing, that is, language that carries no meaning. In practice, gibberish is used, for example, in vocal exercises before concerts and in theaters when a talking crowd is to be simulated. The gibberish score is accordingly meant to recognize meaningless content.

Background

Since the rollout of the influential Panda and Penguin updates, Google has massively stepped up its efforts to fight webspam in its search results. With the Hummingbird update in autumn 2013, the search engine underlined once again how precisely it can evaluate queries to deliver appropriate search results. The patent-pending gibberish score is intended to help distinguish “good” content from “bad” content that offers users no added value.

Function

According to the patent specification, the gibberish score of a document is based on two different approaches:

  • Language model score: The content is analyzed linguistically. The source code is first stripped of its HTML elements and then broken down into small units of text. These text units are then compared with the text units of other content that was stored within a specific time period for a particular search query. The algorithm estimates how likely certain phrases are to occur within a text. The “language model score” is a summary of these probabilities.
  • Query stuffing score: With this method, the algorithms determine how relevant a text is for a certain search query and, at the same time, whether that relevance was achieved in an unnatural way through keyword stuffing, for example through the excessive use of money keywords. The special feature of this method is that it takes into account not only individual keywords but also whole phrases.
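The two approaches above can be illustrated with a minimal Python sketch. It is not the method described in the patent: the bigram model with add-one smoothing, the phrase-coverage ratio, and all names (`tokenize`, `train`, `language_model_score`, `query_stuffing_score`) are assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the HTML tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def tokenize(html):
    """Strip HTML elements, then split the remaining text into lowercase words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9']+", " ".join(parser.parts).lower())


def bigrams(tokens):
    """Return adjacent word pairs, the 'small text units' of this sketch."""
    return list(zip(tokens, tokens[1:]))


def train(reference_docs):
    """Count words and word pairs in the reference content (assumed corpus)."""
    unigram_counts, bigram_counts = Counter(), Counter()
    for doc in reference_docs:
        tokens = tokenize(doc)
        unigram_counts.update(tokens)
        bigram_counts.update(bigrams(tokens))
    return unigram_counts, bigram_counts


def language_model_score(html, unigram_counts, bigram_counts):
    """Average log-probability of the document's bigrams (higher = more natural)."""
    pairs = bigrams(tokenize(html))
    if not pairs:
        return float("-inf")
    vocab = len(unigram_counts) + 1  # +1 reserves mass for unseen words
    total = 0.0
    for prev, word in pairs:
        # Add-one smoothing so unseen pairs get a small, non-zero probability.
        p = (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab)
        total += math.log(p)
    return total / len(pairs)


def query_stuffing_score(html, query):
    """Fraction of the document's words covered by exact repeats of the query phrase."""
    tokens, phrase = tokenize(html), tokenize(query)
    n = len(phrase)
    if n == 0 or len(tokens) < n:
        return 0.0
    i = hits = 0
    while i <= len(tokens) - n:
        if tokens[i:i + n] == phrase:
            hits += 1
            i += n  # count non-overlapping matches only
        else:
            i += 1
    return hits * n / len(tokens)
```

In this sketch, a text whose word order resembles the reference content receives a higher language model score than a scrambled version of the same words, and a text that repeats the full query phrase many times receives a higher query stuffing score. Matching the whole phrase rather than single keywords mirrors the point made in the second list item.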

Together, the two scores described here make up the gibberish score of a web document. If a text contains too many gibberish elements, the webpage in question is devalued in the rankings for the specific search term.
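How the two scores are combined and how strongly a page is devalued is not specified here, so the following Python sketch is only one conceivable arrangement. The weighted average, the normalization via `lm_floor`, the `threshold`, and the `penalty` are all hypothetical values and names, not figures taken from the patent.

```python
def gibberish_score(lm_score, stuffing_score, lm_floor=-10.0, lm_weight=0.5):
    """Combine both partial scores into one value between 0 and 1.

    lm_score is assumed to be an average log-probability (0 or below);
    lm_floor is a hypothetical value at which text counts as fully unnatural.
    stuffing_score is assumed to already lie between 0 and 1.
    """
    unnaturalness = min(1.0, max(0.0, lm_score / lm_floor))
    stuffing = min(1.0, max(0.0, stuffing_score))
    return lm_weight * unnaturalness + (1.0 - lm_weight) * stuffing


def adjusted_rank_score(base_score, gibberish, threshold=0.5, penalty=0.5):
    """Devalue a page's score for a query if its gibberish score is too high."""
    if gibberish > threshold:
        return base_score * (1.0 - penalty)  # hypothetical demotion factor
    return base_score
```

With these assumed defaults, a page with natural-looking text and little phrase repetition keeps its original score, while a page that scores badly on both counts is demoted for the query in question.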

Benefits for search engine optimization

The announcement that Google had filed a patent for the gibberish score fits into the debate about WDF*IDF optimization, which was especially intense in 2013. According to the TF*IDF principle, on which WDF*IDF is based, text is no longer optimized for keyword density alone. What also counts is the weight of individual terms relative to their use on other websites that are relevant to the target keyword.
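The TF*IDF weighting mentioned above can be sketched in a few lines of Python. The example uses the plain textbook form (relative term frequency multiplied by the logarithm of the inverse document frequency); WDF*IDF additionally dampens the term frequency logarithmically, which is not shown here. The function name `tf_idf` and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Weight of `term` in `document`, relative to its use across `corpus`."""
    tokens = document.lower().split()
    if not tokens:
        return 0.0
    # Term frequency: share of the document's words that are `term`.
    tf = Counter(tokens)[term] / len(tokens)
    # Document frequency: number of corpus documents that contain `term`.
    df = sum(1 for doc in corpus if term in doc.lower().split())
    if df == 0:
        return 0.0
    # Terms used in every document get an IDF of log(1) = 0.
    return tf * math.log(len(corpus) / df)
```

A term that appears in only a few of the compared documents therefore receives a higher weight than a term that appears in nearly all of them, even when both occur equally often in the text being evaluated.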

Thus, on-page optimization is becoming more complex, especially when it comes to content creation. Content generation can easily be outsourced to a company such as Demand Media, and the gibberish score can serve as a quality check on such content. Massively keyword-optimized content, lightly rewritten content, and machine-generated text can be identified more easily as such with the gibberish score. The Google patent is therefore a further step toward high-quality websites that are created for visitors and not for the search engine.

Web Links