The Monster of Website Optimization – The Duplicate Content Monster

Have you paid good money for your SSL certificate but now find that the http version of your site ranks in Google SERPs? The reason for this debacle could be the inconspicuous Duplicate Content Monster.

We will show you in just a few steps how to quickly identify and deal with it!

Identical or very similar content that can be found under different URLs or domains is referred to as duplicate content. Duplicate content makes it difficult for search engines to provide the user with the best and most relevant search result because they have to choose between different versions.

Website owners should, therefore, aim to produce unique content. This minimizes the risk of duplicate content.

The most common causes of duplicate content include:

  • Content in various versions (print versions, PDF, etc.)

  • Automatically-generated documents or lists

  • Missing server configuration (site is accessible with and without www)

  • Page is accessible via http and https

  • Page is accessible with and without a trailing / at the end of the URL

  • Page is accessible under URLs written in both upper and lower case

  • Extensive footer content and sidebars

  • Unauthorized copying (content theft)
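
Several of the technical causes listed above (http vs. https, with or without www, trailing slash, upper and lower case) come down to one page being reachable under several URL variants. The following is a minimal Python sketch of how such variants could be collapsed into one preferred form in order to spot them in a URL list. The preferred form chosen here (https, with www, lower-case path, no trailing slash) and the example.com URLs are assumptions for illustration only, and lower-casing paths is only appropriate if your server treats paths as case-insensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse common technical URL variants into one preferred form.

    Assumed policy (hypothetical, adjust to your site): https, host with
    "www.", lower-case path, no trailing slash except for the root.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    # Lower-case the path and drop trailing slashes; keep "/" for the root.
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


# Four hypothetical variants of the same page.
variants = [
    "http://example.com/Shoes/",
    "https://www.example.com/shoes",
    "http://www.example.com/shoes/",
    "https://example.com/SHOES",
]
preferred = {normalize_url(u) for u in variants}
print(preferred)
```

If the set contains a single URL after normalization, the variants are candidates for the same page and should ideally resolve to only one address.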

Our Monster-Destroying Practical Example:

Displaying duplicate content is often unavoidable, especially for online shop operators, for example when a product with the same description is listed in several categories or is available in different colors. A typical case of duplicate content can be found in the Deichmann online shop:

Figure 1: Sneakers at Deichmann

The product details page for the sneakers displayed in Figure 1 is available in the online shop both under the category "ladies shoes trainers" (URL 1):

http://www.deichmann.com/GB/en/shop/home-ladies/home-ladies-shoes/home-ladies-shoes-trainers/00009001447038/Slip*On*Casual*Shoes.prod

and under the category "ladies shoes" (URL 2):

http://www.deichmann.com/GB/en/shop/home-ladies/home-ladies-shoes/00009001447038/Slip*On*Casual*Shoes.prod

Solution: The Canonical Tag

The canonical tag is a <link rel="canonical"> element in the <head> area of a page that allows you to point search engines to the original URL. Search engines then usually index only this "canonical URL" and ignore the copy. If you want to prevent a page with duplicate content from being indexed, insert the tag on this page and enter the link to the original URL.

Deichmann does everything right in the source code. Both URLs point to URL 2 by means of the canonical tag, thereby indicating to the search engine that this is the original URL.

Figure 2: Extract from the source code.
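
To check which canonical URL a page declares, you can read the tag from its HTML. The following is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and the sample markup with its example.com URL is made up for illustration; it is not copied from the Deichmann source code.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Illustrative markup only (assumption, not real shop source code).
sample_page = """
<html>
  <head>
    <title>Slip On Casual Shoes</title>
    <link rel="canonical" href="https://www.example.com/shop/shoes/12345.prod">
  </head>
  <body>...</body>
</html>
"""
print(find_canonical(sample_page))
```

Running the same check on every URL variant of a product page shows whether they all point to the same original, as in the example above where both URLs point to URL 2.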

Duplicate content can waste valuable potential. It is therefore important not only to reduce the number of duplicates but also to put the necessary technical measures in place. The canonical tag is a quick fix for duplicate content. It can and should be used selectively, but it should never be used to solve duplicate content on a large scale: even with canonical tags in place, search engines still have to crawl and analyze each duplicate URL in order to see the tag. Where a large number of pages are affected, this wastes a great deal of crawler resources.

Therefore, you should always try to avoid duplicate content in the first place. Google recommends structuring a website so that content is not accessible under multiple URLs. This prevents duplicate content from being created and saves crawl budget.

Identify Duplicate Content with Ryte

With Ryte, internal duplicates can be detected in just a few steps. To do so, select the "Content" → "Duplicate Content" → "Duplicates" report in the Zoom module. The report lists all the duplicates the crawler found on the website and shows how many duplicates each affected URL has. Clicking the magnifying glass in the "Duplicate counter" column displays all duplicate content URLs.

Figure 3: Duplicate Content report

The different colors represent the various degrees of optimization. Red means there is still a lot of potential for improvement. Yellow bars contain URLs whose need for optimization is less urgent, and green means that everything is fine. The number of duplicates is also shown next to each bar.

Click on the magnifying glass to see information about all the duplicates associated with the page.

Figure 4: Optimizing duplicates

All duplicates of the original page are now listed. In this case, optimization is required because the risk of duplicate content is very high.
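
How Ryte detects duplicates internally is not described here. As a rough illustration of the general idea of listing duplicates and counting them per group, the following Python sketch groups crawled pages whose text is identical after whitespace and case are normalized. The function names, the URLs, and the page texts are hypothetical, and the approach only finds exact duplicates, not merely similar pages.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and case differences."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose text content is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Keep only fingerprints shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl result: URL -> extracted main text.
crawled_pages = {
    "https://www.example.com/ladies/shoes/trainers/123.prod": "Slip on casual shoes in grey.",
    "https://www.example.com/ladies/shoes/123.prod": "Slip  on casual shoes in grey.",
    "https://www.example.com/about": "About our shop.",
}
for urls in group_duplicates(crawled_pages):
    print(f"{len(urls)} URLs share the same content: {urls}")
```

Here the two product URLs end up in one group, while the unique "about" page is not reported.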

Conclusion

Duplicate content is not a reason for Google to penalize a website, but duplicates nevertheless waste an enormous amount of potential. The above-mentioned tips enable you to avoid duplicate content and tap this potential.

Have You Met the Other Monsters of Website Optimization?

The Internal Linking Monster
The Slow Page Speed Monster
The Bad Canonical Tag Monster
The Orphan Page Monster
The 404 Monster
The Thin Content Monster
The Missing ALT Tags Monster
The Redirects Monster
The Hreflang Monster

Published on May 30, 2017 by Eva Wagner