Canonical Tag


The Canonical Tag is a declaration in the source code of a web page. It points to a standard resource, the canonical URL, for pages with the same or almost identical content. If a canonical URL is marked correctly, search engines use only this source for indexing. Search engines rate duplicate content negatively because it offers no added value for the user. A duplicate content checker can be used to detect duplicate content.

Use cases

The Canonical Tag is applied when content is used repeatedly or when a single, unambiguous URL cannot be guaranteed for technical reasons:

  • The starting page can be reached from different URLs (www.domain.com, domain.com, www.domain.com/index.html and so on).
  • Pages can be reached with and without trailing slashes ("/") or with different capitalization in the URL
  • Because of URL rewriting, the server only pays attention to one ID and accepts variations of the address
  • IDs (such as session IDs or product filters) are used that don't change the content
  • Content is presented in different versions (e.g. print version, PDF etc.)
  • There are HTTPS variants of the site
  • The same content is additionally published on other, external websites

It makes sense to include a Canonical Tag on every sub page so that every page refers to itself. This corrects or prevents unexpected errors and wrong links.
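
The URL variants listed above can be collapsed to one preferred address before that address is written into the Canonical Tag. The following Python sketch is only an illustration. The preferred scheme and host, the parameter names in `IGNORED_PARAMS`, the function name `canonical_url`, and the decision to lower-case paths are assumptions that depend on the individual site, not rules taken from any standard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration: which scheme and host are preferred and
# which query parameters never change the content.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.domain.com"
IGNORED_PARAMS = {"session_id", "sort", "print"}
INDEX_FILES = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Map a URL variant to the single address chosen as canonical."""
    parts = urlsplit(url)

    # Assumes the server treats paths case-insensitively.
    path = parts.path.lower()

    # Drop a default index file such as /index.html.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]

    # Treat "/page" and "/page/" as the same resource; keep "/" for the root.
    if path != "/":
        path = path.rstrip("/")
    if not path:
        path = "/"

    # Keep only the parameters that actually change the content.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    )

    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, query, ""))
```

With these assumptions, `http://domain.com`, `http://www.domain.com/index.html` and `https://www.domain.com/` all map to the same address, which each of these pages could then declare as its canonical URL.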

Two ways of indicating a canonical URL

In general, there are two ways of indicating a canonical URL. In both cases, Google recommends absolute URLs – meaning the entire web address.

  • The syntax of the first case looks like this:
 <link rel="canonical" href="http://www.example.com/examplepage.htm">

The link element containing the rel="canonical" attribute is placed in the <head> element of the source code and complements the document's metadata. It refers to the standard page and takes effect on pages that show identical content but are not treated as the original resource.

Let’s assume there are the following two websites:

http://www.example.com/examplepage.htm
http://www.example.com/examplepage/?session_id=xyz.htm

The first one is our standard resource. The second one carries a session ID, as commonly used by online shops to store user-related data such as the items in the shopping cart. The Canonical Tag is integrated into the head element of the second page and contains a reference to the standard resource, which is the first page. This way, Google and other search engines know which page should be preferred and included in the index.
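
To check which canonical URL a page actually declares, its markup can be parsed. The following Python sketch uses only the standard library. The class name `CanonicalFinder`, the helper `find_canonicals`, and the sample markup in `SESSION_PAGE` are hypothetical and only mirror the example above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            # rel may hold several space-separated values.
            rel = (attr.get("rel") or "").lower().split()
            if "canonical" in rel and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html: str) -> list[str]:
    """Return all canonical URLs declared in the head of the document."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical markup of the session page from the example above.
SESSION_PAGE = """
<html>
  <head>
    <title>Example page</title>
    <link rel="canonical" href="http://www.example.com/examplepage.htm">
  </head>
  <body><p>Shopping cart content</p></body>
</html>
"""

print(find_canonicals(SESSION_PAGE))
```

Returning a list instead of a single value makes it easy to notice pages that declare more than one canonical URL.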

  • If the standard resource is a PDF document or another file type supported by Google, the canonical URL needs to be declared in the HTTP header of the server's response. The syntax is different, and setting it up requires knowledge of the Hypertext Transfer Protocol (HTTP):
 Link: <http://www.example.com/examplepage.pdf>; rel="canonical"

This is not a declaration inside the document but part of the HTTP response: when a client (e.g. a browser or a search engine) sends a request, the server states in its reply which URL is the canonical one. The server may need to be reconfigured to send this header.

Let’s now assume there are these two websites:

http://www.example.com/examplepage.htm
http://www.example.com/examplepage.pdf

The second one should be the standard resource. As it is a PDF file, the canonical URL needs to be declared in the HTTP response header. The header refers to the PDF itself and tells Google that the PDF document serves as the standard for indexing.
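
How such a response header could be produced is sketched below as a minimal WSGI application in Python. This is an assumption-laden illustration: the names `CANONICAL_PDF`, `canonical_link_header` and `app` are hypothetical, the body is a placeholder instead of a real PDF, and on a production site the header would more commonly be set in the web server configuration.

```python
CANONICAL_PDF = "http://www.example.com/examplepage.pdf"


def canonical_link_header(url: str) -> tuple[str, str]:
    """Build the Link header that marks `url` as the canonical resource."""
    return ("Link", f'<{url}>; rel="canonical"')


def app(environ, start_response):
    """Minimal WSGI application that sends the header with every response."""
    body = b"placeholder for the PDF bytes"
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header(CANONICAL_PDF),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a local experiment, the application can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the Python standard library.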

Background

With the help of the Canonical Tag, website operators can tell search engines which of several pages with identical content should be handled as the standard resource. A properly used Canonical Tag is the first step toward getting duplicate content under control. Webmasters thereby influence the link popularity of pages with identical content and concentrate their reputation on one canonical URL.

Use cases

  • Canonical Tags and pagination: When paginating websites with rel="next" and rel="prev", each page should refer to itself via Canonical Tag, or no Canonical Tags should be used at all. The only exception: if there is a view-all page, the Canonical Tag of each paginated page can refer to this overview page.
  • Canonical Tags and hreflang: If a website works with hreflang, the URLs should either refer to themselves by Canonical Tag or should not use Canonical Tags at all. If the Canonical Tag points to a different language version, Google receives conflicting signals: the hreflang tag shows that another language version is available, while the Canonical Tag declares that version to be the original URL.
  • Canonical Tags and noindex: With the noindex tag, webmasters can tell Google that a URL should not be indexed. If a Canonical Tag refers to such a page, Google receives unclear signals: the page is selected as canonical, but it must not be indexed. Webmasters should therefore decide for each page whether to use noindex or the Canonical Tag.
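
The noindex conflict can be detected automatically once the canonical target and the noindex flag of each page are known. The following Python sketch assumes this data has already been collected, for example by a crawler. The `Page` record and the function `conflicting_canonicals` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    canonical: Optional[str] = None  # target of the Canonical Tag, if any
    noindex: bool = False            # True if the page carries a noindex tag


def conflicting_canonicals(pages: list[Page]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where the canonical target is noindex."""
    by_url = {page.url: page for page in pages}
    conflicts = []
    for page in pages:
        target = by_url.get(page.canonical) if page.canonical else None
        # A self-referencing noindex page is reported as well.
        if target is not None and target.noindex:
            conflicts.append((page.url, target.url))
    return conflicts
```

Canonical targets that are missing from the collected data are skipped here, so a complete check would also have to fetch those pages.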

Frequent errors

The Canonical Tag is a very powerful tool: if applied incorrectly, it can cause pages to be ignored completely by Google. First and foremost, the webmaster should verify that the content really is identical or almost identical, because only then does a Canonical Tag make sense.

Frequent errors are:

  • On pages paginated with rel="next" and rel="prev", a Canonical Tag is pointed at a different page of the series. This does not make sense because, technically speaking, the pages do not have identical content.
  • The canonical URL is not accessible. It must not return a 404 error code.
  • "noindex", "disallow" or "nofollow" directives are combined with canonical URLs. Google explicitly advises against this.
  • The Canonical Tag is placed in the document's body or appears more than once in the metadata.
  • A relative path is specified as the canonical link target. The Googlebot may misinterpret the tag, which then loses its effect. For this reason, the link in the Canonical Tag should always be specified as a complete URL.
  • The exact spelling of the URL is ignored. It makes a difference whether the Canonical Tag refers to https://seite.de/ or https://seite.de. Therefore, every character should be taken into account when specifying the URL. The same applies to the protocol: for example, the Canonical Tag should not refer from https to http.
  • The Canonical Tag of every page refers to the home page of the domain. In this case, only the home page is interpreted as a canonical URL. As a result, Google may index only the home page in the medium term.
  • The Canonical Tag of each paginated page refers to the first page of the series. The tag is set incorrectly because it indicates that the pages are duplicates. With pagination, neither the contents nor the URLs of the pages are the same. The pagination markup merely tells Google that the page is part of a series of pages in the same category.
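
Several of these errors can be caught by a simple check before a page goes live. The following Python sketch covers only the errors that can be detected from the URLs alone: multiple tags, relative paths, protocol downgrades, and canonical links to the home page. The function name `check_canonical` and the warning texts are hypothetical. Checking for 404 responses would require an actual HTTP request and is left out.

```python
from urllib.parse import urlsplit


def check_canonical(page_url: str, canonical_hrefs: list[str]) -> list[str]:
    """Return warnings for the canonical declarations found on a page."""
    if not canonical_hrefs:
        return []

    warnings = []
    if len(canonical_hrefs) > 1:
        warnings.append("more than one canonical tag")

    # Only the first declaration is inspected in this sketch.
    target = urlsplit(canonical_hrefs[0])
    page = urlsplit(page_url)

    # A complete URL needs both a scheme and a host.
    if not target.scheme or not target.netloc:
        warnings.append("canonical URL is not absolute")
        return warnings

    if page.scheme == "https" and target.scheme == "http":
        warnings.append("canonical points from https to http")

    # A sub page that declares the home page as canonical is suspicious.
    if target.path in ("", "/") and page.path not in ("", "/"):
        warnings.append("canonical points to the home page")

    return warnings
```

For example, `check_canonical("https://seite.de/produkt", ["/produkt"])` reports that the canonical URL is not absolute.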

Alternatives

The Google Search Console allows webmasters to specify how Google should handle the URL parameters of a website. This can cause the Googlebot to ignore certain URLs of a site.

Web Links