Canonical Tag


A canonical tag, or canonical link element, indicates to search engines that a master copy of a page exists. It is therefore particularly useful in preventing problems with duplicate content. If the same or similar content exists on multiple URLs, the canonical tag points to the most important page so that Google knows which one to index.

Canonical tag definition

The canonical tag is an HTML link element placed in the head section of a page's source code. It refers to a standard resource - the canonical URL - for pages with the same or similar content. If a canonical URL is correctly marked, search engines will generally index only this source, so duplicate content issues can be avoided. Search engines rate duplicate content negatively because it offers no added value for the user. A duplicate content checker can be used to detect duplicate content.

The canonical tag should be used when content exists on more than one URL. Sometimes this cannot be avoided, for the following reasons:

  • The homepage can be reached from different URLs (for example www.domain.com, domain.com, www.domain.com/index.html and so on).
  • Pages can be reached with and without trailing slashes (“/”) and with different capitalization
  • Because of URL rewriting, the server only pays attention to one ID in the address and accepts variations of the rest
  • IDs (such as session IDs or product filters) are used that don’t change the content
  • Content is presented in different versions (e.g. print version, PDF etc.)
  • There are HTTPS variants of the site
  • The URL is still available under an HTTP version without SSL encryption
  • The content is additionally published on other, external websites
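
Several of the variants listed above can be collapsed onto one preferred URL, which is then the value to put into the canonical tag. The following is a minimal sketch, not a complete normalizer. The preferred host, the ignored parameter names, the index file names, and the decision to lowercase paths are all assumptions made for illustration; lowercasing in particular is only safe if the server treats paths case-insensitively.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical policy choices for this sketch; a real site defines its own.
PREFERRED_HOST = "www.domain.com"
KNOWN_HOSTS = {"domain.com", "www.domain.com"}
IGNORED_PARAMS = {"session_id", "sid"}
INDEX_FILES = {"index.html", "index.htm", "index.php"}


def canonical_url(url: str) -> str:
    """Map an absolute URL variant onto one preferred form."""
    parts = urlsplit(url)

    # www and non-www variants collapse onto the preferred host.
    host = parts.netloc.lower()
    if host in KNOWN_HOSTS:
        host = PREFERRED_HOST

    # Assumption: paths are case-insensitive on this server.
    path = parts.path.lower() or "/"

    # Drop a default index file so /index.html and / become the same URL.
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        path = head + "/"

    # Remove a trailing slash everywhere except on the root.
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"

    # Strip parameters that do not change the content, such as session IDs.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]

    # Always emit the HTTPS variant and discard any fragment.
    return urlunsplit(("https", host, path, urlencode(kept), ""))
```

For example, `canonical_url("http://domain.com/index.html")` and `canonical_url("https://www.domain.com/")` both yield the same string under these assumptions.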


Examples of canonical URLs

In general, when indicating a canonical URL, Google recommends absolute URLs, i.e. the entire URL including the protocol.

The following two URLs have the same content.

https://www.example.com/example.htm
https://www.example.com/example.htm?session_id=xyz

The first one is the standard resource. The second one carries a session ID, as commonly used by online shops to store user-related data such as the items in the shopping cart. As the first URL is more important, it should become the canonical version, and the canonical tag should be integrated into the head element of the second page to refer to the first page. This indicates to Google and other search engines that the first URL is the one that should be crawled, indexed and shown in the SERPs.

The canonical tag is placed in the head of the second page. It looks like this:

<link rel="canonical" href="https://www.example.com/example.htm" />
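
Whether a page actually carries such a tag in its head can be checked with a short script. The sketch below uses only Python's standard `html.parser` module; the class and function names are illustrative, and the page source is assumed to be available as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            # rel may hold several space-separated tokens.
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html: str) -> list[str]:
    """Return every canonical href found in the head of the document."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

A result with exactly one entry is the expected case. An empty list means no canonical tag was found in the head, and more than one entry means the tag is declared repeatedly.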

When should canonical tags be used?

  • Canonical tags and pagination: when paginating a website with rel="next" and rel="prev", each page should refer to itself via its canonical tag, or there should be a "view-all" page on which all products are visible in one overview. Ideally, pages marked up with rel="next" and rel="prev" do not use canonical tags at all. Instead, add a robots meta tag to the head of each paginated page from the second page onward to exclude these subpages from indexing.
  • Canonical tags and hreflang: if a website uses hreflang, each URL should either refer to itself with a canonical tag or not use a canonical tag at all. If the canonical tag points to a different language version, Google receives conflicting signals: the hreflang tag shows that another language version is available, while the canonical tag declares that other version the original URL.
  • Canonical tags and noindex: with the noindex tag, webmasters can tell Google that a URL should not be indexed. If a canonical tag refers to this page, Google receives unclear signals, as a canonical URL is the page a webmaster wants to be indexed. Webmasters should therefore decide between a canonical tag and a noindex tag.
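
The hreflang and noindex rules above can be expressed as a small check. This is a sketch under stated assumptions: the `Page` record and its field names are hypothetical, and the values are assumed to have been extracted from the page beforehand.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """Hypothetical record of the signals found on one page."""
    url: str
    canonical: Optional[str] = None   # href of the canonical tag, if any
    robots: str = ""                  # content of the robots meta tag
    hreflang: dict = field(default_factory=dict)  # language code -> URL


def signal_conflicts(page: Page) -> list:
    """Return descriptions of conflicting canonical signals on a page."""
    directives = {d.strip().lower() for d in page.robots.split(",") if d.strip()}
    problems = []

    # A page should carry either noindex or a canonical tag, not both.
    if page.canonical and "noindex" in directives:
        problems.append("noindex combined with a canonical tag")

    # With hreflang, the canonical tag must be self-referential or absent.
    if page.hreflang and page.canonical and page.canonical != page.url:
        problems.append("hreflang page canonicalizes to a different URL")

    return problems
```

The check compares URLs as plain strings, so both sides should be normalized in the same way before they are passed in.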

Frequent errors

Canonical tags are powerful. If applied incorrectly, websites or certain pages of a website may be completely ignored by Google, which could be a disaster for traffic and sales. Before implementing a canonical tag pointing to another page, a webmaster should first decide whether the content is in fact the same.

Frequent errors are:

  • Using canonical tags when pages are paginated / numbered with rel="next" and rel="prev": canonical tags don’t make sense in this case, as pagination is not about duplicate content.
  • The canonical version returns a 404 status code: a page that is referred to with a canonical tag must be available, otherwise the canonical tag points to an error page.
  • “noindex”, “disallow” or “nofollow” directives are combined with canonical URLs. This combination is explicitly discouraged.[1]
  • The canonical tag is placed in the document’s body or appears more than once in the metadata. It belongs in the head and may only be used once per page.
  • A relative path is specified as the canonical link target. This may cause the Googlebot to misinterpret the tag, which then loses its effect. For this reason, the link should always be specified as a complete URL in the canonical tag.
  • The syntax is ignored. It makes a difference whether the canonical tag refers to https://page.com/ or https://page.com. Therefore, all characters should always be taken into account when specifying the URL. The same applies to the protocol. For example, the canonical tag should not refer from HTTPS to the HTTP protocol. In January 2017, Google stated that the use of a secure HTTPS connection would become an important ranking factor for websites. Since then, Google has preferred HTTPS pages as canonical URLs.[2] The canonical tag should therefore point from the HTTP page to the HTTPS page, not vice versa.
  • The canonical tags of subpages, for example paginated pages, refer to the homepage of the website. This is incorrect because it declares these pages to be duplicates of the homepage. With pagination, the contents of the pages and the URLs are not the same. Google should merely be informed that the relevant paginated page is part of a series of pages in the same category.
  • Canonical chains or cross-references: incorrect use of the canonical tag results in canonical chains (page A points to B, which in turn points to C) or cross-references (A and B point to each other). Target pages of a canonical link should not refer to other canonicals.
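
Chains, cross-references and relative targets can be detected once the canonical target of every crawled page is known. The sketch below assumes a hypothetical dictionary `canonical_of` that maps each page URL to the href of its canonical tag; how that dictionary is collected is outside the scope of the example.

```python
from urllib.parse import urlsplit


def is_absolute(href: str) -> bool:
    """True if the href contains both a protocol and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


def resolve_canonical(url: str, canonical_of: dict) -> tuple:
    """Follow canonical references starting at url.

    Returns the last URL reached and a list of problems found on the way.
    """
    problems = []
    seen = [url]
    current = url
    # A page that points to itself ends the walk.
    while current in canonical_of and canonical_of[current] != current:
        target = canonical_of[current]
        if not is_absolute(target):
            problems.append(f"relative canonical target: {target}")
            break
        if target in seen:
            problems.append("canonical loop: " + " -> ".join(seen + [target]))
            break
        seen.append(target)
        current = target
    # More than one hop means the first target refers to yet another canonical.
    if len(seen) > 2:
        problems.append("canonical chain: " + " -> ".join(seen))
    return current, problems
```

An empty problem list together with at most one hop is the desired outcome: every duplicate points directly at a canonical page that refers only to itself.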

Alternatives

With the Google Search Console, webmasters can specify how Google should handle the URL parameters of a website. This can cause the Googlebot to ignore certain URL variants of a page.

References

  1. Mueller (Google) regarding the combination of noindex and canonical. reddit.com. Accessed on November 28, 2018.
  2. General Guidelines for All Canonicalization Methods. support.google.com. Accessed on November 28, 2018.
