Are you struggling to control all the different URLs automatically created by faceted search on your site? Let’s fix that.
Faceted search problems can be the bane of your life if you work in SEO for an ecommerce or publishing website. The sheer number of possible URLs that can be generated by combining different filters is enough to make your head spin.
But what is faceted search, what causes the problems, and how can you fix them? Read on…
A faceted search, or faceted navigation, most prevalent in ecommerce sites, is a method of allowing website visitors to filter and sort results by a variety of attributes and refinements, collectively known as facets.
When presenting a large selection of products on a category page, it’s UX best practice to provide visitors with relevant facets to help them narrow down their search results and find the ideal product they have in mind.
Each of those filters can append additional parameters to the category page’s URL and generate its own unique page version. These filters can often be combined without limit, which means a domain of just 100 pages can end up producing thousands or even millions of indexable URLs.
For example, my t-shirt category page izzishop.com/products/tshirts/ has many facets available for the user to narrow down results, such as color, style, price, brand, whether there are cats on the shirt, size, material, and more. When these facets are applied, the base URL can turn into many different versions, each an indexable page in its own right, like:
izzishop.com/products/tshirts/?size=14
izzishop.com/products/tshirts/?price=0-20&contains-cats=yes&color=purple
izzishop.com/products/tshirts/?style=baseball&price=20-50&size=12
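To get a rough feel for the scale (with purely illustrative numbers): if that one category page offers six facets with five selectable values each, and every facet can either be left unset or set to one value, there are already 6^6 = 46,656 possible parameter combinations spawned from a single category URL. Allow those parameters to be appended in any order, and the count multiplies yet again.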
It can sometimes make sense to index these faceted navigation URLs, but only when there is significant search demand, potential value, and enough products to justify the page’s existence. There is worthwhile search demand for a lucrative long-tail query like “black leather shoes size 42”, so it’s important to make sure your domain has an optimized URL targeting that search term.
This is why it’s an intelligent strategy for us SEOs to properly analyze the performance of faceted navigation URLs, include and optimize the valuable pages, and cut out all the useless weight that might be bringing down the strength of our whole domain.
Despite increased usability for your website visitors and product browsers, faceted navigation has the potential to cause several critical issues for search engines.
One of Googler John Mueller’s most quoted points of 2018 was “crawl budget is overrated…”, which is (for the most part) true. Many SEO-savvy domains seem concerned about pruning their deadweight content for this reason, but in the grand scheme of the web there is usually not much cause for concern.
However, a domain with rogue faceted search URLs can still turn hundreds of pages into millions and cause several unwanted problems, such as:
Providing multiple versions of similar URLs with little difference or value is a huge flag for Google that you’re producing duplicate pages. This can result in a manual penalty that leads to dramatic traffic loss, or an algorithmic penalty that gradually punishes domains with low value and weak content over time.
This means that even if you have fantastic, high-quality pages that deserve to perform well, the increased share of duplicate URLs surrounding those can be detrimental to your entire website’s performance.
When multiple facets and their combinations create their own URLs, you risk harming the way Googlebot and other search engine bots efficiently crawl your site and prioritize the content they index.
The search engine bot may then limit crawling resources for your domain in future visits. This is especially problematic for sites that rely on their fresh articles or new inventory being quickly indexed and ranking.
We should always aim to implement a logical, robust internal linking structure that passes and shares link equity effectively. Faceted search can weaken that structure, as you end up spreading strength across a multitude of undeserving URLs. Not good.
A quick way of identifying whether your faceted search URLs are being indexed en masse is by using the simple site: search operator combined with the inurl: operator for common filter strings.
For example:
site:domain.com inurl:/filter?
site:domain.com inurl:price=
If presented with an unreasonable number of indexed URLs, as well as the questionable “supplemental index” result, you know there is an issue to rectify.
In the screenshot below, I’ve carried out a site search for Walmart’s domain with its price filter syntax that’s returned a whopping 14,000 results. Whether or not Walmart intended these URLs to be indexed, they should at least aim to make them unique and valuable in their own right, for example by adding the filter label to the page title.
Using site search operators is a quick way of finding if you have a faceted navigation problem, but in order to find and fix big problems, you’ll need a little help from a website crawler like Ryte’s.
Our platform uses a robust crawler to provide you with detailed insights and actionable reports, like the “Duplicate Content” report, which highlights all cases where two or more of a website’s URLs feature a high percentage of the same content and/or code. As we can see, some of Walmart’s product and category pages have hundreds of duplicate versions that are indexable.
Keyword cannibalization occurs when two or more pages from a domain are competing to rank well for a single query. This is a clear sign that Google can’t determine the most relevant page to serve highly in the SERPs, as there are conflicting signals and no clear winner. It’s also evidence of duplicate pages being indexed and generating impressions and potential clicks.
Using incredibly reliable Google Search Console data, our “Cannibalization Report” gives you a direct view of this SEO issue, where a query’s result yields a high number of your competing pages. Be sure to utilize this report in order to find highly critical situations where your faceted search pages are potentially harming the performance of your optimized pages.
Server log files give you insights into the specific search engine bots crawling your website, which user agents were detected, and which URLs are being accessed. Carrying out a log file analysis can show you whether faceted navigation URLs are being crawled, and whether this is a large-scale problem for you.
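If you do get hold of the raw access logs, even a rough script can reveal how much of Googlebot’s attention is going to parameterized URLs. The following is only a minimal sketch: the file name, the common/combined log format, and the assumption that faceted URLs always contain a “?” are illustrative rather than universal.

# Count Googlebot hits on parameterized (likely faceted) URLs in an access log
import re
from collections import Counter

hits = Counter()
with open("access.log", encoding="utf-8") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        # Pull the requested path out of a common/combined log format line
        match = re.search(r'"(?:GET|HEAD) (\S+) HTTP', line)
        if match and "?" in match.group(1):
            hits[match.group(1)] += 1

# The most-crawled faceted URLs are the strongest candidates for clean-up
for url, count in hits.most_common(20):
    print(count, url)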
However, log files are heavy and tricky to analyze, and that’s assuming you can even acquire them from your IT teams or webmasters in the first place. At Ryte we have a fancy report called “BotLogs” under Search Engine Optimization, which allows you to monitor search engine bot activity easily without needing to access and parse server log files.
Before carrying out the steps to clean up your faceted search URLs, make sure to check if there are pages that should stay findable and indexable so that you can reap valuable traffic and conversions via those relevant, searched-for queries.
To remain indexable, these pages should meet the following requirements:
Have a reasonable number of results. A search result or filtered-down category page that’s empty or returns only a few products remains thin. Ryte Tip: set up custom snippets in your crawl settings to extract the count of products per page, and then filter out any pages that fall below a meaningful threshold.
Have reasonable or significant search volume for the page’s focus topics, which can be determined through keyword research and/or by assessing the users already coming to the page.
Be unique in its own justifiable right. Remove any duplicate cross-combinations (e.g. size 42 + red vs. red + size 42) that may have fallen through the gaps, and make sure to additionally optimize these pages to help them perform, for example by providing descriptive titles.
Ideally, you will also come across lucrative, high-demand facet pages that deserve their own unique, well-performing category page! For example, if I saw during my analysis that “cat t-shirts under €20” was one of the most visited faceted URLs, I could create a dedicated landing page for it and promote it within the site’s hierarchy.
Providing links to the facets that are created client-side (via JavaScript), and then blocking the resources a search engine needs to generate those URLs within the robots.txt, means bots can neither access nor index the faceted URLs. Although it can be the trickiest to implement, this is the most efficient solution, as you won’t be sacrificing any crawling resources.
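As a minimal sketch (the file path is entirely hypothetical and depends on how your facet links are rendered), the robots.txt would simply block the script that builds those links client-side:

User-agent: *
# Hypothetical path to the script that renders the facet filter links
Disallow: /assets/js/facet-filters.js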
When building a site with faceted search, ensure that URLs generated by the facets contain the meta robots directive “noindex” by default. You can alternatively specify the noindex directive as an “X-Robots-Tag” within the HTTP response header.
Note: Refrain from using “disallow” directives within the robots.txt for these pages, as this will prevent the crawler from accessing the URL and registering the “noindex” directive. Also make sure to exclude noindex directives from any valuable faceted search URLs you wish to keep indexed.
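As a brief sketch of both options, using the hypothetical shop from earlier, the faceted URL’s HTML can carry the meta robots tag, or the server can send the equivalent header:

<!-- In the <head> of izzishop.com/products/tshirts/?size=14 -->
<meta name="robots" content="noindex, follow">

HTTP/1.1 200 OK
X-Robots-Tag: noindex, follow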
Specifying a single rel=canonical that points to the correct version can help search engine bots identify the URL that deserves to rank. However, this may not always be a reliable or strong enough signal on its own, so it works best when paired with the noindex directives. If some faceted search URLs are being linked to externally, or are receiving a decent amount of internal page equity, Google can choose to ignore the canonical you’ve provided.
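For illustration, again with the hypothetical shop from earlier, every filtered variation would point back to the overview category page:

<!-- On izzishop.com/products/tshirts/?price=0-20&contains-cats=yes&color=purple -->
<link rel="canonical" href="https://izzishop.com/products/tshirts/">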
Huge online fashion retailer Zalando has largely avoided faceted navigation problems by correctly using the robots directives to specify “noindex, follow” (follow to ensure proper link equity is still being passed across their pages), as well as the canonical to the overview category URL.
However, they still reference the faceted navigation version in their hreflang tags. Not only is it bad practice to reference a non-indexable URL within a translation alternate, but you’re also providing another accessible reference and signal to that page.
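To illustrate the cleaner alternative (domains and paths are purely hypothetical), hreflang alternates should reference the indexable overview category URLs rather than the parameterized facet versions:

<link rel="alternate" hreflang="en" href="https://izzishop.com/products/tshirts/">
<link rel="alternate" hreflang="de" href="https://izzishop.de/produkte/tshirts/">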
Nofollow links may have been used for link equity sculpting purposes, but this is not always a smart choice for larger domains with complex architecture. Nofollow is, first and foremost, a hint rather than a directive, meaning Google can choose to ignore it anyway, and it can also send a firm signal to Google that the target URLs are untrustworthy.
Naturally, we want Googlebot to pay no attention to our weak faceted navigation pages, but we still consider them to be valuable for internal website users.
As I previously mentioned, unruly faceted navigation can sometimes produce horrific combinations purely because of the order in which filters were applied. By enforcing a logic that controls the order in which facets append the URL string, you can further reduce the risk of cross-combinations spinning out additional variants.
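Here’s a minimal sketch of that kind of normalization (the parameter names and the fixed ordering are purely illustrative assumptions):

from urllib.parse import urlencode, urlsplit, parse_qsl, urlunsplit

# Illustrative fixed order in which facets should appear in the URL
FACET_ORDER = ["style", "color", "size", "price", "contains-cats"]

def normalize_facet_url(url: str) -> str:
    # Rebuild the query string so facets always appear in one canonical order.
    # Note: parameters outside FACET_ORDER are dropped in this simplified sketch.
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    ordered = [(key, params[key]) for key in FACET_ORDER if key in params]
    return urlunsplit(parts._replace(query=urlencode(ordered)))

# Both orderings collapse into one and the same URL:
print(normalize_facet_url("https://izzishop.com/products/tshirts/?size=12&color=red"))
print(normalize_facet_url("https://izzishop.com/products/tshirts/?color=red&size=12"))
# -> https://izzishop.com/products/tshirts/?color=red&size=12 (in both cases)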
Faceted navigation has the power to multiply a select number of meaningful, relevant URLs into an unwieldy mass of duplicates, which can lead to manual or algorithmic punishment from Google and a large decline in rankings and traffic across the board.
Make sure you’re adopting the correct methods to identify and clean up any irrelevant, indexable faceted navigation pages, so that your important pages are performing as awesomely as they should be.
Published on Mar 20, 2020 by Izzi Smith