Many websites resemble a URL jungle, and that costs rankings and traffic. You can prevent this by keeping an eye on your website's URL inventory and by identifying and removing unnecessary URLs.
The highest rankings, the best content, the strongest links – in the end, SEO is mostly about generating more traffic. Yet many websites today resemble a URL jungle more than a well-tended garden plot. Even if Aunt Google doesn't mind a little chaos, she still likes best those websites that keep things tidy.
How can I generate more traffic with fewer URLs? That is the problem we want to tackle today.
At first, "more traffic through fewer URLs" sounds counter-intuitive. If my website has fewer URLs, it can rank for fewer topics (some still say "keywords"). That would be true if every URL on the website offered the best content for a different topic. In practice, however, that is rarely the case. To put it more precisely: never.
Fewer URLs bring further advantages as well. To name just a few:
Fewer sources of error for duplicate content, index management, internationalization, etc.
Stronger focus of PageRank and user signals on the important URLs
More time for the important URLs of your website
The search engine bot also has more time and is less distracted (keyword: crawl budget)
Better overview with less administrative effort
A relaunch with 10,000 URLs instead of 1,000,000? Your admins will kiss your feet, and you will save yourself a lot of nerves and content work
Has a visit to your website so far felt like paddling a canoe into an unexplored region of the Amazon? To avoid this in the future, we first want to get an overview of the website's entire URL inventory and correct the biggest blunders. The goal: only one version of your website should be reachable online. Let's say:
https://www.mywebsite.com
Protocol duplicates: Does a call to http://mywebsite.com redirect correctly (via 301) to https://mywebsite.com, or vice versa?
Prefix duplicates: Does a call to www.mywebsite.com redirect correctly (via 301) to mywebsite.com, or vice versa?
Subdomains and subfolders: Are there subdomains or subfolders through which additional versions of the website can be reached?
Domains: Are there additional domains, and do they redirect correctly (via 301) to the actual website domain when called (e.g., www.alsomywebsite.com to www.mywebsite.com)?
First, you should have access to the server on which the website is hosted. This way, you can check which folders exist on the server and how they relate to the folders of the website.
Ideally, you also have access to the website's hosting interface. Here you can manage which domains and subdomains the website can be reached through and whether the correct 301 redirects are in place for every variant other than our https://www.mywebsite.com. Don't forget to check the settings for the protocol (HTTP or HTTPS) and the prefix (www or not) for each domain/subdomain.
Figure 1: You can control how domains behave in the website hosting interface
If you work for a larger company or client, IT often cannot or will not grant access to these sensitive systems. In that case, all you can do is check the website from the outside.
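If you can only test from the outside, a short script can do the checking for you. The following is a minimal sketch in Python (using the third-party requests library) that tests whether the most common duplicate variants return a 301 to the main version; the domain and the list of variants are placeholders you would replace with your own.

```python
# Minimal sketch: check whether common duplicate variants 301-redirect
# to the main version. Requires the third-party "requests" package.
import requests

MAIN = "https://www.mywebsite.com/"   # placeholder main version
VARIANTS = [
    "http://www.mywebsite.com/",      # wrong protocol
    "https://mywebsite.com/",         # missing www prefix
    "http://mywebsite.com/",          # both wrong
    # add further domains/subdomains you suspect here
]

for url in VARIANTS:
    try:
        r = requests.get(url, allow_redirects=False, timeout=10)
    except requests.RequestException as exc:
        print(f"{url} -> request failed ({exc})")
        continue
    location = r.headers.get("Location", "")
    if r.status_code == 301 and location.rstrip("/") == MAIN.rstrip("/"):
        print(f"OK     {url} -> 301 to main version")
    else:
        print(f"CHECK  {url} -> {r.status_code} {location or '(no redirect)'}")
```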
It is therefore worth taking a look at your website tracking. Since we are looking for self-inflicted website duplicates that are 100% identical to our main version, the same tracking pixels are installed on each of them – and we can take advantage of that.
It is particularly simple if you use Google Analytics. Set up the following filter, which shows you not only the URL path but also the hostname. Then let Google Analytics track website traffic for a month (as usual in Google Analytics, the filter only applies to new data, not to data collected previously).
Figure 2: With this filter, Google Analytics shows you the hostnames of your URLs
Once Google Analytics has collected new data, the report Behavior -> Site Content -> All Pages will show you whether the tracking was triggered on URLs that do not match your domain with the correct protocol and prefix.
In addition, it is useful to send an external crawler over your website – Ryte, for example, offers this. Configure the crawler so that it also crawls subdomains, then check the results for large numbers of URLs with the wrong prefix or protocol, or URLs that come from another domain or subfolder.
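If your crawler only gives you a flat list of URLs, a small script can do the grouping. The sketch below assumes a plain text export with one URL per line (crawler export formats vary) and counts URLs whose protocol or hostname does not match the main version:

```python
# Minimal sketch: scan a crawler export (assumed here to be a plain text
# file with one URL per line) for URLs whose protocol or hostname does
# not match the main version.
from collections import Counter
from urllib.parse import urlparse

MAIN_SCHEME, MAIN_HOST = "https", "www.mywebsite.com"  # placeholders

offenders = Counter()
with open("crawl_export.txt", encoding="utf-8") as fh:   # assumed file name
    for line in fh:
        url = line.strip()
        if not url:
            continue
        parts = urlparse(url)
        if parts.scheme != MAIN_SCHEME or parts.netloc != MAIN_HOST:
            offenders[f"{parts.scheme}://{parts.netloc}"] += 1

for origin, count in offenders.most_common():
    print(f"{count:6d} URLs from unexpected origin {origin}")
```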
It is also worth taking a look at the external links. Do you find domains there that are strikingly similar to the main version? Check anything suspicious carefully to find out whether it is actually a duplicate.
Pro tip: It doesn't hurt to ask former and current agencies and service providers whether website duplicates are still lying dormant on their servers. Occasionally you come across providers who think it is a good idea to put 100% duplicates of their customers' websites online. Such agencies apparently think "Seo" is a Korean woman's name rather than a term from online marketing...
Now that we have drawn the boundaries of our website and removed everything that crosses them, we can turn to the website itself. The first task is to check all areas of the website and remove everything that
does not serve the company's goals, or
hinders you from ranking as well as possible in search engines.
URLs that serve the company's goals are assets. URLs that accomplish nothing are ballast and keep your website from being visible, user-friendly, and search-engine-optimized.
The most obvious candidates are what I call "service URLs" (imprint, privacy policy, terms and conditions, shipping information, etc.). These URLs are legally required to operate a website (or online shop), but beyond that they do not do much for us. Since we cannot remove them, we set the robots meta tag to noindex and stop thinking about them.
Internal search result pages are another recurring candidate that needlessly bloats the search engine index. We cannot remove these either (they are generated dynamically), so here too we set the robots meta tag to noindex.
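To make sure the noindex is actually in place on those pages, a quick check can help. This is a minimal sketch; the URL list is a placeholder, and the simple regular expression stands in for a proper HTML parser:

```python
# Minimal sketch: verify that "service URLs" actually carry a robots
# noindex meta tag. Requires the third-party "requests" package.
import re
import requests

SERVICE_URLS = [
    "https://www.mywebsite.com/imprint",          # placeholders
    "https://www.mywebsite.com/privacy-policy",
    "https://www.mywebsite.com/terms",
]

NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

for url in SERVICE_URLS:
    html = requests.get(url, timeout=10).text
    status = "noindex set" if NOINDEX_RE.search(html) else "MISSING noindex"
    print(f"{status:16s} {url}")
```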
Filters in online shops also tend to produce many unnecessary URLs when each filter is implemented as a URL with appended filter parameters. To prevent this, filters are ideally implemented with JavaScript and sort products or similar items directly on the page without reloading it or changing the URL. If that is not possible (no IT budget available), a canonical tag can at least ensure that the filter pages do not end up in the index.
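How such a canonical could be derived is sketched below: the canonical simply points to the parameter-free base URL of the filter page. The function name is made up for this example, and stripping every query parameter is a simplification – in a real shop you would decide per parameter whether it belongs in the canonical.

```python
# Minimal sketch: build a canonical link tag for a filter URL by
# stripping the query parameters. Treating *all* parameters as
# irrelevant is a simplifying assumption for this example.
from urllib.parse import urlsplit, urlunsplit

def canonical_tag(url: str) -> str:
    """Return a <link rel="canonical"> tag pointing to the URL without parameters."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    base = urlunsplit((scheme, netloc, path, "", ""))
    return f'<link rel="canonical" href="{base}">'

print(canonical_tag("https://www.mywebsite.com/shoes?color=red&size=42"))
# -> <link rel="canonical" href="https://www.mywebsite.com/shoes">
```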
Do you use WordPress? Then click on one of your images and check whether it opens under its own URL. If so, our condolences: you have [number of images] too many URLs on your website. Unfortunately, this is WordPress's default setting. If you use MySQL, you can fix it for the entire website directly in the database; if not, there is nothing left to do but correct it manually for each image.
Figure 3: "None" is not the setup we're looking for.
At least WordPress remembers the last choice in this menu: if you have chosen "no URL" once, this setting is pre-selected for the next image as well.
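For the default itself, WordPress stores the image link behaviour in the option image_default_link_type in the wp_options table. The following is a hedged sketch of how this could be set to "none" directly in MySQL; it only affects images inserted from now on, not links that already sit in old posts, and it assumes the standard wp_ table prefix.

```python
# Hedged sketch: set WordPress's default image link behaviour to "none"
# via MySQL, so newly inserted images no longer link to their own page.
# Assumes the standard "wp_" table prefix and the third-party
# mysql-connector-python package; existing image links in old posts are
# NOT changed by this.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="wp_user", password="secret", database="wordpress"
)  # placeholder credentials
cur = conn.cursor()
cur.execute(
    "UPDATE wp_options SET option_value = 'none' "
    "WHERE option_name = 'image_default_link_type'"
)
conn.commit()
cur.close()
conn.close()
```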
The website "features" – as you regularly find them on websites with long histories – are exciting. A guestbook from the beginning of the millennium, a dead (marked as spam) forum, or a blog with a few poor articles from a content project begun in the short term and then forgotten (naturally with commentary spam). For such areas of a website, the decision to use the ax is simple. These URLs are stones around the necks of the website. Don't think too long about optimization. This costs additional time. Instead: Get on with it!
(Do you need supporting arguments for your boss? For website tracking, look at the page views in this area. This will help you to establish that this is really just dead weight.)
After even the obviously dead areas of the website are removed, the next step is that of the individual URLs. First, likewise check for duplicates.
Are products in your online shop assigned to several categories, each of which creates its own URL? Does your article have many tags, and does your CMS put each tag into the URL, making it unique? There are many reasons why the same content can be reached via several URLs on your website. It is almost always unnecessary, and it puts a needless burden on your website.
This is where reliable website crawlers come in. Check your content in Website Success: it will reliably show you possible duplicates on your website. I also check the word count of the URLs – an identical word count is a strong signal for a duplicate.
Clear patterns can often be spotted quickly. Products that are placed in several categories, each of which produces its own URL, are the classic case in 95% of all online shops.
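If you want to reproduce the word-count signal yourself, a rough sketch like the one below fetches a handful of URLs, strips the markup crudely, and groups URLs with identical word counts. The URL list is a placeholder, and this is no substitute for a real crawler:

```python
# Rough sketch: group URLs by word count to surface possible duplicates.
# The tag stripping is deliberately crude; a real check would use a
# proper HTML parser or a crawler's own word counts.
import re
from collections import defaultdict
import requests

URLS = [
    "https://www.mywebsite.com/category-a/product-1",  # placeholders
    "https://www.mywebsite.com/category-b/product-1",
    "https://www.mywebsite.com/category-c/product-1",
]

def word_count(url: str) -> int:
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", text)           # drop remaining tags
    return len(text.split())

by_count = defaultdict(list)
for url in URLS:
    by_count[word_count(url)].append(url)

for count, urls in by_count.items():
    if len(urls) > 1:
        print(f"{count} words – possible duplicates:")
        for url in urls:
            print("   ", url)
```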
There are several ways to deal with duplicates: remove them or set a canonical tag. The robots meta tag noindex or blocking crawling via robots.txt are not options – you will see why below.
Ideally, you remove the cause of the duplicates, because URLs that no longer exist can no longer cause problems (in the long term). This does not always work without a developer: if the shop or CMS is designed to generate individual URLs, your IT first has to teach the software how to do without them.
This solution is by far the most sustainable and most reliable way to avoid further duplicates in the future. No trainee can forget your cross-departmental, long-rehearsed duplicate-avoidance protocol. No one, in their zeal to promote your premium items, can inadvertently publish the same content on a thousand of your URLs instead of on one (bye-bye, high rankings and traffic boost). And no one has to click through the website at regular intervals in search of duplicates.
When you remove duplicates, remember to set up 301 redirects for any URLs that still receive a noteworthy number of page views.
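One way to keep those redirects manageable is to maintain a simple old-to-new mapping and generate the rules from it. The sketch below emits Apache-style Redirect 301 lines from a hypothetical CSV file; the file name and column layout are assumptions, and the equivalent rules look different in nginx or in your shop system.

```python
# Sketch: generate Apache-style "Redirect 301" rules from a CSV mapping
# of removed duplicate paths to their remaining counterparts.
# File name and column layout (old_path,new_url) are assumptions.
import csv

with open("redirects.csv", newline="", encoding="utf-8") as fh:
    for old_path, new_url in csv.reader(fh):
        print(f"Redirect 301 {old_path} {new_url}")

# Example row in redirects.csv:
#   /category-b/product-1,https://www.mywebsite.com/category-a/product-1
```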
If removal alone is not possible, you can fall back on the canonical tag. By implementing a canonical, you tell search engines that this URL is only a copy and that they should include the canonical version in their index instead.
Unfortunately, Google treats the canonical only as a hint, not as a strict directive, so it regularly happens that not only your canonical version but also the duplicate appears in the search results.
The positive effect (at least it should be positive) is that all PageRank and all user signals of the duplicate are attributed to the canonical version and thus strengthen it.
Setting the robots meta tag to noindex is not a real alternative.
Advantage: Search engines treat noindex as a strict directive, which means a URL with noindex will quite reliably not appear in the search results.
Disadvantage: With noindex there is no focusing effect as with the canonical (where all PageRank and user signals of the duplicate are attributed to the canonical version). The PageRank is passed on via the website's internal linking, but unfortunately without focus; the user signals are lost completely.
The last option, blocking crawling via robots.txt, is likewise not the best idea.
Advantage: It saves the crawl budget.
Disadvantage: Google doesn't like crawl blocks, because then it doesn't know what is hiding behind the URL and cannot guarantee its users that a visit is safe. In the worst case, this leads to a devaluation of the rest of the website. Moreover, a crawl block is not an indexing block: a URL that may not be crawled can still be indexed, but only with the minimal information Google gathers from the internal and external links pointing to it. That doesn't look good in the SERPs and ranks accordingly.
Often it is not entire areas of a website that perform badly; usually a few URLs drag down the performance of a whole section. URLs without content, products that are never bought, or articles that are never read are examples of this. Here, too, it can pay to sort them out so that users and search engines find only high-quality content and reward it with good signals and better rankings.
In order to find out which URLs are received particularly well or particularly poorly by users and search engines, a look into website tracking and into the Google Search Console can be helpful.
Please note: especially on large websites, not all URLs have the same job. It therefore makes sense to compare a URL's performance with that of other URLs with the same job rather than with the averages of the entire website. Google Analytics lets you cluster URLs into areas using content groupings.
Figure 4: Evaluation of content groupings in Google Analytics
The following performance indicators are suitable for evaluating the performance of URLs:
Number and value of conversions (product URLs)
Number and value of goal completions
Site visits
Time on page
Bounce rate
Entrances and exits
Number and relevance of rankings
Ranking position
Impressions
The more of these key figures a URL performs poorly on, the clearer the case for deleting it. Of course, you can also set a noindex with the firm intention of optimizing the URL later. In our experience, however, "I'll optimize that later" is the most common way of giving away search engine traffic for a long time. Better to do it right away.
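To make that call less subjective, you can count per URL how many of these key figures fall below a threshold you define. The sketch below works on a hypothetical CSV export – the file name, column names, and thresholds are assumptions – and flags URLs that are weak on several figures at once.

```python
# Sketch: flag URLs that perform poorly on several key figures at once.
# The CSV file name, column names, and thresholds are assumptions and
# would come from your own tracking / Search Console exports.
import csv

THRESHOLDS = {            # "poor" means failing these assumed limits
    "pageviews":   lambda v: float(v) < 50,
    "conversions": lambda v: float(v) == 0,
    "bounce_rate": lambda v: float(v) > 0.85,
    "impressions": lambda v: float(v) < 100,
}

with open("url_kpis.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        failed = [name for name, is_poor in THRESHOLDS.items() if is_poor(row[name])]
        if len(failed) >= 3:   # poor on most key figures -> removal candidate
            print(f"Candidate for removal: {row['url']} (weak on: {', '.join(failed)})")
```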
It is also necessary to be (self-)critical here, even if it hurts. Ever since "content is king", websites have suffered from carelessly produced content. Be honest with yourself and your company: would you really want to read that yourself? Do you really want your company to be known for poor content? Of course not!
When you show your boss the visit and conversion numbers in three months, he will thank you for having the courage to tell him to his face: "This is hurting us, it has to go!"
We have worked through your website from the big picture down to the small details and removed the sources of unnecessary URLs. In practice, this can lead to dramatic changes in the number of available URLs.
Example: After a complete analysis of a client's website (a supplier with a small online shop of 50 products), we found 180,000 URLs, of which Google had already indexed 3,000. After our optimization was finished, only 2,420 URLs remained, 1,490 of which had already been indexed by Google. Since then, the website has enjoyed noticeably higher rankings and traffic.
We are currently working for a client whose online shop turned out to have over 3,000,000 URLs (fewer than 5,000 of them indexed). After the optimization, approximately 100,000 URLs will remain. We look forward to seeing the optimizations take effect in Google.
Published on Jul 27, 2017 by Eico Schweins