10 Steps for Checking Your Website’s Indexability

Marketers need indexable websites: only pages that search engines can index attract search traffic and, in turn, conversions. This article shows you how to check that your website is indexable.

An important prerequisite of success for your online business is that your website has to be visible to users in the SERPs, i.e. it must be indexable by Google. There are many ways to check whether your website is indexable or not. Ryte can help – you can use Ryte as a step-by-step guide to find any factors that are preventing your website from being indexed. Once you’ve checked these steps and made any necessary corrections, nothing will stand in the way of your website being successfully indexed, leading to increased traffic and conversions.

Step 1: Check Your Pages for Noindex Tags

This is a mistake that can happen even to the most experienced SEOs: you may have accidentally inserted the meta tag “noindex, follow” on your subpages, or forgotten to remove it. This tag ensures that a URL will not be indexed by search engines, and it is inserted into the <head> area of a web page as follows:

<meta name="robots" content="noindex, follow">

This tag can be a useful way to avoid duplicate content, and can also be used for example before a domain transfer, to test the website before the actual launch. (Although when your site then goes live, the Noindex tag should of course be removed.)
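A quick way to spot a forgotten noindex tag is to scan a page's HTML for the robots meta tag. The following is a minimal sketch using only the Python standard library; the class name `RobotsMetaParser`, the function `has_noindex` and the sample page are illustrative assumptions, not part of any Ryte tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample page that still carries the tag.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```

In practice you would feed the function the downloaded HTML of each URL you expect to be indexed and list those for which it returns True.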

Under Search Engine Optimisation in the report “Indexability”, you can check with a few clicks which pages are indexable.

Figure 1: Check your indexability with Ryte

Step 2: Check your Robots.txt File

Using the robots.txt file, you can actively control the crawling of your website by giving the Googlebot specific instructions as to which directories and URLs it may crawl.

When configuring the file, however, you might have accidentally excluded important directories from crawling, or blocked entire pages. This doesn’t directly prevent your URLs from being indexed, because the Googlebot might find, crawl and index them via backlinks from other websites. However, with a faulty robots.txt file, the Googlebot won’t be able to search all areas of your website sufficiently when crawling regularly. Read this article to find out further mistakes that can be made when configuring the robots.txt file.
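To see whether important URLs are accidentally excluded from crawling, you can test them against your robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the URLs and the helper name `blocked_urls` are made-up examples. Note that this parser may interpret edge cases (such as wildcards or overlapping Allow and Disallow rules) differently from the Googlebot, so treat the result as a first indication only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
Disallow: /checkout/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt excludes from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical list of URLs that should be crawlable.
important = [
    "https://www.example.com/",
    "https://www.example.com/products/shoes.html",
    "https://www.example.com/internal/report.html",
]
print(blocked_urls(ROBOTS_TXT, important))
# ['https://www.example.com/internal/report.html']
```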

You should check your robots.txt for errors especially after making changes. Ryte can assist you here: click on the report “Robots.txt monitoring” under Search Engine Optimisation. Ryte will then provide you with a list of all URLs that are excluded from crawling. With Ryte, you can also monitor your robots.txt file to keep track of any changes.

Figure 2: Check your robots.txt with Ryte

Step 3: Check your .htaccess File for Errors

Your .htaccess file may also prevent your pages from appearing in the search results, for example if its rules treat crawling as unauthorized access. The .htaccess file is a control file stored in a directory of an Apache server.

Among other things, website operators use these for the following actions:

Rewriting a URL
Redirecting an old URL to a new URL
Redirecting to the www version of a page

Concrete rules can be defined in the .htaccess file. However, for the server to execute these rules, the file must always be named exactly “.htaccess”. The following directives are used in these cases:

Redirecting or rewriting URLs:

RewriteEngine On

Rewriting with relative paths also uses a base path:

RewriteBase /

Define the rule that the server is to execute:

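# Enable the rewrite engine and set the base path for relative targets
RewriteEngine On
RewriteBase /
# Permanently redirect seitea.html to seiteb.html
RewriteRule ^seitea\.html$ seiteb.html [R=301,L]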

Of course, it’s possible that the file may have been named incorrectly and therefore cannot rewrite or redirect URLs. As a result, both users and search engines will not be able to access the pages, and they are therefore not crawled or indexed.

Step 4: Test your Canonical tags

A Canonical tag helps Google to find the original URL for multiple URLs with the same content, so that the correct URL can be indexed. The Canonical tag is an HTML tag with a link to the original page, the “canonical” URL.

When setting Canonical tags, numerous errors can occur that cause problems with indexing.

The Canonical tag refers to a relative path
The Canonical tag refers to a URL that carries a Noindex tag
A paginated page refers to the first page of the pagination by Canonical tag
The Canonical tag refers to a URL without a trailing slash
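The first two errors in this list can be detected automatically. The following sketch parses the <head> of each page with the Python standard library and flags canonical tags that use a relative path or point to a page carrying a noindex tag. The names `HeadParser`, `parse_head` and `canonical_issues` and the sample pages are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HeadParser(HTMLParser):
    """Collects canonical link targets and robots meta directives."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link":
            rel = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.canonicals.append(attributes.get("href") or "")
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = (attributes.get("content") or "").lower()
            self.robots.update(part.strip() for part in content.split(","))


def parse_head(html):
    parser = HeadParser()
    parser.feed(html)
    return parser


def canonical_issues(url, pages):
    """Return a list of problems with the canonical tag(s) of `url`.

    `pages` maps absolute URLs to their HTML source (already downloaded).
    """
    issues = []
    for target in parse_head(pages[url]).canonicals:
        if not urlparse(target).scheme:
            issues.append(f"relative canonical: {target}")
        elif target in pages and "noindex" in parse_head(pages[target]).robots:
            issues.append(f"canonical target is noindex: {target}")
    return issues


# Hypothetical crawl result.
pages = {
    "https://www.example.com/a?sort=price": '<head><link rel="canonical" href="/a"></head>',
    "https://www.example.com/b?ref=1": '<head><link rel="canonical" href="https://www.example.com/b"></head>',
    "https://www.example.com/b": '<head><meta name="robots" content="noindex, follow"></head>',
}
for page_url in pages:
    print(page_url, canonical_issues(page_url, pages))
```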

How to check your Canonical tags with Ryte:

Ryte has its own report for Canonical tags under Search Engine Optimisation. In the report “Canonical status codes”, you will quickly get an overview of possible problems with your implemented canonical tags.

Figure 3: Check canonical tags with Ryte

Step 5: Monitor Your Server Availability and Status Error Messages

Another reason why a website or URL can’t be indexed could be because of a server failure. This makes it technically impossible to access a page.

Servers also play an important role for search engine optimization for many reasons. For good rankings, you need a fast and efficient server. If it’s slow, there will be delays in the loading time of your website which users don’t like, resulting in a high bounce rate and low average time on page. Google classifies these KPIs as being negative for the user experience, which of course has a negative effect on SEO.

Under Quality Assurance or Web Performance, you can regularly check your server; the feature “Server monitoring” keeps you informed about failures and time-outs so that you can act quickly.

Figure 4: Server monitoring with Ryte

Tip: Check the HTTP status codes of your site regularly to see if 301 redirects work correctly or if 404 status codes exist. Pages that return a 404 cannot be reached by potential readers or web crawlers. Links referring to such pages are called “dead links”.
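The logic of such a status code check can be sketched as follows. To keep the example self-contained, the HTTP responses are simulated by a dictionary instead of real requests; the function name `follow_redirects`, the verdict strings and the sample URLs are assumptions for illustration.

```python
def follow_redirects(url, responses, max_hops=5):
    """Return (final_url, verdict) for `url`.

    `responses` maps each URL to a (status_code, redirect_target) pair,
    standing in for real HTTP responses. Unknown URLs count as 404.
    """
    visited = []
    while url not in visited and len(visited) <= max_hops:
        visited.append(url)
        status, target = responses.get(url, (404, None))
        if status in (301, 302, 307, 308) and target:
            url = target  # follow the redirect
            continue
        if status == 200:
            return url, "ok"
        if status == 404:
            return url, "dead link"
        return url, f"error {status}"
    return url, "redirect loop or chain too long"


# Hypothetical responses of a small site.
responses = {
    "/old-page": (301, "/new-page"),
    "/new-page": (200, None),
    "/moved": (301, "/gone"),
    "/loop-a": (301, "/loop-b"),
    "/loop-b": (301, "/loop-a"),
}
print(follow_redirects("/old-page", responses))  # ('/new-page', 'ok')
print(follow_redirects("/moved", responses))     # ('/gone', 'dead link')
```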

Step 6: Find Orphaned Pages

When you restructure your website, or add new categories, these new pages might not be linked internally. Furthermore, if these new URLs are not listed in the sitemap.xml and are not linked from external sources, there is a high risk that these pages will not be indexed. Therefore, try to avoid orphaned pages at all costs.

Figure 5: Find pages without incoming links

Ryte Website Success shows you orphaned pages quickly. To do this, click on the “Pages without inbound links” report in the “Links” section.
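If you have a list of your internal links, for example from a crawl, orphaned pages can also be found with a few lines of code. This is a minimal sketch under the assumption that the link structure is available as a dictionary; `find_orphans` and the sample data are hypothetical. For each orphaned page it also reports whether the page is at least listed in the sitemap, since pages that are neither linked nor listed carry the highest risk.

```python
def find_orphans(links, start_page, sitemap_urls=()):
    """Return {orphaned_page: listed_in_sitemap}.

    `links` maps each crawled page to the internal pages it links to.
    A page is orphaned if no other page links to it; the start page is
    excluded because it is the crawl entry point.
    """
    sitemap = set(sitemap_urls)
    linked = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # ignore self-links
    }
    known = set(links) | sitemap
    orphans = known - linked - {start_page}
    return {page: page in sitemap for page in sorted(orphans)}


# Hypothetical link structure and sitemap.
links = {
    "/": ["/shop", "/blog"],
    "/shop": ["/"],
    "/blog": ["/", "/shop"],
    "/new-category": ["/"],
}
sitemap = ["/", "/shop", "/blog", "/landing"]
print(find_orphans(links, "/", sitemap))
# {'/landing': True, '/new-category': False}
```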

Step 7: Find Content Theft – External Duplicate Content

External duplicate content means that an external internet page takes the content from your page. Although Google has now become better at working out which is the “original”, it’s possible that a page with your content may rank better than you, or in extreme cases, prevent your content from being indexed at all.

The following tip will help you prevent content theft:

When publishing your content, ask for a reference to the original source. By pointing out the conditions for the transfer of these text elements in advance, you avoid external duplicate content. Publishers can either use a specific notice such as “original text on www.yourpage.com” or they can set a canonical tag to the URL where you originally published the content.

To find external duplicate content, you can simply copy some relevant text lines from your page and type them into Google search. If several results with exactly the same content appear without a link to your page, it is obviously content theft.
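The manual search described above can be prepared with a small script that picks a few distinctive sentences from your text and turns them into exact-phrase search URLs. This is only a sketch: the selection rule (longest sentences with at least eight words) and the function name `exact_phrase_queries` are assumptions, and the resulting URLs are meant to be opened by hand in a browser.

```python
import re
from urllib.parse import quote_plus


def exact_phrase_queries(text, count=2, min_words=8):
    """Build Google search URLs for the longest sentences of `text`.

    Each sentence is wrapped in double quotes so that the search looks
    for the exact wording.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    longest = sorted(candidates, key=len, reverse=True)[:count]
    return [
        "https://www.google.com/search?q=" + quote_plus(f'"{sentence}"')
        for sentence in longest
    ]


# Hypothetical excerpt from your own page.
sample = "Short one. This sentence has clearly more than eight words in it. Tiny."
for url in exact_phrase_queries(sample, count=1):
    print(url)
```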

Step 8: Identify Internal Nofollow Links

If you label your internal links with the rel="nofollow" attribute, the Googlebot will not follow them, which prevents the correct crawling of your website: without these links, the Googlebot may not be able to reach deeper areas of the site. Some URLs will therefore no longer be crawled, meaning that the chance of their indexing decreases.

If you are working with internal nofollow links, you can check with Ryte where to find them. We then recommend that you remove the nofollow attribute. If you really want to exclude a URL from indexing, the noindex tag in combination with the “follow” attribute is better suited.
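Internal nofollow links can also be located directly in the HTML of a page. The sketch below, based on the Python standard library, collects all links that carry rel="nofollow" and point to the same host as the page itself. The names `NofollowLinkParser` and `internal_nofollow_links` and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowLinkParser(HTMLParser):
    """Collects internal links that carry rel="nofollow"."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.internal_nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = (attributes.get("rel") or "").lower().split()
        if not href or "nofollow" not in rel:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == self.host:
            self.internal_nofollow.append(absolute)


def internal_nofollow_links(page_url, html):
    parser = NofollowLinkParser(page_url)
    parser.feed(html)
    return parser.internal_nofollow


# Hypothetical page markup.
html = (
    '<a href="/deep/page" rel="nofollow">Deep page</a>'
    '<a href="https://other.example.org/" rel="nofollow">External</a>'
    '<a href="/ok">Followed</a>'
)
print(internal_nofollow_links("https://www.example.com/start", html))
# ['https://www.example.com/deep/page']
```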

Step 9: Check your XML Sitemap

When creating a sitemap, it’s possible that the sitemap may not contain all URLs to be indexed. This creates a similar problem to the orphaned pages, because there are no links to the URLs concerned. If this happens, there is a high risk of missing indexation.

Ryte’s Search Engine Optimisation section can help you with this. Go to the report “Status codes of files in sitemaps”. There you will be shown all URLs of the sitemap that are either not found on the server, or are redirected.

Figure 6: Check your sitemap.xml for mistakes with Ryte

You can also check your sitemap for errors with the Google Search Console – a warning notice indicates possible problems with indexing.
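Whether a sitemap contains all URLs to be indexed can be checked by comparing it with your own list of indexable URLs. The following sketch parses a sitemap with Python's standard `xml.etree.ElementTree`, using the namespace of the sitemaps.org protocol; the sample sitemap, the URL list and the helper names `sitemap_urls` and `missing_from_sitemap` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap.xml files following the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/shop</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return the set of URLs listed in <loc> elements of the sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(xml_text, indexable_urls):
    """Return indexable URLs that the sitemap does not list."""
    return sorted(set(indexable_urls) - sitemap_urls(xml_text))


# Hypothetical list of URLs that should be indexed.
indexable = [
    "https://www.example.com/",
    "https://www.example.com/shop",
    "https://www.example.com/new-category",
]
print(missing_from_sitemap(SITEMAP_XML, indexable))
# ['https://www.example.com/new-category']
```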

Step 10: Regularly Check Whether Your Pages have Been Hacked

Google is keen to provide its users with the best possible quality in search results. Hacked pages massively restrict this quality. Be sure to look for clues in the Google Search Console. If your website has fallen victim to hacking, stay calm. A first step would be to change the passwords for accessing the backend, if possible.

To prevent hacks, you should change passwords regularly and limit the number of password users as much as possible. In addition, it’s important that you always install all offered updates. Google provides further information and advice regarding hacking in the webmaster central blog.

Conclusion

There can be many reasons why your website or individual URLs are not being indexed. With Ryte, you can easily find and resolve these errors and improve the indexability of your website, which in turn supports better rankings and more website success.

Indexing Checklist

| Area | Measure |
| --- | --- |
| Noindex tags | Check your URLs for the noindex tag. Unless this tag is absolutely necessary, change it to “index, follow”. |
| Robots.txt | Check your robots.txt file and see if important directories are excluded from crawling. |
| .htaccess | Check this file for incorrect redirects or syntax mistakes. |
| Canonical tags | Check that these tags correctly refer to the canonical URL. |
| Server availability and status codes | Monitor the availability of your server and check the status codes of your URLs. |
| Orphan pages | Find pages without incoming links and create internal links. |
| Content theft | Check whether external websites use your content. Create canonical tags and avoid relative URLs. |
| Internal nofollow links | Search for nofollow attributes on your internal links, and remove them. Alternatives are canonical or noindex tags. |
| XML sitemap | Check whether your sitemap contains all URLs to be indexed, and check the status codes of the URLs. |
| Hacking | Look out for warnings regarding hacked pages in the Google Search Console, and, for example, change your login details. |

Published on Jul 4, 2020 by Olivia Willson

Olivia Willson

After studying at King’s College London, Olivia moved to Munich, where she joined the Ryte team till 2021. She was previously in charge of product marketing and CRO, and also helped out with SEO and content marketing. When she's not working, you can usually find her outside, either running around a track, or hiking up a mountain.