To attract more users and traffic, and ultimately achieve higher conversion rates, marketers must ensure their websites are indexable. This article shows you how to check that yours is.
An important prerequisite of success for your online business is that your website has to be visible to users in the SERPs, i.e. it must be indexable by Google. There are many ways to check whether your website is indexable or not. Ryte can help – you can use Ryte as a step-by-step guide to find any factors that are preventing your website from being indexed. Once you’ve checked these steps and made any necessary corrections, nothing will stand in the way of your website being successfully indexed, leading to increased traffic and conversions.
This is a mistake that can happen even to the most experienced SEOs: you may have accidentally inserted the meta tag “noindex, follow” on your subpages, or forgotten to remove it. This tag ensures that a URL will not be indexed by search engines, and it is inserted into the <head> section of a web page as follows:
<meta name="robots" content="noindex, follow"/>
This tag can be a useful way to avoid duplicate content, and can also be used, for example, to test a website before its actual launch or before a domain transfer. (When your site goes live, the noindex tag should of course be removed.)
Under Search Engine Optimisation in the report “Indexability”, you can check with a few clicks which pages are indexable.
Figure 1: Check your indexability with Ryte
Using the robots.txt file, you can actively control the crawling of your website by giving the Googlebot specific instructions as to which directories and URLs it may crawl.
When configuring the file, however, you might have accidentally excluded important directories from crawling, or blocked entire pages. This doesn’t directly prevent your URLs from being indexed, because the Googlebot might find, crawl and index them via backlinks from other websites. However, with a faulty robots.txt file, the Googlebot won’t be able to search all areas of your website sufficiently when crawling regularly. Read this article to find out further mistakes that can be made when configuring the robots.txt file.
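As a hedged illustration of how this can go wrong (the directory names are hypothetical), a single overly broad Disallow rule can block crawling of the entire site instead of one directory:

```text
# robots.txt — hypothetical example of an accidental over-broad rule
User-agent: *
# Intended: block only internal search result pages
Disallow: /search/
# Accidental: this line blocks the whole website from crawling
Disallow: /
```

Removing the stray `Disallow: /` line (or narrowing it to the directory actually meant) restores normal crawling.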
You should check your robots.txt for errors especially after making changes. Ryte can assist you here: click on the report “Robots.txt monitoring” under Search Engine Optimisation. Ryte will then provide you with a list of all URLs that are excluded from crawling. With Ryte, you can also monitor your robots.txt file to keep track of any changes.
Figure 2: Check your robots.txt with Ryte
Your .htaccess file may also prevent your page from appearing in the search results, for example by treating crawler requests as unauthorized access. The .htaccess is a configuration file stored in a directory of the Apache server.
Among other things, website operators use it for the following actions:
Rewriting a URL
Redirecting an old URL to a new URL
Redirecting to the www version of a page
Concrete rules can be defined in the .htaccess. However, for these rules to be executed by the server, the file must always be named exactly “.htaccess”. For redirecting or rewriting URLs, the rewrite engine must first be enabled:

RewriteEngine On

Then define the rule that the server should execute:

RewriteRule seitea.html seiteb.html [R=301]
Of course, it’s possible that the file has been named incorrectly and therefore cannot rewrite or redirect URLs. As a result, neither users nor search engines will be able to access the pages, and they will therefore not be crawled or indexed.
A canonical tag helps Google find the original URL when multiple URLs carry the same content, so that the correct URL can be indexed. The canonical tag is an HTML tag containing a link to the original page, the “canonical” URL.
When setting Canonical tags, numerous errors can occur that cause problems with indexing.
The canonical tag refers to a relative path instead of an absolute URL
The canonical tag refers to a URL that is set to noindex
A paginated page uses a canonical tag to refer to the first page of the pagination
The Canonical tag refers to a URL without a trailing slash
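As a sketch of the first error above (the domain and path are placeholders), a canonical tag should always use an absolute URL, not a relative path:

```html
<!-- Correct: absolute URL pointing to the canonical version -->
<link rel="canonical" href="https://www.example.com/page-a/"/>

<!-- Problematic: relative path, which search engines may resolve incorrectly -->
<link rel="canonical" href="/page-a/"/>
```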
How to check your Canonical tags with Ryte:
Ryte has its own report for Canonical tags under Search Engine Optimisation. In the report “Canonical status codes”, you will quickly get an overview of possible problems with your implemented canonical tags.
Figure 3: Check canonical tags with Ryte
Another reason why a website or URL can’t be indexed could be because of a server failure. This makes it technically impossible to access a page.
Servers also play an important role in search engine optimization for many reasons. For good rankings, you need a fast and efficient server. If it’s slow, your website’s loading times will suffer, frustrating users and resulting in a high bounce rate and low average time on page. Google classifies these KPIs as negative signals for the user experience, which in turn has a negative effect on SEO.
Under Quality Assurance or Web Performance, you can regularly check your server; the feature “Server monitoring” keeps you informed about failures and time-outs so that you can act quickly.
Figure 4: Server monitoring with Ryte
Tip: Check the HTTP status codes of your site regularly to see whether 301 redirects work correctly and whether 404 status codes exist. Pages returning a 404 are unreachable for potential readers and web crawlers alike. Links pointing to such pages are called “dead links”.
When you restructure your website, or add new categories, these new pages might not be linked internally. Furthermore, if these new URLs are not listed in the sitemap.xml and are not linked from external sources, there is a high risk that these pages will not be indexed. Therefore, try to avoid orphaned pages at all costs.
Figure 5: Find pages without incoming links
Ryte Website Success shows you orphaned pages quickly. To do this, click on the “Pages without inbound links” report in the “Links” section.
External duplicate content means that an external website copies the content from your page. Although Google has become better at working out which version is the “original”, it’s possible that a page with your content may rank better than you, or in extreme cases, prevent your content from being indexed at all.
The following tip will help you prevent content theft:
If other sites republish your content, ask them to reference the original source. By pointing out the conditions for reusing these text elements in advance, you avoid external duplicate content. Publishers can either add a specific notice such as “original text on www.yourpage.com” or set a canonical tag to the URL where you originally published the content.
To find external duplicate content, simply copy a few distinctive lines of text from your page and paste them into Google search. If several results with exactly the same content appear without a link to your page, this is likely content theft.
If you label your internal links with the rel=”nofollow” attribute, the Googlebot will not follow them, which prevents correct crawling of your website: if the Googlebot cannot follow a link, it may not reach deeper areas of the page. Some URLs will therefore no longer be crawled, and the chance of their being indexed decreases.
If you are working with internal nofollow links, you can check with Ryte where to find them. We then recommend that you remove the nofollow attribute. If you really want to exclude a URL from indexing, the noindex tag in combination with the “follow” attribute is better suited.
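As a sketch (the URLs are placeholders): remove the nofollow attribute from internal links, and if a page really should stay out of the index, use a noindex, follow meta tag instead:

```html
<!-- Avoid on internal links: the Googlebot will not follow it -->
<a href="/category/page-a/" rel="nofollow">Page A</a>

<!-- Better: a normal internal link -->
<a href="/category/page-a/">Page A</a>

<!-- To exclude a page from indexing while still letting the Googlebot
     follow its links, place this in the page's <head> instead: -->
<meta name="robots" content="noindex, follow"/>
```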
When creating a sitemap, it’s possible that the sitemap may not contain all URLs to be indexed. This creates a similar problem to the orphaned pages, because there are no links to the URLs concerned. If this happens, there is a high risk of missing indexation.
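As a minimal sketch (the domain and dates are placeholders), a sitemap.xml should list every URL you want indexed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2020-07-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/page-a/</loc>
    <lastmod>2020-06-15</lastmod>
  </url>
</urlset>
```

If a URL is missing here and has no internal or external links pointing to it, search engines may never discover it.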
The Search Engine Optimisation section in Ryte can help you with this. Go to the report “Status codes of files in sitemaps”. There you will see all URLs in the sitemap that are either not found on the server or are redirected.
Figure 6: Check your sitemap.xml for mistakes with Ryte
You can also check your sitemap for errors with the Google Search Console – a warning notice indicates possible problems with indexing.
Google is keen to provide its users with the best possible quality in search results. Hacked pages massively restrict this quality. Be sure to look for clues in the Google Search Console. If your website has fallen victim to hacking, stay calm. A first step is to change the passwords for accessing the backend, if possible.
To prevent hacks, you should change passwords regularly and limit the number of people who know them as much as possible. In addition, it’s important that you always install available updates. Google provides further information and advice on hacking in its webmaster central blog.
There can be many reasons why your website or individual URLs are not being indexed. With Ryte, you can easily find and resolve errors, leading to better indexability of your website which will lead to better rankings and more website success.
Check your URLs for the noindex tag. Unless this tag is strictly necessary, change it to “index, follow”.
Check your robots.txt file and see if important directories are excluded from crawling.
Check this file for incorrect redirects or syntax mistakes.
Check that these tags correctly refer to the canonical URL.
Server availability and status codes
Monitor the availability of your server and check the status codes of your URLs.
Find pages without incoming links and create internal links.
Check whether external websites use your content. Create canonical tags and avoid relative URLs.
Search for nofollow attributes on your website, and remove them. Alternatives are canonical or noindex tags.
Check whether your sitemap contains all URLs to be indexed, and check the status codes of those URLs.
Look out for warnings regarding hacked pages in the Google Search Console, and, for example, change your login details.
Published on Jul 4, 2020 by Olivia Willson