What Is Indexable? Check if Web Content Is Indexable

Ryte's indexability report analyzes your website in detail and gives you an overview of which pages are indexable. In this article, we explain the features of the report and show you how you can use it to achieve optimal results.

The Indexability Report

The report is found in Ryte's Website Success module and consists of three components:

Bar chart
Description of the document type
List of URLs


Figure 1: The indexability report on Ryte

Why Use the Indexability Report?

You want to ensure that your pages are ranked highly. For this to happen, the search engine crawler first has to crawl the pages and add them to its index. Making sure that your website is indexable is therefore an important prerequisite for ranking. Ryte's indexability report shows you the URLs of your website that are indexable by search engines, as well as those that are not indexable for various reasons.

Possible reasons why website content is not indexable:

- paginated pages (rel-prev, rel-next)
- pages that have the "noindex" directive in the robots meta tag
- pages that contain a redirect
- pages that have a canonical tag pointing to a different URL
- error pages
- pages that are blocked using the disallow command in the robots.txt
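The reasons above can be expressed as a small classification routine. The following is only a minimal sketch, not Ryte's actual logic: the `Page` record, its field names, and the order of the checks are assumptions made for illustration. The robots.txt check uses Python's standard `urllib.robotparser`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser


@dataclass
class Page:
    """Hypothetical crawl record; the field names are illustrative only."""
    url: str
    status: int                      # HTTP status code of the response
    robots_meta: str = ""            # content of <meta name="robots">
    canonical: Optional[str] = None  # href of rel="canonical", if present
    has_rel_prev_next: bool = False  # page declares rel=prev / rel=next


def indexability(page: Page, robots: RobotFileParser, agent: str = "*") -> str:
    """Return 'indexable' or the first reason why the page is not indexable."""
    if not robots.can_fetch(agent, page.url):
        return "disallowed via robots.txt"
    if 300 <= page.status < 400:
        return "redirect"
    if page.status >= 400:
        return "broken"
    directives = {d.strip().lower() for d in page.robots_meta.split(",")}
    if "noindex" in directives:
        return "noindex"
    if page.canonical and page.canonical != page.url:
        return "canonical to other page"
    if page.has_rel_prev_next:
        return "paginated"
    return "indexable"


# Example robots.txt rules (made up for this sketch).
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /sea/"])

print(indexability(Page("https://www.example.com/shoes", 200), robots))
print(indexability(Page("https://www.example.com/sea/sale", 200), robots))
```

The checks run from the most fundamental blocker (the crawler may not fetch the URL at all) to the softest signal, so each page gets exactly one label.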

The report therefore gives you a clear overview of the indexable and non-indexable pages. You should take a closer look at the non-indexable pages to find possible problems.

Side note: Crawl budget

Bear in mind that search engines have limited resources (crawl budget) and therefore may not cover all areas of your website per crawl session, particularly if your website has more than a few thousand URLs. If there are certain pages you don't want to have indexed, for example because they have no value for the user, you should instruct the search engine bot not to crawl these pages, for example by using the disallow command in the robots.txt. This way, you essentially "present" the search engine with the most important URLs, without wasting the crawler's resources.

Example:
The SEA department regularly has stand-alone pages for its campaigns. These pages are saved in a separate directory (/sea). Over time, hundreds of landing pages accumulate. Since the campaign pages are specifically created for SEA, they are not well-suited for ranking in the organic search. Such pages should therefore be excluded from crawling, as they do not need to be indexed.
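As a rough illustration of this example, the sketch below defines robots.txt rules that exclude the /sea directory and verifies them with Python's standard `urllib.robotparser`. The domain, the URLs, and the helper name `is_crawlable` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the /sea directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sea/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "*") -> bool:
    """Return True if the given user agent may crawl the URL."""
    return parser.can_fetch(agent, url)


campaign_ok = is_crawlable("https://www.example.com/sea/summer-sale")
product_ok = is_crawlable("https://www.example.com/products/shoes")
print(campaign_ok, product_ok)
```

Checking the rules before deploying them helps avoid accidentally blocking pages that are supposed to rank.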

Indexability Based on Categories

The first part of Ryte's indexability report contains an overview graph in the form of a bar chart, with sections marked in three different colors.

Green: Everything is okay! No action is needed for these URLs. These pages are usually indexed without any problems.

Yellow: These pages are usually kept out of the index for a deliberate reason (pagination, redirects, canonical tags, noindex), but it is worth double-checking that this is actually intended.

Red: This indicates that there could be an error with these URLs, for example they could have been inadvertently blocked via the robots.txt. You should definitely check these.


Figure 2: The bar chart above the indexability report

Red section

It is advisable to start with the red bars, as these show potential errors with your website. Clicking on one of the red bars, i.e. "broken" or "disallowed via robots.txt", activates the respective filter and you will see a list of the URLs that are not indexable.


Figure 3: Only the non-indexable pages are listed after clicking on the bar "broken"

The opportunity: Non-indexable pages with a high OPR

The OPR (OnPage Rank) indicates how strong a URL is within a domain. It is modeled on Google's PageRank, so pages that are well linked have a higher OPR. Non-indexable URLs that have a high OPR are therefore a waste of link power. Here, you might want to consider whether it would make sense to make the URL indexable, and to link from it to other pages that could profit from this link power. The indexability report lets you sort the list by OPR and therefore easily identify pages that have a high OPR.
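If you export the report, the same check can be scripted. This is a minimal sketch that assumes a hypothetical export with the columns URL, indexable flag, and OPR; the sample values and the `min_opr` threshold are made up for illustration.

```python
# Hypothetical export rows: (url, indexable, opr). All values are made up.
rows = [
    ("/blog/guide", False, 87.5),
    ("/sea/summer-sale", False, 3.2),
    ("/products/shoes", True, 91.0),
    ("/old-category", False, 64.1),
]


def wasted_link_power(rows, min_opr=50.0):
    """Return non-indexable rows with a high OPR, strongest first."""
    candidates = [r for r in rows if not r[1] and r[2] >= min_opr]
    return sorted(candidates, key=lambda r: r[2], reverse=True)


top = wasted_link_power(rows)
for url, _, opr in top:
    print(url, opr)
```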

Yellow section

The yellow bars in the indexability report usually show URLs containing:

Paginated pages

Paginated pages have rel=prev and rel=next attributes, which tell the search engine that a previous and a subsequent page exist. As soon as the bot comes across these attributes, the search engine recognizes that this is a list of similar pages and does not index these pages. You can find out more about pagination here.

Redirects

Here, the pages return either a 301 or a 302 redirect status, meaning that they refer to another page. The search engine bot therefore simply moves on and does not index the redirecting page.

Canonicals on other pages
If a page contains a canonical tag referring to an alternative URL, the search engine does not index this page.

Side note: Indexable URLs without canonicals

The canonical tag is the best way of avoiding duplicate content. For example, if you are tracking a campaign and have added a number of campaign parameters to your URL, the same content will be accessible under several different URLs if a canonical tag is not used. Search engines would see this as duplicate content, which can have a negative effect on the ability of the pages to rank.
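To make the campaign-parameter case concrete, here is a minimal sketch that derives a canonical URL by dropping tracking parameters and builds the matching tag. It assumes that campaign parameters start with `utm_`; the prefix list and the function names are illustrative, so adapt them to the parameters you actually use.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: campaign parameters use the common "utm_" prefix.
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Drop tracking parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_tag("https://www.example.com/shoes?utm_source=newsletter&utm_campaign=spring"))
```

Every tracked variant of the URL then points to the same canonical address, so search engines can consolidate them.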

noindex via robots tag

This one speaks for itself. In the robots meta tag, the noindex directive instructs the search engine not to index the page.
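A quick way to check a single page for this directive is to parse its HTML. The sketch below uses Python's standard `html.parser`; the class and function names are illustrative, and it only looks at the generic `robots` meta tag, not at bot-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```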

Indexability Based on the Type of Document

Below the bar chart is an overview of the document types found in the report. Clicking on a specific document type (e.g., HTML, PDF, image, etc.) activates the filter.


Figure 4: Easily filter out document types

Assume the report is showing a large number of PDFs that are indexable. You should first ask yourself if the PDFs should be indexable. Although PDFs are effective, stable, and often ranked highly by search engines, they also present a dead end for the user. A user who lands on a PDF can only get back to your website by changing the URL.

You can identify the indexable PDFs and deal with them using these steps:

click on the green bar marked "indexable"
click on the "application/pdf" icon in the document type box
set a canonical for each PDF that points to the corresponding landing page. This way, users will be sent from the SERPs to the landing page and not directly to the PDF. The landing page offers the PDF for download and can be used as usual, since it has a link to the homepage, navigation, etc.
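Since a PDF has no HTML head, one common way to declare its canonical is an HTTP `Link` header with `rel="canonical"` sent along with the PDF response. The following sketch only builds that header; the mapping of PDF paths to landing pages and the function name are hypothetical, and how you attach the header depends on your web server or framework.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping of PDF paths to their landing pages.
PDF_LANDING_PAGES: Dict[str, str] = {
    "/downloads/whitepaper.pdf": "https://www.example.com/whitepaper",
}


def canonical_link_header(pdf_path: str) -> Optional[Tuple[str, str]]:
    """Return the (name, value) pair of the canonical Link header, if any."""
    landing_page = PDF_LANDING_PAGES.get(pdf_path)
    if landing_page is None:
        return None
    return ("Link", f'<{landing_page}>; rel="canonical"')


header = canonical_link_header("/downloads/whitepaper.pdf")
print(header)
```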

Ideally, the green bar should be the highest, with far more indexable than non-indexable content. Even then, the indexable pages can still give you a lot of information. The following are examples of how you can review these pages in a more detailed way. Add extra columns to the indexability report in order to analyze these situations.

1. Indexable URLs that are not in the sitemap.
If they are meant to be indexed, they should also be in the XML sitemap. You can find out which URLs are in the XML sitemap by clicking on the menu item "Sitemaps" and then "Included in sitemaps".


Figure 5: Are the indexable URLs in the sitemap?
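Outside the report, the same comparison can be sketched as a set difference between your indexable URLs and the URLs in the XML sitemap. The sample sitemap and the function names below are assumptions for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sitemap content for this sketch.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> set:
    """Extract all <loc> values from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(indexable_urls, xml_text: str) -> list:
    """Return indexable URLs that the sitemap does not list."""
    return sorted(set(indexable_urls) - sitemap_urls(xml_text))


indexable = ["https://www.example.com/", "https://www.example.com/blog"]
print(missing_from_sitemap(indexable, SAMPLE_SITEMAP))
```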

2. Indexable URLs that only have a few inbound links
A good internal link structure is also very important for the ranking. In order to ensure that users find the pages, indexable URLs should have a sufficient number of internal links.
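Counting inbound internal links is straightforward once you have a link graph. The sketch below assumes a hypothetical dictionary that maps each page to the pages it links to; the sample graph and the threshold of two links are made up for illustration.

```python
from collections import Counter


def inbound_link_counts(links: dict) -> Counter:
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical internal link graph: page -> pages it links to.
link_graph = {
    "/": ["/a", "/b"],
    "/a": ["/b"],
    "/b": [],
    "/orphan": ["/"],
}

counts = inbound_link_counts(link_graph)
weakly_linked = sorted(page for page, n in counts.items() if n < 2)
print(weakly_linked)
```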

3. Indexable URLs that have a long click path
The click path measures the shortest path from the homepage to the respective URL. If navigating to a URL requires too many clicks, search engines perceive this page as not being very important.

Tip: You should have a closer look at URLs that are too far from the homepage and consider whether it would be logical to add links from one page to another in order to improve the internal navigation and, thus, raise your rankings.
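Because the click path is the shortest path from the homepage, it can be computed with a breadth-first search over the internal link graph. This is a minimal sketch; the sample graph and the limit of three clicks are assumptions for illustration, not a recommendation from the report.

```python
from collections import deque


def click_paths(links: dict, homepage: str) -> dict:
    """Return the minimum number of clicks from the homepage to each page."""
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical internal link graph: page -> pages it links to.
site_links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/shoes"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/products/shoes", "/archive/2016"],
    "/archive/2016": ["/archive/2016/old-page"],
}

depths = click_paths(site_links, "/")
too_deep = sorted(url for url, d in depths.items() if d > 3)
print(too_deep)
```

Pages that do not appear in the result at all are unreachable from the homepage and deserve a closer look as well.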

4. Indexable pages that do not have a title and description
Indexable URLs that do not have a title and description can also be listed in the search engine results. If the title and the meta description are missing, the pages could be ranked poorly and the snippet text will be generated by the search engine itself. You should make use of the snippet to encourage users to click.
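For a spot check of individual pages, the title and meta description can be read from the HTML. The sketch below uses Python's standard `html.parser`; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr.get("name", "").lower() == "description":
            self.description = attr.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_snippet_parts(html: str) -> list:
    """Return which of 'title' and 'description' are missing or empty."""
    parser = SnippetParser()
    parser.feed(html)
    missing = []
    if not parser.title.strip():
        missing.append("title")
    if not parser.description:
        missing.append("description")
    return missing


print(missing_snippet_parts("<html><head><title>Shoes</title></head></html>"))
```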

5. Indexable pages with thin content (fewer than 300 words)
Pages that contain little content are often ranked lower than those with relevant content. Identify all indexable pages on your website that have fewer than 300 words. Expanding or consolidating these pages helps you avoid your domain being seen as having so-called thin content.
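A rough word count can be scripted as follows. This is only a sketch: it counts every visible text token in the HTML, including navigation and footer text, so it will overestimate the main content. The 300-word threshold comes from the text above; the class and function names are illustrative.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_count(html: str) -> int:
    """Count word tokens in the text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.chunks)))


def is_thin(html: str, min_words: int = 300) -> bool:
    """Return True if the page has fewer than min_words words."""
    return word_count(html) < min_words


sample = "<html><body><script>var x = 1;</script><p>Just a few words.</p></body></html>"
print(word_count(sample), is_thin(sample))
```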

The indexability report not only gives you a very good overview of the non-indexable content, but also shows you shortcomings of your pages that are already in the search engine's index. Regularly keep track of the indexing of your pages, as this is the alpha and omega of successful search engine rankings.

Happy Analyzing!


Published on Feb 2, 2016 by Olivia Willson

Olivia Willson

After studying at King’s College London, Olivia moved to Munich, where she joined the Ryte team till 2021. She was previously in charge of product marketing and CRO, and also helped out with SEO and content marketing. When she's not working, you can usually find her outside, either running around a track, or hiking up a mountain.