In this article, we explain the importance of the robots.txt file, and how you can monitor your robots.txt with Ryte’s software to ensure great website performance.
The robots.txt file is a simple text file in the root directory of a website that contains instructions specifying which areas of the domain should or should not be accessed by search engine crawlers. The file follows the Robots Exclusion Standard, a protocol that specifies access options on the website for different types of web crawlers. Its instructions can apply to individual files within a directory, to entire directories, or to whole domains.
Inconsistencies in this file can block the crawling of entire website areas. This could have a drastic impact on business, if, for example, the homepage of your online shop was accidentally blocked from being crawled and indexed by the robots.txt file. It is therefore important to constantly monitor your robots.txt file and check its content.
If the "Example_Directory" should not be crawled by the search engine crawlers, the following syntax must be used in the robots.txt file:
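Assuming the directory sits at the site root, the file could look like this:

```
User-agent: *
Disallow: /Example_Directory/
```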
There are many different online tools for easily creating the robots.txt file. Once the robots.txt file is created, it is saved in the root directory of the website, from where it can then be accessed by the website crawlers:
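A quick way to see how crawlers interpret such rules is Python's built-in robots.txt parser. The domain and paths below are placeholders for illustration:

```python
# Check whether a crawler may fetch a URL, using Python's built-in
# robots.txt parser from the standard library.
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed
rules.parse([
    "User-agent: *",
    "Disallow: /Example_Directory/",
])

# Blocked by the Disallow rule:
print(rules.can_fetch("*", "https://www.example.com/Example_Directory/page.html"))  # False
# Not covered by any rule, so crawling is allowed:
print(rules.can_fetch("*", "https://www.example.com/products.html"))  # True
```

In a live setup, `set_url()` and `read()` would fetch the file from `https://www.example.com/robots.txt` instead of parsing hard-coded lines.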
Note: Files or directories that are excluded from crawling in the robots.txt file can still be indexed by search engines. A disallow rule in the robots.txt file is no guarantee that a page will stay out of the index: for example, a URL excluded from crawling can still be indexed if it is linked from an external page. In that case, the following message often appears in place of the meta description, since the bot is prohibited from crawling the page:
"A description for this result is not available because of this site's robots.txt"
Figure 1: Snippet example of a page that is blocked using the robots.txt file but still indexed
The robots.txt monitoring in Ryte Website Success is ideal both for professional SEOs and for operators of small websites. In large companies, changes are often made to the robots.txt file without your knowledge, whereas on smaller websites the operators themselves are usually responsible for changes. In both cases, it is important to regularly check that the robots.txt file is accessible and whether its content has changed.
The robots.txt monitoring function on Ryte is very easy to use. The report can be found in Website Success under the heading "Robots.txt monitoring".
Figure 2: robots.txt monitoring with Ryte
Ryte pings your website’s robots.txt file every hour to verify that it is accessible (status code 200) and to check its content for changes. The file’s loading time is also measured, and anomalies (e.g., timeouts) are recorded.
The following technical and content issues are reviewed during the monitoring:
- Is the robots.txt accessible? With which status code does the file respond?
- What is the file’s loading time? Does it time out?
- Has the content of the file changed? If yes, how many lines have been added or removed?
- What is the content of the current robots.txt version, and what was it in the previous version?
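As a rough illustration of the change detection described above, here is a minimal sketch that counts added and removed lines between two robots.txt versions. The function name and version strings are illustrative assumptions, not Ryte's actual implementation:

```python
# Count added/removed lines between two robots.txt versions using
# Python's standard-library diff tools.
import difflib

def diff_line_counts(old: str, new: str) -> tuple[int, int]:
    """Return (lines_added, lines_removed) between two file versions."""
    added = removed = 0
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return added, removed

# Hypothetical versions: the new file blocks one additional directory
old_version = "User-agent: *\nDisallow: /Example_Directory/"
new_version = "User-agent: *\nDisallow: /Example_Directory/\nDisallow: /checkout/"

print(diff_line_counts(old_version, new_version))  # (1, 0)
```

A monitoring job would combine such a diff with an hourly HTTP request that records the status code and response time.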
The report lists all versions of the robots.txt file that were found, including any download errors and the average loading time.
Figure 3: The average loading time of the different robots.txt versions
The next list contains more details of the different versions.
Figure 4: All versions of the robots.txt file
In this table, you can see:
- how long the respective version has been online,
- the changes that were last made, and
- the number of lines in the file.
The loading time is also listed separately for the different versions.
Figure 5: Details about the time, last changes, and loading time
To get a closer look at a version, simply click on the magnifying glass icon on the right-hand side for a detailed view.
Figure 6: Detailed view of the different versions
This displays the entire robots.txt file in a window. If the three symbols for status code, document type, and loading time are highlighted in green, the version is okay and no action is necessary.
The robots.txt monitoring has a convenient notification function that promptly informs web operators about changes in the robots.txt file. If the file fails to return the status code 200, the project owner is immediately notified by email.
If changes in the content of the robots.txt file are detected, they are listed in the report. In the case of more than five changes, a similar email is sent asking the web operator to check the robots.txt file and verify whether these changes were intentional.
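The notification rules described above can be sketched as a small decision function. The name `should_notify`, its parameters, and the threshold default are illustrative assumptions based on the description, not Ryte's actual code:

```python
# Illustrative notification logic: alert on a non-200 status code,
# or when more than five lines of the robots.txt file have changed.
def should_notify(status_code: int, lines_changed: int, threshold: int = 5) -> bool:
    if status_code != 200:
        return True  # file unreachable or erroring: notify immediately
    return lines_changed > threshold  # many edits: ask the operator to verify

print(should_notify(404, 0))  # True  (file not accessible)
print(should_notify(200, 3))  # False (minor change, no alert)
print(should_notify(200, 8))  # True  (more than five changed lines)
```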
Tip: Activate or deactivate the notifications for the different projects in the user settings under "emails".
Figure 7: Setting the notification function
The robots.txt monitoring function makes it possible to keep track of the status code, accessibility, and loading time of your robots.txt file. The convenient notification function informs you about any anomalies, meaning that you can correct mistakes as soon as they appear and avoid a loss in website performance or business.
Published on Oct 8, 2016 by Eva Wagner