With Ryte, you can easily customize your analysis in the best way for your business. Find out in this article how to customize your crawler settings.
Ryte has its own crawler to analyze your website. The crawling technology closely resembles that of Google’s crawler. The Ryte crawler starts on the homepage of your project and makes its way from page to page by following internal links. Just like the Google crawler, the Ryte bot can be controlled: you can tell it to exclude certain directories or pages from the crawl (and with that, from the analysis). Subdomains can also be analysed in depth or excluded.
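To make the idea of following the internal link path more concrete, here is a minimal sketch in Python. It is not Ryte's implementation: the function name `crawl`, the `excluded_prefixes` parameter, and the example.com link graph are all assumptions made for illustration, and the "site" is an in-memory dictionary so that nothing is fetched over the network.

```python
from collections import deque
from urllib.parse import urlparse

def crawl(start_url, links, limit, excluded_prefixes=()):
    """Breadth-first walk over an in-memory link graph (no network access).

    links maps each URL to the URLs it links to. External links and paths
    starting with an excluded prefix are skipped; at most `limit` URLs
    are crawled.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    crawled = []
    while queue and len(crawled) < limit:
        url = queue.popleft()
        crawled.append(url)
        for target in links.get(url, []):
            parsed = urlparse(target)
            if parsed.netloc != host:
                continue  # external link: not part of this project
            if any(parsed.path.startswith(p) for p in excluded_prefixes):
                continue  # directory excluded from the crawl
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled

# Hypothetical site structure used only for this example.
SITE = {
    "https://example.com/": [
        "https://example.com/a",
        "https://example.com/private/x",
        "https://other.example/",
    ],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
}

pages = crawl("https://example.com/", SITE, limit=10, excluded_prefixes=("/private/",))
```

With these inputs, the excluded directory and the external link never enter the queue, so only the homepage and the two pages reachable from it are crawled.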
Customising the crawler settings has the following benefits:
The crawler settings can be found when clicking on your project settings. The settings involving the crawler are "Project setup" and "Advanced analysis". These can be modified individually for every project. Let’s take a closer look at the settings!
You can make several adaptations in the Basic Settings. Let’s go through them step-by-step.
Set the limit of the number of URLs that you want to crawl. If you don’t know how many indexable URLs your website has, try a site query in Google, for example “site:en.ryte.com”. The number that appears at the top tells you roughly how many of the domain’s pages are listed in the Google index, and you can use it as a guide. The Ryte crawler can crawl from 100 to 21 million URLs.
Here you can decide how fast your analysis should be. The more parallel requests you use, the faster your site will be analyzed. However, using more than 10 could cause your server to slow down, so check with your administrator or our support team for advice if you want to use more than 10 parallel requests.
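The effect of the parallel requests setting can be illustrated with a small Python sketch. This is an assumption-laden illustration, not how the Ryte crawler is built: `crawl_in_parallel` and `FakeServer` are hypothetical names, and the "server" only sleeps briefly instead of answering real HTTP requests. The point is that the worker count caps how many requests hit the server at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def crawl_in_parallel(urls, fetch, parallel_requests):
    """Fetch all URLs using at most `parallel_requests` concurrent workers."""
    with ThreadPoolExecutor(max_workers=parallel_requests) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))

class FakeServer:
    """Stand-in for a web server that records its peak concurrent load."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def fetch(self, url):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulated response time
        with self._lock:
            self.active -= 1
        return (url, 200)

server = FakeServer()
results = crawl_in_parallel([f"/page-{i}" for i in range(20)], server.fetch, 4)
```

After the run, `server.peak` never exceeds the configured number of parallel requests, which is why a lower value is gentler on a slow server and a higher value finishes sooner.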
If your website relies heavily on cookies, you can allow the crawler to accept them here. This option is disabled by default in order to reveal problems that occur when users (or other crawlers) reject cookies, such as session IDs in URLs or cloaking. These errors are often overlooked because browsers accept cookies by default. This is an advanced option and should be enabled with caution.
The Ryte crawler treats images as resources in their own right and crawls them by default. If you prefer to save resources and only crawl HTML content, untick this option. Note, however, that broken and deleted images will then no longer appear in the reports. We recommend crawling images in order to receive a thorough error analysis of your website.
If your website features a lot of subdomains, you can crawl all subdomains by ticking this box. This option is activated by default.
You can tell the Ryte crawler to regard or disregard the robots.txt. If you deliberately exclude content from Google in the robots.txt, you can exclude it from Ryte as well. If you would like to include all available content, tell the crawler to disregard the robots.txt, and all pages will be crawled.
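As a rough illustration of what regarding or disregarding the robots.txt means for a crawl, the following Python sketch uses the standard library's `urllib.robotparser`. The helper `filter_urls`, the `respect_robots` flag, the user agent name, and the sample rules are assumptions for this example and do not reflect Ryte's internals.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search
"""

def filter_urls(urls, robots_txt, respect_robots=True, user_agent="ExampleBot"):
    """Return the URLs a crawler would fetch under the given robots.txt rules."""
    if not respect_robots:
        return list(urls)  # robots.txt disregarded: everything is crawled
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

URLS = [
    "https://example.com/",
    "https://example.com/checkout/cart",
    "https://example.com/blog/",
]

allowed = filter_urls(URLS, ROBOTS_TXT, respect_robots=True)
everything = filter_urls(URLS, ROBOTS_TXT, respect_robots=False)
```

Here `allowed` omits the checkout URL, while `everything` contains all three URLs.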
Analyse sitemaps
Would you like the crawler to download and analyse the sitemap.xml(s)? This option is needed for the "sitemap.xml" report. You can list the sitemap URLs in the advanced settings.
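For readers who want to see what analysing a sitemap.xml involves at its simplest, here is a short Python sketch that extracts the listed URLs. It assumes a plain URL sitemap in the standard sitemaps.org namespace; the function name `sitemap_urls` and the sample document are made up for illustration, and sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the <loc> values of all <url> entries in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]

# Hypothetical sitemap used only for this example.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>
"""

listed = sitemap_urls(SAMPLE_SITEMAP)
```

Comparing such a list with the set of crawled URLs shows which pages appear in the sitemap but were never reached through internal links.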
Here you can specify more settings. For example, you can specify the home page, i.e. the page from where the crawler should start. If you want the crawl to start from the homepage, you can leave this blank - the homepage will be crawled by default.
All recently performed crawls within a project, and who executed them, are listed in "Previous analyses". The table also lists the Crawling Limit, the number of found URLs, the number of crawled URLs, and the number of URLs excluded by your settings.
Tip: If the number of found URLs is significantly higher than the number of crawled URLs, it would be a good idea to raise the Crawling Limit. Make sure the number of excluded URLs isn’t too high, as this could point to incorrect robots.txt settings.
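The tip can be turned into a simple check on the figures from the "Previous analyses" table. The Python sketch below is only one possible reading: the article does not define what "significantly higher" or "too high" mean, so the thresholds `found_ratio` and `excluded_share`, like the function name `review_crawl`, are assumptions to adjust to your own project.

```python
def review_crawl(found, crawled, excluded, found_ratio=1.5, excluded_share=0.3):
    """Return hints based on the URL counts of a finished crawl.

    found_ratio and excluded_share are illustrative thresholds, not values
    defined by Ryte.
    """
    hints = []
    if found > crawled * found_ratio:
        hints.append("raise_limit")  # many found URLs were never crawled
    if found > 0 and excluded / found > excluded_share:
        hints.append("check_exclusions")  # robots.txt may exclude too much
    return hints

limit_hint = review_crawl(found=5000, crawled=1000, excluded=100)
exclusion_hint = review_crawl(found=1000, crawled=1000, excluded=600)
no_hint = review_crawl(found=1000, crawled=950, excluded=10)
```

In the first case the crawl stopped well short of the found URLs, in the second a large share of URLs was excluded, and in the third neither threshold is exceeded.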
The various crawler settings make it easier for you to customize your analysis in the best way for your business. If you're not sure how best to set up your crawler, use the recommended settings, or get in touch with our support team for extra help.
Keep on optimising!
Published on 03/26/2015 by Irina Hey.
Who writes here
Irina Hey is a keynote speaker and an expert in the field of customer acquisition, lead generation and data driven marketing. Until April 2018 she worked as a Product Owner of Acquisitions and coordinated all strategic marketing activities at Ryte.