Googlebot does not have infinite capacity for crawling a website; it works with a limited crawl budget.
However, for a long time there was no official definition of what a crawl budget actually is. Google has now changed this and explained the concept in more detail in an article on the Webmaster Central Blog dated 1/16/2017.
To ensure that search engines such as Google can deliver current and relevant results for users’ queries, the web must be crawled continuously by bots. This crawling is what allows your website to be indexed and to appear in search results lists. However, Googlebot cannot crawl every website without limit; its activity is capped by the so-called "crawl budget". In the article by Gary Illyes dated 1/16/2017, the crawl budget of Googlebot is defined as follows:
Crawl budget = crawl rate (frequency) + crawl demand (need) = the number of URLs Googlebot can and wants to crawl.
This definition is supplemented by further explanations in the article.
The following lessons can be drawn from it:
The larger the budget available to Googlebot, the more pages of your domain it can crawl, and the more content can subsequently be indexed and appear in the SERPs.
According to Gary Illyes, the crawl budget is fundamentally not a concern for newly published pages, because Googlebot usually crawls them promptly anyway. The crawl budget also does not play an important role for domains with fewer than 1,000 URLs, as Googlebot can crawl that number efficiently. Beyond that, additional factors come into play, such as the server capacity of the website and the prioritization of the URLs to be crawled.
Every webmaster who engages with SEO knows that fast websites have a positive impact on usability. Fast server responses, however, also pay off in crawling: the faster a website responds, the higher Googlebot's crawl rate and the more parallel connections it can use. If you want to make optimal use of the crawl budget, you should therefore invest in fast servers and fast-loading pages.
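If you want a quick, rough impression of how fast your pages respond, a small script can help. The following Python sketch is not part of Illyes' article; it uses the requests library and placeholder URLs that you would replace with pages from your own domain.

```python
import requests

# Hypothetical example URLs; replace these with pages from your own domain.
urls = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets",
]

for url in urls:
    # response.elapsed measures the time until the response headers arrived,
    # a rough proxy for how quickly your server answers a crawler.
    response = requests.get(url, timeout=10)
    print(f"{url} -> {response.status_code}, {response.elapsed.total_seconds():.2f}s")
```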
The crawl rate can also be controlled in Google Search Console. This allows webmasters to reduce the frequency in order to conserve server capacity. However, raising the limit there does not automatically lead to a higher crawling frequency.
A higher crawling frequency does not necessarily lead to better positions in the search results. (Gary Illyes, Google)
It is important to understand that crawling alone is not a ranking factor. However, the more extensively and frequently your website is crawled, the better your chances of good rankings, because the search engine's algorithms can continually re-evaluate how well your pages match a given search query.
Googlebot also adjusts its crawling to how much of your site actually needs to be crawled (the crawl demand). It therefore does not have to exhaust the entire budget in one go.
According to Illyes, the bot crawls websites that are "more popular" more frequently. Popularity on the internet is usually indicated by the number of incoming backlinks, so a site with more inbound links is also crawled more often. A site can also be popular if it contains very current information and is continually updated, such as a news site. Unfortunately, Illyes does not elaborate on "popularity" any further. It is clear, however, that Googlebot also sees a need to re-crawl older, already indexed pages so that they do not go stale, although here too the statement remains unspecific. A domain move, on the other hand, is a clear signal for Googlebot to re-crawl the site.
Googlebot follows all URLs on your site, so every URL counts against the crawl budget. It does not matter whether these are embedded URLs, alternate URLs for hreflang, or AMP versions. If you want to conserve crawl budget, you should remove superfluous URLs from your site.
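To get a feel for how many URL variants a single page can expose, you could collect its alternate links. The following Python sketch works on a hypothetical HTML snippet and only illustrates the idea.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet; in practice you would fetch a page from your own site.
html = """
<link rel="alternate" hreflang="de" href="https://www.example.com/de/seite">
<link rel="alternate" hreflang="en" href="https://www.example.com/en/page">
<link rel="amphtml" href="https://www.example.com/en/page/amp">
<a href="/en/page?session=123">same content, extra parameter</a>
"""

class AlternateCollector(HTMLParser):
    """Collects every URL variant a single page exposes to a crawler."""

    def __init__(self):
        super().__init__()
        self.variants = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") in ("alternate", "amphtml"):
            self.variants.append(attrs.get("href"))
        elif tag == "a" and attrs.get("href"):
            self.variants.append(attrs["href"])

collector = AlternateCollector()
collector.feed(html)
print(f"{len(collector.variants)} URL variants on one page:")
for url in collector.variants:
    print(" ", url)
```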
Google has to think economically: an enormous number of URLs need to be crawled every day, and even a corporation like Google does not want to spend unnecessary resources on crawling websites.
In his article, Gary Illyes names exactly which factors can waste crawl budget:
Faceted navigation: for example, filters that generate a new URL with every refinement (a short sketch after this list illustrates the effect)
Duplicate content
Soft 404 error pages
Hacked pages
Low-quality pages
Spam pages
Infinite spaces: for example, calendars that generate a new URL for every day
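To illustrate how quickly faceted navigation or an open-ended calendar multiplies URLs, here is a small, purely hypothetical calculation; the filter names and values are invented.

```python
from itertools import product

# Hypothetical filters of a faceted navigation on a shop category page.
facets = {
    "color": ["red", "blue", "green", "black"],
    "size": ["s", "m", "l", "xl"],
    "sort": ["price", "popularity", "newest"],
}

# Every combination of filter values becomes its own crawlable URL,
# although the underlying product list barely changes.
combinations = list(product(*facets.values()))
print(f"{len(combinations)} URLs from just {len(facets)} filters")  # 4 * 4 * 3 = 48

example = "&".join(f"{k}={v}" for k, v in zip(facets, combinations[0]))
print(f"e.g. https://www.example.com/shoes?{example}")
```

Add one more filter with a handful of values and the URL count multiplies again, which is exactly why such parameter pages eat into the crawl budget.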
The simplest way to optimize the crawl budget is to reduce duplicate content. OnPage.org can help you identify the affected pages so that you can remove or consolidate them where possible.
Figure 1: Identifying duplicate content with OnPage.org.
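Independently of OnPage.org, you can get a first rough impression of exact duplicates with a simple hash comparison of page bodies. The following sketch uses invented example URLs and only catches byte-identical content.

```python
import hashlib
import requests

# Hypothetical URLs to compare; in practice this list would come from a crawl of your site.
urls = [
    "https://www.example.com/product/blue-widget",
    "https://www.example.com/product/blue-widget?ref=newsletter",
    "https://www.example.com/product/red-widget",
]

seen = {}
for url in urls:
    body = requests.get(url, timeout=10).text
    # Identical hashes mean identical page bodies, i.e. exact duplicate content.
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest in seen:
        print(f"Duplicate content: {url} matches {seen[digest]}")
    else:
        seen[digest] = url
```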
Examine your robots.txt file and ensure that Googlebot can crawl all relevant areas.
Figure 2: robots.txt monitoring in OnPage.org
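You can also verify this outside of OnPage.org: Python's standard library ships a robots.txt parser. The sketch below checks hypothetical URLs against a placeholder robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt location and test URLs; replace with your own.
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

for url in [
    "https://www.example.com/blog/crawl-budget",
    "https://www.example.com/intern/reports",
]:
    # can_fetch tells you whether the given user agent may crawl the URL.
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{'ALLOWED' if allowed else 'BLOCKED'}: {url}")
```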
Update your XML sitemap regularly. By doing this, you show Googlebot all the important URLs on your website that it can follow.
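A minimal sitemap can be generated with a few lines of code. The following sketch uses an invented URL list and writes a standards-compliant sitemap.xml; a real site would feed it from its CMS or database whenever content changes.

```python
from datetime import date
from xml.etree.ElementTree import Element, SubElement, ElementTree

# Hypothetical list of important URLs; replace with the pages you want Googlebot to find.
urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/crawl-budget",
    "https://www.example.com/products/",
]

urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for url in urls:
    entry = SubElement(urlset, "url")
    SubElement(entry, "loc").text = url
    SubElement(entry, "lastmod").text = date.today().isoformat()

# Writes sitemap.xml with an XML declaration, ready to be referenced in robots.txt.
ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

Keeping the lastmod dates accurate is what signals to Googlebot which URLs actually need a fresh crawl.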
✓ Make crawling easier for Googlebot and increase crawl demand with fresh content.
✓ Check your site regularly for server errors using Google Search Console or OnPage.org (see the sketch after this list).
✓ Also avoid pages with little added value, duplicate content, or spam.
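As a simple supplement to these tools, a short script can flag URLs that return error status codes; the sketch below uses placeholder URLs.

```python
import requests

# Hypothetical URLs to monitor; in practice you would take them from your sitemap.
urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/crawl-budget",
]

for url in urls:
    status = requests.get(url, timeout=10, allow_redirects=False).status_code
    if status >= 500:
        print(f"Server error {status}: {url}")
    elif status >= 400:
        print(f"Client error {status}: {url}")
    else:
        print(f"OK {status}: {url}")
```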
With just a few steps, you have conserved crawl budget and optimized your website at the same time. Googlebot will visit your site either way, but whether you make the most of its potential is in your hands!
Published on Jan 19, 2017 by Eva Wagner