Crawl budget is defined as the maximum number of pages that Google crawls on a website.
Google itself determines how many subpages it crawls for each website. This number is not the same for all websites; according to Matt Cutts, it is determined primarily by the PageRank of a page. The higher the PageRank, the greater the crawl budget. The crawl budget also determines how often the most important pages of a website are crawled and how often an in-depth crawl is carried out.
The index budget is different from the crawl budget. It determines how many URLs can be indexed. The difference becomes apparent when a website contains multiple pages that return a 404 error code. Each requested page counts against the crawl budget, but if a page cannot be indexed because it returns an error, the index budget is not fully utilized.
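The distinction can be illustrated with a small sketch. The crawl log below is hypothetical, and the rule that only pages answering with status 200 are indexable is a simplifying assumption for illustration, not a description of how Google actually decides what to index.

```python
from collections import Counter

# Hypothetical crawl log: (URL path, HTTP status code) for each requested page.
crawl_log = [
    ("/", 200),
    ("/products", 200),
    ("/products/old-item", 404),
    ("/blog", 200),
    ("/blog/deleted-post", 404),
]


def summarize_crawl(log):
    """Compare requests spent (crawl budget) with pages that could be indexed."""
    status_counts = Counter(status for _, status in log)
    crawled = len(log)              # every request counts against the crawl budget
    indexable = status_counts[200]  # assumption: only 200 responses can be indexed
    wasted = crawled - indexable    # requests that used budget without an indexable page
    return {"crawled": crawled, "indexable": indexable, "wasted": wasted}


print(summarize_crawl(crawl_log))
# prints {'crawled': 5, 'indexable': 3, 'wasted': 2}
```

In this example five requests are spent, but only three of them can contribute to the index; the two 404 responses consume crawl budget without any benefit.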
The crawl budget poses a problem for larger websites with many subpages. Specifically, only a portion of the subpages will be crawled, so not all of them can be indexed. This in turn means that site operators might be losing traffic because relevant pages were not indexed.
A whole area of search engine optimization, crawl optimization, is devoted to this situation. Its aim is to direct the Googlebot so that the existing crawl budget is used wisely and the high-quality pages that are of particular importance to the website operator get indexed. Pages of minor importance must be identified first. These include pages with poor content or little information, as well as faulty pages that return a 404 error code. These pages must be excluded from the crawl so that the crawl budget remains available for the higher-quality pages. Subsequently, the important subpages have to be designed in such a way that spiders crawl them as a priority. Possible actions as part of crawl optimization include:
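One common way to exclude low-value pages from the crawl is a robots.txt file. The following is a minimal sketch using Python's standard `urllib.robotparser` module to check which URLs a given set of rules would leave open to a crawler. The rules, the `example.com` URLs, and the helper name `crawlable` are assumptions made up for illustration. Note also that Python's parser implements the basic robots.txt rules and may not match Googlebot's behavior in every detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of two low-value sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tmp/
"""


def crawlable(urls, robots_txt=ROBOTS_TXT, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules allow the user agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/",
    "https://example.com/search/?q=shoes",
    "https://example.com/products/shoes",
    "https://example.com/tmp/draft",
]

for url in crawlable(urls):
    print(url)
```

With these rules, only the home page and the product page remain crawlable, so the crawl budget is not spent on internal search results or temporary pages.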
If the portfolio of crawled and indexed pages is improved through crawl optimization, the ranking may be improved as well. Pages with a good ranking are crawled more frequently, which in turn brings benefits.
Jan Hendrik Jacob Merlin gave an informative lecture on “Crawl Budget Best Practices” at SEOkomm 2015.