The term 'crawl budget' sounds intimidating, with its faint suggestion that you might use up your allotted number of Google crawls, or somehow fall foul of Googlebot and see your rankings plunge as a result. Google has taken to its blog this week to clear up the confusion and give its definition of crawl budget – something it says is lacking on the web in general – along with an explanation of what it actually means to Googlebot.
First things first, what’s a crawl budget? How does it relate to how your site is crawled and what Googlebot sees?
Crawl rate limit and Googlebot are inextricably linked. Googlebot is the search engine's bot, which goes out onto the web and crawls pages to determine what they are about. Because too many parallel connections can slow page load times for real users, the crawl rate limit caps how fast Googlebot will fetch from a site, so that crawling doesn't degrade the user experience.
Google says the crawl rate limit is, “…the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches.”
How much crawling Googlebot can do on a site is influenced by factors such as how quickly the site responds, whether errors are returned, and any limit set by the site owner in Google Search Console.
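To make that more concrete, here is a minimal sketch of what a crawl rate limit looks like in practice: a cap on simultaneous connections plus a wait between fetches. This illustrates the general idea only – it is not Googlebot's actual implementation – and the example.com URLs, the two-connection cap and the one-second delay are all invented for the example.

```python
# Illustrative only: a polite fetcher that respects a "crawl rate limit" in the
# sense Google describes - a cap on parallel connections and a pause between fetches.
# The limits and URLs here are made-up values, not anything Googlebot actually uses.
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 2    # hypothetical cap on simultaneous connections to the site
FETCH_DELAY = 1.0   # hypothetical wait (in seconds) between fetches

def polite_fetch(url):
    """Fetch one URL, then wait before this worker moves on to the next one."""
    try:
        with urllib.request.urlopen(url) as response:
            status, size = response.status, len(response.read())
    except urllib.error.URLError as err:      # errors returned by the site matter too
        status, size = getattr(err, "code", "ERR"), 0
    time.sleep(FETCH_DELAY)                   # the "time it has to wait between the fetches"
    return url, status, size

if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/contact",
    ]
    # The thread pool size stands in for the "number of simultaneous parallel connections".
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        for url, status, size in pool.map(polite_fetch, urls):
            print(f"{status} {size:>7} bytes  {url}")
```

Those two dials – parallel connections and the wait between fetches – are what Google adjusts based on how the site responds, whether errors come back, and any limit the owner sets in Search Console.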
Does that mean Googlebot uses up its maximum crawl allocation where possible?
No. Even if you have a large site with lots of pages to index, you won't necessarily see the maximum number of Googlebot crawls. More popular pages get crawled more often so that the information returned in search results stays up to date. Likewise, if you change your site substantially or move it, that might call for a larger number of crawls initially, but the spike should drop back to the usual rate once Googlebot has its information. This appetite for crawling is what Google calls crawl demand.
Google combines the crawl rate limit and the crawl demand to arrive at the crawl budget. That means, to Google and Googlebot, crawl budget is the number of URLs Googlebot can crawl and wants to crawl.
In its post, Google says there are a number of factors which can influence the crawl budget. If your website is large, with more than a few thousand URLs, then crawl budget should be on your radar.
Best practice
These are the issues to tackle to keep your crawl budget working effectively for you:
1. Low-value URLs drain Googlebot's resources and can eat into your crawl budget, so you'll need to thoroughly audit your site structure. This may be a big job, as there will likely be thousands of pages to check.
2. If you filter your menus so users can search by parameters such as color, size or location, your customers may thank you for this faceted navigation. Googlebot sees it differently: Google counts faceted navigation among the so-called 'low-value-add URLs' that pull Googlebot's resources away from more meaningful pages.
If your site uses faceted navigation, you may well need to make changes to the navigation coding and structure of your site. A common change is to create an overarching category page that the facet parameters fold back into (the sketch after this list shows the idea at the URL level). There's more information on how to restructure poor faceted navigation in this post from Google.
3. For the large sites that crawl budget affects, a content audit will also be called for. If any part of your domain carries low-quality or spam content, you're using up some of your crawl allocation on Googlebot crawling those pages, and the valuable, higher-quality pages that should be indexed may suffer as a result. Clean things up by removing or redirecting poor-quality content and getting rid of spam.
4. Ecommerce sites will often have a duplicate content problem, as some products are available in different configurations and have user tracking codes or session IDs appended to each address. A t-shirt may be available in four colors, for example, with a page for each color option and then multiple URLs to reflect different user sessions. This creates a duplicate content issue for Googlebot and again means the site isn't being crawled effectively. Google provides some help here by grouping duplicate URLs into a cluster and indexing just one, but you can also help cut down on duplicate content with careful use of cookies and by keeping a sitemap up to date with canonical URLs – the sketch below shows how such duplicates collapse onto a canonical address.
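As a rough illustration of the clean-up in points 2 and 4, here is a small sketch of how facet filters, session IDs and tracking codes multiply URLs for a single page, and how normalising those URLs back to one canonical address groups the duplicates. The parameter names (color, sort, sessionid, utm_source and so on) and the shop.example.com URLs are invented for the example, and this is one simple approach, not Google's own clustering.

```python
# Illustrative only: collapse faceted/tracking variants of a URL back to a single
# canonical address so duplicates can be spotted (and crawl budget isn't wasted on them).
# The parameter names and URLs below are made up for the example.
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}  # only track the visit
FACET_PARAMS = {"color", "size", "sort"}  # facets folded back into the main category page

def canonicalize(url):
    """Drop tracking and facet parameters so near-duplicate URLs collapse to one address."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query)
            if key not in TRACKING_PARAMS and key not in FACET_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "https://shop.example.com/t-shirts?color=red&sessionid=abc123",
    "https://shop.example.com/t-shirts?color=blue&utm_source=newsletter",
    "https://shop.example.com/t-shirts?sort=price&sessionid=def456",
]

clusters = defaultdict(list)
for url in urls:
    clusters[canonicalize(url)].append(url)

for canonical, duplicates in clusters.items():
    print(canonical)                  # the one URL worth crawling and indexing...
    for dup in duplicates:
        print("  duplicate:", dup)    # ...instead of several near-identical variants
```

The canonical URLs a pass like this produces are also the ones worth listing in your sitemap, rather than every session-tagged or colour-filtered variant.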
You can read the full post from Google here.