How does Google crawl a website?

Saneesha Talim
Mar 25
3 min read

Google uses a bot called Googlebot to discover and visit pages on your site. It follows links, reads your content, and decides what to store.

Your site has a crawl budget: a limited number of pages Google will crawl in a given period.

If Google can't crawl a page, it can't rank it. Crawling comes first; everything else follows.

When I first started learning SEO, I kept hearing "make sure Google can crawl your site." I nodded along but had no idea what that actually meant.

So here's what I now understand, written the way I wish someone had explained it to me.

What is crawling?

Crawling is how Google discovers pages on the internet. It sends out a bot called Googlebot that visits URLs, reads the content and follows links to find new pages.

How does Googlebot find your pages?

Googlebot discovers pages in two main ways:

By following links from pages it already knows.
By reading your XML sitemap (a file you create that lists all your important URLs).

This is why internal linking matters so much. If no links point to a page (and it isn't in your sitemap), Googlebot may never find it, even if the content is great.
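To make the two discovery routes concrete, here is a toy sketch in Python. It is not how Googlebot is implemented; it just treats a site as a map of which page links to which, follows links breadth-first from the homepage, and optionally adds the URLs listed in a sitemap as extra starting points. The site, its example.com URLs, and the `sitemap_urls` and `discover` functions are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Namespace used by the sitemaps.org protocol, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def discover(link_graph, seeds):
    """Breadth-first search: every URL reachable from the seeds by following links."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

# Hypothetical site: which pages link to which.
BASE = "https://example.com"
LINKS = {
    f"{BASE}/": [f"{BASE}/blog", f"{BASE}/about"],
    f"{BASE}/blog": [f"{BASE}/blog/post-1", f"{BASE}/"],
    f"{BASE}/blog/post-1": [f"{BASE}/blog"],
    f"{BASE}/about": [],
    f"{BASE}/old-guide": [f"{BASE}/"],  # links out, but nothing links to it
}

# Hypothetical sitemap that lists the orphaned guide.
SITEMAP_XML = f"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>{BASE}/</loc></url>
  <url><loc>{BASE}/old-guide</loc></url>
</urlset>"""

links_only = discover(LINKS, [f"{BASE}/"])
with_sitemap = discover(LINKS, [f"{BASE}/"] + sitemap_urls(SITEMAP_XML))

print(sorted(set(LINKS) - links_only))      # ['https://example.com/old-guide']
print(f"{BASE}/old-guide" in with_sitemap)  # True
```

Following links alone never reaches /old-guide, because nothing points to it. Once the sitemap lists it, this model finds it too.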

Crawl budget, crawl rate, crawl limit, crawl demand: what are they?

These four terms confused me at first because they sound almost identical. Here's how I now tell them apart.

Crawl budget

The total number of pages Googlebot will crawl on your site in a given period.

Small sites rarely need to worry about this. Large sites (millions of pages) need to manage it actively.

Crawl rate

How fast Googlebot crawls your site, measured in requests per second to your server.

Google adjusts this automatically to avoid overloading your site.

Crawl limit

The maximum crawl rate Google allows for your site, based on how much your server can handle.

Googlebot won't go faster than this. Think of it as the speed limit Googlebot respects.
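Google doesn't publish its exact rules, but its crawl budget documentation describes the general idea: if your server keeps responding quickly, Googlebot can crawl faster; if it slows down or returns server errors, Googlebot backs off. The toy model below sketches that with a rate that moves up and down under a fixed ceiling. The `CrawlLimiter` class, the step sizes, and the 1-second "slow" threshold are all made-up assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class CrawlLimiter:
    """Toy model of an adaptive crawl rate under a fixed limit (numbers are hypothetical)."""
    rate: float = 2.0        # current crawl rate, in requests per second
    min_rate: float = 0.5    # floor: never slow down below this
    limit: float = 10.0      # crawl limit: never go faster than this
    slow_ms: float = 1000.0  # responses slower than this count as server strain

    def record(self, status: int, response_ms: float) -> float:
        """Adjust the rate after one response and return the new rate."""
        if status >= 500 or response_ms > self.slow_ms:
            # Server error or slow response: back off sharply.
            self.rate = max(self.min_rate, self.rate / 2)
        else:
            # Healthy response: speed up gently, but never past the limit.
            self.rate = min(self.limit, self.rate + 0.5)
        return self.rate

limiter = CrawlLimiter()
for _ in range(30):             # a long run of fast, successful responses
    limiter.record(200, 120)
print(limiter.rate)             # 10.0 (capped at the limit)
print(limiter.record(503, 80))  # 5.0 (a server error halves the rate)
```

Halving on trouble and recovering in small steps is a common pattern for polite crawlers (additive increase, multiplicative decrease); whether Googlebot uses anything like these exact rules isn't public.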

Crawl demand

How much Google wants to crawl your pages, driven by how popular they are and how often their content changes. Popular pages that are updated frequently get crawled more.

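Remember: crawl budget is the combination of crawl rate (how much Googlebot can crawl without overloading your server) and crawl demand (how much it wants to crawl).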
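When a site has more pages than its budget covers, Google has to choose. The sketch below is a toy illustration of the crawl demand idea: each page gets a made-up demand score (popularity multiplied by how often it changes), and only the highest-scoring pages fit into this period's budget. The scoring formula, the numbers, and the `plan_crawl` name are assumptions; Google does not publish how it weighs these signals.

```python
def plan_crawl(pages, budget):
    """Choose which URLs to crawl this period when the budget is limited.

    pages:  url -> (popularity, changes_per_week), both hypothetical signals
    budget: how many URLs can be crawled this period
    """
    def demand(url):
        popularity, changes_per_week = pages[url]
        return popularity * changes_per_week  # made-up demand score

    ranked = sorted(pages, key=demand, reverse=True)  # highest demand first
    return ranked[:budget]

# Hypothetical site with four pages.
PAGES = {
    "/": (100, 7),                 # popular, changes daily
    "/blog/new-post": (40, 5),     # fairly popular, still being edited
    "/pricing": (60, 1),           # popular, changes weekly
    "/blog/2019-archive": (5, 0),  # rarely visited, never changes
}

print(plan_crawl(PAGES, 2))  # ['/', '/blog/new-post']
```

With a budget of two, the stale archive page and the rarely changing pricing page wait for a later crawl.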

What can block crawling?

Below are a few common things that can stop Googlebot from reaching your pages, or stop Google from using what it finds:

robots.txt

A file at yourdomain.com/robots.txt that tells crawlers which URLs they may or may not crawl. Accidentally block important pages here and Google never sees what's on them.
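You can sanity-check robots.txt rules before publishing them. Python's standard library ships a simple robots.txt parser, `urllib.robotparser`; the sketch below feeds it a hypothetical file that blocks /admin/ (on purpose) and /blog/ (by mistake), then asks whether Googlebot may fetch a few example.com URLs. Python's parser is simpler than Google's and can disagree with it in edge cases (for example, how conflicting Allow and Disallow rules are resolved), so treat this as a quick check, not Google's verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /admin/ is blocked on purpose, /blog/ by mistake.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /blog/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://example.com/",
    "https://example.com/admin/login",
    "https://example.com/blog/my-best-post",
]:
    verdict = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{url} -> {verdict}")
```

The last line of output shows the mistake: the blog post is reported as blocked, so Googlebot would never read it.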

Noindex tag

An HTML tag that tells Google: you can crawl this page, but don't add it to search results. Googlebot still visits and reads the page; it just won't index it, so the page can't rank.
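The tag itself is one line in the page's head: `<meta name="robots" content="noindex">`. If you want to check a batch of saved pages for it, a sketch using Python's built-in `html.parser` could look like this. The `RobotsMetaFinder` and `has_noindex` names are hypothetical, and the sketch only looks at meta tags; noindex can also be sent as an `X-Robots-Tag` HTTP header, which this doesn't check.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list, e.g. "noindex, follow"
            for directive in attrs.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html):
    """True if the page's meta tags tell search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives

page = '<html><head><meta name="robots" content="noindex, follow"></head><body>Hi</body></html>'
print(has_noindex(page))                                            # True
print(has_noindex("<html><head><title>Hi</title></head></html>"))  # False
```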

Server errors (5xx)

If your server returns an error when Googlebot visits, it can't read the page. Repeated 5xx errors can reduce your crawl budget over time.
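One way to spot this yourself is to look in your server's access logs for Googlebot requests that got a 5xx response. The sketch below assumes your server writes the common "combined" log format (many Apache and nginx setups do, but check yours); the log lines, paths, and the `googlebot_5xx` name are hypothetical. It also trusts the user-agent string, which anyone can fake, so treat the counts as a rough signal.

```python
import re
from collections import Counter

# Request path, status code and user agent from a "combined"-format log line.
LOG_LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_5xx(lines):
    """Count 5xx responses per path for requests whose user agent mentions Googlebot."""
    errors = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a line in the expected format
        if "Googlebot" in match["agent"] and match["status"].startswith("5"):
            errors[match["path"]] += 1
    return errors

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [  # hypothetical log lines
    f'203.0.113.5 - - [10/Jan/2024:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 503 0 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.5 - - [10/Jan/2024:10:05:00 +0000] "GET /about HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.5 - - [10/Jan/2024:10:10:00 +0000] "GET /blog/post-1 HTTP/1.1" 503 0 "-" "{GOOGLEBOT_UA}"',
    '198.51.100.7 - - [10/Jan/2024:10:11:00 +0000] "GET /pricing HTTP/1.1" 500 0 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

print(googlebot_5xx(SAMPLE_LOG))  # Counter({'/blog/post-1': 2})
```

The /pricing error is ignored because it came from a regular browser, not Googlebot.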

Remember:

robots.txt = "don't come in."

Noindex = "come in, but don't tell anyone about this room."

How do I check if Google is crawling my site?

The easiest way is Google Search Console.

Open Google Search Console → go to Indexing → Pages (the report formerly called Coverage).

It's free and shows you which pages have been crawled, when, whether there were any issues, and what those issues are.

It sounds complicated, but it's actually very readable once you know what you're looking at.

One thing to remember

Crawl → Index → Rank

That's the order. Crawling is step one of SEO.

If Google can't crawl your page, it can't index it.

If it can't index it, it can't rank it.

If Google can't crawl your page, nothing else matters. This is the foundation everything else sits on.
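The chain can be written down as a tiny toy model. The page fields (`blocked_by_robots_txt`, `status`, `noindex`) and the `furthest_stage` function are hypothetical, and real ranking depends on far more than this. One useful consequence shows up in the last example: a noindex tag on a page that robots.txt blocks is never seen, because Googlebot never reads the page. Google's own documentation makes the same point: for noindex to work, the page must not be blocked by robots.txt.

```python
def furthest_stage(page):
    """Toy model of Crawl -> Index -> Rank: how far does this page get?"""
    # Step 1, crawl: robots.txt blocks and server errors stop Googlebot reading the page.
    if page["blocked_by_robots_txt"] or page["status"] >= 500:
        return "not crawled"
    # Step 2, index: a noindex tag keeps a crawled page out of search results.
    if page["noindex"]:
        return "crawled, not indexed"
    # Step 3, rank: the page is eligible, though indexing and ranking aren't guaranteed.
    return "eligible to be indexed and ranked"

examples = {
    "healthy page": {"blocked_by_robots_txt": False, "status": 200, "noindex": False},
    "noindex page": {"blocked_by_robots_txt": False, "status": 200, "noindex": True},
    "server error": {"blocked_by_robots_txt": False, "status": 503, "noindex": False},
    "blocked + noindex": {"blocked_by_robots_txt": True, "status": 200, "noindex": True},
}
for name, page in examples.items():
    print(f"{name}: {furthest_stage(page)}")
```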

In my next post, I'm going to cover what happens after crawling: indexing. That's where things get interesting, and where nofollow links, canonical tags, and duplicate content come in.
