What is robots.txt? Why did I get a 400 error for my website?

Saneesha Talim
Apr 5

I checked my own robots.txt file.

And found a 400 error.

I panicked, wondering if Google was blocked from crawling my entire site.

Here's what Google's documentation says about it:

"If you use a CMS, such as Wix or Blogger, you might not need to (or be able to) edit your robots.txt file directly."

So the 400 error wasn't a problem. It's just how Wix works on a non-custom domain.

Additionally, when Google encounters a 4xx error on a robots.txt file (any 4xx code except 429), it acts as if the file doesn't exist and assumes that there are NO crawl restrictions.

Meaning Google treated my site as completely open to crawling (which is what I want).
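
To make the status-code behaviour concrete, here is a minimal Python sketch based on my reading of Google's documentation. The function name and the returned labels are my own, purely for illustration. It is not how Googlebot is actually implemented.

```python
def crawl_policy_for_status(status: int) -> str:
    """Map the HTTP status of a robots.txt request to a rough crawl policy.

    Hypothetical helper: the labels are illustrative, not Google's wording.
    """
    if 200 <= status < 300:
        # The file was served, so the crawler reads and follows its rules.
        return "parse rules"
    if status == 429 or 500 <= status < 600:
        # Server trouble or rate limiting: the site is treated as
        # fully disallowed for the time being.
        return "treat site as disallowed for now"
    if 400 <= status < 500:
        # Any other 4xx (400, 403, 404, ...) is handled as if no
        # robots.txt existed at all.
        return "no restrictions"
    # Redirects and anything else are outside the scope of this sketch.
    return "not covered here"


print(crawl_policy_for_status(400))  # the error I saw on my own site
```

So a 400 lands in the same bucket as a missing file: nothing is blocked.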

So what is robots.txt?

A robots.txt file is a plain text file that sits at the root of a website. Its job is simple: tell search engine crawlers which parts of your site they can and cannot visit.

You can find any site's robots.txt by typing: yourwebsite.com/robots.txt

If a page loads, the file exists.

If you get an error, keep reading.
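
If you'd rather check from a script than from the browser, something like the sketch below should work. It uses only Python's standard library. The helper names robots_url and robots_status are made up for this example, and the URL passed in needs to include https:// for the parsing to work.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt address for the site that page_url belongs to."""
    parts = urlsplit(page_url)
    # robots.txt always sits at the root, whatever page we started from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_status(page_url: str) -> int:
    """Request the site's robots.txt and return the HTTP status code."""
    try:
        with urllib.request.urlopen(robots_url(page_url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code


print(robots_url("https://yourwebsite.com/blog/some-post"))
```

Calling robots_status on your own domain would return 200 if the file loads, or the error code (400 in my case) if it doesn't.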

How to read a robots.txt file

Here's what the file looks like and how to read it:

User-agent: *
Disallow: /admin/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml

User-agent: * means these rules apply to ALL crawlers (Googlebot, Bingbot, everyone)
Disallow means crawlers should not visit any URL whose path starts with that value
Allow means crawl this path even if a broader Disallow rule exists
Sitemap links to your sitemap (you can list more than one)
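
You can also let Python read the example file for you. This is a small sketch using the standard library's urllib.robotparser. The test URLs under yoursite.com are placeholders I made up. One caveat: Python's parser applies the first rule that matches, while Google picks the most specific one, so results can differ on files with overlapping rules. For this example they agree.

```python
from urllib.robotparser import RobotFileParser

# The same example file as above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The "*" group applies to every crawler, including Googlebot.
admin_allowed = parser.can_fetch("Googlebot", "https://yoursite.com/admin/login")
public_allowed = parser.can_fetch("Googlebot", "https://yoursite.com/public/page")
other_allowed = parser.can_fetch("Googlebot", "https://yoursite.com/about")

# site_maps() needs Python 3.8 or newer.
sitemaps = parser.site_maps()

print(admin_allowed)   # False: the path starts with /admin/
print(public_allowed)  # True: explicitly allowed
print(other_allowed)   # True: no rule matches, so crawling is allowed
print(sitemaps)        # ['https://yoursite.com/sitemap.xml']
```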

Is blocking the same as not indexing?

No. And this is the mistake most people make.

robots.txt controls CRAWLING. It does NOT control INDEXING.

If you block a page in robots.txt, Google cannot crawl it.

But if other websites link to that page, Google can still find it, index it, and show it in search results. Just without a description or snippet.

So if you want a page completely hidden from Google? robots.txt alone is not enough. You need a noindex tag, and the page must NOT be blocked in robots.txt, because Google has to crawl the page to see the tag. (More on that in my next post.)
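
The crawling vs. indexing difference can be summed up as a tiny decision table. This is a simplified sketch of my understanding, not Google's actual logic. The function name and the wording of the outcomes are my own.

```python
def search_outcome(blocked_in_robots: bool, has_noindex: bool) -> str:
    """Rough outcome for one page, given its robots.txt and noindex status.

    Hypothetical helper for illustration only.
    """
    if blocked_in_robots:
        # Google never fetches the page, so a noindex tag on it goes unseen.
        return "not crawled; URL can still be indexed without a snippet if linked"
    if has_noindex:
        # Google crawls the page, reads the tag, and keeps it out of results.
        return "crawled; kept out of the index"
    return "crawled; eligible for indexing"


for blocked in (True, False):
    for noindex in (True, False):
        print(blocked, noindex, "->", search_outcome(blocked, noindex))
```

Notice that a page that is both blocked and tagged noindex ends up in the first bucket, because the tag is never read.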

One more thing worth knowing...

Google generally caches your robots.txt for up to 24 hours. So if you make a change today, don't expect Google to pick it up immediately. It can take a day or so for the cached copy to refresh.

A 400 error on a robots.txt file was something I had never expected.

But it sent me to Google's documentation and taught me more about how crawling actually works.

Sometimes errors are the best teachers.
