
What is robots.txt? Why did I get a 400 error for my website?

  • Writer: Saneesha Talim
  • Apr 5
  • 2 min read

I checked my own robots.txt file.

And found a 400 error.


I panicked, wondering if Google was blocked from crawling my entire site.


Here's what Google's documentation says about it:

"If you use a CMS, such as Wix or Blogger, you might not need to (or be able to) edit your robots.txt file directly."

So the 400 error wasn't a problem. It's just how Wix works on a non-custom domain.

Additionally, when Google encounters a 4xx error on a robots.txt file (other than 429, "Too Many Requests"), it behaves as if no valid robots.txt file exists and assumes that there are NO crawl restrictions.


Meaning Google treated my site as completely open to crawling (which is what I want).
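
To make the status-code logic concrete, here is a minimal Python sketch. Only the 4xx branch comes from what I described above; the other branches are my reading of Google's documentation and worth double-checking there. The function name crawl_policy_for_status is made up for this post.

```python
def crawl_policy_for_status(status: int) -> str:
    """Simplified model of how Google reacts to the HTTP status of /robots.txt.

    Assumption: this condenses Google's documented behavior into a few
    broad buckets. It is an illustration, not Google's actual code.
    """
    if 200 <= status < 300:
        return "parse the file and follow its rules"
    if 300 <= status < 400:
        return "follow the redirect (a limited number of hops)"
    if status == 429 or 500 <= status < 600:
        # The server gave no definite answer, so Google plays it safe for a while.
        return "treat the site as temporarily off limits"
    if 400 <= status < 500:
        # My case: a 400 is handled as if no robots.txt exists at all.
        return "no restrictions: crawl everything"
    return "unknown"


for code in (200, 400, 404, 429, 503):
    print(code, "->", crawl_policy_for_status(code))
```

The branch order matters: 429 is checked before the general 4xx case because it is the one 4xx code that does not mean "open to crawling".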


So what is robots.txt?


A robots.txt file is a plain text file that sits at the root of a website. Its job is simple: tell search engine crawlers which parts of your site they can and cannot visit.


You can find any site's robots.txt by typing: yourwebsite.com/robots.txt

If a page loads, it exists.

If you get an error, keep reading.
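
If you'd rather check from a script than from the browser, something like this Python sketch should work. It uses only the standard library; the helper names robots_url and robots_status are my own, and the input URL must include https:// or http://.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Build the robots.txt address for whatever site page_url belongs to."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def robots_status(page_url: str) -> int:
    """Return the HTTP status code the site sends back for its robots.txt."""
    try:
        with urlopen(robots_url(page_url), timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still what we want.
        return err.code
```

Calling robots_status("https://yourwebsite.com/any-page") returns 200 when the file loads, and a number like 400 or 404 when it doesn't.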


How to read a robots.txt file


Here's what the file looks like and how to read it:

User-agent: *
Disallow: /admin/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml

  • User-agent: * means these rules apply to ALL crawlers (Googlebot, Bingbot, everyone)

  • Disallow means the crawler should not visit any URL whose path starts with that value (here, everything under /admin/)

  • Allow means crawl this path even if a broader Disallow rule exists

  • Sitemap links to your sitemap (you can list more than one)
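
You can test how these rules play out with Python's built-in urllib.robotparser. This is a rough sketch using the sample file above; the is_allowed helper and the example paths are mine. One caveat: Python's parser applies the first matching rule, while Google picks the most specific one, so results can differ when Allow and Disallow rules overlap. They don't overlap in this file.

```python
from urllib.robotparser import RobotFileParser

# The sample file from this post, as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if the given crawler may fetch this path under the rules above."""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


print(is_allowed("/admin/login"))   # blocked by Disallow: /admin/
print(is_allowed("/public/page"))   # explicitly allowed
print(is_allowed("/blog/post"))     # no rule matches, so crawling is allowed
print(parser.site_maps())           # sitemap URLs listed in the file (Python 3.8+)
```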


Is blocking the same as not indexing?


No. And this is the mistake most people make.


robots.txt controls CRAWLING. It does NOT control INDEXING.


If you block a page in robots.txt, Google cannot crawl it.


But if other websites link to that page, Google can still find it, index it, and show it in search results. Just without a description or snippet.


So if you want a page completely hidden from Google? robots.txt alone is not enough. You need a noindex tag, and the page must NOT be blocked in robots.txt, because Google has to crawl the page to see the tag. (More on that in my next post.)
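
The difference between crawling and indexing is easier to see as a small decision table. This Python sketch is a deliberately simplified model of the cases above, not Google's real logic, and the function name search_outcome is made up for illustration.

```python
def search_outcome(blocked_by_robots: bool, has_noindex: bool,
                   linked_from_elsewhere: bool) -> str:
    """Simplified model: what can happen to a page in Google Search."""
    if blocked_by_robots:
        # Google never fetches the page, so a noindex tag on it goes unseen.
        if linked_from_elsewhere:
            return "may be indexed as a bare URL, without a snippet"
        return "not crawled, and unlikely to be discovered"
    if has_noindex:
        return "crawled, then kept out of the index"
    return "crawled and eligible for indexing"


# Blocked in robots.txt, but other sites link to it:
print(search_outcome(True, False, True))
# Crawlable, with a noindex tag:
print(search_outcome(False, True, True))
```

Notice that has_noindex is never even looked at when the page is blocked. That is the whole trap.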


One more thing worth knowing...


Google generally caches your robots.txt for up to 24 hours. So if you make a change today, don't expect Google to pick it up immediately. It can take a day or so for the cached copy to refresh.


A 400 error on a robots.txt file was something I had never expected.

But it sent me to Google's documentation and taught me more about how crawling actually works.

Sometimes the errors are the best teachers.
