Irony: If Google Can’t Reach Your Robots.txt File, It Might Not List Your Site
I reported at the Search Engine Roundtable this morning that Google said if your robots.txt file is unreachable, your site might not make it into the Google index. By unreachable, Google means that if your server simply times out and does not return any server response when Googlebot attempts to access your robots.txt file, then Google might not include any of your pages in its index.
Googler John Mueller explained that Google tends to lean on the "safe" side when this situation pops up. When I showed this to Danny, he felt it was ironic that if Google can't read what you want to block, it might block everything. But if you think about it, with all the legal woes Google has to deal with over indexing content, should it risk indexing a site that might have a disallow directive in its robots.txt file?
It is important to clarify that a robots.txt file is not required in order to be listed with Google. If you don't have one and Google sees a normal server status response such as a 404 Not Found, all is good. It's only when Google asks for a robots.txt file and gets no response at all that this might be an issue. A rare case, but good to know.
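To see which bucket your own site falls into, here is a minimal sketch that fetches a robots.txt URL and classifies the outcome the way the article describes Google's behavior. The function name and return labels are illustrative, not anything Google publishes:

```python
# Hypothetical sketch: classify a robots.txt fetch the way the article
# describes Google's handling. Not an official Google tool or API.
import socket
import urllib.error
import urllib.request

def classify_robots_fetch(url, timeout=5):
    """Return one of:
    'ok'          - robots.txt was returned; its rules would apply
    'no-robots'   - a normal error status (e.g. 404 Not Found); fine,
                    the site is crawled as if there are no restrictions
    'unreachable' - timeout or no server response at all; the risky
                    case where Google may leave the site out entirely
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError:
        # The server answered with an error status; per the article,
        # a clean 404 here is perfectly fine.
        return "no-robots"
    except (urllib.error.URLError, socket.timeout):
        # No usable response at all: the "unreachable" scenario.
        return "unreachable"
```

For example, pointing it at a host that refuses the connection (here a local port with nothing listening, chosen just for illustration) yields the risky `'unreachable'` result, while a live site with no robots.txt would yield `'no-robots'`.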