Fun with robots.txt

One of the most boring topics in technical SEO is robots.txt. Rarely is there an interesting problem needing to be solved in the file, and most errors come from not understanding the directives or from typos. The general purpose of a robots.txt file is simply to suggest to crawlers where they can and cannot go.

Basic parts of the robots.txt file

Other things you should know about robots.txt

# Let Googlebot fetch JavaScript and CSS files
User-agent: Googlebot
Allow: /*.js
Allow: /*.css

Now for the fun stuff!

Many companies have done creative things with their robots.txt files. Take a look at the following examples!

ASCII art and job openings

Nike.com has a nice take on their slogan inside their robots.txt, “just crawl it,” and they also included their logo.

Seer also uses art and has a recruitment message.

TripAdvisor has a recruitment message right in the robots.txt file.

Fun robots

Yelp likes to remind the robots that Asimov’s Three Laws are in effect.

As does last.fm.

According to YouTube, we already lost the war to robots.

Page One Power has a nice “Star Wars” reference in their robots.txt.

Google wants to make sure Larry Page and Sergey Brin are safe from Terminators in their killer-robots.txt file.

Who can ignore the front page of the internet? Reddit references Bender from “Futurama” and Gort from “The Day the Earth Stood Still.”

Humans.txt?

Humans.txt describes itself as “an initiative for knowing the people behind a website. It’s a TXT file that contains information about the different people who have contributed to building the website.” When I tried a few domains, I was surprised to find the file more often than I expected. Check out https://www.google.com/humans.txt.

Just using robots.txt to mess with people at this point

One of my favorite examples is from Oliver Mason, who disallows everything and bids his blog farewell, only to then allow every individual file again further down in the file. As he comments at the bottom, he knows this is a bad idea. (Don’t just read the robots.txt here. Seriously, go read this guy’s whole website.)
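To see why a file like that still leaves the site crawlable, here is a minimal sketch of the "most specific rule wins" precedence that Google documents for Allow and Disallow. The function name `is_allowed` and the sample rules are made up for illustration, and the sketch uses plain prefix matching only (no `*` or `$` wildcards), so treat it as a rough model rather than how any real crawler is implemented.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs for one user-agent group.
    Simplification: plain prefix matching only, no `*` or `$` wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # Longer prefix wins; on a tie, the less restrictive Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Hypothetical rules in the spirit of the file described above.
rules = [
    ("Disallow", "/"),
    ("Allow", "/index.html"),
    ("Allow", "/about.html"),
]

print(is_allowed(rules, "/index.html"))   # True
print(is_allowed(rules, "/secret.html"))  # False
```

Because each Allow line is longer than the blanket `Disallow: /`, every file that is listed individually stays open, while anything not listed stays blocked.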

On my personal website, I have a robots.txt file to mess with people as well. The file validates fine, even though at first glance it would look like I’m blocking all crawlers.

The reason is that I saved the file with a BOM (byte order mark) character at the beginning, which makes my first line invalid, as Google Search Console reports when I verify the file. With the first line invalid, the Disallow has no User-Agent reference, so it is also invalid.
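A small sketch shows what the BOM does to that first line. The file contents and the helper `first_directive` are hypothetical, and whether a given crawler's parser tolerates a BOM varies, so this only illustrates the mechanism described above for a parser that does not strip it.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by a block-everything rule.
raw = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /\n"


def first_directive(data, encoding):
    """Decode the bytes and return the directive name on the first line."""
    first_line = data.decode(encoding).splitlines()[0]
    return first_line.split(":", 1)[0]


# Plain "utf-8" keeps the BOM as U+FEFF at the start of the first line,
# so the name no longer equals "User-agent".
naive = first_directive(raw, "utf-8")
# "utf-8-sig" strips a leading BOM before decoding.
stripped = first_directive(raw, "utf-8-sig")

print(repr(naive))     # '\ufeffUser-agent'
print(repr(stripped))  # 'User-agent'
```

The invisible character is enough to make the directive name unrecognizable to a strict comparison, which leaves the following Disallow line without a group to belong to.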

Indexed pages that shouldn’t exist

If you search for “World’s Greatest SEO,” you’ll find a page on Matt Cutts’ website that doesn’t actually exist. SEO Mofo chose a directory (/files) that is blocked by https://www.mattcutts.com/robots.txt. The only information Google has about this page is from the links that were built to the non-existent page. Although the page returns a 404, Google is not allowed to crawl it and so never sees that response. It still shows the URL in the search results, using the anchor text from the links.
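For a quick check of whether a URL falls under a blocked directory, Python's standard `urllib.robotparser` can evaluate rules without any network access. The rules and URLs below are assumptions for illustration, based only on the /files directory mentioned above; they are not a copy of the real file.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration; not a copy of the real file.
lines = [
    "User-agent: *",
    "Disallow: /files",
]

parser = RobotFileParser()
parser.parse(lines)

# Hypothetical URL under the blocked directory.
blocked = parser.can_fetch("Googlebot", "https://www.mattcutts.com/files/example.html")
# Hypothetical URL outside the blocked directory.
open_ = parser.can_fetch("Googlebot", "https://www.mattcutts.com/blog/")

print(blocked, open_)  # False True
```

A crawler that honors the file never requests the blocked URL, so it has no way to learn that the page is missing.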

A whole freaking website inside robots.txt

Thought up by Alec Bertram, this amazing feat is chronicled where else but his robots.txt file. He has the how, the source and even a menu to guide you.

The same technique was used on vinna.cc to embed an entire game into the file. Head over to https://vinna.cc/robots.txt and play Robots Robots Revolution!


About The Author

Patrick Stox
Patrick Stox is an SEO Specialist for IBM and an organizer for the Raleigh SEO Meetup, the most successful SEO Meetup in the US.