• http://twitter.com/ChaseSEO Chase Anderson

    Great article – I think it’s a very common misconception that robots.txt will remove files from the index. Thank you for spreading the good word and helping keep everyone up to date on the less-than-admirable techniques that white hats might not be aware of.

  • Stefano Piotto

    Good article, with just one clarification. Most webmasters use both robots.txt and the meta robots tag, thinking this gives them double protection, but robots.txt blocks the spider, which then cannot read the meta robots noindex instruction: a potential own goal.
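The own goal described above can be sketched in a few lines of Python with the standard library's `urllib.robotparser`. This is only an illustration under assumptions: the `example.com` URL, the `/private/` rule, the page markup, and the helper name `crawler_sees_noindex` are all hypothetical, and a real crawler's behavior is more involved than this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the very directory we want de-indexed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page that carries a meta robots noindex tag.
PAGE_HTML = '<html><head><meta name="robots" content="noindex"></head></html>'
BLOCKED_URL = "https://example.com/private/report.html"


def crawler_sees_noindex(robots_txt: str, url: str, page_html: str) -> bool:
    """Return True only if a rule-abiding crawler would fetch the page
    and find a noindex instruction in it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("*", url):
        # The page is never downloaded, so the meta tag is never read.
        return False
    # Simplistic substring check, good enough for this sketch.
    return 'content="noindex"' in page_html


print(crawler_sees_noindex(ROBOTS_TXT, BLOCKED_URL, PAGE_HTML))
print(crawler_sees_noindex("", BLOCKED_URL, PAGE_HTML))
```

With the `Disallow` rule in place the noindex tag is never seen; with an empty robots.txt the crawler can fetch the page and read it.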

  • http://www.rimmkaufman.com/ George Michie

    Important article, Tom, and well written, too. This is a very common misunderstanding.

  • http://www.clippingpathindia.com/ clipping path service provider

    This is a very important article.

  • http://www.michaelcropper.co.uk/ Michael Cropper

    If people believe that using robots.txt keeps files secure, they should choose a new profession. Anything that is private should be kept behind a login – it’s as simple as that.

  • http://gallardomark.com/ Mark Gallardo

    A lot of people are still confused about these two ;) Glad you explained it so well. I can share this with them ;)

  • स्वप्निल कुलकर्णी.

    I think that, by default, crawlers are allowed to access (crawl) all the pages of your website, and that you use robots.txt to specifically disallow some of them. So why does Google.com/robots.txt use the following line?

    Allow: /news/directory

    Is there any reason??
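One common reason for an `Allow` line is to carve an exception out of a broader `Disallow`. The sketch below assumes, purely for illustration, that the file also disallows the wider `/news` prefix; that rule, the `example.com` host, and the helper `allowed` are hypothetical and not taken from Google's actual file. Note that Python's `urllib.robotparser` applies the first matching rule, so the `Allow` line is placed first here, whereas Google documents that the most specific matching rule wins.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /news as a whole, but keep one subtree crawlable.
ROBOTS_TXT = """\
User-agent: *
Allow: /news/directory
Disallow: /news
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Check whether a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "https://example.com" + path)


for path in ("/news/directory/sports", "/news/archive", "/about"):
    print(path, allowed(path))
```

Without the `Allow` line, everything under `/news` would be blocked, including `/news/directory`.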

  • http://twitter.com/sharithurow sharithurow

    Great article and examples!

    I like the Google thing (and I mean that sarcastically) where, if you block a page in robots.txt, they can’t read the meta tag. It’s one way for them to choose what does and does not go into the index, taking control away from website owners.

    I get it. It’s their index and their search results. But I think website owners should ultimately pick. They probably know more about their specific group of users than a search engine does. They have better context (at least I hope they do).

  • http://about.me/mohammedalami Meding44

    I totally agree with you, @tom. Clients are not aware of how the Google index works, so they think that because they changed their robots.txt everything is solved. Google doesn’t take that into account when it refreshes its index. The only way is to add noindex to the page, and sometimes we have to clear the cache via GWT... very time-consuming. I hope the standard will evolve to reflect our new understanding of search engines.

  • http://www.linkworxseo.com/ Link Worx Seo

    From what I have read and found, the Allow: directive is not needed. Stick to Disallow: and forget about using Allow:. If you search the net for robots.txt programs and research the proper usage and layout of the file, you will find that Allow: is not actually needed when it is done properly.

  • Usha Ghosh

    Hey Tim! It’s a very good article :)

    But please take note of the broken links in your article!

  • robthespy

    I think most of us knew this. But it’s important nonetheless. And I’m sure there are many people who will benefit from this.

    Well done, Tom!

  • http://www.facebook.com/therealbenguest Ben Guest

    Anything that needs to be kept private needs to be kept off the internet.

  • http://www.irishwonder.com IrishWonder

    The robots meta tag is a good solution, but if you’ve got a directory on your server with .pdf files, images, or other non-HTML content that you do not want indexed or visible, or that you don’t want anyone to know about, you’re out of luck with robots meta. The only thing you can do in this case is password-protect the directory.