Google Webmaster Tools Updates Robots.txt Testing Tool

Google announced they have updated their robots.txt testing tool in Webmaster Tools. The update brings you a few more features including: (1) Highlights which line in your robots.txt file is blocking a specific page. (2) Make test changes to the robots.txt tool and test it before you make the file live. (3) Google will also show you older versions of your robots.txt file to see past issues. Here is a screen shot showing off the highlighting feature: [...]
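To give a rough idea of what "highlighting the blocking line" involves, here is a minimal Python sketch. It is not Google's implementation: the function name `find_blocking_line` and the sample file are hypothetical, and it assumes plain prefix matching (no `*` or `$` wildcards) with the longest matching rule winning and Allow winning ties.

```python
# Hypothetical sample file used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /private/press/

User-agent: googlebot
Disallow: /tmp/
"""


def find_blocking_line(robots_txt, path, agent="googlebot"):
    """Return (line_number, line_text) of the Disallow rule blocking `path`,
    or None if the path is not blocked. Simplified: prefix matching only."""
    groups = {}             # agent token -> list of (line_no, directive, rule_path, raw)
    current = []            # agent tokens of the group currently being read
    reading_agents = False  # True while consecutive User-agent lines are read

    for line_no, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for token in current:
                groups[token].append((line_no, field, value, raw))

    # Use the agent's own group if it exists, otherwise fall back to "*".
    rules = groups.get(agent.lower(), groups.get("*", []))
    matches = [r for r in rules if r[2] and path.startswith(r[2])]
    if not matches:
        return None
    # Longest rule path wins; on a tie, Allow beats Disallow.
    best = max(matches, key=lambda r: (len(r[2]), r[1] == "allow"))
    return (best[0], best[3].strip()) if best[1] == "disallow" else None
```

Calling `find_blocking_line(SAMPLE_ROBOTS, "/tmp/x.html")` reports the rule in the `googlebot` group, while other crawlers fall back to the `*` group. A real tester would also need wildcard support and the exact precedence rules of the crawler being tested.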


Should You Gate Content? The SEO Implications

Content marketing is the hot topic these days, but lead generation marketers are faced with a conundrum: How do I get the visibility for my content via organic search while at the same time capturing leads? Are SEO and gated content mutually exclusive? There's no magic answer to this question, but there are some important considerations to address in order to decide your course of action. Search Robots ≠ Humans First and foremost, from an SEO perspective, it's important to understand that search engine robots and humans are not equal. Search robots navigate and read a site very differe [...]


Google Orders Terminator Robots Not To Kill Founders Brin & Page

How do you stop the unstoppable killer Terminators if you're not Sarah Connor? Google does it with a simple text file. People have noted today that Google has a special "robots.txt" file that pokes fun at stopping the Terminators. Robots.txt files are actually used to stop robots of a less threatening kind, such as Google's crawler, which crawls the web looking for content. Use robots.txt, and you can stop them from gathering your content. The new Google easter egg robots.txt was uploaded recently, maybe on the anniversary of the Robots.txt file? You can access the new file at google.com/ [...]


Robots.txt Celebrates 20 Years Of Blocking Search Engines

Today is the 20th anniversary of the robots.txt directive being available for webmasters to block search engines from crawling their pages. The robots.txt was created by Martijn Koster in 1994 while he was working at Nexor after having issues with crawlers hitting his sites too hard. All major search engines back then, including WebCrawler, Lycos and AltaVista, quickly adopted it; and even 20 years later, all major search engines continue to support it and obey it. Brian Ussery posted on his blog about the 20-year anniversary and documented the most common robots.txt mistakes he has seen [...]


Google Fixing Reverse DNS GoogleBot Verification

Google has confirmed that some GoogleBot user-agent spiders are not properly passing the verification protocol. Savvy webmasters noticed that GoogleBot over the .249.70.0 /24 IP range was not returning the proper reverse DNS verification details. The response given was "no such host is known," but the activity webmasters noticed did appear to come from a legit Google crawler. Google's John Mueller confirmed on my Google+ post that this was indeed an issue on Google's end. Temporarily they have stopped GoogleBot activity in those IP ranges and will fix the issue before they continue [...]
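For readers unfamiliar with the check, the usual approach is a reverse DNS lookup on the visiting IP, a test that the hostname falls under googlebot.com or google.com, and then a forward lookup to confirm the hostname resolves back to the same IP. The Python sketch below illustrates that flow. The function name `verify_googlebot` and its injectable lookup parameters are hypothetical conveniences (they let the logic be exercised without network access), not part of any Google tool.

```python
import socket

# Hostname suffixes a genuine Googlebot reverse lookup is expected to end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes reverse + forward DNS verification.

    `reverse_lookup(ip) -> hostname` and `forward_lookup(host) -> list of IPs`
    default to the standard library resolvers; they are parameters only so the
    check can be tested offline.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip)
    except OSError:
        # Covers failures such as "no such host is known".
        return False

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The hostname must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

With the default resolvers, a failed reverse lookup like the one webmasters saw in this incident would make the function return False even for a genuine crawler, which is why the bug was noticeable.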


Google’s Matt Cutts: NoFollow Attributes On Internal Links Don’t Hurt But Generally Don’t Do It

In a video published today, Google's head of search spam, Matt Cutts, answered the question, "Should I use rel="nofollow" on internal links to a login page?" Matt Cutts basically said you shouldn't, but said it won't hurt you if you do. Matt said, "It doesn't hurt if you want to put a nofollow pointing to a login page or to a page that you think is really useless." But Matt said, "in general" it also doesn't hurt to not add a nofollow, and in general, you should let Googlebot crawl and explore your site. In most cases, using a noindex may be better than using a nofollow on the link at [...]


Snowden Petition Blocked From Google? Like All Petitions, It Won’t Be When It Gets Enough Signatures

Search for "edward snowden petition" on Google to find the petition filed through the White House petitions site, and you'll see something odd. The petition has no description, because the White House won't let Google crawl the page. But it's not a move against Snowden, as some might think. It's part of how the petitions site has worked with search engines for some time. Here's how the listing looks: Notice the description: "A description for this result is not available because of the site's robot.txt -- learn more." iAcquire noted the oddity this week, that the page is listed [...]


Twitter Opens Up To More Crawling, But Do Search Engines Want Its Search Results In Theirs?

Twitter recently updated its robots.txt file and, though the change opens up millions of pages to being crawled, there's no guarantee that the main search engines want what Twitter is offering. The Sociable seems to have been the first to notice Twitter's robots.txt changes, which now specifically allow Google, Bing, Yahoo, Yandex and other bots to crawl through some of Twitter's search results pages. Twitter: Change Made To Help With Discovery Twitter confirmed the change to us, saying: This change will help people find popular and helpful Twitter pages, such as the #olympics hashtag pa [...]


The Latest & Greatest On SEO Pagination

Technical SEO topics such as pagination are near and dear to my heart. This article will build upon and update my previous treatment of pagination and SEO. I've written and presented often on pagination for SEO. Why so much attention on this subject? The reason is simple: it can be a big, hairy deal for sites. It's right up there with faceted navigation as one of the most problematic crawling and indexing issues for large-scale SEO. It's a tactic (actually a set of tactics) that our teams are continually evolving, testing, and refining. So it was "double prizes" when Google announced [...]


Google Slows Web Crawlers To Help Blackout Sites

As you know, there are many sites going black to protest SOPA and PIPA. Google has already offered blackout SEO advice but they decided to take it one step further by slowing down their spiders today. Pierre Far from Google posted on his Google+ that Google is slowing down GoogleBot's crawl activity to reduce the effect on sites' search rankings if they did not follow the Google SEO advice from yesterday. Pierre Far said: Hello webmasters! We realize many webmasters are concerned about the medium-term effects of today's blackout. As a precaution, the crawl team at Google has conf [...]


How To Blackout Your Site (For SOPA/PIPA) Without Hurting SEO

A number of websites are (or were) planning to "go black" this week while the U.S. Congress discusses issues related to the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA). The website blackouts are part of a larger social media effort against the bills that our Greg Finn wrote about this morning on Marketing Land. You may be thinking about joining the website blackout movement, but yikes ... what about the SEO implications? How do you take your site offline in protest without messing up your visibility in Google's search results? Well, Google's Pierre Far shared several tips [...]


Google Can Now Execute AJAX & JavaScript For Indexing

This morning we reported that the comments on Facebook are being indexed by Google. Google's Matt Cutts just confirmed on Twitter that Google is now able to "execute AJAX/JS to index some dynamic comments." This gives Google's spider, GoogleBot, the ability to read comments in AJAX or JavaScript, such as Facebook comments or Disqus comments and others that are dynamically loaded via AJAX or JavaScript. In addition, this means Google is better at seeing the content behind more of your JavaScript or AJAX. Postscript: Google now has an official blog post up with more details. Related [...]


Google Disables URL Removals After Bug Allows Anyone To Remove Any Site

This morning, James Breckenridge discovered a loophole within Google's Webmaster Tools that allowed anyone to remove any site from Google. Both James and I sent this information to Google as soon as we heard of it. After several hours, Google has told us, "we're still investigating this report, and to be cautious we disabled all URL removals earlier this morning." So now, even if you own a site, you won't be able to remove the site or pages from the site using Google's URL removal tool. How did this loophole work? Pretty simple as James described. You use the following URL when logged [...]


Google Webmaster Tools Remove URL With Blocking Not Required

Google announced on the Webmaster blog that they have removed a requirement for removing URLs via Google Webmaster Tools. Google no longer requires you to block access to the URL you want to remove from Google prior to submitting the URL removal request. Google said they have eliminated "the requirement that the webpage's URL must first be blocked by a site owner before the page can be removed from Google's search results." Why did Google drop this requirement? Simply because you have already validated and verified you are the owner of the site, Google felt it was redundant to requi [...]


A Lesson From the Indexing of Google Translate: Blocking Search Results From Search Results

Last year, Google published an SEO Report Card of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google's Matt Cutts presented the results at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers who don't work in search don't get preferential treatment from those who do and just like any site on the internet, sometimes things aren't implemented correctly. Just because a site is owned by Google doesn't mean it's the best example of what to do in terms of SEO. This [...]


Google Releases Details On Controlling GoogleBot & Google’s Crawl

The Google Webmaster Central blog announced Google has published a new set of documents in the Google Code section on how to control Google's crawling and indexing of your site. You can read the set of documents over here. The technical documents are broken down into five sections: Getting Started; Robots.txt specification; Robots meta tag and X-Robots-Tag specification; Google's crawlers; and References. I personally printed them out as my weekend reading. [...]


Robots.txt Recruiter: Daily Mail Uses Robots.txt File To Find SEO

Malcolm Coles spotted that the Daily Mail, one of the UK's largest papers, changed their robots.txt file to include a line which reads: # August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you're the kind of techie we need!# Send your CV to holly dot ward at mailonline dot co dot uk How clever! They suspect some of the best SEOs out there would be sniffing around their robots.txt file and used it to recruit a new SEO manager. If anything, it is getting the word out there via the press that they are looking for a new SEO. This reminds me of when Go [...]


Facebook: No Plans To Give Search Engines Access To Facebook Questions

That's one of the big questions people are asking after yesterday's launch of Facebook Questions. While many have assumed the answer would be "yes," a Facebook spokesperson tells us that assumption is wrong. Currently, search engines cannot access questions and answers through our Questions product. That may be something we consider for the future but have no current plans to allow it. Facebook is blocking search engines by only showing Questions to logged-in users. Sure enough, a site:facebook.com/questions/ search on Google shows only a handful of results, none of which are actuall [...]


OpenOffice.org Is MIA In Bing, But It’s Not Censorship

The home page of OpenOffice.org, the well-known Microsoft Office competitor, is missing from Microsoft's Bing search engine. While it sounds suspicious, the problem has nothing to do with Bing itself -- it's a technical problem on OpenOffice.org's end. Ian McAnerin noticed earlier today that OpenOffice.org doesn't show up in Bing on searches for [open office] and [openoffice.org]. He wonders if Bing is "allowing its results to be unduly influenced by either money or corporate policy." But, upon further digging with some help from SEL's Vanessa Fox, that's not the case. To be clear: Page [...]


Google Adds Googlebot-News User Agent To Allow Blocking Google News

Google announced they have added a new supported user agent that can be used to block Google from indexing stories specifically on Google News. You can now specify the user agent Googlebot-News in your robots.txt file to exclude documents from being crawled and displayed in Google News. Google has several user agents for controlling how Google has access to your content. They have Googlebot, Googlebot-Image and so on. Each can be used to block specific content from showing up in specific Google search properties. In an effort to please some of the news providers, Google announced yes [...]
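As an illustration of per-user-agent blocking, the Python sketch below feeds a small robots.txt to the standard library's `urllib.robotparser` and asks whether two different agents may fetch the same URL. The robots.txt contents, the `/archive/` path and the example.com URL are made up for the example, and Python's parser is only an approximation of how Google itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep /archive/ out of Google News only,
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /archive/

User-agent: *
Disallow:
"""

# Hypothetical URL used for the check.
STORY_URL = "https://example.com/archive/story.html"


def build_parser(text):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The news crawler matches the first group, so the archive is off limits to it.
news_allowed = parser.can_fetch("Googlebot-News", STORY_URL)

# The regular web crawler falls through to the "*" group, which allows everything.
web_allowed = parser.can_fetch("Googlebot", STORY_URL)

print("Googlebot-News allowed:", news_allowed)
print("Googlebot allowed:", web_allowed)
```

The point of the separate user agent is visible in the two results: the same story can stay in regular web search while being excluded from Google News.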

