

Google Fixing Reverse DNS GoogleBot Verification

Google has confirmed that some GoogleBot user-agent spiders are not properly passing its verification protocol. Savvy webmasters noticed that GoogleBot activity from the .249.70.0/24 IP range was not returning the proper reverse DNS verification details. The response given was "no such host is known," yet the activity otherwise appeared to be that of a legitimate Google crawler. Google's John Mueller confirmed on my Google+ post that this was indeed an issue on Google's end. Google has temporarily stopped GoogleBot activity in those IP ranges and will fix the issue before they continue [...]
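The verification protocol in question is Google's documented two-step DNS check: do a reverse (PTR) lookup on the claimed crawler IP, check the hostname suffix, then do a forward lookup to confirm the hostname resolves back to the same IP. A minimal sketch in Python (the suffix check is split out so it can be exercised without network access):

```python
import socket

# Official reverse-DNS suffixes for genuine Googlebot hosts
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(host):
    """Pure suffix check on a reverse-DNS hostname."""
    return host.endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip):
    """Reverse DNS, suffix check, then forward-confirm DNS."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # PTR lookup
    except socket.herror:
        # The "no such host is known" failure webmasters were seeing
        return False
    if not hostname_is_google(host):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward lookup
    except socket.gaierror:
        return False
    return ip in addrs
```

The suffix check alone is not enough: an attacker controlling their own reverse DNS could not pass the forward-confirmation step, which is why both lookups are required.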


Google’s Matt Cutts: NoFollow Attributes On Internal Links Don’t Hurt But Generally Don’t Do It

In a video published today, Google's head of search spam, Matt Cutts, answered the question, "Should I use rel="nofollow" on internal links to a login page?" Matt basically said you shouldn't, but that it won't hurt you if you do. Matt said, "It doesn't hurt if you want to put a nofollow pointing to a login page or to a page that you think is really useless." But "in general," he said, it also doesn't hurt to not add a nofollow, and in general, you should let Googlebot crawl and explore your site. In most cases, using a noindex may be better than using a nofollow on the link at [...]


Snowden Petition Blocked From Google? Like All Petitions, It Won’t Be When It Gets Enough Signatures

Search for "edward snowden petition" on Google to find the petition filed through the White House petitions site, and you'll see something odd. The petition has no description, because the White House won't let Google crawl the page. But it's not a move against Snowden, as some might think. It's part of how the petitions site has worked with search engines for some time. Here's how the listing looks: Notice the description: "A description for this result is not available because of the site's robots.txt -- learn more." iAcquire noted the oddity this week, that the page is listed [...]
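This is standard robots.txt behavior: a disallowed URL can still be listed (Google knows it exists from links), but since the page can't be fetched, no snippet can be shown. The effect is easy to reproduce with Python's standard parser; the rules below are hypothetical, for illustration only, and may not match the real petitions site's file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking all crawlers from petition pages
rules = """
User-agent: *
Disallow: /petition/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A blocked URL may still appear in results if linked to, but Google
# cannot fetch it -- hence the missing description in the listing.
print(parser.can_fetch("Googlebot", "https://petitions.whitehouse.gov/petition/some-petition"))
# -> False
```

Note the asymmetry the article describes: robots.txt controls crawling, not indexing, so the listing survives even though the content is invisible.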


Twitter Opens Up To More Crawling, But Do Search Engines Want Its Search Results In Theirs?

Twitter recently updated its robots.txt file and, though the change opens up millions of pages to being crawled, there's no guarantee that the main search engines want what Twitter is offering. The Sociable seems to have been the first to notice Twitter's robots.txt changes, which now specifically allow Google, Bing, Yahoo, Yandex and other bots to crawl through some of Twitter's search results pages. Twitter: Change Made To Help With Discovery Twitter confirmed the change to us, saying: This change will help people find popular and helpful Twitter pages, such as the #olympics hashtag pa [...]


The Latest & Greatest On SEO Pagination

Technical SEO topics such as pagination are near and dear to my heart. This article will build upon and update my previous treatment of pagination and SEO. I've written and presented often on pagination for SEO. Why so much attention on this subject? The reason is simple: it can be a big, hairy deal for sites. It's right up there with faceted navigation as one of the most problematic crawling and indexing issues for large-scale SEO. It's a tactic (actually a set of tactics) that our teams are continually evolving, testing, and refining. So it was "double prizes" when Google announced [...]
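One of the core pagination tactics discussed in pieces like this is the rel="next"/rel="prev" link elements Google announced support for: each page in a series declares its neighbors in its `<head>`. A hypothetical helper (names and URL scheme are illustrative) that emits the right tags for page n of a series:

```python
def pagination_links(base_url, page, last_page):
    """Build rel=prev/next <link> tags for page `page` of a series.

    The first page gets only rel=next, the last page only rel=prev,
    and middle pages get both -- which is what lets a crawler stitch
    the component pages into one logical sequence.
    """
    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{base_url}?page={page - 1}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{base_url}?page={page + 1}">')
    return tags
```

For example, page 2 of 5 would emit a rel=prev pointing at page 1 and a rel=next pointing at page 3.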


Google Slows Web Crawlers To Help Blackout Sites

As you know, many sites are going black today to protest SOPA and PIPA. Google has already offered blackout SEO advice, but they decided to take it one step further by slowing down their spiders today. Pierre Far from Google posted on Google+ that Google is slowing down GoogleBot's crawl activity to reduce the effect of the blackout on sites' search rankings, in case site owners did not follow Google's SEO advice from yesterday. Pierre Far said: Hello webmasters! We realize many webmasters are concerned about the medium-term effects of today's blackout. As a precaution, the crawl team at Google has conf [...]
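Here Google throttled crawling from its own side, but site owners can also signal a slower crawl to engines that honor the Crawl-delay directive in robots.txt (Google ignores Crawl-delay, preferring its Webmaster Tools crawl-rate setting, but Bing and Yandex respect it). A quick check with Python's standard parser, using illustrative rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules asking bingbot to wait 10 seconds between fetches
rules = """
User-agent: bingbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.crawl_delay("bingbot"))  # -> 10
```

A crawl-delay is a request, not an enforcement mechanism; well-behaved bots honor it, and everything else ignores it.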


How To Blackout Your Site (For SOPA/PIPA) Without Hurting SEO

A number of websites are (or were) planning to "go black" this week while the U.S. Congress discusses issues related to the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA). The website blackouts are part of a larger social media effort against the bills that our Greg Finn wrote about this morning on Marketing Land. You may be thinking about joining the website blackout movement, but yikes ... what about the SEO implications? How do you take your site offline in protest without messing up your visibility in Google's search results? Well, Google's Pierre Far shared several tips [...]
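The core of Pierre Far's advice was to serve the blackout page with an HTTP 503 (Service Unavailable) status plus a Retry-After header, so crawlers treat the protest as a temporary outage rather than re-indexing the blackout page as your real content. A minimal WSGI sketch of that idea (all names and values here are illustrative, not from Far's post):

```python
BLACKOUT_HTML = b"<h1>This site has gone dark to protest SOPA/PIPA</h1>"

def blackout_app(environ, start_response):
    # 503 tells crawlers the outage is temporary, so they won't
    # replace the indexed pages with the protest page.
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Retry-After", "86400"),  # hint: come back in ~1 day
    ])
    return [BLACKOUT_HTML]
```

Serving a 200 for the blackout page is the mistake this guards against: with a 200, the protest message can get indexed as the page's actual content.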


Google Can Now Execute AJAX & JavaScript For Indexing

This morning we reported that comments on Facebook are being indexed by Google. Google's Matt Cutts just confirmed on Twitter that Google is now able to "execute AJAX/JS to index some dynamic comments." This gives Google's spider, GoogleBot, the ability to read comments that are loaded dynamically via AJAX or JavaScript, such as Facebook comments, Disqus comments and others. It also means Google is better at seeing the content behind more of your JavaScript or AJAX. Postscript: Google now has an official blog post up with more details. Related [...]


Google Disables URL Removals After Bug Allows Anyone To Remove Any Site

This morning, James Breckenridge discovered a loophole within Google's Webmaster Tools that allowed anyone to remove any site from Google. Both James and I sent this information to Google as soon as we heard of it. After several hours, Google told us, "we're still investigating this report, and to be cautious we disabled all URL removals earlier this morning." So now, even if you own a site, you won't be able to remove the site or pages from it using Google's URL removal tool. How did this loophole work? It was pretty simple, as James described. You use the following URL when logged [...]


Google Webmaster Tools Remove URL With Blocking Not Required

Google announced on the Webmaster blog that they have removed a requirement for removing URLs via Google Webmaster Tools. Google no longer requires you to block access to the URL you want removed from Google prior to submitting the URL removal request. Google said they have eliminated "the requirement that the webpage's URL must first be blocked by a site owner before the page can be removed from Google's search results." Why did Google drop this requirement? Simply because, since you have already validated and verified you are the owner of the site, Google felt it was redundant to requi [...]


A Lesson From the Indexing of Google Translate: Blocking Search Results From Search Results

Last year, Google published an SEO Report Card of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google's Matt Cutts presented the results at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers who don't work in search don't get preferential treatment from those who do and just like any site on the internet, sometimes things aren't implemented correctly. Just because a site is owned by Google doesn't mean it's the best example of what to do in terms of SEO. This [...]


Google Releases Details On Controlling GoogleBot & Google’s Crawl

The Google Webmaster Central blog announced Google has published a new set of documents in the Google Code section on how to control Google's crawling and indexing of your site. You can read the set of documents over here. The technical documents are broken down into five sections: Getting Started, Robots.txt specification, Robots meta tag and X-Robots-Tag specification, Google's crawlers, and References. I personally printed them out as my weekend reading. [...]
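Among those documents, the X-Robots-Tag specification is the one worth calling out: it applies robots directives via an HTTP response header instead of an in-page meta tag, which is the only option for non-HTML resources such as PDFs and images. A tiny illustrative helper (not taken from the docs themselves) for building the header:

```python
def robots_header(*directives):
    """Build an X-Robots-Tag header pair from robots directives.

    The header form carries the same directives as the robots meta
    tag (noindex, nofollow, noarchive, ...), but can be attached to
    any response, not just HTML pages.
    """
    return ("X-Robots-Tag", ", ".join(directives))

print(robots_header("noindex", "nofollow"))
# -> ('X-Robots-Tag', 'noindex, nofollow')
```

A server would send this header alongside, say, a PDF download to keep that file out of the index.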


Robots.txt Recruiter: Daily Mail Uses Robots.txt File To Find SEO

Malcolm Coles spotted that the Daily Mail, one of the UK's largest papers, changed their robots.txt file to include a line which reads: # August 12th, MailOnline are looking for a talented SEO Manager so if you found this then you're the kind of techie we need!# Send your CV to holly dot ward at mailonline dot co dot uk How clever! They suspect some of the best SEOs out there would be sniffing around their robots.txt file and used it to recruit a new SEO manager. If anything, it is getting the word out there via the press that they are looking for a new SEO. This reminds me of when Go [...]


Facebook: No Plans To Give Search Engines Access To Facebook Questions

That's one of the big questions people are asking after yesterday's launch of Facebook Questions. While many have assumed the answer would be "yes," a Facebook spokesperson tells us that assumption is wrong. Currently, search engines cannot access questions and answers through our Questions product. That may be something we consider for the future but have no current plans to allow it. Facebook is blocking search engines by only showing Questions to logged-in users. Sure enough, a site:facebook.com/questions/ search on Google shows only a handful of results, none of which are actuall [...]


OpenOffice.org Is MIA In Bing, But It’s Not Censorship

The home page of OpenOffice.org, the well-known Microsoft Office competitor, is missing from Microsoft's Bing search engine. While it sounds suspicious, the problem has nothing to do with Bing itself -- it's a technical problem on OpenOffice.org's end. Ian McAnerin noticed earlier today that OpenOffice.org doesn't show up in Bing on searches for [open office] and [openoffice.org]. He wonders if Bing is "allowing its results to be unduly influenced by either money or corporate policy." But, upon further digging with some help from SEL's Vanessa Fox, that's not the case. To be clear: Page [...]


Google Adds Googlebot-News User Agent To Allow Blocking Google News

Google announced they have added a new supported user agent that can be used to block Google from indexing stories specifically on Google News. You can now specify the user agent Googlebot-News in your robots.txt file to exclude documents from being crawled and displayed in Google News. Google has several user agents for controlling how Google has access to your content. They have Googlebot, Googlebot-Image and so on. Each can be used to block specific content from showing up in specific Google search properties. In an effort to please some of the news providers, Google announced yes [...]
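With Python's standard parser you can check how per-agent rules like this play out: block Googlebot-News from a section while leaving general web search access intact. The paths below are hypothetical, for illustration:

```python
from urllib.robotparser import RobotFileParser

# Keep the archives out of Google News but open to regular web search
rules = """
User-agent: Googlebot-News
Disallow: /archives/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot-News", "https://example.com/archives/story.html"))  # -> False
print(parser.can_fetch("Googlebot", "https://example.com/archives/story.html"))       # -> True
```

Each Google property matches the most specific user-agent group that applies to it, which is what lets one robots.txt file treat News, Image and web crawling differently.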


Head-To-Head: ACAP Versus Robots.txt For Controlling Search Engines

In the battle between search engines and some mainstream news publishers, ACAP has been lurking for several years. ACAP -- the Automated Content Access Protocol -- has constantly been positioned by some news executives as a cornerstone to reestablishing the control they feel has been lost over their content. However, the reality is that publishers have more control even without ACAP than is commonly believed by some. In addition, ACAP currently provides no "DRM" or licensing mechanisms over news content. But the system does offer some ideas well worth considering. Below, a look at how it [...]


A Deeper Look At Robots.txt

The Robots Exclusion Protocol (REP) is not exactly a complicated protocol and its uses are fairly limited, and thus it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there’s a noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and the latter can show up in the search results (albeit with less information since the spiders can’t see the page content)? That disallowed page [...]


Google’s Advice On Using The New Canonical Tag

A month ago, Google, Yahoo and Microsoft announced they will be supporting a new canonical tag that allows you to tell search engines that page X is a duplicate of page Z. In a way, it is a 301 redirect, without the physical redirect. The tag is incredibly powerful, as are 301 redirects, so using it should be done with caution and slowly. Matt Cutts posted a new video explaining how one should go about using this tag, being that it is so new. Here is the video: [...]
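The tag itself is just a single `<link>` element placed in the `<head>` of the duplicate page, pointing at the preferred URL. A hypothetical helper for emitting it (the URLs are illustrative):

```python
def canonical_tag(preferred_url):
    """Emit the rel=canonical <link> for a duplicate page's <head>.

    Placed on session-ID, tracking-parameter or print versions of a
    page, it tells engines to consolidate signals onto preferred_url,
    much like a 301 but without actually redirecting the visitor.
    """
    return f'<link rel="canonical" href="{preferred_url}">'

print(canonical_tag("https://example.com/product/widget"))
# -> <link rel="canonical" href="https://example.com/product/widget">
```

Like a 301, a mistaken canonical can pull the wrong page out of the index, which is why the article counsels rolling it out cautiously.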


Live Search Testing New Crawler; MSNBot/2.0b

The Live Search Blog announced they are letting a new robot loose. The new search engine crawler is named msnbot/2.0b and will be added to the army of current MSN spiders, currently named msnbot/1.1. The new spider is currently being tested but will ultimately replace the old spider. The new spider will respect the current robots.txt protocol set up for MSNBot, so no need to set up anything new in your robots.txt file. In addition, Microsoft promised to crawl slowly in their msnbot/2.0b tests. MSNBot/1.1 is not that old. It was added back in February of this year and introduced HTTP [...]

