Google’s Matt Cutts has issued a weather report on Google indexing and ranking issues, which I’ll get into below. In addition, I’ll recap a number of Google concerns that I’ve noted over the past two or three weeks. I’ve been saving these to follow up on when I’m out at Google later this month.
I’m meeting up with Matt and others from the search quality team in Mountain View, then with Vanessa Fox and others from Google Webmaster Central up in the Kirkland / Seattle office. So expect to hear back. And if you’ve got things you’d like me to ask about, note them in the comments below.
Now onward to news of visible PageRank updates, supplemental listings, filetype command changes, lost home pages in country-specific results, an update on the disappearing sex blogs, concerns that CSS is being spidered to detect hidden text, duplicate content worries, the Minus 30 penalty and what to do if Google thinks you are a malware site.
Visible PageRank Updated
Scores shown in the Google Toolbar and elsewhere are currently being updated. That shouldn’t impact search rankings, since Matt says these scores have already been in use there. Google Toolbar PageRank Update Being Reported from Search Engine Roundtable covers some of the changes that have been noted, and Google Pagerank update or outage from Dave Naylor has him showing how scores from different data centers are all over the place.
FYI, I’ve always told people not to worry about PageRank scores, that they focus too much on them. To underscore this, it occurs to me I don’t even know the PageRank score for Search Engine Land. So I fired up the Google Toolbar and found we’ve got a zero. Google’s "measure of our importance" is 0/10.
Despite this, my monthly stats recap covers how Google is sending us plenty of traffic. That will grow as our content, link reputation and site reputation grow.
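For anyone curious what that toolbar meter is roughly approximating: PageRank is, at heart, a calculation over the link graph. Here’s a minimal power-iteration sketch in Python. The toy link graph, damping factor and iteration count are illustrative assumptions on my part; nothing here reflects Google’s actual data or parameters.

```python
# Minimal PageRank power iteration over a toy link graph.
# Graph, damping factor, and iteration count are illustrative only.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly across all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "digg": ["article", "home"],
    "home": ["article"],
    "article": ["home"],
}
ranks = pagerank(graph)
print(ranks)
```

Note how "digg" ends up with the lowest score here simply because nothing links to it in this toy graph; in reality, of course, Digg’s enormous inbound link profile is exactly why it outranks a young site.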
Here’s a good example of that. Search for [sex blogs]. Out on the third page of results, you’ll see the Digg article about my own article, Of Disappearing Sex Blogs & Google Updates. But my article is nowhere to be found. Seriously.
At the end of the results, you’ll see:
In order to show you the most relevant results, we have omitted some entries very similar to the 20 already displayed.
If you like, you can repeat the search with the omitted results included.
If I use that link, then I finally see the page. That helps me know it is in the index. It’s just not ranking. Even going 100 pages into a search on [sex blogs], my article doesn’t rank, while the Digg page about it does.
What’s going on? My view is that Digg is a long-established site with far more reputation and trust than Search Engine Land has. As I said, that will change over time. While I wait, suffice to say, I’m not watching that PageRank meter and worrying about when it will change. I’ll know the site is more trusted long before then, based on how my traffic from Google increases.
Still, I think if Google’s going to offer the meter, the scores ought to match the same PageRank scores currently in the index. The idea of it being potentially a full quarter behind just doesn’t feel right.
Supplemental Results Worries

Matt tries to calm people who fret that they’ve got pages in Google’s supplemental results. And yes, sites will do well even if they have some of their pages in the supplemental index. And yes, supplemental pages can rank and produce traffic. But c’mon. Matt already notes that pages in the supplemental index are those deemed less important from a PageRank perspective by Google. He says the freshness of these pages is set to improve, but they’re still not as fresh as other pages. Bottom line: pages in there aren’t as important. People understand this, and it’s no wonder they get concerned about showing up there.
I’ve hated the supplemental index since it was launched back in 2003. As I wrote then:
Using a supplemental index may be new for Google, but it’s old to the search engine industry. Inktomi did the same thing in the past, rolling out what became known as the small "Best Of The Web" and larger "Rest Of The Web" indexes in June 2000.
It was a terrible, terrible system. Horrible. As a search expert, you never seemed to know which of Inktomi’s partners was hitting all of its information or only the popular Best Of The Web index. As for consumers, well, forget it — they had no clue.
It also doesn’t sound reassuring to say, "we’ll check the good stuff first, then the other stuff only if we need to." What if some good stuff for whatever reason is in the second index? That’s a fear some searchers had in the past — and it will remain with Google’s revival of this system.
Why not simply expand the existing Google index, rather than go to a two tier approach?
"The supplemental is simply a new Google experiment. As you know we’re always trying new and different ways to provide high quality search results," said Google spokesperson Nate Tyler.
OK, it’s new, it’s experimental – but Google also says there are currently no plans to eventually integrate it into the main index.
Basically, the supplemental index is a way for Google to hit less important pages in specific instances when it can’t find matches in the main index. Trying to search against tens of billions of pages all at once is time consuming and expensive. Far easier to hit just the "best of the web," exactly as Inktomi used to do — and for exactly the same reasons. But it’s a continuing reminder that Google can’t do it all. No matter how great those machines are, they have to divide up that index. The "best of the web" might still be tens of billions of pages, but divisions still raise concerns.
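To picture that two-tier approach, here’s a minimal Python sketch: query the "best of the web" tier first, and fall back to the supplemental tier only when the main tier comes up short. The documents and the fallback threshold are made up for illustration; this says nothing about how Google actually implements its tiers.

```python
# Toy two-tier index lookup: search the "best of the web" tier first,
# and only fall back to the supplemental tier when results run short.
# Documents and threshold are made up for illustration.

MAIN_INDEX = {
    "python tutorial": ["popular-site.example/tutorial"],
}
SUPPLEMENTAL_INDEX = {
    "python tutorial": ["obscure-blog.example/old-tutorial"],
    "rare phrase xyzzy": ["forgotten-page.example/xyzzy"],
}

def search(query, min_results=2):
    results = list(MAIN_INDEX.get(query, []))
    if len(results) < min_results:
        # Main tier didn't have enough matches; hit the supplemental tier.
        results += SUPPLEMENTAL_INDEX.get(query, [])
    return results

print(search("python tutorial"))
print(search("rare phrase xyzzy"))
```

The searcher’s fear described above falls straight out of this design: a "rare phrase xyzzy" page is only reachable through the fallback, so anything mis-filed in the second tier effectively vanishes from queries the main tier can already satisfy.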
Need some more advice on the supplemental index? Matt McGee recently had an article you might look at, Breaking Out of Google’s Supplemental Index.
Filetype Command To Change
Want to know all the .doc documents in Google? This command won’t help you:

filetype:doc
That’s because Google insists on the command also having a search query term, like this:

filetype:doc help
This looks to change in the near future, says Matt. When it does, expect mini size-comparison wars to begin once again. Ugh. I hate those types of size wars.
Country-Specific Results & Lost Home Pages
Sometimes a search on a Google country-specific site (like Google UK) apparently wasn’t showing the home page of a web site if results were narrowed to a particular country. A fix for this is happening. Google Not Showing Localized Pages Within Country For .COM TLDs from Search Engine Roundtable covers more about what’s been happening with it.
Back To The Sex Blogs
OK, that covers what Matt talked about in his post. Be sure to read it, to get the full scoop from him. But what else have we got, Google index and ranking-wise? Let’s get back to the sex. Or the sex bloggers, that is.
Google Search Snafu Can Have Huge Impact on Niche Blogs from PBS MediaShift revisits the sex blog story I covered at the end of last month. It’s a nice, fresh look, most notable for finally giving us an official explanation of what happened:
This is relatively rare. In this case, a bug (plus an unfortunate side-effect that caused our internal tests to miss the problem) caused a very small number of sites (less than ten sites that we know of) to rank lower for about four days. We believe this situation is fully corrected and the very small number of impacted sites have been returned to their proper ranking. We’ve fixed the particular set of circumstances that led to this situation, and we’ve put changes in place to prevent it from happening again. We’ve also added more tests to our internal processes to spot similar problems to this in the future.
Sorry, Matt — that’s not quite adding up to me. I can see a bug hitting an individual site, but a bug that just happened to hit a small number of sex-related sites? It feels more like a bug or change in a ranking algorithm tied to sex-related queries that didn’t do what Google was hoping and impacted a small number of sites people care about. It might have reduced the rank of a bunch of other sex sites, perhaps in ways many people would want.
By the way, Growing Threat to Online Business – Becoming Collateral Damage in Google from Threadwatch also looks at the MediaShift article and starts a discussion on the impact that ranking shifts in Google can have on businesses.
CSS Crawling For Spam Detection?
Google Takes on Hidden Text got attention after hitting Digg, sparking concerns that this might be part of an attempt to track down the use of CSS to hide spam text. Maybe. Honestly, I thought Google had been spidering CSS files on the odd occasion anyway.
Is Google Sending GoogleBot CSS Hunting: Google Crawling CSS Files? from Search Engine Roundtable gives some background and scattered other reports. It’s on my to-do list to follow up on. The short answer is yes, this could potentially hurt some sites, if Google decides to get smarter about spotting CSS used for "poor man’s cloaking" of text. But then again, it could also wipe out a lot of sites that are completely unaware that hiding text with CSS might be bad. I’d say if Google starts doing this, it’s more likely to be an additional signal they’ll pick up rather than an immediate death sentence.
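As a rough illustration of what detecting this kind of hidden text might involve, here’s a naive Python sketch that flags inline styles where the text color matches the background color, or where the element is hidden outright. Everything here is my own assumption for illustration; a real crawler would have to resolve external stylesheets, computed styles, off-screen positioning and much more, and nothing below reflects how Google actually does it.

```python
import re

# Naive hidden-text heuristic for inline styles: flag display:none,
# visibility:hidden, or text color matching the background color.
# Purely illustrative; real detection would resolve full stylesheets.

HIDDEN_PATTERNS = [
    re.compile(r"display\s*:\s*none", re.I),
    re.compile(r"visibility\s*:\s*hidden", re.I),
]

def looks_hidden(style):
    if any(p.search(style) for p in HIDDEN_PATTERNS):
        return True
    # (?<!-) keeps "background-color" from matching as the text color.
    color = re.search(r"(?<!-)color\s*:\s*(#?\w+)", style, re.I)
    background = re.search(r"background(?:-color)?\s*:\s*(#?\w+)", style, re.I)
    if color and background:
        return color.group(1).lower() == background.group(1).lower()
    return False

print(looks_hidden("color: #fff; background: #fff"))  # True
print(looks_hidden("display:none"))                   # True
print(looks_hidden("color: #000; background: #fff"))  # False
```

Even this toy version shows why a signal-plus-review approach makes more sense than automatic penalties: matching colors can be perfectly legitimate (think text revealed on hover), which is exactly the "ignorant site owner" problem mentioned above.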
Duplicate Content Freakout Continues
If I never hear duplicate content raised as a concern again, I’d be only too happy. I think there’s too much unnecessary worry in this area. Now that people have learned to accept the idea of a sandbox, however they want to define it, it’s no longer the handy general-purpose excuse for why they might not be ranking well. So instead, let’s blame duplicate content!
I’m not making light of the worries entirely, just trying to provide some perspective. The fact that there are so many worries is a pretty good indicator that there’s some validity behind them.
Guide To Fixing Google Duplicate Content & Canonical URL Issues from Search Engine Roundtable covers a WebmasterWorld thread with some advice for those with issues, and More Tidbits on Google’s Duplicate Content Filter covers another thread with a ton of advice from Google’s Adam Lasnik.
I also highly recommend watching the video we mentioned last month, where Rand Fishkin of SEOmoz talks with Google’s Vanessa Fox about this issue and others. Informative, and funny as well. WebProNews is doing an awesome job with these videos.
30 Below In Google: Minus 30 Penalty
Updating Talk of the Google "Minus 30" Penalty from Search Engine Roundtable covers those at WebmasterWorld getting back into the idea that there’s a "Minus 30" penalty that, if imposed on a site, causes it to go from the first page to the third page or beyond in the results. I suppose Minus 3.0 is a better name than Sandbox 2.0. Anyway, it’s also on the to-do list for follow-up.
When Google Thinks You’re A Malware Site

What To Do When Google Tells People Your Website Is Dangerous? at TechDirt covers how a site accidentally got one of Google’s rare malware warnings and how turning to a blog outcry for help seemed the fastest way to solve the issue.
My AOL Has "Safest" Results & Free Results Safer Than Paid article from last month addressed this issue at the end, so let me highlight that now:
Google Webmaster Central recently rolled out new support to help site owners know if they are bad, bad, bad. Check out the information here on how to monitor this in the tools they provide and how to appeal, if you think you’ve been unfairly nabbed.
Of course, back to the story: the problem is that Google tells you to appeal to StopBadware.org, which in turn apparently takes 10 days to do a review. No wonder they went the blog outcry route.
FYI, more information about Google’s messages and how StopBadware.org is involved with them is covered in this FAQ at StopBadware. Google’s own help about it is here, and I’d like to see that help page updated to point to the specific FAQ at StopBadware.org.
What’s Your Problem?
So that’s what I’ve spotted over the past week. As I said, got issues you want addressed? Have more examples to highlight some of the points above? Leave a comment, then I’ll have that handy for my Google visits.