January 2007 Update On Google Indexing & Ranking Issues

Google’s Matt Cutts has issued a weather report on Google indexing and ranking issues, which I’ll get into below. In addition, I’ll recap a number of Google concerns that I’ve noted over the past two or three weeks. I’ve been saving these to follow up on when I’m out at Google later this month.

I’m meeting up with Matt and others from the search quality team in Mountain View, then with Vanessa Fox and others from Google Webmaster Central up in the Kirkland / Seattle office. So expect to hear back. And if you’ve got things you’d like me to ask about, note them in the comments below.

Now onward to news of visible PageRank updates, supplemental listings, filetype command changes, lost home pages in country-specific results, an update on the disappearing sex blogs, concerns that CSS is being spidered to detect hidden text, duplicate content worries, the Minus 30 penalty and what to do if Google thinks you are a malware site.

Visible PageRank Updated

Scores shown in the Google Toolbar and elsewhere are currently being updated. That shouldn’t impact search rankings, since Matt says these scores have already been in use there. Google Toolbar PageRank Update Being Reported from Search Engine Roundtable covers some of the changes that have been noted, and Google Pagerank update or outage from Dave Naylor shows how scores from different data centers are all over the place.

FYI, I’ve always told people not to worry about PageRank scores, that they focus too much on them. To underscore this, it occurs to me I don’t even know the PageRank score for Search Engine Land. So I fired up the Google Toolbar and found we’ve got a zero. Google’s "measure of our importance" is 0/10.

Despite this, my monthly stats recap covers how Google is sending us plenty of traffic. That will grow as our content, link reputation and site reputation all grow.

Here’s a good example of that. Search for [sex blogs]. Out on the third page of results, you’ll see the Digg article about my own article, Of Disappearing Sex Blogs & Google Updates. But my article is nowhere to be found. Seriously. Check out:

sex blogs site:searchengineland.com

At the end of the results, you’ll see:

In order to show you the most relevant results, we have omitted some entries very similar to the 20 already displayed.
If you like, you can repeat the search with the omitted results included.

If I use that link, then I finally see the page. That helps me know it is in the index. It’s just not ranking. Even going 100 pages into a search on [sex blogs], it doesn’t rank, while the Digg page about the actual article does.

What’s going on? My view is that Digg is a long-established site with far more reputation and trust than Search Engine Land has. As I said, that will change over time. While I wait, it’s fair to say I’m not watching that PageRank meter and worrying about when it will change. I’ll know long before then that the site is more trusted based on how my traffic from Google increases.

Still, I think if Google’s going to offer the meter, the scores ought to match the same PageRank scores currently in the index. The idea of it being potentially a full quarter behind just doesn’t feel right.

Supplemental Reassurance

Matt tries to calm people who fret that they’ve got pages in Google’s supplemental results. And yes, sites will do well even if they have some of their pages in the supplemental index. And yes, supplemental pages can rank and produce traffic. But c’mon. Matt already notes that pages in the supplemental index are those deemed less important from a PageRank perspective by Google. He says the freshness of these pages is set to improve, but they’re still not as fresh as other pages. Bottom line: pages in there aren’t as important. People understand this, and it’s no wonder they get concerned about showing up there.

I’ve hated the supplemental index since it was launched back in 2003. As I wrote then:

Using a supplemental index may be new for Google, but it’s old to the search engine industry. Inktomi did the same thing in the past, rolling out what became known as the small "Best Of The Web" and larger "Rest Of The Web" indexes in June 2000.

It was a terrible, terrible system. Horrible. As a search expert, you never seemed to know which of Inktomi’s partners was hitting all of its information or only the popular Best Of The Web index. As for consumers, well, forget it — they had no clue.

It also doesn’t sound reassuring to say, "we’ll check the good stuff first, then the other stuff only if we need to." What if some good stuff for whatever reason is in the second index? That’s a fear some searchers had in the past — and it will remain with Google’s revival of this system.

Why not simply expand the existing Google index, rather than go to a two-tier approach?

"The supplemental is simply a new Google experiment. As you know we’re always trying new and different ways to provide high quality search results," said Google spokesperson Nate Tyler.

OK, it’s new, it’s experimental – but Google also says there are currently no plans to eventually integrate it into the main index.

Basically, the supplemental index is a way for Google to hit less important pages in specific instances when it can’t find matches in the main index. Trying to search against tens of billions of pages all at once is time-consuming and expensive. Far easier to hit just the "best of the web," exactly as Inktomi used to do — and for exactly the same reasons. But it’s a continuing reminder that Google can’t do it all. No matter how great those machines are, they have to divide up that index. The "best of the web" might still be tens of billions of pages, but divisions still raise concerns.
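
To make the two-tier idea concrete, here’s a rough Python sketch of the lookup order being described, purely as an illustration. The main_index and supplemental_index structures are hypothetical stand-ins; this is not how Google actually implements things.

    # Illustrative only: a toy version of the two-tier lookup described above,
    # with hypothetical dictionaries standing in for the two index tiers.
    def search(query, main_index, supplemental_index, wanted=10):
        """Hit the smaller 'best of the web' tier first; fall back to the
        larger, less fresh supplemental tier only if too few pages match."""
        results = list(main_index.get(query, []))
        if len(results) < wanted:
            # Supplemental pages get consulted only when the main tier
            # can't satisfy the query, which is why pages stuck there tend
            # to surface mainly on obscure searches.
            results += supplemental_index.get(query, [])
        return results[:wanted]

    main = {"popular query": ["page%d.example.com" % i for i in range(12)]}
    supplemental = {"popular query": ["less-important.example.com/deep-page"],
                    "obscure query": ["less-important.example.com/other-page"]}

    print(search("popular query", main, supplemental))   # supplemental tier never touched
    print(search("obscure query", main, supplemental))   # only the supplemental tier matches

The point of the sketch is simply that whatever lands in the second tier only gets looked at when the first tier comes up short, which is exactly the fear described above.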

Need some more advice on the supplemental index? Matt McGee recently had an article you might look at, Breaking Out of Google’s Supplemental Index.

Filetype Command To Change

Want to know all the .doc documents in Google? This command won’t help you:

filetype:doc

That’s because Google insists on the command also having a search query term, like this:

filetype:doc cars

This looks to change in the near future, says Matt. When it does, expect mini size comparisons to begin once again. Ugh. I hate those types of size wars.

Country-Specific Results & Lost Home Pages

Sometimes a search on a Google country-specific site (like Google UK) apparently wasn’t showing the home page of a web site if results were narrowed to a particular country. A fix for this is happening. Google Not Showing Localized Pages Within Country For .COM TLDs from Search Engine Roundtable covers more about what’s been happening with it.

Back To The Sex Blogs

OK, that covers what Matt talked about in his post. Be sure you read it, to get the full scoop from him. But what else have we got, Google index and ranking-wise? Let’s get back to the sex, or the sex bloggers, that is.

Google Search Snafu Can Have Huge Impact on Niche Blogs from PBS MediaShift revisits the sex blog story I already covered at the end of last month. It’s a nice, fresh look, most notable for finally giving us an official explanation of what happened:

This is relatively rare. In this case, a bug (plus an unfortunate side-effect that caused our internal tests to miss the problem) caused a very small number of sites (less than ten sites that we know of) to rank lower for about four days. We believe this situation is fully corrected and the very small number of impacted sites have been returned to their proper ranking. We’ve fixed the particular set of circumstances that led to this situation, and we’ve put changes in place to prevent it from happening again. We’ve also added more tests to our internal processes to spot similar problems to this in the future.

Sorry, Matt — that’s not quite adding up to me. I can see a bug hitting an individual site, but a bug that just happened to hit a small number of sex-related sites? It feels more like a bug or change to a ranking algorithm tied to sex-related queries that didn’t do what Google was hoping and impacted a small number of sites people care about. It might have reduced the rank of a bunch of other sex sites, perhaps in ways many people would want.

By the way, Growing Threat to Online Business – Becoming Collateral Damage in Google from Threadwatch also looks at the MediaShift article and starts a discussion on the impact that ranking shifts in Google can have on businesses.

CSS Crawling For Spam Detection?

Google Takes on Hidden Text got attention after hitting Digg, sparking concerns that this might be part of an attempt to track down the use of CSS to hide spam text. Maybe. Honestly, I thought Google had been spidering CSS files on the odd occasion anyway.

Is Google Sending GoogleBot CSS Hunting: Google Crawling CSS Files? from Search Engine Roundtable gives some background and scattered other reports. It’s on my to-do list to follow up on. The short answer is yes, this could potentially hurt some sites if Google decides to get smarter about looking at CSS used for "poor man’s cloaking" of text. But then again, it could also wipe out a lot of sites that are completely unaware that hiding text with CSS might be bad. I’d say if Google starts doing this, it’s more likely to be an additional signal they’ll pick up rather than an immediate death sentence.
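
For illustration only, here’s the sort of hidden-text check a crawler could run against a stylesheet if it did parse CSS, sketched in Python. The patterns are my own assumptions, not anything Google has described, and as noted above, legitimate uses such as dropdown menus would trip the very same rules.

    import re

    # Illustrative sketch: flag CSS declarations commonly used to hide text.
    # The pattern list is an assumption for demonstration, not Google's method.
    SUSPECT_PATTERNS = [
        r"display\s*:\s*none",
        r"visibility\s*:\s*hidden",
        r"text-indent\s*:\s*-\d{3,}px",   # text pushed far off-screen
        r"font-size\s*:\s*0",
    ]

    def suspicious_declarations(css_text):
        """Return the declarations in a stylesheet that match hide-text patterns."""
        found = []
        for pattern in SUSPECT_PATTERNS:
            found += re.findall(pattern, css_text, flags=re.IGNORECASE)
        return found

    stylesheet = """
    .keywords { display: none; }        /* hidden keyword block */
    #nav .submenu { display: none; }    /* legitimate dropdown menu */
    .promo { text-indent: -9999px; }
    """
    print(suspicious_declarations(stylesheet))

Note that the keyword block and the legitimate menu match the same rule, which is why something like this would more plausibly feed into further review than act as an automatic penalty.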

Duplicate Content Freakout Continues

If I never hear duplicate content raised as a concern again, I’d be only too happy. I think there’s likely too much unnecessary worry in this area. Now that people have learned to accept the idea of a sandbox, however they want to define it, it’s no longer the handy general-purpose excuse for why they might not be ranking well. So instead, let’s blame duplicate content!

I’m not making light of the worries entirely, just trying to provide some perspective. The fact that there are so many worries is a pretty good indicator there’s some validity behind them.

Guide To Fixing Google Duplicate Content & Canonical URL Issues from Search Engine Roundtable covers a WebmasterWorld thread with some advice for those with issues, and More Tidbits on Google’s Duplicate Content Filter covers another thread with a ton of advice from Google’s Adam Lasnik.
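
Much of the advice in those threads comes down to collapsing the many URL variants a site can serve for the same page onto one canonical URL. Here’s a minimal Python sketch of that kind of normalization; the particular rules (force www, strip index files and session parameters) are illustrative assumptions rather than any standard.

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    # Illustrative sketch: collapse common URL variants onto one canonical form.
    # The specific rules below are assumptions chosen for demonstration.
    DROP_PARAMS = {"sessionid", "sid", "ref"}

    def canonicalize(url):
        scheme, host, path, query, _ = urlsplit(url)
        host = host.lower()
        if not host.startswith("www."):
            host = "www." + host                    # pick one hostname and stick to it
        if path.endswith(("/index.html", "/index.php")):
            path = path.rsplit("/", 1)[0] + "/"     # default document -> directory URL
        if not path:
            path = "/"
        params = [(k, v) for k, v in parse_qsl(query) if k not in DROP_PARAMS]
        return urlunsplit((scheme.lower(), host, path, urlencode(params), ""))

    for variant in ("http://Example.com/index.html?sid=123",
                    "http://www.example.com/?ref=home",
                    "http://www.example.com/"):
        print(canonicalize(variant))   # all three collapse to http://www.example.com/

On the server side, the usual companion advice is to 301 redirect the variants to the canonical form, so search engines only ever see one URL per page.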

I also highly recommend watching the video we mentioned last month, where Rand Fishkin of SEOmoz talks with Google’s Vanessa Fox on this issue and others. Informative, and funny as well. WebProNews is doing an awesome job with these videos.

30 Below In Google: Minus 30 Penalty

Updating Talk of the Google "Minus 30" Penalty from Search Engine Roundtable covers those at WebmasterWorld getting back into the idea that there’s a "Minus 30" penalty that, if imposed on a site, causes it to go from the first page to the third page or beyond in results. I suppose Minus 3.0 is a better name than Sandbox 2.0. Anyway, it’s also on the to-do list for follow-up.

Malware Reporting

What To Do When Google Tells People Your Website Is Dangerous? at TechDirt covers how a site accidentally got one of Google’s rare malware warnings and how turning to a blog outcry for help seemed the fastest way to solve the issue.

My AOL Has "Safest" Results & Free Results Safer Than Paid article from last month addressed this issue at the end, so let me highlight that now:

Google Webmaster Central recently rolled out new support to help site owners know if they are bad, bad, bad. Check out the information here on how to monitor this in the tools they provide and how to appeal, if you think you’ve been unfairly nabbed.

Of course, back to the story, the problem was that Google tells you to appeal to StopBadware.org, which in turn apparently takes 10 days for review. No wonder they went the blog outcry route.

FYI, more information about Google’s messages and how StopBadware.org is involved with them is covered at StopBadware in this FAQ. Help about it from Google is here, and I’d like to see that help page get updated to point to the specific FAQ at StopBadware.org.

What’s Your Problem?

So that’s what I’ve spotted over the past week. As I said, got issues you want addressed? Have more examples to highlight some of the points above? Leave a comment, then I’ll have that handy for my Google visits.

About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook, Google+ and microblogs on Twitter as @dannysullivan.

  • http://seoblog.intrapromote.com/ Erik

    Danny,

    What you describe about SEL / Digg is really common, and I guess you’re seeing the other side of it. We had plenty of posts that SEW either covered or linked to in passing, and it nearly ALWAYS outranked our post. In the end, we just comforted ourselves by saying, “well, at least people are getting the info from somewhere, right?”

    No, it didn’t make us feel better either… ;-)

  • http://brianmseo.blogspot.com Brian M

    Hi Danny,

    It would be great to have some way to let Google know that there was a problem, other than leaving posts in forums.

    How about a “Contact Us” link in Webmaster tools so we could let them know when we see a problem or don’t understand why our home page has been dropped from the index (that also happened here in the US this week). We can report Spam, or request re-inclusion if our site has violated their guidelines, but there is no place to say, “Help!”

    Yes, they would probably be flooded with requests, but it would be better than the convoluted way that information is getting passed around.

  • http://www.wolf-howl.com graywolf

    Darn, I was going to blog about digg “beating” you on the sex blog thing. I came across it last night when putting the threadwatch story together.

  • http://seo-theory.blogspot.com/ Michael Martinez

    I think part of the Duplicate Content Freakout is due to the old school SEOs trotting out ‘duplicate content goes into the Supplemental Index’ responses in knee-jerk fashion every time someone complains about their pages going Supplemental.

    If people would just stop and do some actual analysis or ask for more information before providing explanations in the SEO forums, the signal-to-noise ratio would improve tremendously.

    Even many admins and moderators don’t bother to get enough facts before handing out irrelevant stock answers. So the SEO advisory community is very much to blame for many of the misunderstandings about duplicate content simply because the “experts” are very dismissive of anyone whose content has gone Supplemental.

  • http://sethf.com/ Seth Finkelstein

    Take a look at the SERPS now for starts-with-vee-and-rhymes-with-”Niagra”.

    How is this reconciled with Matt Cutts’ statement:
    http://blog.outer-court.com/archive/2006-08-03-n29.html

    “And the fact is, we don’t really have much in the way to say, this is a link from the ODP, or from .gov, or .edu, to give that some sort of special boost. It’s just those sites tend to have higher PageRank, because more people link to them (and reputable people link to them).”

    While it may be all coincidence or trends in spamming, it sure looks like .edu sites have TrustRank bonuses.

  • http://jambecorp.blogspot.com James

    >CSS Crawling

    What I’m interested in is how Google handles text hidden using css for other reasons. Particularly ajax type situations, or people who use CSS/Javascript to show and hide menus…

    Otherwise finding hidden text seems like a good idea to me :)

  • http://seo-theory.blogspot.com/ Michael Martinez

    Seth Finkelstein wrote: “While it may be all coincidence or trends in spamming, it sure looks like .edu sites have TrustRank bonuses.”

    It looks more like people have built up trust in those pages through inbound linkage because the domains themselves have not been lumped in with so-called “bad neighborhoods”.

    Trust is a self-managing mechanism. All the search engines can do is pick a standard for measuring trust and apply it. Each standard has some flaws, of course.

    BTW — as Danny noted here on SearchEngineLand (and elsewhere), “TrustRank” is a Yahoo! term. Google uses “Trust Filters” (per Matt Cutts on his blog).

  • http://linkleecher.com Tom Churm

    Hi Danny,

    I discovered your radio shows about three months ago now via WebMasterRadio.fm and have become a regular listener and subscriber to your sites’ feeds.

    >>What’s your problem?

    OK, since you asked…

    My problem is I have a site being penalized: (churm.com – full link not added to avoid giving you any kind of penalty) – it’s out of the index. In Google’s WebMaster Tools I receive no message stating that I’m being penalized, perhaps because it’s an affiliate site and Google thinks it’s therefore evil?

    I believe my site adds content by offering RSS feeds for Amazon products and searches – which Amazon does not. Also I believe that because my site offers a more minimal layout than Amazon it is easier for at least some people to use.

    But several reinclusion requests sent off over a period of several months have resulted in not a single response back from Google…and a site not found in Google is, these days, a dead site.

    Is there anything else I can do, or do I have to just write off the favorite domain in my portfolio as unusable?

    Thanks,

    Tom

  • http://searchengineland.com/070111-100415.php core3

    RE: “Country-Specific Results & Lost Home Pages”.
    I’m glad to learn that Matt Cutts has promised to look into this problem, although his comments so far seem to indicate doubts that it IS a problem. I just want to add confirmation from my own experience in the UK that it comes up frequently.

    British companies with sites hosted in the UK, and with plenty of evidence in site content that describes them as a UK-based or UK-only business, who choose the .com domain (rather than co.uk) can be listed in the top ten default results of google.co.uk but not found at all when the same search is performed using the “pages from the UK” option offered by google.co.uk. Companies choosing the co.uk domain for their sites are guaranteed an appropriate ranking in the “pages from the UK” results.

    Otherwise, however, there is little difference between Google’s default results and its “pages from the UK” results, at least for users located in the UK, and it’s hard to understand what purpose is served by offering the two options at all. If the default results offered a ‘world view’ by ignoring the user’s (UK) location and then a (very different) UK-centric view via the “pages from the UK” option, it would make some sense. There is a strong case for making the default results at google.co.uk “pages from the UK” – perhaps with an option to choose a wider-world view as an alternative… in other words, reversing the present situation, but at the same time making the results that are not “pages from the UK” more of a real alternative.

  • http://linkleecher.com Tom Churm

    Hi Danny,

    Maybe I’m being a little naive here, but you asked us for questions that you would pose to Matt…

    …Then you visited him and came back, but we never found out if our questions were presented to him or not.

    I have a serious problem with my site being delisted and don’t know anyone at Google who I can ask for help.

    Also repeated reinclusion requests have resulted in no response from Google.

    Is there any way I can get help with my aforementioned problem?

    Thanks and I’m sorry to bother you with this,

    Tom
