January 2007 Update On Google Indexing & Ranking Issues

Danny Sullivan on January 11, 2007 at 10:04 am

Google’s Matt Cutts has issued a weather report on Google indexing and ranking
issues, which I’ll get into below. In addition, I’ll recap a number of Google
concerns that I’ve noted over the past two or three weeks. I’ve been saving
these to follow up on when I’m out at Google later this month.

I’m meeting up with Matt and others from the search quality team in Mountain
View, then with Vanessa Fox and others from
Google Webmaster Central up in
the Kirkland / Seattle office. So expect to hear back. And if you’ve got things
you’d like me to ask about, note them in the comments below.

Now onward to news of visible PageRank updates, supplemental listings,
filetype command changes, lost home pages in country-specific results, an update
on the disappearing sex blogs, concerns that CSS is being spidered to detect
hidden text, duplicate content worries, the Minus 30 penalty and what to do if
Google thinks you are a malware site.

Visible PageRank Updated

Scores shown in the Google Toolbar and elsewhere are currently being updated.
That shouldn’t impact search rankings, since Matt says these scores have already
been in use there.

Google Toolbar PageRank Update Being Reported from Search Engine Roundtable
covers some of the changes that have been noted, and Google Pagerank update or
outage from Dave Naylor has him showing how scores from different data centers
are all over the place.

FYI, I’ve always told people not to worry about PageRank scores, that they focus
too much on them. To underscore this, it occurs to me I don’t even know the PageRank
score for Search Engine Land. So I fired up the Google Toolbar and found we’ve
got a zero. Google’s "measure of our importance" is 0/10.

Despite this, my monthly stats recap covers how Google is sending us plenty of
traffic. That will grow as our content, link reputation and site reputation grow.

Here’s a good example of that. Search for [sex blogs]. Out on the third page of
results, you’ll see the Digg article about my own article, Of Disappearing Sex
Blogs & Google Updates. But my article is nowhere to be found. Seriously.
Check out:

sex blogs site:searchengineland.com

At the end of the results, you’ll see:

In order to show you the most relevant results, we have omitted some
entries very similar to the 20 already displayed.
If you like, you can repeat the search with the omitted results included.

If I use that link, then I finally see the page. That helps me know it is in
the index. It’s just not ranking.

Even going 100 pages into a search on [sex blogs], it doesn’t rank, while the
Digg page about the actual article does.

What’s going on? My view is that Digg is a long established site with far
more reputation and trust than Search Engine Land has. As I said, that will
change over time. While I wait, fair to say, I’m not watching that PageRank
meter and worrying about when it will change. I’ll know long before then that
the site is more trusted based on how my traffic from Google increases.

Still, I think if Google’s going to offer the meter, the scores ought to
match the same PageRank scores currently in the index. The idea of it being
potentially a full quarter behind just doesn’t feel right.

Supplemental Reassurance

Matt tries to calm people who fret that they’ve got pages in Google’s
supplemental results. And yes, sites will do well even if they have some of
their pages in the supplemental index. And yes, supplemental pages can rank and
produce traffic. But c’mon. Matt already notes that pages in the supplemental
index are those deemed less important from a PageRank perspective by Google. He
says the freshness of these pages is set to improve, but they’re still not as
fresh as other pages. Bottom line: pages in there aren’t as important. People
understand this, and it’s no wonder they get concerned about showing up there.

I’ve hated the supplemental index since it was launched back in 2003. As I
wrote then:

Using a supplemental index may be new for Google, but it’s old to the search
engine industry. Inktomi did the same thing in the past, rolling out what
became known as the small "Best Of The Web" and larger "Rest Of The Web"
indexes in June 2000.

It was a terrible, terrible
system. Horrible. As a search expert, you never seemed to know which of
Inktomi’s partners was hitting all of its information or only the popular Best
Of The Web index. As for consumers, well, forget it — they had no clue.

It also doesn’t sound
reassuring to say, "we’ll check the good stuff first, then the other stuff
only if we need to." What if some good stuff for whatever reason is in the
second index? That’s a fear some searchers had in the past — and it will
remain with Google’s revival of this system.

Why not simply expand the
existing Google index, rather than go to a two tier approach?

"The supplemental is simply a
new Google experiment. As you know we’re always trying new and different ways
to provide high quality search results," said Google spokesperson Nate Tyler.

OK, it’s new, it’s experimental
— but Google also says there are currently no plans to eventually integrate
it into the main index.

Basically, the supplemental index is a way for Google to hit less important
pages in specific instances when it can’t find matches in the main index. Trying
to search against tens of billions of pages all at once is time consuming and
expensive. Far easier to hit just the "best of the web," exactly as Inktomi used
to do — and for exactly the same reasons. But it’s a continuing reminder that
Google can’t do it all. No matter how great those machines are, they have to
divide up that index. The "best of the web" might still be tens of billions of
pages, but divisions still raise concerns.

Need some more advice on the supplemental index? Matt McGee recently had an
article you might look at, Breaking Out of Google’s Supplemental Index.

Filetype Command To Change

Want to know all the .doc documents in Google? This command won’t help you:

filetype:doc

That’s because Google insists on the command also having a search query term,
like this:

filetype:doc cars

This looks to change in the near future, says Matt. When it does, expect
mini-size comparisons to begin once again. Ugh. I hate those types of size
wars.
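
For anyone scripting these checks, here is a minimal Python sketch of how the two operator queries mentioned in this post could be assembled into search URLs. The helper names (filetype_query, search_url) are made up for illustration, and the refusal to build a bare filetype: query simply mirrors the current behavior described above. It is not anything Google publishes.

```python
from urllib.parse import urlencode


def filetype_query(extension, terms):
    """Build a filetype: query, insisting on an accompanying search term.

    Hypothetical helper: the check mirrors the current rule that a bare
    filetype: command will not return results.
    """
    terms = terms.strip()
    if not terms:
        raise ValueError("filetype: currently needs a search term alongside it")
    return f"filetype:{extension} {terms}"


def search_url(query):
    """Turn a query string into a Google search URL using the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The two example queries used in this post.
doc_search = search_url(filetype_query("doc", "cars"))
site_search = search_url("sex blogs site:searchengineland.com")
```

If the filetype change Matt describes does arrive, the term check in filetype_query would simply become unnecessary.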

Country-Specific Results & Lost Home Pages

Sometimes a search on a Google country specific site (like Google UK)
apparently wasn’t showing the home page of a web site if pages were narrowed
down to a particular country. A fix for this is happening.

Google Not Showing Localized Pages Within Country For .COM TLDs from Search
Engine Roundtable covers more about what’s been happening with it.

Back To The Sex Blogs

OK, that covers what Matt talked about in his post. Be sure you read it, to get
the full scoop from him. But what else have we got, Google index and
ranking-wise? Let’s get back to the sex, or the sex bloggers, that is.

Google Search Snafu Can Have Huge Impact on Niche Blogs from PBS MediaShift
revisits the sex blog story I already covered at the end of last month. It’s a
nice, fresh look, most notable for finally giving us an official explanation of
what happened:

This is relatively rare. In this case, a bug (plus an unfortunate
side-effect that caused our internal tests to miss the problem) caused a very
small number of sites (less than ten sites that we know of) to rank lower for
about four days. We believe this situation is fully corrected and the very
small number of impacted sites have been returned to their proper ranking.
We’ve fixed the particular set of circumstances that led to this situation,
and we’ve put changes in place to prevent it from happening again. We’ve also
added more tests to our internal processes to spot similar problems to this in
the future.

Sorry, Matt — that’s not quite adding up to me. I can see the bug hitting an
individual site, but a bug that just happened to hit a small number of sex
related sites? It feels more like a bug or change to a ranking algorithm tied to
sex-related queries that didn’t do what Google was hoping and impacted a small
number of sites people care about. It might have reduced the rank of a bunch of
other sex sites, and perhaps in ways many people would want.

By the way, Growing Threat to Online Business – Becoming Collateral Damage in
Google from Threadwatch also looks at the MediaShift article and starts a
discussion on the impact ranking shifts in Google can have on businesses.

CSS Crawling For Spam Detection?

Google Takes on Hidden Text got attention after hitting Digg, starting some
concerns this might be part of an attempt to track down usage of CSS to hide
spam text. Maybe. Honestly, I thought Google had been spidering CSS files on
the odd occasion anyway.

Is Google Sending GoogleBot CSS Hunting: Google Crawling CSS Files?
from Search Engine Roundtable gives some background and scattered other reports.
It’s on my to do list to follow up on. Short answer is yes, potentially this
could hurt some sites, if Google decides to get smarter looking at CSS used for
"poor man’s cloaking" of text. But then again, it could also wipe out a lot of
sites that are completely ignorant that hiding text with CSS might be bad. I’d
say if Google starts doing this, it’s more likely to be an additional signal
they’ll pick up rather than an immediate death sentence.
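
To see why a CSS check would work better as a signal than as a verdict, here is a rough Python sketch of a naive scanner that flags stylesheet rules capable of hiding text. This is purely an illustration. The function name, the patterns and the sample stylesheet are all made up, and nothing here reflects how Google actually examines CSS.

```python
import re

# Declarations commonly used to take text out of view (illustrative list only).
HIDING_PATTERNS = [
    re.compile(r"display\s*:\s*none", re.IGNORECASE),
    re.compile(r"visibility\s*:\s*hidden", re.IGNORECASE),
    re.compile(r"text-indent\s*:\s*-\d{3,}", re.IGNORECASE),
]

# Very rough "selector { declarations }" matcher; ignores nesting and comments.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def selectors_that_hide(css):
    """Return the selectors whose rules contain a hiding declaration."""
    flagged = []
    for selector, body in RULE.findall(css):
        if any(pattern.search(body) for pattern in HIDING_PATTERNS):
            flagged.append(selector.strip())
    return flagged


sample_css = """
.dropdown-menu { display: none; }
.keyword-stuffing { visibility: hidden; }
h1.logo { text-indent: -9999px; }
p { color: black; }
"""

flagged = selectors_that_hide(sample_css)
```

Note that the sample flags an ordinary dropdown menu and a common logo image-replacement trick right alongside the stuffed keywords. A scanner like this can’t tell intent, which is why hits would need to be weighed with other evidence.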

Duplicate Content Freakout Continues

If I never hear duplicate content raised as a concern again, I’d
only be too happy. I think there’s likely too much unnecessary worry in this
area. Now that people have learned to accept the idea of a sandbox, however
they want to define that, it’s no longer the handy general purpose excuse for
why they might not be ranking well. So instead, let’s blame duplicate content!

I’m not making light of the worries entirely, just trying to
provide some perspective. The fact there are so many worries is a pretty good
indicator there’s some validity behind them.

Guide To Fixing Google Duplicate Content & Canonical URL Issues from Search
Engine Roundtable covers a WebmasterWorld thread with some advice for those
with issues, and More Tidbits on Google’s Duplicate Content Filter covers
another thread with a ton of advice from Google’s Adam Lasnik.

I also highly recommend watching the video we mentioned last month where Rand
Fishkin of SEOmoz talks with Google’s Vanessa Fox on the issue and other ones.
Informative, plus funny as well. WebProNews is doing an awesome job with these
videos.

30 Below In Google: Minus 30 Penalty

Updating Talk of the Google "Minus 30" Penalty
from Search Engine Roundtable covers those at WebmasterWorld getting back into
the idea that there’s a "Minus 30" penalty that, if imposed on a site, causes it
to drop from the first page to the third page or beyond in results. I suppose
Minus 3.0 is a better name than Sandbox 2.0. Anyway, it’s also on the to do list
for follow-up.

Malware Reporting

What To Do When Google Tells People Your Website Is Dangerous? at TechDirt
covers how a site accidentally got one of Google’s rare malware warnings and
how turning to a blog outcry for help seemed the fastest way to solve the issue.

My AOL Has "Safest"
Results & Free Results Safer Than Paid article from last month addressed
this issue at the end, so let me highlight that now:

Google Webmaster Central recently rolled out new support to help site owners
know if they are bad, bad, bad. Check out the information here on how to
monitor this in the tools they provide and how to appeal, if you think you’ve
been unfairly nabbed.

Of course, back to the story, the problem was that Google tells you to appeal
to StopBadware.org, which in turn apparently takes 10 days for review. No wonder
they went the blog outcry route.

FYI, more information about Google messages and how StopBadware.org is
involved with them is covered at StopBadware in this FAQ. Help about it from
Google is here, and I’d like to see that help page get updated to point to the
specific FAQ at StopBadware.org.

What’s Your Problem?

So that’s what I’ve spotted over the past week. As I said, got issues you
want addressed? Have more examples to highlight some of the points above? Leave
a comment, then I’ll have that handy for my Google visits.
