    January 2007 Update On Google Indexing & Ranking Issues

    Google’s Matt Cutts has issued a weather report on Google indexing and
    ranking issues, which I’ll get into below. In addition, I’ll recap a number
    of Google concerns that I’ve noted over the past two or three weeks. I’ve
    been saving these to follow up on when I’m out at Google later this month.

    I’m meeting up with Matt and others from the search quality team in Mountain
    View, then with Vanessa Fox and others from
    Google Webmaster Central up in
    the Kirkland / Seattle office. So expect to hear back. And if you’ve got things
    you’d like me to ask about, note them in the comments below.

    Now onward to news of visible PageRank updates, supplemental listings,
    filetype command changes, lost home pages in country-specific results, an update
    on the disappearing sex blogs, concerns that CSS is being spidered to detect
    hidden text, duplicate content worries, the Minus 30 penalty and what to do if
    Google thinks you are a malware site.

    Visible PageRank Updated

    Scores shown in the Google Toolbar and elsewhere are currently being updated.
    That shouldn’t impact search rankings, since Matt says these scores have already
    been in use there.

    Google Toolbar PageRank Update Being Reported from Search Engine Roundtable
    covers some of the changes that have been noted, and Google Pagerank update
    or outage from Dave Naylor has him showing how scores from different data
    centers are all over the place.

    FYI, I’ve always told people not to worry about PageRank scores, that they
    focus too much on them. To underscore this, it occurs to me I don’t even know
    the PageRank score for Search Engine Land. So I fired up the Google Toolbar
    and found we’ve got a zero. Google’s "measure of our importance" is 0/10.

    Despite this, my monthly stats recap covers how Google is sending us plenty
    of traffic. That will grow as our content, link reputation and site
    reputation grow.

    Here’s a good example of that. Search for [sex blogs]. Out on the third page
    of results, you’ll see the Digg article about my own article, Of
    Disappearing Sex Blogs & Google Updates. But my article is nowhere to be
    found. Seriously. Check out:


    sex blogs site:searchengineland.com

    At the end of the results, you’ll see:

    In order to show you the most relevant results, we have omitted some
    entries very similar to the 20 already displayed.
    If you like, you can repeat the search with the omitted results included.

    If I use that link, then I finally see the page. That helps me know it is in
    the index. It’s just not ranking.

    Even going 100 pages into a search on [sex blogs], it doesn’t rank, while the
    Digg page about the actual article does.

    What’s going on? My view is that Digg is a long established site with far
    more reputation and trust than Search Engine Land has. As I said, that will
    change over time. While I wait, fair to say, I’m not watching that PageRank
    meter and worrying about when it will change. I’ll know long before then that
    the site is more trusted based on how my traffic from Google increases.

    Still, I think if Google’s going to offer the meter, the scores ought to
    match the same PageRank scores currently in the index. The idea of it being
    potentially a full quarter behind just doesn’t feel right.

    Supplemental Reassurance

    Matt tries to calm people who fret that they’ve got pages in Google’s
    supplemental results. And yes, sites will do well even if they have some of
    their pages in the supplemental index. And yes, supplemental pages can rank and
    produce traffic. But c’mon. Matt already notes that pages in the supplemental
    index are those deemed less important from a PageRank perspective by Google. He
    says the freshness of these pages is set to improve, but they’re still not as
    fresh as other pages. Bottom line: pages in there aren’t as important. People
    understand this, and it’s no wonder they get concerned about showing up there.

    I’ve hated the supplemental index since it was launched back in 2003. As I
    wrote then:

    Using a supplemental index may be new for Google, but it’s old to the search
    engine industry. Inktomi did the same thing in the past, rolling out what
    became known as the small "Best Of The Web" and larger "Rest Of The Web"
    indexes in June 2000.

    It was a terrible, terrible
    system. Horrible. As a search expert, you never seemed to know which of
    Inktomi’s partners was hitting all of its information or only the popular Best
    Of The Web index. As for consumers, well, forget it — they had no clue.

    It also doesn’t sound
    reassuring to say, "we’ll check the good stuff first, then the other stuff
    only if we need to." What if some good stuff for whatever reason is in the
    second index? That’s a fear some searchers had in the past — and it will
    remain with Google’s revival of this system.

    Why not simply expand the
    existing Google index, rather than go to a two tier approach?

    "The supplemental is simply a
    new Google experiment. As you know we’re always trying new and different ways
    to provide high quality search results," said Google spokesperson Nate Tyler.

    OK, it’s new, it’s experimental
    — but Google also says there are currently no plans to eventually integrate
    it into the main index.

    Basically, the supplemental index is a way for Google to hit less important
    pages in specific instances when it can’t find matches in the main index. Trying
    to search against tens of billions of pages all at once is time consuming and
    expensive. Far easier to hit just the "best of the web," exactly as Inktomi used
    to do — and for exactly the same reasons. But it’s a continuing reminder that
    Google can’t do it all. No matter how great those machines are, they have to
    divide up that index. The "best of the web" might still be tens of billions of
    pages, but divisions still raise concerns.

    Need some more advice on the supplemental index? Matt McGee recently had an
    article you might look at, Breaking Out of Google’s Supplemental Index.

    Filetype Command To Change

    Want to know all the .doc documents in Google? This command won’t help you:


    filetype:doc

    That’s because Google insists on the command also having a search query term,
    like this:


    filetype:doc cars

    This looks to change in the near future, says Matt. When it does, expect
    mini size comparisons to begin once again. Ugh. I hate those types of size
    wars.

    Country-Specific Results & Lost Home Pages

    Sometimes a search on a Google country specific site (like Google UK)
    apparently wasn’t showing the home page of a web site if pages were narrowed
    down to a particular country. A fix for this is happening.

    Google Not Showing Localized Pages Within Country For .COM TLDs from Search
    Engine Roundtable covers more about what’s been happening with it.

    Back To The Sex Blogs

    OK, that covers what Matt talked about in his post. Be sure you read it to
    get the full scoop from him. But what else have we got, Google index and
    ranking-wise? Let’s get back to the sex, or the sex bloggers, that is.


    Google Search Snafu Can Have Huge Impact on Niche Blogs from PBS MediaShift
    revisits the sex blog story I already covered at the end of last month. It’s
    a nice, fresh look, most notable for finally giving us an official
    explanation of what happened:

    This is relatively rare. In this case, a bug (plus an unfortunate
    side-effect that caused our internal tests to miss the problem) caused a very
    small number of sites (less than ten sites that we know of) to rank lower for
    about four days. We believe this situation is fully corrected and the very
    small number of impacted sites have been returned to their proper ranking.
    We’ve fixed the particular set of circumstances that led to this situation,
    and we’ve put changes in place to prevent it from happening again. We’ve also
    added more tests to our internal processes to spot similar problems to this in
    the future.

    Sorry, Matt — that’s not quite adding up to me. I can see the bug hitting an
    individual site, but a bug that just happened to hit a small number of sex
    related sites? It feels more like a bug or change to a ranking algorithm tied to
    sex-related queries that didn’t do what Google was hoping and impacted a small
    number of sites people care about. It might have reduced the rank of a bunch of
    other sex sites, perhaps in ways many people would want.

    By the way, Growing Threat to Online Business – Becoming Collateral Damage in
    Google from Threadwatch also looks at the MediaShift article and starts a
    discussion on the impact ranking shifts in Google can have on businesses.

    CSS Crawling For Spam Detection?


    Google Takes on Hidden Text got attention after hitting Digg, raising
    concerns that this might be part of an attempt to track down usage of CSS to
    hide spam text. Maybe. Honestly, I thought Google had been spidering CSS
    files on the odd occasion anyway.

    Is Google Sending GoogleBot CSS Hunting: Google Crawling CSS Files?
    from Search Engine Roundtable gives some background and scattered other reports.
    It’s on my to do list to follow up on. Short answer is yes, potentially this
    could hurt some sites, if Google decides to get smarter looking at CSS used for
    "poor man’s cloaking" of text. But then again, it could also wipe out a lot of
    sites that are completely ignorant that hiding text with CSS might be bad. I’d
    say if Google starts doing this, it’s more likely to be an additional signal
    they’ll pick up rather than an immediate death sentence.

    Duplicate Content Freakout Continues

    If I never hear duplicate content raised as a concern again, I’d be only too
    happy. I think there’s likely too much unnecessary worry in this area. Now
    that people have learned to accept the idea of a sandbox, however they want
    to define that, it’s no longer the handy general purpose excuse for why they
    might not be ranking well. So instead, let’s blame duplicate content!

    I’m not making light of the worries entirely, just trying to
    provide some perspective. The fact there are so many worries is a pretty good
    indicator there’s some validity behind them.

    Guide To Fixing Google Duplicate Content & Canonical URL Issues from Search
    Engine Roundtable covers a WebmasterWorld thread with some advice for those
    with issues, and More Tidbits on Google’s Duplicate Content Filter covers
    another thread with a ton of advice from Google’s Adam Lasnik.

    I also highly recommend watching the video we mentioned last month where Rand
    Fishkin of SEOmoz talks with Google’s Vanessa Fox on the issue and other
    ones. Informative, plus funny as well. WebProNews is doing an awesome job
    with these videos.

    30 Below In Google: Minus 30 Penalty

    Updating Talk of the Google "Minus 30" Penalty
    from Search Engine Roundtable covers those at WebmasterWorld getting back into
    the idea that there’s a "Minus 30" penalty that, if imposed on a site, causes it
    to go from the first page to the third page or beyond in results. I suppose
    Minus 3.0 is a better name than Sandbox 2.0. Anyway, it’s also on the to do list
    for follow-up.

    Malware Reporting

    What To Do When Google Tells People Your Website Is Dangerous? at TechDirt
    covers how a site accidentally got one of Google’s rare malware warnings and
    how turning to a blog outcry for help seemed the fastest way to solve the
    issue.

    My AOL Has "Safest" Results & Free Results Safer Than Paid article from last
    month addressed this issue at the end, so let me highlight that now:

    Google Webmaster Central recently rolled out new support to help site owners
    know if they are bad, bad, bad. Check out the information here on how to
    monitor this in the tools they provide and how to appeal, if you think
    you’ve been unfairly nabbed.

    Of course, back to the story, the problem was that Google tells you to appeal
    to StopBadware.org, which in turn apparently takes 10 days for review. No wonder
    they went the blog outcry route.

    FYI, more information about Google messages and how StopBadware.org is
    involved with them is covered at StopBadware in this FAQ. Help about it from
    Google is here, and I’d like to see that help page get updated to point to
    the specific FAQ at StopBadware.org.

    What’s Your Problem?

    So that’s what I’ve spotted over the past few weeks. As I said, got issues
    you want addressed? Have more examples to highlight some of the points above?
    Leave a comment, then I’ll have that handy for my Google visits.



    About the Author

    Danny Sullivan
    Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.