I hear this question, in various forms, quite often these days:
“What’s up with Google now, post-Panda/Farmer/whaddayacallit? What am I supposed to be doing for SEO?”
(Usually accompanied by a deep sigh, aggressive hand gestures, and/or grimacing.)
It All Started With Caffeine
“Caffeine lets us index web pages on an enormous scale.” – Carrie Grimes, Google
Google rolled out Caffeine about a year ago. It was (and still is) unprecedented in search, and it was this infrastructure change that allowed for the dramatic algorithm improvements we’ve seen recently.
Caffeine was not an algorithm change but instead a massive improvement to the freshness of Google’s index and its ability to crawl and then index content nearly in real time.
But closely timed with these changes was the Mayday update, which specifically focused on returning quality results for long-tail queries. Ecommerce sites were impacted, as were any sites with an architecture built around item-level URLs standing on thin content and separated by several clicks from higher-authority pages (like home pages, major categories, or any URL with authority and unique content).
Then came Panda/Farmer. While Mayday appeared to hit a relatively small portion of the total query space, the latest version of Panda has a much stronger impact, hitting about 12% of all searches. As distinct from Mayday, which focused on long-tail quality and authority (penalizing shortcuts such as simply matching keywords to queries), Panda focuses on concepts such as quality, authority, trust and credibility, and also incorporates user signals.
So why does Caffeine matter so much? It seems that Caffeine, at least in part, has enabled these evolutions in the algorithm, through its ability to index such a massive portion of the web. Carrie Grimes from Google again:
“Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.”
Before URLs can be ranked appropriately, they must be crawled, fetched, and added to an index. Crawling, indexing, and ranking are distinct processes with their own sets of algorithms. Caffeine represents a new model in search: it created the largest modern-day index of the web so that Google can model the data accurately and rank pages based on content and social signals, as well as the PageRank signals Google has built its search engine upon.
What Does This Mean For SEO?
It has been common practice for many years to monitor overall site indexing in each of the major engines (mostly focusing on Google, naturally). Sites that weren’t being indexed deeply would need specific tactics to push that number up, and sites well indexed would be monitored closely to ensure that was sustained.
What’s different post-Panda is that indexing is no longer a viable metric or signal, simply because Google seems to want everything it can get in its index. A URL’s presence in the index no longer signals anything except that Google has the URL in its databases.
We’ve seen several large sites which were impacted by Panda, and in each case, indexation remained fairly flat while traffic from Google organic search plunged 50% or more.
In her piece on Google confirming Mayday impacts the long-tail, Vanessa Fox reported:
“I asked Google for more specifics and they told me that it was a rankings change, not a crawling or indexing change, which seems to imply that sites getting less traffic still have their pages indexed, but some of those pages are no longer ranking as highly as before.”
This is precisely what we’re seeing with Panda, as well.
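The pattern described above, flat indexation alongside a steep drop in Google organic traffic, is something you can check for in your own reporting data. The following is a minimal sketch, assuming you have before and after counts of indexed URLs and organic visits from your own tools. The `SiteSnapshot` fields, the function names, the 10% "flat" tolerance, and the sample numbers are all illustrative assumptions; only the 50% traffic drop comes from the observation above.

```python
from dataclasses import dataclass


@dataclass
class SiteSnapshot:
    """Hypothetical reporting snapshot for one site over one period."""
    indexed_urls: int     # indexed URL count from whatever source you track
    organic_visits: int   # Google organic visits over a comparable period


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (-0.5 means a 50% drop)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


def looks_like_rankings_change(
    before: SiteSnapshot,
    after: SiteSnapshot,
    flat_tolerance: float = 0.10,  # assumed: within +/-10% counts as flat
    traffic_drop: float = 0.50,    # a drop of 50% or more counts as a plunge
) -> bool:
    """True when indexation stayed roughly flat but organic traffic plunged."""
    index_delta = relative_change(before.indexed_urls, after.indexed_urls)
    traffic_delta = relative_change(before.organic_visits, after.organic_visits)
    return abs(index_delta) <= flat_tolerance and traffic_delta <= -traffic_drop


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    pre = SiteSnapshot(indexed_urls=1_200_000, organic_visits=400_000)
    post = SiteSnapshot(indexed_urls=1_180_000, organic_visits=180_000)
    print(looks_like_rankings_change(pre, post))
```

A result of `True` only says the numbers fit the "still indexed, no longer ranking" pattern. It does not identify the cause, so treat it as a prompt to look at content quality rather than at crawl or indexing problems.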
Recommended SEO Approach For Panda
While most of what works now has always worked, there is at least one important change.
The SEO model has changed with Panda in that, rather than getting as many URLs indexed as you can, you now want only your highest-quality, most important URLs indexed. Consistent signals should be sent as to which pages are most important:
- Decide which URLs are canonical and create strong signals (rel=canonical, robots exclusion, internal link profile, XML sitemaps)
- Decide which URLs are your most valuable and ensure they are indexed and well optimized
- Remove any extraneous, overhead, duplicate, low value and unnecessary URLs from the index
- Build internal links to canonical, high-value URLs from authority pages (example metrics: strong mozRank, unique referring domains, total links)
- Build high-quality external links via social media efforts
Pay special attention to the third item above. If your properties have low-quality or significantly duplicative content, it is best to remove those URLs from the indexes. Even a site with some high-quality content and lots of thin or low-quality content could see traffic deterioration because of Panda.
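The checklist above amounts to a triage of every URL on the site. The following is a minimal sketch of that triage, assuming you already have a crawl export with a word count and your chosen canonical URL for each page. The `Page` fields, the 250-word threshold, and the example.com URLs are hypothetical, and word count is only a crude stand-in for quality, since Panda's actual signals are not public. The sketch labels each URL as keep, canonicalize, or remove, then writes an XML sitemap containing only the keepers.

```python
from dataclasses import dataclass
from typing import Iterable
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical row from a crawl export."""
    url: str
    canonical_url: str  # the URL you decided should represent this content
    word_count: int     # crude, assumed proxy for content depth


def triage(page: Page, min_words: int = 250) -> str:
    """Return 'canonicalize', 'remove', or 'keep' for one page."""
    if page.url != page.canonical_url:
        return "canonicalize"  # duplicate: point rel=canonical at canonical_url
    if page.word_count < min_words:
        return "remove"        # thin: candidate for noindex or exclusion
    return "keep"


def build_sitemap(pages: Iterable[Page]) -> str:
    """Serialize a sitemap that lists only the pages triaged as 'keep'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if triage(page) == "keep":
            url_el = ET.SubElement(urlset, "url")
            ET.SubElement(url_el, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        Page("https://example.com/widgets", "https://example.com/widgets", 900),
        Page("https://example.com/widgets?sort=price", "https://example.com/widgets", 900),
        Page("https://example.com/widgets/blue-1234", "https://example.com/widgets/blue-1234", 40),
    ]
    for page in pages:
        print(triage(page), page.url)
    print(build_sitemap(pages))
```

Keeping the sitemap limited to the "keep" set is one way to make the signals consistent: the same URLs you link to internally and mark as canonical are the only ones you submit.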
The new SEO, at least as far as Panda is concerned, is about pushing your best-quality content and completely removing low-quality or overhead pages from the indexes. That means it’s no longer as easy to compete by simply producing pages at scale, unless they’re created with quality in mind. For some sites, SEO just got a whole lot harder.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.