“Researchers Track Down a Plague of Fake Web Pages” from the NY Times reports on a recent Microsoft Research paper named “Spam Double-Funnel: Connecting Web Spammers with Advertisers” [PDF download] (also note that Gary linked to it on Friday).
The report categorizes search result spam by industry category, showing that some search categories have spam rates of 30% or more. Here is a chart covering various categories:
Read this from section 4.0:
In late September 2006, we submitted the 1,000 keywords to the Search Ranger system, which retrieved the top-50 results from all three major search engines. In total, we collected 101,585 unique URLs from 1,000x50x3=150,000 search results. With a set of approximately 500 known-spammer redirection domains and AdSense IDs at that time, the system identified 12,635 unique spam URLs, which accounted for 11.6% of all the top-50 appearances. (The actual redirection-spam density should be higher because some of the doorway pages had been deactivated, which were no longer causing URL redirections when we scanned them.)
The NY Times summarizes the paper, saying the researchers “discovered that the average spam density — a measure of the percentage of Web pages that contain only advertisements — was 11 percent for 1,000 keywords they used in their research.”
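To make the “spam density” measure a bit more concrete, here is a minimal Python sketch under stated assumptions. It counts every search result appearance (one keyword, one engine, one rank) as a slot, the way the paper's 1,000x50x3=150,000 figure does, and divides the number of slots holding a flagged URL by the total. That is why 12,635 unique spam URLs can map to a different percentage of appearances. The `SearchResult` record, the category labels and the flagged-URL set are all hypothetical; the real Search Ranger system flags spam by tracing redirections to known spammer domains and AdSense IDs, which this sketch does not attempt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import AbstractSet, Dict, Sequence

# Totals described in section 4.0 of the paper.
KEYWORDS, TOP_N, ENGINES = 1_000, 50, 3


@dataclass(frozen=True)
class SearchResult:
    """One appearance of a URL in a result page (hypothetical record)."""
    keyword: str
    engine: str
    rank: int
    url: str
    category: str  # hypothetical industry label attached to the keyword


def spam_density(results: Sequence[SearchResult], spam_urls: AbstractSet[str]) -> float:
    """Share of appearances (not unique URLs) whose URL is in the flagged set."""
    if not results:
        return 0.0
    spam_hits = sum(1 for r in results if r.url in spam_urls)
    return spam_hits / len(results)


def density_by_category(
    results: Sequence[SearchResult], spam_urls: AbstractSet[str]
) -> Dict[str, float]:
    """Per-category spam density, as in a chart of spam rate by industry."""
    totals: Counter = Counter()
    spam: Counter = Counter()
    for r in results:
        totals[r.category] += 1
        if r.url in spam_urls:
            spam[r.category] += 1
    return {cat: spam[cat] / totals[cat] for cat in totals}


if __name__ == "__main__":
    print(KEYWORDS * TOP_N * ENGINES)  # 150000 result slots
    # Hypothetical sample: one flagged URL appearing on two engines.
    sample = [
        SearchResult("kw-1", "engine-a", 1, "http://spam.example/a", "cat-x"),
        SearchResult("kw-1", "engine-b", 1, "http://spam.example/a", "cat-x"),
        SearchResult("kw-1", "engine-c", 1, "http://legit.example/", "cat-x"),
        SearchResult("kw-2", "engine-a", 1, "http://other.example/", "cat-y"),
    ]
    flagged = {"http://spam.example/a"}
    print(spam_density(sample, flagged))  # 0.5
    print(density_by_category(sample, flagged))
```

In the sample, a single spam URL occupies two of four slots, so it contributes a density of 0.5 even though it is only one of three unique URLs.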
Here are some other references for you:
- Strider URL Tracer with Typo-Patrol from Microsoft Research
- Strider Typo-Patrol from Microsoft Research
- Typo Domain Spotting Tool & Domain Registration Stats from SEW Blog
- Google AdSense For Domains Program Overdue For Reform — And Yahoo & Microsoft Should Also Take Note from SEW Blog
- MS Research: Typo-Squatters Are Gaming Google from eWeek