The Promise & Reality Of Mixing The Social Graph With Search Engines
I’m having a bad day. Aside from my desktop crashing, we get another spate of
“let’s blame SEO” to start my morning off. Robert Scoble uses that theme as
a launching pad for a
series of videos on how Facebook potentially could be
a killer search engine — regardless of the fact he seems to have no clue
that “social graph” or social networking mixing has been tried and abandoned
with search. Having watched his videos, which have sparked discussion,
I’ll do some debunking, some educating for those who want more history of
what’s been done in the area, plus I’ll swing around to that New York Times
article today that ascribes super-ranking powers to SEO.
Robert’s excited about “social graph search,” which is the idea that if
you know a network of people, you can use their connection to improve search
results. It’s a “revolution” coming in search that will overtake all the
major search engines, he says. Maybe, but it’s not like we haven’t heard
this before. I’ll go through his arguments, but it really feels like this is
more about getting attention to Robert’s videos, period.
Part 1 of Robert’s social graph video series starts off by telling us
that there’s no way we’d have gotten to his videos from a search engine.
That’s absurd. People write about what’s in video content all the time. Want
to see the Lazy Sunday video? Oh, look: it’s
number one on Google without Google needing to analyze the words inside the
video.
That’s the real point that Robert’s trying to make, of course — that
search engines typically don’t analyze all the words within a video in the
way they read the words in a web page. Want to understand more about that?
My Video Search
Challenge Isn’t Speech Recognition, It’s Content Owner Management post
from February explains this in more depth as well as why it really hasn’t
been an issue. In particular to Robert’s argument, it’s because there are
plenty of people who will reference what’s in the video content in more user
friendly and search engine friendly HTML text.
The meat of his first part is to talk about three different types of
search engines: crawlers (like Google), Techmeme and Mahalo to discuss how
they are or are not “SEO resistant,” as if SEO is a bad thing — you know,
that SEO equals spam.
SEO is not spam. It’s like saying email is spam. There’s email; there’s
email marketing; there’s email spam. These are all different things. You
want to better understand why SEO isn’t spam? Then read the posts below:
Virginia, SEO Is Rocket Science – Defending Search Engine Optimization
SEO, Yet Again!
- Why The SEO Folks Were Mad At You, Jason
- From My Inbox: More Defense Of SEO
- SEO: Real Skills That Can Protect Your Traffic
Want to be like Robert — and Jason Calacanis — and keep equating SEO
with spam? Then screw you.
I’ve had enough of people trying to advance their own personal agendas (Jason hoping someone will care about Mahalo; Robert hoping someone will watch his videos) on the back of an industry that is full of plenty of people who do good work.
[NOTE: Originally, I went with the F-word above. I was really tired; it was a bad day, as said, when I wrote this. And I wanted to make an impact. But the points in this article are important, and I refer to it constantly, so I dropped that in October 2010 so as not to possibly offend some readers].
Last week, I was part of a meeting at Google along with a number of
notable SEOs, being asked about ways Google could be better. This group
wasn’t pushing for Google to make it easier for them to spam the listings. A
chief concern they had was how Google (along with other major search
engines) continues to have difficulties identifying original source
documents. You know — you publish your blog post, then some other site with
more authority than you picks it up, and then that site gets the top
ranking. SEOs are leading the charge to help site owners get a fix for this
overall. But all people like Jason and Robert want to do is characterize
them as evil comment spammers for their own personal gain.
Back to the video. Robert goes through how search engines make use of “on
the page” factors, though he doesn’t call it that, and greatly simplifies the
process. But yes, search engines look at the frequency and location of words
on a page to determine if a page is relevant to a query.
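To make “frequency and location” concrete, here is a minimal sketch of on-the-page scoring. It is a toy, not how any real search engine works: the function names, the two locations (title and body) and the weights are all assumptions picked for illustration.

```python
import re
from collections import Counter

# Hypothetical weights: a query word in the title counts for more
# than the same word in the body. Real engines tune far more signals.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def on_page_score(query, title, body):
    """Score a page by how often, and where, the query words appear."""
    title_counts = Counter(tokenize(title))
    body_tokens = tokenize(body)
    body_counts = Counter(body_tokens)
    body_length = max(len(body_tokens), 1)  # avoid dividing by zero

    score = 0.0
    for term in tokenize(query):
        # Location: each title hit gets the larger weight.
        score += TITLE_WEIGHT * title_counts[term]
        # Frequency: body hits are normalized by body length.
        score += BODY_WEIGHT * body_counts[term] / body_length
    return score
```

A page titled “HDTV Buying Guide” outscores a page that only mentions HDTV once in passing, which is the whole idea. It is also why on-the-page factors alone are easy to game, and why link analysis came along.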
Robert then explains that PageRank is also used, using incorrect
shorthand for link analysis that’s part of “off the page” factors that
search engines use to rank pages — looking at the quality and the context
of links to a page. My
What Is Google
PageRank? A Guide For Searchers & Webmasters post from April goes into
more depth about what exactly PageRank is and how it is not the same as link
analysis. Give it a read, Robert.
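For those who want to see the difference, here is a minimal sketch of the textbook PageRank calculation. It assumes a tiny made-up link graph, the commonly cited damping factor of 0.85 and a fixed number of iterations; it is not Google’s production system. Notice what it leaves out: it counts who links to whom but knows nothing about anchor text or the context of a link, which is why PageRank is only one piece of link analysis.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

With `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up with the highest score because two pages link to it, and `b` ends up lowest because nothing links to it.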
Next, we get the news that paid links are hard for Google to tell apart
from “real” links. Actually, Robert says Google can’t tell the difference
between them. In reality, it can easily identify many types of paid links.
But not all of them.
Apparently, the SEO community feels it’s its “birthright” to stick paid
links into pages. Actually, Robert — there’s disagreement about that within the SEO
community, and the bigger audience that feels it has a birthright to do what
it wants with paid links are the content owners themselves. SEOs aren’t
selling the links; they’re buying the real estate that others are selling.
Robert then shifts gears to Techmeme and how, in his view, news won’t get
on the site until someone starts to blog about it. Um, yeah — that’s part
of the “meme” part of Techmeme. New stuff can (and does) hit the site pretty
quickly, too, since important blogs catch things fast and talk about it.
FYI, Q&A With Gabe
Rivera, Creator Of Techmeme from me in January talks about Techmeme and
how it works in more depth.
Techmeme is described as an SEO resistant site. Sure, in the sense that
you’re dealing with a smaller source list. For that reason, Google News is
more resistant. Any vertical or specialized search engine that deals with a
subset of sites is SEO resistant (or more correctly, spam resistant).
Mahalo comes up next and how by using a small number of human editors, it
can be harder to spam. Sure. So’s the Yahoo
Directory. You remember the Yahoo Directory, right? It used, um, a small
number of human editors to categorize the web. Advances in crawler-based
search engines meant you could get really good relevancy and be spam
resistant, which caused the Yahoo Directory to effectively be abandoned by
Yahoo. Mahalo’s approach to custom-tailor the most popular searches is
interesting — but despite heaps and heaps of publicity the new service has
had showered upon it, it still hasn’t gained any real traction. Mahalo
Launches With Human-Crafted Search Results from me in May describes the
service in more depth.
Finally, Robert turns to Facebook, talking about the “social
graph” term that’s now being bandied about as this month’s new Kool-Aid to
drink. I’m being harsh — there’s obvious value in being able to look at the
connections between people and form a ranking mechanism that can be applied to
things. SocialRank, PeopleRank — whatever you want to call it. The idea at the
moment is that Facebook especially has it, so everyone else better look out.
That leads to Part 2,
which covers how adding the social graph to existing search technologies will really
change the search game. Wow. Wow! I mean, yeah, never heard that before.
Personalized search? The idea
has been that by knowing some things about you, a search engine might refine
your results to make them more relevant. A teenager searching for music might
get different matches than a senior citizen. A man looking for flowers might see
different listings than a woman.
Eurekster’s twist on this concept is to provide personalized
results based not on who you are but who you know. Friends, colleagues and
anyone in your Eurekster social network will influence the type of results you see.
The potential of using your friends or colleagues is enormous.
Imagine Eurekster being used by all the employees of a medical research firm,
where many might do similar medical-related queries. With Eurekster, all the
employees can be linked together and benefit from the searches and selections
made by their colleagues.
Libraries are another institution that might latch on to the
Eurekster concept. Librarians are constantly asked by patrons for assistance.
Eurekster would allow librarians to collaborate invisibly with each other and
share what they’ve found to be the best for various queries.
There are downsides. Not all of my friends have the same
interests as me. In addition, as my social network grows — because my friends
invite their friends and so on — commonalities that are useful get diluted.
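The “who you know” idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the shape of the click data and the `boost` value are hypothetical, and nothing here reflects how Eurekster actually implemented it.

```python
from collections import Counter


def rerank_with_friends(results, friend_clicks, boost=1.0):
    """Re-order search results using what people in your network clicked.

    `results` is a list of URLs in the engine's original order.
    `friend_clicks` maps a friend's name to the URLs that friend selected.
    """
    clicks = Counter()
    for urls in friend_clicks.values():
        clicks.update(set(urls))  # each friend counts once per URL

    scored = []
    for position, url in enumerate(results):
        base = len(results) - position  # higher original rank, higher base
        # -position breaks ties in favor of the original ordering.
        scored.append((base + boost * clicks[url], -position, url))

    scored.sort(reverse=True)
    return [url for _, _, url in scored]
```

If three colleagues all picked the third result, it moves to the top. The downside mentioned above shows up here too: every new friend of a friend adds clicks to `friend_clicks`, whether or not that person shares your interests.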
Eurekster is still out there, but the idea of a network of friends
influencing search results seems to have died at some point over the years.
That’s not the revolution Robert was promising us.
Well, maybe Eurekster had no luck with that particular model since the
company was small. Well, Yahoo’s not small.
And in June 2005, it rolled out Yahoo My Web 2.0, which promised to bring social
networks into search. As I wrote at the time:
After seeing what was planned, I remarked to Yahoo senior vice
president of search Jeff Weiner sitting next to me that they were building “an
eBay for knowledge.” Jeff was already literally bouncing at times with
excitement in showing the new system, and the remark made him smile even more.
He smiled because that’s exactly the Yahoo goal. My Web is
Yahoo’s community rating system for information. Just as you buy things on eBay
depending on ratings to know if you’ll trust a seller, My Web is what Yahoo
hopes will help you choose more wisely the information you receive, whether you
actively check reviews, contribute or remain an ordinary searcher who completely
ignores the tagging and social search components.
In short, Yahoo’s not banking on tagging — the categorization
of material — as a way to help people find things better. It’s banking that the
mere act of saving things at all, even without tags, will give them a clue about
what are trusted pages across the web. By looking at patterns of saving, Yahoo
will have trust networks to tap into….
We’ve had a generation of search engines that depended on
on-the-page factors such as word location and frequency. We’ve had a current
second generation that tapped into link analysis, looking at how people are
linking and what they say in links.
Personal search is that third generational jump, and Yahoo’s
flavor of personal search is a social network one that it hopes will improve
relevancy in web wide results in the way that link analysis helped drive back
spam and improve relevancy years ago.
“We’re creating personal anchor text for pages, but by having
a trust network, we can actually pretty much eliminate spamming,” Walther said.
Guess what? Still no revolution. The masses didn’t descend upon Yahoo My Web to
form networks and save search results. In fact, Yahoo pretty much pulled back
from the product, even
dropping inline integration with it from regular search results.
But before Robert gets into applying social networks to search, he prattles
on about Mahalo again being so superior to Google. Oddly for a video, he doesn’t
show us any of the search results pages he’s talking about, saying at one point
he can’t show us them. Apparently the camera he uses can’t be swiveled to show a
screen. C’mon, Robert — if we’re investing the time (over a half-hour) to watch
your video, make use of the medium.
I mean seriously, you might want to ding Google for deciding Amazon deserves
to have a top listing for HDTV over all the other types of information that
could be shown (especially if you’re one of the millions who use Google from
outside the United States). But if you ding it for that, what’s up with Mahalo’s
supposedly great human method of also deciding Amazon deserves a top spot?
Mahalo lists more than a top seven list, of course — you get a chunk of
review sites, manufacturers, retailers and so on. But here’s the deal on Mahalo
— it’s not really a search engine. The page it provides is good human crafted
content, a good destination page like you might find at some of the other
destination pages that Google lists. Mahalo — as Jason Calacanis himself will
tell you — is a great place to start searching if your searches involve very
popular queries. But if you want to hit those “search tail” terms that people
always encounter? It’s not going to help.
My desktop computer crashed today just when I got back from a trip, and since
it’s likely to be down for a day or two, I decided to start using a new Vista
laptop I purchased until I can buy a Mac replacement (heh — well, maybe not).
But ZoneAlarm for Vista doesn’t block http referrer information as it does on
Windows XP. That led me to do a tail search — block referrer plugin for firefox — something that’s only going to
happen a few times per month, relatively speaking. Try it on
Google, then on
Mahalo. Plenty of good solutions right at the top on Google. On Mahalo, I
have to wade through four “related” links that aren’t relevant (Ryan Block,
Sunscreen, Netscape and Jet), then I get Google results.
Robert also tells us that Mahalo rocks because you know, the first thing you
do in the search process for HDTVs is want to know the manufacturers. Bull.
First of all, no one can predict how someone else will go through a search
process, so that’s bull strike one. But if I want to play magic mindreader like
Robert, I’ll say the first thing people want are some guides to HDTVs. What is
an HDTV? Is 720p enough to make a TV HD quality? Does it have to have HDMI?
Saying you first go to a manufacturer site in the process is like saying that
if you want to buy a new car, just go visit some car dealerships. Me — I go to
Consumer Reports, figure out the cars I might want from a third party trusted
resource, understand the jargon I might encounter, and then I go to the
dealership. And as someone who bought an HDTV last year, I also remember going
to the horrible manufacturer web sites where they often provided only sparse
info about their own products and certainly didn’t compare them to competing
products.
We then learn from Robert that Google can’t change to be like Mahalo because
it has algorithms that are “stuck in sand, stuck in cement” and shifting will
Insane. Seriously, like you want to scream stop talking. Robert’s a
personally likable guy, but watching him make statements like this is like
watching someone driving a car full speed toward a concrete wall while yelling
“It’ll be OK — we’ll get through.”
Google has constantly changed its ranking algorithm over the years and will
continue doing so. If Robert knew any SEOs, they’d tell him this firsthand. But
more to the point, Google can’t change the ranking algorithm to be more like
Mahalo because Mahalo isn’t using an algorithm to rank web pages — it’s using
human editors. Maybe Google someday might get an algorithm to mimic much of what
Mahalo does, but that still wouldn’t be the same as using actual humans.
So it is impossible for Google to change! Maybe the algorithm, but Google
could easily hire editors of their own, pay them more than Jason does and do
what Mahalo is doing if that model takes off — which, so far, hasn’t happened.
Robert then jumps into the idea that Google also can’t integrate social
networking into its algorithms, pointing to Google’s largely failed
Orkut social networking site as an example. He
completely overlooks the fact that Google is playing the human/social aspect on
a different level — personalized search, where results are refined based on
what you as an individual seem to like. That’s a major shift for Google, and
it’s also one that I’ve found personally compelling. For more on the service, see:
- Google Ramps Up
- Just Behave: Google’s Marissa Mayer on Personalized Search
- Google Search History Expands, Becomes Web History
In particular, Google has been talking about how personalized search allows
personalized PageRank (and see
here for a patent look), a way where rankings revolve around what you
personally like. It’s not a hard leap to extend that into a “social network PageRank”
model, where if you define a social network, the collective interests of that
network could be used to model the rankings. Google’s not doing that now, but to
suggest that the mechanisms are somehow impossible from either a company attitude
or technological model is simply being ignorant of Google.
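To show how short that leap is, here is a rough sketch of a “social network PageRank.” It uses the standard personalized PageRank idea, where the random jump lands on pages a person likes rather than on any page, and then simply averages the preferences of everyone in a network. The function names, the preference weights and the averaging step are my assumptions for illustration. This is not something Google has said it does.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=50):
    """PageRank where the random jump follows a preference vector.

    `links` maps each page to the pages it links to.
    `preference` maps liked pages to positive weights.
    """
    total = sum(preference.values())
    if total <= 0:
        raise ValueError("preference needs at least one positive weight")

    pages = set(links) | set(preference)
    for targets in links.values():
        pages.update(targets)

    teleport = {page: preference.get(page, 0.0) / total for page in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) * teleport[page] for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links hands its score back via the preferences.
                for other in pages:
                    new_rank[other] += damping * rank[page] * teleport[other]
        rank = new_rank
    return rank


def network_preference(member_preferences):
    """Average the normalized preferences of every member of a network."""
    combined = {}
    members = len(member_preferences)
    for prefs in member_preferences:
        total = sum(prefs.values())
        for page, weight in prefs.items():
            combined[page] = combined.get(page, 0.0) + weight / total / members
    return combined
```

Feed `network_preference([...])` into `personalized_pagerank` and the rankings tilt toward whatever the group collectively likes. The same caveat as before applies: a network full of “friends” who aren’t really friends tilts the rankings just as effectively.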
Finally — halfway into part two of his video, 23 minutes of covering all
that’s “wrong” with some existing players, Robert unveils how social might be
blended into Facebook, giving you the impression this is simply a “please hire
me” pitch to Facebook itself.
First step from Robert, use old-style on-the-page ranking. Yeah, there’s a
waste of time.
Here’s a thought — why not just license an existing search engine period? I
mean, how do you search the web when on the Facebook site itself? You don’t. The
Facebook search box only searches within Facebook (and despite
claims from Facebook
itself that it is some type of fantastic people search engine, I’ve found the
search less than compelling).
Facebook has partnered with Microsoft, so it’s somewhat amazing (if not telling) that
there’s no ability to search using Microsoft’s Live Search. Hit
MySpace, in contrast, and you’ll see how
the Google partnership has Google web search over there.
Facebook doesn’t need to build on-the-page ranking from scratch, not to
mention the nightmare situation of trying to crawl billions of pages.
Next, Robert gets into the social network and trust aspect. Sure, it’s an idea
exactly like what Eurekster and Yahoo already promised. There’s nothing new
here. Well, there is. As Facebook has grown, we’ve also had frustration grow —
including the famed
Facebook Bankruptcy that Jason Calacanis declared last month.
People have friends on Facebook who aren’t friends at all. It’s just easier
to accept them. Robert, at the time of this writing, has 4,875 “friends” in the
system. Really — he knows all of these people? And wants all of them
influencing his searches?
Ah — but see, Facebook knows how to “lock out” the SEOs, Robert tells us, so
he’s not overwhelmed by noise. Sure — but on the flipside, Robert is one of the
top FBOs out there, Facebook Optimizers, to the degree people have been
complaining about how his
activities dominate the news updates that Facebook sends out to those with him
as a friend.
There will be more FBOs, no doubt about that. Any system that has lots of
traffic will attract people who will study ways to tap into that traffic. That’s
good and bad. It’s good in that since it’s going to happen, you want people to
learn appropriate ways
to do this. It’s bad in that there will no doubt be spamming that comes along
with it.
For such hype about his video, I was pretty much left with an “is that it?”
response. Facebook will get pages, then look at a social network and
hopefully get those people to proactively rank pages when they search. That’s despite
the fact that the Yahoo My Web experience tells me people don’t want to build
search results. They just want to search.
Social network data applied to search does have promise. But to assume that
social networks can’t be spammed and lack noise is foolish. To assume that
people want to participate in actively shaping results is also mistaken, in my
view. To also assume that major players like Google or Yahoo can’t tap into
things that make Mahalo, Techmeme or Facebook good is shortsighted. Yahoo
Answers is akin to Mahalo. Google News is Techmeme across multiple subjects.
By the way, Robert, if you’re tired of the SEO “noise” you think screws up
results, then do this. In a search for
scoble, for well
over a year now, you’ve crowded out variety in the results by not redirecting
your scoble.weblogs.com address to your new home at scobleizer.com. That means
you get both results 1 & 2 for your new place as well as results 3 & 4. You also
have no content at scobelizer.wordpress.com, plus another version of your old
place at radio.weblogs.com.
You have contacts with the Weblogs folks — getting a redirect should be easy
for you. You can kill or block that WordPress site that you no longer use. I
assume you maintain these other sites simply so that when people search for you by name, you
crowd out anyone else from ranking well, perhaps people who might disagree with
you on topics. That’s an aspect of SEO — it’s a tactic used as part of public
relations in SEO. It introduces the same “noise” into the results that you
cheered about not being present in Mahalo. So clean it up or cut it out with
the SEO slams. You’re doing SEO yourself.
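For anyone wondering what “getting a redirect” involves, here is a minimal sketch of a permanent (301) redirect using Python’s standard library. It is illustrative only: in practice this is usually a line or two of web server configuration, and the `http://` scheme, the port and the function names here are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# New home named in the post; the scheme is an assumption.
NEW_HOME = "http://scobleizer.com"


def redirect_target(path, new_home=NEW_HOME):
    """Map a path on the old site to the same path on the new site."""
    return new_home.rstrip("/") + "/" + path.lstrip("/")


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every request with a 301 pointing at the new site."""

    def do_GET(self):
        self.send_response(301)  # 301 = moved permanently
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_HEAD = do_GET


def serve(port=8080):
    """Run the redirect server until interrupted (hypothetical port)."""
    HTTPServer(("", port), PermanentRedirect).serve_forever()
```

Calling `serve()` on the old host would send visitors and crawlers from every old URL to the matching path on the new site, so the search engines can fold the duplicate listings into one.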
What about that New York Times article I mentioned?
When Bad News Follows You is the article, another amazing “SEO sucks”
story. The New York Times has opened its archives for crawling, which apparently
is causing people to come forward at the not so astounding rate of one person
per day complaining about articles casting them in a bad light. Blame SEO:
Technically complex, search engine optimization pushes
Times content to or near the top of search results, regardless of its
importance or accuracy.
Wow, seriously — did I just read that in the New York Times? SEO just shoves
whatever crud it wants to the top of Google. Hey, think there are some SEOs out
there that would like to rank for “new york times”? I guess they just need to
SEO up some pages and they get there. Not.
Geez. The rest of the article does some hand-wringing on what to do to make
the “right” articles appear at the top of the results (why not just sprinkle
some SEO fairy dust on them?).
Insane. If an article is factually incorrect, then correct it. If the article
casts someone in a negative light and a later article comes out
updating the story, link prominently from the top of the negative article to the
latest version of the story. It’s called online journalism in the 2000s.
Postscript: I purposely haven’t read any of the other
commentary on Robert’s post until I could brain dump my own thoughts. Since
then, I read Rand Fishkin’s
Used to Respect Robert Scoble’s Opinion post, and he does a great job of poking back at the SEO attack as
well as debunking Robert’s ideas on that Mahalo-Google search shoot-out I
mentioned.
Google and search from Dave Winer points out that spam is not the problem
that both Jason and Robert like to paint it as. There’s spam, but there are lots
and lots of great results, too.
Why Google Should be Scared of Facebook from Dare Obasanjo highlights that
Facebook’s wall around its content is a threat to Google, but that’s a wall I
think will get ripped down sooner rather than later, if only when Facebook
decides it needs to show more ads on those pages and so needs more traffic.
Techmeme has much more