Q&A With Gabe Rivera, Creator Of Techmeme
Over the past decade, I’ve seen a lot of search tools that were supposed to transform my life. Few of them have. But Techmeme is one of the few that did. When it kicked off back in September 2005, I wrote a review, gave it a preliminary thumbs-up and soon found myself addicted. It has become my newspaper, my front page guide to what’s going on in the blogosphere relating to tech.
I met Techmeme creator Gabe Rivera in person for the first time last August, during the Search Engine Strategies conference in San Jose. I roped him into being on the Meet The Blog & Feed Search Engines panel at the last minute, where he joined Technorati, Topix, Bloglines and Sphere. Gabe asked the audience if anyone had heard of Techmeme. Apparently, no hands went up — at least in the audience. All the panelists certainly had.
I love that anecdote, because I feel it largely reflects how Techmeme operates. The masses might have Digg, but perhaps the influencers have Techmeme. Certainly plenty of large, influential bloggers I know keep an eye on what it is covering. But I recommend it for anyone, not just influencers, for the easy way it organizes what’s happening with technology stories.
I caught up with Gabe last week to talk about the service, how it has grown, how it operates and what he plans for it.
Q. Do you feel Techmeme has a more early adopter or influencer audience?
Yes, and in particular a lot of influencers in the tech media are Techmeme readers. A good number of high profile bloggers and reporters reload Techmeme all day long in fact. One sign of which is citations of course, for example the “via” link included the other day on Engadget [link here] and so many other sites. I’ve also run into a number of VCs and executives who use it too, so I’m pleased with that aspect of the demographics.
The traffic is not huge. I still get fewer than 30,000 daily uniques [unique visitors per day]. But then these are actual loyal users, since I see very little search traffic, perhaps due to poor SEO as my permalinks don’t lead to individual articles.
[Note: Techmeme permalinks are little paper icons that show next to stories, marked as “Click This” in the illustration below:
To understand why the SEO is “poor,” compare how Techmeme lists the TechCrunch story about the Wikiseek launch to how Digg does it. Techmeme jumps you to where the story is mixed with others on different topics, diluting the overall content or theme of the page. Digg gives the story its own dedicated page, which often fills up with user-generated content from comments].
But I doubt even good SEO will help a lot. People who arrive from Google tend to be looking for primary documents, not an aggregated page that quotes primary documents, and therefore rarely return anyway.
Q. Have you considered the idea of creating a Techmeme widget or button of some type, similar to those offered by Del.icio.us or Digg, so people could easily point to how they are listed at Techmeme or tell people to see discussions on a topic, the way Technorati or Sphere do?
Given the number of sites I monitor, that would probably only help a small number of bloggers a small portion of the time. So it’s probably not the highest priority feature for me to work on.
Q. How about a way to enter a URL and see if it’s been on Techmeme, to see the related posts that are clustered with a main item? Or a way to keyword search for articles or subjects?
That would certainly be a welcome feature, useful for some. But I’m really inclined to offer something more powerful if I do add search. In the meantime Techmeme River [launched in December, more background here] actually enables search, if what you’re searching for appeared in the last 5 days.
Q. Personally, I’ve also wanted that ability to go back in time easily and see the “front pages.” I know how to do it, as I’ve written about, but I want it even easier.
I suppose you mean a simple, clickable interface in addition to the “Page version” text input I offer already. I hear a lot of requests for obviously useful features like that. But considering the cost/benefit, I believe there are better things for me to tackle for now. I’ll be sure to write that one down though!
Q. Is Techmeme an echo chamber, just showing blogs commenting about blogs commenting about blogs? Does Techmeme feed into that echo chamber? Or how do you break apart the conversations on a particular topic into sub-conversations or topics?
Clearly Techmeme creates superficial incentives for “echo chamber” participation, yet I don’t see clear evidence that this makes things noticeably worse. I still like to trot out the example of the day my site launched. eBay’s acquisition of Skype became one of those huge story clusters, and this was hours before Techmeme [then tech.memeorandum] was publicly launched, i.e. before anyone believed they could get on the site by linking to stories.
I’d also point out the idea of many headlines on a single major story is not a problem in and of itself. Consider that the iPhone unveiling will probably be one of the major stories for all of 2007. So on one day for it to account for 40 percent of the headlines on Techmeme is not all that out of whack.
A more serious problem is when multiple stories essentially say the same thing. Ideally no two posts should just be rewrites of the same facts. Unfortunately it’s hard to accomplish that with software. Now I do expect to introduce something that tends to reduce those story clusters in a smarter way, but it will take some time to do that right.
Q. This page shows how the Yahoo purchase of MyBlogLog broke the story up into various clusters. You’ve got news of the sale, but also a report on the sale price, Yahoo’s Jeremy Zawodny doing a personal welcome. How do you determine which of the many stories to break out into these sub-clusters?
It’s driven by what my software thinks is the most important story inside of a cluster, then ranked by what it believes are the most important stories that discuss that. On that page, it just thinks that Om Malik’s post was the second most important [after the headline story from the CESblog].
Q. Over time, those clusters change. Some stories rise; all of them eventually fall. How does that work?
All of this derives from this “importance” measure I just alluded to. Importance is determined by a number of factors. Citations can increase importance, so a post that accumulates inbound links can rise. Time is a factor as well. A headline that’s appeared on the page for most of the day loses importance. Headlines usually fall off the page when the time component swamps all other factors. That’s how old news gives way to newer news.
Q. Old meaning how long from when Techmeme first spotted a story, not how long since the story was actually written, right?
“Old” in terms of how long it’s appeared on Techmeme. Though how old an article is according to its publication date is also a factor.
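To make the interplay of citations and time concrete, here is a minimal sketch of a score that behaves the way Gabe describes. It is not Techmeme's actual formula, which he does not disclose. The logarithmic citation term, the two half-lives, the cutoff and every name in the code are assumptions chosen for illustration.

```python
import math

def importance(inbound_links, hours_on_page, hours_since_published,
               page_half_life=12.0, publish_half_life=24.0):
    """Hypothetical importance score (all constants are assumptions).

    Citations raise the score; time on the page and age since
    publication both lower it.
    """
    citation_term = math.log1p(inbound_links)
    page_decay = 0.5 ** (hours_on_page / page_half_life)
    publish_decay = 0.5 ** (hours_since_published / publish_half_life)
    return citation_term * page_decay * publish_decay

def front_page(headlines, min_score=0.1):
    """Rank headlines by score and drop those whose score has decayed away."""
    scored = [
        (importance(h["links"], h["hours_on_page"], h["hours_old"]), h["title"])
        for h in headlines
    ]
    return [title for score, title in sorted(scored, reverse=True)
            if score >= min_score]

headlines = [
    {"title": "fresh", "links": 3, "hours_on_page": 1, "hours_old": 2},
    {"title": "stale", "links": 50, "hours_on_page": 48, "hours_old": 48},
    {"title": "hot", "links": 20, "hours_on_page": 2, "hours_old": 3},
]
print(front_page(headlines))
```

In this toy run the heavily cited but two-day-old "stale" item falls off the page even though it has the most links, which is the "time component swamps all other factors" effect described above.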
Q. Last month, Robert Scoble was explaining how Techmeme doesn’t pick up a source just because it links to an item on Techmeme. It picks up a source because other sources on Techmeme link to that source. That’s correct, right? To get into Techmeme, you need someone already in it to link to you — just linking to them doesn’t help.
That’s right. I think Robert recognized that that guy had a Technorati-type service in mind and believed merely linking out would trigger inclusion. That is, Technorati includes you in a URL search provided you link to that URL. Techmeme doesn’t work that way.
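The difference between the two inclusion rules can be sketched in a few lines. This is only an illustration of the distinction drawn in the interview, not code from either service; the function names and the sample sites are hypothetical.

```python
def listed_by_linking_out(site, links, target):
    """Technorati-style rule as described above: linking to a target is enough."""
    return (site, target) in links

def picked_up_by_citation(site, links, monitored):
    """Techmeme-style rule as described above: a monitored source must link to you."""
    return any(src in monitored and dst == site for src, dst in links)

# Each pair is (linking site, linked site). All names are made up.
links = {
    ("newblog", "techcrunch"),    # newblog links out to a monitored source
    ("techcrunch", "otherblog"),  # a monitored source links to otherblog
}
monitored = {"techcrunch"}

print(listed_by_linking_out("newblog", links, "techcrunch"))
print(picked_up_by_citation("newblog", links, monitored))
print(picked_up_by_citation("otherblog", links, monitored))
```

Under the first rule "newblog" is listed simply because it linked out. Under the second it is not picked up, while "otherblog" is, because a source already being monitored cited it.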
Q. But you did have a post with tips on getting on Techmeme in November where one of the ways was indeed said to be linking to you.
Yes, so if a post on your site includes a Techmeme permalink, as opposed to linking to an article that Techmeme links to, and my system notices a moderate amount of traffic through that link, and in addition determines your site is not spammy and the referral is real, your site stands a much better chance of appearing under the Techmeme “Discussion” for the article.
This is a good way for a news organization without RSS feeds to enable an article to appear on Techmeme. Without inbound links and without an RSS feed, many article URLs are undiscoverable by my system, except through this mechanism.
Q. When you launched way back, you had to have seed sites, correct? A group of sites you started out with. Do you still use that list?
Yes, though the list has been modified since then. Since September 2005, all of my sites have utilized a source discovery process in which the majority of monitored sources are found through, yet not actually contained in, the seeding set.
Q. How many sources do you monitor?
It’s in the low thousands. But on any given day, new sources are added and dropped so the total monitored over time is much larger.
I believe a good automated news site doesn’t really need to monitor more than a few thousand sources at a time. Maybe even a few hundred will suffice. Of course the key is the intelligence of the thing that sifts through those sources.
A good analogy for Techmeme is “automated blogger”. And there are indeed bloggers who largely work like filters or routers who churn through lots of stuff and only post the most interesting bits. Instapundit is a classic example. These people can only check a few hundred blogs daily, even the ones who work extremely efficiently.
Q. I’m exactly like that. I monitor like 100 to 200 feeds, to build our search headlines for the SearchCap newsletter each day. And if I keep coming across a blog being mentioned by other blogs with good stuff, I eventually add them to my feeds.
Right. Now Techmeme’s not as smart as you, but it can run faster, and there’s a lot of tedious stuff it can do that you would never want to do. So Techmeme is kind of like a blogger, but with strengths and weaknesses that go with the automation.
Q. Why do sources get dropped? Do they fail to post new material? Fail to keep being cited?
Fail to keep being cited. Every day Techmeme performs a bit of a reset, usually around 3am Eastern, where it doesn’t update for about an hour as it repeats the source discovery. So every day it tries to find the best few thousand sources. A blog can make the list one day but not the next.
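A rough sketch of how such a daily reset could work, starting from a seed set and following only recent citations, is below. Gabe does not describe his discovery process in detail, so the citation counting, the number of expansion rounds, the cap and the choice to always keep the seeds are all assumptions, and the site names are invented.

```python
from collections import Counter

def discover_sources(seeds, recent_links, limit=3000, rounds=2):
    """Rebuild the monitored list from the seeds using only recent citations.

    recent_links holds (linking site, linked site) pairs. Only links that
    come from sites already on the list count, so linking out to a
    monitored site earns nothing. Keeping the seeds every day is an
    assumption of this sketch.
    """
    monitored = set(seeds)
    for _ in range(rounds):
        counts = Counter(
            dst for src, dst in recent_links
            if src in monitored and src != dst
        )
        # Most cited first; ties broken by name so the result is stable.
        ranked = sorted(counts, key=lambda site: (-counts[site], site))
        monitored = set(seeds) | set(ranked[:limit])
    return monitored

day_one = [
    ("seed", "bigblog"),
    ("seed", "bigblog"),
    ("bigblog", "smallblog"),  # smallblog is cited today
    ("spamblog", "seed"),      # linking to a monitored site does not help
]
day_two = [("seed", "bigblog")]  # nobody cites smallblog today

print(sorted(discover_sources({"seed"}, day_one)))
print(sorted(discover_sources({"seed"}, day_two)))
```

Because the list is rebuilt each day from recent links alone, "smallblog" makes it on day one and is gone on day two, matching the point that a blog can make the list one day but not the next.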
[NOTE: I’ve seen this personally. My personal blog Daggle, for example. Things that I post there, I never assume they’ll make Techmeme. But when I announced I was leaving Search Engine Watch, suddenly it got on and went to the top of the page for a bit. Links from elsewhere made it relevant for inclusion. Occasionally I see it make it for other things. But with Search Engine Land, we’re far more likely to be in on a regular basis, since I gather we say things that enough other people in Techmeme find interesting enough to link to regularly.]
Q. Is spamming much of an issue for you? Since just linking to a story doesn’t get you on Techmeme automatically, seems like that should reduce a lot of it.
Exactly. I designed the algorithms to discover the most consistently useful news sources, and as a result spam blogs almost never come up. It certainly helps that reputable blogs almost never link to spam blogs.
Q. Do you spider everything, the full text of posts? Or do you just read whatever’s in a feed?
If my robot sees an indication that there’s additional text on the article page, it’ll spider that page. I need to make it a little smarter so that it succeeds in this more reliably, but I’m most of the way there even now.
Q. Do you store the full text over time?
Yes. Though not readily retrievable by all parts of my system, the data is archived.
Q. Do you depend on feeds to know who is authoring a story?
I utilize both the feed and the main page. I found early on you need to visit both to accumulate all this metadata.
I think it’s going pretty well. The obvious test being that sponsors are renewing. I’m even sold out for the next few months. Now the approach I’ve taken tends to limit who can sponsor Techmeme, so I don’t have a network of thousands of potential advertisers like Google, but fortunately there are enough sponsors out there to fill Techmeme’s inventory.
Q. How is the Techmeme River going? Lots of use?
Not lots, though I wasn’t expecting lots. The river was intended just to cover some unusual use cases. For example, when you need to scan everything that’s happened over the past few days, or you want to search through recent Techmeme posts for certain authors or title keywords.
Q. How about a way to see the top things on Techmeme, the most popular items rather than the river, that just lists them by date? When I was at Popurls, I was surprised not to see the top stories from Techmeme shown right up there with the top picks from Digg, Delicious, Reddit and others. Perhaps a popular list would help?
What you want is basically the Techmeme RSS feed. It includes the top 20 to 30 items of the day. Perhaps I could create a starting page based only on these posts, but others have effectively done so already. For example, Original Signal includes Techmeme.
Q. Is Techmeme too elite with its sources?
Any selection of finite length is necessarily elitist. Now people often protest about Techmeme’s elitism, but these people are typically bloggers who don’t show up on Techmeme frequently. I don’t believe I’ve had anyone who doesn’t blog ask me to make it less elitist! Yet my readers are largely non-bloggers.
For better or for worse, well-read bloggers tend to have better access to interesting news, and also tend to exercise the talent that helped establish them in the first place. I’m rather unapologetic that there are lots of less established writers who will never show up on Techmeme.
Q. Do you hand remove or add sources?
If I believe a site is essential enough, I’ll add it to the seed set, and I do that occasionally for Techmeme.
Q. Does it make any sense for people to ask you for inclusion or does it make more sense for them to get the attention of A-listers already in Techmeme?
Getting the attention of bloggers has probably worked for many more people than has asking me. Trackbacks, emails, and comments can be very effective and require much less time than the actual reporting and writing. And appealing to “A-listers” isn’t even necessary. The “B” and “C” list also wield considerable power in the Techmeme ecosystem.
The trending looks good on WeSmirch, the celebrity gossip site, which apparently has decent word of mouth. Ballbug, the baseball site, is clearly the worst of the bunch trafficwise. The political site, memeorandum, is doing OK but could do better.
I think I want to further develop the technologies that improve the existing sites before introducing even more sites. Although 10 minutes of actual labor can lead to a new vertical, launching something truly compelling will depend on new technologies. Fortunately, these are the same technologies that will improve my existing sites.
Q. Ever been on Digg?
Techmeme has never made the front page or any category pages of Digg. I believe a few links have been submitted, but I suppose they were of little interest to the Digg community!
Q. Why not sell up or work for a larger company?
I’m not sure there’s a larger company that sees eye to eye with me about where all this is headed or where the value is and isn’t, to be perfectly vague. Probably the biggest reason though: I nap after lunch, unconditionally, and most big companies can’t accommodate this need nearly as well as my own.