

Aug. 2, 2007 at 12:58pm Eastern by Barry Schwartz

It's Not Just Google That Treats Underscores Like Dashes

Last week's news that Google is now treating underscores in URLs as word separators, as it does with hyphens, quickly spread through the SEO and webmaster communities. But what about the other search engines?

I immediately contacted them to find out how they treat underscores and hyphens, and the results are now in. Yahoo and Microsoft, the other two of the big three, confirmed that they treat underscores in URLs the same way as dashes (hyphens), and Ask.com has since said the same.

Let me step back and explain this a bit more.

Some SEOs believe that the keywords in a page's URL have a limited impact on how that page ranks in the search engines. So if you sold blue widgets and had a page at www.domain.com/blue-widgets.html, those keywords are sometimes perceived to help, all other ranking factors being equal.

In the past, Google treated hyphens but not underscores in a URL as a word separator. So in our example above, the blue-widgets part would be seen as two different words: blue & widgets.

If the URL used an underscore instead (blue_widgets), Google would have seen it as one single word: blue_widgets.
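To make the difference concrete, here is a minimal Python sketch of hyphen-only versus hyphen-and-underscore tokenization of a URL's file name. It assumes a deliberately simple rule (take the last path segment, drop the extension, split on a configurable set of separator characters); the function `tokenize_path` and the two separator sets are hypothetical, since none of the engines has published its actual tokenizer.

```python
import re
from urllib.parse import urlsplit

def tokenize_path(url: str, separators: str) -> list[str]:
    """Split the file name of a URL into words (hypothetical, simplified rule)."""
    path = urlsplit(url).path          # e.g. "/blue_widgets.html"
    name = path.rsplit("/", 1)[-1]     # last path segment: "blue_widgets.html"
    name = name.rsplit(".", 1)[0]      # drop the extension: "blue_widgets"
    pattern = "[" + re.escape(separators) + "]+"
    return [w for w in re.split(pattern, name.lower()) if w]

OLD_RULE = "-"    # assumption: hyphen only, like the older Google behaviour above
NEW_RULE = "-_"   # assumption: hyphen and underscore treated alike

print(tokenize_path("http://www.domain.com/blue-widgets.html", OLD_RULE))  # ['blue', 'widgets']
print(tokenize_path("http://www.domain.com/blue_widgets.html", OLD_RULE))  # ['blue_widgets']
print(tokenize_path("http://www.domain.com/blue_widgets.html", NEW_RULE))  # ['blue', 'widgets']
```

Under the old rule the underscore version stays one token; under the new rule both URLs yield the same two words.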

Now Google treats underscores the same way as hyphens. As for Microsoft, Ramez Naam told me:

We treat underscores as word separators in URLs. Always have.

Priyank Shanker Garg from Yahoo told me:

For URL tokenization (separating words in URLs), we treat dashes or underscores identically, but these are not our only tokens and we take a more general approach to finding words in URL.

I also asked Ask.com, but they've yet to send a reply.

Postscript: Peter Linsley of Ask.com has now responded. Ask.com treats underscores as word separators as well:

For the record, we also treat underscores as word separators in URLs.

Postscript: Google's Matt Cutts has since clarified that dashes and underscores are not yet treated the same; see our follow-up, Actually, Dashes Aren't The Same As Underscores Yet. We will keep you posted.




Reader Comments

Somehow I think that we are going to see the number 301 used more in the next month than it has been used in the past year...

So in our example above, www.domain.com/blue_widgets.html, would be seen as one word, or as "bluewidgets."
Almost, but...Google actually looked at it as "blue_widgets," including the underscore. See Matt Cutts' somewhat nerdy take on it from a couple of years ago.
Comment by jimbeetle [TypeKey Profile Page] | August 2, 2007 2:27 PM

jimbeetle, it should be noted that Google is constantly evolving, so Matt's earlier advice in this instance may no longer apply.

He's the one who revealed that Google is now treating underscores as word-breaks.

I'd guess that this change was made to improve results for the majority of users. There are lots of documents where "word1_word2" should be relevant to searches for "word1 word2", but relatively few users would type "word1_word2". So this change makes sense, and Google is king at doing what provides the best usability.

To the very few of us who continually conduct our own experiments, this has of course been known for some time; see Keywords in URLs and URLs (Update).

These experiments also show whether the keywords are actually indexed and whether concatenated keywords in URLs are recognized.

You can't beat testing a hypothesis yourself.... :)

- Michael
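Recognizing concatenated keywords (a URL like /bluewidgets.html, with no separator at all) is a harder problem than splitting on hyphens or underscores. Yahoo's comment above that it takes "a more general approach to finding words in URL" doesn't say how that works. One common textbook technique is dictionary-based word segmentation by dynamic programming, sketched below in Python under the assumption of a tiny hypothetical vocabulary; it illustrates the idea and is not any engine's known method.

```python
from typing import Optional

def segment(text: str, vocabulary: set) -> Optional[list]:
    """Split `text` into dictionary words using the fewest words, or return None."""
    n = len(text)
    # best[i] holds the shortest segmentation of text[:i], or None if there is none
    best: list = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            word = text[start:end]
            if prefix is not None and word in vocabulary:
                candidate = prefix + [word]
                # keep the segmentation with the fewest words
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]

VOCAB = {"blue", "red", "widget", "widgets"}  # hypothetical dictionary

print(segment("bluewidgets", VOCAB))  # ['blue', 'widgets']
print(segment("bluegadgets", VOCAB))  # None
```

Preferring the fewest words is a simple tie-breaker; a real system would more likely weight candidate splits by word frequency.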

This is not much in the way of news. As others have commented this tokenization has been around for quite some time.

The rule here: if you're in the search marketing game, you should have known this already.

jimbeetle, it should be noted that Google is constantly evolving, so Matt's earlier advice in this instance may no longer apply.

I was referring to the way G looked at "blue_widgets" in the past. It saw it as "blue_widgets" and not as Barry stated, "bluewidgets".

Comment by jimbeetle [TypeKey Profile Page] | August 3, 2007 10:36 AM

It's interesting how we all jump when Google finally makes this decision, yet Yahoo has always recognized the underscore as a separator. This came up at last year's PubCon: I specifically asked about underscores vs. dashes, and Matt responded that I should do nothing, as Google would implement the separation in the near future. Yahoo's Tim Meyer stated that it was already handled on their end. I'm not a fan of Yahoo, but I prefer to give credit where credit is due.

I suggest we all write a one-page blog post on how Yahoo beat Google in the Underscore War.

One reason to still opt for dashes instead of underscores is that there are likely second- and third-tier search engine sites that don't treat underscores as white-space characters. Having those less-important sites link to you is useful, and if they don't place your links on nice, semantically related pages, you don't get as much out of them.

Also, it should be noted that Google still gives different search results if you search for "blue_widgets" versus "blue widgets". They apparently give exact-match priority to searches including the underscores.

So, the functionality that Matt liked which included exact-match of terms including underscores is really still supported, while they now also do good relevancy for the multi-word search term cases.
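One way an engine could produce both behaviours described in this comment (an exact match for blue_widgets plus ordinary matches for blue widgets) is to keep each URL's raw underscore token alongside its split words and give a verbatim underscore match extra credit. The Python sketch below is purely hypothetical: the scoring scheme, the `EXACT_BONUS` value, and the function names are illustrative assumptions, not Google's published ranking.

```python
import re

EXACT_BONUS = 1.0  # hypothetical extra credit when an underscore term matches verbatim

def url_terms(url):
    """Return (raw_tokens, words) for a URL's file name (simplified rule)."""
    name = url.rsplit("/", 1)[-1].rsplit(".", 1)[0].lower()
    raw_tokens = {t for t in re.split(r"[\s-]+", name) if t}      # underscores kept
    words = {w for t in raw_tokens for w in t.split("_") if w}  # underscores split
    return raw_tokens, words

def score(query, url):
    raw_tokens, words = url_terms(url)
    total = 0.0
    for term in query.lower().split():
        # each word of the query term that appears in the URL counts once
        total += sum(1.0 for w in term.split("_") if w in words)
        # an underscore term found verbatim in the URL earns the exact-match bonus
        if "_" in term and term in raw_tokens:
            total += EXACT_BONUS
    return total

urls = ["http://www.domain.com/blue-widgets.html",
        "http://www.domain.com/blue_widgets.html"]
for q in ["blue widgets", "blue_widgets"]:
    print(q, [score(q, u) for u in urls])
# blue widgets [2.0, 2.0]
# blue_widgets [2.0, 3.0]
```

With this scheme a multi-word search treats both URLs alike, while an underscore search ranks the underscore URL first.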

Sorry to do this but Google is lying!

Google still treats underscores as one word!

You may not like the url as the example but the reason for the url is because Google and especially Adam Lasnik lie...

http://www.google.com/search?hl=en&q=adam_lasnik_the_google_drag_queen&btnG=Search

The cache date is July 30

http://72.14.235.104/search?q=cache:fDezWvI9tN8J:www.geekentertainment.tv/2007/07/18/dontcha-wish-your-cell-phone-was-hot-like-me/feed/+adam_lasnik_the_google_drag_queen&hl=en&ct=clnk&cd=1


Comments on: Dontcha Wish Your Cell Phone Was Hot Like Me? iPhone nay but Google yea! Check out Adam Lasnsnik The Google Drag Queen! http://www.igorthetroll.com/Adam_Lasnik_The_Google_Drag_Queen.jpg ...
www.geekentertainment.tv/2007/07/18/dontcha-wish-your-cell-phone-was-hot-like-me/feed/


This one shows the result for the keywords!
---------------------------------------------
But this one does not show!

http://www.google.com/search?q=adam+lasnik+the+google+drag+queen&hl=en&pwst=1&start=30&sa=N&filter=0

So this is a lie again and again by Google.

Thanks for posting the comments.

Looks like the blog is being taken down
http://www.geekentertainment.tv

Is it going the way of
http://www.threadwatch.org

Is big G now a bad G

I just spoke with Randfish at seomoz.org, and he told me, “Matt C. said to him that the underscore-equals-dash fix is new and will take time to propagate.”

Wag of a finger at Matt: he should have waited to announce the fix until the changes had taken place...

We need to check whether the change actually happens or whether this is more B.S. on Google's part.

Google has been a bad boy lately, lots of lies and hiding stuff.

Will fill you in on more hot stuff later...

Igor

I think Matt's statement could mean that the algo doesn't always treat - and _ the same, because _ is not permitted in domain names. Google has been rumored to parse URLs, so IMO the domain parser may simply remove the hyphen so that the same function can parse all domains. Then, when it encounters hyphens in a file or folder name, it replaces the - with a space.

There is also good reason to believe that if URLs are parsed, then / = ? ; . could also be used as delimiters for keywords, or for analyzing particular URI tokens within the parsed URL, i.e. the protocol, domain and TLD (possibly used in determining trust). IMO, there is no reason not to assume the characters listed above are either removed or replaced with a space. FWIW, making this assumption likely has little or no downside, and if it is correct it has a huge upside. Isn't that what we are paid to know and figure out? Doesn't that make any comment from Matt, though interesting, inconsequential to implementing the strategy? IME, analyzing and parsing URIs, the algo has to use this character list to delimit and analyze the URI tokens and keyword matches. When you actually try to do it with regex, it becomes apparent that these are the keys to parsing URIs for keyword matches.

Comment by WebmasterT [TypeKey Profile Page] | September 4, 2007 7:18 PM
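The parsing approach WebmasterT describes can be sketched in Python with the standard `urllib.parse.urlsplit` and a regex split. Everything beyond that is an assumption taken from the comment, not a known Google rule: hyphens in the domain are removed, while the path and query are split on / = ? ; . plus hyphen and underscore. The function name `parse_url` is hypothetical, and treating the last host label as the TLD is a simplification that mishandles suffixes such as co.uk.

```python
import re
from urllib.parse import urlsplit

# Delimiters proposed in the comment (/ = ? ; .) plus underscore and hyphen
PATH_DELIMITERS = r"[/=?;._-]+"

def parse_url(url: str) -> dict:
    """Break a URL into protocol, domain words, TLD and path words (hypothetical rules)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    # Comment's hypothesis: hyphens inside the domain are simply removed
    domain_words = [label.replace("-", "") for label in labels[:-1]]
    path_and_query = parts.path + ("?" + parts.query if parts.query else "")
    # ...while in the path and query, every delimiter becomes a word break
    path_words = [w for w in re.split(PATH_DELIMITERS, path_and_query.lower()) if w]
    return {
        "protocol": parts.scheme,
        "domain": domain_words,
        "tld": labels[-1] if host else "",  # simplification: last label only
        "path_words": path_words,
    }

print(parse_url("http://www.blue-widgets.com/shop/blue_widgets.html?color=navy;size=large"))
```

Here the domain yields "bluewidgets" as one word, while the path yields blue and widgets separately, which is the asymmetry the comment suggests might explain Matt's statement.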
