Aug 2, 2007 at 12:58pm ET by Barry Schwartz
Last week’s news that Google is now treating underscores in URLs as word separators, as it does with hyphens, quickly spread through the SEO and webmaster communities. But what about the other search engines?
I immediately contacted them to find out how they treat underscores and hyphens. Finally, the results are in. Yahoo and Microsoft (and now also Ask.com), the other two of the big three, confirmed that they do treat underscores the same as dashes or hyphens in the URL.
Let me step back and explain this a bit more.
Some SEOs believe that the keywords in the URL of a page have some limited impact on the ranking of that page in the search engines. So if you sold blue widgets, and you had a page at www.domain.com/blue-widgets.html, those keywords are sometimes perceived to help, assuming all the other factors in ranking a page are equal.
In the past, Google treated hyphens but not underscores in a URL as a word separator. So in our example above, the blue-widgets part would be seen as two different words: blue & widgets.
If it were like this, blue_widgets, then Google would have seen it as one single word: blue_widgets.
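The difference can be illustrated with a minimal Python sketch. This is only an illustration of the two behaviors described above, not how any search engine actually implements tokenization; the function name `tokenize_slug` and its `separators` parameter are hypothetical.

```python
import re

def tokenize_slug(slug, separators="-"):
    """Split a URL slug into lowercase words on the given separator characters."""
    # Build a character class from the separators, e.g. "[\-_]+".
    pattern = "[" + re.escape(separators) + "]+"
    return [word for word in re.split(pattern, slug.lower()) if word]

# Hyphen-only splitting: the underscore version stays a single token.
print(tokenize_slug("blue-widgets"))        # ['blue', 'widgets']
print(tokenize_slug("blue_widgets"))        # ['blue_widgets']

# Hyphens and underscores both treated as word separators.
print(tokenize_slug("blue_widgets", "-_"))  # ['blue', 'widgets']
```

With hyphen-only splitting, a search for "blue widgets" has two matching words in the first URL and none in the second, which is the distinction the article is about.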
Now Google treats underscores the same way as hyphens. As for Microsoft, Ramez Naam told me:
We treat underscores as word separators in URLs. Always have.
Priyank Shanker Garg from Yahoo told me:
For URL tokenization (separating words in URLs), we treat dashes or underscores identically, but these are not our only tokens and we take a more general approach to finding words in URL.
I also asked Ask.com, but they’ve yet to send a reply.
Postscript: Peter Linsley of Ask.com has now given me a response. They also treat underscores as word separators:
For the record, we also treat underscores as word separators in URLs.
Postscript: We have an update from Google’s Matt Cutts: “Actually, Dashes Aren’t The Same As Underscores Yet.” We will keep you posted on this.
Somehow I think that we are going to see the number 301 used more in the next month than it has been used in the past year…
Almost, but…Google actually looked at it as “blue_widgets,” including the underscore. See Matt Cutts’ somewhat nerdy take on it from a couple of years ago.
jimbeetle, it should be noted that Google is constantly evolving, so Matt’s earlier advice in this instance may no longer apply.
He’s the one who revealed that Google is now treating underscores as word-breaks.
I’d guess that this change was made to improve performance for the majority of users. There are lots of documents where “word1_word2” should be made more relevant to user searches for “word1 word2”, but relatively few users who would type in “word1_word2”. So this change makes sense, and Google is king at doing what provides the best usability.
To the very few of us that continually conduct our own experiments this of course has been known for some time; Keywords in URLs and URLs (Update)
These experiments also show if the keywords are actually indexed and if concatenated keywords in urls are recognized.
You can’t beat testing a hypothesis yourself…. :)
- Michael
This is not much in the way of news. As others have commented this tokenization has been around for quite some time.
The rule here is if you’re in the search marketing game, you should have known this for quite some time.
I was referring to the way G looked at “blue_widgets” in the past. It saw it as “blue_widgets” and not as Barry stated, “bluewidgets”.
It’s interesting how we all jump when Google finally makes this decision, yet Yahoo has always recognized the underscore as a separator. This was raised at last year’s PubCon. I specifically asked the question regarding underscores vs. dashes, and Matt responded that I should do nothing, as Google would implement the separation in the near future. Yahoo’s Tim Meyer stated it was already recognized on their end. I’m not a fan of Yahoo, but I prefer to give credit where credit is due.
I suggest that we all write a one-page blog post on how Yahoo beat Google in the Underscore War.
One reason one might still opt for going with dashes instead of underscores could be that there are likely 2nd-tier and 3rd-tier search engine sites which don’t treat the underscores as white-space characters. Having those less-important sites linking to you is useful, and if they don’t place your links on nice, semantically-related pages, it doesn’t get you as much.
Also, it should be noted that Google still gives different search results if you search for “blue_widgets” versus “blue widgets”. They apparently give exact-match priority to searches including the underscores.
So, the functionality that Matt liked which included exact-match of terms including underscores is really still supported, while they now also do good relevancy for the multi-word search term cases.
Sorry to do this but Google is lying!
Google still treats underscores as one word!
You may not like the url as the example but the reason for the url is because Google and especially Adam Lasnik lie…
http://www.google.com/search?hl=en&q=adam_lasnik_the_google_drag_queen&btnG=Search
The cache date is July 30
http://72.14.235.104/search?q=cache:fDezWvI9tN8J:www.geekentertainment.tv/2007/07/18/dontcha-wish-your-cell-phone-was-hot-like-me/feed/+adam_lasnik_the_google_drag_queen&hl=en&ct=clnk&cd=1
Comments on: Dontcha Wish Your Cell Phone Was Hot Like Me?iPhone nay but Google yea! Check out Adam Lasnsnik The Google Drag Queen! http://www.igorthetroll.com/Adam_Lasnik_The_Google_Drag_Queen.jpg …
http://www.geekentertainment.tv/2007/07/18/dontcha-wish-your-cell-phone-was-hot-like-me/feed/
This one shows the result for the keywords!
———————————————
But this one does not show!
http://www.google.com/search?q=adam+lasnik+the+google+drag+queen&hl=en&pwst=1&start=30&sa=N&filter=0
So this is a lie again and again by Google.
Thanks for posting the comments.
Looks like the blog is being taken down
http://www.geekentertainment.tv
Is it going the way of
http://www.threadwatch.org
Is big G now a bad G
I just spoke with Randfish at seomoz.org, and he told me that Matt C said to him that the underscore-equals-dash fix is new and will take time to propagate.
Wag of a finger at Matt, he should have waited to make the announcement of the fix until the changes have taken place…
We need to check whether the change actually happens or whether this is more B.S. on Google’s part.
Google has been a bad boy lately: lots of lies and hiding stuff.
Will fill you in on more hot stuff later…
Igor
I think that Matt’s statement could mean that the algo doesn’t always treat the – and _ the same. The reason being, _ are not permitted in domain names. For instance Google has been rumored to parse urls. So IMO, the domain parse may just remove the hyphen so it can use the same function to parse all domains. Then when it encounters hyphens in a file or folder name it replaces the – with a space.
There is also good reason to believe that, if URLs are parsed, / = ? ; . could also be used as delimiters for keywords or for analyzing particular URI tokens within the parsed URL, i.e. the protocol, domain and TLD (possibly used in determining trust). IMO, there is no reason not to assume the characters listed above are either removed or replaced with a space. FWIW, making this assumption likely has little or no downside, and if it is correct it has huge upside. Isn’t that what we are paid to know and figure out? That makes any comment from Matt, though interesting, inconsequential to the implementation of the strategy. IME, when analyzing and parsing URIs, the algo has to deal with or use this character list in order to delimit and analyze the URI tokens and keyword matches. When you try to actually do it using regex, it becomes apparent that these are the keys to analyzing and parsing URIs for keyword matches.
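As a rough illustration of the regex approach described in this comment, here is a minimal Python sketch. It is a guess at the idea, not a description of what Google or any other engine does. The delimiter set is the one listed above plus hyphen and underscore, and the function name `url_tokens` is hypothetical. Stripping hyphens from the domain follows the speculation in the comment above and is equally an assumption.

```python
import re
from urllib.parse import urlsplit

# Delimiters from the comment ( / = ? ; . ) plus hyphen and underscore (assumption).
DELIMITERS = re.compile(r"[/=?;._\-]+")

def url_tokens(url):
    """Break a URL into protocol, host labels, and keyword tokens."""
    parts = urlsplit(url)
    # Assumption from the comment: hyphens in the domain are simply removed.
    host = [label.replace("-", "") for label in (parts.hostname or "").split(".")]
    # Path and query: every delimiter is treated like a space between keywords.
    rest = parts.path + "?" + parts.query
    keywords = [t.lower() for t in DELIMITERS.split(rest) if t]
    return {"protocol": parts.scheme, "host": host, "keywords": keywords}

print(url_tokens("http://www.domain.com/blue-widgets.html?color=dark_blue;size=10"))
```

For the example URL this yields the host labels `www`, `domain`, `com` and the keywords `blue`, `widgets`, `html`, `color`, `dark`, `blue`, `size`, `10`. A real system would presumably filter out tokens such as `html`, but that is beyond this sketch.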