• http://www.trulybored.com gamermk

    Somehow I think that we are going to see the number 301 used more in the next month than it has been used in the past year…

  • jimbeetle

    So in our example above, http://www.domain.com/blue_widgets.html, would be seen as one word, or as “bluewidgets.”

    Almost, but…Google actually looked at it as “blue_widgets,” including the underscore. See Matt Cutts’ somewhat nerdy take on it from a couple of years ago.

  • http://www.naturalsearchblog.com Silver

    jimbeetle, it should be noted that Google is constantly evolving, so Matt’s earlier advice in this instance may no longer apply.

    He’s the one who revealed that Google is now treating underscores as word-breaks.

    I’d guess that this change was made to improve performance for the majority of users. There are lots of documents where “word1_word2” should be made more relevant to user searches for “word1 word2”, but relatively few users who would type in “word1_word2”. So, this change makes sense, and Google is king at doing what provides best usability.
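    To make the word-break idea concrete, here is a minimal sketch of the two behaviours being discussed. It is only an illustration: the `tokenize` and `matches` helpers and the `underscore_breaks` flag are made-up names, and nothing here claims to reflect how Google actually tokenizes text.

```python
import re

def tokenize(text, underscore_breaks=True):
    """Split text into lowercase word tokens.

    underscore_breaks=True  -> "_" separates words (the newer behaviour described above)
    underscore_breaks=False -> "_" stays inside a token (the older behaviour)
    """
    pattern = r"[a-z0-9]+" if underscore_breaks else r"[a-z0-9_]+"
    return re.findall(pattern, text.lower())

def matches(query, document, underscore_breaks=True):
    """True if every query token also appears among the document's tokens."""
    doc_tokens = set(tokenize(document, underscore_breaks))
    return all(t in doc_tokens for t in tokenize(query, underscore_breaks))

# A search for "blue widgets" only finds "blue_widgets.html"
# when the underscore is treated as a word-break.
print(matches("blue widgets", "blue_widgets.html", underscore_breaks=True))
print(matches("blue widgets", "blue_widgets.html", underscore_breaks=False))
```

    With the flag off, “blue_widgets” is a single token, which is the older behaviour jimbeetle describes above.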

  • http://www.seo-blog.com MichaelDuz

    To the very few of us that continually conduct our own experiments this of course has been known for some time; Keywords in URLs and URLs (Update)

    These experiments also show if the keywords are actually indexed and if concatenated keywords in urls are recognized.

    You can’t beat testing a hypothesis yourself…. :)

    - Michael

  • http://www.dannedelko.com Dan Nedelko

    This is not much in the way of news. As others have commented this tokenization has been around for quite some time.

    The rule here is if you’re in the search marketing game, you should have known this for quite some time.

  • jimbeetle

    jimbeetle, it should be noted that Google is constantly evolving, so Matt’s earlier advice in this instance may no longer apply.

    I was referring to the way G looked at “blue_widgets” in the past. It saw it as “blue_widgets” and not as Barry stated, “bluewidgets”.

  • http://www.psymple.com Asia

    It’s interesting how we all jump when Google finally makes this decision, yet Yahoo has always recognized the underscore as separation. This was approached at last year’s PubCon: I specifically asked the question regarding underscores vs dashes, and Matt responded that I should do nothing, as Google would implement the separation in the near future. Yahoo’s Tim Meyer stated it was already noticed on their end. I’m not a fan of Yahoo, but I prefer to give credit where credit is due.

    I advise that we should all take out a one-page blog post on how Yahoo beat Google in the Underscore War.

  • http://www.naturalsearchblog.com Silver

    One reason one might still opt for going with dashes instead of underscores could be that there are likely 2nd-tier and 3rd-tier search engine sites which don’t treat the underscores as white-space characters. Having those less-important sites linking to you is useful, and if they don’t place your links on nice, semantically-related pages, it doesn’t get you as much.

    Also, it should be noted that Google still gives different search results if you search for “blue_widgets” versus “blue widgets”. They apparently give exact-match priority to searches including the underscores.

    So, the functionality that Matt liked which included exact-match of terms including underscores is really still supported, while they now also do good relevancy for the multi-word search term cases.
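    One way the two behaviours could coexist is to index an underscore-joined token both whole and as its parts, and to rank an as-typed match higher. The sketch below is purely hypothetical: `index_terms`, `score` and the 2/1/0 scoring are invented for illustration, not something Google has described.

```python
import re

TOKEN = re.compile(r"[a-z0-9_]+")

def index_terms(text):
    """Index each token as typed and also as its underscore-separated parts."""
    terms = set()
    for token in TOKEN.findall(text.lower()):
        terms.add(token)                                  # e.g. "blue_widgets"
        terms.update(p for p in token.split("_") if p)    # e.g. "blue", "widgets"
    return terms

def score(query, document):
    """2 = query tokens match as typed, 1 = only their parts match, 0 = no match."""
    terms = index_terms(document)
    q_tokens = TOKEN.findall(query.lower())
    if not q_tokens:
        return 0
    if all(t in terms for t in q_tokens):
        return 2
    parts = [p for t in q_tokens for p in t.split("_") if p]
    if parts and all(p in terms for p in parts):
        return 1
    return 0

# The same page can score differently for the two spellings of the query.
print(score("blue_widgets", "blue widgets for sale"))
print(score("blue widgets", "blue widgets for sale"))
```

    Under these assumptions, a page containing “blue_widgets” still matches both spellings of the query, while a page containing only “blue widgets” ranks lower for the underscore query.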

  • http://www.igorthetroll.com Igor The Troll

    Sorry to do this but Google is lying!

    Google still treats underscores as one word!

    You may not like the url as the example but the reason for the url is because Google and especially Adam Lasnik lie…

    http://www.google.com/search?hl=en&q=adam_lasnik_the_google_drag_queen&btnG=Search

    The cache date is July 30

    http://72.14.235.104/search?q=cache:fDezWvI9tN8J:www.geekentertainment.tv/2007/07/18/dontcha-wish-your-cell-phone-was-hot-like-me/feed/+adam_lasnik_the_google_drag_queen&hl=en&ct=clnk&cd=1

    Comments on: Dontcha Wish Your Cell Phone Was Hot Like Me?iPhone nay but Google yea! Check out Adam Lasnsnik The Google Drag Queen! http://www.igorthetroll.com/Adam_Lasnik_The_Google_Drag_Queen.jpg
    http://www.geekentertainment.tv/2007/07/18/dontcha-wish-your-cell-phone-was-hot-like-me/feed/

    This one shows the result for the keywords!
    ———————————————
    But this one does not show!

    http://www.google.com/search?q=adam+lasnik+the+google+drag+queen&hl=en&pwst=1&start=30&sa=N&filter=0

    So this is a lie again and again by Google.

  • http://www.igorthetroll.com Igor The Troll

    Thanks for posting the comments.

    Looks like the blog is being taken down
    http://www.geekentertainment.tv

    Is it going the way of
    http://www.threadwatch.org

    Is big G now a bad G

    I just spoke with Randfish at seomoz.org and he told me that Matt C. said to him that the “underscore equals dash” fix is new and will take time to propagate.

    Wag of a finger at Matt, he should have waited to make the announcement of the fix until the changes have taken place…

    We need to check whether the change really does happen, or whether this is some more B.S. on Google’s part.

    Google has been a bad boy lately, lots of lies and hiding stuff.

    Will fill you in on more hot stuff later…

    Igor

  • WebmasterT

    I think that Matt’s statement could mean that the algo doesn’t always treat - and _ the same, the reason being that underscores are not permitted in domain names. For instance, Google has been rumored to parse URLs. So IMO, the domain parse may just remove the hyphen so it can use the same function to parse all domains. Then, when it encounters hyphens in a file or folder name, it replaces the - with a space.

    There is also good reason to believe that, if URLs are parsed, then / = ? ; . could also be used as delimiters for keywords, or for analyzing particular URI tokens within the parsed URL, i.e. the protocol, domain and TLD (possibly used in determining trust). IMO, there is no reason not to assume that the characters in the list above are either removed or replaced with a space. FWIW, making this assumption likely has little or no downside, and if it is correct it has a huge upside. Isn’t that what we are paid to know and figure out? Which makes any comment from Matt, though interesting, inconsequential to the implementation of the strategy. IME, when analyzing and parsing URIs, the algo has to deal with or use this character list in order to delimit and analyze the URI tokens and keyword matches. When you actually try to do it using regex, it becomes apparent that these are the keys to analyzing and parsing URIs for keyword matches.
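    For anyone who wants to try the regex approach, here is a rough sketch using Python’s standard `urllib.parse`. The `url_tokens` helper and its output layout are made up for this example. Treating - and _ as delimiters alongside / = ? ; . and stripping hyphens from the domain are the guesses from the comment above, not confirmed Google behaviour.

```python
import re
from urllib.parse import urlsplit

# Delimiters listed above (/ = ? ; .) plus "-" and "_" (an assumption).
DELIMITERS = re.compile(r"[/=?;._-]+")

def url_tokens(url):
    """Break a URL into protocol, domain labels, TLD and keyword tokens."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = [label for label in host.split(".") if label]
    # Guess from the comment: hyphens in the domain are removed, not split on.
    domain = [label.replace("-", "") for label in labels]
    rest = parts.path + ("?" + parts.query if parts.query else "")
    # In file/folder names and the query string, delimiters act like spaces.
    keywords = [t.lower() for t in DELIMITERS.split(rest) if t]
    return {
        "protocol": parts.scheme,
        "domain": domain,
        "tld": labels[-1] if labels else "",
        "keywords": keywords,
    }

print(url_tokens("http://www.domain.com/blue_widgets.html"))
print(url_tokens("http://www.blue-widgets.com/red-widgets/index.html?id=5"))
```

    Swapping characters in and out of `DELIMITERS` is an easy way to test how different assumptions change the keyword list for your own URLs.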