
Having A Crawlable Site Architecture Still Matters


Michael Gray on March 5, 2009 at 5:30 am

Over the past few years, all of the search engines have made tremendous strides in discovering different parts of the web, finding deep content and understanding complex URL structures. The first big stride came when each of the search engines adopted the XML Sitemap protocol. Website owners now had the ability to give the search engines a list of all of their URLs.
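As a rough illustration of what the protocol asks for, here is a minimal sketch of generating a sitemap with Python’s standard xml.etree.ElementTree module. It assumes the sitemaps.org 0.9 format (a <urlset> of <url> entries, each with a <loc> and an optional <priority> between 0.0 and 1.0); the function name build_sitemap and the example URLs are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: dict[str, float | None]) -> str:
    """Return an XML sitemap string for a mapping of URL -> priority.

    A priority of None omits the optional <priority> element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in urls.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if priority is not None:
            # The protocol allows 0.0 to 1.0; engines may ignore it entirely.
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical site: the home page plus one section.
    print(build_sitemap({
        "https://example.com/": 1.0,
        "https://example.com/videos/": None,
    }))
```

Listing a URL this way tells the engines it exists, but nothing in the file says what the page is about or how it relates to the rest of the site.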

A few weeks ago, another major step occurred when the search engines announced the adoption of a canonical URL tag to help them understand which URL is the true, preferred version of a page. The problem is that designers and developers are starting to use these tools as a crutch to compensate for a site architecture that is uncrawlable, poorly designed and unnecessarily confusing.

While you can use a sitemap to provide search engines with your URLs, if you don’t provide a crawlable navigation scheme, you lose the power of internal anchor text across your website. Without that internal anchor text, you make it harder for the search engines to understand what your pages are about. Additionally, without internal navigation, you lose the ability to show the search engines which parts of your website are important and how they fit into a hierarchy.
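To see what a crawlable navigation scheme hands the search engines, here is a minimal sketch that uses Python’s standard html.parser to pull each same-host link and its anchor text out of a page. The class and function names are hypothetical, and real crawlers are far more sophisticated; this only shows the kind of (URL, anchor text) pairs that disappear when navigation isn’t crawlable.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorTextParser(HTMLParser):
    """Collect (absolute URL, anchor text) pairs for every link on one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> we are inside, if any
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "Funny\n  videos" becomes "Funny videos".
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def internal_anchor_text(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return only the links that stay on the same host as base_url."""
    parser = AnchorTextParser(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    return [(url, text) for url, text in parser.links
            if urlparse(url).netloc == host]


if __name__ == "__main__":
    page = '<nav><a href="/videos/">Funny videos</a> <a href="https://other.com/">Elsewhere</a></nav>'
    print(internal_anchor_text(page, "https://example.com/"))
```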

Sitewide links across the top, side and footer areas help search engines find the high-level, key areas of your website. While the XML sitemap protocol includes a priority tag for each URL, exactly how much weight, if any, the search engines put on it remains open for debate. Internal anchor text is a big part of the puzzle, and removing it is like forgetting to put the sugar in your cookie recipe and expecting everything to come out of the oven tasting the same.
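To make the hierarchy point concrete, this sketch counts how many distinct pages link to each URL on a small hypothetical site where every page carries the same sitewide header links. It only illustrates how sitewide navigation concentrates internal links on key sections; using raw inbound link counts is an assumption for illustration, not a description of how any search engine actually weights pages.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical site: each page maps to the internal URLs it links to.
# Every page carries the same sitewide header links to "/" and "/videos/".
LINK_GRAPH = {
    "/": ["/", "/videos/", "/about/", "/videos/cats/"],
    "/videos/": ["/", "/videos/", "/videos/cats/", "/videos/dogs/"],
    "/videos/cats/": ["/", "/videos/"],
    "/videos/dogs/": ["/", "/videos/"],
    "/about/": ["/", "/videos/"],
}


def inbound_link_counts(graph: dict[str, list[str]]) -> Counter:
    """Count how many distinct pages link to each URL, ignoring self-links."""
    counts: Counter = Counter()
    for page, targets in graph.items():
        for target in set(targets):  # one vote per linking page
            if target != page:
                counts[target] += 1
    return counts


if __name__ == "__main__":
    for url, count in sorted(inbound_link_counts(LINK_GRAPH).items()):
        print(url, count)
```

On that graph, "/" and "/videos/" each receive links from four pages, while the deeper pages receive one or two.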

The new canonical tag addresses the problem of pages with identical or nearly identical content existing at different URLs. The search engines have indicated they will use it as a signal to determine the proper URL, but it is not an absolute declaration of the proper URL. For example, suppose this is the URL you specify in the canonical tag:

https://example.com/videos/

But visitors can sort the page’s elements, producing URLs like:

https://example.com/videos/?sort=newest
https://example.com/videos/?sort=mostwatched
https://example.com/videos/?sort=funniest
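For the sort example above, a CMS might generate the canonical tag by stripping the sort parameter before writing the <link rel="canonical"> element into the page head. The sketch below uses Python’s urllib.parse; the names NON_CANONICAL_PARAMS, canonical_url and canonical_link_tag are hypothetical, and the engines still treat the resulting tag as a signal rather than a command.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that only reorder a page's content
# and therefore should not produce a separate canonical URL.
NON_CANONICAL_PARAMS = {"sort"}


def canonical_url(url: str) -> str:
    """Drop non-canonical query parameters, keeping any others in order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in NON_CANONICAL_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}" />'


if __name__ == "__main__":
    for variant in ("https://example.com/videos/?sort=newest",
                    "https://example.com/videos/?sort=mostwatched",
                    "https://example.com/videos/?sort=funniest"):
        print(canonical_link_tag(variant))
```

Every sorted variant then points back to https://example.com/videos/, while any other parameter (for example, a hypothetical page=2) survives.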

What happens when those URLs start acquiring more backlinks than the clean URL specified in the canonical tag? Which URL will the engines prefer? Don’t think it matters? What if, instead of a “sort” parameter, you use an affiliate ID parameter, and that URL acquires the most backlinks and starts to outrank the proper URL specified in the canonical tag? It might look something like this:

https://example.com/?aff=12345

Every visitor who arrives through that ranking URL carries the affiliate ID along, so as a website owner, you might be giving away a commission to someone who didn’t deserve it.
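One common way to keep affiliate-tagged URLs from competing with the clean URL (an approach not described in the original piece, so treat it as an assumption) is to read the affiliate ID and answer with a 301 redirect to the clean URL, so the tagged version never serves as a separate page. A minimal sketch, using the parameter name aff from the example above and hypothetical function names:

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

AFFILIATE_PARAM = "aff"  # matches the ?aff=12345 example; an assumption


def split_affiliate(url: str) -> tuple[str, str | None]:
    """Return (clean_url, affiliate_id) for a possibly affiliate-tagged URL."""
    parts = urlsplit(url)
    affiliate_id = None
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == AFFILIATE_PARAM:
            affiliate_id = value
        else:
            kept.append((key, value))
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean, affiliate_id


def handle_request(url: str) -> tuple[int, str, str | None]:
    """Return (status, location, affiliate_id) for an incoming request URL.

    When an affiliate ID is present, answer 301 with the clean URL. The
    caller would record affiliate_id (for example, in a cookie) before
    redirecting; that storage step is outside this sketch.
    """
    clean, affiliate_id = split_affiliate(url)
    if affiliate_id is not None:
        return 301, clean, affiliate_id
    return 200, url, None


if __name__ == "__main__":
    print(handle_request("https://example.com/?aff=12345"))
```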

The takeaway here is that you should design your website structure and CMS to be naturally crawlable, with search-engine-friendly URL structures. Tools like sitemaps and canonical tags should supplement, but never replace or displace, standard crawling as the way your website interacts with search engines.
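A simple way to check whether a sitemap is standing in for navigation is to follow internal links from the home page, breadth-first, and list any sitemap URLs that no chain of links reaches. In this sketch the link graph and sitemap are hypothetical hard-coded data; in practice they would come from your own crawl and your parsed sitemap file.

```python
from __future__ import annotations

from collections import deque


def reachable_from(start: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk over internal links, as a link-following bot would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_sitemap_urls(sitemap: list[str], graph: dict[str, list[str]],
                          home: str = "/") -> list[str]:
    """URLs listed in the sitemap that no chain of links from home reaches."""
    crawled = reachable_from(home, graph)
    return [url for url in sitemap if url not in crawled]


# Hypothetical site where /videos/archive/ exists only in the sitemap.
LINKS = {
    "/": ["/videos/", "/about/"],
    "/videos/": ["/", "/videos/cats/"],
}
SITEMAP = ["/", "/videos/", "/videos/cats/", "/about/", "/videos/archive/"]

if __name__ == "__main__":
    print(orphaned_sitemap_urls(SITEMAP, LINKS))  # prints ['/videos/archive/']
```

Any URL this reports is known to the engines only through the sitemap, with no internal anchor text and no place in the site’s hierarchy.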

