When Ignorance Isn’t Bliss, Part 2

Last week in Part 1 of What You Don’t Know About Your Web Site Can Hurt You, I revealed a few scary secrets most small business webmasters learn about the hard way. If that didn’t scare you away, here are a few more to help the weary web site owner stay in good standing with search engines and your visitors.


1. You suffer from relative linking issues

Every webmaster knows it’s good insurance to regularly check your site for broken links. However, one of the most common types of broken links is self-imposed and thus preventable. What is it? It sounds woefully easy, but using the right kind of relative links can save your visitors a lot of frustration.

What’s a relative link? An example of a relative link is “staff.html”—in an anchor tag this would appear as:

<a href="staff.html"> Staff</a>.

Many times designers will use relative links because they make it easy to migrate from the design site to the live site. The problem with this simple type of relative link is that such links can break as your site grows in complexity and you develop a hierarchical directory structure. The relative link gets its name because it is resolved “relative” to the current directory. If you move content to a different directory, you can end up with 404 “file not found” errors because the relative links point to pages that no longer fall under the current directory.

Another option is an absolute link reference that uses the full http address in the domain name. An example is the Search Engine Land staff page (https://searchengineland.com/staff.html). In an anchor tag an absolute link to this page might appear as:

<a href="https://searchengineland.com/staff.html">Staff</a>

A number of sites have switched to absolute links after being scraped or out of fear of hijacking. The downside of this practice is that many companies use a staging server to test sites prior to uploading them to the open web, and a hard-coded reference to the live domain complicates testing while the site is in a development environment. Consequently, many designers prefer to use a relative linking structure.

Here’s a typical situation when sites grow. Small businesses often start with small, flat sites. That is to say, every page links from the home page, but the site goes no deeper. Over time the webmaster creates subdirectories to logically group files that contain, for instance, new product lines. The webmaster copies and pastes the existing footers and other navigation (which used relative links) into pages in the new subdirectory. However, since the pages those relative links point to do not exist in the new subdirectory, errors begin popping up all over the place. What a mess!

Enter the “absolute relative link,” sometimes called the server-relative or domain-relative link. This is the hybrid version of the absolute and relative link. (And no, I didn’t make up the name. I read about them after being burned by plain relative links back in the late 1990s. It was a lesson I never forgot!) An absolute relative link begins with a forward slash, which tells the browser to start from the site’s root directory and follow the path from there. An example in an anchor tag would be:

<a href="/products/fun-product.html">Fun Product</a>.

The absolute relative (server-relative) link offers a flexible solution that will make your web designer happy: the site can be tested on a staging server and migrated to the live server without domain name problems. It also allows a standard navigation scheme that works even when the links are referenced from different subdirectories.
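To make the difference concrete, here is a small illustration using Python’s urllib.parse.urljoin (the domain, directories, and file names are made-up placeholders). It shows how the same plain relative href resolves to different targets depending on which page it sits on, while a server-relative href resolves the same way from anywhere:

from urllib.parse import urljoin

# The same relative href, copied into pages that live in different directories.
href = "staff.html"

# On a page in the site root, the link resolves where the author expected...
print(urljoin("https://www.mysite.com/index.html", href))
# -> https://www.mysite.com/staff.html

# ...but copied into a page inside a new subdirectory, it now points to a
# file that does not exist there, producing a 404.
print(urljoin("https://www.mysite.com/products/index.html", href))
# -> https://www.mysite.com/products/staff.html

# A server-relative ("absolute relative") href resolves the same way from any page.
root_relative = "/staff.html"
print(urljoin("https://www.mysite.com/products/index.html", root_relative))
# -> https://www.mysite.com/staff.html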

The good news is it is EASY to check your links. There are numerous automated link checkers, including Xenu’s Link Sleuth, that can scan your site and report broken links back to you. Remember that links are the pathways engines use to crawl your site. You don’t want the spiders and bots to crash into a dead end at your main navigation, so use a link checker.
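If you are curious what such a tool does under the hood, here is a minimal sketch in Python (standard library only; the starting URL is a placeholder) that fetches a single page, pulls out its anchor hrefs, and reports any that fail to load. A real checker such as Xenu’s Link Sleuth crawls the entire site and handles far more edge cases.

from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

START_URL = "https://www.mysite.com/"  # placeholder: your own home page

class LinkCollector(HTMLParser):
    """Collects the href attribute of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def check_links(page_url):
    html = urlopen(Request(page_url, headers={"User-Agent": "link-check"})).read()
    collector = LinkCollector()
    collector.feed(html.decode("utf-8", errors="replace"))
    for href in collector.links:
        url = urljoin(page_url, href)  # resolves relative and server-relative hrefs
        if not url.startswith("http"):
            continue  # skip mailto:, javascript:, and similar links
        try:
            status = urlopen(Request(url, headers={"User-Agent": "link-check"})).getcode()
            print(status, url)
        except (HTTPError, URLError) as err:
            print("BROKEN", url, err)  # 404s and unreachable hosts land here

if __name__ == "__main__":
    check_links(START_URL)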

2. Spider traps are keeping your site from being properly indexed

Your marketing department oohed and aahed over the web designer’s concepts, so you paid big bucks to have them coded. Now you have a drop-dead-gorgeous site that spiders can’t crawl easily… if at all.

Google’s technical guidelines tell us, “…if JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.” Sadly, most of these problems could have been remedied before they became problems, had the webmaster known to look for them.

As a small business webmaster with an “un-crawlable” site, you have several options. You can arm yourself with knowledge about spider traps and look for them yourself, or you can bring a search engine optimization (SEO) consultant on board to review the design and techniques used on the site. A knowledgeable optimizer brought on during the design phase can advise you on how to create a beautiful site—and even keep some flash—while ensuring the site is easy to crawl. You don’t have to give up glitz to do well in the engines; you just have to be careful how the site is constructed.

A fast way to view your site the way a search engine would is to download the Lynx text web browser and run your site through it. This free, open source software, originally developed at the University of Kansas, lets you see your page as a search engine might read it: in text format.
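Once Lynx is installed, its -dump option renders a page straight to plain text. Here is a tiny sketch (the URL is a placeholder) that calls it from Python so you can save or compare the text view of each page; you can just as easily run the same command at a prompt.

import subprocess

# Requires Lynx to be installed locally; "-dump" renders the page as plain text.
result = subprocess.run(
    ["lynx", "-dump", "https://www.mysite.com/"],  # placeholder URL
    capture_output=True, text=True, check=True,
)
print(result.stdout)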

Google’s Webmaster Guidelines are a great resource for learning more about designing crawler-friendly sites.

3. Previous SEO firms used shady tactics that cast your site in a bad light

Shady tactics come in many forms. Optimization becomes a problem when you go beyond visible text and start embedding keywords in the code at every opportunity. Invisible text, excessive keyword stuffing, doorway pages, cloaking, or any number of other shady tactics can cause the engines to put your site in Search Engine Hell. There are a number of “tricks” that ill-informed or unscrupulous SEOs and webmasters might use in an attempt to coax the search engines into ranking a site higher than it otherwise deserves. The problem with these tactics is that they are short-lived. As search engines continue to improve the sophistication of their indexing and ranking algorithms, more black-hat tactics will be detected and dealt with (usually by banning the site from the index).

Note that while many of these tactics involve manipulating content in some shady way (cloaking, doorway pages and hidden text being the worst offenders), bad linking tactics can also cause your site to be viewed as shady by the search engines. Ever used a cheap linking company to build links? Well, that’s likely what you got: cheap links! While the initial monetary cost was low, now you have to pay the real price. Oftentimes, companies like these get your site linked from link farms and bad neighborhoods. Or worse yet, ninety percent of your links end up coming from comment spam.

If you’re unsure of your link status, sign up for Webmaster Central and let Google show you the links it has indexed. If you see that all your links are of the same type (e.g., all reciprocal, or all from spammy sites), you have a problem.

If you have already been caught by Google and banned from its index, you should take a hard look at your site to determine what the cause might be. Remove the offending tactic, then humbly request re-inclusion.

Want more information on other tactics that can get you in trouble? Check out Google’s guidelines.

4. You have canonical domain name problems

Does an engine see “www.mysite.com” and “mysite.com” as separate sites? What are the downsides of ignoring this problem? If you don’t redirect one version of the site to the other, Google and other engines will see the two as separate sites. Yep—that means you could have major link-splitting and duplicate content problems.

The most common fix for this issue is to create a 301 permanent redirect from one version to the other. Also, Google’s Webmaster Tools now allow you to tell Google which version you prefer. This will help with canonical domain issues in Google, but not in other engines.

For instructions on creating a 301 on an Apache server, see the Apache Web Server documentation. IIS redirects are handled differently.
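Once the redirect is in place, it is worth confirming that the non-preferred hostname really answers with a 301 rather than serving a second copy of the site. Here is a quick sketch in Python (the hostnames are placeholders, and it assumes the redirect is configured on plain HTTP; use HTTPSConnection for an https site):

import http.client

# Placeholders: the alternate hostnames that should redirect to your preferred one.
ALTERNATES = ["mysite.com"]

for host in ALTERNATES:
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", "/")
    resp = conn.getresponse()
    # A healthy setup answers 301 with a Location header pointing at the preferred
    # hostname; a 200 here means both hostnames are serving duplicate content.
    print(host, resp.status, resp.getheader("Location"))
    conn.close()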

“What about using a DNS CNAME entry?” you may ask. It is acceptable to set up a DNS CNAME entry to point alternate names back to the primary name. This would be done as follows:


mysite.com          A        aaa.bbb.ccc.ddd  (IP address direct)
www.mysite.com      CNAME    mysite.com

This tells everyone (not just browsers and search engine spiders) that www.mysite.com is an alias for the preferred mysite.com.

Is it possible to handle canonical name issues this way? Yes. Does it work with the search engines? Yes, if properly implemented. Is it recommended? No, because it is easy to make implementation errors that result in an infinite loop in DNS resolution. In addition, this involves getting the IT department or the ISP staff involved, which may take longer than implementing a 301 redirection rule.
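If you have inherited a setup like this and want to see what your resolver reports, a few lines with Python’s socket module show the canonical name, aliases, and addresses for a host (the hostname is a placeholder, and exactly what is reported can vary with your resolver):

import socket

# Returns (canonical name, alias list, IP address list) for the queried host.
canonical, aliases, addresses = socket.gethostbyname_ex("www.mysite.com")
print("canonical:", canonical)   # with the CNAME in place, this should be mysite.com
print("aliases:  ", aliases)     # the www name should show up here as an alias
print("addresses:", addresses)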

5. You’re haunted by an unreliable server

Going cheap on hosting often equates to unreliability. Look for providers that include telephone support (not just email) and have extended operating hours. You’ll also want to check their time zone to be sure their support hours overlap with your business hours, especially if you and the host are on opposite coasts.

Consider setting up an outside service to monitor your server so you know how your ISP is really doing. While most brag about 99.99% uptime, you may find they are stretching the truth. There are several low-cost server-monitoring services out there (Red Alert and Server Check Pro, for example) that can ping your server every 15 minutes and alert you when it is down. This type of service can also detect slow-performing servers.
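For a rough do-it-yourself version, here is a sketch in Python (the URL and interval are placeholders) that requests your home page on a schedule and logs response times and failures. To be meaningful it needs to run from a machine outside your own hosting environment, which is exactly what the paid services provide.

import time
from datetime import datetime
from urllib.request import Request, urlopen

SITE = "https://www.mysite.com/"   # placeholder: the site you want to watch
INTERVAL = 15 * 60                 # seconds between checks (15 minutes)

while True:
    started = time.time()
    try:
        urlopen(Request(SITE, method="HEAD"), timeout=30)
        elapsed = time.time() - started
        print(f"{datetime.now():%Y-%m-%d %H:%M} UP    {elapsed:.2f}s")
    except OSError as err:          # covers timeouts, HTTP errors, unreachable hosts
        print(f"{datetime.now():%Y-%m-%d %H:%M} DOWN  {err}")
    time.sleep(INTERVAL)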

Some people use a server checker pointed at the hosting company’s own site to decide which host to use. Be aware that the performance of the hosting company’s site may differ from that of the sites it hosts.

In addition to affecting the customer’s experience on the site, a poorly performing server can also have a negative effect on your site’s search engine performance. If the site is down or slow in responding, the search engine spider may get tired of waiting and decide to de-list that page.

6. All your web content has been stolen

Your web site is publicly accessible and the content is available electronically. These two facts make it simple for an unscrupulous webmaster to snag a copy of your site’s content and clone it on the web. Automated scrapers are stealing content daily, making it possible for others to use your content for their own gain.

The easiest way to detect when this has happened is to grab a long text snippet (perhaps eight to ten words) from your site and drop it into a search box, placing quotes around it to request an exact match on that phrase. Assuming the content is original and unique, the only page that shows up should be your own. There are also tools like Copyscape.com that can check for violations of your content rights on a regular basis.
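Here is a small sketch in Python (the page URL and the word range are placeholders you would adjust so the snippet lands in body copy rather than navigation) that pulls a page, grabs ten consecutive words of its text, and prints the quoted exact-match query to paste into a search engine:

import re
from urllib.parse import quote_plus
from urllib.request import urlopen

PAGE = "https://www.mysite.com/about.html"   # placeholder: a page with original copy

html = urlopen(PAGE).read().decode("utf-8", errors="replace")
text = re.sub(r"<[^>]+>", " ", html)          # crude tag strip, fine for a spot check
words = text.split()
snippet = " ".join(words[50:60])              # adjust the slice to hit body text
print(f'Search for: "{snippet}"')
print("https://www.google.com/search?q=" + quote_plus(f'"{snippet}"'))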

If another site has stolen and published your content, Google may think your site contains the duplicate content and that the offending site has the original. On the Google Webmaster Central Blog, Adam Lasnik wrote a great piece called Deftly Dealing With Duplicate Content.

If you find yourself the victim of stolen content, Google has spelled out the steps for filing a copyright infringement complaint. A great way to prove your ownership of the copy is to use the Wayback Machine. This free tool lets you show how long the content has been on your website.

7. Bogons can eat your web site

No, this isn’t the name of a new monster flick. This is a bizarre and unusual situation involving Internet traffic from IP addresses that are not currently assigned to ANY ISP. Some networks filter traffic from these “bogon” address ranges, and because the filter lists are manually maintained, they can lag behind newly legitimate assignments, meaning search engines may be unable to access your site through no fault of your own. If you recently changed your hosting to a newly commissioned data center, or if your ISP recently assigned you an IP from a newly commissioned block of addresses, see this hosting issues article for the details. This unusual case falls into the fact-is-stranger-than-fiction category, but since we witnessed at least one account of it firsthand, we know it can happen.

In these two articles, you’ve read about a diverse list of issues webmasters and SEOs look at when diagnosing potential web site problems. To be sure your site is operating in an optimal state, review this list with your IT staff and/or your hosting company. Spending a little time on preventative maintenance now can ensure you don’t suffer from a cyber tragedy later on.

Christine Churchill is President of KeyRelevance.com, a full service search engine marketing firm. The Small Is Beautiful column appears on Wednesdays at Search Engine Land.

