16 common on-site SEO mistakes you might be making right now
Columnist Pratik Dholakiya shares the 16 technical SEO issues he sees most frequently. Even the most experienced SEO professionals can sometimes overlook these common issues!
Editor’s note: It has been called to our attention that a number of points in this article are outdated or incorrect. We plan to update the article shortly.
SEO is more than inbound marketing. There’s massive overlap, but there’s a technical side to SEO that sometimes gets neglected, especially by casual followers of the industry.
As somebody who spends a great deal of time looking at sites and searching for opportunities to optimize, I notice technical mistakes that crop up again and again.
Let’s go over these mistakes. If my experience is anything to go by, odds are high you’re making at least one of them.
1. Nofollowing your own URLs
There comes a time in every SEO’s life when they need to keep a page hidden from the search results — to prevent duplicate content issues, to hide member areas, to keep thin content pages out of the index, to hide archives and internal search result pages, during an A/B test and so on. This is perfectly innocent, perfectly noble and perfectly necessary. However…
… do not use the “nofollow” tag to accomplish this!
The “nofollow” tag doesn’t prevent pages from being indexed by the search engines, but it does ruin the flow of PageRank through your site.
For the very same reason, you should not attempt to sculpt the flow of PageRank through your site by using the “nofollow” tag. Let me explain.
The “nofollow” tag does prevent PageRank from passing through a link, but Google still takes into account the total number of links on your page when determining how much PageRank to pass. In other words, your followed links will pass the same amount of PageRank regardless of whether the other links on the page are nofollowed or not.
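The arithmetic behind that claim can be sketched in a few lines. This is only an illustration of the simplified model described above, in which a page's PageRank is divided evenly across every link on the page; the function name and the sample numbers are hypothetical.

```python
def pagerank_per_link(page_rank, total_links, nofollowed_links):
    """Return (share per followed link, share discarded) under the simplified model.

    Assumption: the page's PageRank is split evenly across ALL links,
    and the share assigned to nofollowed links is simply thrown away.
    """
    if total_links <= 0 or not 0 <= nofollowed_links <= total_links:
        raise ValueError("link counts must be consistent")
    share = page_rank / total_links      # divided by every link, followed or not
    wasted = share * nofollowed_links    # the nofollowed share goes nowhere
    return share, wasted


# A page with 10 links, first with none nofollowed, then with 5 nofollowed.
share_all_followed, wasted_none = pagerank_per_link(1.0, 10, 0)
share_half_nofollowed, wasted_half = pagerank_per_link(1.0, 10, 5)
```

In both cases each followed link receives the same share; the only thing that changes is how much is discarded.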
I still see this happening often: SEO newcomers and webmasters using “nofollow” tags on their own content, either thinking that it will prevent a page from showing up in the search results, or thinking that they can use it to direct PageRank to their most important pages. The “nofollow” tag accomplishes neither of these things.
When you use a “nofollow” tag, you are throwing away PageRank. Don’t do it, not even on pages that you don’t want indexed. If you want to keep a page out of the index, use this in your HTML head:
<meta name="robots" content="noindex, follow">
The above directive prevents the page from turning up in the search results but recommends that the search engine follow the links on the page. That way, any PageRank that flows into the unindexed page will be passed back to your site through the links on the page, rather than getting dumped.
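To check which pages actually carry this directive, a small audit script can help. The following is a minimal sketch using only Python's standard library; the class and function names are made up for illustration, and the sample markup assumes the standard robots meta tag with a content of "noindex, follow".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
directives = robots_directives(sample)
```

A page you want hidden but still crawled for links should report both "noindex" and "follow" here.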
2. Not using canonicalization
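The rel=canonical tag in the HTML head looks like this, with your preferred URL in the href (example.com here is a placeholder):
<link rel="canonical" href="https://www.example.com/preferred-page/">
It tells search engines that the linked URL, rather than the current page, should be treated as “canon.”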
Why would you use this tag? The purpose of it is to prevent duplicate content from getting indexed, which can result in diluting your search engine authority. Using the canonical tag also seems to pass PageRank from the non-canonical page to the canonical page, so there is no need to be concerned about losing the PageRank accumulated by the non-canonical page.
This is a place where conversion optimizers can often fail. Page alternates in an A/B test should make use of the canonical tag so that the alternate page doesn’t get indexed (and so that any authority picked up by the alternate page is passed to the primary page).
Variations on product pages, such as alternates with a different color, are another common example. Duplicates can also get created any time URL query strings are in use. For this reason, sitewide canonicalization can be a good solution for sites that make use of query strings. Self-referencing canonical tags are not generally thought to be an issue.
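One way sitewide canonicalization for query strings might be implemented is sketched below. It assumes that query parameters never change the main content of a page, which is not true of every site, so treat the blanket stripping as an assumption to verify. The function names and example URL are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment and lowercase the host.

    Assumption: query parameters (tracking codes, sort orders, color
    pickers) do not identify distinct content on this site.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", "", ""))


def canonical_link_tag(url):
    """Build the rel=canonical tag that would go in the page's HTML head."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


tag = canonical_link_tag("https://Example.com/shoes?color=red&utm_source=news#reviews")
```

Because the canonical URL of an already clean page is the page itself, this approach also produces self-referencing canonical tags.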
3. Poor use of outbound links
If you’re linking to another site in your site-wide navigation, and it’s not one of your social media profiles, odds are you should remove the link.
From a pure PageRank standpoint, external links dilute the authority that gets passed back to your own site. This isn’t to say that you shouldn’t be linking to anybody else (which would utterly defeat the purpose of using links as a ranking factor). But outbound links in your own site navigation compound the losses by affecting every page.
Of course, Google has come a long way since the original PageRank algorithm, but there’s another reason why external links in the navigation are iffy: It’s easy for them to look like spam.
The situation is, of course, far worse if the links use keyword anchor text or if the links are placed somewhere where they could be confused for internal site navigation.
Outbound links in the primary content are generally not an issue, but it is important to screen them for quality. Links to “bad neighborhoods” can get a site penalized by Google’s spam team or pushed down the rankings by anti-spam algorithms.
And, of course, it is absolutely crucial that you always nofollow advertisement links of any kind.
4. Not enough outbound links
The idea that “a little bit of knowledge is a dangerous thing” definitely applies here. A limited understanding of how the search engines work leads some to believe that they should never link to another site. While it’s true that the pure PageRank algorithm would suggest this, it’s simply not how things work out in the field.
A case study by Reboot Online makes a pretty clear case for this. They created 10 sites featuring a nonsense keyword, five featuring authoritative outbound links and five not.
The results were about as definitive as possible for a study of this size: All five of the sites with outbound links performed better than the sites without them.
In a post on PageRank sculpting by Google’s former head of web spam, Matt Cutts, he also mentions that “parts of our system encourage links to good sites,” which seems to confirm the idea that linking to other sites is important.
To be fair, John Mueller has openly stated that outbound links aren’t “specifically a ranking factor,” while adding that they “can bring value to your content and that in turn can be relevant for us in search.” In the context of the Reboot Online study and Matt Cutts’s statement, this might be interpreted to mean that including citations boosts confidence in content, rather than meaning that outbound links have no effect at all.
Regardless, well-sourced content is a must if you want to be taken seriously — which may have a positive, if indirect, effect on rankings.
5. Poor internal link structure
There’s more than one right way to structure your links, but there are plenty of wrong ways to do it, too.
Let’s start with the basics. As the Google guidelines state:
Build your site with a logical link structure. Every page should be reachable from at least one static text link.
Your typical modern content management system will usually handle at least this much automatically. But this functionality sometimes gets broken. One dangerous myth is that you are supposed to canonicalize multi-page posts back to the first page. In reality, you should either leave well enough alone or canonicalize to a single page that contains the entire post. This goes for archives and similar pages, too. Canonicalizing these pages runs the risk of erasing the links on these pages from the search index.
A completely flat link architecture is another common issue. Some take the idea that every page needs to be accessible through links a bit too far, including links to virtually every page on the site within the navigation.
From the user perspective, this creates obvious issues by making it very difficult to locate appropriate pages.
But this confusion passes on to the search engines and the way that they interpret your site. Without a clear hierarchy, search engines have a very difficult time parsing which pages on your site are most important, which pages cover which topics, and so on.
Remember, there’s much more to the algorithm than PageRank. A categorical hierarchy helps search engines understand your site semantically, which is very important for rankings.
Watch out for tag clouds and long lists of dated archives. These show up less often in modern CMS themes, but they occur often enough that you should know they are to be avoided. Click-throughs on these are awful, and the extra links divide up PageRank. Dated archive lists, in particular, add no semantic information to your link architecture, and category links are much more organized than muddy tag clouds.
Finally, while it’s not exactly a mistake not to, we highly recommend referencing your own content within your body content. Contextual links within body content are generally believed to count more than links in the navigation, and they certainly add important semantic value.
6. Poor URL architecture
URL architecture can be a difficult thing to fix without breaking other aspects of your SEO, so we don’t recommend rushing into this, or you might do more harm than good.
That said, one of the most frequent issues I come across is a lack of solid URL architecture. In particular, folder organization is often spotty.
A few common issues:
- Blog posts listed in multiple categories, resulting in blog posts listed in multiple folders, creating duplicate content issues as a result.
- URLs with no folders other than the parent domain. While this is precisely the form your most important pages should take, pages further down the hierarchy should be listed in folders to categorize them.
- URLs with folders that are, themselves, 404 pages. If a URL is listed under a folder, many users expect that folder to be an operational page. From an architecture perspective, it’s semantically confusing, and from an internal link perspective, it’s ideal to have links to these pages from a parent folder.
- Junk URLs full of numbers and letters. These days, these are primarily reserved for search result pages and database queries that aren’t intended to be indexed and found in search engines. Your URLs should contain useful information intelligible to a human if you want them to contribute positively to your performance in the search engines.
In addressing these issues, there are two complications you want to avoid: creating 404 pages and losing existing link authority. When you change your URL architecture, you need to make sure that the old pages 301 to the new ones. Ideally, any internal links to the old pages should also be updated, since PageRank is reduced by the damping factor every time it passes through a link or 301.
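Finding internal links that still point at redirected URLs is easy to automate once you have a redirect map. The sketch below assumes you already have two pieces of data from your own crawl: a dictionary of old-to-new URLs and a dictionary of the links found on each page. Both structures and all paths are hypothetical.

```python
def stale_internal_links(links_by_page, redirects):
    """Return (page, old_target, new_target) for each link to a redirected URL."""
    stale = []
    for page, targets in links_by_page.items():
        for target in targets:
            if target in redirects:
                # The link works via the 301, but should point at the new URL directly.
                stale.append((page, target, redirects[target]))
    return stale


# Hypothetical data: one post moved into a category folder.
redirects = {"/blog/old-post": "/blog/seo/new-post"}
links_by_page = {
    "/": ["/blog/old-post", "/about"],
    "/blog/seo/": ["/blog/seo/new-post"],
}
stale = stale_internal_links(links_by_page, redirects)
```

Each entry in the result names the page to edit, the outdated link and the URL it should be changed to.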
As an exception, if blog posts are listed in multiple categories, a 301 isn’t always necessary, but in its place you should canonicalize to the preferable page.
7. Using frames
Frames and iframes are needed in a few places, but you should never use them for anything that you want to be indexed. Google is pretty clear on this:
Frames can cause problems for search engines because they don’t correspond to the conceptual model of the web. In this model, one page displays only one URL. Pages that use frames or iframes display several URLs (one for each frame) within a single page. Google tries to associate framed content with the page containing the frames, but we don’t guarantee that we will.
This isn’t to say that your site should never use them. YouTube embeds make use of iframes, for example.
What you absolutely should not do is use frames as a method of navigating content on your site. This not only makes the content difficult to index, it ruins your site architecture and makes it very difficult for people to reference your content with links.
8. Using unindexable formats
Search engines have limited ability to crawl and index the content found inside images, Flash files, Java applets and videos.
As with frames, this isn’t to say that you should never use these formats for anything on your site. What it does mean is that you should never trust the search engines to properly index the content in these formats, and you should always provide alternate content for both users and search engines to access.
9. Not using transcripts
Failing to include transcripts or captions for videos is likely the most common failure associated with unindexable formats. Transcripts and captions allow search engines (and YouTube) to understand videos in a way that isn’t otherwise possible.
A study by Liveclicker found that 37 web pages saw a 16 percent increase in revenue when they added transcripts, and Digital Discovery Networks found that their captioned videos saw 7.32 percent more views on average.
If a transcript would take up too much space on your page, a scroll box is likely the best solution. Alternatives that include the content in the HTML but hide it from the user are likely to be considered cloaking and should be avoided for this reason.
10. Using image alt attributes incorrectly
As mentioned above, you should avoid using images in place of text, since it is difficult for the search engines to interpret the image and very unlikely that they will interpret it the same way as text.
One thing most webmasters these days are well aware of is the image alt attribute, often referred to as the “alt tag.” The alt tag is meant to provide a text alternative for an image if that image cannot be displayed. In other words, if the user is using a screen reader due to a visual impairment, or if their device is incapable of loading the image, he or she will be presented with the text of the alt attribute instead.
The problem is, a very large portion of webmasters are using it incorrectly. What I mean is that they are treating the alt tag as if it were a keyword tag of some kind, but that is not what it is intended for. All too often, I run into sites that stuff keywords into their image alt tags that have little or nothing to do with the image itself. Even when the keywords are relevant, they often don’t provide the information somebody would need if they can’t see or load the image.
That said, in general it’s considered good practice to keep the image alt below 125 characters. If the image is a large graph or infographic that would require a larger alt to explain, the text should be included elsewhere.
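A quick audit for missing or overlong alt attributes could look something like the following. It is a rough sketch built on Python's standard HTML parser; the class name and sample markup are invented for illustration, and it cannot judge whether the alt text actually describes the image.

```python
from html.parser import HTMLParser

MAX_ALT_LENGTH = 125  # the guideline mentioned above


class AltAuditParser(HTMLParser):
    """Flags <img> tags with no alt attribute or an overlong one."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or "(no src)"
        alt = attr.get("alt")
        if alt is None:
            self.issues.append((src, "missing alt attribute"))
        elif len(alt) > MAX_ALT_LENGTH:
            self.issues.append((src, f"alt is {len(alt)} characters"))


page = (
    '<img src="chart.png">'
    '<img src="team.jpg" alt="Our support team at the 2016 conference">'
    f'<img src="long.png" alt="{"x" * 130}">'
)
audit = AltAuditParser()
audit.feed(page)
```

An empty alt (alt="") is deliberately not flagged, since that is the usual way to mark a purely decorative image.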
11. Unintentional cloaking
Google takes a strong stance against cloaking, but not every incident of cloaking is intentional. While the odds that unintentional cloaking will get you penalized are relatively low, it’s a good idea to avoid cloaking entirely to be on the safe side.
How does unintentional cloaking happen?
A classic example of cloaking is placing text on the site with a color that matches the background. This makes the text invisible to readers, while the search engines can still crawl it. In the past, spammers used to include keywords in hidden text like this, hoping that it would improve their visibility in search results. This hasn’t worked in a very long time, but some spammers do occasionally still try to use this “tactic.”
Unfortunately, this can also happen by accident, when certain elements of your style sheet are accidentally rendered the same color as the background. This should be avoided.
Another frequent accident to watch out for is empty anchor text links: href links with no anchor text. Too many of these may also be considered cloaking.
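Empty links are simple to find mechanically. The sketch below treats a link as empty when it contains no visible text and no image with alt text; that definition, along with the class name and sample markup, is an assumption made for illustration.

```python
from html.parser import HTMLParser


class EmptyAnchorParser(HTMLParser):
    """Collects the href of every <a> that has no text and no image alt text."""

    def __init__(self):
        super().__init__()
        self.empty_links = []
        self._href = None          # href of the link currently open, if any
        self._has_content = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and "href" in attr:
            self._href = attr["href"]
            self._has_content = False
        elif tag == "img" and self._href is not None and (attr.get("alt") or "").strip():
            # An image with alt text gives the link a usable label.
            self._has_content = True

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._has_content = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._has_content:
                self.empty_links.append(self._href)
            self._href = None


page = (
    '<a href="/a"></a>'
    '<a href="/b">Pricing</a>'
    '<a href="/c"><img src="logo.png" alt="Logo"></a>'
    '<a href="/d"> </a>'
)
anchors = EmptyAnchorParser()
anchors.feed(page)
```

Here the links to /a and /d are reported, while the text link and the image link with alt text are not.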
12. ‘Sneaky’ redirects
A “sneaky” redirect is any redirect that shows the search engines a different page than the one the user sees, or sends the user to a page they weren’t expecting to visit. Google has an explicit stance against this as well.
I strongly recommend against using any method of redirecting users other than a 301 redirect, with the rare exception of 302 redirects if they really are intended to be temporary. Using any other method of redirection runs the risk of working for users but not for search engines, resulting in unintentional cloaking.
Related to this, avoid redirect chains, and don’t redirect to an unrelated page. Both of these practices are unfortunately quite common.
Redirecting to an unrelated page is something that is often done because some webmasters think that no URL that has previously existed should ever go 404. This actually isn’t true; Google prefers that you leave a page 404 as opposed to redirecting to an unrelated page, like the home page, for example. Redirects are intended to move users to identical pages, or pages that serve the same purpose, as the original page. Redirects to unrelated pages are considered “soft 404s” at best and sneaky redirects at worst.
Redirect chains throw away PageRank due to Google’s damping factor, and they may also be considered sneaky redirects if they appear to be misleading users or search engines, intentionally or otherwise.
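Redirect chains can be detected and flattened from a redirect map so that every old URL points straight at its final destination. The following is a sketch under stated assumptions: the map is a plain dictionary you maintain yourself, the paths are hypothetical, and the 0.85 damping value is the figure commonly cited from the original PageRank paper, not something Google has confirmed for redirects.

```python
def resolve_redirect(url, redirects, max_hops=10):
    """Follow the redirect map and return (final_url, number_of_hops)."""
    hops = 0
    seen = {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops


def flatten_redirects(redirects):
    """Rewrite the map so every source redirects directly to its final target."""
    return {source: resolve_redirect(source, redirects)[0] for source in redirects}


def remaining_share(hops, damping=0.85):
    """Share of PageRank left after `hops` steps, under the damping model above."""
    return damping ** hops


chain = {"/a": "/b", "/b": "/c"}
final_url, hops = resolve_redirect("/a", chain)
flat = flatten_redirects(chain)
```

In this example /a reaches /c in two hops, and after flattening both /a and /b redirect to /c in a single hop.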
13. Missing or duplicate meta descriptions
It amazes me how often I still come across sites that don’t seem to have heard about meta descriptions. This is one of very few places where search engines give you almost complete control, so don’t waste that opportunity. There’s not much to say here that can’t be found elsewhere, so I’ll leave it at that. I just can’t skip over this one, because I still see it very frequently.
A less obvious issue is the duplicate meta description. I usually see this happen because a template includes a meta description, resulting in entire sections of the site with the same description.
Often this is done intentionally, because developers have heard that every page should have a meta description, and this is their solution. Unfortunately, this actually does more harm than good.
Meta descriptions take the place of Google’s automated search snippet, and while Google’s automated snippet isn’t always optimal, it is bound to be better than a generic snippet designed for a swath of pages.
Then, there are the meta descriptions you shouldn’t have!
Yes, it’s a thing.
This is admittedly a bit of a controversial position, but I am of the opinion that not every page needs a meta description, and there are cases in which using a meta description can be counterproductive.
Consider the case of blog posts designed for long-tail queries. Google’s automated snippets grab bits of content related to the phrases the user searched for. In some cases, this means that Google’s automated snippet can actually be better.
In the case of a blog post designed for long-tail queries, there’s no way to include every possible phrase a user might have searched for in the meta description. Adding a meta description in this case can lead to a situation where the user doesn’t see any reference to their search query in the snippet, and that may discourage them from clicking through as a result.
How can you determine when it’s a good idea to include the meta description and when not to?
This mostly comes down to keyword strategy. For highly focused pages with a very clear topic, a custom meta description is likely the best choice. For posts more along the lines of a “rant,” or in cases where the content covers a very large number of topics, it’s worth considering the possibility that meta descriptions could actually discourage click-throughs.
Less controversially, it’s important to avoid short meta descriptions, as well as meta descriptions too long for search engines. A good target to shoot for is 130 to 150 characters, in most cases.
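Duplicate and badly sized descriptions are easy to surface once you have a list of pages and their meta descriptions. The sketch below assumes that list comes from your own crawl as a dictionary of URL to description; the function name and sample data are hypothetical, and the 130 to 150 character range is the target suggested above.

```python
from collections import defaultdict


def audit_meta_descriptions(descriptions, min_len=130, max_len=150):
    """Return (duplicates, length_issues) for a {url: description} mapping."""
    by_text = defaultdict(list)
    length_issues = {}
    for url, text in descriptions.items():
        text = (text or "").strip()
        if not text:
            # Pages without a description are skipped; that can be a deliberate choice.
            continue
        by_text[text].append(url)
        if not min_len <= len(text) <= max_len:
            length_issues[url] = len(text)
    duplicates = {text: urls for text, urls in by_text.items() if len(urls) > 1}
    return duplicates, length_issues


pages = {
    "/a": "x" * 140,
    "/b": "Welcome to our store.",
    "/c": "Welcome to our store.",
    "/d": None,
}
duplicates, length_issues = audit_meta_descriptions(pages)
```

A description shared by a whole group of URLs is usually the sign of a template-level meta description.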
14. No XML sitemap (or an out-of-date one)
Google crawls and indexes websites much faster than it used to, and this might be why I see so many sites that don’t make use of an XML sitemap these days. But XML sitemaps are still valuable, and I still believe every website needs one. A case study published at Bruce Clay reported the percentage of pages indexed increasing from 24 percent to 68 percent after an XML sitemap was implemented. Indexation issues still happen, and XML sitemaps still help.
Make sure that you add your XML sitemap via Google Search Console to ensure that the search engine is aware of it.
Is your sitemap up to date? If you’re not using a CMS that automatically updates the XML sitemap every time you update the content, this needs to change. Static XML sitemaps are virtually useless in this day and age, since websites are updated so frequently.
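If your CMS does not generate a sitemap for you, regenerating one from your own list of pages on every publish is not much work. Here is a minimal sketch using Python's standard library; the function name, URLs and dates are placeholders, and where the page list comes from is left to your own system.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date format, e.g. 2024-01-15
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog/seo-mistakes", "2024-01-10"),
])
```

Running this as part of the publishing step, rather than by hand, is what keeps the sitemap from going stale.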
15. Bad use of subheadings
Here are a few issues I see fairly often with subheadings:
- Using H1 tags for subheadings. Please don’t do this. The H1 tag is meant to serve as a title for the entire page. Using more than one may confuse the search engines about the topic of the page.
- Using subheadings inconsistently. What I mean here is skipping straight to H3 tags without using H2 tags, using H2 tags when you intend them to be subsections of another H2 tag, and so on. The heading tags create a very clear hierarchy for search engines to crawl and understand, so don’t mess with the order in which they’re intended to be used.
- Using heading tags in the navigation or menu. I’ve seen cases where entire sections of the site shared the same H1 tag because it was included in a common header for the section. This can lead to keyword cannibalization and similar issues. Including subheading tags within the navigation may also lead to confusion about which content belongs to the body.
- Using bold or size formatting in place of subheadings. While this is less of an issue, I would highly recommend sticking to subheading tags, with the exception of subsections within subheadings. Again, subheadings give the search engines a very clear hierarchy, which can assist them in semantically interpreting your site — a hierarchy which is likely to be less clear if you use size formatting, bolding and so on.
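The first two issues in the list above, multiple H1 tags and skipped heading levels, can be checked automatically. The following sketch uses Python's standard HTML parser; the class and function names are invented for illustration, and it does not detect headings placed inside navigation or bold text used in place of headings.

```python
from html.parser import HTMLParser


class HeadingAuditParser(HTMLParser):
    """Records the level of each h1 to h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    parser = HeadingAuditParser()
    parser.feed(html)
    issues = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one h1, found {h1_count}")
    # Compare each heading with the one before it to spot skipped levels.
    for previous, current in zip(parser.levels, parser.levels[1:]):
        if current > previous + 1:
            issues.append(f"h{previous} is followed by h{current}, skipping a level")
    return issues


problems = audit_headings("<h1>Title</h1><h3>Section</h3><h1>Another title</h1>")
```

Moving back up the hierarchy, such as an H2 following an H3, is not flagged because that is how a new section normally starts.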
16. Bad use of bold formatting
While correlative studies definitely point to use of keywords in bold formatting having a positive relationship with rankings, it’s very easy to interpret these results in the wrong way.
SEOPressor ran an experiment to see how adding bold or strong tags to their keywords would affect rankings. The experiment was a disaster.
The page’s rankings dropped from rank 64 to rank 84. Obviously, they weren’t testing this on a high-risk page, but the results are fairly definitive, especially since the effect went away soon after the formatting was removed.
Interestingly, another article written by the same author suffered during the same time period, suggesting that you might possibly harm authorship reputation by stuffing keywords into bold tags.
Sycosure ran their own case study in response and saw similar results. After adding bold tags to the primary keyword for one of their articles, it disappeared from the search results entirely. Ultimately, the page did recover before the bold tags were removed, perhaps due to other signs of quality, but the implications are clear.
The lesson here is that bold formatting is probably best avoided as an SEO tactic altogether. It’s certainly useful to highlight content in order to make your pages easier to skim, but it appears to be harmful if it’s associated with your keywords. The positive correlations found in many studies probably have more to do with what search marketers are doing than what is influencing search results.
Over to you
Did you catch yourself making any of these mistakes? No worries — these are incredibly common. What matters is that you fix the problem and put a process in place to keep the problem at bay.
How about you? Any common mistakes I missed?
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.