What You Can Learn From Google’s “Site” Operator
Google has a set of advanced search operators that can be accessed either through the advanced search page, or by using specialized commands in conjunction with your query from any Google web search box. One of the most useful for search engine optimization is the “site:” operator.
Google’s “site:” is an advanced search operator that allows you to see the URLs Google has indexed for your website.
To use it, simply type the following into Google’s search box:
site:example.com
Substitute your own domain for example.com, of course.
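If you check a lot of client domains, it can be handy to generate the search URL rather than typing the query each time. Here is a minimal sketch in Python. The function name `site_query_url` is made up for illustration, and the sketch assumes the familiar `https://www.google.com/search?q=` address format for Google web search.

```python
from urllib.parse import urlencode


def site_query_url(domain, extra_terms=""):
    """Build a Google web search URL for a "site:" query (hypothetical helper)."""
    # Optional keywords can follow the operator, e.g. "site:example.com widgets".
    query = f"site:{domain} {extra_terms}".strip()
    # urlencode percent-encodes the colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("example.com"))
```

Paste the printed URL into a browser to see the same results you would get by typing the query into the search box.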
This is one of the first things we do when researching and reviewing a new client’s website, because there’s a lot you can learn from the results that are displayed.
You can learn:
- Approximately how many URLs (not pages) from your site are indexed
- If you have duplicate content issues
- What your title tags look like
When you run a query using the “site:” operator, you’ll see something like this in the top-right of the search engine results page (SERP): “Results 1 – 10 of about 162,000 from www.example.com.” This means that Google has approximately 162,000 URLs from your website indexed. You won’t be able to see all 162,000 in the results, however, as you can generally only drill down to the first 1,000.
Knowing how many URLs Google has indexed is important because it can help determine if there are any indexing problems. For instance, if Google says you have 162,000 URLs indexed, but you know your website doesn’t have even close to that many, you probably have duplicate content issues. This doesn’t mean that you actually have pages of duplicate content, but that you may have the same content being indexed under multiple URLs.
As an example, your home page could be indexed as www.example.com as well as www.example.com/index.php. Or if you have tracking URLs in place, you may find that you’re seeing both the “real” URL being indexed, as well as the tracking URL. That’s not a good thing, and you’ll want to use your robots.txt exclusion file to make sure those don’t get indexed.
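Before relying on a robots.txt rule, it may be worth confirming that the rule actually covers the URLs you want kept out. The sketch below uses Python’s standard `urllib.robotparser` module to test a few URLs against a set of rules. The `/track/` path and the sample URLs are assumptions made up for this example, not something every site uses, and `blocked_urls` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/track/" is an assumed location for tracking URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /track/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules keep a crawler away from."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


sample = [
    "https://www.example.com/products/widget",
    "https://www.example.com/track/click?id=42",
]
print(blocked_urls(ROBOTS_TXT, sample))
```

Note that Python’s parser does simple prefix matching on the path, so this sketch sticks to a plain directory rule rather than wildcard patterns.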
Content can also end up indexed under multiple URLs when your content management system (CMS) spits out a variety of query strings for pages of essentially the same content. You may be able to fix this via your robots.txt file, or you may need your developer to tweak your CMS so that it only ever outputs one form of each URL.
Another thing you might notice in the SERP when you run the “site:” operator is that your home page (or any page of your website) is being indexed under a variety of URLs that you wouldn’t expect. If you see this happening, it might mean that your 404-error page is not set up correctly to send a 404 HTTP status code, and is instead redirecting people to your home page (probably via a 301-redirect). For human visitors this isn’t in and of itself a problem, but it is when it comes to the search engines and indexing. You definitely don’t want your home page indexed under multiple URLs, as Google’s duplicate content filter might stop your “real” home page from showing up when it’s supposed to.
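One way to check how your site handles missing pages is to request a URL that almost certainly doesn’t exist and look at the status code that comes back. The following is a rough sketch using only Python’s standard library. The helper names (`NoRedirect`, `fetch_status`, `classify_missing_page`, `check_site`) and the random probe path are all made up for illustration.

```python
import urllib.error
import urllib.request
import uuid
from urllib.parse import urljoin


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the first response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=10):
    """Return (status code, Location header or None) for a single request."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def classify_missing_page(status, location=None):
    """Describe what a response to a nonexistent URL says about the setup."""
    if status in (404, 410):
        return "ok: missing pages report themselves as missing"
    if status in (301, 302, 303, 307, 308):
        return f"problem: missing pages redirect to {location}"
    if status == 200:
        return "problem: missing pages return 200 OK"
    return f"unexpected status {status}"


def check_site(home_url):
    """Probe a random, presumably nonexistent path on the site."""
    probe = urljoin(home_url, "/no-such-page-" + uuid.uuid4().hex)
    return classify_missing_page(*fetch_status(probe))
```

Calling `check_site("https://www.example.com/")` with your own home page URL makes one live request. Anything other than the “ok” result suggests the error page deserves a closer look from your developer.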
If your home page being indexed under multiple URLs is not because of a 404-error page being set up incorrectly, it might be due to your website providing unique session IDs for any browser that visits the website. If you must use session IDs to track the visitors to your website, you’ll need to make sure that these will not be shown to search engine spiders and that they only get the “clean” version of each URL. Your website developers should be able to implement this if brought to their attention.
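If you copy the URLs from a “site:” results page into a list, a short script can group the ones that are really the same page under different addresses. This is only a sketch. The parameter names in `IGNORED_PARAMS` and the file names in `INDEX_FILES` are assumptions, so swap in whatever session, tracking, and index-file names your own site produces. The function names are hypothetical too.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names for illustration; adjust to what your site actually emits.
IGNORED_PARAMS = {"sid", "sessionid", "phpsessid", "ref"}
INDEX_FILES = ("index.php", "index.html")


def clean_url(url):
    """Reduce a URL to the "clean" form you would want indexed."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat /index.php and /index.html as the directory itself.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
    # Drop session and tracking parameters, keep everything else.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


def duplicate_groups(urls):
    """Map each clean URL to the indexed URLs that collapse into it, if more than one."""
    groups = defaultdict(list)
    for url in urls:
        groups[clean_url(url)].append(url)
    return {clean: found for clean, found in groups.items() if len(found) > 1}


indexed = [
    "http://www.example.com",
    "http://www.example.com/index.php",
    "http://www.example.com/?sid=abc123",
    "http://www.example.com/about",
]
print(duplicate_groups(indexed))
```

Any group with more than one entry is a candidate for the robots.txt or CMS fixes described above.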
Though the “site:” operator can teach you a lot about how Google indexes your website, there are some things that it doesn’t show you. For example, the “site:” operator doesn’t show you:
- What your SERP description will look like
- Which pages of your website are most important
Often people see their search results from a “site:” operator and panic because the snippet or description that shows up underneath their URL is part of their navigation or something else that looks icky and not click-worthy. Don’t despair! The snippet that shows up when you use a “site:” operator query is rarely the same as the description that shows up for an actual keyword query. Perform some keyword searches yourself and you’ll see the difference.
The other mistake people make is thinking that the order in which pages are listed in the SERP for a “site:” query reflects the order of importance of those pages. While Google does tend to show the home page of a site before the other pages, the rest of the list isn’t sorted in any particular order of importance. So be careful about drawing any conclusions based on that.
I make new discoveries all the time when using Google’s “site:” operator for the websites I review. If this isn’t something you currently check for your websites or for those of your clients, you may be missing some important information that could improve your website’s search engine performance.
If you’d like to learn more, last year Vanessa Fox posted some info about the site: command on Google’s Webmaster Central blog.
Jill Whalen, CEO and founder of High Rankings, a search marketing firm outside of Boston, and co-founder of SEMNE, a New England search marketing networking organization, has been performing SEO since 1995. Jill is the host of the High Rankings Advisor search engine marketing newsletter. The 100% Organic column appears Thursdays at Search Engine Land.