Search Engine Land

How to keep your staging or development site out of the index

You generally wouldn't want your staging site appearing in search results, so how can you prevent Google from indexing this content? Columnist Patrick Stox offers some tips.

Patrick Stox on November 22, 2017 at 11:45 am

One of the most common technical SEO issues I come across is the inadvertent indexing of development servers, staging sites, pre-production servers, or whatever other name you use.

There are a number of reasons this happens, ranging from people thinking no one would ever link to these areas to technical misunderstandings. These parts of the website are usually sensitive in nature and having them in the search engine’s index risks exposing planned campaigns, business intelligence or private data.

How to tell if your dev server is being indexed

You can use Google search to determine if your staging site is being indexed. For instance, to locate a staging site, you might search Google for site:domain.com and look through the results or add operators like -inurl:www to remove any www.domain.com URLs. You can also use third-party tools like SimilarWeb or SEMrush to find the subdomains.

There may be other sensitive areas that contain login portals or information not meant for public consumption. Besides using various Google search operators (a practice known as Google dorking), you can often find these areas listed in a site’s robots.txt file, because websites tend to block them there, telling you exactly where you shouldn’t look. What could go wrong with telling people where to find the information you don’t want them to see?

There are many actions you can take to keep visitors and search engines off dev servers and other sensitive areas of the site. Here are the options:

Good: HTTP authentication

Anything you want to keep out of the index should include server-side authentication. Requiring authentication for access is the preferred method of keeping out users and search engines.
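
As a rough illustration of what server-side authentication does, here is a minimal Python sketch of an HTTP Basic authentication check. The credentials and the function names (is_authorized, respond) are hypothetical and exist only for this example; in practice you would normally turn on the authentication feature built into your web server or hosting platform rather than write your own.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; a real deployment would
# load these from a secret store, not hard-code them.
STAGING_USER = "staging"
STAGING_PASSWORD = "change-me"


def is_authorized(authorization_header):
    """Return True if the Authorization header carries the expected Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        token = authorization_header[len("Basic "):]
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    user, separator, password = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode(), STAGING_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), STAGING_PASSWORD.encode())
    return user_ok and password_ok


def respond(authorization_header):
    """Return (status, headers) for a request to the staging site."""
    if is_authorized(authorization_header):
        return "200 OK", []
    # Anyone without credentials, crawlers included, gets a 401 and no content.
    return "401 Unauthorized", [("WWW-Authenticate", 'Basic realm="staging"')]
```

A crawler that requests the page without credentials only ever receives the 401 response, so it has no content to index.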

Good: IP whitelisting

Allowing only known IP addresses — such as those belonging to your network, clients and so on — is another great step in securing your website and ensuring only those users who need to see the area of the website will see it.
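
The check behind an IP whitelist can be sketched in a few lines of Python using the standard ipaddress module. The networks below are documentation-range placeholders, not real addresses, and is_allowed is a hypothetical helper name; most teams would configure this rule in a firewall or web server rather than in application code.

```python
import ipaddress

# Hypothetical allowlist: documentation-range addresses stand in for an
# office network and a single client IP.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside any allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # Malformed addresses are rejected outright.
    return any(address in network for network in ALLOWED_NETWORKS)
```

Requests from any address outside the list, including search engine crawlers, would be refused before they reach the staging content.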

Maybe: Noindex in robots.txt

Noindex in robots.txt is not officially supported, but it may work to remove pages from the index. The problem I have with this method is that it still tells people where they shouldn’t look, and it may not work forever or with all search engines.

The reason I say this is a “maybe” is that it can work and could actually be combined with a disallow in robots.txt, unlike some other methods which don’t work if you disallow crawling (which I will talk about later in this article).

Maybe: Noindex tags

A noindex tag either in the robots meta tag or an X-Robots-Tag in the HTTP header can help keep your pages out of the search results.

One issue I see with this is that it means more pages to be crawled by the search engines, which eats into your crawl budget. I typically see this tag used when there is also a disallow in the robots.txt file. If you’re telling Google not to crawl the page, then they can’t respect the noindex tag because they can’t see it.

Another common issue is that these tags may be applied on the staging site and then left on the page when it goes live, effectively removing that page from the index.
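
One way to guard against a leftover noindex is a pre-launch check that inspects both places the directive can live. The Python sketch below is an assumption-laden example, not a complete implementation: has_noindex and RobotsMetaParser are hypothetical names, and the check only looks for a literal "noindex" value in an X-Robots-Tag header or a robots meta tag. It does not handle crawler-specific variants or the "none" shorthand.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))


def has_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in (part.strip().lower() for part in value.split(",")):
                return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Run against the live URLs right after launch, a check like this should return False for every page you expect to rank.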

Maybe: Canonical

If you have a canonical set on your staging server that points to your main website, essentially all the signals should be consolidated correctly. There may be mismatches in content that could cause some issues, and as with noindex tags, Google will have to crawl additional pages. Webmasters also tend to add a disallow in the robots.txt file, so Google once again can’t crawl the page and can’t respect the canonical because they can’t see it.

You also risk these tags not changing when migrating from the staging server to live, which may cause the one you don’t want to show to be the canonical version.
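
A similar check can confirm which host a page’s canonical tag points to. This Python sketch assumes a single link element with rel="canonical" and an absolute URL in its href; CanonicalParser, canonical_host and canonical_points_to are hypothetical names, and the example.com hosts are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_host(html):
    """Return the hostname of the canonical URL, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return None
    return urlparse(parser.canonical).hostname


def canonical_points_to(html, expected_host):
    """Return True if the page's canonical URL is on expected_host."""
    return canonical_host(html) == expected_host


# Hypothetical page that still carries a staging canonical.
page = '<head><link rel="canonical" href="https://staging.example.com/pricing"></head>'
print(canonical_points_to(page, "www.example.com"))
```

On staging, the canonical should resolve to the main site’s host; if a live page reports the staging host instead, the tag was not updated during the migration.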

Bad: Not doing anything

Staging sites usually go unprotected because someone assumes no one will ever link to them, so there’s no need to do anything. I’ve also heard that Google will just “figure it out,” but I wouldn’t typically trust them with my duplicate content issues. Would you?

Bad: Disallow in robots.txt

This is probably the most common way people try to keep a staging site from being indexed. With the disallow directive in robots.txt, you’re telling search engines not to crawl the page — but that doesn’t keep them from indexing the page. They know a page exists at that location and will still show it in the search results, even without knowing exactly what is there. They have hints from links, for instance, on the type of information on the page.

When Google indexes a page that’s blocked from crawling, you’ll typically see the following message in search results: “A description for this result is not available because of this site’s robots.txt.”

If you recall from earlier, this directive will also prevent Google from seeing other tags on the page, such as noindex and canonical tags, because it prevents them from seeing anything on the page at all. You also risk not remembering to remove this disallow when taking a website live, which could prevent crawling upon launch.
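
To see what a crawler concludes from a disallow, you can feed a robots.txt file to Python’s standard urllib.robotparser module. This is a sketch under stated assumptions: the robots.txt content and the staging URL are made up, can_crawl is a hypothetical helper, and the standard-library parser is only an approximation of how any particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind often found on staging sites.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = not can_crawl(ROBOTS_TXT, "Googlebot", "https://staging.example.com/new-campaign")
print(blocked)
```

When the result shows the URL is blocked, the crawler never requests the page, so any noindex or canonical tag in its HTML or headers goes unread. The URL itself can still be discovered through links.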

What if you got something indexed by accident?

Crawling can take time depending on the importance of a URL (likely low in the case of a staging site). It may take months before a URL is re-crawled, so any block or issue may not be processed for quite a while.

If you got something indexed that shouldn’t be, your best bet is to submit a URL removal request in Google Search Console. This should remove it for around 90 days, giving you time to take corrective actions.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.



About The Author

Patrick Stox
Patrick Stox is product advisor, technical SEO and brand ambassador at Ahrefs. He’s an organizer for the Raleigh SEO Meetup, Raleigh SEO Conference, Beer & SEO Meetup, Findability Conference, and moderator on /r/TechSEO.

© 2021 Third Door Media, Inc. All rights reserved.
