Bing has posted its best practices for implementing XML Sitemap files on the Bing Webmaster blog. The post includes a long section on how those practices apply to very large websites with many URLs.
Bing’s Six Best Practices For Sitemaps:
- Follow the sitemaps reference at www.sitemaps.org. Common mistakes we see are people thinking that HTML Sitemaps are sitemaps, malformed XML Sitemaps, XML Sitemaps too large (max 50,000 links and up to 10 megabytes uncompressed) and links in sitemaps not correctly encoded.
- Have relevant sitemaps linking to the most relevant content on your sites. Avoid duplicate links and dead links: a best practice is to generate sitemaps at least once a day, to minimize the number of broken links in sitemaps.
- Select the right format: (a) Use an RSS feed to list, in real time, all new and updated content posted on your site during the last 24 hours. Avoid listing only the 10 newest links on your site: search engines may not visit RSS as often as you want and may miss new URLs. (This can also be submitted inside Bing Webmaster Tools as a Sitemap option.) (b) Use XML Sitemap files and a sitemap index file to generate a complete snapshot of all relevant URLs on your site daily.
- Consolidate sitemaps: Avoid too many XML Sitemaps per site and avoid too many RSS feeds: Ideally, have only one sitemap index file listing all relevant sitemap files and sitemap index files, and only one RSS listing the latest content on your site.
- Use sitemap properties and RSS properties as appropriate.
- Tell search engines where your sitemap XML URLs and RSS URLs are located by referencing them in your robots.txt files or by publishing the location of your sitemaps in search engines’ Webmaster Tools.
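As a rough illustration of several of the points above (no duplicate links, at most 50,000 links per file, one sitemap index, and a robots.txt reference), here is a minimal Python sketch. The file names, the `build_sitemap_files` and `robots_txt_line` helpers, and the example.com URLs are all hypothetical and are not taken from Bing's post. The sketch does not check the 10 megabyte size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file link limit quoted in the list above


def _to_xml(root):
    """Serialize an element tree to UTF-8 bytes with an XML declaration."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemap_files(urls, base_url):
    """Return {file name: XML bytes} for the sitemap files plus one index.

    File names such as "sitemap-1.xml" are an assumption for this sketch.
    """
    unique = list(dict.fromkeys(urls))  # drop duplicate links, keep order
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, start in enumerate(range(0, len(unique), MAX_URLS_PER_SITEMAP), 1):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in unique[start:start + MAX_URLS_PER_SITEMAP]:
            entry = ET.SubElement(urlset, "url")
            # ElementTree escapes characters such as & when serializing
            ET.SubElement(entry, "loc").text = url
        name = f"sitemap-{n}.xml"
        files[name] = _to_xml(urlset)
        ref = ET.SubElement(index, "sitemap")
        ET.SubElement(ref, "loc").text = f"{base_url}/{name}"
    files["sitemap-index.xml"] = _to_xml(index)
    return files


def robots_txt_line(base_url):
    """Return the robots.txt line that points crawlers at the index file."""
    return f"Sitemap: {base_url}/sitemap-index.xml"


if __name__ == "__main__":
    pages = [f"https://www.example.com/page?id={i}&x=1" for i in range(3)]
    for name, body in build_sitemap_files(pages, "https://www.example.com").items():
        print(name, len(body), "bytes")
    print(robots_txt_line("https://www.example.com"))
```

Running a job like this once a day, from the list of URLs that currently resolve, would also cover the advice about regenerating sitemaps daily to keep dead links out.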
Note that while these are Bing’s best practices, Google’s recommendations may differ. Make sure to review Google’s Sitemaps guidelines as well.
Bing & Really Large Sites
Bing wants you to think carefully about whether you really need a huge number of URLs listed in your Sitemap files. If you do, a single Sitemap index file can support up to 2.5 billion links: 50,000 Sitemap files times 50,000 links per file. You can go all the way up to 125 trillion links by using multiple sitemap index files.
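The arithmetic behind those figures can be checked with a short Python sketch. The `max_links` helper is hypothetical, and it assumes that every index level, like every sitemap file, holds at most 50,000 entries, which is how the numbers quoted from Bing work out.

```python
MAX_ENTRIES_PER_FILE = 50_000  # limit for sitemap files; assumed for index files too


def max_links(index_levels):
    """Upper bound on links reachable through the given number of index levels.

    0 levels: one sitemap file
    1 level:  one sitemap index file listing sitemap files
    2 levels: a sitemap index file listing other sitemap index files
    """
    return MAX_ENTRIES_PER_FILE ** (index_levels + 1)


if __name__ == "__main__":
    for levels in range(3):
        print(levels, f"{max_links(levels):,}")
```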
Bing advises you to “think first if you really need so many links on your site. In general search engines will not crawl and index all of that. It’s highly preferable that you link only to the most relevant web pages to make sure that at least these relevant web pages are discovered, crawled and indexed.”
For more details, see the Bing webmaster blog.