Since Google launched XML Sitemaps back in 2005, they’ve added specialized formats to enable site owners to submit content other than web pages. Until now, site owners have had to create separate Sitemaps for each content type:
That has changed. You can now create a single XML Sitemap that contains any combination of these content types. The Google Webmaster Central blog post doesn’t mention News Sitemaps, so presumably news content can’t be mixed with the other types.
Great news for site owners? Possibly. It may be easier to create and maintain a single Sitemap in some cases, but the lowest-overhead way to create and maintain Sitemaps is generally a script that creates and updates the files automatically, and once a script does the work, the number of files matters much less. It might also be easier to keep track of things separately.
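As a rough sketch of the kind of script I mean, the Python below writes one standard Sitemap per content type. The function names, the `sitemap-<type>.xml` naming pattern, and the example.com URLs are my own assumptions for illustration; only the sitemaps.org namespace comes from the protocol itself. A real script would pull its URL lists from your CMS or database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the joint sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a standard Sitemap document (as bytes) listing the given URLs."""
    # Serialize the protocol namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemaps_by_type(urls_by_type):
    """Map a hypothetical file name per content type to its Sitemap body."""
    return {
        f"sitemap-{content_type}.xml": build_sitemap(urls)
        for content_type, urls in sorted(urls_by_type.items())
    }


if __name__ == "__main__":
    # Placeholder data; a real script would query the site's own URL inventory.
    sitemaps = build_sitemaps_by_type({
        "articles": ["http://www.example.com/articles/1"],
        "products": ["http://www.example.com/products/1"],
    })
    for name, body in sitemaps.items():
        with open(name, "wb") as handle:
            handle.write(body)
```

Run on a schedule, something along these lines keeps each file current without anyone editing XML by hand, whether you end up with one Sitemap or twenty.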
Certainly, from a metrics perspective, it may make sense to keep content types separated. When you submit an XML Sitemap to Google Webmaster Tools, you can see a report of the total number of URLs in the Sitemap and the number of those URLs indexed. For a long time, I’ve suggested that site owners create separate Sitemaps for different content types and page types so they can easily track what percentage of each type Google is indexing. (This report provides much more accurate numbers than a site: operator search, although it only works accurately if the Sitemap contains a comprehensive and canonical list of URLs for a category.)
The previous limits on URLs and size (50,000 URLs and 10MB) still apply as well, so many sites will find that not everything fits in one file anyway.
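To make those limits concrete, here is a minimal sketch of how a script might split a long URL list into files that each stay under both caps. The function and constant names are hypothetical, and I’m assuming the size limit is measured on the uncompressed file; treat it as an illustration rather than a reference implementation.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000              # per-file URL limit
MAX_BYTES = 10 * 1024 * 1024   # 10MB, assumed to apply to the uncompressed file

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
)
FOOTER = "</urlset>\n"


def url_entry(url):
    """Render one <url> entry, escaping characters such as & in the URL."""
    return f"  <url><loc>{escape(url)}</loc></url>\n"


def split_into_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Return a list of Sitemap documents, each within both limits."""
    overhead = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    documents, entries, size = [], [], overhead
    for url in urls:
        entry = url_entry(url)
        entry_size = len(entry.encode("utf-8"))
        # Start a new file when adding this entry would exceed either limit.
        if entries and (len(entries) >= max_urls or size + entry_size > max_bytes):
            documents.append(HEADER + "".join(entries) + FOOTER)
            entries, size = [], overhead
        entries.append(entry)
        size += entry_size
    if entries:
        documents.append(HEADER + "".join(entries) + FOOTER)
    return documents
```

With short URLs the count limit is the one that bites: a site with 120,000 pages would need three files. The extra markup in the specialized formats makes each entry larger, so a combined Sitemap could reach the size limit well before the URL limit.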
What about the sitemaps.org alliance?
The question I have about this move, however, is what it means for sitemaps.org. In 2006, Google, Microsoft, and Yahoo came together to support a joint protocol. While the standard XML Sitemaps protocol for web pages is supported jointly, Google launched specialized Sitemaps on their own, and not as part of that alliance. If site owners start modifying their web XML Sitemaps to include the additional markup needed by the specialized formats, won’t that break the other engines’ ability to parse the files? Doesn’t that mean that for all practical purposes, Google is encouraging site owners to submit XML Sitemap files that don’t adhere to the standard and can be used only by Google?
In spite of sitemaps.org and the later joint support of autodiscovery in robots.txt, it appears that Google isn’t keeping either in mind as they evolve XML Sitemaps. They may say that they would love for the other engines to support this new combined format, but they didn’t involve Microsoft or Yahoo as they developed the specialized formats, and it’s unlikely they gave advance warning of this launch so that the other engines had time to at least adjust their parsers to handle the new combined format.
Maybe this move is no big deal. But a lot of work went into the launch of that alliance, with the goal of making things easier for content owners web-wide, and not just for one search engine. So the partial dismantling of it, even in spirit, is a bit disappointing.
Update: It’s true that the specialized elements shouldn’t break existing parsers, even if those parsers weren’t built specifically for those additional elements. However, I do think it’s quite possible that the existing (non-Google) parsers aren’t set up to process files with additional elements and this change could break them. It’s definitely the case that the search engines aren’t working together on these extensions, and I’d just like to see more cooperation and advancement of the original aims of the alliance: making it easier for content owners to work with search engines.
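To illustrate the first point in that update, here is a small Python sketch of a namespace-aware parser that reads only the standard elements and skips anything it doesn’t recognize. The extension namespace and its `ext:` elements are made up for the example (they are not Google’s actual schema), and the function name is mine. It says nothing about how any search engine’s parser is actually written, which is exactly the uncertainty I’m describing.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical extension namespace, for illustration only.
EXT_NS = "http://www.example.com/schemas/sitemap-extension/1.0"

SAMPLE = f"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="{SITEMAP_NS}" xmlns:ext="{EXT_NS}">
  <url>
    <loc>http://www.example.com/videos/1</loc>
    <ext:media>
      <ext:title>Example clip</ext:title>
    </ext:media>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
  </url>
</urlset>"""


def extract_locations(xml_text):
    """Return every standard <loc> value, ignoring elements in other namespaces."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Match only url/loc elements in the sitemaps.org namespace.
    loc_path = f"{{{SITEMAP_NS}}}url/{{{SITEMAP_NS}}}loc"
    return [loc.text.strip() for loc in root.findall(loc_path) if loc.text]


if __name__ == "__main__":
    for location in extract_locations(SAMPLE):
        print(location)
```

A parser written this way never looks at the extension elements, so they do no harm. One that validates strictly against the original schema, or that matches on tag names without regard to namespaces, could behave differently, and that is the risk when the formats evolve without coordination.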