Ever since Google Webmaster Tools started reporting on broken links to a site, webmasters have been asking for the sources of those links. Today, Google has delivered: Webmaster Tools now shows the page that each broken link comes from. This information should help webmasters make sure that visitors find their sites and that links to their pages are properly credited.
The value of the 404 error report
Why does Google report broken links in the first place? As Googlebot crawls the web, it stores a list of all the links it finds. It then uses that list for a couple of things:
- As the source list to crawl more pages on the web
- To help calculate PageRank
If your site has a page with the URL www.example.com/mypage.html and someone links to it using the URL www.example.com/mpage.html, then a few things can happen:
- Visitors who click on that link arrive at the 404 page for your site and aren’t able to get to the content they were looking for
- Googlebot follows that link and instead of finding a valid page of your site to crawl, receives a 404 page
- Google can’t use that link to give a specific page on your site link credit (because it has no page to credit)
Clearly, knowing about broken links to your site is valuable. The best solution in these situations is generally to implement a 301 redirect from the incorrect URL to the correct one. If you see a 404 error for www.example.com/mpage.html, you can be pretty sure the linking page meant to point to www.example.com/mypage.html. With the redirect in place, visitors who click the link find the right content, Googlebot finds the content, and mypage.html gets credit for the link. You can also scan your site to see whether any of the broken links are internal, and fix them. But finding broken links on your own site can be tedious (although it's worth running a broken link checker on your site anyway to ensure you're providing the best possible user experience), and it would be nice to be able to fix broken external links at the source rather than implementing all those redirects.
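As a rough illustration of the redirect fix, the sketch below uses Python's standard-library `http.server` to answer requests for a known misspelled path with a 301 pointing at the correct page. The `VALID_PAGES` list, the `REDIRECTS` map, and the use of `difflib` to guess the page a broken link probably meant are all assumptions made for this example; on a real site you would more likely configure the redirect in your web server.

```python
import difflib
from http.server import BaseHTTPRequestHandler

# Hypothetical list of pages that actually exist on the site.
VALID_PAGES = ["/mypage.html", "/about.html", "/contact.html"]

# Hypothetical redirect map built from the 404 report: broken path -> correct path.
REDIRECTS = {"/mpage.html": "/mypage.html"}


def suggest_target(broken_path, valid_pages=VALID_PAGES):
    """Return the existing page that most closely resembles a broken path, or None."""
    matches = difflib.get_close_matches(broken_path, valid_pages, n=1, cutoff=0.8)
    return matches[0] if matches else None


def redirect_location(path):
    """Return the 301 target configured for a broken path, or None."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    """Serves known pages, 301-redirects known broken paths, and 404s everything else.

    Note: self.path includes any query string; this sketch does not strip it.
    """

    def do_GET(self):
        target = redirect_location(self.path)
        if target is not None:
            # 301 tells browsers and crawlers the content has moved permanently.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
        elif self.path in VALID_PAGES:
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"<p>Content for " + self.path.encode() + b"</p>")
        else:
            self.send_error(404)


# To try it locally (blocks until interrupted):
# from http.server import HTTPServer
# HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()
```

Here `suggest_target("/mpage.html")` picks `/mypage.html` as the likely intended page, mirroring the reasoning above; a human should still confirm each suggestion before adding it to the redirect map.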
New data: the source of the broken link
Now fixing the source has gotten a lot easier. Next to each broken link listed in the 404 report is the source URL for that link. You can download the report into Excel and sort it by source URL to get a list of all the internal broken links so you can fix them easily. You'll also have a list of all the external sites with broken links to your pages. You can contact those site owners and ask them to fix the links (which improves the experience for their visitors as well). You won't be able to get every external link fixed, so for the rest, you can continue implementing redirects.
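The sorting step can also be scripted. The sketch below assumes the downloaded report is a CSV with one row per broken link and two columns named `URL` (the broken page on your site) and `Linked From` (the page containing the link); those column names and the `www.example.com` host are assumptions, so check the header row of your own download. It splits the sources into internal pages you can fix yourself and external pages whose owners you might contact.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlparse


def split_sources(csv_text, own_host, url_col="URL", source_col="Linked From"):
    """Group broken URLs by source page, separating internal from external sources.

    Returns (internal, external): dicts mapping each source URL to a sorted list
    of the broken URLs it links to. The column names are assumed, not confirmed.
    """
    internal = defaultdict(set)
    external = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        broken = row[url_col].strip()
        source = row[source_col].strip()
        # Add a scheme if missing so urlparse can find the host name.
        host = urlparse(source if "://" in source else "http://" + source).hostname or ""
        # A source on our own host is an internal link we can edit directly.
        bucket = internal if host == own_host.lower() else external
        bucket[source].add(broken)
    # Sort sources and their broken links so the output is stable and easy to scan.
    return (
        {src: sorted(urls) for src, urls in sorted(internal.items())},
        {src: sorted(urls) for src, urls in sorted(external.items())},
    )
```

To run it on a real download, read the file with something like `open(path, newline="", encoding="utf-8")` and pass its contents along with your own host name; the internal dictionary then serves as your fix-it list, and the external one as a contact list.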
Accessing the data
To view the broken links to your pages and the source URL for them:
- Log into Google Webmaster Tools and access your site (you have to be verified as a site owner).
- Click Web Crawl from the Diagnostics tab.
- Click Not Found to view the URLs that return 404 errors.
- Click the Linked From number next to a URL to see the pages that link to that 404 page.
- Click Download all sources of errors on this site to download a CSV file that you can open and sort in Excel.
Microsoft Live Search recently added 404 reporting to its webmaster tools as well, although, as with Google's initial launch, it doesn't yet provide source URL data.
Google has also added functionality to the Webmaster Tools API that lets site owners change settings (such as preferred domain and crawl rate) that previously could be changed only through the web interface. As Google notes in the blog post:
“This is especially useful if you have a large number of sites. With the Webmaster Tools API, you can perform hundreds of operations in the time that it would take to add and verify a single site through the web interface.”