Using Google Code Search To Find Vulnerable Sites
ShoeMoney wrote a detailed write-up on how hackers can use Google Code Search to quickly find sites that are vulnerable to attack. He shows XSS exploits, SQL injection exploits and more. ShoeMoney wasn’t the first to spot this. SEO Egghead wrote about some examples on October 5th. Is Google to blame? I don’t think so.
Postscript From Danny: Finding security exploits via Google or other search engines is pretty old news, going back for years. Below, a recap of some of these issues plus how you need to watch what your systems are spitting out for Google and other search engines.
Another story in July talked of using regular Google to seek out exploits.
Back in January 2005, McAfee released a tool to tap into Google to do the same thing.
Here’s New Scientist with an article on using Google to find exploits back in August 2003. From the lead:
Computer hackers have adopted a startling strategy in their attempts to break into websites. By using the popular search engine Google, they do not have to visit a site to plan an attack. Instead, they can get all the information they need from Google’s cached versions of web pages, say experts in the US.
And another from Wired in March 2003, same topic:
“Google, properly leveraged, has more intrusion potential than any hacking tool,” said hacker Adrian Lamo, who recently sounded the alarm.
Google Code Search scans only computer code, which potentially makes finding exploits easier. Concerns over this were aired right after it launched in October. See articles such as:
- Google Code Search peers into programs’ flaws from SecurityFocus
- Google Code Search for Security Vulnerabilities from Chris Shiflett (which discusses some of the exact things in ShoeMoney’s article)
- Google Code Search gives security experts a sinking feeling from SearchSecurity.com
- Using Google Code Search to Find Security Bugs from O’Reilly.
I think ShoeMoney’s post is mainly interesting in that he made use of the Google Sitemaps program and was spitting out a file listing everything on his web server. Everything. He writes:
Now while this was interesting it still did not explain how the page was even indexed…. ohh wait I use Google Sitemaps and I had it on to index everything (the default setting) OUPS!!
Now to be honest… this is my fault. I in no way blame Google what so ever. I had old exploitable code on my server and I told sitemaps to index it so… my fault.
I have since been working with the sitemaps team and I had some suggestions to leave some files off by default (like .inc .func) or only allow common web files with extensions like .php .html .asp etc… I hope they do this cause as sitemaps gets more popular its only going to expose more idiot webmasters like me that run with the default settings.
To be clear, sitemaps has no “default” setting to index everything. By default, Google itself will spider any URL it comes across. But the “default” ShoeMoney is talking about almost certainly relates to a third-party program that generates a sitemaps file for Google.
I’m not sure what blog software he’s using, but he’s probably got a plug-in running, and the defaults of THAT PLUG-IN (not Google) were spitting all of this out into a sitemaps file that ShoeMoney was telling Google to index.
The idea of automatically blocking some files from sitemaps is interesting but doesn’t make a lot of sense. Some people don’t use “common extensions” at all and are going to be annoyed to discover that Google is “ignoring” what they told it to index. The idea behind a site owner purposely putting out a sitemaps file is that they are explicitly saying, “index this stuff.” Don’t want it indexed? Don’t put it out on the web.
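If the filtering belongs anywhere, it's in the program that generates the sitemaps file. Here's a rough sketch of what that could look like, not how ShoeMoney's plug-in or any real sitemap tool works. The function names, the allow-list of extensions and the example.com URL are all made up for illustration, and the XML uses the sitemaps.org 0.9 namespace as an assumption.

```python
import os
from urllib.parse import quote
from xml.sax.saxutils import escape

# Hypothetical allow-list: only these extensions are ever listed.
# Anything else (.inc, .func, backups, config files) is skipped.
ALLOWED_EXTENSIONS = {".php", ".html", ".htm", ".asp"}


def is_publishable(path):
    """Return True if the file's extension is on the allow-list."""
    extension = os.path.splitext(path)[1].lower()
    return extension in ALLOWED_EXTENSIONS


def collect_urls(web_root, base_url):
    """Walk web_root and return sorted URLs for allow-listed files only."""
    urls = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not is_publishable(full_path):
                continue
            # Turn the filesystem path into a URL path with forward slashes.
            relative = os.path.relpath(full_path, web_root).replace(os.sep, "/")
            urls.append(base_url.rstrip("/") + "/" + quote(relative))
    return sorted(urls)


def build_sitemap(urls):
    """Render a list of URLs as a minimal sitemap XML document."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() handles &, < and > so the XML stays well-formed.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines)
```

Calling build_sitemap(collect_urls("/var/www/html", "http://www.example.com")) would produce a file that never mentions a stray .inc file. An allow-list is used rather than a block-list so that a file type nobody thought of stays out by default. Note this only keeps files out of the sitemap; anything exploitable that is still reachable on the server can be found some other way.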
The real culprit is whatever program is generating links to some of these files, along with security that needs to be tightened overall. ShoeMoney’s pretty with it in not blaming Google. And one expert in that SearchSecurity.com article saw positives in Google Code Search:
Still, the new search engine has plenty of potential as a legitimate tool for developers and could end up being a net positive in terms of security, Caceres said.
“People shouldn’t be so quick to label this a security disaster,” he said. “Security-wise, in the long term I think it could be a good thing because developers will realize that what they do has implications and will be seen. So maybe they’ll be a little more careful.”