Google has rolled out a new Google Patents search engine (Official Google Blog post here), allowing people to search across the full-text of US patents from 1790 through the middle of 2006. The company plans to update the service further with more 2006 filings and move to a weekly or better refresh schedule in the future.
US patents can already be searched by the general public using, among other resources, the search service provided by the US Patent & Trademark Office. So why bother with a new service from Google?
"We’ve really applied the Google experience to it. It’s easier so you don’t have to apply the fielded searches," said Bill Brougher, a group product manager at Google overseeing the project. "I think it will open up patent search to a lot of non-lawyers."
Google has applied the scanning technology behind Google Book Search to create a full-text index of all US patents through the middle of 2006. The USPTO also offers full-text searching, but only for patents going back to 1976. Before that date, you have to locate patents using relatively limited metadata (author, abstract, issue date, etc.).
In contrast, the Google product should let you quickly search across the entire text of the patents, including metadata fields, to find matches. I haven’t tested how well it works, as the service has just gone live. However, Search Engine Land patents correspondent Bill Slawski is playing with the service now to take a close-up look. We’ll add his review as a postscript when it is ready.
All US patents, over 7 million of them, are included in the service through mid-2006. Patent applications are not, but Brougher said these would be a "logical" addition to the service, as would non-US patents.
Google is licensing the patent information, though Brougher didn’t know how much was spent to obtain the patent images that were scanned. He did joke that it wasn’t much relatively speaking, otherwise he’d likely recall the figure more readily.
The service will offer an advanced search page, to serve those who do want to do fielded searches (where you search within "fields" or "categories" like author name). However, only a limited set of what Google considers the most important fields will be offered. Depending on what’s there, some professionals may still want to use the USPTO service or others.
If so, Google won’t mind too much. This isn’t seen as a replacement for existing tools, both free and paid.
"Clearly most patent attorneys will continue to use the very nice paid services with lots of other features we haven’t added, but I think this will be useful enough for some non-attorneys to use," Brougher said.
The patent office adds updates each week, Brougher said, so once the rest of 2006 has been added to the service, Google expects to follow a weekly schedule of updates. However, he had no timeline as to when the rest of the data would be added or when regular updates would start.
At launch, there will be no features such as getting alerts via email or web feeds, but that might be something added in the future, Brougher said.
As for the origins of Google Patent Search, it started several years ago as one of Google’s famed 20 percent time projects, the time engineers get to work on anything that interests them. It has since grown into a large team effort to roll it out and support it as a regular service.
Postscript by Bill Slawski
This is a very new service from Google, and I’m expecting to see some errors. I’m seeing those. The system probably needs some debugging.
In addition to “page not found” or server error messages, I’m a little concerned about some of the results that I am seeing.
Should a patent search return all of the documents that mention a query or just the documents that the search engine deems the most relevant? Should it rank them in date order or by some perceived relevance, as Google says it does on the About Google Patent Search page?
Q. How do you rank patents in the search results?
A. As with Google Web Search, we rank patent results according to their relevance to a given search query. We use a number of signals to evaluate how relevant each patent is to a user’s query, and we determine our results algorithmically.
I ran a post called Trends in Search Related Patents last week at SEO by the Sea in which I looked at how many times the following words showed up in a search using the “spec” operator in the granted patents database at the USPTO: Internet, Search Engine, Algorithm, Google, and Yahoo!
The “spec” search operator at the USPTO attempts to match words to a query from these fields in a patent: “patent description, including a brief summary and background of the invention, the detailed description, and a brief description of the drawing, if applicable.” It doesn’t search the fields for titles, or abstracts, or claims. But it returns all of the documents that contain the word in the description fields, and does so in date/patent number order.
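As a rough illustration of that behavior, here is a minimal Python sketch of a description-only search that returns every match in patent number order. The `Patent` record, the `spec_search` helper, and the sample data are all hypothetical, the sort direction is an assumption, and it only handles single-word queries; it is not how the USPTO system is actually implemented.

```python
import re
from dataclasses import dataclass


@dataclass
class Patent:
    # Hypothetical record; real patents carry many more fields.
    number: int
    title: str
    abstract: str
    description: str


def words(text):
    """Lower-cased set of the words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def spec_search(patents, term):
    """Return EVERY patent whose description contains the word.

    Titles and abstracts are deliberately ignored, and nothing is
    filtered by relevance. Results are sorted by patent number
    (ascending here, which is an assumption).
    """
    term = term.lower()
    hits = [p for p in patents if term in words(p.description)]
    return sorted(hits, key=lambda p: p.number)


# Made-up sample data for illustration only.
sample = [
    Patent(300, "Internet kiosk", "A kiosk.", "A coin-operated terminal."),
    Patent(200, "Router", "A router.", "Forwards packets across the Internet."),
    Patent(100, "Modem", "A modem.", "Connects a computer to the Internet."),
]

for patent in spec_search(sample, "Internet"):
    print(patent.number, patent.title)
```

Patent 300 is left out even though its title mentions the word, because only the description is searched.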
Here’s a comparison between the USPTO search and the Google search of the number of documents that show up for each year in a ten-year period for the use of the word “Internet”:
USPTO – 84 patents
Google – 124 patents
USPTO – 217 patents
Google – 246 patents
USPTO – 419 patents
Google – 460 patents
USPTO – 1,740 patents
Google – 467 patents
USPTO – 3,389 patents
Google – 416 patents
USPTO – 5,055 patents
Google – 537 patents
USPTO – 6,905 patents
Google – 614 patents
USPTO – 8,810 patents
Google – 651 patents
USPTO – 11,031 patents
Google – 667 patents
USPTO – 13,800 patents
Google – 732 patents
USPTO – 14,368 patents
Google – 740 patents
It appears that some type of filtering is limiting the number of documents returned. While that might not be a bad idea, having thousands of documents left out of a search with no transparent explanation might trouble the many searchers who rely more on accuracy than on machine-generated relevance.
The granted patent search at the USPTO offers thirty-one search operators which you can use to find or sort patents. Google’s new patent search offers considerably fewer, but includes what are probably the most useful of the bunch.
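To make the size of the gap easier to see, the paired counts above can be turned into ratios. This is a small Python sketch using only the numbers listed in this comparison, in the order given; the variable and function names are just for illustration.

```python
# (USPTO count, Google count) pairs, in the order listed above.
counts = [
    (84, 124), (217, 246), (419, 460), (1740, 467),
    (3389, 416), (5055, 537), (6905, 614), (8810, 651),
    (11031, 667), (13800, 732), (14368, 740),
]


def google_share(pairs):
    """Google's result count as a fraction of the USPTO count, per pair."""
    return [google / uspto for uspto, google in pairs]


ratios = google_share(counts)

# Index of the first pair where Google returns fewer results than the USPTO.
first_shortfall = next(i for i, r in enumerate(ratios) if r < 1)

for (uspto, google), ratio in zip(counts, ratios):
    print(f"USPTO {uspto:>6,}  Google {google:>4,}  Google/USPTO = {ratio:.2f}")
```

In the first three pairs Google returns more results than the USPTO. From the fourth pair on it returns fewer, and the share keeps shrinking as the USPTO counts climb.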
Examples of Google’s unique patent search operators
By Patent Number – patent:1247412
By Inventor – ininventor:edison
By Assignee – inassignee:fairchild
By Current US Classification – uspclass:"99/385"
By International Classification – intlpclass:"A63B 6308"
Search by Issue Date or Filing Date: enter a range of dates in provided fields
Google Search Operators
* The “-” operator excludes all results that include this search term, as in [ flying -airplane ];
* Phrase search only returns results that include this exact phrase, as in [ "over the shoulder" ];
* The “OR” operator returns results that include either of your search terms, as in [ rayon OR nylon ].
The search operators that are offered to searchers here are probably sufficient for most folks who will use this system. For someone interested in exploring patents in more detail, using Google’s patent search along with the USPTO search may be helpful.
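If you find yourself typing these operators often, a small helper can assemble the query string for you. The Python sketch below is purely illustrative: `build_query` is a made-up function, it only covers some of the operators listed above, and it produces text to paste into the search box rather than calling any Google API.

```python
def build_query(terms=(), phrase=None, exclude=(), any_of=(),
                inventor=None, assignee=None):
    """Assemble a query string from the operators described above.

    Hypothetical helper: it concatenates text and does not validate
    anything against the actual search service.
    """
    parts = list(terms)
    if phrase:
        # Phrase search: wrap the exact phrase in straight double quotes.
        parts.append(f'"{phrase}"')
    if any_of:
        # OR operator: match any one of the alternatives.
        parts.append(" OR ".join(any_of))
    # "-" operator: exclude results containing these words.
    parts.extend(f"-{word}" for word in exclude)
    if inventor:
        parts.append(f"ininventor:{inventor}")
    if assignee:
        parts.append(f"inassignee:{assignee}")
    return " ".join(parts)


print(build_query(terms=["flying"], exclude=["airplane"]))
print(build_query(any_of=["rayon", "nylon"]))
print(build_query(phrase="over the shoulder", inventor="edison"))
```

The three calls reproduce the example queries from the lists above, with the last one combining a phrase search with the inventor operator.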
Some Papers About Patent Search
These are some of the more interesting and/or highly cited documents that I was able to locate about issues surrounding patent search.
Christopher Lucas. (2004) Patent Semantics: Analysis, Search, and Visualization of Large Text Corpora (pdf), M.Eng. EECS Thesis.
Caspar J. Fall, A. Törcsvári, K. Benzineb, and G. Karetka, (2003) Automated Categorization in the International Patent Classification (pdf), SIGIR Forum, v. 37, i. 1, p. 10 – 25
Michele Fattori, Giorgio Pedrazzi, and Roberta Turra, (2003) Text mining applied to patent mapping: a practical business case (pdf)
Leah S. Larkey, Margaret Connell, and Jamie Callan, (2000) Collection Selection and Results Merging with Topically Organized U.S. Patents and TREC Data (pdf), In Proceedings of Ninth International Conference on Information and Knowledge Management, (November 6-10, 2000. Washington D.C.) ACM Press, pp. 282-289.
Leah S. Larkey, (1999) A Patent Search and Classification System (pdf), In Digital Libraries 99 – The Fourth ACM Conference on Digital Libraries (Berkeley, CA, Aug. 11-14 1999) ACM Press, pp. 79-87.
Leah S. Larkey, (1998) Some Issues in the Automatic Classification of U.S. Patents (pdf), In Learning for Text Categorization: Papers from the 1998 Workshop. AAAI Press, Technical Report WS-98-05, pp. 87-90.