Google Authorship Beyond Webpages
In last month’s post about authorship, I shared that Google has been experimenting with inferring authorship for PDF documents in addition to webpages. This piqued my curiosity to see if any other indexable filetypes could also have inferred authorship.
Microsoft Office Files
For PowerPoint files, Google appears to infer authorship much as it does for PDFs and webpages, by looking for the term “by” followed by the author’s name.
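To make the “by” plus name pattern concrete, here is a minimal sketch of how such a byline check could work. The regular expression and the `find_byline` helper are my own illustrative assumptions; Google has not published its rule, and this is not a description of it.

```python
import re
from typing import Optional

# Hypothetical pattern: the word "by" followed by two to four capitalized
# words on the same line. An illustrative guess, not Google's actual rule.
BYLINE_RE = re.compile(r"\b[Bb]y[ \t]+([A-Z][\w'-]*(?:[ \t]+[A-Z][\w'-]*){1,3})")


def find_byline(text: str) -> Optional[str]:
    """Return the first name that follows the word 'by', or None if absent."""
    match = BYLINE_RE.search(text)
    return match.group(1) if match else None


print(find_byline("Search Tips by Janet Driscoll Miller"))
print(find_byline("A plain document with no author line."))
```

A pattern this simple would also fire on phrases such as a title containing “by” followed by capitalized words, so a real system presumably weighs other signals as well.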
To generate the authorship snippet on an Excel file, I had to add “by Janet Driscoll Miller” to a tab name in the workbook; Google uses the tab name as the title of the page. Having a byline appear only in a cell of the worksheet wasn’t enough to generate the snippet.
The most interesting case, though, was with Word documents. Using an old whitepaper I had, I did some testing with the byline again. In one version, the author snippet still showed even though I had removed the words “by Janet Driscoll Miller” and the document had no other byline.
After combing through the document, I found an “About the Author” paragraph at the end of the paper that could be the culprit.
Although there was no traditional byline, it appeared that this paragraph at the end of the document did help Google identify me as the author. To test this, I tried a version with the “About the Author” paragraph removed.
With that paragraph removed, no author snippet appeared. What this demonstrates to me is that, while traditional bylines are the most common way for Google to infer authorship, the search engine is, to some degree, increasingly able to do so based on context.
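The Word document behavior can be modeled as a two-step fallback: try a traditional byline first, then look for a name right after an “About the Author” heading. The sketch below is only my guess at that kind of logic; the patterns, the `infer_author` helper, and the sample text are all hypothetical and say nothing definitive about what Google actually does.

```python
import re
from typing import Optional

# Hypothetical name pattern: two to four capitalized words on one line.
NAME = r"[A-Z][\w'-]*(?:[ \t]+[A-Z][\w'-]*){1,3}"

# Assumed signals, checked in order: a byline, then an "About the Author" section.
BYLINE_RE = re.compile(r"\b[Bb]y[ \t]+(" + NAME + ")")
ABOUT_RE = re.compile(r"About the Author\W+(" + NAME + ")")


def infer_author(text: str) -> Optional[str]:
    """Return a name from a byline or, failing that, from an author bio section."""
    for pattern in (BYLINE_RE, ABOUT_RE):
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


# Sample text invented for illustration: no byline, but a bio paragraph.
with_bio = "Whitepaper text.\n\nAbout the Author\nJanet Driscoll Miller wrote this whitepaper."
without_bio = "Whitepaper text."

print(infer_author(with_bio))
print(infer_author(without_bio))
```

With both the byline and the bio paragraph gone, the sketch finds nothing, which mirrors the result of my test.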
Other Types Of Text-Based Files
Since Google can read text in other types of text-based files, would it be able to infer authorship within these documents? I tested rich text format (.rtf) and text format (.txt) files. Author snippets showed only for rich text format documents and, as with the Word document with the “About the Author” section, authorship was inferred from more than just a byline.
Regular text files did not generate any form of author snippet.
While Google can’t read text in a JPG file or other types of image files, it can index certain types of vector graphic files, such as SVG and PostScript files. Could Google infer authorship from text within these files? In my test, Google showed authorship when the SVG file’s text included a byline.
However, I couldn’t get an authorship snippet to display when I saved the same file as a PostScript file.
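One likely reason SVG text is readable where a JPG’s is not: SVG is an XML format, so text lives in `<text>` elements as ordinary characters. The sketch below pulls that text out with Python’s standard library. The `svg_text` helper and the sample SVG are my own illustration, not a description of how Google processes these files.

```python
import xml.etree.ElementTree as ET
from typing import List

# SVG elements live in this XML namespace.
SVG_NS = "{http://www.w3.org/2000/svg}"


def svg_text(svg_source: str) -> List[str]:
    """Return the text content of every <text> element in an SVG document."""
    root = ET.fromstring(svg_source)
    lines = []
    for element in root.iter(SVG_NS + "text"):
        # itertext() also collects text nested in child elements such as <tspan>.
        content = " ".join("".join(element.itertext()).split())
        if content:
            lines.append(content)
    return lines


# Sample file invented for illustration.
sample = """<svg xmlns="http://www.w3.org/2000/svg" width="300" height="80">
  <text x="10" y="30">Sample Whitepaper</text>
  <text x="10" y="60">by Janet Driscoll Miller</text>
</svg>"""

print(svg_text(sample))
```

Once the text is extracted like this, a byline inside an SVG looks no different from a byline on a webpage.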
Since Google Docs documents are tied to your Google ID, it would seem sensible for them to show authorship if they are open for Web sharing. While I had trouble getting my own snippet to show, I was able to find one example of authorship showing for a presentation.
In my estimation, a natural fit for authorship would be actual books listed in Google Books; however, it doesn’t appear that the authorship snippet has been applied to Google Books listings yet.
These listings came from a search for content on the Google Books site, but Web search didn’t yield an author snippet either.
Over the next month, I’ll continue to work on author testing to see what other goodies I can find!
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.