The SEO purist may ask why anyone would ever want to use PDF content on a website for search purposes. The reality, however, is that many businesses have a lot of PDF assets, including sell sheets, brochures, white papers, and technical briefs. The purist simply asks, why not convert these to HTML? In the real world, not everyone has the time, budget, and expertise to do that. There may also be other “marketing” reasons: perhaps a company wants its prospects to experience the content along with all the other brand elements inherent in its print materials. Whatever the reason, there are lots of PDFs available on the web, and you can optimize PDFs to get high-ranking search results. Here are some tips on the right way to do it.

1. Make sure your PDFs are text based. Okay, this first one is pretty obvious. However, we still find companies whose materials were designed in an image-based program. When the PDF is made using these programs, the PDF is an image; there is no text for the search engines to read.

2. Complete the document properties. The vast majority of PDFs seem to lack document properties, the most important of which is the Title. The Title property, if present, is almost invariably what gets displayed as the heading of the search result. It’s the equivalent of the HTML title tag. If you don’t complete the Title property, the search engine is going to generate a title from the PDF’s content, and it may not be what you would choose. We’ve all seen some pretty goofy-looking titles on search results associated with PDFs. Not only do they look ridiculous, but they probably won’t get clicked. In the full version of Acrobat, go to File>Document Properties to specify the Title.

There are other document properties (metadata) you can supply, including Author, Subject, and Keywords, but presently these appear to have little search-related effect. It would be nice if Subject acted as the meta description to be displayed under the heading of the search result, but I haven’t seen this to be true. For now, however, I’d complete the Subject property as if it were a meta description. Perhaps in the future search engines will treat it as such.
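For anyone auditing a large batch of PDFs, the Title check can be scripted. What follows is a minimal sketch in Python, not a full PDF parser: it assumes a simple, unencrypted file whose Info dictionary is stored uncompressed, with the Title written as a plain literal string. The function name `find_pdf_title` and the sample bytes are made up for illustration.

```python
import re
from typing import Optional


def find_pdf_title(data: bytes) -> Optional[str]:
    """Best-effort lookup of the Title in a PDF's Info dictionary.

    Heuristic only: handles uncompressed Info dictionaries whose Title
    is a plain literal string, e.g. /Title (My Brochure).
    """
    # The trailer points at the Info dictionary: /Info <num> <gen> R.
    # With incremental updates the last trailer wins, so take the last match.
    refs = re.findall(rb"/Info\s+(\d+)\s+(\d+)\s+R", data)
    if not refs:
        return None
    num, gen = refs[-1]

    # Find the body of that object; the last definition in the file wins.
    obj_pattern = re.compile(
        rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj(.*?)endobj", re.S
    )
    bodies = obj_pattern.findall(data)
    if not bodies:
        return None

    # Literal string without nested parentheses; escape sequences are left as-is.
    title = re.search(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", bodies[-1])
    if title is None:
        return None
    text = title.group(1).decode("latin-1").strip()
    return text or None


# Made-up sample bytes standing in for a real file's contents.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Title (Transit Seating Brochure) /Author (Example Co.) >>\nendobj\n"
    b"trailer\n<< /Info 1 0 R >>\n%%EOF\n"
)
print(find_pdf_title(sample))  # prints: Transit Seating Brochure
```

For a real file, the bytes would come from something like `Path("brochure.pdf").read_bytes()`. A result of `None` means only that this sketch could not confirm a Title (the file may store it in a compressed or encoded form), so those files are worth opening in Acrobat to check by hand.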

3. Optimize the copy. Copy in text-based PDFs is no different than web-page copy. Optimize it.

4. Build links into PDFs. Make sure you include links in your PDFs, and pay attention to the anchor text used. Search engines do recognize these links. You’ll occasionally find backlinks in PDFs, though not very often. Their limited occurrence is likely related to the fact that most people don’t put links into PDFs; most people treat PDFs as static print documents. In addition to the search-related reasons for including links in PDFs, there’s also a good business reason. PDFs are often passed along to others via email, so a reader may be viewing the PDF in isolation (i.e., not associated with your website). By placing links into PDFs, you give these readers an easy way to click back into your site, where you can further influence them.

5. Pay attention to the version. While search engines do “read” and index PDFs, search engines’ capabilities tend to lag new versions of Acrobat. Although Acrobat 8 is out, for now you should save your PDFs as version 1.6 (Acrobat 7) or lower to ensure search engines can index the content.

Not only is saving PDFs at a lower version good for the search engines, it’s also good for users. Not everyone has the latest versions of Acrobat Reader. Accordingly, I’d recommend saving PDFs as version 1.5 or lower. This way it will be good for search engines and most readers.
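The version a PDF was saved as is written in the first line of the file, in a header such as `%PDF-1.6`, so it can be checked without opening Acrobat. The sketch below, in Python, reads that header and compares it with a ceiling. The function names are illustrative assumptions, and the default ceiling of 1.5 simply mirrors the recommendation above. One caveat: a PDF can also declare a later version inside the document itself, which this sketch does not inspect.

```python
import re
from typing import Optional, Tuple


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # The header normally sits at the very start; scan the first 1024 bytes.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def version_at_most(data: bytes, ceiling: Tuple[int, int] = (1, 5)) -> bool:
    """True if the header version is present and no higher than the ceiling."""
    version = pdf_header_version(data)
    return version is not None and version <= ceiling


print(pdf_header_version(b"%PDF-1.6\n"))  # prints: (1, 6)
print(version_at_most(b"%PDF-1.6\n"))     # prints: False
print(version_at_most(b"%PDF-1.4\n"))     # prints: True
```

Running a check like this just before posting is one way to catch a file that was inadvertently re-saved at a higher version.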

6. Optimize the file size for search. Don’t post a huge PDF for download. Not only is this annoying and unnecessary for site visitors, it’s also burdensome for the search engines. If it’s too big, the search engines may abandon the PDF before even getting access to its content. Using the full version of Acrobat, select Advanced>PDF Optimizer to “right-size” the document.

You may also want to enable the “Optimize for Fast Web View” option in the Preferences>General Settings panel. This allows the PDF to be “loaded” a page at a time, rather than waiting for the whole PDF to download.
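Both of these checks can be roughly automated. The Python sketch below flags a file that exceeds a size budget and looks for the `/Linearized` marker that files saved with Fast Web View carry near the start. The 2 MB budget is a hypothetical placeholder (no specific limit is given here), and the function names are illustrative assumptions.

```python
from typing import List

# Hypothetical budget for illustration only; pick a figure that suits your site.
MAX_BYTES = 2 * 1024 * 1024


def looks_linearized(data: bytes) -> bool:
    """Heuristic: Fast Web View files declare /Linearized in the first 1024 bytes."""
    return b"/Linearized" in data[:1024]


def size_warnings(data: bytes, max_bytes: int = MAX_BYTES) -> List[str]:
    """Return human-readable warnings about file size and Fast Web View."""
    warnings = []
    if len(data) > max_bytes:
        warnings.append(
            f"file is {len(data)} bytes, over the {max_bytes}-byte budget"
        )
    if not looks_linearized(data):
        warnings.append("no /Linearized marker found; Fast Web View may be off")
    return warnings


# Made-up byte strings standing in for real files.
fast = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
plain = b"%PDF-1.4\n" + b"x" * 100

print(size_warnings(fast))                      # prints: []
print(len(size_warnings(plain, max_bytes=50)))  # prints: 2
```

An empty list does not prove the PDF is well optimized; it only means neither of these two simple checks raised a flag.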

7. Pay attention to placement. If you bury links to PDFs deep within your site’s file structure, they’re less likely to get indexed. If you want to use PDFs for high-ranking search results, links to those PDFs should be on web pages closer to the root level of the site’s file structure.
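As a rough way to see how deep the pages linking to a PDF sit, you can count the directory levels in each page’s URL. The Python sketch below does that with the standard library. The `example.com` URLs are made up, the function name is an illustrative assumption, and URL depth is only a proxy for how a search engine actually reaches a page.

```python
from urllib.parse import urlsplit


def directory_depth(url: str) -> int:
    """Count directory levels between the site root and the page."""
    path = urlsplit(url).path
    segments = [part for part in path.split("/") if part]
    # Treat a final segment without a trailing slash as a file name, not a folder.
    # (A folder URL written without its trailing slash will be undercounted.)
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


# Hypothetical pages that link to the same PDF.
linking_pages = [
    "https://www.example.com/products/seating/archive/2007/specs.html",
    "https://www.example.com/resources.html",
    "https://www.example.com/products/",
]

for page in sorted(linking_pages, key=directory_depth):
    print(directory_depth(page), page)
# prints:
# 0 https://www.example.com/resources.html
# 1 https://www.example.com/products/
# 4 https://www.example.com/products/seating/archive/2007/specs.html
```

If the only pages linking to an important PDF show a depth of three or four, that is a hint to add a link from a page nearer the root.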

8. Influence meta descriptions for PDFs. For web pages, the meta description is what is displayed under the title in a search result. With PDFs, the search engines search the copy of the PDF and select something to display. While with PDFs you have less control of what is displayed as the description to the search result, you can still influence this. The best way to do this is to make sure that you have a good, optimized sentence or two near the start of your PDF. If these sentences correspond to the search term used, it’s likely that these sentences are the ones that will be displayed as the description under the search result’s heading.

9. Specify the reading order. As noted above, search engines search the copy of the PDF and select something to display as a description under the search result’s heading. Depending on how the reading order of your PDF is specified, this may lead the search engine to select some pretty strange stuff to display.

In a previous column, Organic Landing Page: A Case Study, I noted a search result for “transit seating.” That search result is noted below:

Admittedly, this is not a very enticing description, and it’s not likely to get clicked even if it ranks highly in the search results. Why did Google select this text to display? Because it’s the first thing Google read in the PDF.

Every PDF has a reading order. Similar to properly optimized web pages, you want to make sure that valuable content is read first. How do you know the reading order? With the PDF open and while using the full version of Acrobat, select Advanced>Accessibility>Add Tags to Document. Then select Advanced>Accessibility>Touch Up Reading Order. Then the reading order of the PDF will be displayed.

You can see in the image above that the reading order of the transit seating PDF does not start with valuable content. Rather, many extraneous items are “read” before the valuable content. That’s why Google displayed what it did in the search result. If you want PDFs to be optimized for search, make sure you understand the reading order of the PDF and use the Touch Up Reading Order tool to manage what the search engine will read first.

10. Tag your PDFs. You can also add tags to your PDFs, similar to HTML tags. Again, with the PDF open and while using the full version of Acrobat, select Advanced>Accessibility>Add Tags to Document. Acrobat will give you a document report and recommend things you may want to consider changing. You’ll have the ability to tag headings, add alternate text for images, etc.

11. Pay attention. Every time you open a PDF, make even a small change, and save it once again, major unseen things may change. The reading order may change automatically. You may inadvertently save it as a higher version. It may get saved using the default size setting instead of a properly optimized size. If you’re going to further optimize existing PDFs, make sure you check all of these things before posting a new version of the PDF.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.


About The Author: Galen De Young is Managing Director of Proteus SEO, which specializes exclusively in B2B search engine optimization, and Proteus B2B, which specializes in repositioning business-to-business companies and their brands. You can reach Galen at gdeyoung@proteusb2b.com and follow him on Twitter.


  • crimsongirl

    Do you have to use Acrobat to create the PDF? Or can I use Photoshop and save it as a PDF? Does the text in a Photoshop PDF count as text or as an image?

  • http://www.visionefx.net rickvidallon

    Great advice.
    Learned some new tips here!

    Thanks,
    Rick Vidallon

  • http://www.jabz.biz/ Jab

    Funny thing... I just optimized my PDF files. Unfortunately I still have a question. I posted it on DigitalPoint Forums: http://forums.digitalpoint.com/showthread.php?t=471185 It has to do with caching and the HTML version of my documents. Maybe you can help.

    Best regards – Jab

  • http://www.writingassist.com Technical Writer

    Great piece, Galen. Although they are generally well-skilled in Adobe Acrobat, technical writers and technical marketing writers often deliver PDF documents to clients without giving any thought to properly tagging them for search.

  • http://www.a3webtech.com rolygate

    Thank you for an excellent PDF SEO tutorial. Many of these tips, like source-ordering, are invaluable.

    On a different note, do you think many people still use Acrobat reader? Perhaps in the corporate world they do. I believe that Internet-savvy people are more likely to use FoxitReader, though – it is far quicker, and speed is what you need here. Editing-wise, for me Acrobat Full brings a new meaning to the word “agony”. Alternative editing apps might be usefully investigated.

  • http://www.francis-seo.com Galen De Young

    Crimsongirl:

    You can use other programs to generate pdfs, and ideally, they should be text-based programs, such as Word, Quark, etc. Image-based programs generally generate image-based pdfs. A quick way to test things is to try to select the text in your pdfs. If you can select the text (not as a box, but as individual words and letters), then you’re likely okay. However, as I noted in the article, just having it text-based won’t help if you don’t consider the other matters as well.

    Jab:

    While Google doesn’t appear to provide a cache date for pdfs in “view as html” mode, it appears these are cached images. Google’s language: “Google automatically generates html versions of documents as we crawl the web.” I’m not aware of any option to stop that, short of blocking search engines from the pdf content, which would result in the pdf content not being indexed at all.

  • http://www.sparkinternetmarketing.com/blog spark internet marketing

    Thanks for the info. Many of our business clients have a lot of PDF content and helping them optimize it has been challenging. We’ll certainly pass along these tips.

 
