Building A Macro To Download Webpages With Excel

When conducting research, it’s not uncommon to perform repetitive tasks like downloading a group of webpages. A favorite tool I use to automate this is iMacros — it’s free and powerful. Just be sure to use the Firefox version, as its capabilities are limited in other browsers.

Say you want a list of the Fortune 500 companies’ websites. Normally, this would be an arduous task. CNN’s Fortune magazine website links to the websites, but you have to visit every company profile on a separate page. Watch what I do with a spreadsheet and a macro.

Get a list of company page URLs from the CNN Money Fortune 500 list.

I use the Web Developer tool to get a list of links to all the company profiles. Just select Information and View Link Information.

View links on a webpage.

View links on a webpage

Then, copy the links you want to scrape.

A list of webpage links.

A list of webpage links

And, paste into Excel.

A  list of webpage links in Excel.

A list of webpage links in Excel

Add these formulas to row 1:

Column B: 1 You will assign a unique number to each row. This will be used to name your download files.

Column C: =”URL GOTO=” & A1 This is the command that will tell the macro what web page to visit.

Column D: =”SAVEAS TYPE=HTM FOLDER=c:\web-pages FILE=” & B1 & “.html” This saves a copy of the web page on your hard drive.

Column E: WAIT SECONDS=3 This instruction tells the macro to pause for three seconds. I find a brief pause between pages helps keep the macro running smoothly.

In Excel, number column B 1 to n (1, 2, 3, 4, etc.).

Copy row 1 of columns C, D, and E then paste them into every row so your spreadsheet looks like this:

A Web browser macro in Excel.

A Web browser macro in Excel

You are now done with Excel and ready to create your macro. First, each cell has to become a separate cell. Below is my trick for accomplishing this.

Copy the macro in Excel, columns C, D, and E; only the completed cells, no empty cells.

Paste special as text into Word.

Formatting the Excel web browser macro in Word.

Formatting the Excel Web browser macro in Word

 

In Word, use Find and Replace. Find ^t, tabs, and replace with ^p, paragraph breaks. Replace All. Your macro will change to one line per instruction.

The formatted macro in Word.

The formatted macro in Word

Before you go further, you will need to create your download folder. In this tutorial, it is

C:/web-pages

Install iMacros for Firefox. Avoid using the Chrome and other browser versions as they lack important features. After installing, you will find the program in your View Sidebars menu.

iMacros for Firefox

iMacros for Firefox

Copy your macro from Word.

In iMacro, open #Current.imm, or any macro. It does not matter because you will give your macro its own name. Select Edit then Edit Macro. Paste your macro into the editor. Instead of the Save button, click on the disk icon – save-icon. Name your macro #Fortune.imm.

The iMacros editor.

The iMacros editor

Go back to the iMacro menu. Select Edit, Refresh Macro List.

Click on #Fortune.imm in the macro list. Select Play, Play (the big button).

Your macro will run and download all the pages into C:/web-pages.

And there you have it. This tutorial shows three tricks I use:

  • Using macros to automate repetitive tasks
  • Using Excel formulas to write my macros
  • Using Word to convert tabular Excel content into separate lines

I admit I am leaving you hanging because, at this point, you have a set of HTML files on your hard drive, but you do not yet have a list of links to the actual Fortune 500 websites. To automate that part, I use a PHP script that opens each HTML file and extracts the URLs using PHP’s pattern matching command.

I’ll share my PHP script in my next column.

I use macros to capture all sorts of information, including SEO tools data for multiple sites and even Google keyword tool suggestions. Macros make it a breeze to dig through lots of pages, and Excel makes the creation of macros almost effortless.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

Related Topics: All Things SEO Column | Channel: Analytics | How To | How To: Analytics

Sponsored


About The Author: operates Schmitz Marketing, an Internet Marketing consultancy helping brands succeed at Inbound Marketing, Social Media and SEO. You can read more from Tom at Hitchhiker's Guide to Traffic.

Connect with the author via: Email | Twitter | Google+ | LinkedIn



SearchCap:

Get all the top search stories emailed daily!  

Share

Other ways to share:

Read before commenting! We welcome constructive comments and allow any that meet our common sense criteria. This means being respectful and polite to others. It means providing helpful information that contributes to a story or discussion. It means leaving links only that substantially add further to a discussion. Comments using foul language, being disrespectful to others or otherwise violating what we believe are common sense standards of discussion will be deleted. Comments may also be removed if they are posted from anonymous accounts. You can read more about our comments policy here.
  • TmWe

    Sounds quite like the ‘Super Seo Unblockable Website Scraping & Ddos Tool’ that I keep getting sales emails for.

  • http://www.seo-theory.com/ Michael Martinez

    Just what we need: More Crawl Spam techniques for the mindless SEO industry to abuse. Not a good article at all, my friend.

  • http://twitter.com/TomSchmitz Thomas M. Schmitz

    The practicing SEO understands web page retrieval is required to undertake certain projects. When activities becomes formulaic and repetitive, automation makes sense, which is why tools like Screaming Frog are well received. I doubt I’ve opened the floodgate, let alone cracked the levy.

  • http://www.seo-theory.com/ Michael Martinez

    You’re contributing to the problem by encouraging it and excusing yourself from taking responsibility. Bad SEO is bad SEO and all this link checking crap is the worst-possible SEO. No one needs to grab all these pages from Websites. That’s inexcusable, irresponsible, and completely unethical. “Let’s just pile more bad practices on top of others because of Penguin” doesn’t cut the mustard. Crawl Spam has to stop. NO MORE EXCUSES.

 

Get Our News, Everywhere!

Daily Email:

Follow Search Engine Land on Twitter @sengineland Like Search Engine Land on Facebook Follow Search Engine Land on Google+ Get the Search Engine Land Feed Connect with Search Engine Land on LinkedIn Check out our Tumblr! See us on Pinterest

 
 

Click to watch SMX conference video

Join us at one of our SMX or MarTech events:

United States

Europe

Australia & China

Learn more about: SMX | MarTech


Free Daily Search News Recap!

SearchCap is a once-per-day newsletter update - sign up below and get the news delivered to you!

 


 

Search Engine Land Periodic Table of SEO Success Factors

Get Your Copy
Read The Full SEO Guide