When conducting research, it’s not uncommon to perform repetitive tasks like downloading a group of webpages. A favorite tool I use to automate this is iMacros — it’s free and powerful. Just be sure to use the Firefox version, as its capabilities are limited in other browsers.
Say you want a list of the Fortune 500 companies’ websites. Normally, this would be an arduous task. CNN’s Fortune magazine website links to the websites, but you have to visit every company profile on a separate page. Watch what I do with a spreadsheet and a macro.
Get a list of company page URLs from the CNN Money Fortune 500 list.
I use the Web Developer tool to get a list of links to all the company profiles. Just select Information and View Link Information.
Then, copy the links you want to scrape.
And, paste into Excel.
Add these formulas to row 1:
Column B: 1 You will assign a unique number to each row. This will be used to name your download files.
Column C: =”URL GOTO=” & A1 This is the command that will tell the macro what web page to visit.
Column D: =”SAVEAS TYPE=HTM FOLDER=c:\web-pages FILE=” & B1 & “.html” This saves a copy of the web page on your hard drive.
Column E: WAIT SECONDS=3 This instruction tells the macro to pause for three seconds. I find a brief pause between pages helps keep the macro running smoothly.
In Excel, number column B 1 to n (1, 2, 3, 4, etc.).
Copy row 1 of columns C, D, and E then paste them into every row so your spreadsheet looks like this:
You are now done with Excel and ready to create your macro. First, each cell has to become a separate cell. Below is my trick for accomplishing this.
Copy the macro in Excel, columns C, D, and E; only the completed cells, no empty cells.
Paste special as text into Word.
In Word, use Find and Replace. Find ^t, tabs, and replace with ^p, paragraph breaks. Replace All. Your macro will change to one line per instruction.
Before you go further, you will need to create your download folder. In this tutorial, it is
Install iMacros for Firefox. Avoid using the Chrome and other browser versions as they lack important features. After installing, you will find the program in your View Sidebars menu.
Copy your macro from Word.
In iMacro, open #Current.imm, or any macro. It does not matter because you will give your macro its own name. Select Edit then Edit Macro. Paste your macro into the editor. Instead of the Save button, click on the disk icon – . Name your macro #Fortune.imm.
Go back to the iMacro menu. Select Edit, Refresh Macro List.
Click on #Fortune.imm in the macro list. Select Play, Play (the big button).
Your macro will run and download all the pages into C:/web-pages.
And there you have it. This tutorial shows three tricks I use:
- Using macros to automate repetitive tasks
- Using Excel formulas to write my macros
- Using Word to convert tabular Excel content into separate lines
I admit I am leaving you hanging because, at this point, you have a set of HTML files on your hard drive, but you do not yet have a list of links to the actual Fortune 500 websites. To automate that part, I use a PHP script that opens each HTML file and extracts the URLs using PHP’s pattern matching command.
I’ll share my PHP script in my next column.
I use macros to capture all sorts of information, including SEO tools data for multiple sites and even Google keyword tool suggestions. Macros make it a breeze to dig through lots of pages, and Excel makes the creation of macros almost effortless.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.