Learning SEO From Building A Web Crawler


There is no doubt that you can learn a tremendous amount about search engine optimization (SEO) by reading sites like this one or the ones in our blogroll, but there is always a lot to be learned from getting your hands dirty. You can do that by experimenting with SEO techniques on your own sites, and you can also learn an incredible amount by trying to reverse engineer a web crawler by building your own.

In fact, Google Webmaster Analyst JohnMu tweeted exactly that this morning. He said, “Want to learn about indexing/crawling? Don’t read – code a spider.”
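To give a sense of what "coding a spider" involves, here is a minimal sketch in Python, not a description of how Google or any other search engine actually crawls. It does a breadth-first walk of one host, extracts links from anchor tags, resolves relative URLs and drops fragments so the same page is not queued twice. The names `LinkExtractor`, `crawl`, `fetch` and `max_pages` are made up for illustration, and `fetch` is assumed to be any function you supply that returns a page's HTML or `None` on failure.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl confined to the host of start_url.

    fetch is a hypothetical callable (URL -> HTML string, or None on failure).
    Returns a dict mapping each crawled URL to the same-host links found on it.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid duplicates
    graph = {}                  # crawled URL -> list of outgoing links

    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # treat a failed fetch as "skip this page"
            continue

        parser = LinkExtractor()
        parser.feed(html)

        outgoing = []
        for href in parser.links:
            # Resolve relative links and strip "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host:
                continue        # stay on the starting host
            outgoing.append(absolute)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        graph[url] = outgoing

    return graph
```

Passing `fetch` in as a parameter means you can try the crawler against a dictionary of canned pages before pointing it at a real site. A crawler used on the live web would also need things this sketch leaves out, such as honoring robots.txt, rate limiting and handling redirects.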

That is exactly what SEOmoz did: they built a crawler and an index of web pages to better learn about the internet and to share that data with the industry. Linkscape was introduced in October 2008 and has grown to 44 billion web pages and 474 billion links.

Rand Fishkin of SEOmoz has posted the lessons learned from building an index of the web. So in this case, reading about someone else’s experiences and findings in building such a crawler may help you too.


Search Engine Land is owned by Semrush. We remain committed to providing high-quality coverage of marketing topics. Unless otherwise noted, this page’s content was written by either an employee or a paid contractor of Semrush Inc.


About the Author

Barry Schwartz

Barry Schwartz is a technologist and a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics.

In 2019, Barry was awarded the Outstanding Community Services Award from Search Engine Land. In 2018, he was named "US Search Personality Of The Year" at the US Search Awards, and in 2023 he was listed as a top 50 most influential PPCer by Marketing O'Clock.

Barry can be followed on X, and you can learn more about him on his personal site.