You can learn a tremendous amount about search engine optimization (SEO) by reading sites like this one or the ones in our blogroll, but there is always more to learn from getting your hands dirty. One way to do that is to experiment with SEO techniques on your own sites. Another is to reverse engineer how a web crawler works by building your own.
In fact, Google Webmaster Analyst JohnMu tweeted as much this morning. He said, “Want to learn about indexing/crawling? Don’t read – code a spider.”
That is exactly what SEOmoz did: they built a crawler and an index of web pages to learn more about the internet and to share that data with the industry. Their index, Linkscape, was introduced in October 2008 and has grown to 44 billion web pages and 474 billion links.
Rand Fishkin of SEOmoz has posted the lessons learned from building an index of the web. So in this case, reading about someone else’s experiences and findings in building such a crawler may help you after all.