Advanced

warning: Creating default object from empty value in /home/skyontech/www/www/blog/modules/taxonomy/taxonomy.pages.inc on line 34.

Adding a cache to your scripts that read HTML files

I'm a big fan of Perl, but seem to use it for just one reason these days: reading and processing various HTML files out on the web (ie writing a spider). The LWP Perl modules makes this easy, and can honor robots.txt automatically. I've found, though, especially during development, that I don't want to access the live web page again and again as I develop the rest of the script. This is even more relevant for HTML pages that take a long time to return, or when I'm developing the script offline, or without intranet access.

My solution? Adding some HTML cache code in my object that can save a copy of the HTML locally the first time it is accessed, then automatically use that file each subsequent run.

Syndicate content