Program for analyzing the code of web pages

Of course, to analyze the code of all these pages, I needed a program to automate the process. I chose to write it in Perl. Although it's not my first language, it has a lot of great libraries available for parsing HTML and CSS. Most importantly, I used the following:

To validate (X)HTML, I used the offline HTML validator provided by the Web Design Group (WDG). You might wonder why I didn't use the well-known W3C Markup Validator. First of all, Dagfinn Parnas, whose research I'm trying to replicate as closely as possible, used the WDG validator. Also, the WDG Validator is a lot easier to install on your computer.

As for CSS, there doesn't seem to be any validator available other than the W3C CSS Validator. Many thanks to Steve Ferguson, who made a precompiled version of the CSS Validator available. I spent several days trying to compile the validator on my own but didn't succeed.

To store all the data gathered by the program, I created a MySQL database with 38 tables. I used MySQL because it is the database I'm most familiar with.

To retrieve the URIs from the web, I used the well-known GNU Wget.
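The download and storage steps can be sketched roughly like this. It is not the author's Perl code. It is a minimal Python sketch that assumes Wget is installed and on the PATH, and it uses SQLite from the Python standard library as a stand-in for MySQL. The `page` table and the functions `wget_command`, `fetch`, `open_db` and `record_fetch` are hypothetical names for illustration. The real program used 38 tables, not one.

```python
import sqlite3
import subprocess


def wget_command(uri: str, output_path: str) -> list[str]:
    """Build a GNU Wget command line that saves one URI to output_path."""
    # --quiet hides progress output; --tries and --timeout keep dead hosts
    # from stalling the run; --output-document names the saved file.
    return ["wget", "--quiet", "--tries=3", "--timeout=30",
            "--output-document", output_path, uri]


def fetch(uri: str, output_path: str) -> bool:
    """Run Wget and report whether it exited with status 0."""
    return subprocess.run(wget_command(uri, output_path)).returncode == 0


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical one-table schema (a stand-in for the MySQL database)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page ("
        " id INTEGER PRIMARY KEY,"
        " uri TEXT UNIQUE NOT NULL,"
        " fetched INTEGER NOT NULL)"
    )
    return conn


def record_fetch(conn: sqlite3.Connection, uri: str, ok: bool) -> None:
    """Store the outcome of one download, replacing any earlier row for the URI."""
    conn.execute(
        "INSERT OR REPLACE INTO page (uri, fetched) VALUES (?, ?)",
        (uri, int(ok)),
    )
    conn.commit()
```

A typical call would be `record_fetch(conn, uri, fetch(uri, "page.html"))` for each URI on the list. Validation results from the HTML and CSS validators would then go into further tables keyed on the same URI.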

All the Perl source code of the program is freely available for download and licensed under the same terms as Perl itself (which means: feel free to modify it or do whatever you like with it, just don't do anything nasty). The installation instructions are in the file INSTALL inside the following tarball:

Oh, and if you haven't guessed it already, you need Linux to run the whole thing.

Written on 12 June 2006.


Eesti Trinoloogide Maja (House of Estonian Trinologists). A public gathering place for Estonian trinology enthusiasts.


