Of course, to analyze the code of all these pages, I needed a program to automate the process. I chose to write it in Perl; although it's not my first language, it has a lot of great libraries available for parsing HTML and CSS. Most importantly, I used the following:
To validate (X)HTML I used the offline HTML validator provided by the Web Design Group (WDG). You might wonder why I didn't use the well-known W3C Markup Validator. First of all, Dagfinn Parnas, whose research I'm trying to replicate as closely as possible, used this validator; besides that, the WDG Validator is a lot easier to install on your own computer.
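To give an idea of how the validator's report is turned into numbers, here is a minimal sketch. The "Line N, character M:" report format below is an assumed example for illustration, not necessarily the WDG validator's exact output, and `count_errors` is a hypothetical helper, not the actual code of my program:

```perl
use strict;
use warnings;

# Count the error messages in a validator's plain-text report.
# The "Line N, character M:" line format is an assumption made
# for this sketch.
sub count_errors {
    my ($report) = @_;
    my $errors = 0;
    $errors++ while $report =~ /^Line \d+, character \d+:/mg;
    return $errors;
}

my $report = <<'END';
Line 1, character 1: missing document type declaration
Line 9, character 42: element "FONT" not allowed here
END
printf "%d errors found\n", count_errors($report);   # prints "2 errors found"
```

In the real program, a count like this is what ends up in the database for each validated page.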
As for CSS, there doesn't seem to be any validator available other than the W3C CSS Validator. I owe great thanks to Steve Ferguson, who made available a precompiled version of the CSS Validator: I had tried for several days to compile the validator on my own, but didn't succeed.
To store all the data gathered by the program, I created a MySQL database with 38 tables. I chose MySQL because it is the database I'm most familiar with.
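To illustrate the kind of thing those tables hold, here is a hypothetical fragment of the schema. The table and column names below are invented for this sketch; the real database has 38 tables with its own naming:

```sql
-- Invented example of one table: one row per analyzed page.
CREATE TABLE page (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    uri         VARCHAR(255) NOT NULL,
    doctype     VARCHAR(100),              -- document type declaration found, if any
    error_count INT UNSIGNED NOT NULL DEFAULT 0,  -- validation errors reported
    PRIMARY KEY (id),
    UNIQUE KEY (uri)
) ENGINE=MyISAM;
```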
To retrieve the URIs from the web, I used the well-known GNU Wget.
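A sketch of how a Perl program might shell out to Wget follows. The exact options my program uses are not shown here; the ones below (`-q` to silence progress output, `-O` to name the destination file, `--timeout` and `--tries` to keep a dead server from stalling the whole crawl) are standard Wget flags chosen for illustration:

```perl
use strict;
use warnings;

# Build the argument list for one GNU Wget invocation.  Returning a
# list and passing it to system() avoids shell quoting problems with
# odd characters in URIs.
sub wget_command {
    my ($uri, $outfile) = @_;
    return ('wget', '-q', '--timeout=30', '--tries=2', '-O', $outfile, $uri);
}

# Example invocation (commented out so the sketch has no network dependency):
# system(wget_command('http://example.com/', 'page.html')) == 0
#     or warn "download failed\n";
```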
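Once a page has been downloaded, its external stylesheets have to be found before they can be fetched and validated in turn. A minimal sketch of that step is below; a real parser such as CPAN's HTML::Parser would be more robust than a regular expression, and `stylesheet_urls` is an illustrative helper, not the program's actual code:

```perl
use strict;
use warnings;

# Pull the URLs of external stylesheets out of an HTML document.
# A regex is fragile on real-world markup, but shows the idea.
sub stylesheet_urls {
    my ($html) = @_;
    my @urls;
    while ($html =~ /<link\b[^>]*rel=["']?stylesheet["']?[^>]*>/gi) {
        my $tag = $&;
        push @urls, $1 if $tag =~ /href=["']?([^"'\s>]+)/i;
    }
    return @urls;
}

my $page = '<html><head><link rel="stylesheet" href="/main.css">'
         . '<link rel="icon" href="/favicon.ico"></head></html>';
print "$_\n" for stylesheet_urls($page);   # prints "/main.css"
```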
All the Perl source code of the program is freely available for download and licensed under the same terms as Perl itself (which means: feel free to modify it or do whatever you like, just don't do anything nasty). The installation instructions are in the file INSTALL inside the following tarball:
Oh, and if you didn't guess it already, you need Linux to run the whole thing.
Written on 12 June 2006.