Program for analyzing the code of web pages

Of course, to analyze the code of all these pages, I needed a program to automate the process. I chose to write it in Perl. Although Perl isn't my first language, it had a lot of great libraries available for parsing HTML and CSS. Most importantly, I used the following:

To validate (X)HTML I used the offline HTML validator provided by the Web Design Group (WDG). You might wonder why I didn't use the well-known W3C Markup Validator. First of all, Dagfinn Parnas, whose research I'm trying to replicate as closely as possible, used the WDG validator. Secondly, the WDG validator is a lot easier to install on your computer.

As for CSS, there doesn't seem to be any validator available other than the W3C CSS Validator. Many thanks to Steve Ferguson, who made a precompiled version of the CSS Validator available. I spent several days trying to compile the validator on my own, but didn't succeed.

To store all the data gathered by the program, I created a MySQL database with 38 tables. I used MySQL because it is the database I'm most familiar with.
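The article does not describe the 38 tables. The following is only a hypothetical illustration of how per-page validation results might be stored and then queried for a statistic such as the share of valid pages. To keep the sketch runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table names `page` and `html_error` and the three functions are invented for this example and are not the program's actual schema.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical two-table schema: one row per analyzed page,
    # one row per validation error found on a page.
    conn.executescript("""
        CREATE TABLE page (
            id  INTEGER PRIMARY KEY,
            uri TEXT NOT NULL UNIQUE
        );
        CREATE TABLE html_error (
            page_id INTEGER NOT NULL REFERENCES page(id),
            line    INTEGER,
            message TEXT NOT NULL
        );
    """)


def record_page(conn, uri, errors):
    """Store one page and its (line, message) validation errors."""
    cur = conn.execute("INSERT INTO page (uri) VALUES (?)", (uri,))
    page_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO html_error (page_id, line, message) VALUES (?, ?, ?)",
        [(page_id, line, msg) for line, msg in errors],
    )
    conn.commit()
    return page_id


def share_of_valid_pages(conn):
    """Fraction of stored pages that have no recorded errors."""
    total = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
    valid = conn.execute(
        "SELECT COUNT(*) FROM page p WHERE NOT EXISTS "
        "(SELECT 1 FROM html_error e WHERE e.page_id = p.id)"
    ).fetchone()[0]
    return valid / total if total else 0.0


conn = sqlite3.connect(":memory:")
create_schema(conn)
record_page(conn, "http://example.com/", [])
record_page(conn, "http://example.org/", [(12, "example error message")])
print(share_of_valid_pages(conn))  # 0.5
```

With a real MySQL server, the same SQL would be sent through a MySQL driver instead. Note that `?` placeholders are specific to `sqlite3`, and most MySQL drivers use `%s` instead.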

To retrieve the URIs from the web, I used the well-known GNU Wget.
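Below is a minimal sketch of how a driver script might call Wget once per URI and save each page to a local file for later validation. It assumes `wget` is on the `PATH`. The original program is written in Perl, and the function names `wget_command` and `fetch` are hypothetical. The options used (`--quiet`, `--tries`, `--timeout`, `--output-document`) are standard GNU Wget options.

```python
import subprocess


def wget_command(uri, target_file, tries=3, timeout=30):
    """Build the Wget argument list for downloading one URI to a file."""
    return [
        "wget",
        "--quiet",                            # no progress output
        f"--tries={tries}",                   # retry a few times on failure
        f"--timeout={timeout}",               # network timeout in seconds
        f"--output-document={target_file}",   # save the page under this name
        uri,
    ]


def fetch(uri, target_file):
    """Run Wget; return True if it exited with status 0 (success)."""
    result = subprocess.run(wget_command(uri, target_file))
    return result.returncode == 0
```

For example, `fetch("http://example.com/", "page-0001.html")` downloads one page. Building the argument list separately from running it keeps the command easy to inspect and test without touching the network.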

All the Perl source code of the program is freely available for download and licensed under the same terms as Perl itself (which means: feel free to modify it or do whatever you like with it, just don't do anything nasty). The installation instructions are in the file INSTALL inside the following tarball:

Oh, and if you haven't guessed it already, you need Linux to run the whole thing.

Written on 12 June 2006.
