Selection of pages

I have already carried out several studies of the use of web standards, both in Estonia and among the members of the World Wide Web Consortium. (See:

So it was natural that this time I wanted to do something bigger. But the big problem was: how do I collect a representative selection of all the pages on the web? The solution fell into my hands almost by accident, when I found a study from 2001.

Five years ago Dagfinn Parnas from Norway conducted a survey as part of his master's thesis (titled How to cope with invalid HTML) to find out how many pages on the web are valid according to their document type declaration. He validated over 2 million URIs gathered by the Open Directory Project (which are freely downloadable in RDF format) and came to the conclusion that only 0.7% of web pages use valid HTML.

Five years have passed since then, and the face of the web has changed quite a bit (see figure 1). So I took the opportunity to conduct another study, using the URIs in the Open Directory Project as the selection and tools similar to the ones Dagfinn Parnas used. This makes it possible to compare the web now with the web then.
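To illustrate how such a selection of URIs can be pulled out of the Open Directory Project dump, here is a minimal sketch in Python. It is not the tool used in either study. It assumes that each listed page appears in the RDF file as an `ExternalPage` element with the address in its `about` attribute, and that a simple regular expression is good enough to find those elements. The names `EXTERNAL_PAGE` and `extract_uris` are made up for this example.

```python
import html
import re
from typing import Iterable, Iterator

# Assumed shape of one entry in the dump:
#   <ExternalPage about="http://example.org/">
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def extract_uris(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct http(s) URI from ExternalPage elements, in file order."""
    seen = set()
    for line in lines:
        for raw in EXTERNAL_PAGE.findall(line):
            # Attribute values are XML-escaped, so "&amp;" becomes "&".
            uri = html.unescape(raw)
            # Keep only web pages and skip URIs that were already yielded.
            if uri.startswith(("http://", "https://")) and uri not in seen:
                seen.add(uri)
                yield uri


if __name__ == "__main__":
    # A tiny hand-written fragment standing in for the real dump file.
    fragment = [
        '<ExternalPage about="http://example.org/">',
        "  <d:Title>Example</d:Title>",
        "</ExternalPage>",
        '<ExternalPage about="http://example.org/">',
        '<ExternalPage about="ftp://example.net/files/">',
    ]
    for uri in extract_uris(fragment):
        print(uri)
```

Reading the file line by line keeps memory use low even when the dump holds millions of entries. With the real dump, the `fragment` list would be replaced by an open file object.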

Releases of IE, Mozilla (Firefox), Opera and Safari from 2001 to 2006. The latest stable IE came out at the end of 2001; the latest releases of all the other browsers are from the past year.

Figure 1. Major releases of popular browsers since 2001. Most of the current browsers have been under active development since then; only Microsoft Internet Explorer (IE) has been lagging behind. (The figure is based on data from Wikipedia.)

Written on 12 June 2006.
