Validating sites of W3C members

Four years ago...

On 22 February 2002, Marko Karppinen performed a validation survey of the websites of W3C members. As he said:

it would be understandable to assume that the w3c member organizations are on the forefront of web standardization efforts.

But sadly, it turned out that only 18 sites out of 506 were using valid HTML.

Has anything changed since then?

Obviously, I grabbed the opportunity to repeat his experiment and validated the websites of all W3C members. (This was an easy task, as I had my mass-validation scripts left over from previous investigations.)

Today W3C has 401 members (interestingly, that's far fewer than the roughly 500 members it had in 2002). Out of those, 397 had a website (at least one mentioned in the W3C members list). The script was unable to validate 45 of those sites for various reasons (the main one being a <meta http-equiv="redirect"> used in the HTML; a damn stupid script, I know). So, 352 sites remained.
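For illustration, here is a minimal sketch of how a mass-validation script could notice such a meta redirect and follow it instead of giving up. It is not the script used for this survey: the names MetaRedirectFinder and find_meta_redirect are made up for this example, and it assumes that both the usual http-equiv="refresh" and the "redirect" spelling should count as a redirect.

```python
import re
from html.parser import HTMLParser


class MetaRedirectFinder(HTMLParser):
    """Remembers the target URL of the first <meta http-equiv> redirect."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: treat both spellings as a client-side redirect.
        if attr.get("http-equiv", "").lower() not in ("refresh", "redirect"):
            return
        # The content attribute looks like "0; url=http://example.org/"
        match = re.search(r"url\s*=\s*['\"]?([^'\";\s]+)",
                          attr.get("content", ""), re.IGNORECASE)
        if match:
            self.target = match.group(1)


def find_meta_redirect(html):
    """Return the redirect target found in the markup, or None."""
    finder = MetaRedirectFinder()
    finder.feed(html)
    return finder.target
```

A script could call find_meta_redirect on each downloaded page and, when it returns a URL, validate that URL instead.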

And here are the results of validating those, compared to Marko Karppinen's results from 2002:

Comparison of validity of W3C members' sites, 2002 to 2006

Result              2002 survey                2006 survey
                    nr of pages  percentage    nr of pages  percentage
Valid               18           4%            61           17%
Not valid           488          96%           286          81%
Tentatively valid   0            0%            5            2%

As seen from above, the W3C members are clearly getting better at producing valid websites. The share of valid websites is 13 percentage points higher than it was in 2002. And it's not only percentages: in 2002 there were 43 fewer valid pages than there are today.
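The arithmetic behind these figures can be checked with a few lines of Python. The 2002 count of 18 valid sites out of 506 comes from Karppinen's survey; the 2006 count of valid sites is derived here as the 352 validated sites minus the 286 invalid and 5 tentatively valid ones. The helper name share is just for this sketch.

```python
def share(valid, total):
    """Percentage of valid sites, rounded to a whole number."""
    return round(100 * valid / total)


valid_2002, total_2002 = 18, 506
# 352 validated sites minus 286 invalid and 5 tentatively valid ones
valid_2006, total_2006 = 352 - 286 - 5, 352

share_2002 = share(valid_2002, total_2002)   # 4
share_2006 = share(valid_2006, total_2006)   # 17
point_gain = share_2006 - share_2002         # 13 percentage points
page_gain = valid_2006 - valid_2002          # 43 more valid pages

print(share_2002, share_2006, point_gain, page_gain)  # prints: 4 17 13 43
```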

On the sad side: over 80% of the W3C members still have an invalid website. And these are the members of an organisation whose mission is:

To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.

List of valid sites

Here's a list of all the members whose website was found to be using valid HTML. Note that only three of those had a valid site back in 2002.

A bit more statistics

I also gathered some information about doctypes and encodings used on the websites of W3C members.


When we look at the set of invalid pages (all valid pages must have a doctype to be valid in the first place), only 48% included a document type declaration. This is definitely more than my February 2005 survey found on Estonian pages (35%), but definitely less than you would expect from W3C members.

Usage of doctypes by W3C members

DOCTYPE                                   nr of pages
HTML 4.01 Transitional                    100
XHTML 1.0 Transitional                    80
HTML 4.0 Transitional                     39
XHTML 1.0 Strict                          19
XHTML 1.1                                 4
HTML 4.01 Strict                          3
HTML 4.01 Frameset                        2
HTML 4.0 Strict                           1
-//W3C//Dtd HTML 4.01 Transitional//EN    1
-//W3C//DTD HTML 3.2//EN                  1
HTML 3.2                                  1
-//W3C//DTD HTML 4.1 Transitional//EN     1

As seen from the table above, the transitional doctypes are the most popular. Interestingly, XHTML 1.0 Transitional is more popular among the W3C members than on Estonian sites, where it landed in third place in my survey. Also, fourth place goes to XHTML 1.0 Strict rather than to HTML 3.2, which held that spot on Estonian sites.
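A tally like this can be produced by pulling the public identifier out of each page's DOCTYPE and mapping the well-known ones to short names. The sketch below is an assumption about how that could be done, not the survey script itself: KNOWN is a deliberately incomplete lookup table, and doctype_label and tally_doctypes are names invented for the example. Identifiers that do not match exactly (wrong case, nonexistent versions) are kept raw, which is one plausible way for odd rows like the last ones in the table to arise.

```python
import re
from collections import Counter

# Captures the public identifier of <!DOCTYPE name PUBLIC "identifier" ...>
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+["\']([^"\']+)["\']', re.IGNORECASE)

# Incomplete, illustrative mapping from public identifiers to short names.
KNOWN = {
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
}


def doctype_label(html):
    """Return a short doctype name, the raw identifier, or None if absent."""
    match = DOCTYPE_RE.search(html[:1024])  # a doctype sits at the very top
    if match is None:
        return None
    identifier = match.group(1).strip()
    # Exact, case-sensitive lookup: anything unusual stays as written.
    return KNOWN.get(identifier, identifier)


def tally_doctypes(pages):
    """Count doctype labels over an iterable of page sources."""
    return Counter(label for label in map(doctype_label, pages) if label)
```

Calling tally_doctypes(pages).most_common() then gives the rows in descending order of use.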

Usage of doctypes on valid W3C members' sites

DOCTYPE                   nr of pages
XHTML 1.0 Transitional    32
XHTML 1.0 Strict          12
HTML 4.01 Transitional    9
XHTML 1.1                 4
HTML 4.0 Transitional     2
HTML 4.01 Strict          1
HTML 4.01 Frameset        1

Doctype usage on valid pages makes it even clearer that W3C members strive towards XHTML. And a strict doctype in second place is definitely a good sign (although first place would be even better; on the valid websites of Estonia, a strict doctype is only in fourth place).

But what about XHTML 1.1? W3C says it should be served as application/xhtml+xml and definitely not as text/html. Are the members following this advice?

Sadly, they're not. Only one of the four members using XHTML 1.1, Progeny Systems, serves it with the correct MIME type. Actually, it looks like it's the only W3C member serving its website as XML. But even they aren't perfect: their markup is missing the XML declaration, and they're using tables for layout.
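Both checks mentioned here are easy to automate once the Content-Type header and the page source are at hand. The following is a rough sketch under that assumption; the function names are invented for the example, and ignoring a leading byte order mark is a simplification of mine.

```python
def media_type(content_type):
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type.split(";", 1)[0].strip().lower()


def served_as_xml(content_type):
    """True when the page is served with the XHTML media type."""
    return media_type(content_type) == "application/xhtml+xml"


def has_xml_declaration(markup):
    """True when the document starts with an XML declaration."""
    # Assumption: a leading byte order mark, if present, is ignored.
    return markup.lstrip("\ufeff").startswith("<?xml")
```

For example, served_as_xml("text/html; charset=iso-8859-1") is False, which is the case for three of the four XHTML 1.1 sites above.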

Character encodings

111 pages had no encoding specified in the HTML. This is of course not a requirement, but it is good style. The validator had problems parsing 31 pages because either a wrong character encoding was specified, or the page was not valid UTF-8 (the default character set).

This aside, the most popular encodings were ISO 8859-1 and UTF-8.

Usage of encodings by W3C members
Encoding    nr of pages
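The two encoding problems described above (no declared encoding, and bytes that do not match the declared or default encoding) can be approximated with a short check. This is a sketch, not what the validator does internally: it only looks for a charset=... declaration near the top of the raw bytes, falls back to UTF-8 as described above, and the function names are hypothetical.

```python
import re

# Finds charset=... near the top of the page, e.g. in
# <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def declared_charset(raw):
    """Return the lowercased charset named in the markup, or None."""
    match = CHARSET_RE.search(raw[:2048])
    return match.group(1).decode("ascii").lower() if match else None


def decodes_cleanly(raw, fallback="utf-8"):
    """True when the bytes decode with the declared (or fallback) charset."""
    charset = declared_charset(raw) or fallback
    try:
        raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown charset name; UnicodeDecodeError: bad bytes
        return False
    return True
```

Note that single-byte encodings such as ISO 8859-1 accept any byte sequence, so this check mostly catches pages that claim (or default to) UTF-8 without actually being UTF-8.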

Last words

So, this is how good the W3C members are at web standards in 2006, four years after the original survey. The sites have improved, but they could be a lot better.

Maybe in 2008 we will have 50% of W3C member sites valid. Maybe...

But what about ordinary sites? Will 17% of the World Wide Web validate by 2008? 2009? 2010?

Anyway, we don't even know how many valid websites we have today. My own research has shown that it could be as high as 2%, but there were some serious flaws that made the investigated set of sites not very representative.

Further studies will hopefully bring some clarity to the subject. Until then, we may just happily conclude that, with regard to the standards, at least the W3C members are doing better.

Written on 4 March 2005.

