Web Standards in Estonia vol 3

This survey continues the work done in the previous surveys of February 2005 and August 2005. The recent survey Validating sites of W3C members, which uses the same methodology, might also be of interest to you.

All the figures on this web page are also represented as tables. If you want to see the tables, turn off images in your browser or view the page with a text-only browser. If you see only the tables, or see neither tables nor figures, your browser probably does not support the object element correctly.

The selection of pages
Results
Doctypes
Buggy doctypes
Encodings
Summary

The selection of pages

I acquired the set of Estonian web pages the same way as in the previous surveys: they are all taken from the Neti.ee list of Estonian servers.

Compared to the previous survey (done in August 2005), there were 4476 more pages this time.

Survey	Number of pages	Growth
February 2005	21,905
August 2005	22,735	830
March 2006	27,211	4,476

Table 1. The number of pages included in each of the Estonian web page surveys.

Table 1 shows significant growth in the number of pages used in the survey. From February to August (6 months) 830 pages were added to the list, but from August to March (7 months) 4476 pages were added. That’s quite a difference! It is probably because winter is the active time for doing business (and also, it seems, for buying domain names).

From the full set of 27,211 pages, 3186 were excluded for various reasons (redirect to another page, no HTML content, error page, etc.), which left us with 24,025 pages to validate and gather other statistics on.

Results

The results of our previous survey were a bit disappointing (to say the least). The number of valid pages had only risen by four and the overall percentage of valid pages had fallen from 2.22% to 2.17%.

Therefore I was quite surprised when I first saw the results of this survey. It looked like a miracle.

I checked everything twice, but it seems to be correct: the share of valid pages in Estonia has risen from 2.17% to 3.02%! That’s a rise of almost one percentage point! In relative terms, the share of valid pages has grown by almost 40%! Table 2 provides you with the hard numbers.

Result of validation	Number of pages	Percentage
Not valid	23,176	96.47%
Valid	725	3.02%
Tentatively valid	124	0.52%

Table 2. The results of validating all the pages.
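
The percentages in table 2 and the two growth figures quoted above can be rechecked with a few lines of arithmetic. The following is only a sketch: the counts and the two shares of valid pages are copied from this page, while the variable and function names are made up for illustration.

```python
# Counts copied from table 2 of this survey.
counts = {"Not valid": 23176, "Valid": 725, "Tentatively valid": 124}

# Number of pages that were actually validated.
total = sum(counts.values())


def share(n, total):
    """Percentage of `total` represented by `n`, rounded to two decimals."""
    return round(100 * n / total, 2)


shares = {name: share(n, total) for name, n in counts.items()}

# Share of valid pages in the previous (August 2005) and the current survey.
previous_valid, current_valid = 2.17, 3.02

# Absolute change, measured in percentage points.
point_change = round(current_valid - previous_valid, 2)

# Relative change of the share, measured in percent.
relative_change = round(100 * (current_valid / previous_valid - 1), 1)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"change: {point_change} percentage points, {relative_change}% relative")
```

Keeping the two kinds of change apart matters here: the share of valid pages moved by less than one percentage point, yet relative to the earlier share that is a large jump.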

Doctypes

Previously 34.98% of invalid pages specified a doctype; now the share has risen to 35.54%. A minor change, but a change nonetheless.

The distribution of document types used in invalid and valid pages is shown in figures 1 and 2 below.

Invalid pages
Document type	Number of pages	Change
HTML 4.01 Transitional	4056	+865
HTML 4.0 Transitional	1691	+206
XHTML 1.0 Transitional	1364	+662
HTML 3.2	405	+193
HTML 4.01 Frameset	245	+23
XHTML 1.0 Strict	146	+63
XHTML 1.1	85	+30

Figure 1. Changes in the use of doctypes on invalid pages. An increase is marked with green and a decrease with red (brighter than the green). All the other figures on this page use the same colors with exactly the same meaning.

Valid pages
Document type	Number of pages	Change
HTML 4.01 Transitional	336	+104
XHTML 1.0 Transitional	232	+122
XHTML 1.0 Strict	57	+26
XHTML 1.1	36	+19
HTML 4.0 Transitional	32	+8
HTML 4.01 Frameset	11	+3

Figure 2. Changes in the use of doctypes on valid pages.

The most significant change to note in both figures above: the usage of XHTML 1.0 Transitional has almost doubled. XHTML 1.0 Strict and XHTML 1.1 have also nearly doubled their usage. This is certainly a sign that XHTML is on its way to becoming more popular than HTML (especially for new web pages).

On the downside, XHTML 1.1 (according to the W3C) should not be served as text/html, yet none of the valid pages that used XHTML 1.1 served it as an XHTML or XML application.
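
A check of this kind could look roughly like the sketch below. It is not the code used for this survey; the function names and the small set of XML media types are assumptions made for illustration.

```python
# Media types treated as "XHTML or XML application" (an assumed, minimal set).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

XHTML_11_FPI = "-//W3C//DTD XHTML 1.1//EN"


def media_type(content_type):
    """Return the bare media type from a Content-Type header value."""
    return content_type.split(";", 1)[0].strip().lower()


def served_as_xml(content_type):
    """True when the Content-Type names one of the XML media types above."""
    return media_type(content_type) in XML_MEDIA_TYPES


def xhtml11_served_wrongly(fpi, content_type):
    """True when an XHTML 1.1 page is not served with an XML media type."""
    return fpi == XHTML_11_FPI and not served_as_xml(content_type)
```

For example, `xhtml11_served_wrongly("-//W3C//DTD XHTML 1.1//EN", "text/html; charset=utf-8")` flags the page, while the same doctype served as `application/xhtml+xml` does not.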

However, the big rise of HTML 3.2 is probably some kind of mistake in my data-gathering process: last time HTML 3.2 fell sharply, and now it has somehow risen again. Even so, there is still less HTML 3.2 now than there was in the first survey.

Buggy doctypes

The survey discovered a lot of mistakes in the spelling of the doctypes’ Formal Public Identifiers (FPI). Clearly, this indicates that most developers don’t really know what a doctype is. Here are the most common mistakes.

First of all, an FPI is case-sensitive. A lot of developers don’t seem to know that: many pages had doctypes written entirely or partly in lowercase (or in uppercase where it does not belong). This is wrong. The bad examples follow:

-//w3c//dtd html 4.0 transitional//en 22
-//W3C//Dtd XHTML 1.1//EN
-//W3C//DTD HTML 3.2 FINAL//EN
-//W3C//DTD html 4.01 Transitional//EN
-//w3c//dtd html 4.0//en
-//W3C//DTD XHTML 1.0 strict//EN

Secondly, the language identifier at the end of the FPI refers to the language the Document Type Definition itself is written in. All the W3C documents are written in English, which is why there’s "EN" at the end. But a lot of developers thought this might be the place to specify the language of the HTML document. Wrong again. Bad examples follow:

-//W3C//DTD HTML 4.0 Transitional//ET
-//W3C//DTD HTML 4.01 Transitional//ET
-//W3C//DTD XHTML 1.0 Transitional//ET
-//W3C//DTD HTML 4.0 Transitional//EE
-//W3C//DTD XHTML 1.1//EE
-//W3C//DTD HTML 4.0 Transitional//RU
-//W3C//DTD XHTML 1.0 Transitional//RU
-//W3C//DTD HTML 4.0 Transitional//LT
-//W3C//DTD HTML 4.0 Transitional//LV
-//W3C//DTD XHTML 1.0 Strict//NO

And lastly there were the misspelled FPIs. It is hard to say why some people write something like "XHTML 1.1 Transitional" without checking whether such a doctype even exists. More bad examples follow:

-//W3C//DTD HTML 5.0 Transitional//EN
-//W3C//DTD HTML 4.1 Transitional//EN
-//W3C//DTD HTML 4.01 Strict//EN
-//W3C//DTD HTML 3.02 Transitional//EN
-//W3C//DTD XHTML 1.1 Transitional//EN
-//W3C//DTD XHTML 1.0//EN
-//W3C//DTD HTM 4.0 Transitional//EN
-//W3C//DTD W3 HTML//EN
-//W3C//DTD 3.2//EN
-//W3C//DTD Transitional 1.0 Strict//EN
-//WC3//DTD HTML 4.0 Transitional//EN
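
The three kinds of mistakes described in this section can be told apart mechanically. The sketch below is one way to do it and not the code used for this survey: the list of valid identifiers covers only the common W3C HTML and XHTML doctypes, and the function name and category labels are made up for illustration.

```python
# Common valid W3C FPIs (not an exhaustive list).
KNOWN_FPIS = {
    "-//W3C//DTD HTML 3.2 Final//EN",
    "-//W3C//DTD HTML 4.0//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD HTML 4.0 Frameset//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}

# Lowercased copies, used to detect identifiers that differ only in case.
KNOWN_LOWER = {fpi.lower() for fpi in KNOWN_FPIS}


def classify_fpi(fpi):
    """Sort an FPI into 'ok', 'wrong case', 'wrong language' or 'unknown'."""
    if fpi in KNOWN_FPIS:
        return "ok"
    if fpi.lower() in KNOWN_LOWER:
        return "wrong case"
    # Swap the trailing language code for "EN" and look again.
    head, _sep, _language = fpi.rpartition("//")
    if head + "//EN" in KNOWN_FPIS:
        return "wrong language"
    return "unknown"
```

With this sketch, `-//W3C//DTD HTML 3.2 FINAL//EN` comes out as "wrong case", `-//W3C//DTD XHTML 1.1//EE` as "wrong language", and `-//W3C//DTD HTML 4.01 Strict//EN` as "unknown", because the strict HTML 4.01 identifier is simply `-//W3C//DTD HTML 4.01//EN`.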

Encodings

The share of pages that specify an encoding has risen from 72% to 73%.

As figure 3 shows, UTF-8 has made the most significant rise and taken second place. ISO 8859-13 has risen from 10th place to 8th.

Encodings
Encoding	Number of pages	Change
iso-8859-1	11019	+1441
utf-8	2124	+739
windows-1257	1867	+268
windows-1251	1030	+159
windows-1252	594	+80
iso-8859-4	264	+46
iso-8859-15	212	+47
iso-8859-13	88	+72
windows-1250	73	+5
iso-8859-2	63	+3
koi8-r	26	+12
us-ascii	18	+3
iso8859-1	13	+3
utf8	11	?

Figure 3. Changes in the popularity of encodings. As almost half of the pages use iso-8859-1, the number-of-pages axis has been cut.

The misspelled "iso8859-1" still holds 13th place, but with the rise of UTF-8, 14th place has been taken by another misspelling: "utf8".

Other popular misspellings (used more than once):

iso8859-15
unicode
win-1251
_autodetect_all
et-iso-8859-1
iso-8851-15
iso-8859
latin-1
windows 1251
windows1251
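
Many of these misspellings differ from a real label only in punctuation, so they can be caught by comparing labels with everything except letters and digits stripped away. The sketch below assumes that the correctly spelled labels from figure 3 are the only ones of interest; the function names are made up, and this is not the method used for this survey.

```python
import re

# Correctly spelled labels taken from figure 3 (not a complete registry).
KNOWN_LABELS = [
    "iso-8859-1", "utf-8", "windows-1257", "windows-1251", "windows-1252",
    "iso-8859-4", "iso-8859-15", "iso-8859-13", "windows-1250", "iso-8859-2",
    "koi8-r", "us-ascii",
]


def squash(label):
    """Lowercase a label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


# Map each squashed form back to its correctly spelled label.
BY_SQUASHED = {squash(label): label for label in KNOWN_LABELS}


def suggest(label):
    """Return the known label that `label` most likely meant, or None."""
    return BY_SQUASHED.get(squash(label))
```

Here `suggest("utf8")` gives "utf-8" and `suggest("windows 1251")` gives "windows-1251", while labels such as "unicode", "win-1251" or "iso-8851-15" return None because no amount of punctuation repair turns them into a known label.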

Summary

There seems to be only good news this time:

A lot more valid pages!
A lot more XHTML!
A lot more UTF-8!

Let’s hope the progress continues...

Written on 11 March 2006.
