HTTP headers

Besides HTML, CSS and JavaScript, the program also collected data about HTTP headers. Admittedly, this data only reflects the headers that are common on the front pages of websites and, as I said, all pages with a status code other than 200 OK were excluded from further analysis.

First of all, let’s take a look at the very first line of any HTTP response, typically something like this:

HTTP/1.1 200 OK

Here 1.1 designates the HTTP version number. Version 1.1 is the most recent HTTP standard, 1.0 being the previous one. Anything below 1.0 is deprecated. As figure 3 clearly shows, the situation on the web regarding HTTP versions is sound: 97% of pages are served using the latest standard. I should point out, though, that the program dismissed all pages with an HTTP version below 1.0. Anyway, I’m quite sure that web browsers don’t accept anything below 1.0 either.

97.15% of pages use HTTP version 1.1. The remaining 2.85% use version 1.0.

Figure 3. HTTP versions. The majority of pages use the latest HTTP version, 1.1.
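The version check described above can be sketched in a few lines of Python. This is only an illustration and not the code the program actually used; the names parse_status_line and is_accepted are hypothetical, and the filter simply combines the two rules mentioned in the text (status 200 and HTTP version 1.0 or above).

```python
import re

# Matches a status line such as "HTTP/1.1 200 OK"; the reason phrase is optional.
STATUS_LINE = re.compile(r"^HTTP/(\d+)\.(\d+) (\d{3})(?: (.*))?$")


def parse_status_line(line):
    """Return ((major, minor), status_code, reason), or None if malformed."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    major, minor, code, reason = match.groups()
    return (int(major), int(minor)), int(code), reason or ""


def is_accepted(line):
    """Hypothetical filter: keep only 200 responses with HTTP version >= 1.0."""
    parsed = parse_status_line(line)
    if parsed is None:
        return False
    version, code, _ = parsed
    # Tuples compare element by element, so (0, 9) < (1, 0) < (1, 1).
    return version >= (1, 0) and code == 200
```

Comparing the version as a tuple of integers avoids the pitfalls of comparing it as a string or a float.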

Figure 4 shows the 21 most frequently used HTTP headers on web pages.

Content-type, date, server, connection, content-length, ...

Figure 4. Most frequently used HTTP headers.
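A ranking like the one in figure 4 could be produced roughly as follows. This is a minimal sketch under my own assumptions: the function name header_usage and the input shape (one list of (name, value) pairs per page) are hypothetical, and each header is counted once per page regardless of how many times it repeats.

```python
from collections import Counter


def header_usage(pages):
    """pages: one list of (name, value) tuples per page.

    Returns {lowercased header name: percentage of pages that include it},
    ordered from most to least frequent.
    """
    total = len(pages)
    if total == 0:
        return {}
    counts = Counter()
    for headers in pages:
        # Header names are case-insensitive in HTTP, so normalise them,
        # and use a set so a repeated header counts once per page.
        counts.update({name.lower() for name, _ in headers})
    return {name: 100.0 * n / total for name, n in counts.most_common()}
```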

Of course, 100% of pages have a Content-Type header, because pages without one were removed from the selection earlier. I’ll discuss the usage of the Content-Type header later in the text, when I talk about the encodings used.

The values of the Date header are presumably uninteresting, but the Server header provides a lot of information about which HTTP servers are in use. As can be seen from figure 5, by far the most popular HTTP server is Apache, followed by Microsoft Internet Information Services (IIS).

Apache, Microsoft-IIS, Zeus

Figure 5. Three most popular HTTP servers (based on Server headers).

The previous figure is continued in figure 6, which magnifies the usage of the rarer servers. Note that these are just the values of the Server header, not necessarily names of HTTP server software. For example, Rapidsite is actually a web hosting service provider.

Zeus, Netscape-Enterprise, Rapidsite, Squeegit, Sun-one-web-server, Lotus-Domino, ...

Figure 6. Less popular values of the Server header (extension of the previous figure).
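Server header values usually carry a version and extra details, for example "Apache/1.3.33 (Unix)", so they have to be reduced to a bare name before they can be grouped as in figures 5 and 6. The sketch below shows one plausible way to do that; it is an assumption on my part, not the grouping rule the program actually applied, and the function name server_product is hypothetical.

```python
def server_product(value):
    """Reduce a Server header value to its first product name.

    Takes the first whitespace-separated token and drops anything
    after the first "/", which is where the version normally sits.
    """
    tokens = value.split()
    first = tokens[0] if tokens else ""
    return first.split("/", 1)[0]
```

A known limitation of this approach: a value whose product name contains spaces would be cut at the first space, so real data would probably need a few special cases.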

The fourth header, used on almost every page, is Connection. Aside from a few quirks, all pages are served with either keep-alive or close as the value of this header. Figure 7 shows that the first value is used somewhat more often than the other.

Keep-Alive 58.71%, Close 41.29%

Figure 7. Values for the Connection header.
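The split in figure 7 only makes sense if differently capitalised values such as "Keep-Alive" and "keep-alive" are counted together. Here is a small sketch of such a tally, assuming the values have already been collected into a list; the function name connection_shares and the "other" bucket for the quirky values are my own choices for illustration.

```python
from collections import Counter


def connection_shares(values):
    """Return the percentage share of each Connection header value."""
    counts = Counter()
    for value in values:
        # Connection options are case-insensitive, so normalise first.
        token = value.strip().lower()
        counts[token if token in ("keep-alive", "close") else "other"] += 1
    total = sum(counts.values())
    return {token: round(100.0 * n / total, 2) for token, n in counts.items()}
```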

But HTTP header statistics were not the main target of this research; they are better described as a side result, although a planned one. So let’s leave the HTTP protocol for now and dive into the markup itself.

Written on June 12, 2006.

