First of all, let's take a look at the very first line of any HTTP response, the status line, which typically looks something like this:
HTTP/1.1 200 OK
Here 1.1 designates the HTTP version number. Version 1.1 is the most recent HTTP standard, with 1.0 being the previous one; anything below 1.0 is deprecated. As figure 3 shows, the situation on the web regarding HTTP versions is sound: 97% of pages are served using the latest standard. I should point out, though, that the program discarded all pages with an HTTP version below 1.0, so such pages cannot appear in the figure. In any case, I'm fairly sure that web browsers don't accept anything below 1.0 either.
Figure 3. HTTP versions. The majority of pages use the latest HTTP version, 1.1.
Figure 4 shows the 21 most frequently used HTTP headers on web pages.
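To make the version count above concrete, here is a minimal Python sketch of how status lines could be parsed and tallied. It is not the program used for this research; the names `parse_status_line` and `count_versions` are hypothetical, and the only behaviour taken from the text is that pages with a version below 1.0 are discarded.

```python
import re
from collections import Counter

# Matches a status line such as "HTTP/1.1 200 OK"; the reason phrase is optional.
STATUS_LINE = re.compile(r"^HTTP/(\d+)\.(\d+) (\d{3})(?: (.*))?$")


def parse_status_line(line):
    """Return (major, minor, status_code, reason), or None if the line is malformed."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    major, minor, code, reason = match.groups()
    return int(major), int(minor), int(code), reason or ""


def count_versions(status_lines):
    """Tally HTTP versions, skipping malformed lines and versions below 1.0."""
    counts = Counter()
    for line in status_lines:
        parsed = parse_status_line(line)
        if parsed is None:
            continue
        major, minor = parsed[0], parsed[1]
        if (major, minor) < (1, 0):
            # Assumption from the text: pages below HTTP 1.0 are dismissed.
            continue
        counts[f"{major}.{minor}"] += 1
    return counts
```

Calling `count_versions` on a list of collected status lines returns a `Counter` keyed by version string, from which percentages like those in figure 3 could be derived.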
Figure 4. Most frequently used HTTP headers.
Of course, 100% of pages have the Content-Type header, because pages without it were removed from the selection earlier. I'll discuss the usage of the Content-Type header later in the text, when I talk about the encodings used.
The values of the Date header are presumably uninteresting, but the Server header provides a lot of information about which HTTP servers are in use. As can be seen from figure 5, by far the most popular HTTP server is Apache, followed by Microsoft Internet Information Services (IIS).
Figure 5. The three most popular HTTP servers (based on Server headers).
Figure 6 continues the previous figure, magnifying the usage of the rarer servers. Note that these are just the values of the Server header, not necessarily the names of HTTP server software. For example, Rapidsite is actually a web hosting provider.
Figure 6. Less popular values of the Server header (an extension of the previous figure).
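Server header values usually carry a version and extra tokens (for example `Apache/1.3.33 (Unix)`), so some grouping is needed before they can be counted per server. The following Python sketch shows one plausible way to do that by keeping only the first product name. How the values were actually grouped for figures 5 and 6 is not stated, so this rule and the names `server_product` and `tally_servers` are assumptions.

```python
from collections import Counter


def server_product(value):
    """Return the first product name of a Server header value, without its version."""
    tokens = value.strip().split()
    if not tokens:
        return ""
    # "Apache/1.3.33" -> "Apache"; a bare "Apache" is returned unchanged.
    return tokens[0].split("/", 1)[0]


def tally_servers(server_values):
    """Count Server header values by product name, ignoring empty values."""
    counts = Counter(server_product(value) for value in server_values)
    counts.pop("", None)
    return counts
```

As the text notes, the resulting names are still just header values: a hosting provider that puts its own name in the header will be counted as if it were a server.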
The fourth header, used on almost every page, is Connection. Aside from a few quirks, all pages are served with either keep-alive or close as the value of this header. Figure 7 shows that the first value is used slightly more often than the second.
Figure 7. Values of the Connection header.
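Connection header values are case-insensitive and may contain several comma-separated tokens, so `Keep-Alive` and `keep-alive` should land in the same bucket. The Python sketch below normalizes values before counting them. It is only an illustration under that assumption, with hypothetical function names, and not necessarily how the numbers in figure 7 were produced.

```python
from collections import Counter


def connection_tokens(value):
    """Split a Connection header value into lowercase tokens."""
    return [token.strip().lower() for token in value.split(",") if token.strip()]


def tally_connection(values):
    """Count each token seen across a collection of Connection header values."""
    counts = Counter()
    for value in values:
        counts.update(connection_tokens(value))
    return counts
```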
But HTTP header statistics were not the main target of this research; they are better described as a side result, although a planned one. So let's leave the HTTP protocol for now and dive into the markup itself.
Written on June 12, 2006.