XHTML and the application/xhtml+xml MIME type

Feb 11 2005

Web servers send a “MIME type” with each file which tells the browser what kind of file it is. For instance, HTML files are “text/html” and CSS files are “text/css”. “Well,” you may be asking, “why not just use the file extension to determine the file type?” While file extensions commonly indicate an author’s intentions, that’s not always the case. For instance, a dynamically generated image might have a filename such as “example.com/gimmiephoto.cgi?resource=kd73jkldi674k”. And, without an accompanying MIME type, a browser might not know what to do with the file. Or, suppose that you wanted to link to an HTML file as part of a tutorial but have the browser display its source code instead of rendering the page? You could do that by having the server send the file as “text/plain”.
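To make the “text/plain” trick concrete, here is a minimal sketch of a server that sends the very same bytes under two different MIME types. It assumes Python's built-in WSGI support; the page content and the “/source” path are hypothetical names chosen for illustration.

```python
# Hypothetical tutorial page; the same bytes are served under two MIME types.
PAGE = b"<html><body><h1>Hello</h1></body></html>"


def app(environ, start_response):
    """Minimal WSGI app: the Content-Type header, not the URL, says what the file is."""
    if environ.get("PATH_INFO") == "/source":
        # Assumed path for the "view source" link: the browser shows the markup as text.
        content_type = "text/plain; charset=utf-8"
    else:
        # Any other path: the browser renders the page as HTML.
        content_type = "text/html; charset=utf-8"
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(len(PAGE))),
    ])
    return [PAGE]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serves on port 8000 until interrupted; the port is an arbitrary choice.
    make_server("", 8000, app).serve_forever()
```

Requesting “/” and “/source” from this server returns identical bytes; only the Content-Type header differs, and that header alone decides whether the browser renders the page or shows its source.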

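Now on to the fun part. The latest approved version of (X)HTML is XHTML 1.1, and the W3C defines a set of acceptable MIME types for XHTML. In short, web servers serving XHTML 1.1:

  • Should use “application/xhtml+xml”
  • May use “application/xml” or “text/xml”
  • Should Not use “text/html”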

Keep in mind that the W3C is very specific in its wording — a MIME type’s acceptability could range from “Must Not” to “Should Not” to “May” to “Must” (with many in between). So, while “text/html” is discouraged (“Should Not”) it’s not specifically forbidden.

So, what does setting the MIME type to application/xhtml+xml get you? Much like using a stricter DOCTYPE, sending documents as application/xhtml+xml lowers a browser’s tolerance for deprecated or malformed markup. An article at O’Reilly’s XML.com provides a good summary:

  • “All of your pages must be well-formed XML. [All] your end tags must match all your start tags, no overlaps, none missing. […] If a single end tag is missing, Mozilla users won’t see your page at all; they’ll see an XML debugging message instead.”
  • “[When] attached to XML pages (including XHTML pages served with the proper XHTML MIME type), CSS selectors are case-sensitive. This shouldn’t come as too much of a surprise; everything in XML is case-sensitive. Keep all your CSS selectors lowercase and you’ll be okay.”
  • “Whereas the HTML DOM is case-insensitive (and tag names are returned from functions like getElementsByTagName() in uppercase), the XML DOM is case-sensitive and tag names are returned in lowercase.”
  • “Still on the JavaScript front, collections like document.images, document.applets, document.links, document.forms, and document.anchors do not exist when serving XHTML as XML. You’ll need to use the more generic document.getElementsByTagName() method and weed out the elements you’re actually interested in.”
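The first and third points can be tried out without a browser. The sketch below uses Python's standard XML parsers as a stand-in for a browser's XML mode (an analogy, not the same engine); the helper names and sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Sample markup (hypothetical): one well-formed page and two broken ones.
GOOD = "<html><body><p>Hello</p></body></html>"
MISSING_END_TAG = "<html><body><p>Hello</body></html>"
OVERLAPPING = "<html><body><p><em>Hello</p></em></body></html>"


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def count_elements(markup: str, tag_name: str) -> int:
    """Count elements by tag name using an XML DOM, which is case-sensitive."""
    document = minidom.parseString(markup)
    return len(document.getElementsByTagName(tag_name))


if __name__ == "__main__":
    print(is_well_formed(GOOD))             # True
    print(is_well_formed(MISSING_END_TAG))  # False
    print(is_well_formed(OVERLAPPING))      # False
    print(count_elements(GOOD, "p"))        # 1
    print(count_elements(GOOD, "P"))        # 0
```

A strict parser rejects the whole document over a single missing or overlapping tag, which is the behaviour Mozilla users would see, and looking up “P” finds nothing because the XML DOM only knows the lowercase “p”.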

All of this may sound like a barrel of fun, but it’s not just a matter of reconfiguring your web server to send XHTML files as application/xhtml+xml. Unfortunately, IE b0rks when handed application/xhtml+xml files: it doesn’t even render them. So, the best we can do for now is to send application/xhtml+xml only to browsers which accept it. And, this is easier than it might initially seem. Mark Pilgrim (the article’s author) discovered that browsers which can handle application/xhtml+xml also properly advertise it in the HTTP “Accept” request header (which server-side code typically sees as “HTTP_ACCEPT”). So, if the browser includes “application/xhtml+xml” among the MIME types which it accepts, then it can handle that MIME type (both in theory and in practice).

So, how does one check the HTTP_ACCEPT header? Well, Mark has taken care of that as well. He’s written a short snippet of PHP code which can be dropped into pages (static or dynamic) and automagically sends the application/xhtml+xml MIME type as needed. And, if you’re running Apache, he also includes some code for mod_rewrite which works just as well and doesn’t require any per-page modifications.
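Mark's PHP and mod_rewrite code isn't reproduced here. As a rough sketch of the same idea in Python (not his implementation), the check boils down to looking for “application/xhtml+xml” in the Accept value. The function name is hypothetical, and treating “q=0” as a refusal is an extra assumption on top of the simple presence check described above.

```python
def accepts_xhtml(accept_header: str) -> bool:
    """Return True if the Accept value lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # a media type without an explicit q parameter counts as q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: an unparsable q is treated as a refusal
        return quality > 0
    return False


def choose_mime_type(accept_header: str) -> str:
    """Pick the MIME type to send for an XHTML page, falling back to text/html."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml"
    return "text/html"
```

In a CGI or WSGI setting the value would typically come from the “HTTP_ACCEPT” entry of the environment, for example `choose_mime_type(environ.get("HTTP_ACCEPT", ""))`. A browser that sends no Accept header at all gets “text/html”, which is the safe default.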

Or, for a more thorough approach, Roger Johansson at 456 Berea Street has a write-up on some code snippets for both PHP and ASP. And, with the PHP script which he found, browsers which can handle application/xhtml+xml are not only given that MIME type; they’re also given the XML prolog, and the response carries the HTTP “Vary” header to indicate the nature of the content negotiation. As written, that script also sends the iso-8859-1 charset; I’d probably be inclined to go for UTF-8 instead (which Roger cited difficulty in doing, but I’d be open to trying it).
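For a feel of what that fuller negotiation involves, here is a sketch in Python rather than PHP or ASP. It is not Roger's script: the function name is hypothetical, the Accept check is a deliberately simple substring test that ignores q-values, and it uses UTF-8 in place of iso-8859-1.

```python
# XML prolog sent only to browsers that get the page as XML.
XML_PROLOG = '<?xml version="1.0" encoding="utf-8"?>\n'


def negotiate(accept_header: str, xhtml_body: str):
    """Return (headers, body_bytes) for an XHTML page based on the Accept value."""
    # "Vary: Accept" tells caches that the response depends on the Accept header.
    headers = [("Vary", "Accept")]
    if "application/xhtml+xml" in accept_header.lower():
        headers.append(("Content-Type", "application/xhtml+xml; charset=utf-8"))
        body = XML_PROLOG + xhtml_body
    else:
        # Fallback for browsers such as IE: plain text/html and no XML prolog.
        headers.append(("Content-Type", "text/html; charset=utf-8"))
        body = xhtml_body
    return headers, body.encode("utf-8")
```

The “Vary” header goes out on both branches, since a cache sitting between server and browser needs to know either way that a differently worded request might get a different answer.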

I look forward to implementing these methods on some upcoming projects. It’ll keep our code honest. Really honest.
