Q: When is HTML5 not HTML5?

A: When it’s HTML.

Confused? After the announcements of the last few days, you have every right to be.

I mentioned on my ‘about’ page that I intended to write a series of posts looking at HTML5. To set the scene, I had intended to start with a brief history of browsers and HTML implementations, working forward with detours into XHTML along the way before arriving at HTML5.

Given the HTML5 publicity surrounding the new logo, I think it’ll work better if I leap, Tarantino style, into the middle of the story and then use a few flashbacks to explain the events that led us here. As for how it all ends: who knows… I think we’ll have to do a ‘Kill Bill’ and leave that for part 2.

Crudely put, HTML5 is the collection of the latest updates to the HTML specification, which browser vendors are gradually implementing. The HTML specification is being developed by the WHATWG (Web Hypertext Application Technology Working Group; more on them later) and adopted and published by the W3C (more on them, too…).

HTML is one of a number of technologies that make up the ‘web’ experience, along with CSS (Cascading Style Sheets), JavaScript and others. However, HTML is the technology that has become synonymous with web pages in the non-technical mainstream, and it has reached a certain level of common knowledge: you might, for example, encounter a pub-quiz question asking what the acronym ‘HTML’ stands for. HTML5 ‘showcase’ sites such as Arcade Fire’s ‘The Wilderness Downtown’ interactive film, and Steve Jobs’s comments about Flash vs HTML5, have brought the technology to public attention in recent months and started something of an HTML5 bandwagon. This bandwagon has tended to sweep up a number of web technologies besides HTML5. That distinction is neither here nor there to the media, but plenty of IT professionals are confused enough about HTML versions as it is.

Against this background the W3C launched the new HTML5 logo, prompting reactions from all sorts of folks. Joe Public mostly shrugged and wondered why HTML5 needs a logo at all. The more comedically minded lampooned its likeness to a superhero logo and created a Transformers version. And web-design professionals (spearheaded by Jeremy Keith) read the small print and collectively flipped out.

The key point was that the W3C stated in its FAQ that the new HTML5 logo was intended as an umbrella term: “The logo is a general-purpose visual identity for a broad set of open web technologies, including HTML5, CSS, SVG, WOFF, and others”. In many quarters this was seen as further muddying the waters about what HTML actually is, and by the one body that should be most precise about it. It was as if “the government suddenly announced that from today, all vegetables will be called potatoes, just because some vegetables are potatoes”.

Flashback-time. How did we get here? And who are all these committees?

The relationship between the vendors implementing standards and the bodies defining them has always been uneasy. To misquote John Godfrey Saxe (the line is sometimes attributed to Bismarck): “Web standards, like sausages, cease to inspire respect in proportion as we know how they are made”.

Tim Berners-Lee created the initial HTML implementation between 1989 and 1992, basing it on SGML and adding features such as hyperlinks to allow navigation between pages. In 1995 this was formalised as HTML 2.0, published by the IETF (Internet Engineering Task Force). From then on the scope of HTML kept growing, with browser makers Netscape and Microsoft extending the language, and the W3C (World Wide Web Consortium) formed to develop open web standards.

In 1997 HTML 4 was published. From there, the next step in the development of HTML was seen as convergence with XML. XHTML (Extensible HTML) was envisaged as an XML-conformant version of HTML: it would end browsers’ tolerance of the huge amount of poorly formed HTML sloshing around the internet by requiring pages to be well-formed XML in order to be rendered. XHTML 1.0 was essentially HTML 4 with XML syntax, and XHTML 1.1 then removed the deprecated elements. Many pages have been authored as XHTML; however, in most cases (this website included!) they are served as “text/html” rather than “application/xhtml+xml”, which sidesteps the ‘draconian error handling’ of XHTML well-formedness.

The W3C moved on to developing XHTML 2.0, which added new language features and, importantly, had no requirement for backward compatibility with previous HTML versions. Discontent with the direction of the W3C led to a split within the standards community. As a result, the WHATWG (Web Hypertext Application Technology Working Group) was formed by individuals from Apple, the Mozilla Foundation, and Opera Software, with a stated main focus on HTML and the needs of real-world authors: “extending HTML4 Forms to support features requested by authors, without breaking backwards compatibility with existing content”. They set about creating two new specifications, Web Forms 2.0 and Web Applications 1.0, which were eventually merged into a specification called HTML5. In the meantime, the W3C continued work on XHTML 2.0.

A key intent of XHTML was the enforcement of strict syntax, with draconian error handling for malformed XHTML. Much like compiling a software program, no syntax errors would be tolerated: if the XHTML was not well-formed, the page wouldn’t be rendered. One intended benefit of this strictness was to relieve browser makers (on an increasing number of devices) of the burden of sailing the seas of bad markup and attempting to render every malformed page forgivingly. This seems a sensible goal, but in reality it was too much change, especially as so much HTML is produced by non-technical folks who just want to publish as easily as possible.
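To make the difference between draconian and forgiving parsing a little more concrete, here is a minimal sketch in Python using only the standard library. It assumes that `xml.etree.ElementTree` is a fair stand-in for the XML parser a browser applies to “application/xhtml+xml”, and that `html.parser.HTMLParser` is a fair stand-in for lenient “text/html” handling. The sample markup and the helper names (`MALFORMED`, `WELL_FORMED`, `parses_as_xml`, `start_tags_as_html`) are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical "tag soup": an unquoted attribute value, an unclosed <p>,
# and a <br> with no trailing slash. Browsers render this without complaint.
MALFORMED = "<html><body><p class=intro>Hello<br>world</body></html>"

# The same content written as well-formed XHTML.
WELL_FORMED = '<html><body><p class="intro">Hello<br/>world</p></body></html>'


def parses_as_xml(markup: str) -> bool:
    """Draconian: an XML parser refuses markup that is not well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class StartTagCollector(HTMLParser):
    """Forgiving: records each start tag and tolerates malformed markup."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags_as_html(markup: str) -> list[str]:
    """Feed the markup to the lenient HTML tokenizer and return its start tags."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(parses_as_xml(WELL_FORMED))     # True
    print(parses_as_xml(MALFORMED))       # False
    print(start_tags_as_html(MALFORMED))  # ['html', 'body', 'p', 'br']
```

The XML route gives up on the first well-formedness error, while the HTML route carries on and reports every tag it found. Python’s `HTMLParser` only tokenizes. A browser goes further and builds a document tree, and HTML5 specifies in detail how that error recovery should work, so treat this as an illustration of the contrast rather than of browser behaviour.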

As Tim Berners-Lee put it: “The attempt to get the world to switch to XML, including quotes around attribute values and slashes in empty tags and namespaces all at once didn’t work. The large HTML-generating public did not move, largely because the browsers didn’t complain. Some large communities did shift and are enjoying the fruits of well-formed systems, but not all. It is important to maintain HTML incrementally, as well as continuing a transition to well-formed world, and developing more power in that world”.

This led the W3C to convene a new HTML working group, using the WHATWG’s output as the basis for a future HTML version. The result is a somewhat confusing situation, with two bodies involved in creating one standard and one of them building on the output of the other. Remember John Godfrey Saxe and his quote about making sausages?

Back to the future.

So, as we saw earlier, the W3C launched a logo, muddied the waters about what HTML5 is, and caused an outcry among web developers. In the following days, Ian Hickson of the WHATWG responded by announcing that the WHATWG specification “will henceforth just be known as ‘HTML’”. The intention is that, rather than being versioned, the HTML specification is developed as a living document whose feature set grows over time and which stays broadly backward compatible with earlier HTML features. The W3C in turn saw the light and revised its FAQ to drop the HTML5 ‘umbrella’ term: the logo now “represents HTML5, the cornerstone for modern Web applications”. Doug Schepers of the W3C also published a response explaining how the logo was created.

So the ‘HTML5 logo’ episode appears to have been quickly smoothed over, but it does cast a revealing light on the tensions involved in creating standards, and in particular on the relationships within the web-standards world.

So WHAT(WG) next?

The current result of all this brouhaha is that:

The WHATWG will incrementally develop a specification simply known as HTML. This makes some sense, as no browser release has ever fully implemented a given version of HTML; instead, browsers have tended to adopt features gradually over time.
The W3C will continue working on producing a ratified snapshot of the HTML specification that will be named HTML5 and be based on the WHATWG specification.
And the marketing people have a lovely new HTML5 superhero logo to add to their PowerPoint decks (thanks Andy!).

But remember, much like a Tarantino film, this is just the middle of the story: there is sure to be more drama, more acerbic exchanges, and more Mexican stand-offs in the years to come…

Further reading (and inspiration):
Mark Pilgrim – Dive into HTML5
Jeremy Keith – A brief history of markup
Jeffrey Zeldman – HTML5 vs HTML

Kris Coverdale

IT's Not Rocket Science
