HTML5 – the ‘italic’ elements – em, i and cite

I skipped a couple of days, so by way of catch-up, here are three elements today. Italic text is really semantically complicated. Who knew? Presenting the <i>, <em> and <cite> elements…

I haven’t completed my archaeological dig, but as far back as 1995, HTML 2.0 had already standardised both <i> and <em>, with <i> used for italic text and <em> for an “emphasized phrase, typically rendered as italics”. My recollection, however, is that most people at the time used <i>. With HTML 4.0, CSS and the separation of style and content, <i> fell out of favour (strictly speaking it was discouraged in favour of style sheets rather than formally deprecated) and <em> was popularised for emphasis: typically rendered as italics, but open to other treatments, e.g. a screen reader could stress the word instead.

Having spent the last ten years trying to get people to use the more semantic <em>, it’s somewhat galling for web developers to find that <i> has been rehabilitated in HTML5. Both <i> and <em> have again been assigned meanings, although slightly revised from their HTML 2.0 definitions.

To examine the definitions, it’s worth considering the background of italic usage in print. In publishing, italics are used in a number of different ways (as described on the Wikipedia italic page). Firstly, italics might be used to apply emphasis to an individual word in a sentence. For this usage <em> is recommended, to mark specifically that the stress on this word changes the overall meaning of the sentence.
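To make that concrete, here is a minimal sketch of stress emphasis. The sentence is my own invented example, not one taken from the spec; the point is only that moving the <em> changes what the sentence implies.

```html
<!-- Stress on "you": implies somebody else was expected not to. -->
<p>I thought <em>you</em> had fed the cat.</p>

<!-- Stress on "fed": implies something else was done to the cat instead. -->
<p>I thought you had <em>fed</em> the cat.</p>
```

Browsers will typically render both as italics, but a screen reader is free to voice the stress instead.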

Alternatively, italics might be used for a character’s thought process, such as a recollection or flashback. HTML5 uses <i> for this purpose, to represent “a span of text in an alternate voice or mood”. Other similar uses, for a passage of text that would generally appear as italic, are a “taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, or a ship name in Western texts”.
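A rough sketch of a few of those <i> cases follows. The sentences are made up for illustration; the lang attribute on the foreign phrase is a standard HTML attribute, included here on the assumption that you want assistive technology to know the phrase is French.

```html
<!-- A thought, in a different voice from the narration -->
<p>Jane closed the front door. <i>Did I leave the oven on?</i></p>

<!-- A taxonomic designation -->
<p>The domestic cat, <i>Felis catus</i>, sleeps for most of the day.</p>

<!-- An idiomatic phrase from another language -->
<p>The old house had a certain <i lang="fr">je ne sais quoi</i>.</p>

<!-- A ship name -->
<p>We spent the afternoon aboard the <i>Cutty Sark</i>.</p>
```

None of these words is being stressed, which is why <em> would be the wrong choice for them.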

Another use of italics in print is when citing titles of works, e.g. “He wrote his thesis on The Scarlet Letter” [incidentally, the title here was highlighted with the WordPress italic function, which inserts <em> tags]. For the title of a work (be it a book, song, painting etc.) the <cite> element should be used, specifically for the title and not for other details such as the author’s name.
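Marked up the way the definition above suggests, that thesis sentence might look something like this. It is only a sketch of the title-only reading; I have added the author’s name to show that it sits outside the element.

```html
<!-- Only the title of the work goes inside <cite>; the author stays outside. -->
<p>He wrote his thesis on <cite>The Scarlet Letter</cite> by Nathaniel Hawthorne.</p>
```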

So italic text in print maps onto three different markup options in HTML. Which is all semantically lovely. I do worry, however, that content authors and front-end developers are not all semantic experts and are unlikely to understand the nuances of these elements. This means we’ll see them continuing to be used differently from the specification. After all, I certainly didn’t know ship names should be italic; did you? And also, when all is said and done, did we ever win the em debate?

As suggested in the interesting comment thread at Impressive Webs, this does raise fears that, because HTML5 has to be backwards compatible, the standard-makers felt they needed to find new meanings for old elements rather than biting the bullet and retiring them. I’m going to endeavour to use them as intended, but it will be interesting to see when content publishing tools (such as WordPress) catch up, and how they handle a user who simply wants italic text on their web page now that italics carry this multi-faceted semantic meaning.
