Life Beyond Code

Intro to microformats

Posted on: July 25, 2006

I was searching for an introductory tutorial on microformats and found some material on the web. Here is a brief intro to microformats, the next big thing on the Semantic Web.

Microformats are an important – no, very important – new idea on the web. In fact, I think they are so important, they could precipitate a leap of evolution more important than AJAX and as important as XML web services. But first, an introduction.

The focal site for microformats, microformats.org, is not clear at all on what microformats are, but here is my understanding:

Microformats build on the semantic capabilities of the web, using existing standards.

Unless you’re fairly technical, that’s probably meaningless. So, to explain. <h1>, <h2>, <p>, <ul> – all of these and other HTML tags are designed to tell human readers, web browsers and other HTML readers what sort of information they contain. Not what it looks like – that’s what CSS is for – but how that bit of information relates to other bits on the page. Is it a heading, a paragraph or a list of things?

Where HTML is not enough…

This is quite useful, for example, for automatically generating outlines of documents. But it’s not 1% of the distance we could go with the idea. For example, a lot of websites have information about points of contact – employees, business associates, personal contact info and so on. Just like this:

Nick Nettleton

20 Crescent Lane, Bath BA1 2PX, UK
+44 (0)1225 358 346
nospam@example.com

Here’s some typical HTML for this:

<p><strong>Nick Nettleton</strong><br />
20 Crescent Lane, Bath BA1 2PX, UK<br />
+44 (0)1225 358 346<br />
<a href="mailto:nospam@example.com">nospam@example.com</a></p>

This is great for human readers, because we know what a phone number and an address look like. But to a computer, it’s just a paragraph of text, with a more important bit in bold at the top and an email address at the bottom. That is the total semantic capability of HTML for this bit of text.

A clever web browser, plugin, page scraper or other HTML reader could try to recognise what looks like a phone number, and put a Skype button next to it. It could even try to recognise my address and separate out its components – street, town, postcode, country. But it would be hard pressed to get this right on a regular basis, and would be at the mercy of what information I choose to include, as well as my particular way of representing it.

For example, there is a small village near where I live called Petit France. If my address were “20 High Street, Petit France”, even a highly intelligent address parser, such as a human reader, with no further information to go on, could not be sure whether I live in the village of Petit France, or in a town called Petit in France.
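To make the guessing problem concrete, here is a minimal Python sketch of the kind of heuristics such a reader might fall back on. The function names and the phone-number pattern are my own illustrative assumptions, not anything a real browser or plugin is known to use.

```python
import re

# Rough, assumed pattern for text that "looks like" a phone number: an
# optional leading +, a digit, then six or more digits, spaces, brackets
# or hyphens. Purely illustrative.
PHONE_GUESS = re.compile(r"\+?\d[\d\s()\-]{6,}")


def guess_phone_numbers(text):
    """Return every substring that merely looks like a phone number."""
    return [match.group().strip() for match in PHONE_GUESS.finditer(text)]


def guess_address_readings(address):
    """Return each plausible (street, town, country) reading of an address."""
    street, _, rest = address.partition(", ")
    words = rest.split()
    # Reading 1: everything after the street is a single place name.
    readings = [(street, rest, None)]
    if len(words) > 1:
        # Reading 2: the last word is a country, the words before it a town.
        readings.append((street, " ".join(words[:-1]), words[-1]))
    return readings


for reading in guess_address_readings("20 High Street, Petit France"):
    print(reading)
```

The address function returns two equally plausible readings and has no way to choose between them. The phone pattern has the opposite problem: it finds my number, but it will also happily match a date written as “2006 07 25”.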

As anyone who has received a letter starting with ‘Dear Mr Nick’ or somesuch will know – and especially anyone who has worked with databases of people – this sort of confusion is a constant and major issue.

So how can we give readers more information to go on, more semantics or meta information to understand whether the text really is contact information, and which bits of it are what?

Header-style tagging

One solution would be to come up with stronger rules on how people should write their names and addresses. It’s not a bad idea. We have fairly good rules in place already; they are quite consistent from country to country, and a solution like this would apply not just to the web, but also to the whole electronic and even paper world. For example:

Given name: Nick
Family name: Nettleton
Address: 20 Crescent Lane
Town: Bath
Postcode: BA1 2PX
Country: UK
Telephone: +44 (0)1225 358 346
Email: nospam@example.com

That’s pretty darn clear for a human reader, and we know this format can work well in the electronic world because we use this for HTTP and email headers. It’s a good idea.
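As a rough illustration of why this format is so easy on machines, here is a short Python sketch that reads a block like the one above into a dictionary. The function name `parse_header_block` is hypothetical, and the sketch assumes one field per line with no folded (continued) lines, which real HTTP and email headers do allow.

```python
def parse_header_block(text):
    """Parse 'Field: value' lines into a dict, assuming one field per line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a header line: {line!r}")
        fields[name.strip()] = value.strip()
    return fields


CONTACT = """\
Given name: Nick
Family name: Nettleton
Address: 20 Crescent Lane
Town: Bath
Postcode: BA1 2PX
Country: UK
Telephone: +44 (0)1225 358 346
Email: nospam@example.com
"""

contact = parse_header_block(CONTACT)
print(contact["Town"], contact["Postcode"])
```

No guessing is needed here: every value arrives already labelled, so the reader never has to decide what a piece of text is.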

But there’s one downside. We’ve traded one sort of clarity (knowing exactly what is what) for another: readability. If I already know Nick lives in the UK, and how UK people tend to write their names and addresses, the much simpler format up top is way clearer and easier to read. A directory full of contact details presented in the format above will be extremely hard to read, and probably not very popular.

So what other technique can we use to give readers a better chance of understanding contact information, but without compromising on visual clarity?

Creating new HTML tags

HTML was originally designed with human readability and semantics in mind. As far as it goes, it’s pretty successful in this. Since our return from the bad days of table-based layouts, good, simple HTML is quick and easy to read, and describes the information it tags accurately – as headings, paragraphs, lists and so on. Better still, it is designed so that the semantic information, the HTML tags, can be hidden from viewers and replaced with more natural visual cues to semantics, such as spacing, font size and character weight.

This has been extremely effective, and applied to contact information would solve all our problems at a single stroke:

<contact>
<givenname>Nick</givenname>
<familyname>Nettleton</familyname>
<!-- etc -->
</contact>

In a web browser, only the text is displayed to the viewer, while style information can be used to add visual cues based on the semantics. HTML readers, and users that need a little more than visual cues on the semantics, can inspect the source code itself.
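If markup like this were allowed, reading it would be trivial. The sketch below treats the fragment as XML and uses Python’s standard `xml.etree.ElementTree` module. The tag names are the made-up ones from the example above, not part of any real standard.

```python
import xml.etree.ElementTree as ET

# The hypothetical <contact> markup from the example, treated as XML.
CONTACT_XML = """
<contact>
  <givenname>Nick</givenname>
  <familyname>Nettleton</familyname>
</contact>
"""

root = ET.fromstring(CONTACT_XML)

# Each child element's tag names the field; its text is the value.
contact = {child.tag: child.text for child in root}
print(contact)
```

The tag names alone tell the reader which bit of text is which, with no heuristics involved.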

The trouble here is that, unlike XML, HTML doesn’t allow us to add new tags like this on an as-required basis, and for very good reason: HTML’s huge value comes from its being a shared standard. History has shown us that some companies will use flexibility in this to their own advantage and to the significant detriment of the overall community – and use their market position to entrench their alternative approach.

hCard: giving meaning to HTML classes

Nevertheless, the game is not over. There is a microformat called hCard that uses HTML classes instead of new tags to add new semantics to HTML. Since these class names have no semantic meaning under HTML, conform to no standards (beyond the characters that can be used), and are intended for referencing only visual information, the W3C has little to say on what class names you can or can’t use.

HTML designers tend to use similar class names for many things across projects and between each other, simply because life is easier that way, and after all, there are only a few different ways of writing the words ‘box’, ‘header’ or ‘footer’, for example.

So the hCard format rather cleverly leverages this flexibility to use the HTML class attribute for something it wasn’t intended for: semantics. Giving an element the class name ‘postal-code’ – in the right context – indicates to readers that understand this format that the contained text is a postcode.

Here are my contact details, as above, expressed using this format:

<div class="vcard">

<div class="fn">
<span class="given-name">Nick</span>
<span class="family-name">Nettleton</span>
</div>

<div class="adr">
<span class="type">Work</span>
<span class="street-address">20 Crescent Lane</span>,
<span class="locality">BATH</span>
<span class="postal-code">BA1 2PX</span>,
<span class="country-name">UK</span>
</div>

<div class="tel">
<span class="type">Work</span>
<span class="value">+44 (0)1225 358 346</span>
</div>

<a class="email" href="mailto:nospam@example.com">nospam@example.com</a>
</div>

This is an extremely appealing idea. If the web designer sets up the appropriate stylesheet rules for the classes, the information will appear in browsers exactly as the example at the beginning of this article. Meanwhile, readers that need to know more can inspect the code. Web browsers understanding this information can, for example, automatically insert an ‘Add to address book’ or ‘Call with Skype’ button beside the address, and be certain of using the right details in the right way.
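Here is a minimal Python sketch of what such a reader might do, using only the standard `html.parser` module. It is not a conforming hCard parser. The class `HCardReader`, the `HCARD_FIELDS` subset and the trimmed-down `SAMPLE` markup are my own assumptions. It ignores the nested `type` and `value` classes, and it assumes well-formed markup with no unclosed tags such as a bare `<br>`.

```python
from html.parser import HTMLParser

# Assumed subset of hCard class names that this sketch understands.
HCARD_FIELDS = {
    "given-name", "family-name", "street-address",
    "locality", "postal-code", "country-name", "email",
}


class HCardReader(HTMLParser):
    """Collect text from elements carrying hCard classes inside a vcard."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._open = []  # class list of every element currently open

    def handle_starttag(self, tag, attrs):
        self._open.append((dict(attrs).get("class") or "").split())

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._open:
            return
        # "In the right context": only trust classes inside a vcard element.
        in_vcard = any("vcard" in classes for classes in self._open)
        field = next((c for c in self._open[-1] if c in HCARD_FIELDS), None)
        if in_vcard and field:
            self.fields[field] = text


SAMPLE = """
<div class="vcard">
  <span class="given-name">Nick</span>
  <span class="family-name">Nettleton</span>
  <div class="adr">
    <span class="street-address">20 Crescent Lane</span>,
    <span class="locality">BATH</span>
    <span class="postal-code">BA1 2PX</span>,
    <span class="country-name">UK</span>
  </div>
  <a class="email" href="mailto:nospam@example.com">nospam@example.com</a>
</div>
"""

reader = HCardReader()
reader.feed(SAMPLE)
card = reader.fields
print(card["postal-code"], card["country-name"])
```

Unlike a guessing parser, this reader never has to decide what a piece of text is. The class name says so, and the same ‘postal-code’ class outside a vcard is simply ignored.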

It’s not a completely new idea: web designers have been giving things semantic – or at least meaningful – class names for years. JavaScript developers have been using class names not to define style, but to identify groups of elements that they want to apply dynamic behaviours to. Which helps to make the microformat even more appealing – it’s something we’re already familiar with.

What is new is to promote it as a standard, which makes it far more valuable than an ad-hoc habit or convention, subject to local variations.

hCard is just one of a number of such microformats, some of them using classes in this way. Others, such as XFN for attaching personal relationship information to hyperlinks, use the HTML rel attribute; and others use plain HTML.
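The rel-based approach can be read in much the same way. The sketch below collects the rel values from hyperlinks, again with Python’s standard `html.parser`. The class name `RelReader`, the sample link and its URL are invented for illustration; ‘friend’ and ‘met’ are relationship values defined by XFN.

```python
from html.parser import HTMLParser


class RelReader(HTMLParser):
    """Collect (href, rel values) pairs from <a> elements that have a rel."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of relationship values.
        rel_values = (attributes.get("rel") or "").split()
        if rel_values:
            self.links.append((attributes.get("href"), rel_values))


# Hypothetical markup: a link to a friend the author has met in person.
SAMPLE_LINK = '<a href="http://example.com/jane" rel="friend met">Jane</a>'

rel_reader = RelReader()
rel_reader.feed(SAMPLE_LINK)
print(rel_reader.links)
```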

There are at least two underlying drawbacks with this approach, which I will look at in further posts, since I have written enough for now. For more information and follow-up, see www.microformats.org.


2 Responses to "Intro to microformats"

Microformats are being used by Google too now. Anyways, this article is really a good introduction to Microformats.

-Micheal

Hi Nick,

Yes, I read this article at your blogs . Blogs is a knowledge sharing space. I found the article so interesting that i wish it should be in my blogs, if you can see on this article, i have mentioned, from where i have read this.

-Usman Ahmad

>Hi usman

>Please link through to my original article, rather than publishing it on your website without a credit. The original URL is here: http://www.nicknettleton.com/zine/microformats.

>Many thanks,
>Nick.
