It's About The Hashbangs

Before I get started, here’s the disclaimer: the opinions expressed in this rant are my own personal opinions on web development and do not represent the views of my employer, its engineering organisation or any other employees.

A few months back there was a flurry of blog posts and conversations on Twitter, both for and against the now fairly common practice of using hashbang URLs and JavaScript routing in favor of traditional URLs and full page loads. There is also growing interest in several JavaScript MVC frameworks that make heavy use of this technique. Since people started doing this kind of thing I’ve been pretty squeamish about the idea. When the discussion erupted across the web I really wanted to comment on it, but until recently, although I was almost certain that hashbang URLs were destructive, I couldn’t put into definite terms why.

As you probably know if you’ve been reading this blog for a while, I have long been an avid proponent of progressive enhancement and, as many people correctly pointed out, many of the arguments against hashbang URLs folded this philosophy in, which clouded the issue quite a lot. In a well-reasoned post, my colleague Ben Cherry pointed this out and argued that it wasn’t really hashbangs that were the problem; they were merely a temporary workaround until we get pushState support. As he put it, “It’s Not About The Hashbang”.

After quite a lot of thought, and some attention to the issues surrounding web apps that use hashbang URLs, I’ve come to the conclusion that it most definitely is about the hashbangs. This technique, on its own, is destructive to the web. It is inappropriate even as a temporary measure or as a degraded experience.

Let me explain.

URLs are important. The reason the web is so powerful is that it is a web of information. Any piece of content can reference any other piece of content directly. Our information is no longer siloed in disconnected libraries; now all our data is linked together. The web is much better at this than it is as a platform for delivering applications, but yeah, that’s a whole other blog post. The means by which one piece of data is linked with another is the URL. That makes the URL possibly the most important part of the web. If you are working on a web app I assume you value its content. If you value the content that a web app holds, then you need to value its URLs even more. Directly addressable content is what makes web apps better than desktop apps. It’s certainly not the UIs.

URLs are forever. The web has a pretty long memory. Techniques and technology may change, but content published to the web gets indexed, archived and otherwise preserved, as do the URLs it links to. There’s no such thing as a temporary fix when it comes to URLs. If you introduce a change to your URL scheme you are stuck with it for the foreseeable future. You may internally change your links to fit your new URL scheme, but you have no control over the rest of the web that links to your content.

Cool URLs don’t change. For this and other reasons, Tim Berners-Lee wrote a classic article, Cool URLs don’t change, in which he explains how to make future-proof URLs and why that is important. If you change your URLs you sever links from the rest of the web. You’ve just turned your web app into a data silo, and your content has just become a lot less useful. However, as much as we try, it’s pretty much impossible never to introduce change: sometimes data does need to be deleted, sometimes you need to move to a new domain name, sometimes you just need to reorganise.

Luckily, HTTP gives us the tools to handle this gracefully. If content is deleted we can tell the web it’s no longer there with a 410 (thanks Nick!); if it’s moved to a different place on the web we can tell the world its new location with a 301 or a 302. HTTP gives us the ability to manage change. What’s more, it’s years old, fairly well specified and, most importantly, understood not just by browsers but by all devices that can access the web, including search engines and other spiders.
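As a minimal sketch of what that looks like on the server, here is a tiny Node.js handler (Node is chosen purely for illustration; nothing here depends on a particular stack). The MOVED and GONE tables and their paths are hypothetical bookkeeping; the point is that every URL gets an honest status code that browsers, spiders and archivers all understand.

```javascript
const http = require('http');

// Hypothetical bookkeeping of how the URL space has changed over time.
const MOVED = { '/blog/old-post': '/posts/old-post' }; // old path -> new path
const GONE = new Set(['/posts/deleted-post']);         // content deliberately removed

// Work out the status code and headers for a request path.
function respondTo(path) {
  if (Object.prototype.hasOwnProperty.call(MOVED, path)) {
    // 301: moved permanently, so clients and indexers should update their links.
    return { status: 301, headers: { Location: MOVED[path] } };
  }
  if (GONE.has(path)) {
    // 410: the content existed but has been removed on purpose.
    return { status: 410, headers: {} };
  }
  return { status: 200, headers: { 'Content-Type': 'text/plain' } };
}

function createServer() {
  return http.createServer(function (req, res) {
    // req.url holds the path and query string; browsers never send the fragment.
    const { pathname } = new URL(req.url, 'http://localhost');
    const { status, headers } = respondTo(pathname);
    res.writeHead(status, headers);
    res.end(status === 200 ? 'Content for ' + pathname : '');
  });
}

// Usage: createServer().listen(8080);
```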

Going under the radar. So, you’ve implemented hashbang URLs. This means that the part of the URL after the #, the part that identifies the specific content, is not even sent in the HTTP request. It’s completely invisible to your server. As far as your server is concerned, it’s receiving requests for the root document and sending it with a 200 success code no matter what. It no longer has the ability to determine whether the URL has moved to a different location, or even whether the content being requested exists at all. This entire job is left to some JavaScript that happens to be running on that page. Sure, your JavaScript can examine the hash portion of the URL, show the relevant content or, if it’s missing, show a ‘Content not found’ message. It can even redirect to different locations inside and outside the web app.
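To make that concrete, the sketch below uses the standard URL class (available in browsers and in Node.js) to separate what goes into the HTTP request line from what stays in the browser. The example.com addresses are placeholders.

```javascript
// What the server actually receives: the request line carries only the path
// and query string. Everything from the '#' onwards stays inside the browser.
function requestTarget(address) {
  const url = new URL(address);
  return url.pathname + url.search;
}

// What a hashbang router has to inspect instead, entirely on the client.
function clientOnlyPart(address) {
  return new URL(address).hash;
}
```

Two different hashbang addresses such as http://example.com/#!/danwrong and http://example.com/#!/somebody-else both produce a request for /, which is why the server has no choice but to answer 200 with the root document.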

The important difference is that this is entirely opaque to anything that doesn’t have a JavaScript runtime and a document object model. Spiders and search indexers can and sometimes do implement JavaScript runtimes. However, even then there’s no well-recognised way to say ‘this is a redirect’ or ‘this content is not found’ that non-humans will understand. You’ve just rendered your content invisible to everything apart from people running certain browsers. The hashbang itself is Google’s attempt to address this, but it’s quite a painful thing to implement, and why get yourself into a situation where you are creating a fix for something you just broke? Just don’t break it in the first place.
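For reference, Google’s scheme (its AJAX crawling scheme, which Google has since deprecated) asked crawlers to rewrite a #! URL into a query parameter called _escaped_fragment_, which your server then has to answer with a static snapshot of that state. The sketch below covers only the URL rewrite, and its escaping rules are a simplified reading of the scheme, so treat it as an approximation rather than a reference implementation.

```javascript
// Percent-escape the characters the scheme singled out: control characters
// and space, '#', '%', '&', '+', DEL and anything outside ASCII (as UTF-8).
function escapeFragment(fragment) {
  let escaped = '';
  for (const ch of fragment) { // iterates by code point, so astral characters stay whole
    const code = ch.codePointAt(0);
    const mustEscape = code <= 0x20 || code >= 0x7f || '#%&+'.includes(ch);
    escaped += mustEscape ? encodeURIComponent(ch) : ch;
  }
  return escaped;
}

// Rewrite a hashbang URL into the form a crawler would request from your server.
function toEscapedFragmentUrl(address) {
  const bang = address.indexOf('#!');
  if (bang === -1) return address; // ordinary URL: nothing to rewrite
  const base = address.slice(0, bang);
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escapeFragment(address.slice(bang + 2));
}
```

Every one of those rewritten URLs is a new request your server must recognise and render without JavaScript, which is exactly the work that traditional URLs would have given you for free.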

Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted we can leave hashbangs behind and return to traditional URLs. Well, the fact is, you can’t. Earlier I stated that URLs are forever: they get indexed and archived and generally kept around. On top of that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point and then want to change them without breaking links, the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary; you are stuck with it.
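As a rough sketch of that forever-script, assuming the hashbang paths map one-to-one onto the new traditional paths (an assumption; a real app may need a lookup table), the root document would have to carry something like this:

```javascript
// Map a hashbang fragment to the traditional path it stands for:
// '#!/danwrong' -> '/danwrong'. Assumes the two URL schemes mirror each other.
// Returns null for ordinary fragments so plain in-page anchors keep working.
function hashbangToPath(hash) {
  if (!hash.startsWith('#!')) return null;
  const path = hash.slice(2);
  return path.startsWith('/') ? path : '/' + path;
}

// In the browser this has to ship on the root document for as long as
// old hashbang links exist anywhere on the web, which is to say forever.
if (typeof window !== 'undefined') {
  const path = hashbangToPath(window.location.hash);
  // replace() keeps the dead hashbang URL out of the session history.
  if (path) window.location.replace(path);
}
```

Note that this redirect only ever happens in a browser with JavaScript enabled; to a spider, every old hashbang link still points at the root document.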

It’s not all doom and gloom. For the web apps that have made the jump already it’s too late, but I urge you to think really hard before making the jump to hashbang URLs when creating new content or considering a switch from traditional URLs. There is a path forward in the not-too-distant future. pushState is coming to browsers at quite a rate and, as Kyle Neath said to me in a bar last week, is probably the most important innovation in web development since Firebug. You can implement pushState for browsers that support it, as GitHub have done, but make sure you fall back to traditional URLs rather than hashbang URLs. Even if only some users are getting hashbang URLs, they will be publishing content that links to them, tweeting them and bookmarking them, and you’ll be stuck supporting them all the same.
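A minimal sketch of that approach: intercept link clicks only where history.pushState exists, and otherwise do nothing so the browser follows the real URL with a full page load. The win parameter stands in for window so the logic can be exercised outside a browser, and render is a hypothetical function that fetches and draws the new content.

```javascript
// Feature test: only enhance where the History API's pushState exists.
function supportsPushState(win) {
  return Boolean(win && win.history && typeof win.history.pushState === 'function');
}

// Called from a link's click handler. Returns true if the click was handled
// client-side, false if the browser should follow the link normally.
function handleLinkClick(win, event, path, render) {
  if (!supportsPushState(win)) {
    return false; // fall back to a full page load of the real, traditional URL
  }
  event.preventDefault();
  win.history.pushState({ path: path }, '', path); // a real path, never '#!'
  render(path); // hypothetical: fetch and draw the content for this path
  return true;
}

// Usage (browser):
// link.addEventListener('click', function (e) {
//   handleLinkClick(window, e, link.pathname, render);
// });
```

A complete version would also listen for popstate to redraw content when the user presses back; that part is left out to keep the sketch short. Either way, every URL a user can copy is a traditional one the server can answer on its own.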

Can we all agree to let it go the way of Flash intros, please?

Writing a new library? Sketch it out first

So you’re embarking on writing a new library. You might dive right into the code and work straight on solving the problem, or you might take a more considered approach and start thinking about how to model the problem, what classes and methods you need to create, how they interact and so on. However, rather than doing either, why not start by sketching out not how the library works but how you would like it to be used? I think you’ll see pretty positive results if you try it.

Usability isn’t just for designers. If you want your co-workers or other developers to use, and be productive with, the code you’re creating, it’s got to be as simple and fun to use as your user interfaces. Your library’s API is a user interface. History has shown that libraries and open source projects that put their API design up front are far more successful than their competitors. Take jQuery, for instance: you can bet that when John sat down to create jQuery he wasn’t thinking “How do I animate CSS styles?” or “How do I create an event system?”; his main concern was how he wanted jQuery to be used. At least initially, the internals of jQuery were a mishmash of various libraries that came before it. What actually powered the explosive growth of jQuery was the revolutionary API concepts it introduced: method chaining, a selector-focused API, a simple plugin interface and so on. Other success stories, like Ruby on Rails, shared a similar focus on API design.

So, how do you write your code in such a way that it will make programmers happy and propel you into internet stardom? When you start out writing a new library or feature, start with a sketch. Cast aside any ideas you might already have about the implementation details, try to cast aside any technical constraints as well, and just start writing how you’d like your code to be used. Let’s take a DOM builder, for example. Mashing strings together (or indeed using the terribly designed W3C DOM API) is no fun in JavaScript. If you were taking an implementation-first approach you might start with a JSON-style data structure describing your DOM fragment, or maybe consider a set of objects that map to the various types of elements. This may well be how you implement it under the hood, but put that aside for now. How would your code look when using the library? Here’s my first pass:

var fragment = build(
  div({ id: 'contact' },
      li('Twitter: ', a({ href: '' }, '@danwrong'))
  )
);

Come up with the simplest, most aesthetically pleasing API you can and keep iterating on it until you’ve weeded out all the complexity you can. If certain arguments are optional or have sensible defaults, make sure that developers don’t have to worry about them unless they need to. Is your library similar to any other library? If it is, maybe it’s worth adopting that library’s conventions so it behaves in a way other developers expect. Developers dislike reading documentation as much as your site’s users dislike reading help pages. Strive for an API so simple that you can describe it in a few sentences.
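As a small illustration of sensible defaults, here is a hypothetical slideshow API (nothing in this post defines one) in which callers pass only the options they want to change:

```javascript
// Hypothetical slideshow options, used only to illustrate sensible defaults.
const SLIDESHOW_DEFAULTS = { duration: 400, loop: false, startAt: 0 };

// Merge whatever the caller supplied over the defaults. Passing nothing at all
// is valid, so the simplest call needs no configuration.
function slideshowOptions(options) {
  return Object.assign({}, SLIDESHOW_DEFAULTS, options);
}
```

Copying into a fresh object keeps the shared defaults untouched, so one caller’s choices can’t leak into the next.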

Once you are happy with your sketch, start building it out. You might need a little metaprogramming magic to realise the API you are going for, and that’s fine, but use it with caution. You need to weigh up whether the magic you are adding will make your code behave in a way that is unpredictable to developers used to working with the language in question, or whether it will prevent them from using their existing knowledge of the language to solve problems with your code. If you smell badness, leave the magic behind. In our example above we need to do quite a bit of work with the function arguments in order to support variable numbers of child nodes and to allow optional attribute hashes. In this case I’d make the call that this doesn’t add any extra confusion, so I’m happy to go with it.
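Here is a minimal sketch of that argument handling, assuming a hypothetical element() factory that produces tag functions like div() and li(). To keep it runnable outside a browser it builds plain node objects and serialises them to HTML rather than creating real DOM nodes; swapping in document.createElement would follow the same shape.

```javascript
// Hypothetical factory: element('div') returns a div() function that accepts an
// optional attributes object followed by any number of children.
function element(tagName) {
  return function (...args) {
    const first = args[0];
    // Treat the first argument as attributes only if it is a plain object,
    // not a string or a node produced by another tag function.
    const hasAttributes =
      first !== null && typeof first === 'object' && !Array.isArray(first) && !first.isNode;
    const attributes = hasAttributes ? args.shift() : {};
    return { isNode: true, tagName: tagName, attributes: attributes, children: args };
  };
}

function escapeHTML(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Serialise a node tree to an HTML string; text children are escaped.
function toHTML(node) {
  if (!node || !node.isNode) return escapeHTML(node);
  const attrs = Object.keys(node.attributes)
    .map(function (name) { return ' ' + name + '="' + escapeHTML(node.attributes[name]) + '"'; })
    .join('');
  return '<' + node.tagName + attrs + '>' + node.children.map(toHTML).join('') + '</' + node.tagName + '>';
}
```

With div, li and a created via element() and '#' as a stand-in href, the first-pass sketch renders as <div id="contact"><li>Twitter: <a href="#">@danwrong</a></li></div>.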

Also, at this point consider how the API you are designing holds up against real-world practical issues. For instance, in the example above I would need to create functions for all HTML tags in the global namespace. Am I happy with my library defining a global function called a()? No. Time to scale back on that idea a little.

Finally, when it comes to the implementation, ensure that the architecture is sound, and don’t couple the under-the-hood implementation to the API design so tightly that the quality of the library suffers. I often view the API as a simple, pretty veneer over the top of a well-crafted solution. Don’t hide the raw internals; make them accessible alongside the simple API. That way, if developers are on the beaten track they get a simple API, but if they have specific needs they can delve right in and get stuff done.
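One way to address both the global-namespace concern and this layering, sketched with hypothetical names: a single builder object exposes a plain createNode() core for developers with specific needs, plus a tags namespace holding the pretty per-tag functions, so nothing like a() lands on the global object.

```javascript
// Hypothetical library: `builder` is the only name it would add to the global scope.
const builder = (function () {
  // The raw core: explicit arguments, no magic, documented for advanced use.
  function createNode(tagName, attributes, children) {
    return { tagName: tagName, attributes: attributes || {}, children: children || [] };
  }

  // The veneer: one small function per tag, kept on a namespace object
  // instead of being defined as globals like a() or div().
  const tags = {};
  ['div', 'ul', 'li', 'a', 'p', 'span'].forEach(function (tagName) {
    tags[tagName] = function (...args) {
      const first = args[0];
      const hasAttributes =
        first !== null && typeof first === 'object' && !Array.isArray(first) && !first.tagName;
      const attributes = hasAttributes ? args.shift() : {};
      return createNode(tagName, attributes, args);
    };
  });

  return { createNode: createNode, tags: tags };
})();
```

Developers on the beaten track can write const { div, a } = builder.tags; inside their own scope and get the terse style back without any globals, while anyone with unusual needs can call builder.createNode() directly.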

So yeah, next time you write some code, think about sketching out the API first rather than leaving it as an afterthought. Your colleagues will thank you.

I’d love to speak more about this at JSConf. The competition is very hot, but here’s my proposal if you’re interested. If it’s picked I’ll go into detail on some great and poor examples of API design and also show off some nice techniques for bringing your API sketches to life.

Added OEmbed and Embeddable Player to

I’ve just added OEmbed to for fun and profit. I’d not really looked into it until Dustin and Russ pointed it out, and I really like the idea, although it seems a bit underdone at the moment (you can only have ‘photo’, ‘video’, ‘link’ or ‘rich’ as types. What about audio?). I’ve been meaning to implement an embeddable player for for a long time, so I decided to add them in.

The OEmbed endpoint is:{format}

It supports JSON, XML and JSON-P (if you provide a callback parameter to a json request). For example:

Returns this:

{
  "title":"Little mix of all the dubstep tunes I've been listening to lately: WRONG BEAT",
  "html":"<iframe allowtransparency='true' frameborder='0' scrolling='no' src='' style='width: 395px;  height: 65px; border: none;'></iframe>\n"
}

The HTML property contains a snippet of HTML that renders the player:

All very beta at the moment, but give it a try. Also, check out the short dubstep mix I did a while back. I’ll post a longer one soon.
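For anyone wiring up a consumer: since the endpoint URL itself isn’t reproduced here, the sketch below takes the endpoint template as a parameter, and https://example.com/oembed.{format} in the test is a placeholder. It fills in the {format} placeholder and adds the url parameter, plus the optional maxwidth and maxheight parameters from the oEmbed spec and, for JSON requests, the callback parameter that turns the response into JSON-P.

```javascript
// Build an oEmbed request URL from an endpoint template containing {format}.
// options may hold maxwidth/maxheight (oEmbed spec parameters) and callback,
// which this endpoint uses to switch a JSON response to JSON-P.
function oembedRequestUrl(endpointTemplate, resourceUrl, format, options) {
  const opts = options || {};
  const request = new URL(endpointTemplate.replace('{format}', format));
  request.searchParams.set('url', resourceUrl); // URLSearchParams handles the encoding
  if (opts.maxwidth) request.searchParams.set('maxwidth', String(opts.maxwidth));
  if (opts.maxheight) request.searchParams.set('maxheight', String(opts.maxheight));
  if (opts.callback && format === 'json') {
    request.searchParams.set('callback', opts.callback);
  }
  return request.toString();
}
```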

Put that data-* attribute away, son...You might hurt someone

HTML 5 data-* attributes allow us to add custom attributes to elements as long as they are prefixed with ‘data-’, and ever since this was first discussed on John Resig’s blog I’ve been interested in how people will use and abuse this feature. I greeted it with mixed feelings. It’s definitely a simple way to enrich the semantic value of HTML pages, as well as helping to improve some of the more toxic parts of Microformats. XML namespaces are definitely a more complete solution, but this is a simple and immediately adoptable means of adding invisible semantic data to HTML documents.

However, as John hinted in his post, there’s an enormous temptation for JavaScript authors to use this to embed configuration data for their scripts directly into HTML. Many developers have been itching for an excuse to do this for a long time. Some just added attributes willy-nilly like crazy web standards bandits; others would love to add arbitrary configuration to their HTML but felt a bit squeamish about departing from the HTML specs and opted to abuse the class attribute from within the standard. For the record, I’d tend to side with the former: if it works and there’s a good reason for it, then I say do it. However, back then I explained why there is no good reason to add unsemantic configuration data to your HTML, and now that we have standards-approved carte blanche to do this I’d like to reiterate that it’s still not the way forward. If you’ve not read that article then it’s worth a quick read before you go on.

By all means, use data-* attributes to add semantically valuable data to your HTML but if you are just using it to prop up a script you are writing think again.

An Example

If you’ve not already watched it, go now and watch Yehuda’s screencast on evented programming with jQuery. The ideas in it represent a massive progression in client-side scripting; it’s nothing short of essential viewing. However, it also happens to be the latest example I’ve come across of needless use of data-* attributes and, while I don’t want to take away from how progressive and clever the content is as a whole, I feel the need to use it as my counter-example for this article.

In the screencast, Yehuda is creating a tab interface. The markup he proposes is something like this:

<ul class="tabs">
  <li data-content="first">First</li>
  <li data-content="second">Second</li>
  <li data-content="third">Third</li>
</ul>

<div class="pane" id="first">Some content</div>
<div class="pane" id="second">Some content</div>
<div class="pane selected" id="third">Some content</div>

The idea is that when a tab <li> is clicked, the script interrogates data-content to decide which div to show. However, without JavaScript operating on it, this HTML has no semantic value. The <li>s are just list items (and will be read as such by assistive technologies). In fact, the browser doesn’t know that those list items are in any way associated with the <div>s below. Here’s how I think it should be marked up:

<ul class="section-nav">
  <li><a href="#first">First</a></li>
  <li><a href="#second">Second</a></li>
  <li><a href="#third">Third</a></li>
</ul>

<div class="section" id="first">Some content</div>
<div class="section" id="second">Some content</div>
<div class="section" id="third">Some content</div>

Now, before we even add JavaScript, we have links that we can click to jump to the specified content. If you click the back button you will jump back to the previous tab’s content. With this in place you could even make tabs work solely with CSS and the :target pseudo-class. If you wanted to go HTML 5 crazy you could even use <nav> and <section> elements, which would further enhance the semantics of the document. By correctly associating the tab link and the tab content we can take advantage of the browser’s own facilities for navigating this type of content even before we get out the old JavaScript crowbar.

With this markup as a base it’s just as trivial to hook in the script, but instead of interrogating data-content we just look at the anchor of the link. Because we are now using anchors, users can deep link into a particular tab, it would be trivial to support the back button, and assistive technologies will make better sense of it, among other things.
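A minimal sketch of that script, written without jQuery for brevity. Rather than a click handler, it reacts to the hash the links already set, so deep links and the back button come along for free; reading link.hash in a click handler would give the same id if you prefer that route. The section class matches the markup above, and showSection works on any objects with id and hidden properties, so the logic can be tested outside a browser.

```javascript
// Show the section whose id matches targetId and hide the rest.
// Returns false if no section matched, so the caller can pick a default.
function showSection(sections, targetId) {
  let found = false;
  sections.forEach(function (section) {
    section.hidden = section.id !== targetId;
    if (!section.hidden) found = true;
  });
  return found;
}

// Browser wiring. The links in .section-nav already set location.hash when
// clicked, so listening for hashchange also covers deep links and the back button.
if (typeof document !== 'undefined') {
  const sections = Array.from(document.querySelectorAll('.section'));
  const showFromHash = function () {
    if (!showSection(sections, window.location.hash.slice(1)) && sections.length > 0) {
      showSection(sections, sections[0].id); // no matching hash: show the first tab
    }
  };
  window.addEventListener('hashchange', showFromHash);
  showFromHash();
}
```

Without the script, every section stays visible and the links still jump to the right place, which is the whole point of starting from meaningful markup.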

Leave Yehuda Alone!

Of course, I’m picking apart what was a very simple and purposefully contrived example, but as usage of data-* attributes picks up, it’s important not to abuse this facility and to continue to find as many semantic hooks for your scripts as possible. It may now be a “standard”, but that doesn’t mean it’s a good solution. When looking for hooks for my scripts, this is the process I follow:

1. Build up your markup to be as meaningful as possible. If it submits a request it should be a <form>; if it’s linking to another piece of content it’s an <a>. Even if you’re building a very complex piece of UI, seek to build as much of it as you can into your document (while keeping the semantics intact) before you go anywhere near your JavaScript.

2. Write your script to take advantage of the semantics your HTML document has to offer. This will get you a long way in many cases; however, you may well find that there is still configuration information you need to pass into your script. Rather than turning to data-* attributes, it’s best to consider inferring this information from context, in the same way that CSS does. This way you can assert things like “all <input>s with type ‘slider’ and a class ‘day’ have a min of 1 and a max of 31”, and then change this in one place rather than visiting each element’s data-* attributes individually. Read this article for more detail on how to do that. Now that we have CSS we don’t change the heading colour in every single heading element on our site; let’s not start doing that kind of thing again now that we have data-* attributes.
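As a sketch of inferring configuration from context, here is a hypothetical rule table that reads like a stylesheet: each selector maps to the configuration it implies, and element.matches() decides which rules apply. Unlike real CSS there is no specificity calculation; later rules simply override earlier ones, an assumption made to keep the example short.

```javascript
// Hypothetical rule table for slider widgets, written the way you would write CSS.
// Later rules override earlier ones (there is no specificity calculation here).
const SLIDER_RULES = [
  { selector: 'input[type="slider"]', config: { min: 0, max: 100, step: 1 } },
  { selector: 'input[type="slider"].day', config: { min: 1, max: 31 } },
];

// Collect the configuration for one element from every rule it matches.
function configFor(element, rules) {
  return rules.reduce(function (config, rule) {
    return element.matches(rule.selector) ? Object.assign(config, rule.config) : config;
  }, {});
}

// Usage (browser):
// document.querySelectorAll('input[type="slider"]').forEach(function (input) {
//   const { min, max, step } = configFor(input, SLIDER_RULES);
//   // ...initialise a hypothetical slider widget with these values
// });
```

Changing the range for every day slider on the site is then a one-line edit to the rule table, not a hunt through the markup.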

I welcome the data-* attribute. It’s a simple and immediately useful method to add custom semantic data to HTML documents. Just avoid using it to litter implementation-specific crap into your documents :)
