Wednesday, March 6, 2013

Client Storage and the Jukebox

Jukebox Demo

After playing around with my jukebox every day (my son likes it when I play songs on it for him), I found it a little annoying that every time I want to use the jukebox, I need to drag and drop the audio tracks into it. Par for the course when you use a web app? That was probably true 4 years ago, but today, HTML5 gives us options! HTML5 offers a small variety of options for storing data in the browser (client-side): local storage, session storage, indexed database, and the file system. Session storage only keeps data for the duration of that session. (It's not very clear to me where you'd really want to use session storage.) Local storage is useful for keeping data from one visit to the next, but browsers limit how much data it can hold (typically 2-5MB). An indexed database allows for asynchronous access and storage of binary data, coupled with indexing for easier searching. The file system is a sandboxed, local file system, but the spec is still early and support is sparse.

For this feature, we'll use the indexed database to store the audio blobs and track info. For now, track order is simply the order in which the files were added. Later, we'll go back and use local storage or additional indexed database stores to add playlists with ordering. In an ideal scenario, we'd use local storage to store jukebox configuration settings, the indexed database to store playlists, track categories, and track trivia, and the actual tracks would get stored in a sandboxed file system. Rather than diving head-first into all the possibilities of client-side storage, we'll take a small hop forward and tackle the indexed database first.
So, let's give the indexed database a spin. All we want to do is store our audio blobs when they're added to the jukebox, automatically load everything that was stored when the site loads, and of course, allow the jukebox to play the tracks.

Right off the bat, storing a Blob in IndexedDB in Chrome will not work. A bug currently prevents Blobs from being stored (it's recognized as a bug, and judging by the amount of activity in the thread, I'd guess it stands a good chance of being fixed, since other browsers already support it). So, we need a workaround, a polyfill of sorts: stringify our blobs for storage in IndexedDB, and convert them back to Blobs when we pull them out. In this case, I created a FileReader object for each dropped file, read the file in as a binary string, and then stored that binary string as a key/value pair in the hash describing the track. Here's the code snippet:

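  var files = e.dataTransfer.files;
  for (var i = 0; i < files.length; i++) {
    var f = files[i];

    var reader = new FileReader();
    // Wrap the handler in an immediately-invoked function so each
    // callback keeps its own reference to the file being read.
    reader.onload = (function(filePtr) {
      return function(evt) {
        // evt.target.result holds the file contents as a binary string
        addTrack(filePtr.name, evt.target.result);
      };
    })(f);
    // Start the asynchronous read; onload fires when it finishes
    reader.readAsBinaryString(f);
    ...
  }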

Important things to remember here: FileReader is an asynchronous API, which means:
1. Files are read asynchronously, while the main thread keeps executing. A call to "readAs...()" starts the read and returns immediately. As a result, callbacks are needed to obtain the result (in this case, the 'onload' callback) and handle the data appropriately.
2. Because the main thread keeps executing, any state the callback needs must be captured when the callback is created. That is what the wrapper function in the snippet does: it binds the current file to filePtr. Without it, every callback would read the shared loop variable f, and since var is function-scoped and the loop finishes before any read completes, f would point at the last file by then. With 2 or more dropped files, every track would get the last file's name, regardless of file size.
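To see point 2 in action without a browser, here is a small sketch. It stands in for FileReader with a hypothetical queue of callbacks that only run after the loop has finished, which is the same ordering the real onload callbacks get. The collectNames function and the fake file objects are illustrative, not part of the jukebox code.

```javascript
// Stand-in for an async API: each callback is queued during the loop and only
// run after the loop has finished, the same ordering FileReader.onload gets.
function collectNames(files, useClosure) {
  var pending = [];
  var names = [];
  for (var i = 0; i < files.length; i++) {
    var f = files[i];
    if (useClosure) {
      // Immediately-invoked wrapper: filePtr is fixed to this iteration's file.
      pending.push((function(filePtr) {
        return function() { names.push(filePtr.name); };
      })(f));
    } else {
      // Reads the shared, function-scoped f when the callback eventually runs.
      pending.push(function() { names.push(f.name); });
    }
  }
  // The "reads" complete only now; by this point f is the last file.
  pending.forEach(function(callback) { callback(); });
  return names;
}

var files = [{ name: "a.mp3" }, { name: "b.mp3" }, { name: "c.mp3" }];
console.log(collectNames(files, false)); // [ 'c.mp3', 'c.mp3', 'c.mp3' ]
console.log(collectNames(files, true));  // [ 'a.mp3', 'b.mp3', 'c.mp3' ]
```

In newer JavaScript (ES2015), declaring the loop's file variable with let instead of var gives each iteration its own binding and has the same effect as the wrapper function.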

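As the following code shows, we simply write the binary string into the indexed database, with no issues:
  function addTrack(name, blob) {
    // Writes need a "readwrite" transaction on the "tracks" store
    var transaction = db.transaction(["tracks"], "readwrite");
    var store = transaction.objectStore("tracks");
    // 'blob' is the binary string from FileReader; put() inserts the record,
    // or replaces an existing record with the same key
    var request = store.put({ "name": name, "blob": blob, "timeStamp": new Date().getTime() });
    request.onerror = error;
  }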
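The addTrack above doesn't tell its caller when the write has actually landed. As a sketch, assuming we want a completion callback, the transaction's oncomplete and onerror events can report that. The addTrackWithCallback name, the explicit db parameter, and the done callback are hypothetical additions, not part of the jukebox code.

```javascript
// Sketch: the same put() as addTrack, plus a hypothetical done(err) callback
// that fires once the transaction commits (err is null) or fails.
function addTrackWithCallback(db, name, data, done) {
  var transaction = db.transaction(["tracks"], "readwrite");

  // "complete" fires after all of the transaction's writes are committed.
  transaction.oncomplete = function() {
    done(null);
  };
  // Request errors bubble up to the transaction; e.target is the failed request.
  transaction.onerror = function(e) {
    done(e.target.error);
  };

  transaction.objectStore("tracks").put({
    name: name,
    blob: data, // the binary string from FileReader, as in addTrack
    timeStamp: new Date().getTime()
  });
}
```

Waiting for the transaction's complete event, rather than the put() request's success event, is the conservative choice: it only fires once the transaction's writes have been committed.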

So, insertion's pretty easy since we take advantage of some native API functions to return us the type of data that's friendly to the indexed database. Pulling data from the database though... well, here we go...

The Jukebox pulls data from the database only once, right at the very beginning of the script loading. We send a request to open the database and, once it succeeds, load up all the data.

  1.  var request = (window.indexedDB || window.webkitIndexedDB).open("jukebox", 1);
  2.  request.onerror = error;
  3.  request.onsuccess = function(e) {
  4.    db = e.target.result;
  5.
  6.    var transaction = db.transaction(["tracks"], "readonly");
  7.    var store = transaction.objectStore("tracks");
  8.
  9.    var keyRange = (window.IDBKeyRange || window.webkitIDBKeyRange).lowerBound(0);
  10.    var cursorRequest = store.openCursor(keyRange);
  11.    cursorRequest.onerror = error;
  12.    cursorRequest.onsuccess = function(e) {
  13.      var result = e.target.result;
  14.
  15.      if (!result) // no more records
  16.        return;
  17.
  18.      var arrayBuffer = new Uint8Array(new ArrayBuffer(result.value.blob.length));
  19.      for(var i = 0; i < result.value.blob.length; i++)
  20.        arrayBuffer[i] = result.value.blob.charCodeAt(i);
  21.
  22.      var blob = new Blob([arrayBuffer]);
  23.      fileBlobs[result.value.name] = window.URL.createObjectURL(blob);
  24.
  25.      var option = document.createElement("option");
  26.      option.textContent = result.value.name;
  27.      option.id = result.value.name;
  28.      option.addEventListener('dblclick', handleDoubleClick, false);
  29.      option.addEventListener('click', handleClick, false);
  30.
  31.      playlistElement.appendChild(option);
  32.
  33.      result.continue();
  34.    };
  35.  };

I'm sure most of this code makes sense, but let's walk through it. In line 1, we send the request to open the database 'jukebox'. Because the indexed database is asynchronous, we don't know when the open will complete, so the only way to correctly handle errors, success, etc. is to assign callbacks on the request object that open() returns. Those callbacks are attached after the call is made, and that is safe: the request's events are only delivered after the currently running code finishes, so the handlers are always in place in time. (Before calling open(), we don't have a request object to attach anything to.) We assign an error handler in line 2 ('error' is a function reference), and a success handler in line 3.

Once the database has opened successfully, we want to grab all our track data. All the track data lives in the 'tracks' store, so we open a transaction on that store in lines 6 and 7. Line 9 specifies a key range for our search in the 'tracks' store. Records are keyed by timestamp, so we set the lower bound of the range to 0 in order to grab everything. Line 10 opens a cursor on the 'tracks' store that walks over every record whose key falls within that range.
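The open request in line 1 asks for version 1 of 'jukebox', but the 'tracks' store has to be created somewhere first, and that step isn't shown here. One way to do it is an onupgradeneeded handler, which fires when the database is first created or its version number increases. This is a sketch: the handler name is hypothetical, and the timeStamp key path is an assumption based on the timestamp keys described above.

```javascript
// Sketch of an upgrade handler that creates the "tracks" store. The key path
// is an assumption: records are taken to be keyed by their timeStamp field.
function handleUpgradeNeeded(e) {
  var db = e.target.result; // the IDBDatabase being created or upgraded
  if (!db.objectStoreNames.contains("tracks")) {
    db.createObjectStore("tracks", { keyPath: "timeStamp" });
  }
}

// In the page, it would be attached next to the other handlers:
//   var request = window.indexedDB.open("jukebox", 1);
//   request.onupgradeneeded = handleUpgradeNeeded;
```

One caveat of keying on the timestamp: put() replaces any existing record with the same key, so two tracks stored within the same millisecond would collide. Passing { autoIncrement: true } instead of a key path would avoid that, with the timestamp still kept in each record's value.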

At this point, we just need to wait for the data to be returned to us, then do the appropriate thing. There are 2 indexed database concepts to be aware of. First, e.target.result in line 13 is a cursor, not our data directly: a copy of the hash we originally stored is available as its value member (result.value), alongside other members such as the record's key. Second, only 1 record is delivered to our success callback at a time. For a request with multiple matches, the success callback runs multiple times, each time with a single record that matches the key range, but only if we call continue() on the cursor (line 33) to ask for the next one. Once every record has been visited, the result is null, which is what the check in lines 15-16 handles.
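The cursor protocol described here (one success callback per record, continue() to request the next one, and a null result at the end) can be pulled into a reusable helper. This is a sketch: readAllTracks and the in-memory makeFakeStore (a hypothetical stand-in for a real IDBObjectStore, so the code can run outside a browser) are illustrative names. Calling openCursor() with no key range visits every record, which matches the lowerBound(0) range for timestamp keys, since those are never negative.

```javascript
// Sketch: collect every record from an object store by walking a cursor.
// onsuccess fires once per record; cursor.continue() asks for the next one,
// and a null result means the cursor is exhausted.
function readAllTracks(store, onDone) {
  var tracks = [];
  var request = store.openCursor();
  request.onerror = function(e) {
    onDone(e.target.error, null);
  };
  request.onsuccess = function(e) {
    var cursor = e.target.result;
    if (!cursor) {
      onDone(null, tracks); // no more records
      return;
    }
    tracks.push(cursor.value); // the stored hash
    cursor.continue();         // triggers another onsuccess call
  };
}

// Hypothetical in-memory stand-in for an IDBObjectStore. Events are delivered
// asynchronously via setTimeout, after the caller has attached its handlers.
function makeFakeStore(records) {
  return {
    openCursor: function() {
      var request = {};
      var index = 0;
      function deliver() {
        setTimeout(function() {
          var cursor = index < records.length ? {
            value: records[index],
            continue: function() { index++; deliver(); }
          } : null;
          request.onsuccess({ target: { result: cursor } });
        }, 0);
      }
      deliver();
      return request;
    }
  };
}

readAllTracks(makeFakeStore([{ name: "a.mp3" }, { name: "b.mp3" }]), function(err, tracks) {
  console.log(tracks.map(function(t) { return t.name; })); // [ 'a.mp3', 'b.mp3' ]
});
```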

Lines 18-22 convert the track data into something that our native <audio> tag can understand. First (line 18), we create a byte-level view (a Uint8Array) over an array buffer with one byte per character of the stored binary string. In lines 19-20, we iterate over the string and copy each character code into the array. Finally, in line 22, we can create our Blob object, and line 23 turns it into an object URL that can be handed to the <audio> tag. This is by no means the fastest code in the world. A Web Worker could move this work off the main thread, or a different approach could do the conversion faster.
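Lines 18-22 can be pulled out into small helpers, which also makes them easy to test on their own. This sketch uses new Uint8Array(length), which allocates the same zero-filled buffer as new Uint8Array(new ArrayBuffer(length)). The helper names and the optional mimeType parameter are hypothetical; the track records in this post don't store a MIME type, so using one would mean saving the File's type alongside the data when the track is added.

```javascript
// Sketch: the read-side conversion from lines 18-22 as two helpers.
// A "binary string" holds one character per byte (char codes 0-255),
// which is what FileReader.readAsBinaryString produces.
function binaryStringToBytes(str) {
  var bytes = new Uint8Array(str.length); // zero-filled, one slot per byte
  for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i);
  }
  return bytes;
}

// mimeType is optional; an empty type is the Blob default.
function binaryStringToBlob(str, mimeType) {
  return new Blob([binaryStringToBytes(str)], { type: mimeType || "" });
}

console.log(Array.from(binaryStringToBytes("\u0000\u0001\u007f\u00ff"))); // [ 0, 1, 127, 255 ]
console.log(binaryStringToBlob("\u0000\u0001\u007f\u00ff").size);         // 4
```

The helpers run as-is in a browser and in Node 18 or later, where Blob is a global.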

And that's how we load and store data in the indexed database. It's a little different from the traditional, sequential way of programming, where function calls block until the function returns. Think of it more like an implementation of hardware interrupts and handlers, but in the world of Javascript. The good thing? No x86 triple-fault shutdown. (The history and modern implementation of that... REALLY interesting.)
