Monday, December 23, 2013

CAVLC Encoder Demo

CAVLC Encoder

This is a demo for h.264 CAVLC encoding. This includes all needed VLC lookup tables, and should work for almost all input. There are a few corner cases in level VLC encoding that haven't been tested.

This demo accompanies a tutorial write-up on CAVLC encoding, found here.

CAVLC Encoding Tutorial

Context Adaptive Variable Length Coding, or CAVLC, is an entropy coding process used to compactly represent a 4x4 block of data. Typically, this is a 4x4 block of quantized transform coefficients from the luma or chroma residual. In a previous post, I mentioned encoding I-frames in I_PCM mode, which stores the raw YUV sample values of each pixel, losing no accuracy in the encoded data in the process. The main drawback of I_PCM mode is that it encodes the raw bytes. But what if you could rely on surrounding data to infer what should happen in the current macroblock, and if there were any differences, just encode the difference? If that were the case, you might need to encode only 4-5 bytes instead of 16 (4 x 4 = 16). In a future article, we'll talk about prediction in I-frames (also referred to as intra-prediction). For now, let's focus on the encoding, and how h.264 expresses sparse data succinctly in order to save bytes.

There are a few tutorials on the Internet about how CAVLC works, but in my opinion they're rather brief. I hope this tutorial is a bit more complete and in-depth; my experience reading through those tutorials ended with me going back to the h.264 spec, reading the decoding process, and reverse engineering it. As I go through the encoding steps, I'll refer back to variable names in the spec (section 9.2), so if you feel the urge to read the spec, this will correlate closely with what you're reading.

Step 1: nC
nC selects which table (column) to use for a subsequent lookup, coeff_token, which encodes the number of non-zero coefficients and trailing 1s. There are a number of rules that dictate the value of nC; for this tutorial, we'll ignore these rules and always assume nC is 0. (We'll revisit the rules to determine nC later.)

Step 2: Reordering the array
Assume the following input 4x4 array (which is coincidentally used in a lot of examples out there):
0     3     -1     0
0     -1     1     0
1     0     0     0
0     0     0     0

We want to reorder this array in a specific zig-zag order, starting from the top left corner and working our way diagonally back and forth to the bottom right corner. The index of each coefficient in the reordered array is in parentheses:
0 (0)     3 (1)     -1 (5)     0 (6)
0 (2)     -1 (4)     1 (7)     0 (12)
1 (3)     0 (8)     0 (11)     0 (13)
0 (9)     0 (10)     0 (14)      0 (15)

In this example, the resulting reordered array becomes: [ 0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ]
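The reordering above is the zig-zag scan h.264 uses for 4x4 blocks in frame (non-interlaced) coding. Here's a minimal Javascript sketch, assuming the block arrives as an array of 4 rows; the (row, column) table is read straight off the index grid above, and the names ZIGZAG_4x4 and zigzagScan are just illustrative.

```javascript
// (row, column) position of each scan index 0..15, read off the index grid above.
const ZIGZAG_4x4 = [
  [0, 0], [0, 1], [1, 0], [2, 0],
  [1, 1], [0, 2], [0, 3], [1, 2],
  [2, 1], [3, 0], [3, 1], [2, 2],
  [1, 3], [2, 3], [3, 2], [3, 3],
];

// Reorder a 4x4 block (an array of 4 rows) into a 16-element array in scan order.
function zigzagScan(block) {
  return ZIGZAG_4x4.map(([row, col]) => block[row][col]);
}

const sample = [
  [0, 3, -1, 0],
  [0, -1, 1, 0],
  [1, 0, 0, 0],
  [0, 0, 0, 0],
];
console.log(zigzagScan(sample).join(", "));
// 0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
```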

Step 3: Counting Non-zero Coefficients and Trailing 1s
Iterating through the reordered array, count the total number of non-zero values (totalCoeff). Also, iterating backwards through the non-zero values (skipping zeros), count the number of trailing 1s (t1): consecutive values that are either +1 or -1, up to a maximum of 3.

In our example, non-zero values occur at indices 1, 3, 4, 5, and 7, so totalCoeff is 5. Working backwards through the list, the trailing 1s start at index 7, and the non-zero values at indices 7, 5, 4, and 3 are all +1 or -1. We can count at most 3, so t1 is 3.
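A sketch of Step 3, assuming the scan-ordered array from Step 2 as input (countCoeffs is an illustrative name). Trailing ones are counted over non-zero values only, so zeros in between don't stop the count; the first coefficient with a magnitude above 1 does.

```javascript
// Count non-zero coefficients (totalCoeff) and trailing +/-1s (t1) in a
// scan-ordered array. Trailing ones are counted from the end over non-zero
// values only, stop at the first magnitude above 1, and are capped at 3.
function countCoeffs(coeffs) {
  let totalCoeff = 0;
  let t1 = 0;
  let countingOnes = true;
  for (let i = coeffs.length - 1; i >= 0; i--) {
    const c = coeffs[i];
    if (c === 0) continue; // zeros never end the trailing-ones run
    totalCoeff++;
    if (countingOnes && Math.abs(c) === 1 && t1 < 3) {
      t1++;
    } else {
      countingOnes = false;
    }
  }
  return { totalCoeff, t1 };
}

const counts = countCoeffs([0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]);
console.log(counts.totalCoeff, counts.t1); // 5 3
```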

Step 4: coeff_token
Taking nC, totalCoeff, and t1, we can determine a VLC (coeff_token) that compactly represents totalCoeff and t1. The full table is Table 9-5 in the h.264 spec. It's too long to print every entry, but here are the rows around our combination:
t1 | totalCoeff | 0 <= nC < 2 | 2 <= nC < 4 | 4 <= nC < 8 | 8 <= nC | nC == -1 | nC == -2
1 | 5 | 0000 0001 10 | 0000 110 | 0100 0 | 0100 01 | - | 0000 0001 10
2 | 5 | 0000 0010 1 | 0000 101 | 0100 1 | 0100 10 | - | 0000 0010 0
3 | 5 | 0000 100 | 0011 0 | 1010 | 0100 11 | - | 0001 001
0 | 6 | 0000 0000 0111 1 | 0000 0011 1 | 0001 001 | 0101 00 | - | 0000 0000 111
1 | 6 | 0000 0000 110 | 0000 0110 | 0011 10 | 0101 01 | - | 0000 0000 110


For our combination (nC = 0, totalCoeff = 5, t1 = 3), coeff_token is 0000100.
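Once nC picks a column, coeff_token is a plain table lookup. This sketch hard-codes only the five rows from the excerpt above (a real encoder needs all of Table 9-5), with the bit strings copied as written; COEFF_TOKEN_EXCERPT, nCColumn, and coeffToken are hypothetical names.

```javascript
// Column index into the excerpt rows, matching the table header above.
function nCColumn(nC) {
  if (nC === -1) return 4;
  if (nC === -2) return 5;
  if (nC < 2) return 0;
  if (nC < 4) return 1;
  if (nC < 8) return 2;
  return 3;
}

// Only the five excerpt rows, keyed by "t1,totalCoeff"; null marks "-".
const COEFF_TOKEN_EXCERPT = {
  "1,5": ["0000 0001 10", "0000 110", "0100 0", "0100 01", null, "0000 0001 10"],
  "2,5": ["0000 0010 1", "0000 101", "0100 1", "0100 10", null, "0000 0010 0"],
  "3,5": ["0000 100", "0011 0", "1010", "0100 11", null, "0001 001"],
  "0,6": ["0000 0000 0111 1", "0000 0011 1", "0001 001", "0101 00", null, "0000 0000 111"],
  "1,6": ["0000 0000 110", "0000 0110", "0011 10", "0101 01", null, "0000 0000 110"],
};

function coeffToken(nC, totalCoeff, t1) {
  const row = COEFF_TOKEN_EXCERPT[t1 + "," + totalCoeff];
  if (!row) throw new Error("combination not in this excerpt");
  const code = row[nCColumn(nC)];
  if (code === null) throw new Error("no code for this nC");
  return code.replace(/ /g, ""); // drop the readability spaces
}

console.log(coeffToken(0, 5, 3)); // 0000100
```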

Step 5: Encoding Trailing 1s
Encoding trailing ones is not too difficult. If t1 is non-zero, work backwards through the trailing 1s and emit one sign bit for each: 0 for +1, and 1 for -1. In our case, the trailing 1s are [-1, -1, 1] in scan order (indices 4, 5, and 7), so working backwards we visit 1, -1, -1.

For our example, the resulting bit string is 011.
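A sketch of the sign bits, assuming t1 was computed as in Step 3 so that the first t1 non-zero values seen from the end are exactly the trailing ones (trailingOnesBits is an illustrative name):

```javascript
// One sign bit per trailing one, walking backwards from the last non-zero
// coefficient: "0" for +1, "1" for -1. Assumes t1 came from Step 3.
function trailingOnesBits(coeffs, t1) {
  let bits = "";
  for (let i = coeffs.length - 1; i >= 0 && bits.length < t1; i--) {
    if (coeffs[i] !== 0) bits += coeffs[i] > 0 ? "0" : "1";
  }
  return bits;
}

console.log(trailingOnesBits([0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 3)); // 011
```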

Step 6: Encoding remaining coefficients
This step is perhaps the most confusing. Details on this step are forthcoming.

Step 7: Count Zeros
This step is rather simple: ignore all zeros after the last non-zero coefficient, then count the number of remaining zeros (totalZeros). Another VLC table is used to encode this value, and it's a function of totalZeros and totalCoeff. (In the h.264 spec, this table is split across Tables 9-7 and 9-8; the column for totalCoeff = 5 is in Table 9-7.)

For our example, the zeros before index 7 are at indices 0, 2, and 6, so totalZeros = 3. With totalCoeff = 5, this corresponds to the bit string 111.
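Counting totalZeros is a short loop. This sketch only computes the count (countTotalZeros is an illustrative name); turning it into bits still needs the total_zeros table lookup, which isn't reproduced here.

```javascript
// totalZeros: zeros that appear before the last non-zero coefficient.
function countTotalZeros(coeffs) {
  let last = coeffs.length - 1;
  while (last >= 0 && coeffs[last] === 0) last--; // skip trailing zeros
  let zeros = 0;
  for (let i = 0; i < last; i++) {
    if (coeffs[i] === 0) zeros++;
  }
  return zeros;
}

console.log(countTotalZeros([0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0])); // 3
```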

Step 8: Encode Run Before
The final step is to encode the number of zeros that immediately precede each non-zero coefficient (run_before), including the trailing 1s. Working backwards, for every non-zero coefficient, count the number of zeros between it and the previous non-zero coefficient. Another VLC table (Table 9-10 in the h.264 spec) provides the encoding; it's a function of the number of zeros left to be counted (zerosLeft, which starts at totalZeros) and the number of zeros preceding the coefficient. After each coefficient, subtract its run from zerosLeft, and once zerosLeft reaches 0, stop encoding run_before. Also, run_before is never coded for the first non-zero coefficient in scan order (the last one visited): any zeros still left are inferred to precede it.

For our example, iterate through each non-zero coefficient, working backwards.
1. Index 7: non-zero coefficient: 1, preceding zeros: 1, zeros left: 3, resulting bit string: 10
2. Index 5: non-zero coefficient: -1, preceding zeros: 0, zeros left: 2, resulting bit string: 1
3. Index 4: non-zero coefficient: -1, preceding zeros: 0, zeros left: 2, resulting bit string: 1
4. Index 3: non-zero coefficient: 1, preceding zeros: 1, zeros left: 2, resulting bit string: 01
5. Index 1: skip, since this is the first non-zero coefficient in scan order (the last one visited); the 1 remaining zero is inferred to precede it
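Here's a hedged sketch of Step 8. The RUN_BEFORE table below is my transcription of Table 9-10 (double-check it against the spec before relying on it), with the last row covering every zerosLeft above 6; runBeforeCodes is an illustrative name, and totalZeros is assumed to come from Step 7.

```javascript
// run_before codes, indexed [zerosLeft][runBefore]; row 7 is used for zerosLeft > 6.
const RUN_BEFORE = {
  1: ["1", "0"],
  2: ["1", "01", "00"],
  3: ["11", "10", "01", "00"],
  4: ["11", "10", "01", "001", "000"],
  5: ["11", "10", "011", "010", "001", "000"],
  6: ["1", "000", "001", "011", "010", "101", "100"],
  7: ["111", "110", "101", "100", "011", "010", "001", "0001", "00001",
      "000001", "0000001", "00000001", "000000001", "0000000001", "00000000001"],
};

// Walk backwards over the non-zero coefficients, emitting run_before for each
// one except the first in scan order, and stop once zerosLeft reaches 0.
function runBeforeCodes(coeffs, totalZeros) {
  const nonZero = [];
  coeffs.forEach((c, i) => { if (c !== 0) nonZero.push(i); });
  const codes = [];
  let zerosLeft = totalZeros;
  for (let k = nonZero.length - 1; k > 0 && zerosLeft > 0; k--) {
    const run = nonZero[k] - nonZero[k - 1] - 1; // zeros since the previous non-zero
    codes.push(RUN_BEFORE[Math.min(zerosLeft, 7)][run]);
    zerosLeft -= run;
  }
  return codes;
}

console.log(runBeforeCodes([0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 3).join(" "));
// 10 1 1 01
```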

I've coded a Javascript demo that takes an nC value and a 4x4 sample data array, and emits the corresponding CAVLC bit stream. You can find the demo here.

Friday, June 7, 2013

Could this be bad?

Looks like a malformed .mp4 file will crash Windows Media Player... I wonder whether that could potentially be a root exploit...

Wednesday, May 29, 2013

I can haz h.264 encoder?

Nope. Not mine at least... (or not yet anyways).

Off and on for the last few months, I've been trying to put together a basic h.264 encoder. You're probably wondering "why?", and it's one of 2 "why" questions... "Why do that when there's a perfectly good open source encoding library (libx264) out there?" Or, "Why put yourself through the pain? Masochist?" The short and simple answers: 1. I can't use it, for a variety of design reasons that can't be changed, and 2. because this is the cornerstone of an entire project.

If you're in the same boat as me - wanting or needing to write your own encoder - where do you start? What search terms do you type into Google? I saw an interview with Anthony Bourdain (popular chef, author, and host of food/travel shows on a variety of cable channels) where someone asked whether they should get into the restaurant business because they had a passion for food. His reply (in so many words) - "You should try to work in a restaurant for a year, for free if you have to, in order to understand what you're getting yourself into. There are those who enjoy the heat, the really hard work, the long hours, the low profit margin... and there are normal people".

I mention this because my experience trying to find a starting point for writing an encoder was similar. "h.264 encoder tutorial source code" are all words you'll probably come up with, and the results may leave you scratching your head. You might read some forum posts and find that people are somewhat rude, telling you "Go read the spec" as not only their default answer, but their only answer. If after all this you *still* feel the urge to write an encoder, well, you might be one of the few that Chef Bourdain is talking about.

Some good resources to start with are available on the web:
  • The spec. Because this is the document everyone will tell you to read. (And it's annoying that universally used specs have paywalls...)
  • A really simple encoder. I think this is a great article to help you dip your toes into the waters of h.264 encoding, because those waters are going to get really murky. You want to understand the very basics of how the h.264 encoding process works, because every feature supported in h.264 is, in some way, derived from this very basic encoding flow.

Some notes on the above links:
- I understand why people always direct you to the spec if you have a question: the answer will be in there (it will... it might just be hiding sometimes). But... the spec is hard to read. It really is. Read the spec, but understand that, depending on the depth of knowledge you need, you may not have to read the entire thing.
- The basic encoder from Ben Mesander is a great starting point. It opens the door to many questions that you may ask, but it gets you going. There is one bug that I've found related to the slice header. Ben uses a static slice header (0x00, 0x00, 0x00, 0x01, 0x05, 0x88, 0x84, 0x21, 0xa0) for all his picture slices. In the comments, someone points out that "0x05" should actually be "0x65". 0x65 is correct according to the spec: the NAL unit header byte packs forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits), and an IDR slice (nal_unit_type 5) must have a non-zero nal_ref_idc, so 0x05 (nal_ref_idc = 0) isn't allowed while 0x65 (nal_ref_idc = 3) is. The other issue related to the slice header is that slice headers cannot be static. The spec dictates that consecutive pictures must differ in at least one of a handful of slice header fields (frame_num and idr_pic_id, for instance; back-to-back IDR pictures must use different idr_pic_id values). This means that a static header can't be used, but it should be OK to toggle between 2 slice headers that change a couple of those bits. My suggestion:

    const uint8_t slice_header1[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x84, 0x21, 0xa0 };
    const uint8_t slice_header2[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x94, 0x21, 0xa0 };

In the code that writes the slice header, consider a piece of code like this:
    int i, j, use_slice_header_1 = 1;
    
    ...
    
    if (use_slice_header_1)
      fwrite(slice_header1, 1, sizeof(slice_header1), stdout);
    else
      fwrite(slice_header2, 1, sizeof(slice_header2), stdout);
    
    use_slice_header_1 = !use_slice_header_1;

Note that the two slice headers differ in a single byte (0x84 vs. 0x94). Full disclosure: I haven't completely tested this suggestion, but my basic test does almost the same thing. The bits might be off, in which case, let me know and I'll fix and test it.
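To check what the two suggested headers actually encode, here's a hedged Javascript sketch that decodes the NAL header byte and the first few slice header fields (first_mb_in_slice, slice_type, pic_parameter_set_id, frame_num, idr_pic_id). It assumes a 4-byte start code and log2_max_frame_num = 4 (log2_max_frame_num_minus4 = 0 in the SPS); makeBitReader and peekSliceHeader are illustrative names. Because frame_num is always at least 4 bits, the 0x84/0x94 difference appears to land inside frame_num (0 vs. 2), while idr_pic_id decodes as 0 in both under this assumption. The spec requires frame_num to be 0 in IDR slices, so if I'm reading the bits right, the toggle would need to change idr_pic_id instead; since ue(0) is 1 bit and ue(1) is 3 bits, that would shift the header bits that follow.

```javascript
// MSB-first bit reader with an unsigned Exp-Golomb (ue(v)) helper.
function makeBitReader(bytes) {
  let pos = 0; // current bit offset
  const readBit = () => {
    if (pos >= bytes.length * 8) throw new RangeError("ran out of bits");
    const bit = (bytes[pos >> 3] >> (7 - (pos & 7))) & 1;
    pos++;
    return bit;
  };
  const readBits = (n) => {
    let value = 0;
    for (let i = 0; i < n; i++) value = value * 2 + readBit();
    return value;
  };
  const readUE = () => {
    let zeros = 0;
    while (readBit() === 0) zeros++;          // count leading zeros
    return 2 ** zeros - 1 + readBits(zeros);  // then read that many info bits
  };
  return { readBits, readUE };
}

// Decode the NAL header byte and the first slice header fields, in
// slice_header() order. log2MaxFrameNum = 4 is an assumption about the SPS.
function peekSliceHeader(bytes, log2MaxFrameNum = 4) {
  const nal = bytes[4]; // byte after the 00 00 00 01 start code
  const r = makeBitReader(bytes.slice(5));
  const info = { nalRefIdc: (nal >> 5) & 3, nalUnitType: nal & 0x1f };
  info.firstMbInSlice = r.readUE();
  info.sliceType = r.readUE();
  info.ppsId = r.readUE();
  info.frameNum = r.readBits(log2MaxFrameNum);
  if (info.nalUnitType === 5) info.idrPicId = r.readUE(); // IDR slices only
  return info;
}

const header1 = [0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x84, 0x21, 0xa0];
const header2 = [0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x94, 0x21, 0xa0];
console.log(peekSliceHeader(header1)); // frameNum 0, idrPicId 0
console.log(peekSliceHeader(header2)); // frameNum 2, idrPicId 0
```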

I like tools. Moreover, I like tools that are helpful for developers, because I'm a developer at heart. As a developer, I like tools that are simple, accessible, full of information that can be used for all types of debugging, and organized in a way that is comprehensible by everyone that uses it. Since this is a website (and I'm focusing a lot on Javascript, HTML5, etc.), tools should be web-based, as much as possible. Here's a list of tools that I've written that are h.264 related and you may find useful:
  • Exponential Golomb Code Calculator - works for calculating unsigned exponential Golomb codes for now. Signed exponential Golomb codes will be added soon (blog post about EGCs incoming...)
  • (Added 12/23/2013) h.264 CAVLC Encoder - demo of CAVLC encoding, based on input nC and 4x4 sample data
  • h.264 decoder - incoming

Tutorials I've written related to h.264: Many more posts about h.264 on the way. I'll try to add links here. In the meanwhile, good luck! (I know I needed lots of it)

Tuesday, May 21, 2013

Unsigned Exponential Golomb Codes

Exponential Golomb codes are a family of binary codes that allow for variable length encoding of numbers. The premise is that, for network transmission or data compaction, allocating a fixed number of bytes to store every value is expensive, so the trade-off is data compaction at the cost of slightly higher processing time. Exponential Golomb codes are used a lot in h.264 streams, because every bit is at a premium, especially when these streams need to be transmitted in real time over a network.

Here's a Javascript calculator that computes the unsigned exponential Golomb code for any non-negative number, and vice versa. (I'll add signed exponential Golomb codes later.)
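The calculator's logic fits in a few lines. This sketch treats codes as strings of '0' and '1' characters, which is a simplification (a real encoder packs bits into a buffer); ueEncode and ueDecode are illustrative names. The code for n is n + 1 written in binary, prefixed by one fewer zeros than it has bits.

```javascript
// Unsigned Exp-Golomb encode: (bits - 1) zeros, then n + 1 in binary.
function ueEncode(n) {
  if (!Number.isInteger(n) || n < 0) throw new RangeError("non-negative integer required");
  const bits = (n + 1).toString(2);
  return "0".repeat(bits.length - 1) + bits;
}

// Decode a single code: count leading zeros, then read the rest as n + 1.
function ueDecode(code) {
  const zeros = code.indexOf("1");
  if (zeros < 0 || code.length !== 2 * zeros + 1) throw new Error("not a single ue(v) code");
  return parseInt(code.slice(zeros), 2) - 1;
}

for (const n of [0, 1, 2, 3, 7]) console.log(n, ueEncode(n));
// 0 1
// 1 010
// 2 011
// 3 00100
// 7 0001000
```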

Sunday, May 5, 2013

It's been a while...

Still here. Life outside of my programming projects (taxes, children, work, family vacation) has been a huge interruption. I wrote a basic h.264 encoder a while back, and have had trouble getting the output stream to play correctly in everyday media players. After using some tricks in ffmpeg, it turns out that I'm encoding the frames correctly; it's almost certainly my mp4 container code that's writing data incorrectly and throwing everything off. So, I'm now writing an mp4 container encoder. All of these atom types are spectacularly boring, but you have to slog your way through them all to make a credible and functional encoder. I hope to have something working in the coming days, and then I'll finally be able to play some of my home-made video. Then, I'll have a LOT more to say about h.264 and mp4/QuickTime.

Thursday, March 7, 2013

Equalizer Added...

Jukebox: here

Added a basic equalizer to the Jukebox. Not 100% sure that it's doing everything correctly, but it's much too late to go back and double check.

Taking a small break from audio. I will come back in a few days and unravel the AudioContext object and how it gets hooked into the Canvas tag to visualize each sound's equalization. But for now, just enjoy the pretty color gradient while you play your music...

Wednesday, March 6, 2013

Client Storage and the Jukebox

Jukebox Demo

After playing around with my jukebox every day (my son likes it when I play songs on it for him), I found it a little annoying that every time I want to use the jukebox, I need to drag and drop the audio tracks into it. Par for the course when you use a web app? That was probably true 4 years ago, but today, HTML5 gives us options! HTML5 offers a small variety of options for storing data in the browser (client-side): local storage, session storage, indexed database, and the file system. Session storage only keeps data for the lifetime of the page session. (It's not very clear to me where you'd really want to use session storage.) Local storage persists data from invocation to invocation, but browsers limit the amount of data you can store (typically 2-5MB). An indexed database allows for asynchronous access and storage of binary data, coupled with indexing for easier searching. The file system is a sandboxed, local file system, but the spec is still early and support is sparse.

For this feature, we'll use the indexed database to store the audio blobs and track info. For now, tracks are ordered by when you added them. Later, we'll go back and use local storage or additional indexed database stores to add playlists with ordering. In an ideal scenario, we'd use local storage to store jukebox configuration settings, the indexed database to store playlists, track categories, and track trivia, and the actual tracks would get stored in a sandboxed file system. We'll take a small hop forward before we jump head-first into all the possibilities of client-side storage, and tackle the indexed database first.

So, let's give the indexed database a spin. All we want to do is store our audio blobs when they're added to the jukebox, load everything that was stored automatically when the site is loaded, and of course, allow the jukebox to play the tracks.

Right off the bat, storing a Blob in an IndexedDB in Chrome will not work. There is a bug that currently prevents Blobs from being stored (but it's recognized as a bug, and judging by the amount of activity in the thread, I'd guess that it stands a good chance of being fixed, since other browsers already support this). So, we need to use a polyfill that stringifies our blobs for storage in the IndexedDB and converts them back to Blobs when we pull them out. In this case, I attached a FileReader object, read in the file as a binary string, then stored the binary string under the track's name. Here's the code snippet:

  var files = e.dataTransfer.files;
  for(var i = 0; i < files.length; i++) {
    var f = files[i];

    var reader = new FileReader();
    // Capture the current File in a closure so the async callback sees the right one.
    reader.onload = (function(filePtr) {
      return function(e) {
        // e.target.result holds the file contents as a binary string.
        addTrack(filePtr.name, e.target.result);
      };
    })(f);
    ...

Important things to remember here: FileReader is an asynchronous API, meaning:
1. Files are read in the background while the main thread keeps executing. A call to "readAs...()" starts the read and returns immediately. As a result, callbacks are needed to obtain the result (in this case, the 'onload' callback) and handle the data appropriately.
2. Because the main thread continues execution, any state the callback needs when it runs must be captured and passed in to it. Otherwise, your data will be out of sync. This matters as soon as the snippet above handles 2 or more files: 'var f' is a single variable shared by every loop iteration, and the loop finishes before any onload callback runs, so without the closure every callback would see the last file's name.
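To make point 2 concrete, here's a small sketch that mimics the snippet's timing without FileReader: callbacks are created inside a loop but only run after the loop has finished, just like onload. makeCallbacks and the file names are made up for illustration.

```javascript
// Build one callback per name inside a loop; the callbacks run later.
function makeCallbacks(names, captureEachName) {
  const callbacks = [];
  for (var i = 0; i < names.length; i++) {
    var name = names[i]; // 'var' is function-scoped: one variable shared by all iterations
    if (captureEachName) {
      // Immediately-invoked function copies the current value, as in the snippet above.
      callbacks.push((function (captured) {
        return function () { return captured; };
      })(name));
    } else {
      callbacks.push(function () { return name; }); // reads 'name' when called
    }
  }
  return callbacks;
}

const files = ["a.mp3", "b.mp3", "c.mp3"];
console.log(makeCallbacks(files, false).map((cb) => cb())); // every callback returns "c.mp3"
console.log(makeCallbacks(files, true).map((cb) => cb()));  // "a.mp3", "b.mp3", "c.mp3"
```

Declaring the loop variable with let instead of var (ES2015, which postdates this post) also gives each iteration its own binding.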

As the following code shows, we simply write the binary string into the indexed database, and there are no issues:
  function addTrack(name, blob) {
    var transaction = db.transaction(["tracks"], "readwrite");
    var store = transaction.objectStore("tracks");
    var request = store.put({ "name": name, "blob": blob, "timeStamp": new Date().getTime() });
  }

So, insertion's pretty easy since we take advantage of some native API functions to return us the type of data that's friendly to the indexed database. Pulling data from the database though... well, here we go...

The Jukebox pulls data from the database only once, right at the very beginning of the script loading. We send a request to open the database, and once it has opened, load up all the data.

  1.  var request = (window.indexedDB || window.webkitIndexedDB).open("jukebox", 1);
  2.  request.onerror = error;
  3.  request.onsuccess = function(e) {
  4.    db = e.target.result;
  5.
  6.    var transaction = db.transaction(["tracks"], "readonly");
  7.    var store = transaction.objectStore("tracks");
  8.
  9.    var keyRange = (window.IDBKeyRange || window.webkitIDBKeyRange).lowerBound(0);
  10.    var cursorRequest = store.openCursor(keyRange);
  11.    cursorRequest.onerror = error;
  12.    cursorRequest.onsuccess = function(e) {
  13.      var result = e.target.result;
  14.
  15.      if (!result)  // null once the cursor has run out of matches
  16.        return;
  17.
  18.      var arrayBuffer = new Uint8Array(new ArrayBuffer(result.value.blob.length));
  19.      for(var i = 0; i < result.value.blob.length; i++)
  20.        arrayBuffer[i] = result.value.blob.charCodeAt(i);
  21.
  22.      var blob = new Blob([arrayBuffer]);
  23.      fileBlobs[result.value.name] = window.URL.createObjectURL(blob);
  24.
  25.      var option = document.createElement("option");
  26.      option.innerHTML = result.value.name;
  27.      option.id = result.value.name;
  28.      option.addEventListener('dblclick', handleDoubleClick, false);
  29.      option.addEventListener('click', handleClick, false);
  30.
  31.      playlistElement.appendChild(option);
  32.
  33.      result.continue();
  34.    };
  35.  };

I'm sure all of this code makes sense. Line 1, we send the request to open the database 'jukebox'. Because the indexed database API is asynchronous, we don't know when the open will complete, so the only way to correctly handle errors, success, etc. is to assign callbacks, which we do after the call is made. It has to be done this way: before the open() call, we don't have a request object to attach the callbacks to, so we attach them afterward and wait. (This is safe because the request's events can't fire until the current script has finished running.) We assign an error handler in line 2 ('error' is a function reference), and a success handler in line 3.

When we have successfully opened the database, this is when we want to grab all our track data. All the track data is stored in the 'tracks' store, so we open a transaction on that store in lines 6 and 7. Line 9 lets us specify a key range for our search in the 'tracks' store. Tracks are keyed by timestamp, so we set the lower bound to 0 in order to grab everything. Line 10 opens a cursor on the 'tracks' store that walks over all the data matching the key range.

At this point, we just need to wait for the data to be returned to us, then do the appropriate thing. There are 2 indexed database concepts to be aware of. First, e.target.result in line 13 is a cursor, not the stored data itself: the hash we originally stored is available as its value member (result.value), alongside additional members such as the key. Second, only 1 result is returned at a time in our success callback. For a request with multiple matches, the success callback is called multiple times, each time with a single record that matches the key range, but only if we call continue() on the cursor (line 33) to advance it. This ensures that we'll get the entire data set. When there are no more matches, the callback receives null, which lines 15-16 check for.

Lines 18-22 convert the track data into something that our native <audio> tag can understand. First (line 18), we create an array buffer with a byte-level view (a Uint8Array) that is the same size as the number of bytes in the track data. In lines 19-20, we iterate and assign the elements of the array buffer. Finally, in line 22, we create our Blob object. This is by no means the fastest code in the world. A lot could probably be done to speed it up, such as moving the work into a Web Worker, or taking a different approach to the conversion.
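The conversion in lines 18-20 can be pulled out into a standalone helper, which also makes it testable outside the browser. A sketch, assuming the stored value is a binary string as produced by readAsBinaryString (one character per byte); binaryStringToBytes and bytesToBinaryString are illustrative names.

```javascript
// Binary string (one char per byte, codes 0-255) -> Uint8Array.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff; // keep the low 8 bits of each char
  }
  return bytes;
}

// Inverse: Uint8Array -> binary string, the format the snippet stores.
function bytesToBinaryString(bytes) {
  let str = "";
  for (let i = 0; i < bytes.length; i++) str += String.fromCharCode(bytes[i]);
  return str;
}
```

With the helper, lines 18-20 could become a single call such as var arrayBuffer = binaryStringToBytes(result.value.blob);, leaving line 22 unchanged.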

And that's how we load and store data in the indexed database. It's a little different than the traditional, sequential way of programming, where function calls block until the function returns. Think of it more like an implementation of hardware interrupts and handlers, but in the world of Javascript. The good thing? No x86 triple-fault shutdown. (The history and modern implementation of that... REALLY interesting)

Jukebox

Jukebox (for Google Chrome - not intended to run on any other browsers for now):

Version History:
v0.1 - Demo
v0.1a - Bug fix in looping the playlist
v0.2 - Added support for saving songs using an indexed database backend
v0.25 - Added an equalizer

Stripping .mp4 files...

On a whim, I tried dropping an mp4 file, with video, into the jukebox on Chrome, just to see what would happen. Turns out... Chrome automatically strips off the video stream, and plays the audio just fine!

MP4 files... if you don't know much (or anything) about this file format, MP4 is simply a container format, with hooks that allow audio streams (like AAC) and video streams (like h.264) to attach content, synchronize during playback, and do other media-ish things. The .mp4 extension alone doesn't tell you very much: if the video or audio stored in the file uses an obscure or unsupported codec, you can't play it.

Leaving work this afternoon, I wondered whether the <audio> tag would support playback of just the audio stream, since it plays .m4a files, which are intended to be mp4 files with only the audio stream. Looks like it does!

I'm going to come back to this topic later, and peel back some of the .mp4 file format. Stripping the video stream isn't very difficult, conceptually, and should take very little effort.
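As a first peek at the container, an MP4 file is a sequence of boxes (the "atoms" inherited from QuickTime): each starts with a 32-bit big-endian size covering the whole box, then a 4-character type such as ftyp, moov, or mdat. A size of 1 means a 64-bit size follows, and 0 means the box runs to the end of the file. This sketch only lists the top-level boxes (listTopLevelBoxes is an illustrative name); actually stripping a track would mean editing inside moov, which it doesn't attempt. It uses DataView.getBigUint64, so it needs a reasonably modern Javascript engine.

```javascript
// List top-level boxes in an ArrayBuffer holding an MP4 file.
function listTopLevelBoxes(buffer) {
  const view = new DataView(buffer);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    let size = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    if (size === 1) {
      size = Number(view.getBigUint64(offset + 8)); // 64-bit "largesize"
    } else if (size === 0) {
      size = view.byteLength - offset;              // box extends to end of file
    }
    if (size < 8) break; // malformed size: stop instead of looping forever
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}
```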

Tuesday, March 5, 2013

80 columns 4 j00

Annoying when coding guidelines say every line should be 80 columns, and people check in code that isn't...

HTML5 Specification Links

Quick list of links to various HTML5 specs.

Polyfills and Chrome

In an upcoming post, I'll touch on the IndexedDB, adding some persistence to the Jukebox. However, for now, a small post about bugs.

After doing some testing and getting my feet wet with the IndexedDB, I tried to do my first insertion - a hash containing a string and a Blob, and immediately saw that everything blew up. Why? After a little searching, I found that blobs are not a supported insertion type for the IndexedDB in Chrome. (This is where you try unsuccessfully to convert me to Firefox)

Software bugs exist. (So do hardware bugs... that's a completely different level of complexity) It's hard when you write applications on a software stack and run into bugs, feature limitations, etc., because it forces you to think in somewhat unnatural ways to accomplish what you're intending to accomplish - you're expecting the native API to support something; it doesn't; you have to write some code to work around that. And that piece of code... that's a polyfill.

I've been hit by a few issues in Chrome/Webkit. It's annoying. It's to the point at times that I've written my own polyfill to either manage the problem in a satisfactory way, or work around the problem completely. But, that's life as a software developer.

Friday, March 1, 2013

A minor detour

While working on audio visualization and processing, I thought it might be fun to try to learn HTML5's support for drag-and-drop, and see how well I could tie it into <audio> to make a jukebox of sorts. Call this version 0.1 - demo.

Just drag in any music file (.wav, .mp3, .m4a). Double-click a track to start playing. The player should rotate through the playlist and keep playing songs.

Disclaimer: I write my Javascript for Google Chrome. If this works on a Mozilla or IE browser, great. (If it works on an IE browser, I'll be really surprised... IE is pretty horrible) I'm not investing the time to make everything cross-compatible.
Disclaimer: You're on your own for sound files. I cannot and will not provide any music to use.
Disclaimer: This is a demo. If you're an enterprising person, or you're really lazy, you'll choose to copy and paste my code into your own application. Use of this code comes with no guarantee of support. If you choose to use it, learn from it, do something with it, at least do me the favor of dropping a comment in the box below and let me know.

    UPDATE: The Jukebox demo has been relocated here

The Interesting Stuff
Drag-and-drop support in HTML5 is really nice, and rather painless. Quick notes on how to use it:
1. Add the 'drop' and 'dragover' event handlers to the element that is going to be dropped into. In my demo, I use a <div>. I'm not sure if all elements can be dropped into - my brief research didn't suggest that there were any restrictions on particular tags.
2. The handlers need to stop the browser from handling any dragged content natively. This is done through calls to e.preventDefault() and/or e.stopPropagation() in the dragover and drop handlers. (Calling e.preventDefault() in the dragover handler is what tells the browser that the element accepts drops.)
3. When dropping files, you'll get a file list returned from e.dataTransfer.files. You'll need to handle it like you'd handle any file list in Javascript. Iterate through the array of File objects and handle each file iteratively.
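The three steps above, as a sketch. The handlers are written against a plain event object so the logic can be exercised outside the browser; handleDragOver, handleDrop, the onFile callback, and the dropZone/addToPlaylist names in the comment are illustrative, not part of my demo.

```javascript
// 'dragover': cancel the default action so the element is treated as a drop target.
function handleDragOver(e) {
  e.preventDefault();
  e.stopPropagation();
}

// 'drop': keep the browser from opening the file, then hand each File to onFile.
function handleDrop(e, onFile) {
  e.preventDefault();
  e.stopPropagation();
  const files = e.dataTransfer.files; // a FileList: array-like, not an Array
  for (let i = 0; i < files.length; i++) {
    onFile(files[i]);
  }
}

// In a page, the wiring would look something like:
//   dropZone.addEventListener('dragover', handleDragOver, false);
//   dropZone.addEventListener('drop', function (e) { handleDrop(e, addToPlaylist); }, false);
```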

I really like how painlessly the native File API turns a file into a URL. Want to handle your files in a way that native HTML tags and the browser will understand? A File is already a Blob, so you can pass it straight to window.URL.createObjectURL() to generate a blob: URL. You can treat that URL like any other URL for the lifetime of the page (or until you release it with window.URL.revokeObjectURL()). Here's a snippet showing how I generate and store my blob URLs in a hash keyed by file name:
  fileBlobs[f.name] = window.URL.createObjectURL(f);

Even though the file is local and wrapped in a Blob, there is still some latency between when the blob URL is assigned to the audio tag's src attribute and when the metadata from the file is parsed, so things like the track duration aren't available right away. If you try to read the duration attribute right after assigning the src attribute, you'll likely get NaN. The workaround: add an event listener for the 'loadedmetadata' event to the audio tag. Once the metadata has loaded, you can read the duration attribute and get a correct value (in seconds). Here's how I calculate and display the duration:
  audio.addEventListener('loadedmetadata', function(e) {
    var durationDiv = document.getElementById("duration");
    var minutes = Math.floor(audio.duration / 60);
    var seconds = Math.floor(audio.duration) % 60;
    if (seconds < 10)
      durationDiv.innerHTML = "<b>Length:</b> " + minutes + ":0" + seconds;
    else
      durationDiv.innerHTML = "<b>Length:</b> " + minutes + ":" + seconds;
  }, false);

I'll revisit this demo in the future, adding some visualizations rendered from AudioContext post-processing, a track randomizer, custom play controls, support for track seeking, maybe even real-time, controllable filtering.

Thursday, February 28, 2013

HTML5 Audio

Been playing around with HTML5 audio recently. If you're new to the <audio> tag, it was introduced with HTML5 in order to allow native audio support. Having built-in widgets to control audio is great, but the Web Audio API's AudioContext is where the real power is. I'm still playing with AudioContext, and should have something more to say about it in the near future. But, for this post, we'll focus on generating an <audio> tag, and some of the important things that you can do from the scripting side to control its functionality and react to events.

A quick table and descriptions of useful attributes:
Attribute | Description
src | The URL to the audio file
duration | The length, in seconds, of the track. You will get trailing decimals for the exact length of the track (it's not rounded off)
paused | True if playback is paused, false otherwise
autoplay | When set to true, the track will automatically start playing when enough of the track has loaded
controls | When set to true, a control widget is rendered on the HTML page


Useful methods:
Method | Description
play() | Starts audio playback
pause() | Pauses audio playback


Note that there is no stop() function. However, I think this is a reasonable way to mimic stop:
  function stop(audioElement) {
    audioElement.pause();
    audioElement.currentTime = 0;
  }

Useful events:
Event | Description
play | Fired when the audio track begins playing
pause | Fired when the audio track is paused
ended | Fired when the audio track has ended


Here's a code snippet instantiating an audio element using Javascript:

  // Create an <audio> element dynamically.
  var isPlaying = false;  // playback state, updated by the event handlers below
  var audio = new Audio();
  audio.src = 'file.m4a';
  audio.controls = true;
  audio.addEventListener('play', function() { isPlaying = true; }, false);
  audio.addEventListener('pause', function() { isPlaying = false; }, false);
  audio.addEventListener('ended', function() { isPlaying = false; }, false);

  document.body.appendChild(audio);

In this example, I create an audio tag with a source file pointing locally to "file.m4a". Controls are enabled and visible when the page is rendered. I also add 3 event handlers that simply toggle a state variable that shows whether the track is playing or not.

This scratches the surface of what you can do with HTML5 <audio>. Later, I'll get into tying <audio> with the AudioContext object for sound mixing (filters, panning, gain, etc.) and visualization (frequency and amplitude graphing). Think of this as the tiny tip of a very, very large iceberg.

Monday, February 25, 2013

h.264... @#$%!#$

I've been playing around with video encoding lately, focusing especially on the algorithmic design that goes into something so complex. If you're not familiar with video encoding, consider the following scenario: a 5-second video clip at 30 frames per second, where each frame is 640x480 and every pixel takes 24 bits (3 bytes). How big is 5 seconds of uncompressed video? You're easily looking at 138,240,000 bytes (5 * 30 * 640 * 480 * 3 bytes). Despite the size of hard drives these days, that doesn't scale.
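The arithmetic is simple enough to sketch as a function. rawVideoBytes() is a hypothetical helper, and it assumes every frame is stored uncompressed with a fixed number of bytes per pixel (3 for 24-bit color).

```javascript
// Hypothetical helper: size of uncompressed video in bytes, assuming a fixed
// number of bytes per pixel (3 bytes = 24-bit color) and no compression at all.
function rawVideoBytes(seconds, fps, width, height, bytesPerPixel) {
  var frames = seconds * fps;                          // total frame count
  var bytesPerFrame = width * height * bytesPerPixel;  // one raw frame
  return frames * bytesPerFrame;
}

// The scenario above: 5 seconds at 30 fps, 640x480, 24-bit pixels.
console.log(rawVideoBytes(5, 30, 640, 480, 3)); // 138240000
```

At the same settings, a full minute of video comes to 1,658,880,000 bytes, roughly 1.66 GB.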

There are a number of video container formats out there: .avi, .wmv, .mpg, .mov (QuickTime), .mp4. h.264 is the compression format for the video stream itself, and an h.264 stream can be carried in a .mov, .mp4, .m4v, or other MPEG-4 file. For better or worse, this is the encoding I'm currently focusing on.

If you want to write your own h.264 encoder, you need to read the spec. Be warned that it's written in spec language, a really obscure dialect of English where practically nothing makes sense. Still, if you're writing an encoder, here is the spec, free to download.

There are plenty of web pages you can scour for information about h.264. A few search keywords that might be useful: CAVLC, CABAC, intra-prediction, inter-prediction, quantization, DCT, I-frames, B-frames, P-frames.

One interesting website with a very basic look at h.264 is a blog entry at cardinalpeak.com that illustrates a basic encoder for 128x96 video using only I-frames, with no compression or prediction. It's a good start toward understanding h.264, but while working through it, I found it lacking the following explanations:
-h.264 is a bitstream format. If you do a lot of work in computer architecture, you end up thinking in terms of endianness, bytes, words, and double words. In a bitstream format, all tables and field sequences specified for h.264 must be understood and translated as a sequence of bits, with only nominal padding to fill out the last byte. When encoding headers, it may be useful to think of each header as a stream of 0s and 1s written from left to right, with each byte filled starting from its most significant bit. If 0x80 is the next byte to be read, the next bit read from the bitstream is 1, not 0.

-Exponential Golomb codes are used to encode variable-length fields, and you need to understand and encode them correctly for your stream to decode correctly. There really isn't a workaround; it just has to be done. I haven't compared a Golomb lookup table against an algorithmic transformation, but there is value in understanding how an exponential Golomb encoder works algorithmically. The Wikipedia entry is a pretty good start. If you're looking for code, something like this should work:

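function eGolombCalc(num) {
  // Unsigned Exp-Golomb: the code is (num + 1) in binary, preceded by
  // one fewer leading zero than that binary value has digits.
  var returnValue = num + 1;  // the non-zero portion of the code
  var finalBits = -1;         // total code length; -1 because the top digit has no matching zero
  var remaining = returnValue;
  while (remaining) {
    finalBits += 2;           // one value bit plus one leading zero per binary digit
    remaining = remaining >> 1;
  }
  // JavaScript can't return two values directly, so return an object.
  return { finalBits: finalBits, returnValue: returnValue };
}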

You'll always end up with an odd number of total bits in an exponential Golomb code. Since the first (finalBits - 1) / 2 bits are leading zeros, it helps to keep track of the total number of bits in the resulting code. The non-zero portion of the code is simply the original value + 1.
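To tie the two points above together, here's a minimal sketch of a bit writer that packs bits most significant bit first, plus an unsigned Exp-Golomb (ue(v)) writer built on top of it. BitWriter, writeBits(), writeUE(), and toBytes() are hypothetical names of my own, not taken from the spec or any library. toBytes() simply pads the last partial byte with zeros; real h.264 syntax structures end with rbsp_trailing_bits (a 1 bit followed by zeros up to the byte boundary), which this sketch leaves to the caller.

```javascript
// Hypothetical MSB-first bit writer: the first bit written becomes the
// most significant bit of the first byte.
function BitWriter() {
  this.bits = []; // one entry (0 or 1) per bit, in stream order
}

// Append the lowest `count` bits of `value`, most significant bit first.
BitWriter.prototype.writeBits = function (value, count) {
  for (var i = count - 1; i >= 0; i--) {
    this.bits.push((value >> i) & 1);
  }
};

// Unsigned Exp-Golomb: (num + 1) in binary, preceded by one fewer
// leading zero than it has binary digits.
BitWriter.prototype.writeUE = function (num) {
  var codeValue = num + 1;
  var length = 0; // number of binary digits in codeValue
  for (var v = codeValue; v; v >>= 1) {
    length++;
  }
  this.writeBits(0, length - 1);     // leading zeros
  this.writeBits(codeValue, length); // the value bits themselves
};

// Pack the bits into bytes, zero-padding the final partial byte.
BitWriter.prototype.toBytes = function () {
  var bytes = [];
  for (var i = 0; i < this.bits.length; i += 8) {
    var byte = 0;
    for (var j = 0; j < 8; j++) {
      var bit = i + j < this.bits.length ? this.bits[i + j] : 0;
      byte = (byte << 1) | bit;
    }
    bytes.push(byte);
  }
  return bytes;
};
```

Writing a single 1 bit and calling toBytes() yields [0x80], matching the 0x80 example in the first point. Writing ue(0) through ue(3) produces the bits 1, 010, 011, 00100, which pack into the two bytes 0xA6 and 0x40.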

So, there's a start toward a full h.264 encoder. To build on the links and information here, my best advice is to study the following, in order: sequence parameter set (SPS) header, picture parameter set (PPS) header, slice header, macroblock header, intra-prediction, CAVLC, inter-prediction, CABAC, and then quantization and other topics. I may touch on these topics in the future.

Welcome!

Welcome to my new blog!  I'll be posting on lots of random topics related to programming, application development, computer architecture, gaming, etc.  A little about me - I'm a computer engineer working for a leading company in the semiconductor industry.  My areas of interest and expertise include: computer architecture, embedded software (firmware, microcode), software/hardware validation methodology, software engineering methodology, compilers, media encoding/decoding (audio, video, pictures), genetic and neural network algorithms, C, C++, Perl, HTML/HTML5, Javascript, SQL, Java, and the lives of my wife and 2 boys.  

Want to reach me through the Internet?  wobblycucumber at gmail dot com