Thursday, February 28, 2013

HTML5 Audio

Been playing around with HTML5 audio recently. If you're new to the <audio> tag, it was introduced with HTML5 to provide native audio support in the browser. Having built-in widgets to control audio is great, but the Web Audio API, built around the AudioContext object, is where the real power is. I'm still experimenting with AudioContext and should have more to say about it in the near future. For this post, though, we'll focus on generating an <audio> tag, and on some of the important things you can do from the scripting side to control playback and react to events.

A quick table of useful attributes:

Attribute | Description
src | The URL of the audio file
duration | The length of the track, in seconds (read-only). The value isn't rounded off, so expect trailing decimals; it is NaN until the track's metadata has loaded
paused | true if playback is paused, false otherwise (read-only)
autoplay | When set to true, the track automatically starts playing once enough of it has loaded
controls | When set to true, the browser renders a control widget for the track
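Because duration comes back as a raw number of seconds with trailing decimals (and is NaN until the metadata has loaded), you'll usually want to format it before showing it on a page. Here's a minimal sketch. The formatTime name and the m:ss output format are my own choices and not part of the <audio> API. The same helper works for currentTime.

```javascript
// Format a time in seconds (e.g. audio.duration or audio.currentTime)
// as "m:ss". formatTime is a hypothetical helper, not part of the <audio> API.
function formatTime(seconds) {
  // duration is NaN before metadata loads (and Infinity for unbounded
  // streams), so guard against non-finite values.
  if (!isFinite(seconds)) {
    return '--:--';
  }
  var whole = Math.floor(seconds);   // drop the trailing decimals
  var minutes = Math.floor(whole / 60);
  var secs = whole % 60;
  return minutes + ':' + (secs < 10 ? '0' : '') + secs;
}

// In a browser you would call formatTime(audio.duration).
console.log(formatTime(125.674));  // "2:05"
console.log(formatTime(NaN));      // "--:--"
```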


Here's the markup for an audio tag that isn't hooked up to anything but has the controls enabled:

  <audio controls></audio>

Useful methods:

Method | Description
play() | Starts audio playback
pause() | Pauses audio playback
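Since the paused attribute tells you the current state, play() and pause() are all you need for a play/pause toggle, say for a custom button. A minimal sketch: togglePlayback is a hypothetical helper name, and fakeAudio is only a stand-in object with the same members so the snippet can run outside a browser. In current browsers play() also returns a Promise that rejects if playback is blocked; this sketch ignores it.

```javascript
// Toggle playback on an <audio> element (or any object with the same
// paused/play()/pause() members). togglePlayback is a hypothetical helper.
// Browser usage: button.addEventListener('click', function() { togglePlayback(audio); });
function togglePlayback(audioElement) {
  if (audioElement.paused) {
    audioElement.play();   // may return a Promise in newer browsers (ignored here)
  } else {
    audioElement.pause();
  }
}

// Stand-in object for trying this outside a browser; it only flips the
// paused flag, it doesn't play any audio.
var fakeAudio = {
  paused: true,
  play: function() { this.paused = false; },
  pause: function() { this.paused = true; }
};
togglePlayback(fakeAudio);
console.log(fakeAudio.paused);  // false
```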


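Note that there is no stop() method. However, I think this is a reasonable way to mimic one: pause playback, then rewind the playback position (currentTime, in seconds) to the start.
  function stop(audioElement) {
    audioElement.pause();          // halt playback
    audioElement.currentTime = 0;  // rewind to the beginning of the track
  }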

Useful events:

Event | Description
play | Fired when playback begins (the track leaves the paused state)
pause | Fired when playback is paused
ended | Fired when playback stops because the end of the track was reached


Here's a code snippet that creates an audio element using JavaScript:

  // Create an <audio> element dynamically.
  var isPlaying = false;  // tracks whether the track is currently playing
  var audio = new Audio();
  audio.src = 'file.m4a';
  audio.controls = true;
  audio.addEventListener('play', function() { isPlaying = true; }, false);
  audio.addEventListener('pause', function() { isPlaying = false; }, false);
  audio.addEventListener('ended', function() { isPlaying = false; }, false);

  document.body.appendChild(audio);

In this example, I create an audio element whose source is the relative URL "file.m4a". Controls are enabled, so the widget is visible once the element is appended to the page. I also add three event listeners that simply update a state variable recording whether the track is playing.
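The three listeners can also be wrapped in a small helper that returns a state object instead of writing to a global variable. This is a sketch under my own assumptions: trackPlayback and FakeAudio are hypothetical names, and FakeAudio is only a stand-in that dispatches the same event names (via EventTarget, available in browsers and in Node 15+) so the logic can be exercised without a browser or an audio file. Keep in mind that the element's own paused attribute already holds most of this state, so for a quick check you can simply read audio.paused.

```javascript
// Hypothetical helper: keep a state object in sync with an element's
// play/pause/ended events. Works with a real <audio> element or any
// EventTarget that dispatches the same event names.
function trackPlayback(audioElement) {
  var state = { isPlaying: false };
  audioElement.addEventListener('play', function() { state.isPlaying = true; });
  audioElement.addEventListener('pause', function() { state.isPlaying = false; });
  audioElement.addEventListener('ended', function() { state.isPlaying = false; });
  return state;
}

// Stand-in element for testing outside a browser: it only mimics the
// events, not actual audio playback.
class FakeAudio extends EventTarget {
  play() { this.dispatchEvent(new Event('play')); }
  pause() { this.dispatchEvent(new Event('pause')); }
  finish() { this.dispatchEvent(new Event('ended')); }
}

var fake = new FakeAudio();
var state = trackPlayback(fake);
fake.play();
console.log(state.isPlaying);  // true
fake.finish();
console.log(state.isPlaying);  // false
```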

This only scratches the surface of what you can do with HTML5 <audio>. Later, I'll get into tying <audio> to the AudioContext object for sound mixing (filters, panning, gain, etc.) and visualization (frequency and amplitude graphing). Think of this as the tiny tip of a very, very large iceberg.

Monday, February 25, 2013

h.264... @#$%!#$

I've been playing around with video encoding lately, focusing especially on the algorithmic design that goes into something so complex. If you're not familiar with video encoding, consider the following scenario: a 5-second video clip at 30 frames per second, where each frame is 640x480 pixels and every pixel is represented with 24 bits (3 bytes). 5 seconds of uncompressed video? You're easily looking at 138,240,000 bytes (5 * 30 * 640 * 480 * 3 bytes). Despite the size of hard drives these days, that's not practical.
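Here's the same arithmetic as a tiny function, in case you want to plug in other resolutions or frame rates. rawVideoBytes is just a hypothetical helper for illustration; it assumes every frame is stored in full, which is exactly what an encoder exists to avoid. At these settings it works out to 27,648,000 bytes for every second of video, or about 1.66 GB per minute.

```javascript
// Size in bytes of uncompressed video, with every frame stored in full.
// rawVideoBytes is a hypothetical helper for the arithmetic above.
function rawVideoBytes(seconds, framesPerSecond, width, height, bytesPerPixel) {
  var frames = seconds * framesPerSecond;              // 5 * 30 = 150 frames
  var bytesPerFrame = width * height * bytesPerPixel;  // 640 * 480 * 3 = 921,600 bytes
  return frames * bytesPerFrame;
}

console.log(rawVideoBytes(5, 30, 640, 480, 3));  // 138240000
console.log(rawVideoBytes(1, 30, 640, 480, 3));  // 27648000 (bytes per second)
```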

There are a number of video file (container) formats out there: .avi, .wmv, .mpg, .mov (QuickTime), .mp4. h.264 is the encoding format for the video stream itself, and it can be carried in containers such as .mov, .mp4, and .m4v. For better or worse, this is the encoding I'm currently focusing on.

If you want to write your own h.264 encoder, you need to read the spec. However, the spec is written in spec language, a really obscure dialect of English in which practically nothing makes sense. Still, if you're writing an encoder, you can't avoid it: the spec is ITU-T Recommendation H.264, which the ITU makes available as a free download.

There are plenty of websites you can scour for information about h.264. A few keywords that might be useful: CAVLC, CABAC, intra-prediction, inter-prediction, quantization, DCT, I-frames, B-frames, P-frames.

One interesting website with a very basic look at h.264 is a blog entry at cardinalpeak.com that illustrates a basic encoder for 128x96 video using only I-frames, with no compression or prediction. It's a good start towards understanding h.264, but while working through parts of it, I found that it lacked explanations of the following:
- h.264 is a bitstream format. If you do a lot of work in computer architecture, you end up thinking in terms of endianness, bytes, words, and double words. In a bitstream format, every table and field sequence specified for h.264 has to be understood and translated as a sequence of bits, with padding added only where the stream needs to be aligned to a byte boundary. When encoding headers, it may help to think of each header as a stream of 0's and 1's read from left to right, most significant bit first: if the next byte to be read is 0x80 (binary 1000 0000), the next bit read from the bitstream is 1, not 0.

- Exponential Golomb codes are the variable-length codes used for many h.264 header fields, and they have to be encoded correctly for your stream to decode correctly. There really isn't a workaround; it just has to be done. I haven't compared a Golomb lookup table against an algorithmic calculation, but there is value in understanding how an exponential Golomb calculator works algorithmically. The Wikipedia entry is a pretty good start. If you're looking for code, something like this should work (in JavaScript):

function eGolombCalc(num) {
  // The non-zero part of the codeword is simply num + 1.
  var returnValue = num + 1;
  // Each bit of (num + 1) contributes one value bit and one leading zero,
  // minus one zero overall, so the total length is always odd.
  var finalBits = -1;
  var n = num + 1;
  while (n) {
    finalBits += 2;
    n = n >>> 1;  // unsigned shift, valid for num + 1 up to 2^32 - 1
  }

  // JavaScript can't return two values, so return them in an object.
  return { totalBits: finalBits, value: returnValue };
}

You'll always end up with an odd number of total bits in an exponential Golomb code. The first (total bits - 1) / 2 bits are leading zeroes, so it helps to keep track of the total number of bits in the resulting code; the remaining bits are simply the binary representation of the original value + 1. For example, 3 is encoded as 00100: two leading zeroes followed by 100 (binary 4).
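To tie the two points above together, here's a minimal sketch of an MSB-first bit writer with an unsigned exponential Golomb (ue(v)) encoder built on top of it. The BitWriter and writeUE names are my own, it only handles unsigned codes, and it pads the last byte with plain zeroes instead of emitting proper h.264 trailing bits, so treat it as an illustration rather than a piece of a real encoder.

```javascript
// Minimal MSB-first bit writer plus an unsigned exp-Golomb (ue(v)) encoder.
// BitWriter and writeUE are hypothetical names for illustration.
class BitWriter {
  constructor() {
    this.bytes = [];    // completed bytes
    this.current = 0;   // byte currently being filled
    this.bitCount = 0;  // bits already placed in `current`
  }

  // Append one bit. The first bit written ends up in the byte's most
  // significant position, which is why 0x80 reads back as a 1 first.
  writeBit(bit) {
    this.current = (this.current << 1) | (bit & 1);
    this.bitCount++;
    if (this.bitCount === 8) {
      this.bytes.push(this.current);
      this.current = 0;
      this.bitCount = 0;
    }
  }

  // Append the low `n` bits of `value`, most significant bit first.
  writeBits(value, n) {
    for (var i = n - 1; i >= 0; i--) {
      this.writeBit((value >>> i) & 1);
    }
  }

  // Return the bytes so far, padding a partial last byte with zero bits.
  // (Real h.264 NAL units end with rbsp_trailing_bits, a 1 followed by 0s;
  // that is left out here.)
  toBytes() {
    var out = this.bytes.slice();
    if (this.bitCount > 0) {
      out.push(this.current << (8 - this.bitCount));
    }
    return out;
  }
}

// ue(v): (num + 1) in binary, preceded by one fewer leading zero than its
// bit length. Valid here for 0 <= num <= 2^32 - 2.
function writeUE(writer, num) {
  var value = num + 1;
  var bitLength = 0;
  for (var v = value; v > 0; v >>>= 1) {
    bitLength++;
  }
  writer.writeBits(0, bitLength - 1);  // leading zeroes
  writer.writeBits(value, bitLength);  // num + 1 itself
}

// 0, 1, 2, 3 encode as 1 010 011 00100 -> 10100110 0100(0000)
var w = new BitWriter();
[0, 1, 2, 3].forEach(function(n) { writeUE(w, n); });
console.log(w.toBytes());  // [ 166, 64 ]  (0xA6, 0x40)
```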

So, there's a start towards a full h.264 encoder. Building on the website and other information here, my best advice is to look at the following, in order: sequence parameter set (SPS) header, picture parameter set (PPS) header, slice header, macroblock header, intra-prediction, CAVLC, inter-prediction, CABAC, quantization, and other topics. I may touch upon these topics in the future.

Welcome!

Welcome to my new blog! I'll be posting on lots of random topics related to programming, application development, computer architecture, gaming, etc. A little about me: I'm a computer engineer working for a leading company in the semiconductor industry. My areas of interest and expertise include computer architecture, embedded software (firmware, microcode), software/hardware validation methodology, software engineering methodology, compilers, media encoding/decoding (audio, video, pictures), genetic and neural network algorithms, C, C++, Perl, HTML/HTML5, JavaScript, SQL, Java, and the lives of my wife and 2 boys.

Want to reach me through the Internet?  wobblycucumber at gmail dot com