Monday, December 23, 2013

CAVLC Encoder Demo

CAVLC Encoder

This is a demo for h.264 CAVLC encoding. This includes all needed VLC lookup tables, and should work for almost all input. There are a few corner cases in level VLC encoding that haven't been tested.

This demo accompanies a tutorial writeup on CAVLC encoding found here

nC:

4x4 sample data


Encoded stream:


Computation variables:
Reordered sequence:
Reordered sequence without zeros:
TotalCoeff (0 <= TotalCoeff <= 16):
T1 (0 <= T1 <= 3):
T1 Encoding:
coeff_token:
Level VLCs:
TotalZeros:
TotalZeros Encoding:
Run Before:

CAVLC Encoding Tutorial

Context Adaptive Variable Length Coding, or CAVLC, is an encoding process used to compactly represent blocks of data in a 4x4 array. Typically, this would be a 4x4 array of luma values, or chroma values. In a previous post, I mentioned encoding I-frames in I_PCM mode, which expresses each pixel in YUV mode, losing no accuracy in the encoded data in the process. The main drawback of I_PCM mode is that it encodes the raw bytes. But, what if you could rely on surrounding data to infer what should happen in the current macroblock, and if there were any differences, just encode the difference? If that was the case, you might need to encode 4-5 bytes instead of 16 (4 x 4 = 16). In a future article, we'll talk about prediction in I-frames (also referred to as intra-prediction). For now, let's focus on the encoding, and how h.264 expresses sparse data succinctly in order to save bytes.

There are a few, in my opinion, rather brief tutorials out on the Internet on how CAVLC works. I hope this tutorial is a bit more complete and in-depth, as my experience reading through those tutorials ended up with me going back to the h.264 spec, reading the decoding process, and reverse engineering it. As I go through the encoding steps, I'll refer back to variable names in the spec (section 9.2), so if you feel the urge to read the spec, this will correlate closely with what you're reading.

Step 1: nC
nC is an index which points to which table to use for a subsequent lookup that specifies the number of non-zero coefficients and trailing 1s. There are a number of rules that dictate the value of nC; for the extent of this tutorial, we'll ignore these rules and always assume the value of nC to be 0. (We'll revisit the rules to determine nC later)

Step 2: Reordering the array
Assume the following input 4x4 array (which is coincidentally used in a lot of examples out there):
0     3     -1     0
0     -1     1     0
1     0     0     0
0     0     0     0

We want to reorder this array in a specific way, starting from the top left corner, and working our way down diagonally to the bottom right corner. The index of coefficients in the reordered array is in parenthesis:
0 (0)     3 (1)     -1 (5)     0 (6)
0 (2)     -1 (4)     1 (7)     0 (12)
1 (3)     0 (8)     0 (11)     0 (13)
0 (9)     0 (10)     0 (14)      0 (15)

In this example, the resulting reordered array becomes: [ 0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ]

Step 3: Counting Non-zero Coefficients and Trailing 1s
Iterating through the reordered array, count the total number of non-zero values (totalCoeff). Also, iterating backwards through the array, count the number of trailing 1s (t1), either a positive or negative 1, up to 3 instances.

In our example, non-zero values occur at indices 1, 3, 4, 5, and 7. totalCoeff is assigned a value of 5. Working backwards through the list, we see that trailing 1s starts at index 7 and a string of 1s occurs at indices 3, 4, 5, and 7. At most we can count 3, so t1 is assigned a value of 3.

Step 4: coeff_token
Taking nC, totalCoeff, and t1, we can determine a VLC that compactly represents the totalCoeff and t1. The specific table of values is found in Table 9-5 in the h.264 spec. It's too long to print every enry, but this is what it looks like for our relevant combination:
t1 totalCoeff 0 <= nC < 2 2 <= nC < 4 4 <= nC < 8 8 <= nC nC == -1 nC == -2
1 5 0000 0001 10 0000 110 0100 0 0100 01 - 0000 0001 10
2 5 0000 0010 1 0000 101 0100 1 0100 10 - 0000 0010 0
3 5 0000 100 0011 0 1010 0100 11 - 0001 001
0 6 0000 0000 0111 1 0000 0011 1 0001 001 0101 00 - 0000 0000 111
1 6 0000 0000 110 0000 0110 0011 10 0101 01 - 0000 0000 110


For our combination (nC = 0, totalCoeff = 5, t1 = 3), coeff_token is 0000100.

Step 5: Encoding Trailing 1s
Encoding trailing ones is not too difficult. If t1 is nonzero, begin working backwards through each trailing 1. For each trailing 1 ([-1, -1, 1] in our case), assign 0 to each +1, and 1 to each -1.

For our example, the resulting bit string is 011

Step 6: Encoding remaining coefficients
This step is perhaps the most confusing. Details on this step are forthcoming.
Step 7: Count Zeros
This step is rather simple: ignore all zeros after the last non-zero coefficient. Count the number of remaining zeros (totalZeros). Another VLC table (Table 9-8 in the h.264 spec) is used to determine this bitstream, and is a function of totalZeros and totalCoeff.

For our example (totalZeros = 3, totalCoeff = 5), this corresponds to the bit string 111

Step 8: Encode Run Before
The final step is to encode the number of zeros that precede each non-zero coefficient, including the trailing 1s. Working backwards, for every non-zero coefficient, count the number of zeros that precede the coefficient. Another VLC table (Table 9-10 in the h.264 spec) is used to provide the correct encoding, and is a function of the number of zeros left to be counted, and the number of zeros preceding the coefficient. If the number of zeros left is 0, stop counting run before. Also, if there are zeros that precede the very first non-zero coefficient, an entry for run before is not added since the remaining zeros left is inferred to precede the very first non-zero coefficient.

For our example, iterate through each non-zero coefficient, working backwards.
1. Index 7: Non-zero coefficient: 1, number of preceding zeros: 1, zeros left: 3, resulting bit string: 10
2. Index 5: Non-zero coefficient: -1, number of preceding zeros: 0, zeros left: 2, resulting bit string: 1
3. Index 4: Non-zero coefficient: -1, number of preceding zeros: 0, zeros left: 2, resulting bit string: 1
4. Index 3: Non-zero coefficient: 1, number of precending zeros: 1, zeros left: 2, resulting bit string: 01
5. Index 1: skip, since this is the very last coefficient

I've coded a Javascript demo that takes a nC value and a 4x4 sample data array, and emits the corresponding CAVLC bit stream. You can find the demo here.

Friday, June 7, 2013

Could this be bad?

Looks like a malformed .mp4 file will crash Windows Media Player... I wonder that could potentially be a root exploit...

Wednesday, May 29, 2013

I can haz h.264 encoder?

Nope. Not mine at least... (or not yet anyways).

Off and on the last few months, I've been trying to put together a basic h.264 encoder. You're probably wondering, "why?", and it's either one of 2 "why" questions... "why do that when there's a perfectly good open source library for encoding (libx264) out there?" Or, "why put yourself through the pain? Masochist?" The short and simple answers, 1. I can't for a variety of design reasons which cannot be changed, and 2. because this is the cornerstone of an entire project.

If you're in a similar boat as me - wanting or needing to write your own encoder - where do you start? What do you enter for your google search words? I saw an interview with Anthony Bourdain (popular chef, author, and host of food/travel shows on a variety of cable channels) where someone asked whether they should get into the restaurant business because they had a passion for food. His reply (in so many words) - "You should try to work in a restaurant for a year, for free if you have to, in order to understand what you're getting yourself into. There are those who enjoy the heat, the really hard work, the long hours, the low profit margin... and there are normal people".

I mention this because my experience in trying to gather a starting point for writing an encoder is similar in experience. "h.264 encoder tutorial source code" are all words you'll probably come up with, and the results may leave you scratching your head. You might read some forum posts, find that people are somewhat rude, and tell you, "Go read the spec" as not only their default answer, but their only answer. If after all this, you *still* feel the urge to write an encoder, well, you might be one of the few that Chef Bourdain is talking about.

Some good resources to start off on that are available on the web:
  • The spec. Because this is the document everyone will tell you to read. (And it's annoying that universally used specs have paywalls...)
  • A really simple encoder. I think is a great article to help wade your toes into the waters of h.264 encoding, because those waters are going to get really murky. You want to understand the very basics of how the h.264 encoding process works, because every feature supported in h.264 is, in some way, derived off of this very basic encoding flow.

Some notes on the above links:
- I understand the reason why people will always direct you to read the spec if you have a question, because, the answer will be in there (it will... it might just be hiding sometimes). But... the spec is hard to read. It really is. Read the spec, but understand that the depth of knowledge that you need to understand from the spec may/may not require you to read the entire spec.
- The basic encoder from Ben Mesander is a great starting point. It opens up the door to many questions that you may ask, but it's a great starting point. There is one bug that I've found related to the slice header. Ben uses a static slice header (0x00, 0x00, 0x00, 0x01, 0x05, 0x88, 0x84, 0x21, 0xa0) for all his picture slices. In the comments, someone points out that "0x05" should actually be "0x65". 0x65 is correct, according to the spec. The other issue related to the slice header is that slice headers cannot be static. The spec dictates that consecutive slice headers must have at least one difference between a handful of fields (frame_num, and idr_pic_id for instance). This means that a static header can't be used, but it is ok to toggle between 2 slice headers that change a couple of those bits. My suggestion:

    const uint8_t slice_header1[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x84, 0x21, 0xa0 };
    const uint8_t slice_header2[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x94, 0x21, 0xa0 };

In the code that writes the slice header, consider a piece of code like this:
    int i, j, use_slice_header_1 = 1;
    
    ...
    
    if (use_slice_header_1)
      fwrite(slice_header1, 1, sizeof(slice_header1), stdout);
    else
      fwrite(slice_header2, 1, sizeof(slice_header2), stdout);
    
    use_slice_header_1 = !use_slice_header_1;

Note that the slice headers are slightly different. Full disclosure: I haven't completely tested this suggestion, but my basic test does almost the same thing. The bits might be off, in which case, let me know and I'll fix and test it.

I like tools. Moreover, I like tools that are helpful for developers, because I'm a developer at heart. As a developer, I like tools that are simple, accessible, full of information that can be used for all types of debugging, and organized in a way that is comprehensible by everyone that uses it. Since this is a website (and I'm focusing a lot on Javascript, HTML5, etc.), tools should be web-based, as much as possible. Here's a list of tools that I've written that are h.264 related and you may find useful:
  • Exponential Golomb Code Calculator - works for calculating unsigned exponential golomb codes for now. Signed exponential golomb codes will be added soon (blog post about EGCs incoming...)
  • (Added 12/23/2013) h.264 CAVLC Encoder - demo of CAVLC encoding, based on input nC and 4x4 sample data
  • h.264 decoder - incoming

Tutorials I've written related to h.264: Many more posts about h.264 on the way. I'll try to add links here. In the meanwhile, good luck! (I know I needed lots of it)

Tuesday, May 21, 2013

Unsigned Exponential Golomb Codes

Exponential Golomb codes are sequences of binary strings that allow for variable length encoding of numbers. The premise behind this is that for network transmission or data compaction, allocated a fixed number of bytes to store a value is expensive, so the trade-off is data compaction at the cost of slightly higher processing time. Exponential Golomb codes are used a lot in h.264 streams, because using a bit is such a premium, especially when these streams need to be transmitted in real time over a network.

Here's a Javascript calculator that allows you to compute the exponential Golomb code, and vice versa, for all non-negative numbers. (I'll add signed exponential Golomb codes later)

Unsigned Number:



Unsigned Exponential Golomb Code:

Sunday, May 5, 2013

It's been a while...

Still here. Life outside of my programming projects (taxes, children, work, family vacation) have been a huge interruption. I wrote a basic h.264 encoder a while back, and have had trouble getting the output stream to play correctly in your day to day media players. Turns out that after using some tricks in ffmpeg, I'm not encoding frames incorrectly. It's almost definitely that my mp4 container is encoding data incorrectly and throwing everything off. So, I'm now writing an mp4 container encoder. All of these atom types are spectacularly boring, but you have to slog your way through it all to make a credible and functional encoder. I hope to have something working in the coming days and finally I'll be able to play some of my home-made video. Then, I'll have a LOT more to say about h.264, mp4/Quicktime.