Monday, December 23, 2013

CAVLC Encoding Tutorial

Context Adaptive Variable Length Coding, or CAVLC, is an encoding process used to compactly represent blocks of data in a 4x4 array. Typically, this would be a 4x4 array of luma values, or chroma values. In a previous post, I mentioned encoding I-frames in I_PCM mode, which expresses each pixel in YUV mode, losing no accuracy in the encoded data in the process. The main drawback of I_PCM mode is that it encodes the raw bytes. But, what if you could rely on surrounding data to infer what should happen in the current macroblock, and if there were any differences, just encode the difference? If that was the case, you might need to encode 4-5 bytes instead of 16 (4 x 4 = 16). In a future article, we'll talk about prediction in I-frames (also referred to as intra-prediction). For now, let's focus on the encoding, and how h.264 expresses sparse data succinctly in order to save bytes.

There are a few, in my opinion, rather brief tutorials out on the Internet on how CAVLC works. I hope this tutorial is a bit more complete and in-depth, as my experience reading through those tutorials ended up with me going back to the h.264 spec, reading the decoding process, and reverse engineering it. As I go through the encoding steps, I'll refer back to variable names in the spec (section 9.2), so if you feel the urge to read the spec, this will correlate closely with what you're reading.

Step 1: nC
nC is an index which points to which table to use for a subsequent lookup that specifies the number of non-zero coefficients and trailing 1s. There are a number of rules that dictate the value of nC; for the extent of this tutorial, we'll ignore these rules and always assume the value of nC to be 0. (We'll revisit the rules to determine nC later)

Step 2: Reordering the array
Assume the following input 4x4 array (which is coincidentally used in a lot of examples out there):
0     3     -1     0
0     -1     1     0
1     0     0     0
0     0     0     0

We want to reorder this array in a specific way, starting from the top left corner, and working our way down diagonally to the bottom right corner. The index of coefficients in the reordered array is in parenthesis:
0 (0)     3 (1)     -1 (5)     0 (6)
0 (2)     -1 (4)     1 (7)     0 (12)
1 (3)     0 (8)     0 (11)     0 (13)
0 (9)     0 (10)     0 (14)      0 (15)

In this example, the resulting reordered array becomes: [ 0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ]

Step 3: Counting Non-zero Coefficients and Trailing 1s
Iterating through the reordered array, count the total number of non-zero values (totalCoeff). Also, iterating backwards through the array, count the number of trailing 1s (t1), either a positive or negative 1, up to 3 instances.

In our example, non-zero values occur at indices 1, 3, 4, 5, and 7. totalCoeff is assigned a value of 5. Working backwards through the list, we see that trailing 1s starts at index 7 and a string of 1s occurs at indices 3, 4, 5, and 7. At most we can count 3, so t1 is assigned a value of 3.

Step 4: coeff_token
Taking nC, totalCoeff, and t1, we can determine a VLC that compactly represents the totalCoeff and t1. The specific table of values is found in Table 9-5 in the h.264 spec. It's too long to print every enry, but this is what it looks like for our relevant combination:
t1 totalCoeff 0 <= nC < 2 2 <= nC < 4 4 <= nC < 8 8 <= nC nC == -1 nC == -2
1 5 0000 0001 10 0000 110 0100 0 0100 01 - 0000 0001 10
2 5 0000 0010 1 0000 101 0100 1 0100 10 - 0000 0010 0
3 5 0000 100 0011 0 1010 0100 11 - 0001 001
0 6 0000 0000 0111 1 0000 0011 1 0001 001 0101 00 - 0000 0000 111
1 6 0000 0000 110 0000 0110 0011 10 0101 01 - 0000 0000 110


For our combination (nC = 0, totalCoeff = 5, t1 = 3), coeff_token is 0000100.

Step 5: Encoding Trailing 1s
Encoding trailing ones is not too difficult. If t1 is nonzero, begin working backwards through each trailing 1. For each trailing 1 ([-1, -1, 1] in our case), assign 0 to each +1, and 1 to each -1.

For our example, the resulting bit string is 011

Step 6: Encoding remaining coefficients
This step is perhaps the most confusing. Details on this step are forthcoming.
Step 7: Count Zeros
This step is rather simple: ignore all zeros after the last non-zero coefficient. Count the number of remaining zeros (totalZeros). Another VLC table (Table 9-8 in the h.264 spec) is used to determine this bitstream, and is a function of totalZeros and totalCoeff.

For our example (totalZeros = 3, totalCoeff = 5), this corresponds to the bit string 111

Step 8: Encode Run Before
The final step is to encode the number of zeros that precede each non-zero coefficient, including the trailing 1s. Working backwards, for every non-zero coefficient, count the number of zeros that precede the coefficient. Another VLC table (Table 9-10 in the h.264 spec) is used to provide the correct encoding, and is a function of the number of zeros left to be counted, and the number of zeros preceding the coefficient. If the number of zeros left is 0, stop counting run before. Also, if there are zeros that precede the very first non-zero coefficient, an entry for run before is not added since the remaining zeros left is inferred to precede the very first non-zero coefficient.

For our example, iterate through each non-zero coefficient, working backwards.
1. Index 7: Non-zero coefficient: 1, number of preceding zeros: 1, zeros left: 3, resulting bit string: 10
2. Index 5: Non-zero coefficient: -1, number of preceding zeros: 0, zeros left: 2, resulting bit string: 1
3. Index 4: Non-zero coefficient: -1, number of preceding zeros: 0, zeros left: 2, resulting bit string: 1
4. Index 3: Non-zero coefficient: 1, number of precending zeros: 1, zeros left: 2, resulting bit string: 01
5. Index 1: skip, since this is the very last coefficient

I've coded a Javascript demo that takes a nC value and a 4x4 sample data array, and emits the corresponding CAVLC bit stream. You can find the demo here.

1 comment: