Wednesday, May 29, 2013

I can haz h.264 encoder?

Nope. Not mine at least... (or not yet anyways).

Off and on for the last few months, I've been trying to put together a basic h.264 encoder. You're probably wondering "why?", and it's one of two "why" questions: "Why do that when there's a perfectly good open source library for encoding (libx264) out there?" Or, "Why put yourself through the pain? Masochist?" The short and simple answers: 1. I can't use it, for a variety of design reasons which cannot be changed, and 2. because this is the cornerstone of an entire project.

If you're in the same boat as me - wanting or needing to write your own encoder - where do you start? What do you type into Google? I saw an interview with Anthony Bourdain (popular chef, author, and host of food/travel shows on a variety of cable channels) where someone asked whether they should get into the restaurant business because they had a passion for food. His reply (paraphrasing) - "You should try to work in a restaurant for a year, for free if you have to, in order to understand what you're getting yourself into. There are those who enjoy the heat, the really hard work, the long hours, the low profit margin... and there are normal people."

I mention this because my experience in trying to find a starting point for writing an encoder was similar. "h.264 encoder tutorial source code" are all words you'll probably come up with, and the results may leave you scratching your head. You might read some forum posts and find that people are somewhat rude, offering "Go read the spec" as not only their default answer, but their only answer. If after all this, you *still* feel the urge to write an encoder, well, you might be one of the few that Chef Bourdain is talking about.

Some good resources to start with that are available on the web:
  • The spec. Because this is the document everyone will tell you to read. (And it's annoying that universally used specs have paywalls...)
  • A really simple encoder. I think this is a great article to help you dip your toes into the waters of h.264 encoding, because those waters are going to get really murky. You want to understand the very basics of how the h.264 encoding process works, because every feature supported in h.264 is, in some way, derived from this very basic encoding flow.

Some notes on the above links:
- I understand why people will always direct you to read the spec if you have a question: the answer will be in there (it will... it might just be hiding sometimes). But... the spec is hard to read. It really is. Read the spec, but understand that the depth of knowledge you need may or may not require you to read the entire thing.
- The basic encoder from Ben Mesander is a great starting point, even though it opens the door to many questions. There are two issues that I've found related to the slice header. Ben uses a static slice header (0x00, 0x00, 0x00, 0x01, 0x05, 0x88, 0x84, 0x21, 0xa0) for all his picture slices. In the comments, someone points out that "0x05" should actually be "0x65". 0x65 is correct, according to the spec. (Strictly speaking, the first four bytes are the start code and this byte is the NAL unit header; the slice header proper comes after it.) The other issue is that slice headers cannot be static. The spec dictates that consecutive slice headers must differ in at least one of a handful of fields (frame_num and idr_pic_id, for instance). This means that a static header can't be used, but it is ok to toggle between 2 slice headers that change a couple of those bits. My suggestion:

    const uint8_t slice_header1[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x84, 0x21, 0xa0 };
    const uint8_t slice_header2[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x94, 0x21, 0xa0 };

In the code that writes the slice header, consider a piece of code like this:

    int i, j, use_slice_header_1 = 1;  /* start with the first header */
    
    ...
    
    /* Alternate between the two headers so consecutive slices differ. */
    if (use_slice_header_1)
      fwrite(slice_header1, 1, sizeof(slice_header1), stdout);
    else
      fwrite(slice_header2, 1, sizeof(slice_header2), stdout);
    
    use_slice_header_1 = !use_slice_header_1;

Note that the two slice headers differ in a single byte (0x84 versus 0x94). Full disclosure: I haven't completely tested this suggestion, but my basic test does almost the same thing. The bits might be off, in which case, let me know and I'll fix and test it.
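To see why 0x65 is right and 0x05 is not, it helps to unpack that byte. It is the one-byte NAL unit header that follows the start code, and it packs three fields: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits). Here is a minimal sketch of that unpacking. The struct and function names (`nal_header`, `parse_nal_header`, `nal_header_ok_for_idr`) are my own, made up for illustration, and are not from Ben's code:

```c
#include <stdint.h>

/* Fields packed into the one-byte NAL unit header (names are illustrative). */
typedef struct {
    unsigned forbidden_zero_bit; /* 1 bit, must be 0 */
    unsigned nal_ref_idc;        /* 2 bits, 0 means "not used for reference" */
    unsigned nal_unit_type;      /* 5 bits, 5 = coded slice of an IDR picture */
} nal_header;

/* Split the header byte into its three fields, most significant bit first. */
static nal_header parse_nal_header(uint8_t byte)
{
    nal_header h;
    h.forbidden_zero_bit = (byte >> 7) & 0x01;
    h.nal_ref_idc        = (byte >> 5) & 0x03;
    h.nal_unit_type      = byte & 0x1f;
    return h;
}

/*
 * Returns 1 if the byte is a plausible header for an IDR slice:
 * forbidden_zero_bit is 0, nal_unit_type is 5, and nal_ref_idc is
 * non-zero (an IDR slice may not be marked "not used for reference").
 */
static int nal_header_ok_for_idr(uint8_t byte)
{
    nal_header h = parse_nal_header(byte);
    return h.forbidden_zero_bit == 0 &&
           h.nal_unit_type == 5 &&
           h.nal_ref_idc != 0;
}
```

Both 0x05 and 0x65 decode to nal_unit_type 5, but 0x05 has nal_ref_idc 0 while 0x65 has nal_ref_idc 3, so `nal_header_ok_for_idr` accepts only 0x65.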

I like tools. Moreover, I like tools that are helpful for developers, because I'm a developer at heart. As a developer, I like tools that are simple, accessible, full of information that can be used for all types of debugging, and organized in a way that is comprehensible to everyone who uses them. Since this is a website (and I'm focusing a lot on JavaScript, HTML5, etc.), tools should be web-based as much as possible. Here's a list of h.264-related tools that I've written and that you may find useful:
  • Exponential Golomb Code Calculator - calculates unsigned exponential Golomb codes for now. Signed exponential Golomb codes will be added soon (blog post about EGCs incoming...)
  • (Added 12/23/2013) h.264 CAVLC Encoder - demo of CAVLC encoding, based on input nC and 4x4 sample data
  • h.264 decoder - incoming

Tutorials I've written related to h.264: Many more posts about h.264 are on the way. I'll try to add links here. In the meantime, good luck! (I know I needed lots of it.)

Tuesday, May 21, 2013

Unsigned Exponential Golomb Codes

Exponential Golomb codes are binary strings that allow for variable length encoding of numbers. The premise behind this is that for network transmission or data compaction, allocating a fixed number of bytes to store a value is expensive, so the trade-off is data compaction at the cost of slightly higher processing time. Exponential Golomb codes are used a lot in h.264 streams, because every bit is at a premium, especially when these streams need to be transmitted in real time over a network.
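For the unsigned case the construction is short: take the value plus one, write it in binary, and put as many zero bits in front as there are bits after the leading 1. So 0 becomes "1", 1 becomes "010", and 3 becomes "00100". Below is a minimal sketch in C that works on strings of '0' and '1' characters to keep it readable. A real encoder would pack bits into bytes instead, and the function names `ue_encode` and `ue_decode` are just ones I picked for this example:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the unsigned Exp-Golomb code for `value` into `out` as a
 * NUL-terminated string of '0'/'1' characters. Returns the code length
 * in bits, or 0 if `out` is too small to hold the code plus the NUL.
 */
static size_t ue_encode(uint32_t value, char *out, size_t out_size)
{
    uint64_t code_num = (uint64_t)value + 1; /* 64-bit so UINT32_MAX + 1 fits */
    size_t bits = 0;                         /* significant bits in value + 1 */
    for (uint64_t tmp = code_num; tmp != 0; tmp >>= 1)
        bits++;

    size_t total = 2 * bits - 1;             /* (bits - 1) zeros, then bits */
    if (out_size < total + 1)
        return 0;

    size_t pos = 0;
    for (size_t i = 0; i + 1 < bits; i++)    /* prefix of leading zeros */
        out[pos++] = '0';
    for (size_t i = bits; i-- > 0; )         /* value + 1, MSB first */
        out[pos++] = ((code_num >> i) & 1) ? '1' : '0';
    out[pos] = '\0';
    return total;
}

/*
 * Decode one unsigned Exp-Golomb code from the start of `code`.
 * Characters after the end of the code are ignored. Returns 0 on
 * success and -1 if the string is truncated, malformed, or too large
 * for a 32-bit value.
 */
static int ue_decode(const char *code, uint32_t *value)
{
    size_t zeros = 0;
    while (code[zeros] == '0')               /* count the zero prefix */
        zeros++;
    if (zeros > 32)
        return -1;

    uint64_t code_num = 0;
    for (size_t i = 0; i <= zeros; i++) {    /* read zeros + 1 bits */
        char c = code[zeros + i];
        if (c != '0' && c != '1')
            return -1;
        code_num = (code_num << 1) | (uint64_t)(c - '0');
    }
    if (code_num - 1 > UINT32_MAX)
        return -1;
    *value = (uint32_t)(code_num - 1);
    return 0;
}
```

With a buffer of at least 66 characters, `ue_encode` can hold the code for any 32-bit value (the longest code is 65 bits).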

Here's a JavaScript calculator that allows you to compute the exponential Golomb code, and vice versa, for all non-negative numbers. (I'll add signed exponential Golomb codes later.)

Unsigned Number:



Unsigned Exponential Golomb Code:

Sunday, May 5, 2013

It's been a while...

Still here. Life outside of my programming projects (taxes, children, work, family vacation) has been a huge interruption. I wrote a basic h.264 encoder a while back, and have had trouble getting the output stream to play correctly in everyday media players. Turns out, after using some tricks in ffmpeg, that I'm not encoding frames incorrectly. It's almost certainly my mp4 container code that is writing data incorrectly and throwing everything off. So, I'm now writing an mp4 container encoder. All of these atom types are spectacularly boring, but you have to slog your way through it all to make a credible and functional encoder. I hope to have something working in the coming days, and then I'll finally be able to play some of my home-made video. After that, I'll have a LOT more to say about h.264 and mp4/QuickTime.