Off and on over the last few months, I've been trying to put together a basic h.264 encoder. You're probably wondering "why?", and it's likely one of two "why" questions: "Why do that when there's a perfectly good open source library for encoding (libx264) out there?" or "Why put yourself through the pain? Masochist?" The short answers: 1. I can't use it, for a variety of design reasons that cannot be changed, and 2. because this is the cornerstone of an entire project.
If you're in the same boat as me (wanting or needing to write your own encoder), where do you start? What do you type into Google? I saw an interview with Anthony Bourdain (popular chef, author, and host of food and travel shows on a variety of cable channels) in which someone asked whether they should get into the restaurant business because they had a passion for food. His reply, in so many words: "You should try to work in a restaurant for a year, for free if you have to, in order to understand what you're getting yourself into. There are those who enjoy the heat, the really hard work, the long hours, the low profit margin... and there are normal people."
I mention this because my experience looking for a starting point for writing an encoder was similar. "h.264 encoder tutorial source code" are all words you'll probably come up with, and the results may leave you scratching your head. You might read some forum posts, find that people are somewhat rude, and get told "Go read the spec", not only as the default answer, but as the only answer. If after all this you *still* feel the urge to write an encoder, well, you might be one of the few that Chef Bourdain is talking about.
Some good resources to start with are available on the web:
- The spec. Because this is the document everyone will tell you to read. (And it's annoying that universally used specs have paywalls...)
- A really simple encoder. I think this is a great article for dipping your toes into the waters of h.264 encoding, because those waters are going to get really murky. You want to understand the very basics of how h.264 encoding works, because every feature supported in h.264 is, in some way, built on top of this very basic encoding flow.
Some notes on the above links:
- I understand why people always direct you to the spec when you have a question: the answer will be in there (it will... it might just be hiding sometimes). But the spec is hard to read. It really is. Read the spec, but understand that, depending on how deep your knowledge needs to go, you may not need to read the entire thing.
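- The basic encoder from Ben Mesander is a great starting point. It opens the door to many questions you may ask, but it's a great place to start. There is one bug that I've found related to the slice header. Ben uses a static slice header (0x00, 0x00, 0x00, 0x01, 0x05, 0x88, 0x84, 0x21, 0xa0) for all of his picture slices. In the comments, someone points out that "0x05" should actually be "0x65". 0x65 is correct according to the spec: 0x05 encodes nal_unit_type 5 (an IDR slice) with nal_ref_idc 0, and an IDR NAL unit must have a nonzero nal_ref_idc. Changing that one byte isn't enough, though. When nal_ref_idc is nonzero, the slice header also carries dec_ref_pic_marking(), which for an IDR picture adds two bits (no_output_of_prior_pics_flag and long_term_reference_flag), so everything after pic_order_cnt_lsb shifts and the remaining bytes have to be repacked. The other issue is that slice headers cannot be static. The spec uses a handful of slice header fields (frame_num and idr_pic_id, for instance) to tell consecutive pictures apart, and it requires two consecutive IDR pictures to use different idr_pic_id values. Since frame_num must be 0 in an IDR picture, idr_pic_id is the field to change. So a static header can't be used, but it is OK to toggle between two slice headers that differ in idr_pic_id. Assuming the sequence and picture parameter sets from Ben's encoder are left unchanged, my suggestion:
```c
/* Both: I slice, frame_num = 0, pic_order_cnt_lsb = 0, slice_qp_delta = 0, followed by
   the first macroblock's mb_type (I_PCM) and any pcm_alignment_zero_bits. */
const uint8_t slice_header1[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x84, 0x08, 0x68 }; /* idr_pic_id = 0 */
const uint8_t slice_header2[] = { 0x00, 0x00, 0x00, 0x01, 0x65, 0x88, 0x82, 0x02, 0x1a }; /* idr_pic_id = 1 */
```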
In the code that writes the slice header, consider a piece of code like this:
```c
int i, j, use_slice_header_1 = 1;
/* ... */
/* Once per frame: alternate between the two headers so consecutive slice headers differ. */
if (use_slice_header_1)
    fwrite(slice_header1, 1, sizeof(slice_header1), stdout);
else
    fwrite(slice_header2, 1, sizeof(slice_header2), stdout);
use_slice_header_1 = !use_slice_header_1;
```
Note that the two slice headers are slightly different. Full disclosure: I haven't completely tested this suggestion, but my basic test does almost the same thing. If the bits are off, let me know and I'll fix and test it.
I like tools. More specifically, I like tools that are helpful for developers, because I'm a developer at heart. As a developer, I like tools that are simple, accessible, full of information that can be used for all kinds of debugging, and organized in a way that everyone who uses them can understand. Since this is a website (and I'm focusing a lot on JavaScript, HTML5, etc.), the tools should be web-based as much as possible. Here's a list of h.264-related tools I've written that you may find useful:
- Exponential Golomb Code Calculator - calculates unsigned exponential Golomb codes for now. Signed exponential Golomb codes will be added soon (blog post about EGCs incoming...)
- (Added 12/23/2013) h.264 CAVLC Encoder - demo of CAVLC encoding, based on input nC and 4x4 sample data
- h.264 decoder - incoming
Tutorials I've written related to h.264:
- (Added 12/23/2013) CAVLC Encoding