Encoding videos with FFMPEG and H.264

Prelude to encoding

Encoding videos can be a real pain. While ffmpeg is an excellent utility for converting videos between various formats, it is also a pain to use. If you put the H.264 codec on top of this, you get a real mess, with command lines longer than 200 characters. H.264 is a great format, and its Linux[1] implementation lives in libx264. It comes pre-installed with Ubuntu 10.04, but if you are more adventurous you should try installing the latest versions from the SVN repositories, as described on the Ubuntu forums. The whole procedure is quite easy and very well documented, so in this post I will assume that you have installed the latest versions.

The focus of this test and tutorial is video production, more specifically an average YouTube vlogger video. This means one-pass encoding, high compression and very uniform footage. For encoding movies and other videos with very dynamic content and big scene changes, you will have to look elsewhere or wait for another post. :)

Test setup

For the test video I used a short clip provided by Nixie Pixel; do check her out on YouTube. The video was recorded with a Logitech QuickCam Pro 9000 at a resolution of 960×720 and 25 frames per second. GStreamer's gst-launch utility was used for capturing, and the clip was recorded as raw data (video/x-raw-yuv). Audio was captured with an external studio USB microphone and encoded at 1536 kb/s.

The initial file size was 745 MB (780252094 bytes) for 29.8 seconds of video. All encoding was done on an old Core2Quad Q6600 running at 2.4 GHz with 4 GB of RAM. I used the -threads 0 parameter with ffmpeg to use all CPU cores while encoding. System load during conversion was around 5.5, and encoding the whole clip took anywhere from a few seconds to almost three minutes.
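As a sanity check, the raw file size can be estimated from the capture parameters. The sketch below assumes 12 bits per pixel (4:2:0 YUV, which the post does not state explicitly) and treats 1536 kb/s as 1,536,000 bits of audio per second. The function name is made up for illustration, and the duration is passed in tenths of a second so that plain integer arithmetic is enough.

```shell
#!/bin/sh
# Estimate the size in bytes of a raw capture.
# Assumption: 12 bits per pixel (4:2:0 YUV); the post only says video/x-raw-yuv.
# Needs a shell with 64-bit arithmetic (bash and dash on 64-bit systems are fine).
raw_size_bytes() {
    # $1 = width, $2 = height, $3 = fps, $4 = duration in tenths of a second,
    # $5 = audio bitrate in kb/s
    frame_bytes=$(( $1 * $2 * 12 / 8 ))
    video_bytes=$(( frame_bytes * $3 * $4 / 10 ))
    audio_bytes=$(( $5 * 1000 / 8 * $4 / 10 ))
    echo $(( video_bytes + audio_bytes ))
}

raw_size_bytes 960 720 25 298 1536   # prints 778137600
```

Under these assumptions the estimate lands within about 0.3% of the reported 780252094 bytes; the remainder is plausibly container overhead.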

The latest version of ffmpeg comes with a decent number of preset options for encoding with libx264. I will not go into details about each and every option, because that would be wrist-slashing material and quite a long read. However, I will use the presets to simplify the encoding. I decided to use six different presets: hq, veryslow, default, fast, veryfast and lossless_slower. With these I used two different crf[2] values: 15 and 28. They mark the two ends of the range that is usually recommended, with 15 giving the lowest compression and highest quality/bitrate and 28 the opposite. Time was measured with the time command, and file size was reported by ls -lh.

This is the encoding command I used, varying the -vpre and -crf parameters:

$ ffmpeg -threads 0 -i bw-test.avi -vcodec libx264 -vpre default -crf 15.0 \
  -acodec libmp3lame -ab 64k -ar 48000 -ac 2 -f mp4 out_default_15.mp4

Sound was always encoded with libmp3lame. I know that mp4 files should have AAC-encoded sound, but audio and video were often out of sync after YouTube processed such videos, which is why I usually stick with mp3 encoding for sound. For vlog-type videos I encode sound at only 64 kb/s, which should be more than enough for human speech.
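The whole test matrix (six presets, two crf values) could be scripted roughly as follows. This is only a sketch: the function names and the INPUT and DRY_RUN variables are made up for illustration, and elapsed time is taken from date rather than the time command so that it works in any POSIX shell. By default it only prints the twelve commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Build one ffmpeg command per preset/crf combination.
# INPUT and DRY_RUN are illustrative names, not part of the original test.
INPUT=${INPUT:-bw-test.avi}
DRY_RUN=${DRY_RUN:-1}

encode_cmd() {
    # $1 = preset name, $2 = crf value; same options as the command in the post
    echo "ffmpeg -threads 0 -i $INPUT -vcodec libx264 -vpre $1 -crf $2" \
         "-acodec libmp3lame -ab 64k -ar 48000 -ac 2 -f mp4 out_${1}_${2}.mp4"
}

run_matrix() {
    for preset in hq veryslow default fast veryfast lossless_slower; do
        for crf in 15 28; do
            cmd=$(encode_cmd "$preset" "$crf")
            if [ "$DRY_RUN" = 1 ]; then
                echo "$cmd"
            else
                # Whole-second timing; file names must not contain spaces.
                start=$(date +%s)
                sh -c "$cmd"
                end=$(date +%s)
                echo "$preset crf=$crf: $(( end - start ))s"
                ls -lh "out_${preset}_${crf}.mp4"
            fi
        done
    done
}

run_matrix
```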

The results

Results include the time needed to perform the encoding, the file size of the encoded clip and a couple of screenshots: one of Nixie talking and the other of a llama walking, so there is at least some motion blur in the video from the llama. The frames were extracted like this:

$ ffmpeg -i bw-test.avi -ss 00:00:6.7 -vframes 1 -vcodec png \
  -f image2 sample_original_1.png
$ ffmpeg -i bw-test.avi -ss 00:00:24.10 -vframes 1 -vcodec png \
  -f image2 sample_original_2.png

Here are the two screenshots from the original uncompressed clip; clicking on them will give you a larger version.

Results are in the table below; clicking on a time and file size will open both captured frames for the respective test clip.

(encoding time / file size)

crf  hq               veryslow         default          fast             veryfast         lossless_slower
15   0m 43s / 16 MB   2m 50s / 15 MB   0m 31s / 18 MB   0m 24s / 17 MB   0m 9s / 16 MB    0m 41s / 17 MB
28   0m 22s / 2.3 MB  1m 18s / 1.9 MB  0m 14s / 2.3 MB  0m 12s / 2.5 MB  0m 6s / 2.1 MB   0m 19s / 2.2 MB
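To put the file sizes in perspective, they can be turned into an average bitrate and a compression ratio. A small sketch, assuming the sizes reported by ls -lh are in units of 1024×1024 bytes and using the 29.8 second clip length; the function names are made up for illustration, and because the table values are rounded the results are approximate.

```shell
#!/bin/sh
# avg_kbps SIZE_MB DURATION_S: average total bitrate (video plus audio) in kb/s
avg_kbps() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", mb * 1024 * 1024 * 8 / s / 1000 }'
}

# ratio ORIGINAL_MB ENCODED_MB: how many times smaller the encoded file is
ratio() {
    awk -v orig="$1" -v enc="$2" 'BEGIN { printf "%.1f\n", orig / enc }'
}

avg_kbps 16 29.8   # veryfast at crf 15: prints 4504
ratio 745 16       # prints 46.6
```

The same functions can be applied to the crf 28 row, for example avg_kbps 2.1 29.8 for the veryfast clip.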

The complete test video is available on YouTube; I decided to upload the -crf 15 -vpre veryfast version.

The conclusion

The end results are a little bit surprising. After numerous tests I decided that I can only show a relevant difference between the two extremes: minimum and maximum compression. If you compare the -crf 15 screenshots, they are almost identical, even when comparing the veryfast, veryslow and lossless presets.

As the crf value is increased the quality decreases, and by the time it reaches 28 the change is very obvious. If you use -crf 20 you will get a decent balance between file size and quality. Just to be sure, I spent some time on YouTube watching various vlogs and other stuff that people post there. After seeing that, I am pretty sure that you can use the worst settings I mentioned before: the veryfast preset and -crf 28. Your videos will be encoded in no time, the upload will be fast and nobody will notice the loss of quality. ;)
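Put together, the suggested vlog settings could be wrapped in a small helper. This is a sketch only: the function name, the PRESET_OPT variable and the output file names are invented for illustration. One assumption worth flagging: newer ffmpeg builds select libx264 presets with -preset instead of the -vpre preset files used in this post, so the option name is kept in a variable. The helper prints the command rather than running it.

```shell
#!/bin/sh
# Preset option name: -vpre as used in this post; newer ffmpeg builds use
# -preset for libx264 (run with PRESET_OPT=-preset in that case).
PRESET_OPT=${PRESET_OPT:--vpre}

vlog_encode_cmd() {
    # $1 = input file, $2 = output file, $3 = crf (optional, defaults to 28)
    echo "ffmpeg -threads 0 -i $1 -vcodec libx264 $PRESET_OPT veryfast -crf ${3:-28}" \
         "-acodec libmp3lame -ab 64k -ar 48000 -ac 2 -f mp4 $2"
}

vlog_encode_cmd bw-test.avi vlog_small.mp4        # fastest, smallest file
vlog_encode_cmd bw-test.avi vlog_balanced.mp4 20  # size/quality balance
```

To run a command instead of printing it, pipe the output to sh, for example vlog_encode_cmd bw-test.avi vlog_small.mp4 | sh.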

And now the time has come to talk of many things, of shoes and ship and sealing wax, of cabbages and kings and why the sea is boiling hot and whether pigs have wings. :)



Footnotes:
  1. and other similar operating systems
  2. CRF – constant rate factor

