Nowadays almost every laptop comes equipped with a camera sitting a little above the LCD screen, staring at you. Desktops are a different story, since monitors with built-in cameras are not that common. Unless you're one of those fruitcakes who owns a Cinema Display. Getting a camera isn't all that difficult, though, and you will be overwhelmed by the selection. Cameras connected to computers are mostly called webcams1 and they come in various shapes, sizes and prices. Prices range from $10 up to $100, and there is a real difference between a $10 camera and a $100 one.
Now, you have decided to become the Jenni of the next decade, and you will be using free software to accomplish your goal, right? Before you run to the store and buy the most expensive piece of equipment, make sure you know what you are buying. Your first stop should probably be the Linux UVC driver page, where you can check which cameras are supported and which are not. Other, non-UVC cameras are also supported, but the support is limited. You will find more about the other drivers in the LinuxTV wiki.
Your webcam is now sitting on top of your monitor, pointed at your face, or under your desk, pointed at, well, something, and you want to record whatever the camera is pointed at. I admit I had no idea how to do this in Linux. Two weeks ago, when I bought the cam, I was losing my virginity, so to speak. I got myself a Philips SPC 1330NC, and I was in luck since it is supported by the Linux UVC drivers2.
There are a number of programs that can record input from a Video4Linux2 source; USB webcams are all v4l2 sources. I had trouble with all the programs I tried: sadly, none of them worked as it should, or the quality of the recording did not meet my criteria. I bounced between ffmpeg and cvlc3, and in both cases I had issues with audio/video sync, and the quality of the output picture was sometimes questionable.
GStreamer to the rescue
Then she said that I should use GStreamer. I did a little research, since I was convinced that GStreamer was only a framework for building other applications. However, it comes equipped with the gst-launch utility, which is the Swiss Army knife of video and audio manipulation in Linux.
Now let’s get busy building a pipeline!
$ gst-launch v4l2src device=/dev/video0 ! \
    'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
    xvimagesink
GStreamer has a simple pipeline workflow. First you define the source, then you tell it what to do with the source, and in the end you tell it where to put the result by specifying a sink. A sink can be almost anything: a file, your screen, or a network, which turns your computer into a streaming server. In the above case you tell gst-launch to take video4linux2 as the stream source. Then you link it to a capability, which is defined by a MIME type and a few optional properties, telling v4l2 that you want video at a given resolution and frame rate. In the end you link everything to xvimagesink, a sink that displays your stream on the screen.
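The source → caps → sink structure becomes easier to see if you assemble the pipeline piece by piece in shell variables. This is just a sketch; the element names come from the command above, and /dev/video0 is the usual device node for the first camera (yours may differ):

```shell
# Build the source ! caps ! sink pipeline step by step.
# /dev/video0 is the typical first-webcam device node (assumption).
SRC="v4l2src device=/dev/video0"
CAPS="video/x-raw-yuv,width=640,height=480,framerate=30/1"
SINK="xvimagesink"
PIPELINE="$SRC ! $CAPS ! $SINK"
echo "$PIPELINE"
# With a camera attached you would run:  gst-launch $PIPELINE
```

Every pipeline in this article follows that same pattern; only the middle elements change.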
Now that you have a live feed on your screen you can try putting it in a file.
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
    queue ! videorate ! 'video/x-raw-yuv,framerate=30/1' ! theoraenc ! \
    queue ! oggmux ! filesink location=me_funny_dancing.ogg
What is different this time? First there is queue, which provides a buffer for the next element in the pipeline, videorate. Videorate takes each frame of the input and feeds it to the next element at the requested framerate, which is achieved by duplicating or dropping frames. The stream is then linked to theoraenc, which encodes the raw video into a Theora stream. That stream is linked to oggmux, which muxes the Theora stream into an Ogg container. In the end, a properly contained Theora-encoded video is linked to a filesink, which writes all the data to a file. Try playing it!
$ mplayer me_funny_dancing.ogg
See yourself doing a funny dance. But oh noes! The quality is kind of bad, and you didn't see yourself while you were recording. Let's take it one step at a time. First, the recording feedback.
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
    tee name=t_vid ! queue ! xvimagesink sync=false t_vid. ! queue ! videorate ! \
    'video/x-raw-yuv,framerate=30/1' ! theoraenc ! queue ! oggmux ! \
    filesink location=me_funny_dancing.ogg
Yes, this pipeline is getting more and more complicated. With tee you can split the data stream into multiple pads and handle each one of them separately. First you create a tee element and name it t_vid, then you link the first branch to xvimagesink so it displays the live stream. Then, by appending t_vid., you tell GStreamer that you want the other branch of the split stream, and you link it to the Theora encoder, the muxer and the filesink, as in the previous example. You will end up with your pretty video on the screen while it is also being recorded to a file.
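Stripped of the line noise, the tee pipeline is really just three fragments: a common front end and two branches that each start from t_vid. — a sketch, with the element names taken from the command above:

```shell
# One common front end feeding a named tee, plus two branches off it.
# (t_vid. means "another pad of the element named t_vid".)
COMMON="v4l2src ! video/x-raw-yuv,width=640,height=480,framerate=30/1 ! tee name=t_vid"
DISPLAY_BRANCH="t_vid. ! queue ! xvimagesink sync=false"
ENCODE_BRANCH="t_vid. ! queue ! videorate ! video/x-raw-yuv,framerate=30/1 ! theoraenc ! queue ! oggmux ! filesink location=out.ogg"
echo "$COMMON $DISPLAY_BRANCH $ENCODE_BRANCH"
```

Adding a third branch (say, a network sink) would just mean appending one more `t_vid. ! …` fragment.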
As we all know there is no home-made porn without sound, so we somehow need to add sound to the video. But how? Simple: by complicating our already complicated pipeline.
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
    tee name=t_vid ! queue ! xvimagesink sync=false t_vid. ! queue ! videorate ! \
    'video/x-raw-yuv,framerate=30/1' ! theoraenc ! queue ! mux. \
    alsasrc device=hw:1,0 ! audio/x-raw-int,rate=48000,channels=2,depth=16 ! \
    queue ! audioconvert ! queue ! vorbisenc ! queue ! mux. \
    oggmux name=mux ! filesink location=me_funny_dancing.ogg
Are your eyes bleeding? You're not finished yet, so get ready for more. What in the name of a pipeline is this? You already know all about splitting the pipeline and how to handle the two branches, so let's go straight to the changes. In the third line you can see that where the video should be linked to oggmux, it is actually linked to mux., which is the name of the muxer that you will create later. Next you need to specify an additional source for the sound. In my case it is an ALSA source, and the device I used is hw:1,0, which corresponds to the built-in microphone on the webcam. Your device will probably be something else. Just like with the video, we need to tell alsasrc what kind of data we want: the MIME type is audio/x-raw-int, the sampling rate is 48 kHz, with two channels and 16-bit depth. This is sometimes enforced by the audio source, and you might have to play with the settings a little. You then link it to the audioconvert element, which takes your audio and converts it into something the encoder expects. Vorbisenc is an audio encoder, as you probably guessed, and it is linked to mux. At last you create the muxer you were referring to before and link it to the filesink.
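A quick note on that hw:1,0 string, since it trips people up: ALSA capture devices are named hw:CARD,DEVICE. On a real system you can find your card number in /proc/asound/cards or with `arecord -l`; card 1, device 0 is just what my webcam mic happened to be:

```shell
# ALSA device naming: hw:CARD,DEVICE
# CARD=1, DEVICE=0 matches the webcam mic in this article (assumption:
# your card index will likely differ; check /proc/asound/cards or `arecord -l`).
CARD=1
DEVICE=0
ALSA_DEV="hw:${CARD},${DEVICE}"
echo "$ALSA_DEV"
# This is the string that goes into:  alsasrc device=$ALSA_DEV
```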
There, you’re all set! No wait, don’t run off just yet. What about the quality? Satisfied? No? Then stick around! On-the-fly encoding to Theora works, but the quality of your recording is not that good. The problem with Theora is that even setting the quality to maximum doesn’t help much. You will need something else, something more raw! Try this:
$ gst-launch v4l2src ! 'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
    tee name=t_vid ! queue ! videoflip method=horizontal-flip ! \
    xvimagesink sync=false t_vid. ! queue ! \
    videorate ! 'video/x-raw-yuv,framerate=30/1' ! queue ! mux. \
    alsasrc device=hw:1,0 ! audio/x-raw-int,rate=48000,channels=2,depth=16 ! queue ! \
    audioconvert ! queue ! mux. avimux name=mux ! \
    filesink location=me_dancing_funny.avi
Don’t be surprised if you see a mirrored image of yourself on the screen. You should; some people prefer a mirrored preview. It certainly makes nose picking easier, right? If you don’t like it, simply remove the videoflip element. With this pipeline your video stays uncompressed and is written into an AVI container with the help of avimux. Please note that ten seconds of this video require around 150 megabytes of disk space, and one minute is close to 1 gigabyte if you record at 640×480@30fps! Yes, it will be huge.
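Those numbers check out with a bit of back-of-the-envelope arithmetic. Assuming the camera delivers 4:2:0 YUV (I420, 1.5 bytes per pixel); a 4:2:2 format like YUY2 would be 2 bytes per pixel and a third larger again:

```shell
# Raw I420 video size: 1.5 bytes per pixel (assumption; 4:2:2 = 2 bytes/pixel)
WIDTH=640; HEIGHT=480; FPS=30
BYTES_PER_FRAME=$(( WIDTH * HEIGHT * 3 / 2 ))   # 460800 bytes per frame
BYTES_PER_SEC=$(( BYTES_PER_FRAME * FPS ))      # ~13.8 MB every single second
TEN_SECONDS=$(( BYTES_PER_SEC * 10 ))           # ~138 MB, the ~150 MB ballpark
ONE_MINUTE=$(( BYTES_PER_SEC * 60 ))            # ~829 MB, close to a gigabyte
echo "$TEN_SECONDS $ONE_MINUTE"
```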
The problem with this raw video is that it is not editable in Kdenlive. If you try to open it, Kdenlive will simply crash and die in a puff of smoke. My main problem was how to record a video with decent quality. I tried dozens of combinations, but nothing really worked, and the raw video was not editable.
And now my solution. The quality of H.264 would be good enough for me, but H.264 encoding takes a lot of time even on a Core 2 Quad CPU, and it can’t be done on the fly. You need to record raw video and then convert it to H.264. Here’s how:
$ gst-launch filesrc location=me_funny_dancing.avi ! \
    decodebin name=decode decode. ! queue ! x264enc ! mp4mux name=mux ! \
    filesink location=me_funny_dancing.mp4 decode. ! \
    queue ! audioconvert ! faac ! mux.
This time you need to specify a different source: the file which contains your recording. You link it to decodebin, which decodes the file and offers its streams for further linking to x264enc and then mp4mux. Decodebin works similarly to tee: you get more than one pipeline out of it, so you can work with audio and video separately. You then mux audio and video together with mp4mux and write the result to a file. A careful eye will notice that a faac element was added before the final mux. This is because an MP4 container can’t hold raw audio, so we convert it to MPEG-2/4 AAC.
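The same three-fragment view works for the transcode pipeline: one decodebin front end, and a video and an audio branch that both start from decode. and meet again at the muxer named mux. A sketch, using the same names and filenames as the command above:

```shell
# decodebin exposes both streams from the AVI; each branch ends at the
# same mp4mux (named "mux"), exactly as in the gst-launch command above.
SRC="filesrc location=me_funny_dancing.avi ! decodebin name=decode"
VIDEO="decode. ! queue ! x264enc ! mp4mux name=mux ! filesink location=me_funny_dancing.mp4"
AUDIO="decode. ! queue ! audioconvert ! faac ! mux."
echo "$SRC $VIDEO $AUDIO"
```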
The conversion will take some time, but it is well worth it. The video you get will be of very high quality, and it can be edited with Kdenlive and later rendered into your final product.
This concludes the first part of the GStreamer tutorial. Next time I will talk more about streaming live feeds to all the viewers out there on the internet.
Comments, suggestions, corrections and flames are all much appreciated.
Update: Instead of hw:1,0 (or whatever device you are using), you might try plughw:1,0. In layman’s terms, plughw is a software abstraction above the hardware device that does a lot of the conversion for you: it will match sampling rates, channel counts and so on.