Media manual

Domain knowledge

This tutorial is a good introduction for building some domain knowledge. Bear in mind that it is rather old and that some of the FFmpeg functions it uses have since been deprecated, but the basics are still valid.

The FFmpeg code base includes the ffplay.c player, which is a very good way to see how things are managed. In particular, it uses some newer FFmpeg functions, while the current pyglet media code still uses functions that have since been deprecated.

Current code architecture

The media code is organized around the following components:


Source

Found in the media/sources folder.

Sources represent data containing media information. They can come from disk or be created in memory. A Source's responsibility is to read data into its stream and to provide audio and/or video data out of it. Essentially, it's a producer.


FFmpegSource

One implementation of StreamingSource is FFmpegSource. It implements the Source base class by calling FFmpeg functions that are wrapped with ctypes and found in media/sources/ffmpeg_lib. These functions offer basic functionality for handling media streams, such as opening a file, reading stream info, reading a packet, and decoding audio and video packets.

The FFmpegSource maintains two queues, one for audio packets and one for video packets, each with a predetermined maximum size. When the source is loaded, it reads packets from the stream and fills the queues until one of them is full. It then has to stop, because the type of the next packet in the stream is not known in advance: it could be of the same type as the full queue, in which case there would be nowhere to store it.

Whenever a Player (a consumer of a source) asks for audio data or a video frame, the Source pops the next packet from the appropriate queue, decodes the data, and returns the result to the Player. If this leaves free space in both the audio and video queues, it reads additional packets until one of the queues is full again.
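The queue behaviour described above can be illustrated with a toy model. This is only a sketch, not pyglet's implementation: the class name `ToySource`, the queue limits, and the `(kind, payload)` packet format are assumptions made for the example, and decoding is left out entirely.

```python
from collections import deque

# Hypothetical queue limits; the real values are defined inside pyglet.
MAX_AUDIO_PACKETS = 3
MAX_VIDEO_PACKETS = 2


class ToySource:
    """Toy model of a source that buffers demuxed packets in two queues."""

    def __init__(self, stream):
        # `stream` is any iterable of (kind, payload) pairs,
        # where kind is "audio" or "video".
        self._stream = iter(stream)
        self.audio_queue = deque()
        self.video_queue = deque()
        self._fill_queues()

    def _fill_queues(self):
        # Read only while BOTH queues have room: the type of the next packet
        # is unknown until it has been read, so it must fit in either queue.
        while (len(self.audio_queue) < MAX_AUDIO_PACKETS
               and len(self.video_queue) < MAX_VIDEO_PACKETS):
            packet = next(self._stream, None)
            if packet is None:  # end of stream
                break
            kind, payload = packet
            queue = self.audio_queue if kind == "audio" else self.video_queue
            queue.append(payload)

    def _pop(self, queue):
        payload = queue.popleft() if queue else None
        # The pop may have freed space, so try to top the queues up again.
        self._fill_queues()
        return payload

    def get_audio_data(self):
        return self._pop(self.audio_queue)

    def get_next_video_frame(self):
        return self._pop(self.video_queue)
```

With a stream that interleaves audio and video packets, reading stops as soon as the video queue holds two packets, even though the audio queue still has room. Popping an audio packet does not resume reading, because the video queue is still full; popping a video packet does.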


Player

Found in media/

The Player is the main object that drives the source. It maintains an internal sequence of sources or iterator of sources that it can play sequentially. Its responsibilities are to play, pause and seek into the source.

If the source contains audio, the Player will instantiate an AudioPlayer by asking the AudioDriver to create an appropriate AudioPlayer for the given platform. The AudioDriver is a singleton created according to which drivers are available. Currently supported sound drivers are: DirectSound, PulseAudio and OpenAL.

If the source contains video, the Player has a get_texture() method returning the current video frame.

The Player has an internal master clock which is used to synchronize the video and the audio. The audio synchronization is delegated to the AudioPlayer; more information is given below. The video synchronization is done by asking the Source for the next video timestamp. The Player then schedules, on the pyglet event loop, a call to its update_texture() with a delay equal to the difference between the next video timestamp and the current time of the master clock.

When update_texture() is called, we check whether the master clock has already run too far past the video timestamp. This can happen if the event loop was very busy and the function could not be called on time. In that case, frames are skipped until we find one with a suitable timestamp for the current master clock time.


AudioPlayer

Found in media/drivers

The AudioPlayer is responsible only for the audio data. It can read, pause, and seek into the Source.

In order to accomplish these tasks, the audio player keeps a reference to the AudioDriver singleton which provides access to the lower level functions for the selected audio driver.

When instructed to play, it will register itself on the pyglet event loop and check every 0.1 seconds whether there is enough space in its audio buffer. If so, it will ask the source for more audio data to refill the buffer. This is also when it checks the difference between the estimated audio time and the Player master clock. A weighted average is used to smooth the inaccuracies of the audio time estimation. If the resulting difference is too big, the Source get_audio_data() method has a compensation_time argument which allows it to shorten or stretch the number of audio samples. This allows the audio to get back in sync with the master clock.
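As a rough illustration of the smoothing step, here is a minimal sketch. It assumes an exponentially weighted average, which is only one possible kind of weighted average; the class name `AudioSyncEstimator`, the `weight` and `threshold` values, and the sign convention of the returned value are all assumptions for the example, not pyglet's actual code or constants.

```python
class AudioSyncEstimator:
    """Smooths the audio/master clock difference with a weighted average."""

    def __init__(self, weight=0.9, threshold=0.1):
        # Illustrative values only (assumptions, not pyglet's constants).
        self.weight = weight          # how much the past average counts
        self.threshold = threshold    # seconds of drift tolerated
        self.average_diff = 0.0

    def update(self, audio_time, master_time):
        """Return a compensation time in seconds, or 0.0 when in sync."""
        diff = audio_time - master_time
        # Blend the new measurement into the running average so that a
        # single noisy estimate does not trigger a correction.
        self.average_diff = (self.weight * self.average_diff
                             + (1.0 - self.weight) * diff)
        if abs(self.average_diff) > self.threshold:
            return self.average_diff
        return 0.0
```

With these values, a single measurement that is off by half a second only moves the average to about 0.05 seconds, which stays under the threshold, so no compensation is requested. A drift that persists over several checks eventually pushes the average past the threshold, and the returned value could then be passed on as the compensation time.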


AudioDriver

Found in media/drivers

The AudioDriver is a wrapper around the low-level sound driver available on the platform. It’s a singleton. It can create an AudioPlayer appropriate for the current AudioDriver.
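The relationship between the singleton driver and the players it creates can be sketched as follows. All names here (`ToyAudioDriver`, `ToyAudioPlayer`, `unavailable_driver`, `get_shared_driver`) are hypothetical and exist only for this example; the sketch shows the general pattern of picking the first driver that loads and reusing it, not pyglet's actual driver code.

```python
class ToyAudioPlayer:
    """Stand-in for a driver-specific audio player."""

    def __init__(self, driver, source):
        self.driver = driver  # reference back to the shared driver
        self.source = source


class ToyAudioDriver:
    """Stand-in for a wrapper around one platform sound API."""

    def create_audio_player(self, source):
        return ToyAudioPlayer(self, source)


def unavailable_driver():
    """Simulates a driver whose native library is missing on this platform."""
    raise ImportError("driver not available")


_shared_driver = None


def get_shared_driver(candidates=(unavailable_driver, ToyAudioDriver)):
    """Return the single shared driver, creating it on first use."""
    global _shared_driver
    if _shared_driver is None:
        for factory in candidates:
            try:
                _shared_driver = factory()
                break
            except ImportError:
                continue  # this driver cannot load; try the next candidate
    return _shared_driver
```

Every call after the first returns the same driver object, and each player created through it keeps a reference to that driver.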

Normal operation of the Player

The client code instantiates a media player this way:

player =
source =

When the client code runs

The Player will check if there is an audio track on the media. If so it will instantiate an AudioPlayer appropriate for the available sound driver on the platform. It will create an empty Texture if the media contains video frames and will schedule its update_texture() to be called immediately. Finally it will start the master clock.

The AudioPlayer will ask its Source for audio data. The Source will pop the next available audio packet and decode it. The resulting audio data will be returned to the AudioPlayer. If neither the audio queue nor the video queue is full, the Source will read more packets from the stream until one of the queues is full again.

When the update_texture() method is called, the next video timestamp is compared with the master clock. We allow a delay of up to one frame duration. If the master clock is beyond that, the frame is skipped, and we check the timestamps of the following frames until we find the appropriate frame for the master clock time. We then set the texture to the new video frame. Finally, we look up the timestamp of the next video frame and schedule a new call to update_texture() with a delay equal to the difference between that timestamp and the master clock time.
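The frame selection logic above can be written as a small pure function. This is a sketch under assumptions: the function name `select_frame` is hypothetical, timestamps are taken to be a sorted list of seconds, and clamping the delay at zero is a choice made for the example.

```python
def select_frame(timestamps, master_time, frame_duration):
    """Pick the frame to show and the delay before the next update.

    Returns (index_to_show, delay_until_next).
    index_to_show is None when every remaining frame is too late.
    delay_until_next is None when no frame follows the one shown.
    """
    index = 0
    # Skip every frame that is late by more than one frame duration.
    while (index < len(timestamps)
           and master_time - timestamps[index] > frame_duration):
        index += 1
    if index == len(timestamps):
        return None, None
    if index + 1 == len(timestamps):
        return index, None
    # Delay until the next frame is due; never schedule in the past.
    delay = max(0.0, timestamps[index + 1] - master_time)
    return index, delay
```

For example, with frames at 0.0, 0.5, 1.0 and 1.5 seconds, a frame duration of 0.5 and a master clock at 1.25, the first two frames are skipped, the frame at 1.0 is shown, and the next update is due 0.25 seconds later.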

Helpful tools

I’ve found that the ffprobe binary is a good way to explore the content of a media file. Here are a couple of things which might be interesting and helpful:

ffprobe samples_v1.01\SampleVideo_320x240_1mb.3gp -show_frames

This will show information about each frame in the file. You can restrict the output to video frames or audio frames with the -select_streams option, using v for video and a for audio:

ffprobe samples_v1.01\SampleVideo_320x240_1mb.3gp -show_frames -select_streams v

You can also ask to see a subset of frame information this way:

ffprobe samples_v1.01\SampleVideo_320x240_1mb.3gp -show_frames -select_streams v -show_entries frame=pkt_pts,pict_type

Finally, you can get a more compact view by adding the -of compact option:

ffprobe samples_v1.01\SampleVideo_320x240_1mb.3gp -show_frames -select_streams v -show_entries frame=pkt_pts,pict_type -of compact
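If you want to process this output from a script, the same command can be driven from Python. The sketch below assumes ffprobe is on the PATH and that the compact writer prints one line per frame in the form `frame|key=value|key=value`; the helper names `build_ffprobe_command`, `parse_compact_line` and `probe_frames` are made up for this example.

```python
import subprocess


def build_ffprobe_command(path, stream="v", entries=("pkt_pts", "pict_type")):
    """Build the ffprobe argument list for compact per-frame output."""
    return [
        "ffprobe", path,
        "-show_frames",
        "-select_streams", stream,
        "-show_entries", "frame=" + ",".join(entries),
        "-of", "compact",
    ]


def parse_compact_line(line):
    """Parse one compact line such as 'frame|pkt_pts=0|pict_type=I'."""
    section, *fields = line.strip().split("|")
    # Keep only key=value fields; anything else is ignored.
    record = dict(field.split("=", 1) for field in fields if "=" in field)
    record["section"] = section
    return record


def probe_frames(path):
    """Run ffprobe on `path` and return one dict per frame line."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True, text=True, check=True,
    )
    return [parse_compact_line(line)
            for line in result.stdout.splitlines()
            if line.startswith("frame|")]
```

Calling `probe_frames()` with the path of a media file would then give a list of dictionaries that can be filtered, for instance to list only the frames of a given pict_type.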

Convert video to mkv

ffmpeg -i <original_video> -c:v libx264 -preset slow -profile:v high -crf 18 -coder 1 -pix_fmt yuv420p -movflags +faststart -g 30 -bf 2 -c:a aac -b:a 384k -profile:a aac_low <outputfilename.mkv>