How to reencode a large Matroska container to something smaller

Carlo Wood, Jul 2007

Used utilities

This recipe uses the following utilities:

Utility      Debian package
mkvmerge     mkvtoolnix
mkvextract   mkvtoolnix
mplayer      mplayer
mencoder     mencoder
MP4Box       gpac
mp4creator   mpeg4ip-server
bc           bc
stat         coreutils
expr         coreutils

The recipe

Suppose you have a large Matroska container (extension .mkv or .mka) and you want to make a smaller version. To make the commands easy to copy and paste, I'll use shell variables. Let MKVFILE be the original Matroska container:

MKVFILE="the.matrix.revolutions.2003.dvd9.720p.hddvd.x264-hv.mkv"

The first step is to extract the audio. You don't want to touch the audio, so we'll leave it intact. However, we need to know its size, so we start by extracting it.

List all the tracks in the Matroska container:

$ mkvmerge -i "$MKVFILE"
File 'the.matrix.revolutions.2003.dvd9.720p.hddvd.x264-hv.mkv': container: Matroska
Track ID 1: video (V_MPEG4/ISO/AVC)
Track ID 2: audio (A_AC3)

See the NOTES of man mkvextract for the meaning of (A_AC3) and (V_MPEG4/ISO/AVC).

The audio is in track 2, therefore extract track 2:

mkvextract tracks "$MKVFILE" 2:audio.ac3
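
If you would rather not read the track ID by eye, it can be picked out of the mkvmerge -i listing. The following is only a minimal sketch: it assumes the listing looks like the example above and that you want the first audio track. TRACKLIST and AUDIOTRACK are names introduced here for illustration.

```shell
# Sample listing copied from the example above; in practice use
#   TRACKLIST=$(mkvmerge -i "$MKVFILE")
TRACKLIST="File 'the.matrix.revolutions.2003.dvd9.720p.hddvd.x264-hv.mkv': container: Matroska
Track ID 1: video (V_MPEG4/ISO/AVC)
Track ID 2: audio (A_AC3)"

# Print the ID from the first line of the form "Track ID N: audio (...)".
AUDIOTRACK=$(printf '%s\n' "$TRACKLIST" \
    | sed -n 's/^Track ID \([0-9][0-9]*\): audio .*/\1/p' \
    | head -n 1)
echo "audio track: $AUDIOTRACK"

# The extraction step then becomes:
#   mkvextract tracks "$MKVFILE" "$AUDIOTRACK:audio.ac3"
```

The .ac3 extension only makes sense for an A_AC3 track; for another codec you would pick a different output name.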

Store the size in bytes in AUDIOSIZE:

AUDIOSIZE=$(stat -c%s audio.ac3); echo $AUDIOSIZE

Next, we need to calculate new values for the resolution and bitrate. In order to do so we need some information about the original file.

mplayer -identify -v -frames 0 "$MKVFILE" 2>/dev/null | grep '^\[mkv\]'

This should tell you the duration of the movie in seconds, the frames per second of the video track, and the resolution (Pixel width and Pixel height). For example:

$ mplayer -identify -v -frames 0 "$MKVFILE" 2>/dev/null | grep '^\[mkv\]' | \
    egrep 'duration|Track type: Video|Pixel width|Pixel height'
[mkv] | + duration: 7755.936s
[mkv] |  + Default duration: 32.000ms ( = 31.250 fps)
[mkv] |  + Track type: Video
[mkv] |  + Default duration: 41.708ms ( = 23.976 fps)
[mkv] |   + Pixel width: 1280
[mkv] |   + Pixel height: 528

Put those numbers in variables,

DURATION=7755.936
FPS=23.976024
MFPS=24000/1001
WIDTH=1280
HEIGHT=528

Note that I used 23.976024 because the real value is exactly 24000/1001; MFPS is set to that quotient, with 1001 as the divisor. See the mencoder man page for hints about valid values. Also note that 1280 and 528 are both multiples of 16. To avoid wasting valuable bandwidth while encoding, you want to keep it that way.
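
The values can also be pulled out of the mplayer output instead of being typed in by hand. This is a rough sketch, assuming the output has exactly the shape of the example above (one video track, the same line layout); INFO is a name introduced here. It also checks the multiple-of-16 property and prints 24000/1001 to six decimals. It uses awk rather than bc for the division, only to show an alternative.

```shell
# Sample output copied from the example above; in practice use
#   INFO=$(mplayer -identify -v -frames 0 "$MKVFILE" 2>/dev/null | grep '^\[mkv\]')
INFO='[mkv] | + duration: 7755.936s
[mkv] |  + Default duration: 32.000ms ( = 31.250 fps)
[mkv] |  + Track type: Video
[mkv] |  + Default duration: 41.708ms ( = 23.976 fps)
[mkv] |   + Pixel width: 1280
[mkv] |   + Pixel height: 528'

# "+ duration:" matches the segment duration only, not "+ Default duration:".
DURATION=$(printf '%s\n' "$INFO" | sed -n 's/.*+ duration: \([0-9.]*\)s.*/\1/p' | head -n 1)
WIDTH=$(printf '%s\n' "$INFO" | sed -n 's/.*Pixel width: \([0-9]*\).*/\1/p' | head -n 1)
HEIGHT=$(printf '%s\n' "$INFO" | sed -n 's/.*Pixel height: \([0-9]*\).*/\1/p' | head -n 1)
echo "duration=$DURATION width=$WIDTH height=$HEIGHT"

# Warn when a dimension is not a multiple of 16.
for dim in "$WIDTH" "$HEIGHT"; do
    if [ $((dim % 16)) -ne 0 ]; then
        echo "warning: $dim is not a multiple of 16" >&2
    fi
done

# 24000/1001 to six decimals, the value used for FPS above.
FPS=$(awk 'BEGIN { printf "%.6f", 24000 / 1001 }')
echo "fps=$FPS"
```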

Let's estimate the new resolution and bitrate. For that we need the original average bitrate, which is the size of the original raw video stream divided by the duration. To do this precisely, you would execute the following commands:

mkvextract tracks "$MKVFILE" 1:video.h264
VIDEOSIZE=$(stat -c%s video.h264)

But, if the audio and video track are the only two tracks, you might as well just set

VIDEOSIZE=$(expr $(stat -c%s "$MKVFILE") - $AUDIOSIZE); echo "$VIDEOSIZE bytes"

The original average bitrate in kbit/s is then,

BITRATE=$(echo "scale=2; $VIDEOSIZE * 8 / $DURATION / 1000" | bc); echo "$BITRATE kbit/s"
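
To make the unit conversion concrete, here is the same calculation on made-up numbers. VIDEOSIZE below is a hypothetical 8 GB stream, not the size of the example file, and awk stands in for bc just for illustration (awk rounds where bc truncates).

```shell
VIDEOSIZE=8000000000    # hypothetical video stream size in bytes
DURATION=7755.936       # seconds, from the mplayer output above

# bytes -> bits (* 8), per second (/ DURATION), bits -> kbit (/ 1000)
BITRATE=$(awk -v size="$VIDEOSIZE" -v dur="$DURATION" \
    'BEGIN { printf "%.2f", size * 8 / dur / 1000 }')
echo "$BITRATE kbit/s"
```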

Next we have to choose the target size. Let's say we want to put the result on a single-layer DVD, which can hold around 4.7 * 1,000,000,000 bytes. Because of the extra bytes needed for ext2, I couldn't get more than 4685697024 bytes on one. To play it safe, let's set the maximum target file size to 4680000000. [ Be warned that if you really want to write the result to a DVD, you cannot write files larger than 4 GB (4294967295 bytes) on an iso9660 filesystem. You can still write the large file as, for example, an ext2 filesystem image to the DVD, of course. ]

MAXTARGETSIZE=4280000000        # Write image as iso9660 (can be used with K3B DVD data projects).
# OR, if you know what you're doing (do you know how to create an ext2 image?!)
MAXTARGETSIZE=4680000000        # Write image as ext2. You will only be able to mount/read this DVD on Linux.
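
If you paste both assignments above in one go, the last one wins and you end up with the ext2 value. A small sketch that makes the choice explicit and checks it against the iso9660 file size limit; TARGETFS and ISO9660_FILE_LIMIT are hypothetical names introduced for this example only.

```shell
TARGETFS=iso9660                 # or: ext2
ISO9660_FILE_LIMIT=4294967295    # largest single file on iso9660, in bytes

case "$TARGETFS" in
    iso9660) MAXTARGETSIZE=4280000000 ;;
    ext2)    MAXTARGETSIZE=4680000000 ;;
    *)       MAXTARGETSIZE=0
             echo "unknown TARGETFS: $TARGETFS" >&2 ;;
esac

# A single file on iso9660 must stay below the 4 GB limit.
if [ "$TARGETFS" = iso9660 ] && [ "$MAXTARGETSIZE" -gt "$ISO9660_FILE_LIMIT" ]; then
    echo "warning: target size exceeds the iso9660 file size limit" >&2
fi
echo "MAXTARGETSIZE=$MAXTARGETSIZE"
```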

This means that our video target size is about,

MAXVIDEOSIZE=$(expr $MAXTARGETSIZE - $AUDIOSIZE); echo "$MAXVIDEOSIZE bytes"

The new bitrate will then be something like,

NEWBITRATE=$(echo "scale=6; tmp = $BITRATE * $MAXVIDEOSIZE / $VIDEOSIZE + 0.5; scale=0; tmp / 1" | bc)
echo "$NEWBITRATE kbit/s"
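
Since BITRATE is itself VIDEOSIZE * 8 / DURATION / 1000, the original video size cancels out of the formula above: the new bitrate depends only on the video budget and the duration. The sketch below checks this on example numbers. AUDIOSIZE is a hypothetical value (what a 448 kbit/s AC3 track of this duration would take), not a measurement of the example file, and awk is used in place of bc for illustration.

```shell
MAXTARGETSIZE=4280000000
AUDIOSIZE=434332416     # hypothetical: 448 kbit/s * 7755.936 s / 8
DURATION=7755.936

MAXVIDEOSIZE=$((MAXTARGETSIZE - AUDIOSIZE))

# NEWBITRATE = BITRATE * MAXVIDEOSIZE / VIDEOSIZE
#            = MAXVIDEOSIZE * 8 / DURATION / 1000, rounded to the nearest integer
NEWBITRATE=$(awk -v size="$MAXVIDEOSIZE" -v dur="$DURATION" \
    'BEGIN { printf "%.0f", size * 8 / dur / 1000 }')
echo "$MAXVIDEOSIZE bytes -> $NEWBITRATE kbit/s"
```

Because BITRATE above is cut off at two decimals, the two routes can occasionally differ by 1 kbit/s.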

Following this documentation, we'll have the bitrate scale proportional to the square root of resolution, hence bitrate / sqrt(width * height) = constant. This constant is thus,

RATIO=$(echo "scale=3; $BITRATE / sqrt ( $WIDTH * $HEIGHT )" | bc)

Furthermore, we want to keep the aspect ratio approximately the same (not entirely, because we need to do some cropping in order to end up with a resolution that is a multiple of 16 again). The approximate target aspect ratio is thus,

ASPECTRATIO=$(echo "scale=6; $WIDTH / $HEIGHT" | bc); echo $ASPECTRATIO

Now we can calculate the new height. Make sure it's a multiple of 16:

NEWHEIGHT=$(echo "scale=6; tmp = $NEWBITRATE / ( $RATIO * sqrt( $ASPECTRATIO ) ); scale=0; (tmp + 8) / 16 * 16" | bc)
NEWWIDTH=$(echo "scale=6; tmp = $NEWHEIGHT * $ASPECTRATIO; scale=0; tmp / 16 * 16" | bc)
echo "$NEWWIDTH x $NEWHEIGHT"
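
Because width = height * ASPECTRATIO, keeping RATIO constant makes the height scale linearly with the bitrate: the unrounded new height is simply HEIGHT * NEWBITRATE / BITRATE. Here is a sketch of the same calculation with awk and integer shell arithmetic. BITRATE and NEWBITRATE are hypothetical example values, and RAWHEIGHT is a name introduced here.

```shell
WIDTH=1280
HEIGHT=528
BITRATE=8251.74     # hypothetical original bitrate in kbit/s
NEWBITRATE=3967     # hypothetical target bitrate in kbit/s

# Unrounded target height, truncated to an integer.
RAWHEIGHT=$(awk -v h="$HEIGHT" -v nb="$NEWBITRATE" -v b="$BITRATE" \
    'BEGIN { print int(h * nb / b) }')

# Round the height to the nearest multiple of 16.
NEWHEIGHT=$(( (RAWHEIGHT + 8) / 16 * 16 ))

# Round the width down to a multiple of 16, so that the new aspect ratio
# never exceeds the original one.
NEWWIDTH=$(( NEWHEIGHT * WIDTH / HEIGHT / 16 * 16 ))
echo "$NEWWIDTH x $NEWHEIGHT"
```

In rare borderline cases this can land on a different multiple of 16 than the bc version, because RATIO there is cut off at three decimals.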

Because this changed the aspect ratio, we calculate how many pixels we need to crop from the width of the original to get the same aspect ratio again:

NEWASPECTRATIO=$(echo "scale=6; $NEWWIDTH / $NEWHEIGHT" | bc); echo $NEWASPECTRATIO

Once NEWWIDTH and NEWHEIGHT are chosen such that NEWASPECTRATIO does not exceed ASPECTRATIO, we can calculate the number of pixels to crop from the original width:

CROP=$(echo "scale=6; tmp = $WIDTH - $NEWASPECTRATIO * $HEIGHT + 0.5; scale=0; tmp / 1" | bc)
LEFTCROP=$(echo "scale=0; $CROP / 2" | bc)
CROPPEDWIDTH=$(echo "$WIDTH - $CROP" | bc)
echo "Needed crop is $CROP pixels."

Double check that CROP is non-negative.
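
A sketch of that check, on example numbers. The target resolution 608 x 256 is a hypothetical value, and awk is used in place of bc for illustration. The crop formula is the same as above.

```shell
WIDTH=1280; HEIGHT=528          # original resolution from above
NEWWIDTH=608; NEWHEIGHT=256     # hypothetical target resolution

# Width to remove so that the cropped original has the aspect ratio
# NEWWIDTH / NEWHEIGHT (rounded by adding 0.5 and truncating).
CROP=$(awk -v w="$WIDTH" -v h="$HEIGHT" -v nw="$NEWWIDTH" -v nh="$NEWHEIGHT" \
    'BEGIN { print int(w - nw / nh * h + 0.5) }')

if [ "$CROP" -lt 0 ]; then
    echo "CROP is negative ($CROP): pick a smaller NEWWIDTH" >&2
else
    LEFTCROP=$((CROP / 2))
    CROPPEDWIDTH=$((WIDTH - CROP))
    echo "crop $CROP pixels ($LEFTCROP on the left), leaving ${CROPPEDWIDTH}x$HEIGHT"
fi
```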

Now we can finally transform the video. mencoder is not capable of creating a sensible multiplexed MPEG4 (in the presence of B frames anyway): it either gives errors or produces desynced audio. We therefore have to ignore the audio (to avoid the errors) and generate a raw video stream (h264) instead of writing an mpeg4 file directly. This path is a bit risky, especially because mencoder doesn't write all the needed information to the video stream, but it worked for me in the cases that I tried.

We'll go for the highest quality transformation. Read the documentation of mencoder to find the meaning of the -x264encopts. Also see this page.

mencoder "$MKVFILE" -vf crop=$CROPPEDWIDTH:$HEIGHT:$LEFTCROP:0,scale=$NEWWIDTH:$NEWHEIGHT,hqdn3d=2:1:2 -ovc x264 \
    -x264encopts subq=6:partitions=all:8x8dct:me=umh:frameref=5:bframes=3:b_pyramid:weight_b:threads=auto:bitrate=$NEWBITRATE:pass=1 \
    -of rawvideo -ofps $MFPS -mc 0 -noskip -nosound -o /dev/null

And the second and third pass both use pass=3:

mencoder "$MKVFILE" -vf crop=$CROPPEDWIDTH:$HEIGHT:$LEFTCROP:0,scale=$NEWWIDTH:$NEWHEIGHT,hqdn3d=2:1:2 -ovc x264 \
    -x264encopts subq=6:partitions=all:8x8dct:me=umh:frameref=5:bframes=3:b_pyramid:weight_b:threads=auto:bitrate=$NEWBITRATE:pass=3 \
    -of rawvideo -ofps $MFPS -mc 0 -noskip -nosound -o /dev/null
rm -f video.h264
mencoder "$MKVFILE" -vf crop=$CROPPEDWIDTH:$HEIGHT:$LEFTCROP:0,scale=$NEWWIDTH:$NEWHEIGHT,hqdn3d=2:1:2 -ovc x264 \
    -x264encopts subq=6:partitions=all:8x8dct:me=umh:frameref=5:bframes=3:b_pyramid:weight_b:threads=auto:bitrate=$NEWBITRATE:pass=3 \
    -of rawvideo -ofps $MFPS -mc 0 -noskip -nosound -o video.h264

Don't forget to remove video.h264 before writing: if the file exists, it will be overwritten but not truncated. If the old file was longer, the result keeps the old size, and the tail of the old file is still present after the end of the new data. In any case, mencoder should end with a summary that prints the size; for example:

Video stream: 3892.795 kbit/s  (486599 B/s)  size: 3773719241 bytes  7755.289 secs  185943 frames
x264 [info]: slice I:3403  Avg QP:10.00  size: 47050
x264 [info]: slice P:82214 Avg QP:10.00  size: 29983
x264 [info]: slice B:100324 Avg QP:11.71  size: 11448
x264 [info]: mb I  I16..4: 15.1% 34.5% 50.4%
x264 [info]: mb P  I16..4:  3.6% 12.6% 16.2%  P16..4: 34.4% 16.4%  9.2%  2.0%  1.7%    skip: 4.0%
x264 [info]: mb B  I16..4:  0.7%  1.0%  2.4%  B16..8: 32.7%  5.7% 17.6%  direct:24.2%  skip:15.7%
x264 [info]: 8x8 transform  intra:36.7%  inter:28.7%
x264 [info]: ref P  65.8% 15.7%  8.2%  5.5%  4.7%
x264 [info]: ref B  75.2% 13.0%  5.3%  3.7%  2.9%
x264 [info]: kb/s:3892.7

Here you can see that the size is 3773719241 bytes (which should be smaller than MAXVIDEOSIZE). We can also see that there are I, P and B frames and that the real bitrate is 3892.7 kbit/s.
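
The three mencoder invocations differ only in the pass number and the output file, so they can be wrapped in a small function. This is a sketch, not a tested encoding script: run, encode_pass, DRYRUN, VF and X264 are names introduced here, the defaults are example values, and by default the commands are only printed. The mencoder options are exactly the ones used above.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRYRUN to the empty string to run them for real.
DRYRUN=${DRYRUN-1}
run() { if [ -n "$DRYRUN" ]; then echo "$@"; else "$@"; fi; }

# Example defaults; values computed earlier in the session are kept.
: "${MKVFILE:=input.mkv}"     # hypothetical file name
: "${MFPS:=24000/1001}"
: "${HEIGHT:=528}" "${CROPPEDWIDTH:=1254}" "${LEFTCROP:=13}"
: "${NEWWIDTH:=608}" "${NEWHEIGHT:=256}" "${NEWBITRATE:=3967}"

VF="crop=$CROPPEDWIDTH:$HEIGHT:$LEFTCROP:0,scale=$NEWWIDTH:$NEWHEIGHT,hqdn3d=2:1:2"
X264="subq=6:partitions=all:8x8dct:me=umh:frameref=5:bframes=3:b_pyramid:weight_b:threads=auto:bitrate=$NEWBITRATE"

# encode_pass PASS OUTPUT: one mencoder pass with the options used above.
encode_pass() {
    run mencoder "$MKVFILE" -vf "$VF" -ovc x264 \
        -x264encopts "$X264:pass=$1" \
        -of rawvideo -ofps "$MFPS" -mc 0 -noskip -nosound -o "$2"
}

encode_pass 1 /dev/null
encode_pass 3 /dev/null
run rm -f video.h264          # avoid overwriting without truncation
encode_pass 3 video.h264
```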

Now we transform this file into an mpeg4 file using MP4Box:

rm -f video.mp4
MP4Box -fps $FPS -add video.h264 video.mp4

If you get the error message "gf_import_h264: Assertion `nal_start' failed.", then have a look here. Also note that MP4Box writes to /tmp, creating a file there that is a little larger than video.h264. If it runs out of disk space, you get no warning or error, but it will crash in the second phase. So make sure you have enough room in /tmp. Alternatively, you can try:

rm -f video.mp4
mp4creator -c video.h264 -rate $FPS video.mp4

Here, too, it is important to delete video.mp4 before running either command, or it will create a mess by trying to append the stream.

For me this command prints "Error decoding sei message", but it seems you can ignore that. Just wait patiently until it is finished (it does its job quietly, which takes a while).
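
Since MP4Box fails silently when /tmp fills up, it may be worth checking the free space before starting it. A minimal sketch: has_room is a helper invented here, and the 5% safety margin is my own guess at "a little more", not a figure documented for MP4Box.

```shell
# has_room FILE DIR [MARGIN_PERCENT]
# Succeeds when DIR has at least the size of FILE plus the margin free.
has_room() {
    # Needed space in KiB: file size plus margin, rounded up.
    need_kb=$(( $(stat -c%s "$1") * (100 + ${3:-5}) / 100 / 1024 + 1 ))
    # Column 4 of the second line of 'df -Pk' is the available space in KiB.
    free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$need_kb" ]
}

if [ -f video.h264 ]; then
    has_room video.h264 /tmp \
        || echo "warning: /tmp may be too small for MP4Box" >&2
fi
```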

The last step is to multiplex the video with the audio and put them in a Matroska container again. Run the utility mmg, add both audio.ac3 and video.mp4, pick an appropriate output filename, and click 'Start muxing'. Alternatively, run mkvmerge directly with a command line like:

mkvmerge -o the.matrix.revolutions.2003.DVD5.x264.mkv -a 0 -D -S audio.ac3 -d 1 -A -S video.mp4 --track-order 0:0,1:1

Note that if you need to correct an audio/video desync, this can be done in this last command line. For example, suppose that in the original you had to press the minus key twice in mplayer, delaying the audio by 200 ms. Adding --delay 0:200ms directly in front of audio.ac3 then delays the audio track by 200 ms. See man mkvmerge for more details.
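
For clarity, this is where the option would go. The snippet only prints the command (a dry run). The --delay spelling is taken as-is from the paragraph above; I have not checked it against other mkvmerge versions, which may expect a different option (for example --sync), so consult man mkvmerge first. DELAY, OUTPUT and CMD are names introduced here.

```shell
DELAY=200ms     # example value from the text above
OUTPUT=the.matrix.revolutions.2003.DVD5.x264.mkv

# The delay option sits directly in front of the file it applies to.
CMD="mkvmerge -o $OUTPUT -a 0 -D -S --delay 0:$DELAY audio.ac3 -d 1 -A -S video.mp4 --track-order 0:0,1:1"
echo "$CMD"
```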