Monday, March 10, 2008

ICC vs GCC-4.3

Since GCC-4.3.0 is about to be released, I decided to take a look at its new Intel Core 2 tuning and SSSE3 code generation by emerging the package found on Dirtyepic's overlay. I compared the time it takes to re-encode a video with ffmpeg and a WAV sample with oggenc. The video clip I used can be found here (1920x816 MOV, 1:46, 128.3MB), while the WAV file is just its extracted audio track.

I used these four compiler collections and their CFLAGS:
  1. GCC 4.1.2 (-march=nocona -O3 -pipe -msse3)
  2. GCC 4.2.3 (-march=nocona -O3 -pipe -msse3)
  3. GCC 4.3.0-pre20080302 (-march=core2 -O3 -pipe -mssse3)
  4. ICC 10.1 20080112 (-O3 -xT -ipo -gcc)
My system's specs:
  • Q6600(B3) @ 3.21GHz
  • 400MHz FSB (266MHz northbridge strap)
  • 2GB PC3-15000 1603MHz (8-8-8-24)
  • kernel 2.6.24-gentoo-r3 (kernel lock preemption and preemptible kernel model, 1000Hz timer freq, see config)
I recompiled the following packages with emerge after changing my environment to the appropriate compiler using gcc-config:
  • x11-libs/libXau-1.0.3 USE="-debug"
  • x11-libs/libXdmcp-1.0.2 USE="-debug"
  • x11-libs/libXext-1.0.4 USE="-debug"
  • x11-libs/libX11-1.1.3-r1 USE="ipv6 -debug -xcb"
  • media-libs/libogg-1.1.3
  • media-libs/faac-1.26-r1
  • media-sound/lame-3.97-r1 USE="-debug -mp3rtp"
  • media-libs/xvid-1.1.3-r3 USE="(-altivec) -examples"
  • media-libs/x264-svn-20080301 USE="threads -debug"
  • media-libs/a52dec-0.7.4-r5 USE="-djbfft -oss"
  • media-libs/amrnb-7.0.0.0
  • media-libs/faad2-2.6.1 USE="-drm"
  • media-libs/libpng-1.2.25
  • dev-libs/libxml2-2.6.31 USE="ipv6 python readline -bootstrap -build -debug -doc -examples -test"
  • media-libs/libvorbis-1.2.0 USE="-doc"
  • media-libs/speex-1.2_beta3 USE="ogg sse"
  • media-libs/flac-1.2.1-r2 USE="cxx ogg sse -3dnow (-altivec) -debug -doc"
  • media-libs/libtheora-1.0_beta2-r1 USE="encode -doc -examples"
  • media-libs/freetype-2.3.5-r2 USE="X -bindist -debug -doc -utils"
  • media-libs/giflib-4.1.6 USE="X -rle"
  • media-sound/vorbis-tools-1.2.0 USE="flac nls ogg123 speex"
  • media-video/ffmpeg-0.4.9_p20070616-r2 USE="X a52 aac amr doc encode ieee1394 imlib ipv6 mmx ogg sdl theora threads truetype v4l vorbis x264 xvid zlib (-altivec) -debug -network -oss -test"
All remaining system libraries which ffmpeg and oggenc might link to were compiled with gcc 4.3.0 (e.g. glibc).

Note on ICC: Multi-file interprocedural optimization (-ipo) didn't work for lame, flac, a52dec and faad2, so for those I had to fall back to single-file interprocedural optimization and used '-O3 -xT -ip -gcc'. Also, ICC didn't seem to be able to compile ffmpeg, so ffmpeg itself was built with GCC. For that reason I needed to recompile libX11, libXau, libXdmcp and libXext with gcc-4.3.0, or else ffmpeg would complain about symbol lookup errors.

I used the following command for re-encoding the video clip:

ffmpeg -y -i 2744_trailer01-en_1920.mov \
-f avi -vcodec mpeg4 -b 800k -g 300 \
-bf 2 -acodec libfaac output.avi

I repeated it 5 times and got these results:

GCC-4.1.2: 437.24 sec
GCC-4.2.3: 436.98 sec
GCC-4.3.0: 436.17 sec
ICC 10.1: 429.72 sec
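
The post doesn't say how the runs were timed, so the following is only a minimal sketch of one way to do it: run the command a fixed number of times and record the wall-clock duration of each run. The helper name `time_runs` and the constant `FFMPEG_CMD` are made up for illustration; the ffmpeg arguments are the ones from the command above.

```python
import subprocess
import sys
import time

# The ffmpeg invocation from the post, split into an argument list.
FFMPEG_CMD = [
    "ffmpeg", "-y", "-i", "2744_trailer01-en_1920.mov",
    "-f", "avi", "-vcodec", "mpeg4", "-b", "800k", "-g", "300",
    "-bf", "2", "-acodec", "libfaac", "output.avi",
]


def time_runs(cmd, runs):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the benchmark if the command fails.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in FFMPEG_CMD
    # (and runs=5) on a machine that has the clip and ffmpeg installed.
    demo = time_runs([sys.executable, "-c", "pass"], 3)
    print("total: %.2f sec over %d runs" % (sum(demo), len(demo)))
```

Keeping the per-run durations rather than only the sum makes it easy to spot an outlier run, for example one disturbed by a background job.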

For ogg encoding I first extracted the audio track of the clip.

ffmpeg -y -i 2744_trailer01-en_1920.mov \
output.wav

and then encoded it with oggenc:

rm -f output.ogg; oggenc output.wav

This command was repeated 30 times and resulted in the following times:

GCC-4.1.2: 217.00 sec
GCC-4.2.3: 216.97 sec
GCC-4.3.0: 206.90 sec
ICC 10.1: 191.91 sec
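
To put the two result tables into perspective, here is a small sketch that converts the measured times into percentage savings relative to GCC-4.1.2. The numbers are copied from the results above; the function name `percent_faster` and the choice of GCC-4.1.2 as the baseline are my own assumptions for illustration.

```python
# Measured times in seconds, copied from the two result lists above.
FFMPEG = {"GCC-4.1.2": 437.24, "GCC-4.2.3": 436.98,
          "GCC-4.3.0": 436.17, "ICC 10.1": 429.72}
OGGENC = {"GCC-4.1.2": 217.00, "GCC-4.2.3": 216.97,
          "GCC-4.3.0": 206.90, "ICC 10.1": 191.91}


def percent_faster(times, baseline):
    """Return the time saved by each compiler as a percentage of the baseline's time."""
    base = times[baseline]
    return {name: (base - t) / base * 100.0 for name, t in times.items()}


if __name__ == "__main__":
    for label, table in (("ffmpeg", FFMPEG), ("oggenc", OGGENC)):
        print(label)
        for name, pct in percent_faster(table, "GCC-4.1.2").items():
            print("  %-10s %6.2f%%" % (name, pct))
```

Against that baseline, GCC-4.3.0 saves about 4.7% and ICC about 11.6% on oggenc, while on ffmpeg the savings are only about 0.2% and 1.7% respectively.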

When drawing the graphs, I decided to truncate the bars and show only the relevant upper parts. The graphs therefore don't represent absolute values; they only illustrate the differences in execution time between the code produced by each compiler collection:

ffmpeg chart
oggenc chart
It turns out that the GCC 4.3 branch yields quite a noticeable performance boost, at least for oggenc, probably thanks to its new Core 2 tuning option. ICC's optimizations are still unmatched and show that GCC could still use some improvement. Note, however, that ICC's lead in video encoding was most probably caused only by its shared libraries (e.g. flac), because ffmpeg itself was compiled with GCC (see above).

In conclusion, GCC, and especially the upcoming release, produces code which is more than fast enough for a normal desktop system. Even with libraries that benefit greatly from ICC's vectorization techniques, the advantage of ICC over GCC is small and wouldn't justify the time spent on recompilation and porting.
