Monday, March 10, 2008

ICC vs GCC-4.3

Since GCC 4.3.0 is about to be released, I decided to take a look at its new Intel Core 2 tuning and SSSE3 code generation by emerging the package from Dirtyepic's overlay. I compared the time it takes to re-encode a video with ffmpeg and a WAV sample with oggenc. The video clip I used can be found here (1920x816 MOV, 1:46, 128.3MB), while the WAV file is just its extracted audio track.

I used these four compiler collections and their CFLAGS:
  1. GCC 4.1.2 (-march=nocona -O3 -pipe -msse3)
  2. GCC 4.2.3 (-march=nocona -O3 -pipe -msse3)
  3. GCC 4.3.0-pre20080302 (-march=core2 -O3 -pipe -mssse3)
  4. ICC 10.1 20080112 (-O3 -xT -ipo -gcc)
My system's specs:
  • Q6600(B3) @ 3.21GHz
  • 400MHz FSB (266MHz northbridge strap)
  • 2GB PC3-15000 at 1603MHz (8-8-8-24)
  • kernel 2.6.24-gentoo-r3 (kernel lock preemption and preemptible kernel model, 1000Hz timer freq, see config)
After switching my environment to the appropriate compiler with gcc-config, I recompiled the following packages with emerge:
  • x11-libs/libXau-1.0.3 USE="-debug"
  • x11-libs/libXdmcp-1.0.2 USE="-debug"
  • x11-libs/libXext-1.0.4 USE="-debug"
  • x11-libs/libX11-1.1.3-r1 USE="ipv6 -debug -xcb"
  • media-libs/libogg-1.1.3
  • media-libs/faac-1.26-r1
  • media-sound/lame-3.97-r1 USE="-debug -mp3rtp"
  • media-libs/xvid-1.1.3-r3 USE="(-altivec) -examples"
  • media-libs/x264-svn-20080301 USE="threads -debug"
  • media-libs/a52dec-0.7.4-r5 USE="-djbfft -oss"
  • media-libs/amrnb-7.0.0.0
  • media-libs/faad2-2.6.1 USE="-drm"
  • media-libs/libpng-1.2.25
  • dev-libs/libxml2-2.6.31 USE="ipv6 python readline -bootstrap -build -debug -doc -examples -test"
  • media-libs/libvorbis-1.2.0 USE="-doc"
  • media-libs/speex-1.2_beta3 USE="ogg sse"
  • media-libs/flac-1.2.1-r2 USE="cxx ogg sse -3dnow (-altivec) -debug -doc"
  • media-libs/libtheora-1.0_beta2-r1 USE="encode -doc -examples"
  • media-libs/freetype-2.3.5-r2 USE="X -bindist -debug -doc -utils"
  • media-libs/giflib-4.1.6 USE="X -rle"
  • media-sound/vorbis-tools-1.2.0 USE="flac nls ogg123 speex"
  • media-video/ffmpeg-0.4.9_p20070616-r2 USE="X a52 aac amr doc encode ieee1394 imlib ipv6 mmx ogg sdl theora threads truetype v4l vorbis x264 xvid zlib (-altivec) -debug -network -oss -test"
All remaining system libraries that ffmpeg and oggenc might link against (e.g. glibc) were compiled with GCC 4.3.0.

Note on ICC: multi-file interprocedural optimization didn't work for lame, flac, a52dec and faad2, so for those I had to resort to single-file interprocedural optimization and used '-O3 -xT -ip -gcc'. Also, ICC didn't seem able to compile ffmpeg, so ffmpeg itself was built with GCC. Because of that I also had to recompile libX11, libXau, libXdmcp and libXext with gcc-4.3.0, otherwise ffmpeg would complain about symbol lookup errors.

I used the following command for re-encoding the video clip:

ffmpeg -y -i 2744_trailer01-en_1920.mov \
-f avi -vcodec mpeg4 -b 800k -g 300 \
-bf 2 -acodec libfaac output.avi

I repeated it 5 times and got these results:

GCC-4.1.2: 437.24 sec
GCC-4.2.3: 436.98 sec
GCC-4.3.0: 436.17 sec
ICC 10.1: 429.72 sec
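The post doesn't say how the repeated runs were timed. Purely as an illustration (not necessarily how the numbers above were measured), a small bash function can repeat a command and report the total wall-clock time. `bench` is a made-up name, and the sketch assumes GNU date for the `%N` nanosecond field:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and print the total
# wall-clock time in seconds (GNU date's %N gives nanoseconds).
bench() {
    local runs=$1 i start end
    shift
    start=$(date +%s.%N)
    for ((i = 0; i < runs; i++)); do
        # Discard the encoder's console output; abort on the first failure.
        "$@" >/dev/null 2>&1 || return 1
    done
    end=$(date +%s.%N)
    # bash has no floating point, so let awk do the subtraction.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}

# Usage, e.g. for the ffmpeg test:
#   bench 5 ffmpeg -y -i 2744_trailer01-en_1920.mov -f avi -vcodec mpeg4 \
#       -b 800k -g 300 -bf 2 -acodec libfaac output.avi
```

Timing the whole loop rather than each run keeps the sketch short; per-run times would need one `date` call per iteration.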

For the Ogg Vorbis test I first extracted the clip's audio track:

ffmpeg -y -i 2744_trailer01-en_1920.mov \
output.wav

and then encoded it with oggenc:

rm -f output.ogg; oggenc output.wav

This command was repeated 30 times and resulted in the following times:

GCC-4.1.2: 217.00 sec
GCC-4.2.3: 216.97 sec
GCC-4.3.0: 206.90 sec
ICC 10.1: 191.91 sec

When drawing the graphs I truncated the bars and only show their relevant upper parts. The graphs therefore don't represent absolute values; they demonstrate the differences in execution time between the code produced by each compiler collection:

ffmpeg chart
oggenc chart
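Because the charts are truncated, it helps to put the differences into numbers. The relative speed-up over GCC 4.1.2 is (t_base - t) / t_base * 100; the small bash/awk sketch below computes it from the timings listed above (the `speedup` function name is just illustrative):

```bash
#!/usr/bin/env bash
# Percentage by which a run time t beats a baseline time base:
#   (base - t) / base * 100
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", (base - t) / base * 100 }'
}

# Baseline: GCC 4.1.2 (437.24 s for ffmpeg, 217.00 s for oggenc)
while read -r bench name t; do
    case $bench in
        ffmpeg) base=437.24 ;;
        oggenc) base=217.00 ;;
    esac
    printf '%-7s %-10s %6s %%\n' "$bench" "$name" "$(speedup "$base" "$t")"
done <<'EOF'
ffmpeg GCC-4.2.3 436.98
ffmpeg GCC-4.3.0 436.17
ffmpeg ICC-10.1  429.72
oggenc GCC-4.2.3 216.97
oggenc GCC-4.3.0 206.90
oggenc ICC-10.1  191.91
EOF
```

Each output line shows the benchmark, the compiler and how much less time its build needed than the GCC 4.1.2 build.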
It turns out that the GCC 4.3 branch yields a noticeable performance boost for oggenc (about 4.7% faster than GCC 4.1.2, versus only about 0.2% for ffmpeg), probably thanks to its new Core 2 tuning option. ICC's optimizations are still unmatched (about 11.6% faster than GCC 4.1.2 for oggenc) and show that GCC still has room for improvement. That said, ICC's lead in video encoding was most likely caused only by the shared libraries it compiled (e.g. faac), because ffmpeg itself was compiled with GCC (see above).

In conclusion, GCC, and especially the upcoming release, produces code that is more than fast enough for a normal desktop system. Even with libraries that benefit greatly from ICC's vectorization techniques, the advantage of ICC over GCC is negligible and wouldn't justify the time spent on recompiling and porting.



Thursday, January 24, 2008

HDAPS patch for 2.6.24-rc8

I've been a Thinkpad fan since the first time I laid hands on one, so my next portable companion had to be the X61, which turned out to be exceptionally reliable (and portable), with all its features working in Linux, including the fingerprint sensor and the hard disk Active Protection System (HDAPS). Since I had reasons to use the iwlwifi driver for my 3945ABG WiFi card, I chose the in-kernel version provided by the upcoming 2.6.24 release. This kernel also supports the Intel HDA sound card well. I prefer to have all drivers in one place rather than having to compile them externally, so this was the way to go.
What was lacking was the HDAPS disk-parking kernel patch, which isn't yet included in the mainline kernel. Up until release candidate 6 (I don't know about rc7), the patch 1077-002.patch found on thinkwiki.org worked after some fiddling with the line numbers. As of rc8 I needed to swap some functions around to make it compile; the resulting patch can be found here.

Maybe later I'll take a look at the 'error check fix' mentioned on thinkwiki.org as well.



Sunday, October 21, 2007

Software RAID on Gentoo and Debian

Up until now I backed up the hard drive holding my important files with unison. I did that by regularly plugging a similar second hard drive into one of my empty drive bays, running unison, waiting for it to finish and finally pulling the second drive out again to store it on my shelf. This obviously spares at least one drive from running all the time, but it is a very tedious task, especially with IDE drives, which aren't hot-swappable.

So I recently went through the very unpleasant experience of losing the data on one of my old IDE hard drives; of course it was the one I didn't back up with unison, although it sat in the same file server. Rescuing the data wasn't possible: even forensic tools like foremost and scalpel couldn't retrieve all the files. For example, the recovered mp3 files were all either 7.2MB or 42MB in size and contained everything but a valid music stream. foremost didn't even finish and segfaulted after carving about 30% of the disk.

To make life easier and hopefully prevent such tragedies in the future, I ordered two identical 500GB Seagate SATA hard drives. Seagate doesn't exactly build silent drives, but it grants a five-year warranty, so these seemed perfect for the file server, which runs in the cellar. While planning to use RAID1 for the Seagate drives, I decided to do the same with my workstation, this time not for redundancy but for the sake of access speed, and bought two Western Digital HDDs of the same size, which promised to be quieter and also had 16MB of cache (instead of 8MB on the Seagate disks).

Once they arrived, I added a spare PCI SATA RAID controller with a Silicon Image chipset to the file server and defined a RAID1 array in the card's BIOS setup tool. The stock Debian kernel contained all needed modules, and after installing dmraid with apt I could view the setup:

# dmraid -r
/dev/sda: sil, "sil_ahbgafcdfhah", mirror, ok, 976771120 sectors, data@ 0
/dev/sdb: sil, "sil_ahbgafcdfhah", mirror, ok, 976771120 sectors, data@ 0

# dmraid -s sil_ahbgafcdfhah
*** Active Set
name : sil_ahbgafcdfhah
size : 976771120
stride : 0
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
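The sizes dmraid reports are in 512-byte sectors, so 976771120 sectors work out to 500,106,813,440 bytes, i.e. the advertised 500GB. As an illustrative sketch (the function name is made up, and the parsing assumes exactly the `dmraid -r` line format shown above), that output can be summarised like this:

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise `dmraid -r` output as
# device, RAID set, level, status and size in decimal GB.
# Assumes the comma-separated format shown above and 512-byte sectors.
dmraid_summary() {
    awk -F', ' '{
        split($1, dev, ":")           # "/dev/sda: sil" -> "/dev/sda"
        set = $2; gsub(/"/, "", set)  # strip the quotes around the set name
        split($5, sec, " ")           # "976771120 sectors" -> 976771120
        printf "%s %s %s %s %.1f GB\n", dev[1], set, $3, $4, sec[1] * 512 / 1e9
    }'
}

# Usage (as root): dmraid -r | dmraid_summary
```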

Mounting works the usual way, except that instead of the raw device node you mount the corresponding mapped device, in my case /dev/mapper/sil_ahbgafcdfhah .

My workstation also has a (fake/software) RAID controller (onboard ICH7), but it wasn't as easy to set up, because the RAID1 array acts as the boot drive and the kernel needs to be booted accordingly. Also, grub doesn't recognize RAID setups, so I needed to create a separate small primary partition at the beginning of the RAID drive that is readable whether the disk is accessed as a RAID drive or through its underlying disks. On this partition I store the kernel and its ramdisk, as well as grub's stage files. I did all that by booting the Gentoo install CD, setting up dmraid as on the Debian box, formatting the RAID drive and copying the contents of the old drive's partitions with:

# find . | cpio -pdum /mnt/target

The second problem was building a working kernel and its ramdisk. My custom-built kernels didn't work because they were unable to initialize the dmraid mapping. Luckily, Gentoo's genkernel package saved the day:

# genkernel --dmraid all --menuconfig

After compiling the kernel in my chrooted target system, I copied the resulting files to the boot partition mentioned above and added the following entry to grub's config file:

title=Gentoo Linux 2.6.23 genkernel
root (hd0,3)
kernel /kernel-genkernel-x86_64-2.6.23-gentoo root=/dev/ram0 init=/linuxrc real_root=/dev/mapper/isw_bcaheaacjd_NewSystem2 dodmraid vga=792 ramdisk=8192
initrd (hd0,3)/initramfs-genkernel-x86_64-2.6.23-gentoo

Note that grub's root partition here, (hd0,3), is partition number four only by its entry in the partition table (grub counts from zero); that has nothing to do with its location at the very beginning of the disk.
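To make grub's zero-based (hdX,Y) notation explicit, here is a tiny, purely hypothetical bash helper that converts it into 1-based disk and partition numbers. "Disk 1" means the first disk in BIOS order, which is not necessarily the one Linux calls sda:

```bash
#!/usr/bin/env bash
# Hypothetical helper: grub legacy counts disks and partitions from zero,
# so "(hd0,3)" is the first BIOS disk, fourth partition.
grub_to_human() {
    local spec=${1#\(}            # "(hd0,3)" -> "hd0,3)"
    spec=${spec%\)}               # "hd0,3)"  -> "hd0,3"
    local disk=${spec%%,*} part=${spec#*,}
    disk=${disk#hd}               # "hd0" -> "0"
    echo "disk $((disk + 1)), partition $((part + 1))"
}

# Usage: grub_to_human '(hd0,3)'
```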



Saturday, October 20, 2007

Persona 2 in pSX

I'm a huge fan of the Megami Tensei (女神転生) series by ATLUS (and formerly Namco), not only because of its exceptionally eerie atmosphere but also because of its great soundtrack and unmatched NPC interaction. By interaction I don't mean simple four-way dialogs based on a like/dislike meter, as found in The Elder Scrolls. In Persona, for example, you have to find out a demon's personality, which can be one of cheerful, timid, gloomy, bluff, temperant, arrogant, wise and fool, or a combination of up to three. Depending on the (mix of) personalities, demons react to your actions with fear, anger, happiness or interest, or a logical mix of two. To make things more complicated, there are countless (well, in fact only 57) ways to interact with a demon, depending on which character or combination of characters from your party you use. Interactions thus range from reading horoscopes and giving advice to interrogations and passionate gazes. Not to mention that talking is not always an option, especially if a previous action made the demon hate or fear you. Anyway, negotiating successfully with demons is important because they might give you items or join your cause.

Unfortunately, current Shin Megami Tensei titles such as Nocturne and Digital Devil Saga don't feature such complex interactions, though they make up for it with other great ideas.

Well, as I lost my Persona saves and bought an import copy of Persona 2, I decided to give it a spin. My PS2 is quite loud, though, and I'm obviously too lazy to clean its fans, so I decided to get ePSXe running on my Gentoo system. Portage told me that epsxe was masked, so I looked around a bit more and finally found pSX, though not in Portage. After manually adding some missing ia32 libraries it was all set up; there's no need to download and configure plugins for each and every vital part of a PSX emulator as with ePSXe. It just worked out of the box, even with real optical drives. OK, I prefer using an image after all:

# cdrdao read-cd --read-raw --datafile persona2.bin --device 3,0,0 --driver generic-mmc-raw persona2.toc

A normal (read: cheap, without DualShock support) PS2-controller-to-USB adapter works fine after running:

# modprobe joydev
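To check that the adapter actually delivers events once joydev is loaded, the raw joystick device can be read directly. The Linux joystick API delivers 8-byte `struct js_event` records (`__u32 time`, `__s16 value`, `__u8 type`, `__u8 number`). The sketch below decodes them from `od` output; it assumes the pad shows up as /dev/input/js0 and a little-endian (x86) machine, and `decode_js_events` is a made-up name:

```bash
#!/usr/bin/env bash
# Decode struct js_event records from `od -An -v -t u1 -w8` output.
# Each record is 8 bytes, little endian:
#   __u32 time; __s16 value; __u8 type; __u8 number;
# type bits: 0x01 = button, 0x02 = axis, 0x80 = initial-state event.
decode_js_events() {
    awk '{
        value = $5 + 256 * $6                  # bytes 4-5: s16 value
        if (value >= 32768) value -= 65536     # sign-extend
        type = $7; number = $8
        init = (type >= 128) ? " (init)" : ""
        if (type >= 128) type -= 128           # strip the init flag
        kind = (type == 1) ? "button" : (type == 2 ? "axis" : "other")
        printf "%s %d = %d%s\n", kind, number, value, init
    }'
}

# Usage (press buttons / move sticks, Ctrl-C to stop); stdbuf keeps
# od line-buffered so events show up immediately:
#   stdbuf -oL od -An -v -t u1 -w8 /dev/input/js0 | decode_js_events
```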

See for yourself:


