Martin Guy <martinwguy@yahoo.it>
25 August 2007
Last updated: 18 December 2008
The Maverick Crunch code generation in GCC has never worked. Among other things, it cannot reliably compare two floating point numbers. For details, see the file compare-bug.
To use this, you need an ARM EABI kernel and file system, such as the new Debian "armel" port or the Angstrom distribution, and you need to build your own fixed version of GCC.
gcc-4.1.2 will build in 64MB (just!). If you have some swap space, so much the better.
To build gcc-4.2.0 with full optimisation you need more than 64MB RAM; if you only have 64MB you can work around this with some manual intervention, described below.
With 64MB it gets into trouble when it arrives at gcc/insn-recog.c: if you have no swap space, gcc will die horribly and/or the kernel will start killing processes. If you do have some swap space it will take forever, thrashing pages in and out of swap and getting nowhere. You can see this happening by launching "vmstat 5" on another console and watching the "si so" columns go into the hundreds and the "us" column go down to zero.
To get past the file that requires more than 64MB of virtual memory, wait until the compilation reaches insn-recog.c (or until it fails, if you have no swap), then run "make CFLAGS=-g" to compile the problem file without optimisation. Once that file has compiled, interrupt the build and continue with plain "make".
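Before starting a long build, it can help to check how much RAM plus swap the box really has. The following is only a sketch: the function names are made up for illustration, and the 64MB threshold is simply taken from the figure quoted above, not from any measurement of GCC itself.

```shell
#!/bin/sh
# Print MemTotal + SwapTotal in MB from a meminfo-style file.
# The file defaults to /proc/meminfo; it is a parameter only so that the
# function can be tried on sample data.
total_vm_mb() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { print int(kb / 1024) }' \
        "${1:-/proc/meminfo}"
}

# Assumption: 64MB or less of RAM+swap means insn-recog.c will not compile
# with optimisation (threshold taken from the text, not measured).
needs_workaround() {
    [ "$(total_vm_mb "$1")" -le 64 ]
}

if [ -r /proc/meminfo ] && needs_workaround; then
    echo "64MB or less of RAM+swap: expect to need 'make CFLAGS=-g' for insn-recog.c"
fi
```

Even when this check passes, a machine with 64MB of RAM and some swap may still thrash on that file, so watching "vmstat 5" remains the better guide.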
Alternatives are to build it on an ARM simulator such as QEMU where up to 247MB of simulated RAM can be configured, or to use a cross-compiler on a big computer; these options are not covered here.
Here is the original tarball from futaris.org, of which there is also an unpacked browsable copy on this machine. Here are single files containing all the necessary patches in one lump, for gcc-4.1.2 and for gcc-4.2.0.
I built a native compiler on the ARM box itself running the "armel" Debian ARM EABI port.
Here's how to make a Crunch-capable version of gcc-4.1.2. To make gcc-4.2.0, just s/4.1.2/4.2.0/.
wget ftp://sourceware.org/pub/gcc/releases/gcc-4.1.2/gcc-4.1.2.tar.bz2
tar xjf gcc-4.1.2.tar.bz2
wget http://files.futaris.org/gcc/crunch.tar.bz2
tar xjf crunch.tar.bz2
cd gcc-4.1.2
for a in `cat ../gcc/gcc-4.1.2/series`
do
patch -p1 < ../gcc/gcc-4.1.2/$a
done
cd ..
(One patch in the gcc-4.2.0 SRC_URI list,
fix-ICE-in-arm_unwind_emit_set.diff, has already been applied
in the mainline gcc tarball. "Patch" reports it as an already-applied or
reversed patch: just say No :)
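If you would rather not answer the prompt by hand, the patch loop can be run non-interactively. This is a sketch, not part of the futaris instructions: apply_series is a made-up helper name, it assumes GNU patch (whose -N option ignores patches that seem to be reversed or already applied), and it assumes the series file lists one patch name per line with no comments or options.

```shell
#!/bin/sh
# Apply every patch named in DIR/series to the tree in the current directory.
# Hypothetical helper; -N is GNU patch's "--forward" option.
apply_series() {
    dir=$1
    while read -r p; do
        [ -n "$p" ] || continue          # skip blank lines
        echo "applying $p"
        # patch reads the diff from its own redirected stdin, so it does not
        # consume the series file that feeds the loop.
        patch -p1 -N < "$dir/$p" ||
            echo "skipped or partly rejected: $p" >&2
    done < "$dir/series"
}

# Usage, run from inside the unpacked gcc source tree:
#   cd gcc-4.1.2 && apply_series ../gcc/gcc-4.1.2
```

A patch reported as skipped deserves a look before you carry on: an already-applied patch is harmless, but a genuinely rejected hunk is not.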
mkdir build-4.1.2
cd build-4.1.2
../gcc-4.1.2/configure --disable-nls --enable-shared --with-system-zlib \
--without-included-gettext --enable-threads=posix \
--enable-clocale=gnu --enable-mpfr --disable-libssp \
--disable-bootstrap --enable-languages=c \
--program-suffix=-4.1.2-futaris arm-futaris-linux-gnueabi
make
make install
make check-gcc RUNTESTFLAGS="ieee.exp --target_board=unix/-mcpu=ep9312/-mfpu=maverick/-mfloat-abi=softfp"
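The test run leaves a DejaGnu summary file behind, and counting its result lines is a quick way to compare one patch set with another. A minimal sketch follows; count_results is a made-up name, and the location of gcc.sum inside the build tree varies between GCC versions, so the path is left as an argument.

```shell
#!/bin/sh
# Print how many lines of each result kind a DejaGnu .sum file contains.
# $1 = path to the .sum file (hypothetical helper, path supplied by caller).
count_results() {
    for kind in PASS FAIL XFAIL XPASS UNSUPPORTED; do
        # grep -c prints 0 when nothing matches, which is what we want here.
        printf '%s %s\n' "$kind" "$(grep -c "^$kind:" "$1")"
    done
}

# Usage (the path is an assumption; look for gcc.sum under the build directory):
#   count_results gcc/testsuite/gcc.sum
```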
gcc-4.1.2-futaris -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 foo.c
or
./configure CC=gcc-4.1.2-futaris CFLAGS="-O2 -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp"
make
make install
One of the gcc patches, unbreak-armv4t.patch, already changes the default instruction set from armv5 to armv4. Without this, the gcc build dies halfway through saying "Illegal instruction".
Other flags that could be used in the gcc configuration line are --with-cpu=ep9312, --with-fpu=maverick and presumably --with-float-abi=softfp, but I never found a combination that would work with all the GCC components.
It unpacks to three shell scripts: 1-download.sh, 2-unpack.sh and 3-build.sh and contains patches for the abovementioned tools. I've applied their gcc patches to gcc-4.1.2, built and tested it as above, and it fails 15 of the tests in gcc's IEEE test suite (while the futaris patches pass 100% of the tests). With the necessary -ffix-crunch-d1 flag, it fails 14 of the tests.
However, it does seem to work with a couple of small testpieces that spring bugs in unpatched gcc, and LAME seems to work ok with it and at the same speed as the futaris-patched versions, give or take 1%; the output file is playable but is different from the output produced by a correct FP implementation.
Except where -O flags are specified, LAME was compiled with -O2.
| Compiler | Flags | Text size | User CPU time |
|---|---|---|---|
| gcc-4.1.2 prerelease (Debian arm) | none (emulated FPA) | 267363 | 68m10.590s |
| gcc-4.2.1 (Debian armel) | none (soft-float) | 302619 | 5m55.766s 5m54.12s |
| gcc-4.2.1 (Debian armel) | -mcpu=ep9312 -mfast-math | 301891 | 6m10.106s (!) |
| gcc-4.1.2-futaris | -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp | 289693 | 2m28.8s 2m28.6s 2m27.544s |
| gcc-4.1.2-futaris | -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -Os | 254291 | 2m31.5s 2m31.5s |
| gcc-4.2.0-futaris | -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp | 287061 | 2m29.85s 2m29.86s 2m28.394s |
| gcc-4.1.2-cirrus | -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp | 287065 | 2m28.514s |
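To put the timings in proportion, the "XmY.Zs" figures can be converted to seconds and divided. This is just arithmetic on the numbers in the table; to_seconds and speedup are made-up helper names.

```shell
#!/bin/sh
# Convert a time such as "5m55.766s" to seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second timing is than the first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

# Soft-float (gcc-4.2.1 armel) against gcc-4.1.2-futaris Crunch code:
speedup 5m55.766s 2m28.8s    # prints 2.39
```

So, on this LAME run, the futaris-patched compilers are roughly 2.4 times faster than soft float.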
Or, to summarise
The only floating point option to pass all the "paranoia" tests on Cirrus hardware is soft float.
Conclusion: there are still no patches to gcc that generate reliable Maverick Crunch code.
Interestingly, the standard gcc for x86 can compile paranoia.c but fails many of the tests. Without optimisation, gcc-3.4 and gcc-4.1.2 have 1 defect and 1 flaw; with -O2 gcc-4.1.2 has 3 failures, 4 serious defects, 3 defects and 2 flaws.
In April, Hasjim sent me a twisty little maze of patch ideas for gcc 4.2.1, 4.2.2, 4.2.3 and 4.3.0, all different. I've tried and compared them, analysed and fiddled with them and have cooked up a set for 4.3.2.
The best maximum-speed options are currently:
-mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -mcirrus-fix-invalid-insns -O2 -mfast-math
giving 5.9 Mflops on the FFTW benchmark "tests/bench -opatient cf1024", as opposed to 5.4 with the 4.1.2 and 4.2.0 sets. Most of the speed improvement is due to the awesome pipeline definition patch.
The strategies are:
Since we can't ever attain perfect IEEE math without disabling the FP add and sub instructions too, I prefer to declare a precision of ±2^-1022 for doubles and enable all the denorm-unsafe instructions for maximum speed.
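For reference, 2^-1022 is the smallest normalised IEEE double; anything smaller in magnitude is a denormal. The sketch below only prints those values using awk's ordinary double arithmetic on whatever machine runs it; it says nothing about what the Crunch unit itself does with them, and show_limits is a made-up name.

```shell
#!/bin/sh
# Print the smallest normalised double and the denormal just below it,
# using the host's awk (IEEE double arithmetic assumed).
show_limits() {
    awk 'BEGIN {
        min_normal = 2 ^ (-1022)
        printf "smallest normal double: %.17g\n", min_normal
        printf "half of it (a denormal): %.17g\n", min_normal / 2
    }'
}

show_limits
```

The first line shows 2.2250738585072014e-308, the same value as DBL_MIN in C's float.h.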
We discuss these things on the linux-cirrus mailing list, carbon copied to the GCC groups and others when appropriate.