Fixing Maverick Crunch FPU code generation
in GCC 4.1.2 and 4.2.0

Martin Guy <martinwguy@yahoo.it>
25 August 2007
Last updated: 18 December 2008


News

18 December 2008
I've cooked up a faster and more reliable set of patches than these for GCC 4.3.2. See the end of this document or their own separate page.

Preamble

I've been trying to make the Maverick Crunch FPU usable with Debian on Cirrus Logic EP93xx chips, which contain an ARM920T integer processor and a Maverick Crunch floating point coprocessor.

The Maverick Crunch code generation in GCC has never worked. Among other things, it cannot reliably compare two floating point numbers. For details, see the file compare-bug.

To use this, you need an ARM EABI kernel and file system, such as the new Debian "armel" port or the Angstrom distribution, and you need to build your own fixed version of GCC.

System requirements

You will need at least 64MB of RAM to build gcc.

gcc-4.1.2 will build in 64MB (just!). If you have some swap space, so much the better.

To build gcc-4.2.0 with full optimisation you need more than 64MB RAM; if you only have 64MB you can work around this with some manual intervention, described below.

With 64MB it gets into trouble when it arrives at gcc/insn-recog.c: if you have no swap space, gcc will die horribly and/or the kernel will start killing processes. If you do have some swap space it will take forever, thrashing pages in and out of swap and getting nowhere. You can see this happening by launching "vmstat 5" on another console and watching the "si so" columns go into the hundreds and the "us" column go down to zero.
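
The vmstat check described above can be automated. The following is only a sketch: it assumes the usual procps vmstat column order (si, so and us in columns 7, 8 and 13), the thresholds are illustrative guesses rather than measured values, and is_thrashing is a made-up helper name.

```shell
# Sketch: flag vmstat samples that look like thrashing.
# Assumed column order (procps vmstat):
#   r b swpd free buff cache si so bi bo in cs us sy id wa
# Thresholds (si/so >= 100, us <= 5) are illustrative guesses.
is_thrashing() {
    # Reads one vmstat data line on stdin.
    # Exit status 0 means "looks like thrashing".
    awk '{ si = $7; so = $8; us = $13 }
         END { exit !((si >= 100 || so >= 100) && us <= 5) }'
}

# Usage on a live system (uncomment): sample every 5 seconds.
# vmstat 5 | while read -r line; do
#     case $line in *[a-z]*) continue ;; esac   # skip the header lines
#     printf '%s\n' "$line" | is_thrashing && echo "thrashing: $line"
# done
```

If this reports thrashing for more than a few samples in a row while insn-recog.c is being compiled, the build is unlikely to get anywhere on its own.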

You can get past the file that requires more than 64MB of virtual memory by waiting until the compilation gets to insn-recog.c (or until it fails, if you have no swap), then using "make CFLAGS=-g" to compile the problem file without optimisation. When it has succeeded on that file, interrupt it and continue with plain "make".

Alternatives are to build it on an ARM simulator such as QEMU where up to 247MB of simulated RAM can be configured, or to use a cross-compiler on a big computer; these options are not covered here.

Fixing and building gcc

I have found sets of patches for gcc-4.1.2 and gcc-4.2.0 that repair it, targeted at the OpenEmbedded project. For the surrounding technical discussion, follow this thread in the gcc mailing list and the patches' author's warning about remaining problems; essentially he says:

Here is the original tarball from futaris.org, of which there is also an unpacked browsable copy on this machine, and here are single files containing all the necessary patches in one lump for gcc-4.1.2 and for gcc-4.2.0.

I built a native compiler on the ARM box itself running the "armel" Debian ARM EABI port.

Here's how to make a Crunch-capable version of gcc-4.1.2. To make gcc-4.2.0, just s/4.1.2/4.2.0/.

Using the new compiler

Now you can compile things using real Maverick instructions, by using lines like
    gcc-4.1.2-futaris -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 foo.c
or
    ./configure CC=gcc-4.1.2-futaris CFLAGS="-O2 -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp"
    make
    make install
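
One way to check that the compiler is really emitting Crunch instructions, rather than calls to the soft-float library, is to look at the generated assembly: Maverick Crunch mnemonics all begin with "cf" (cfldrs, cfadds, cfmuld and so on). The sketch below assumes the patched compiler is installed under the name used above; count_crunch_insns is a hypothetical helper, not part of any toolchain.

```shell
# Count the lines of an assembly listing whose mnemonic starts with "cf",
# the prefix used by Maverick Crunch instructions (cfldrs, cfadds, ...).
# Labels and assembler directives (which start with ".") are not counted.
count_crunch_insns() {
    grep -c -E '^[[:space:]]+cf[a-z0-9]+[[:space:]]' "$1"
}

# Usage on the ARM box (uncomment):
# gcc-4.1.2-futaris -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 -S foo.c -o foo.s
# count_crunch_insns foo.s
```

A count of zero for a source file that does floating point arithmetic suggests that the Maverick flags did not take effect.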

Trying to set Maverick Crunch as the default compiler target

I haven't yet figured out how to make these three settings the default values.

One of the gcc patches, unbreak-armv4t.patch, already changes the default instruction set from armv5 to armv4. Without this, the gcc build dies halfway through, saying "Illegal instruction".

Other flags that could be used in the gcc configuration line are --with-cpu=ep9312, --with-fpu=maverick and presumably --with-float-abi=softfp, but I never found a combination that would work with all the GCC components.

Maybe I need to build a specially configured version of binutils too, but really these are gcc/as bugs: the ep9312 cpu type should be removed and gcc should clue the assembler in when cpu=arm920t and fpu=maverick the same as it currently does when cpu=ep9312.

Alternative patch sets

Cirrus have just published a set of patches for gcc-4.1.2, binutils-2.17 and uClibc-0.9.29 as crunch-tools-1.4.0.tar.bz2.

It unpacks to three shell scripts, 1-download.sh, 2-unpack.sh and 3-build.sh, and contains patches for the above-mentioned tools. I've applied their gcc patches to gcc-4.1.2, built and tested it as above, and it fails 15 of the tests in gcc's IEEE test suite (while the futaris patches pass 100% of the tests). With the necessary -ffix-crunch-d1 flag, it fails 14 of the tests.

However, it does seem to work with a couple of small testpieces that spring bugs in unpatched gcc, and LAME seems to work ok with it and at the same speed as the futaris-patched versions, give or take 1%; the output file is playable but is different from the output produced by a correct FP implementation.

Speed tests

These results are for test runs on a 200 MHz EP93xx, the TS-7250 board running Debian armel with no active background processes, to encode a 6460460-byte 16-bit stereo 44100Hz WAV file (actually 2 identical mono tracks) of 30 seconds plus 6 seconds of silence using LAME compiled with different compilers, using default LAME options (MPEG 1 layer III v1 CBR 128 kbps joint-stereo).

Except where -O flags are specified, LAME was compiled with -O2.

Compiler                           Flags                                               Text size  User CPU time
---------------------------------  --------------------------------------------------  ---------  ----------------------------
gcc-4.1.2 prerelease (Debian arm)  none (emulated FPA)                                 267363     68m10.590s
gcc-4.2.1 (Debian armel)           none (soft-float)                                   302619     5m55.766s, 5m54.12s
gcc-4.2.1 (Debian armel)           -mcpu=ep9312 -mfast-math                            301891     6m10.106s (!)
gcc-4.1.2-futaris                  -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp      289693     2m28.8s, 2m28.6s, 2m27.544s
gcc-4.1.2-futaris                  -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -Os  254291     2m31.5s, 2m31.5s
gcc-4.2.0-futaris                  -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp      287061     2m29.85, 2m29.86, 2m28.394s
gcc-4.1.2-cirrus                   -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp      287065     2m28.514s
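
To put the timings into proportion, the ratios between them can be worked out with a few lines of shell. This is only a convenience sketch: to_seconds and speedup are made-up helper names, and the figures fed to them are the first run of each row in the results above.

```shell
# Convert a "time"-style value such as 5m55.766s to seconds.
to_seconds() {
    printf '%s\n' "$1" |
        awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two timings, slower one first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 5m55.766s 2m28.8s     # soft-float vs gcc-4.1.2-futaris
speedup 68m10.590s 2m28.8s    # emulated FPA vs gcc-4.1.2-futaris
```

On these figures the Crunch build is roughly 2.4 times faster than soft float and roughly 27 times faster than emulated FPA.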

Or, to summarise


Update 22 Oct 2007

While these compilers "seem to work" and pass the gcc test suite, the intensive floating point validation program paranoia reports failures and inaccuracies with all of them. In fact, with "paranoia", the futaris patches described here produce executables that die from segmentation faults and illegal instructions or loop indefinitely when run on Cirrus Maverick Crunch hardware. The behaviour is not even repeatable from run to run but seems random.

The only floating point option to pass all the "paranoia" tests on Cirrus hardware is soft float.

Conclusion: there are still no patches to gcc that generate reliable Maverick Crunch code.

Interestingly, the standard gcc for x86 can compile paranoia.c but fails many of the tests. Without optimisation, gcc-3.4 and gcc-4.1.2 have 1 defect and 1 flaw; with -O2 gcc-4.1.2 has 3 failures, 4 serious defects, 3 defects and 2 flaws.


Update 11 Dec 2008

I always seem to get round to working on the Crunch patches in Autumn. I wonder why...

In April, Hasjim sent me a twisty little maze of patch ideas for gcc 4.2.1, 4.2.2, 4.2.3 and 4.3.0, all different. I've tried and compared them, analysed and fiddled with them and have cooked up a set for 4.3.2.

The best maximum-speed options are currently:

  -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -mcirrus-fix-invalid-insns -O2 -mfast-math
giving 5.9 Mflops on the FFTW benchmark tests/bench -opatient cf1024 as opposed to 5.4 with the 4.1.2 and 4.2.0 sets. Most of the speed improvement is due to the awesome pipeline definition patch.

The strategies are:

This work, as patches for GCC and as prebuilt compilers, is available here.

TODO

We discuss these things on the linux-cirrus mailing list, carbon copied to the GCC groups and others when appropriate.

