I've just spent some time revisiting an old program I wrote for a competition, a pentomino solver: I got the algorithm right by writing it in C, then converted that C code to optimal asm, and then went back and optimized the C so as to enable the compiler to generate code equivalent to my asm version.

Back in 1996/97 when I wrote this, the asm version ran nearly twice as fast as the C version. Today the difference is smaller: after re-optimizing the C code, the times to find all 2339 solutions on my 1.6 GHz Pentium-M laptop were 270 vs 311 ms, i.e. the asm code is still better, but now only about 15% faster. (BTW, on a P4 the times were 170 vs 220 ms, i.e. approximately your 1.2x factor.) OTOH, I actually spent much _more_ time getting the C code correct than the asm version!

Perhaps the assembly team got 90% of the theoretical optimum, and in the real world the compiler team is not going to do this well, perhaps 50% worse. That would still mean the compiler got within a factor of 2 of the absolute optimum (0.9 x 0.5 = 45% of the optimum). Not too bad, I'd say. Of course I've made up the 90% and 50%, but you have to be very pessimistic to believe the real numbers are much worse than this...

> These optimizations do things that compilers are not allowed to do
> (because they may change the outcome, although in ways that the user
> does not care about in this particular application); or

Sure, so most compilers have flags that change the language semantics: an aggressive floating-point mode, a reduced-aliasing mode, etc. (See the sketch at the end of this post for a concrete example.)

The important thing a working asm version offers you is a better appreciation of exactly how your chosen algorithm is going to wrap itself around the available resources.

Terje

-- 
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
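The sketch promised above: a minimal C illustration of two such semantics-changing flags. The flag spellings are gcc's (MSVC has e.g. /fp:fast); whether the outputs actually differ depends on compiler version, optimization level, and target, so treat this as an illustration rather than a guaranteed result.

/* A minimal sketch of flags that change language semantics.
 * The flag spellings below are gcc's; other compilers use
 * different names (e.g. MSVC's /fp:fast).
 *
 *   gcc -O2 demo.c                                   (standard semantics)
 *   gcc -O2 -ffast-math -fno-strict-aliasing demo.c  (relaxed semantics)
 */
#include <stdio.h>

/* -ffast-math permits reassociating this sum (multiple accumulators,
 * vectorization). The strict left-to-right sum of the values set up
 * in main() is 1.0, but a reassociated sum can yield 2.0, so the
 * "outcome" really can change. */
double sum4(const double *a)
{
    double s = 0.0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    return s;
}

/* Under the default strict-aliasing rules the compiler may assume the
 * store through f cannot modify *i, and so return 1 without reloading;
 * -fno-strict-aliasing makes it reload and return 0. (Passing the same
 * memory through both pointers, as main() does, is exactly the kind of
 * code the strict-aliasing rule outlaws.) */
int alias_demo(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;   /* the bit pattern of 0.0f is all zeros */
    return *i;
}

int main(void)
{
    double a[4] = { 1e16, 1.0, -1e16, 1.0 };
    int x = 0;
    printf("sum4       = %g\n", sum4(a));
    printf("alias_demo = %d\n", alias_demo(&x, (float *)&x));
    return 0;
}

The point is just that both flags buy speed by changing results the language standard pins down, which is the same license a hand-written asm version takes for itself.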