Complaining about the SBCL x86/x86-64 calling convention is a favourite pastime of SBCL hackers. The convention is disgusting, slow, largely undocumented, and really hard to change, since the assumptions it implies are hard-coded in many places. If you don't change all of those places correctly, you lose in ways that are very hard to debug. I've looked at this a few times, taken some preliminary stabs at an implementation, and decided that I couldn't justify spending weeks on making it actually work.
Alastair Bridgewater was going to document this stuff, but after becoming sufficiently enlightened about how the current implementation worked, he instead fixed some of the worst problems and documented the new convention. All in one weekend, which was truly superb hacking.
The effect is really quite dramatic in some cases. The nonsense benchmark code from my previous post on this subject took 0.9 seconds on a vanilla SBCL and 0.3 seconds with Alastair's patch. (The baseline is different from last time, since I upgraded from an Athlon 64 to an X2.) On less contrived examples the effect is naturally smaller, but the CL-BENCH Ackermann function, for example, is about 30% faster with the new version.