SBCL character IO has always been rather slow, but after Unicode support was added about a year ago it got even worse. To give an idea of how bad: reading and printing a 65MB (1.3 million lines) file line-by-line takes under 1.5 seconds with Perl, 5-8 seconds with the other Lisps that I have installed, and 15 seconds with SBCL 0.9.6. A pre-Unicode SBCL takes about 7 seconds.
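For reference, a line-by-line copy of that kind can be sketched in portable Common Lisp roughly as follows. This is not the exact benchmark code behind the numbers above; the function name COPY-LINES and its optional output-stream argument are made up for illustration.

```lisp
;; Hypothetical helper, not the original benchmark code.
(defun copy-lines (path &optional (out *standard-output*))
  "Read the file at PATH line by line, write each line to OUT,
and return the number of lines copied."
  (with-open-file (in path :direction :input)
    (loop for line = (read-line in nil nil)   ; NIL signals end of file
          while line
          do (write-line line out)
          count line)))
```

Wrapping a call in the standard TIME macro, as in `(time (copy-lines "some-big-file.txt"))` with a file name of your choosing, gives a rough wall-clock figure to compare across implementations.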
So I went hunting for some low-hanging fruit in (fd-)streams, and found quite a lot.
- There were several places where the Unicode-induced separation of SIMPLE-BASE-STRING and (SIMPLE-ARRAY CHARACTER) had forced formerly inlined operations (stuff like AREF, FIND, REPLACE, etc.) to be replaced with generic calls due to insufficient type information.
- The addition of the OUTPUT-NOTHING restart when trying to write a character into a stream with an incompatible external format was causing overhead on every iteration of some inner loops. Though I have a vague recollection that it was even worse at some point (creation of a restart on every iteration of the innermost loop) than it was this time around (establishing a catch tag on every iteration).
- The input buffer for UTF-8 streams never received more than one character at a time.
- READ-LINE was fetching data from the internal input buffer character by character, instead of looking ahead for a newline and then copying a bigger batch of characters at once.
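To make the READ-LINE point concrete, here is a minimal sketch of the "look ahead for a newline, then copy a batch" idea in portable Common Lisp. It is not the actual SBCL code: the function NEXT-LINE-FROM-BUFFER and its arguments are hypothetical, and the real implementation works on the stream's internal buffer and also has to handle refilling it. The type declaration on BUFFER is the kind of information the first item in the list is about; without it the compiler generally can't inline the sequence operations.

```lisp
;; Hypothetical helper illustrating batched line extraction.
(defun next-line-from-buffer (buffer start end)
  "Return the line that begins at START in BUFFER, plus the index to
resume from. The second value is NIL when no newline was found before
END, meaning the caller would need to refill the buffer."
  (declare (type (simple-array character (*)) buffer)
           (type fixnum start end))
  ;; One scan for the newline instead of a per-character fetch.
  (let ((newline (position #\Newline buffer :start start :end end)))
    (if newline
        ;; Copy the whole line in a single SUBSEQ call.
        (values (subseq buffer start newline) (1+ newline))
        (values (subseq buffer start end) nil))))
```

A caller would loop on the second return value, feeding it back in as START until it comes back as NIL, and only then go to the file descriptor for more data.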
After fixing all of the above and doing some additional micro-optimizations, SBCL now takes about 3.5 seconds, which isn't too bad. If you've been having IO performance troubles with SBCL, now might be a good time to test CVS SBCL.
One thing that I ran into and didn't have time to look at is that SB-SYS:*STDIN* doesn't get a CIN-BUFFER at all, and thus is still painfully slow. If this is intentional, my guess is that FD-STREAM-READ-N-BYTES doesn't play along well with line-buffering.