Faster character IO

Posted on 2005-10-29 in Lisp

SBCL character IO has always been rather slow, but it got even worse after Unicode support was added about a year ago. To give an idea of how bad: reading and printing a 65MB (1.3 million lines) file line-by-line takes <1.5 seconds with Perl, 5-8 seconds with the other Lisps that I have installed, and 15 seconds with SBCL 0.9.6. A pre-Unicode SBCL takes about 7 seconds.
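For reference, the Lisp side of such a test can be as small as the loop below. This is only a sketch of the kind of benchmark described, not the exact code that was timed; COPY-LINES is a name made up for illustration.

```lisp
;; Hypothetical helper (not part of SBCL): copy every line from the
;; character stream IN to the character stream OUT and return the
;; number of lines copied.
(defun copy-lines (in out)
  (loop for line = (read-line in nil nil)   ; NIL at end of file
        while line
        do (write-line line out)
        count t))
```

Timing it would then look something like (time (with-open-file (in "big.txt") (copy-lines in *standard-output*))), where "big.txt" stands in for whatever large text file is at hand.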

So I went hunting for some low-hanging fruit in (fd-)streams, and found quite a lot.

There were several places where the Unicode-induced separation of SIMPLE-BASE-STRING and (SIMPLE-ARRAY CHARACTER) had forced formerly inlined operations (stuff like AREF, FIND, REPLACE, etc) to be replaced with generic calls due to insufficient type information.
The addition of the OUTPUT-NOTHING restart when trying to write a character into a stream with an incompatible external format was causing overhead on every iteration of some inner loops. I have a vague recollection, though, that it was even worse at some point (creation of a restart on every iteration of the innermost loop) than it is now (establishing a catch tag on every iteration).
The input buffer for UTF-8 streams never received more than one character at a time.
READ-LINE was fetching data from the internal input buffer character by character, instead of looking ahead for a newline and then copying a bigger batch of characters at once.
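The READ-LINE item can be illustrated with a small sketch: scan the buffer for a newline once, then copy the whole line in one go. This is not the actual fd-stream code; NEXT-LINE-FROM-BUFFER and its arguments are hypothetical, and the buffer is assumed to already be a (SIMPLE-ARRAY CHARACTER (*)). Declaring that type also gives the compiler enough information that it may inline POSITION and REPLACE, which is the issue from the first item.

```lisp
;; Hypothetical helper (not SBCL internals): extract one line from an
;; already filled character buffer.
(defun next-line-from-buffer (buffer start end)
  "Return the line starting at START in BUFFER and the index just past
its newline. Return NIL and NIL when there is no newline before END,
in which case a real implementation would refill the buffer."
  (declare (type (simple-array character (*)) buffer)
           (type fixnum start end))
  ;; Look ahead for the newline instead of testing characters one by one.
  (let ((newline (position #\Newline buffer :start start :end end)))
    (if newline
        (let ((line (make-string (- newline start))))
          ;; Copy the whole line as a single batch.
          (replace line buffer :start2 start :end2 newline)
          (values line (1+ newline)))
        (values nil nil))))
```

A caller would keep invoking this with the returned index as the new START, and refill the buffer from the file descriptor whenever the first value comes back as NIL.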

After fixing all of the above and doing some additional micro-optimizations, SBCL now takes about 3.5 seconds for the same test, which isn't too bad. If you've been having IO performance troubles with SBCL, now might be a good time to test CVS SBCL.

One thing that I ran into and didn't have time to look at is that SB-SYS:*STDIN* doesn't get a CIN-BUFFER at all, and thus is still painfully slow. If this is intentional, my guess is that FD-STREAM-READ-N-BYTES doesn't play along well with line-buffering.
