My blog now has simple comment support. This turned out to be a much better way to procrastinate than I'd expected. Prior to this, everything in the system had been stored in files, but I decided that comments were better off being stored in a database. In retrospect, this probably wasn't such a good idea.
The initial coding took a few hours, much of which was spent figuring out the right way to use CLSQL. I was especially bitten by the caching that CLSQL does, which caused some problems that were hard to diagnose. After being bitten a couple of times I just turned it off. Having caching on by default seems like a bad choice.
The real fun started once I decided to check that the system would still work with a light load, and ran ab (ApacheBench) with 5 concurrent processes accessing the web server. It failed on alarmingly many requests. So I got to spend most of the day debugging threads.
First problem in SB-BSD-SOCKETS. gethostbyname and gethostbyaddr return data in statically allocated buffers, which will be overwritten by the next call. SB-BSD-SOCKETS accounts for this by copying the data immediately after the call. However, it's possible for one thread to overwrite the buffer before another thread has had time to copy the data to safety. Boom!
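To make the race concrete, here is a minimal sketch of the copy-under-a-lock idea. It doesn't touch SB-BSD-SOCKETS at all: *STATIC-BUFFER* and FAKE-LOOKUP are hypothetical stand-ins for a C function that returns statically allocated data, and SAFE-LOOKUP is an invented name. It assumes SBCL built with thread support, and it's an illustration rather than necessarily the fix that ends up in SBCL.

```lisp
;; Hypothetical stand-in for the static buffer inside the C library.
(defvar *static-buffer* (make-string 16 :initial-element #\Space))

;; Hypothetical stand-in for gethostbyname: overwrites the shared
;; buffer and returns it, so the result is only valid until the next call.
(defun fake-lookup (name)
  (fill *static-buffer* #\Space)
  (replace *static-buffer* name)
  *static-buffer*)

(defvar *lookup-lock* (sb-thread:make-mutex :name "lookup lock"))

;; Make the call and the copy one critical section, so that no other
;; thread can overwrite the buffer between the two steps.
(defun safe-lookup (name)
  (sb-thread:with-mutex (*lookup-lock*)
    (string-right-trim " " (copy-seq (fake-lookup name)))))
```

The lock only helps if every caller goes through the wrapper; anything that calls the underlying function directly can still stomp on the buffer.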
Second problem in the SBCL internal caches, which caused occasional nonsense errors like "STRING is a bad type specifier for sequences". Surprisingly easy to trigger once you figure out what's going on:
(defun random-type (n)
  `(integer ,(random n) ,(+ n (random n))))

(defun one-test ()
  (dotimes (i 10000)
    (let ((type1 (random-type 500))
          (type2 (random-type 500)))
      (let ((a (subtypep type1 type2)))
        (dotimes (i 100)
          (assert (eq (subtypep type1 type2) a))))))
  (format t "ok~%")
  (force-output))

(defun test ()
  (dotimes (i 10)
    (sb-thread:make-thread #'one-test)))
The heavy-handed solution is to sprinkle some magic pixie locks on all the functions created by DEFINE-HASH-CACHE. Unfortunately these functions are called very often, and the mutex overhead caused a 50% slowdown in the average page generation time. Definitely not committable in this state. The locks also need to be recursive, so spinlocks as currently implemented were not an option.
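For illustration, here's roughly what a recursive lock around a cache looks like. This is not DEFINE-HASH-CACHE or anything from the SBCL internals: CACHED-FIB, *CACHE* and *CACHE-LOCK* are made-up names, and the only SBCL-specific pieces are SB-THREAD:MAKE-MUTEX and SB-THREAD:WITH-RECURSIVE-LOCK. The function re-enters itself while already holding the lock, which is the situation that rules out a plain non-recursive lock.

```lisp
;; Hypothetical cache shared between threads.
(defvar *cache* (make-hash-table))
(defvar *cache-lock* (sb-thread:make-mutex :name "cache lock"))

;; A cached function that calls itself while holding the lock.
;; WITH-RECURSIVE-LOCK lets the thread that already owns the mutex
;; enter the body again instead of failing on the nested acquisition.
(defun cached-fib (n)
  (sb-thread:with-recursive-lock (*cache-lock*)
    (or (gethash n *cache*)
        (setf (gethash n *cache*)
              (if (< n 2)
                  n
                  (+ (cached-fib (- n 1))
                     (cached-fib (- n 2))))))))
```

Every call pays for the lock even on a cache hit, which is the kind of overhead that hurts when the cached functions are called very often.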
Third problem between keyboard and chair, though I'll happily assign some of the blame to CLSQL. I forgot to specify the database for one call to SELECT, and it ended up using *DEFAULT-DATABASE*. This wouldn't have been too bad, except that WITH-DATABASE has a really strange feature: instead of just binding the connection to the specified variable it will also SETF it before establishing the new binding. I.e.
CL-USER> (progn
           (setf clsql::*default-database* nil)
           (clsql:with-database (clsql::*default-database* *db-spec*))
           clsql::*default-database*)
#<CLSQL-POSTGRESQL-SOCKET:POSTGRESQL-SOCKET-DATABASE localhost:7432/blog/jsnell CLOSED {100289EF71}>
Due to the above, the same connection ended up being used from multiple threads at the same time in certain circumstances, with predictably bad results. I spent a couple of merry hours debugging this as an fd-stream problem before realizing the mistake.
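For contrast, this is a sketch of the behaviour I would have expected: a macro that only binds the variable and never assigns to it. None of this is CLSQL code. OPEN-CONNECTION, CLOSE-CONNECTION, WITH-CONNECTION and *DEFAULT-CONNECTION* are hypothetical names, and the "connection" is just a list, so that the example is self-contained.

```lisp
;; Hypothetical global default, analogous in spirit to a default database.
(defvar *default-connection* nil)

;; Hypothetical stand-ins for opening and closing a real connection.
(defun open-connection (spec)
  (list :connection spec :open))

(defun close-connection (conn)
  (setf (third conn) :closed)
  conn)

;; Bind VAR to a fresh connection for the duration of BODY and close
;; the connection afterwards. VAR is only bound with LET, never SETF'd,
;; so if it names a special variable the global value is left alone.
(defmacro with-connection ((var spec) &body body)
  (let ((conn (gensym "CONN")))
    `(let ((,conn (open-connection ,spec)))
       (unwind-protect
            (let ((,var ,conn))
              ,@body)
         (close-connection ,conn)))))
```

With this shape, (with-connection (*default-connection* "blog") ...) leaves the global value of *DEFAULT-CONNECTION* as NIL once the form exits, and since SBCL's dynamic bindings are per-thread, other threads never see the temporary connection. Passing the database explicitly to every call avoids the problem from the other direction.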
After fixing these, the Araneida instance has now survived without errors for 100000 requests with 10 concurrent client processes and another 100000 with 20 clients. That should suffice for now, and it doesn't really matter that handling the average page request takes 25ms instead of 18ms.
I've been working through Practical Common Lisp today, and thought I'd do some procrastination of my own by checking out planet.lisp.org, where I saw your post. This is a bit off-topic, but I'm thinking about relational databases and Lisp at the moment and you triggered that...
One thing that concerns me every time I look at Lisp is that as far as I can see I won't be able to use a persistence framework without writing one myself.
I come from a WebObjects background and like to use such a tool in my projects. In WebObjects the persistence framework is called 'the EOModel', but the one I've been using for the last twelve months is called Cayenne (which I'm told is similar to Hibernate; all of these are Java-based).
All these strategies are heavily object-oriented. You create wrapper classes for each table. Within the application there are one-to-many object graphs, which the application builds up before saying 'commit', causing them all to be written to the database (SQL and alignment of ids are taken care of).
Is there a lispy way of achieving this sort of outcome with a relational database?
One workaround I've been thinking I might be able to use in Lisp would be just to do a lot more in stored procedures on the database side. But I'm not a big fan of this approach and my gut feeling tells me that it won't be as powerful as a persistence model.