For the last five years or so it's been my firm intent to take part in the programming contest associated with the International Conference on Functional Programming (ICFP). And each year something has prevented it. But this year there was no emergency at work, no computer hardware broke, no sisters were getting married, etc. So instead of playing poker on the net, which had been consuming all of my free time for the last couple of weeks, I read the 22-page spec and fired up emacs. (Just kidding, emacs was already running).
The surface task was to write an interpreter for a weird string-rewriting language. The organizers supplied a large blob of data which, when run through the interpreter, would produce as output a series of image-drawing operations (for which you basically had to write some kind of visualizer if you wanted to achieve anything). The goal was to come up with a prefix to the program that would make it instead produce output that would be as close as possible to a certain target image.
The intended way to achieve that goal was to notice that the drawing operations generated by the blob would first write a clue message, which would then be hidden in the final image by later drawing operations. This seems like a really bad decision. I only noticed the message by luck, since my first version of the image conversion tool didn't support the flood-fill operation. But apparently a lot of teams never saw the message, and were left to stumble in the dark for the whole weekend. The image that could be drawn by using the clue then led to another obscure puzzle. Again, I was lucky to figure out the solution after a while, but judging by IRC and mailing-list traffic a huge number of teams never got it, and were basically stuck.
That clue could then finally be used to produce some concrete details on how the big blob of data was using the string-rewriting language to produce the image. There was even a catalog of the functions that the blob contained. But the really useful data seemed to be hidden behind yet more puzzles. So at this point I just did a minimal hack to make a token improvement to the produced image: the source image had a grove of apple trees, the target had pear trees. And according to the catalog the function apple_tree was exactly as large as pear_tree. So I wrote a prefix that overwrote the former with the latter. And then I submitted that prefix, and switched to doing something more interesting. (I think that token improvement was still in the top 20 something like 8 hours before the contest ended, which probably says something about how much progress people were making).
I did rather enjoy writing the interpreter and the visualization tool, and the specifications for both were mostly very good. Unfortunately the spec contained only a couple of trivial test cases with the expected results, so if your interpreter had a problem, figuring out what exactly was going wrong just from looking at execution traces was really hard. The organizers originally replied on the mailing list that such debugging "is exactly part of the task", but later released an example trace from a few iterations at the start. There was a documented prefix that would run some tests on the implementation, and generate an image from those results, but the coverage of those tests didn't seem to be very good. (I had several bugs that only showed up with the full image, not with the test one).
The part of the interpreter that many teams seemed to have big trouble with was that you couldn't really use a basic string or array to represent the program. If you did, performance would be orders of magnitude too slow due to excessive copying of strings (people were reporting 1 iteration / second, when drawing the basic image required 2 million iterations). Now, this was even pointed out in the specification! Paraphrasing: "these two operations need to run faster than linear time". And still people tried to use strings, bawled when their stupid implementation wasn't fast enough, and decided that the only solution would be to rewrite their program in C++ instead of their favourite Haskell/Ocaml/CL. Good grief...
For what it's worth, I used just about the stupidest imaginable implementation strategy beyond a naive string: represent the program as a linked list of variable-length chunks which share backing storage when possible. My first CL implementation of this ran at about 5.5k iterations / second. That was good enough for the stage of the competition I got to, and would've been easy to optimize further if I'd decided to continue (afterwards I made a 15-line change that gave an 8x speedup, so the basic image now takes only 41 seconds to render on an Athlon X2 3800+). And this was with a stupid data structure and a couple of minor performance hacks. It seems obvious that practically any language could have been used to write a sufficiently fast interpreter. It never ceases to amaze me how programmers would rather blame their tools than think about the problem for a couple of minutes.
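To illustrate the idea (in Python rather than the CL I actually used, and with made-up names and data; the real rewriting rules are more involved than this), here's a minimal sketch of the chunk-list representation: dropping a matched prefix and prepending a replacement only adjust a few chunk descriptors, while the underlying string buffers are shared rather than copied.

```python
from collections import deque

class ChunkedString:
    """Program text as a deque of (buffer, start, end) chunks.

    Dropping a prefix or prepending new text touches only a few
    chunk descriptors; the backing buffers are never copied.
    """
    def __init__(self, text=""):
        self.chunks = deque()
        if text:
            self.chunks.append((text, 0, len(text)))

    def __len__(self):
        return sum(end - start for _, start, end in self.chunks)

    def prepend(self, text):
        # O(1): just push a new chunk descriptor on the front.
        if text:
            self.chunks.appendleft((text, 0, len(text)))

    def drop_prefix(self, n):
        # Advance chunk boundaries instead of copying the tail.
        while n > 0 and self.chunks:
            buf, start, end = self.chunks.popleft()
            size = end - start
            if size > n:
                self.chunks.appendleft((buf, start + n, end))
                return
            n -= size

    def to_str(self):
        # Only needed for output/debugging; this one does copy.
        return "".join(buf[start:end] for buf, start, end in self.chunks)

# One hypothetical rewriting step: consume a matched pattern,
# splice in the template expansion.
p = ChunkedString("IIPIPICPIICICIIF")
p.drop_prefix(7)
p.prepend("ICFP")
```

With a plain string, each of the millions of iterations would copy the entire remaining program; here each step is cheap regardless of program size, which is the whole trick.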
Anyway, the organizers obviously put in a huge effort for this contest, so thanks to them for that. It's just that the format really wasn't what I was looking for in a programming contest. But at least it was interesting enough to temporarily shake me out of playing poker into doing some hacking again :-) (Faster SBCL hash tables coming soon, I hope).
I've made the source code for the interpreter available since several people have asked for it. I'm not sure why they've asked for it, since it's not very good code, and probably contains no worthy insights. But if you want it, it's there.
Addenda: After writing the above, I read a few messages on the mailing list claiming that there really wasn't much of a puzzle aspect, and that success was mainly determined by the quality of the tools (compilers, debuggers, disassemblers, etc.) you were able to write. While it's possible that after the initial two humps I described above the puzzles were irrelevant, that wasn't my impression. At the point where I stopped, it didn't feel as if sufficient knowledge was available for writing the tools; rather, it was hidden behind encrypted pages, steganography, etc. None of which I really wanted to deal with.
There was definitely enough information available to make a start at reverse-engineering, but I don't think there was enough time to reverse-engineer enough of the system to figure out how to write the tools, write them, and then use them to actually solve the real problem. I'm sure things were different for larger teams, but that doesn't really comfort me as a one-person team :-) My impression is that in the earlier ICFP contests the tasks were such that it was possible for a single programmer to achieve a decent result, even if it was unlikely to be good enough to win. In this case you didn't get any points for the reverse-engineering or for the tools, but only for the end result.
(Having written the above, I'm now sure that the eventual winner will turn out to be a single programmer who only started working on the task 8 hours before the deadline).
I totally agree! Although C++ was our common language and we used it from the start, it was FAR TOO SLOW using strings. A naive translation to C++ from Ruby or Python would have been pointless. In fact, I told people in the chat room during the comp not to change from Python+Psyco (and I didn't do it because I wanted to beat them).
*However*, a naive translation from strings to ropes (s/string/crope/g, basically) gave us a 3-4 order-of-magnitude speedup, except that string.substr and crope.substr have different semantics.
Check out my teammate's post mortem:
http://marco-za.blogspot.com/2007/07/icfp-how-we-reached-top-15.html