What a classic rant, can't believe I hadn't seen this before. (I hadn't seen the original Crockford DEC64 page either. But the rant is at least amusing, while the DEC64 proposal didn't really have many redeeming properties.)
A lovely walk through optimizing a single Opus Magnum solution, step by step.
A tale of a surprisingly long-running positive-return lottery syndicate. Or actually, two of them, who then end up eating into each other's profits and start a dirty fight on both sides. (Ok, this last bit isn't a big part of the story. But it's what I chuckled at the most.)
This story did not go where I expected it to from the title.
> Dataflow and data dependencies can be viewed as the fundamental expression of the structure of a particular computation, whether it’s done on a small sequential machine, a larger superscalar out-of-order CPU, a GPU, or in hardware (be it a hand-soldered digital circuit, a FPGA, or an ASIC). Dataflow and keeping track of the shape of data dependencies is an organizing principle of both the machines themselves and the compilers that target them.
How to think about optimization.
> The other thing I’ll say is that even though I’ve been talking about adding cycle estimates for compute-bound loops here, this technique works and is useful at pretty much any scale. It’s applicable in any system where work is started and then processed asynchronously, with the results arriving some time later
Paul Khuong on one of the hardest bugs he has had to debug.