What a classic rant, can't believe I hadn't seen this before. (I hadn't seen the original Crockford DEC64 page either. But the rant is at least amusing, while the DEC64 proposal didn't really have many redeeming properties).
A lovely walk through optimizing a single Opus Magnum solution, step by step.
A tale of a surprisingly long-running positive-return lottery syndicate. Or actually, two of them. Who then end up eating into each other's profits, and start a dirty fight on both sides. (Ok, this last bit isn't a big part of the story. But it's what I chuckled at the most.)
This story did not go where I expected it to from the title.
> Dataflow and data dependencies can be viewed as the fundamental expression of the structure of a particular computation, whether it’s done on a small sequential machine, a larger superscalar out-of-order CPU, a GPU, or in hardware (be it a hand-soldered digital circuit, a FPGA, or an ASIC). Dataflow and keeping track of the shape of data dependencies is an organizing principle of both the machines themselves and the compilers that target them.
How to think about optimization.
>The other thing I’ll say is that even though I’ve been talking about adding cycle estimates for compute-bound loops here, this technique works and is useful at pretty much any scale. It’s applicable in any system where work is started and then processed asynchronously, with the results arriving some time later
Paul Khuong on one of the hardest bugs he has had to debug.
Debugging story. A Windows service causing an invisible 40GB kernel memory leak due to some kind of race condition where it occasionally fails to properly release process handles.
Some interesting thoughts on the value of writing blog posts vs. living documents that work as long-term resources.
Early versions of Windows did bit-blitting by JITting a routine specialized for the actual parameters. This is the evolution of those routines.
Thoughts from Martin Cracauer on what the GC implications would be of using LLVM as an SBCL code generator.
What goes into implementing enough of the OS/2 ABI to get an absolutely minimal graphical application working?
> There are lots of plausible ways to pack bits into bytes, and all have their strengths and weaknesses that I’ll go into later. For now, let’s just cover the differences.
Also, part 2
Writing an asynchronous (when allowed by the semantics) D3D to OpenGL shim.
Writing a SQL parser in Haskell isn't very interesting. The good part is everything else about this. All the way from the genesis of the tool (need to figure out what all the relations in the system really are, for a hellish schema transition), to where the system actually ended up and what other use cases naturally appeared.
The typical HN second-guessing comments feel even more depressing than usual. Why didn't they just read the documentation of the tables to figure out the details? Why not use a Python SQL parser instead of writing a new one? Why did they want this schema transition anyway? It's like there's zero empathy for other people's problems being more complicated than can be explained in the setup of a blog post.
A deep dive into reverse-engineering an ultra-obfuscated piece of malware, with multiple layers of custom virtual machines. Really awesome.
A network debugging war story involving IPv6, fragmentation, and QUIC.
I'm probably going to disagree a bit on the moral of the story. The authors' takeaway here is that routers should not be reordering packets.
What I see here is yet another instance of full transport layer header encryption making it impossible to do the right thing. Why does the server need to MTU-probe with a massive packet? Because there's no way for the path to give a signal about the packet size (the way MSS clamping can in TCP). Why does the receiver end up blocking the queue on fragmentation? Because there's no way for it to know what the intended order was, since the packet numbers are encrypted. So it has to assume the arrival order is the intended delivery order.
But look Ma! No ossification!
> I do not believe in objectivity in rankings. This is not to say I think being objective with regards to rankings is impossible, nor do I think "objective" tools serve no purpose (the tools I've written have already proven highly useful in generating baselines for seeding tournaments). No, more specifically I want to stress that "objective" ranking systems are much less objective than they actually seem, and the word "algorithmic" or "empirical" might be better.
Rating systems, once again. I don't think I agree with much of this article (e.g. the reasoning for Elo not working for double-elimination seems totally nonsensical). But the core idea of not having tournament seeding be purely algorithmic? Sure.
Blizzard going above and beyond on remastering an old game. There was apparently a large number of user-made StarCraft maps that relied on buffer overflows to read/modify game internals (all the way to basically rewriting some of the game logic). How do you not break these maps when the game is completely rewritten? By building an elaborate buffer overflow emulation layer.
Just a crazy level of dedication.
"How do you write a minesweeper puzzle generator that always generates a level that can be won without guessing" is a boring question. That kind of level generation sucks. For a moment it looks like it's where this article is going. It's not, though. The core idea here is a lot more clever.
"Dear ImGUI" in the browser with WebGL and webasm. I've been wanting to do something like this for a couple of small browser-based games.
(Something a bit odd going on with the keyboard handling though).
Using the Rust type system to make access to the global register state of embedded devices safe. (And some thoughts on API design).
A reasonably general-purpose system for fuzzing servers all the way from the main event loop, not just at some arbitrary "this is where we can feed the system a continuous block of bytes" boundary.
(Also: Remind me to write about fuzzing the TCP stack itself, at some point).
Distributing video encoding with more granularity than by keyframe.
Absolutely lovely networking war stories (yay!) about early HFT (blech). All of the hacks are lovely. The one that resonated the most with me was figuring out a way of finding an application-level use case for out-of-order TCP transmission (sending the header and footer of an order early, and once you've decided on a trade to make, sending out the price/count/stock id as a tiny packet that fills in the gap between the header and footer).
Sparklines of the values of individual memory locations. Works because early game consoles had so little RAM. Such a simple idea, such cool output.
> It was said that ‘‘you really can’t appreciate troff (and runoff and scribe) unless you do all of your document preparation on a fixed width font 24 line by 80 column terminal’’. ‘‘Challenge accepted’’ I said to myself.
But the title is a bit misleading: this was a modern v7 port with some extra amenities. It's hard to appreciate just how primitive these early systems were without using them in their original form.
Speaking of which, here's some people attempting to get the original PDP-7 Unix (i.e. pre-v1) running again. There aren't scans for all of the source code though. Most importantly, the shell is missing. So they had to rewrite one themselves.
A story full of "I didn't realize it was impossible, so I went ahead and did it" moments.
A bit like a very compressed "Soul of a New Machine". I've been reading a lot of old timesharing papers, and most of them are dreadfully boring even for me. (Don't ask why I've been reading that stuff...) But this particular kind of personal story of the creation of influential but totally forgotten technology is like catnip.
Designing a timeout system for a Python IO API. (Feels like a very Common Lisp-y solution to me, with hidden global state that has an enforced dynamic extent).
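A minimal sketch of that pattern, mine rather than the article's actual API (all the names here are made up): a context manager pushes a deadline onto hidden thread-local state, and blocking calls inside its extent consult that state instead of taking their own timeout arguments.

```python
# Sketch of a deadline with dynamic extent. Not any real library's API.
import threading
import time
from contextlib import contextmanager

_state = threading.local()  # the hidden global state, one stack per thread

@contextmanager
def deadline_after(seconds):
    """Everything inside this block must finish within `seconds`."""
    stack = getattr(_state, "deadlines", None)
    if stack is None:
        stack = _state.deadlines = []
    stack.append(time.monotonic() + seconds)
    try:
        yield
    finally:
        stack.pop()  # dynamic extent: the deadline dies with the block

def remaining_time():
    """Time allowed by the tightest enclosing deadline, or None if none."""
    stack = getattr(_state, "deadlines", [])
    if not stack:
        return None
    return max(0.0, min(stack) - time.monotonic())

def recv_some(sock):
    """A blocking call with no timeout parameter of its own; it respects
    whatever deadline happens to enclose it."""
    sock.settimeout(remaining_time())
    return sock.recv(4096)

# Usage: one deadline covers the whole operation, however many individual
# socket calls it decomposes into.
#
# with deadline_after(10.0):
#     header = recv_some(sock)
#     body = recv_some(sock)
```

Taking min(stack) also gives the nesting behaviour for free: an inner block can only tighten the effective deadline, never extend the outer one.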
> One tiny, ugly bug. Fifteen years. Full system compromise.
A step-by-step walk through an OS X local vulnerability (and it's a lot of steps). Another of those writeups that make you wonder how anyone ever manages to get from concept to an actual exploit.
Oh, ok... So that's what the Linux page table changes discussed a couple of weeks ago were about. This looks really bad. It seems amazing that nobody found it before now, but on the other hand at least the exploits for Spectre look really hard to pull off (needing to e.g. reverse engineer the branch predictor, so that it can be trained to expose one bit of data...). So maybe a lot of people had tried, and just nobody succeeded.
Now that's a much more approachable speculative execution bug!
I like this paper. It doesn't just naively measure TCP vs QUIC, but also tries to map the results to the underlying mechanisms. But "ouch" on the fairness tests.
The dangers of just throwing garbage data at a machine learning model. (And in a medical setting, even!)
What happens if you switch the example code of a difficult systems programming course from gnarly '80s style C to modern C? Apparently the students are able to implement much more complex memory allocator features.
(They also changed the malloc test suite at the same time. But it seems hard to believe that the tests would have a major effect here).
> The correct solution to the “integer printing is too slow” problem is simple: don’t do that.
...
> However, once you find yourself in this bad spot, it’s trivial to do better than generic libc conversion code. This makes it a dangerously fun problem in a way… especially given that the data distribution can matter so much.
Paul Khuong on fast integer -> decimal string conversions.
Reporting on some mysterious ongoing Linux development. Just how bad a security bug does this have to be, if they're willing to take a 5% system-wide performance hit to work around it?
Insane retro-hardware maintenance.
A funny story about extreme capitalism in an MMO setting, the kind you'd expect to be a digression in a good Neal Stephenson novel. I have no idea whether this is actually true or not, but it probably doesn't matter either way :)
Input-to-display latency measurements for 40 years of computers.
How the voxel physics engine of Roblox works.
And then for physics of a different kind... Basically modeling crowds of people as a fluid dynamics system.
> Developing and testing a virtual version of Unix on OS/32 has practical advantages. There was no need for exclusive use of the machine; [...]. And the OS/32 interactive debugger was available for breakpointing and single-stepping through the Unix kernel just like any other program.
A port of Unix v6, from before it was really meant to be portable. A lovely systems programming story.
A very amusing system programmer's lament.
Ignore the title. It's not actually a rant about Skype sucking, but a really cool article series on someone writing their own codec + packet-loss tolerant UDP networking for a prototype video conferencing app.
Micro-optimizing lockless message passing between threads.
Then use this to replace locks on data structures. Instead of data structures being shared, they're owned by a specific server process. If a client needs to operate on a data structure, it asks the server to do it instead. Assuming heavy contention, this'll be much faster since fewer cache coherency roundtrips are required.
(Obviously not widely applicable, due to the scheme requiring busylooping to work well.)
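A toy sketch of the ownership pattern, mine rather than the article's (and using queue.Queue instead of lockless rings and busy-polling, so it shows only the structure, not the performance):

```python
import queue
import threading

class OwnedCounter:
    """State owned by a single server thread. Clients never touch the
    value directly and never take a lock; they send requests instead."""

    def __init__(self):
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        value = 0  # lives only in the owner thread: no sharing, no locks
        while True:
            op, reply = self._requests.get()
            if op == "incr":
                value += 1
            reply.put(value)  # both "incr" and "read" reply with the value

    def _request(self, op):
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, reply))
        return reply.get()

    def incr(self):
        return self._request("incr")

    def read(self):
        return self._request("read")

counter = OwnedCounter()
counter.incr()
counter.incr()
print(counter.read())  # -> 2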
This'll go into the hall of fame of great debugging stories.
> The PDP-11 was designed to be a small computer, yet its design has been successfully extended to high-performance models. This paper recollects the experience of designing the PDP-11, commenting on its success from the point of view of its goals, its use of technology, and on the people who designed, built and marketed it.
A lovely mid-life postmortem for the PDP-11.
(Via Dave Cheney; a useful companion piece putting the paper in the historical context, but not a replacement for reading the original.)
Could you replace B-Tree/hash/bloom filter database indexes with machine learning models? The depressing answer appears to be that it's viable. I thought the systems programmer was going to be the last job in the world!
But assuming this is the state of the art (rather than a more typical "this is what we were deploying 5 years ago" Google paper), it's not quite practical yet. CPUs aren't efficient enough, and the communication overhead to GPUs/TPUs is too large. But that's an architecture problem that will get solved.
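The core idea, as a rough sketch of my own rather than the paper's staged models: learn a mapping from key to position in the sorted data, remember the worst-case prediction error, and only do a conventional search inside that error window.

```python
# A toy "learned index": one least-squares line instead of a B-tree.
import bisect

class LearnedIndex:
    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        # "Model": position ~= a * key + b, fit by least squares.
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys) or 1.0
        self.a = sum((k - mean_k) * (i - mean_p)
                     for i, k in enumerate(sorted_keys)) / var
        self.b = mean_p - self.a * mean_k
        # Worst-case prediction error bounds the correction search.
        self.err = max(abs(self._predict(k) - i)
                       for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(self.a * key + self.b)

    def lookup(self, key):
        guess = self._predict(key)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

idx = LearnedIndex(list(range(0, 1000, 3)))
print(idx.lookup(999))  # -> 333
print(idx.lookup(500))  # -> None (not in the key set)
```

The correction step only searches within the recorded error bound, which is where the space and speed win over a B-tree is supposed to come from when the key distribution is learnable.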