> Developing and testing a virtual version of Unix on OS/32 has practical advantages. There was no need for exclusive use of the machine; [...]. And the OS/32 interactive debugger was available for breakpointing and single-stepping through the Unix kernel just like any other program.
A port of Unix v6, from before it was really meant to be portable. A lovely systems programming story.
A very amusing system programmer's lament.
Ignore the title. It's not actually a rant about Skype sucking, but a really cool article series on someone writing their own codec + packet-loss tolerant UDP networking for a prototype video conferencing app.
Micro-optimizing lockless message passing between threads.
Then use this to replace locks on data structures. Instead of data structures being shared, they're owned by a specific server process. If a client needs to operate on a data structure, it asks the server to do it instead. Assuming heavy contention, this'll be much faster since fewer cache coherency roundtrips are required.
(Obviously not widely applicable, due to the scheme requiring busylooping to work well.)
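The ownership pattern is easy to sketch, though only as an illustration: Python's queues are lock-based, whereas the post micro-optimizes lockless channels, and all names here are made up.

```python
import queue
import threading

def counter_server(requests):
    counts = {}  # owned exclusively by this thread, so the data itself needs no locks
    while True:
        op, key, reply = requests.get()
        if op == "stop":
            break
        if op == "incr":
            counts[key] = counts.get(key, 0) + 1
            reply.put(counts[key])

requests = queue.SimpleQueue()
threading.Thread(target=counter_server, args=(requests,), daemon=True).start()

# Clients never touch `counts`; they ask the owning thread to do the work.
reply = queue.SimpleQueue()
requests.put(("incr", "hits", reply))
requests.put(("incr", "hits", reply))
first, second = reply.get(), reply.get()
print(first, second)  # 1 2
requests.put(("stop", None, None))
```

The article's point is that with a fast enough channel, this round trip beats contended locking on the structure itself.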
This'll go into the hall of fame of great debugging stories.
> The PDP-11 was designed to be a small computer, yet its design has been successfully extended to high-performance models. This paper recollects the experience of designing the PDP-11, commenting on its success from the point of view of its goals, its use of technology, and on the people who designed, built and marketed it.
A lovely mid-life postmortem for the PDP-11.
(Via Dave Cheney; a useful companion piece putting the paper in the historical context, but not a replacement for reading the original.)
Could you replace B-Tree/hash/bloom filter database indexes with machine learning models? The depressing answer appears to be that it's viable. I thought the systems programmer was going to be the last job in the world!
But assuming this is the state of the art (rather than a more typical "this is what we were deploying 5 years ago" Google paper), it's not quite practical yet. CPUs aren't efficient enough, and the communication overhead with GPUs/TPUs is too large. But that's an architecture problem that will get solved.
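The core idea boils down to something small, sketched here under toy assumptions (a single least-squares fit where the paper uses staged models): approximate the CDF of the sorted keys with a model, then correct the prediction with a search bounded by the model's worst-case error.

```python
import bisect

keys = sorted(x * x for x in range(1000))  # sorted keys with a non-linear distribution
n = len(keys)

# "Train": least-squares fit of position ~ key.
mean_k = sum(keys) / n
mean_p = (n - 1) / 2
slope = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys)) / \
        sum((k - mean_k) ** 2 for k in keys)
intercept = mean_p - slope * mean_k
predict = lambda k: int(slope * k + intercept)

# Worst-case prediction error over the data bounds the local search window.
max_err = max(abs(predict(k) - i) for i, k in enumerate(keys))

def lookup(key):
    guess = predict(key)
    lo = max(0, guess - max_err)
    hi = min(n, guess + max_err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else None

print(lookup(250 * 250), lookup(7))  # 250 None
```

A bad model just means a larger `max_err`, i.e. a wider binary search; correctness doesn't depend on the fit being good.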
Creating a bogus @google.com address using one bug, and then getting rights to access the details of all (most?) bugs in the system.
There's a funny thing where for most software vulnerabilities a writeup like this can explain exactly what happened and why. But then for web-based services, all you see is a bug that looks totally braindead. It's user accounts! How hard can that possibly be?
I work on stuff related to security of account systems at the moment; turns out that just like in every other area of modern computing, it's incredibly complicated.
You've forgotten the PIN and passphrase of your hardware bitcoin wallet. You must be out of luck; those things can't possibly have trivial vulnerabilities. I mean, surely those sophisticated cryptocurrency enthusiasts would be able to spot the snake oil a mile away. Oh... no?
Rewriting the Xen control plane to optimize the boot time of thousands of tiny VMs. Benchmarked against Docker, which seems just absurdly bad at this one.
(Not impressed by the numbers outside of booting with regard to "lightness". E.g. the throughput and scaling numbers for the personal firewall application are just miserable.)
It's possible for multiple houses (e.g. in different cities) to have the same street name and number. What's the closest pair of such "twins" in the UK?
I truly appreciate the dedication that went into researching this bit of trivia.
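The search itself is simple to sketch, assuming a list of (street-and-number, lat, lon) records rather than the article's actual data; all addresses and coordinates below are made up.

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    # Great-circle distance between two (lat, lon) points, in km.
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

addresses = [
    ("1 High Street", 51.75, -1.26),  # made-up coordinates
    ("1 High Street", 51.74, -1.25),
    ("1 High Street", 55.95, -3.19),
    ("2 Mill Lane",   52.20,  0.12),
]

# Group houses sharing the same street name and number, then take the
# minimum pairwise distance within any group.
by_name = defaultdict(list)
for name, lat, lon in addresses:
    by_name[name].append((lat, lon))

best = min(
    (haversine_km(p, q), name)
    for name, points in by_name.items() if len(points) > 1
    for i, p in enumerate(points)
    for q in points[i + 1:]
)
print(best)  # (distance in km, street name) of the closest twin pair
```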
Why POSIX filesystem semantics aren't a good fit for large scale systems. (The funny thing is that it never occurred to me that anyone would use POSIX APIs in a modern distributed context. But apparently in supercomputing they do).
How do you enjoy the process of creating a new language, when you've been writing compilers for a long time? By adding artificial restrictions. Only assembly; no libraries, programming languages, or code generators.
Implement a minimal BCPL-like language in assembly. Then use that to implement a Lisp interpreter, that interpreter to create a reasonably featured VM (e.g. garbage collector, delim. continuations). Write a compiler, assembler, disassembler, linker targeting the VM. Then use these tools to write the language you originally wanted with objects, pattern matching, and non-sexp macros.
Thoughts about what features a REPL (and the language being evaluated in the REPL!) should have to be useful.
Read this as part of some archaeology into numeric representation in early Lisp systems, but it actually turned out to be a pretty neat systems paper in general. One thing that's striking is how readable this 50-year-old paper still is. The vocabulary of systems programming has changed surprisingly little (we've just switched from words to bytes), and even the problems being solved are at the core the same. It's all about memory hierarchies, even at the dawn of computing.
The paper describes an early version of BBN Lisp for a machine with 16K words of core memory and 88K words of absurdly slow drum memory. The hardware has no paging support, so how do you make efficient use of the drum memory to fit in meaningful programs? You need to somehow do paging in software, and reorganize the data layouts to minimize pointer chasing and page faults. (The latter bit is what I was really interested in while looking at the history of tagged pointers.)
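Paging in software just means every memory reference goes through a routine that checks a small "core" cache and faults pages in from the "drum" by hand. A toy sketch (page and cache sizes are arbitrary, not the paper's):

```python
PAGE_SIZE = 4
CORE_PAGES = 2

drum = list(range(32))  # slow backing store (88K words in the paper)
core = {}               # page number -> list of words currently in fast memory
lru = []                # page numbers, least recently used first
faults = 0

def read(addr):
    global faults
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in core:
        faults += 1
        if len(core) >= CORE_PAGES:  # evict the least recently used page
            victim = lru.pop(0)
            drum[victim*PAGE_SIZE:(victim+1)*PAGE_SIZE] = core.pop(victim)
        core[page] = drum[page*PAGE_SIZE:(page+1)*PAGE_SIZE]
    if page in lru:
        lru.remove(page)
    lru.append(page)
    return core[page][offset]

# Sequential access faults once per page; hence the paper's concern with
# data layout -- pointer chasing across pages makes every read a fault.
values = [read(a) for a in range(8)]
print(values, faults)  # [0, 1, ..., 7] and 2 faults
```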
Take a bitset implementation that splits the set into blocks, and adaptively uses the best data representation for each block. How do you determine which internal representations actually make sense?
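The block-adaptive scheme (Roaring-style) is easy to sketch; the 4096-element threshold is the standard break-even point where a sorted array of 16-bit values starts to cost more than a full 8 KiB bitmap, though the article's question is precisely whether such fixed choices are the right ones.

```python
import bisect

THRESHOLD = 4096

def make_block(values):
    """values: iterable of 16-bit ints all belonging to one 2^16-value block."""
    values = sorted(values)
    if len(values) < THRESHOLD:
        return ("array", values)          # sparse: sorted array, 2 bytes/entry
    bitmap = bytearray(8192)              # dense: fixed 8 KiB bitmap
    for v in values:
        bitmap[v >> 3] |= 1 << (v & 7)
    return ("bitmap", bitmap)

def contains(block, v):
    kind, data = block
    if kind == "array":
        i = bisect.bisect_left(data, v)
        return i < len(data) and data[i] == v
    return bool(data[v >> 3] & (1 << (v & 7)))

sparse = make_block([1, 5, 9])
dense = make_block(range(0, 65536, 2))
print(sparse[0], dense[0])                      # array bitmap
print(contains(sparse, 5), contains(dense, 3))  # True False
```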
Slides with anecdotes on game optimization in general, but on the Jaguar CPU in particular. E.g. didn't realize you really have to use SIMD on those CPUs, or you can't even use the full cache bandwidth. Neat example of a custom spatial database near the end.
"tl;dr don't bother".
The painful step-by-step journey of implementing a seemingly trivial optimization in a production compiler. Especially the "Lessons Learned" part is great; I'm fighting the temptation to just quote all of it here.
> I switched to a 12” MacBook before I started working on my swiftc PR. It was so slow that I was only able to iterate on the code once a day, because a single compile and test run would take all night. I ended up buying a top-of-the-line 15” MacBook Pro because it was the only way to iterate on the codebase more than once a day.
> It’s really easy to break swiftc because of how complex it is. My original pull request was approved and merged in a month. Despite only having about 200 lines of changes, I received 125 comments from six reviewers. Even after that much scrutiny, it was reverted almost immediately because it introduced a memory leak that a seventh person found after running a four hour long standard library integration test.
Yes, it's a Bitcoin article. But it's also really good!
> Bitcoin neatly avoids the double-spending problem plaguing proof-of-work-as-cash schemes because it eschews puzzle solutions themselves having value.
Example of how some of the new features in the C++ standard will work together.
A chip reverse engineering story with the best digressions. It's not just about figuring out that the supposed RAM chip is actually a touch tone dialtone generator; it's also figuring out the maths on every dialtone generator on the market to exactly identify this one. And then going into some semiconductor physics for good measure.
An introduction to Futamura projections, phrased in terms of physical objects rather than partial evaluation of source code.
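For the source-code version of the idea, the first projection fits in a few lines, under heavy simplification: "partially evaluating" an interpreter with respect to a fixed program yields a function of the input alone, i.e. a compiled program. Here the specializer is just a closure; a real partial evaluator would also unroll the dispatch loop away.

```python
def interpret(program, x):
    # A tiny stack machine: the input starts on the stack.
    stack = [x]
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
    return stack[-1]

def specialize(program):
    # First Futamura projection: mix(interpreter, program) = compiled program.
    return lambda x: interpret(program, x)

double_plus_one = [("push", 2), ("mul", None), ("push", 1), ("add", None)]
compiled = specialize(double_plus_one)
print(compiled(10))  # 21
```

The second and third projections then specialize the specializer itself, yielding a compiler and a compiler generator.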
> In practice, a message broker is a service that transforms network errors and machine failures into filled disks. Then you add more disks.
On why you probably want either a load balancer or a database, not a pubsub system.
Slava Pestov reads through The NeWS Book: An Introduction to the Network/Extensible Window System from 1989. I never knew anything about NeWS, except from the Unix Haters Handbook X11 rant, so it was nice to fill it in with some more facts.
> Specifically, what I needed was mostly like a tree diff but I wasn’t optimizing for the same thing as other algorithms, what I wanted to optimize for was resulting file size, including indentation.
Many people don't appreciate how complicated handling configuration data is in the real world. (Pretty much every one of my jobs has at some point turned into a configuration handling nightmare). This is a good story on exactly that. There's a need for a seemingly very simple config manipulation operation, but a couple of weeks later you find yourself doing dynamic programming.
(Also, this is not just a good story, but a great example on just how to present an algorithm).
A walk through early CPU branch prediction strategies.
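One of the classic strategies in that family, the bimodal predictor, can be simulated in a few lines: a table of 2-bit saturating counters indexed by branch address, where a counter of 2 or 3 predicts "taken" and it takes two mispredictions to flip a strongly biased entry. (Table size and initial state here are arbitrary.)

```python
TABLE_SIZE = 16
counters = [1] * TABLE_SIZE  # start every entry "weakly not-taken"

def predict_and_update(pc, taken):
    i = pc % TABLE_SIZE
    prediction = counters[i] >= 2          # 2 or 3 means predict taken
    counters[i] = min(3, counters[i] + 1) if taken else max(0, counters[i] - 1)
    return prediction == taken             # was the prediction correct?

# A loop branch taken 9 times and then falling through: only the first
# and last iterations mispredict.
outcomes = [predict_and_update(0x40, t) for t in [True] * 9 + [False]]
print(outcomes.count(True), "correct of", len(outcomes))  # 8 correct of 10
```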
How to make a practical web search system using bloom filters rather than an inverted index. I especially like the notes on how the classical problems of signature-based indexes don't really matter in this domain. E.g. a modest amount of false positives is not a problem, since the full result set needs to be scored no matter what. Or how sharding the index by number-of-unique-terms was impractical in the past due to excessive disk seeks, but is no problem when the index needs to be sharded to hundreds of machines anyway.
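The signature-file core is simple to sketch (filter size and hash count below are toy values, not the article's): one small Bloom filter per document, with queries answered by checking every document's filter for all query terms. False positives only add extra candidates, which the scoring pass then rejects.

```python
import hashlib

FILTER_BITS = 256
NUM_HASHES = 3

def term_bits(term):
    # Derive NUM_HASHES bit positions from one SHA-256 digest.
    digest = hashlib.sha256(term.encode()).digest()
    return [int.from_bytes(digest[4*i:4*i+4], "big") % FILTER_BITS
            for i in range(NUM_HASHES)]

def make_filter(terms):
    bits = 0
    for term in terms:
        for b in term_bits(term):
            bits |= 1 << b
    return bits

docs = {
    "doc1": make_filter(["bloom", "filter", "search"]),
    "doc2": make_filter(["inverted", "index"]),
}

def candidates(query_terms):
    # A document matches if its filter contains every query term's bits.
    needed = make_filter(query_terms)
    return [d for d, f in docs.items() if f & needed == needed]

print(candidates(["bloom", "search"]))  # contains 'doc1', plus any false positives
```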