Creating a bogus @google.com address using one bug, and then getting rights to access the details of all (most?) bugs in the system.
There's a funny thing where for most software vulnerabilities a writeup like this can explain exactly what happened and why. But for web-based services, all you ever see is a bug that looks totally braindead. It's user accounts! How hard can that possibly be?
I work on stuff related to security of account systems at the moment; turns out that just like in every other area of modern computing, it's incredibly complicated.
You've forgotten the PIN and passphrase of your hardware bitcoin wallet. You must be out of luck; those things can't possibly have trivial vulnerabilities. I mean, surely those sophisticated cryptocurrency enthusiasts would be able to spot the snake oil a mile away. Oh... No?
Rewriting the Xen control plane to optimize the boot time of thousands of tiny VMs. Benchmarked against Docker, which seems just absurdly bad at this one.
(Not impressed by the numbers outside of booting with regard to "lightness". E.g. the throughput and scaling numbers for the personal firewall application are just miserable.)
It's possible for multiple houses (e.g. in different cities) to have the same street name and number. What's the closest pair of such "twins" in the UK?
I truly appreciate the dedication that went into researching this bit of trivia.
Why POSIX filesystem semantics aren't a good fit for large scale systems. (The funny thing is that it never occurred to me that anyone would use POSIX APIs in a modern distributed context. But apparently in supercomputing they do).
How do you enjoy the process of creating a new language, when you've been writing compilers for a long time? By adding artificial restrictions. Only assembly; no libraries, programming languages, or code generators.
Implement a minimal BCPL-like language in assembly. Then use that to implement a Lisp interpreter, and that interpreter to build a reasonably featured VM (e.g. garbage collection, delimited continuations). Write a compiler, assembler, disassembler, and linker targeting the VM. Then use these tools to write the language you originally wanted, with objects, pattern matching, and non-sexp macros.
Thoughts about what features a REPL (and the language being evaluated in the REPL!) should have to be useful.
Read this as part of some archaeology into numeric representation in early Lisp systems, but it actually turned out to be a pretty neat systems paper in general. One thing that's striking is how readable this 50-year-old paper still is. The vocabulary of systems programming has changed surprisingly little (we've just switched from words to bytes), and even the problems being solved are at the core the same. It's all about memory hierarchies, even at the dawn of computing.
The paper describes an early version of BBN Lisp for a machine with 16K words of core memory and 88K words of absurdly slow drum memory. The hardware has no paging support. How do you make efficient use of the drum memory to fit meaningful programs? You need to somehow do paging in software, and reorganize the data layouts to minimize pointer chasing and page faults. (The latter bit is what I was really interested in, while looking at the history of tagged pointers).
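The software-paging half is small enough to sketch. This is my own illustration, not the paper's actual scheme; the names, the LRU policy, and the sizes are all made up for the example:

```python
# A minimal sketch of doing paging in software, on a machine with no
# hardware paging support. Not BBN Lisp's actual design.

PAGE_SIZE = 256  # words per page

class SoftwarePagedMemory:
    def __init__(self, core_pages, drum):
        self.capacity = core_pages  # how many pages fit in core at once
        self.core = {}              # page number -> list of words
        self.lru = []               # page numbers, least recently used first
        self.drum = drum            # backing store: page number -> list of words

    def _page_in(self, page_no):
        if page_no in self.core:
            self.lru.remove(page_no)                   # refresh recency
        else:
            if len(self.core) >= self.capacity:
                victim = self.lru.pop(0)               # evict the coldest page...
                self.drum[victim] = self.core.pop(victim)  # ...writing it back
            words = self.drum.get(page_no, [0] * PAGE_SIZE)
            self.core[page_no] = list(words)           # "read" the page off the drum
        self.lru.append(page_no)
        return self.core[page_no]

    def read(self, address):
        # Every single access goes through explicit translation: this is the
        # software stand-in for a TLB plus a page-fault handler.
        page = self._page_in(address // PAGE_SIZE)
        return page[address % PAGE_SIZE]

    def write(self, address, value):
        page = self._page_in(address // PAGE_SIZE)
        page[address % PAGE_SIZE] = value

mem = SoftwarePagedMemory(core_pages=2, drum={})
mem.write(0, 42)     # page 0 resident
mem.write(1000, 7)   # page 3 resident
mem.write(2000, 9)   # page 7 resident; page 0 evicted to the drum
print(mem.read(0))   # 42, after paging page 0 back in
```

Every access paying for an explicit translation is also why the data layout matters so much: each pointer chased into a non-resident page costs a full drum transfer.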
Take a bitset implementation that splits the set into blocks, and adaptively uses the best data representation for each block. How do you determine which internal representations actually make sense?
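The basic move is picking the size threshold where one representation stops being smaller than another: a sorted array of 16-bit values costs 2 bytes per element, a flat bitmap a fixed 8 KiB, so the crossover is at 4096 elements. A Roaring-bitmap-style sketch of my own (the article evaluates more representations than these two):

```python
# Split 32-bit values into 64K blocks of 64K values each; each block
# adaptively uses a sorted array (sparse) or a bitmap (dense).

from bisect import bisect_left, insort

ARRAY_LIMIT = 4096  # above this, a bitmap is smaller than a sorted array

class AdaptiveBitset:
    def __init__(self):
        self.blocks = {}  # high 16 bits -> ("array", list) or ("bitmap", bytearray)

    def add(self, value):
        hi, lo = value >> 16, value & 0xFFFF
        kind, data = self.blocks.get(hi, ("array", []))
        if kind == "array":
            i = bisect_left(data, lo)
            if i < len(data) and data[i] == lo:
                return
            insort(data, lo)
            if len(data) > ARRAY_LIMIT:        # convert to the denser form
                bitmap = bytearray(8192)
                for v in data:
                    bitmap[v >> 3] |= 1 << (v & 7)
                kind, data = "bitmap", bitmap
        else:
            data[lo >> 3] |= 1 << (lo & 7)
        self.blocks[hi] = (kind, data)

    def __contains__(self, value):
        hi, lo = value >> 16, value & 0xFFFF
        if hi not in self.blocks:
            return False
        kind, data = self.blocks[hi]
        if kind == "array":
            i = bisect_left(data, lo)
            return i < len(data) and data[i] == lo
        return bool(data[lo >> 3] & (1 << (lo & 7)))

s = AdaptiveBitset()
for v in range(0, 1 << 20, 64):               # a sparse-ish spread of values
    s.add(v)
print(128 in s, (1 << 19) in s, 3 in s)       # True True False
```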
Slides with anecdotes on game optimization in general, but on the Jaguar CPU in particular. E.g. didn't realize you really have to use SIMD on those CPUs, or you can't even use the full cache bandwidth. Neat example of a custom spatial database near the end.
"tl;dr don't bother".
The painful step-by-step journey of implementing a seemingly trivial optimization in a production compiler. The "Lessons Learned" part is especially great; I'm fighting the temptation to just quote all of it here.
> I switched to a 12” MacBook before I started working on my swiftc PR. It was so slow that I was only able to iterate on the code once a day, because a single compile and test run would take all night. I ended up buying a top-of-the-line 15” MacBook Pro because it was the only way to iterate on the codebase more than once a day.
> It’s really easy to break swiftc because of how complex it is. My original pull request was approved and merged in a month. Despite only having about 200 lines of changes, I received 125 comments from six reviewers. Even after that much scrutiny, it was reverted almost immediately because it introduced a memory leak that a seventh person found after running a four hour long standard library integration test.
Yes, it's a Bitcoin article. But it's also really good!
> Bitcoin neatly avoids the double-spending problem plaguing proof-of-work-as-cash schemes because it eschews puzzle solutions themselves having value.
Example of how some of the new features in the C++ standard will work together.
A chip reverse engineering story with the best digressions. It's not just about figuring out that the supposed RAM chip is actually a touch-tone dialer; it's also working out the maths for every tone dialer chip on the market to identify this exact one. And then going into some semiconductor physics for good measure.
An introduction to Futamura projections, phrased in terms of physical objects rather than partial evaluation of source code.
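For the partial-evaluation phrasing, the projections are compact enough to sketch in code. Here `functools.partial` stands in for a real specializer; it only freezes an argument instead of residualizing optimized code, but the types line up the same way:

```python
# The first two Futamura projections, with functools.partial as a
# toy stand-in for a specializer. Everything here is illustrative.

from functools import partial

def interpreter(program, inputs):
    # A toy "language": a program is a list of (op, constant) pairs
    # applied to a running value.
    value = inputs
    for op, const in program:
        if op == "add":
            value += const
        elif op == "mul":
            value *= const
    return value

specialize = partial  # stand-in: specialize(f, x) fixes f's first argument

double_and_inc = [("mul", 2), ("add", 1)]

# First projection: specializing the interpreter to one program yields an
# executable for that program.
compiled = specialize(interpreter, double_and_inc)
assert compiled(10) == 21

# Second projection: specializing the specializer to the interpreter
# yields a "compiler" for the toy language.
compiler = specialize(specialize, interpreter)
assert compiler(double_and_inc)(10) == 21
```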
> In practice, a message broker is a service that transforms network errors and machine failures into filled disks. Then you add more disks.
On why you probably want either a load balancer or a database, not a pubsub system.
Slava Pestov reads through The NeWS Book: An Introduction to the Network/Extensible Window System from 1989. I never knew anything about NeWS, except from the Unix Haters Handbook X11 rant, so it was nice to fill it in with some more facts.
> Specifically, what I needed was mostly like a tree diff but I wasn’t optimizing for the same thing as other algorithms, what I wanted to optimize for was resulting file size, including indentation.
Many people don't appreciate how complicated handling configuration data is in the real world. (Pretty much every one of my jobs has at some point turned into a configuration handling nightmare). This is a good story on exactly that. There's a need for a seemingly very simple config manipulation operation, but a couple of weeks later you find yourself doing dynamic programming.
(Also, this is not just a good story, but a great example of how to present an algorithm).
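For flavor, here's the general shape of such a dynamic program; this is my own sketch, not the article's actual algorithm. Align the old and new token sequences so that reused tokens are free and freshly emitted ones cost their rendered length, then take the cheapest alignment. Running this over trees and accounting for indentation is where the real weeks go:

```python
# Weighted alignment DP minimizing emitted output size. Illustrative only.

from functools import lru_cache

def cheapest_rewrite(old, new):
    @lru_cache(maxsize=None)
    def cost(i, j):
        if j == len(new):                    # nothing left to emit
            return 0
        if i == len(old):                    # must emit the rest from scratch
            return sum(len(t) for t in new[j:])
        best = len(new[j]) + cost(i, j + 1)  # emit new[j] fresh
        if old[i] == new[j]:
            best = min(best, cost(i + 1, j + 1))  # reuse old token verbatim
        best = min(best, cost(i + 1, j))     # skip (delete) an old token
        return best
    return cost(0, 0)

print(cheapest_rewrite(("a", "  b", "c"), ("a", "  b", "d")))  # -> 1
```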
A walk through early CPU branch prediction strategies.
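The canonical example of these early schemes (whether the article covers exactly this variant is my assumption) is a table of two-bit saturating counters indexed by the low bits of the branch address. A sketch:

```python
# Two-bit saturating counter predictor. Counter values 0-1 predict
# not-taken, 2-3 predict taken; each outcome nudges the counter, so a
# branch has to mispredict twice before the prediction flips.

TABLE_BITS = 10

class TwoBitPredictor:
    def __init__(self):
        self.counters = [1] * (1 << TABLE_BITS)  # start weakly not-taken

    def predict(self, pc):
        return self.counters[pc & ((1 << TABLE_BITS) - 1)] >= 2

    def update(self, pc, taken):
        i = pc & ((1 << TABLE_BITS) - 1)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

# A loop branch taken 9 times then falling through mispredicts only the
# final iteration (plus warmup) -- unlike a one-bit scheme, which would
# also mispredict the first iteration of the next run of the loop.
p = TwoBitPredictor()
for taken in [True] * 9 + [False]:
    print(p.predict(0x400), taken)
    p.update(0x400, taken)
```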
How to make a practical web search system using Bloom filters rather than an inverted index. I especially like the notes on how the classical problems of signature-based retrieval don't really matter in this domain. E.g. a modest amount of false positives is not a problem, since the full result set needs to be scored no matter what. Or how sharding the index by number-of-unique-terms was impractical in the past due to excessive disk seeks, but is no problem when the index needs to be sharded across hundreds of machines anyway.
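A minimal sketch of the signature idea as I understand it (my own illustration with made-up parameters, not the system's actual code):

```python
# One small Bloom filter per document; a query scans the signatures
# instead of consulting an inverted index. False positives just add a few
# extra documents to the scoring phase, which has to examine the full
# candidate set anyway.

import hashlib

FILTER_BITS = 1024
NUM_HASHES = 4

def _positions(term):
    for k in range(NUM_HASHES):
        h = hashlib.blake2b(term.encode(), salt=bytes([k])).digest()
        yield int.from_bytes(h[:4], "little") % FILTER_BITS

def make_signature(terms):
    bits = 0
    for term in terms:
        for pos in _positions(term):
            bits |= 1 << pos
    return bits

def search(signatures, query_terms):
    query = make_signature(query_terms)
    # A document matches if its signature has every bit the query sets.
    return [doc for doc, sig in signatures.items() if sig & query == query]

signatures = {
    "doc1": make_signature(["bloom", "filters", "search"]),
    "doc2": make_signature(["inverted", "index"]),
}
print(search(signatures, ["bloom", "search"]))  # ['doc1'] (modulo false positives)
```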
A deep dive into how MVCC works in Postgres, from the concepts all the way down to the exact source code.
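The heart of it, the tuple visibility rule, fits in a sketch. I'm using Postgres-ish names (xmin for the inserting transaction, xmax for the deleting one), but this is only the conceptual skeleton; the real check in the source handles far more states (in-progress transactions, hint bits, subtransactions):

```python
# Simplified MVCC tuple visibility. Illustrative, not Postgres's code.

from dataclasses import dataclass, field

@dataclass
class Tuple:
    xmin: int              # transaction that inserted this row version
    xmax: int = None       # transaction that deleted/superseded it, if any

@dataclass
class Snapshot:
    xmax: int                           # first transaction id NOT visible
    in_progress: set = field(default_factory=set)

def tuple_visible(tup, snapshot, committed):
    # Inserted by a transaction this snapshot can't see? Invisible.
    if tup.xmin >= snapshot.xmax or tup.xmin in snapshot.in_progress:
        return False
    if tup.xmin not in committed:
        return False
    # Not deleted, or deleted by a transaction we can't see? Still visible.
    if tup.xmax is None:
        return True
    if tup.xmax >= snapshot.xmax or tup.xmax in snapshot.in_progress:
        return True
    return tup.xmax not in committed

# An UPDATE writes a new row version and stamps xmax on the old one, so
# older snapshots keep seeing the old version:
committed = {100, 101}
old = Tuple(xmin=100, xmax=101)        # superseded by transaction 101
new = Tuple(xmin=101)

before_update = Snapshot(xmax=101)     # snapshot taken before 101 started
after_update = Snapshot(xmax=102)

assert tuple_visible(old, before_update, committed)
assert not tuple_visible(new, before_update, committed)
assert not tuple_visible(old, after_update, committed)
assert tuple_visible(new, after_update, committed)
```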
Reverse engineering the microcode in Athlons and Phenoms. Half of this work was done by mutating existing microcode update files and probing the behavior of various instructions in a minimal operating system. The other half was done by delayering a CPU and using an electron microscope to find and read the microcode ROM.
And then writing a proof-of-concept, remotely triggerable trojan in microcode.
Another trip to crazytown. How Windows Vista would artificially limit network throughput if any sound was playing. (With an effect that would be magnified linearly as more NICs were added to the machine). Brought up in the HN discussion of my PS4 download speed post.
I like the idea of treating programming languages as a creative work to be reviewed critically.
A case study in how not to change defaults when evolving a program from one use case to another. (Any blog platform will inevitably try to transform into a general purpose CMS and call a dystopian hellscape of ecommerce plugins an "ecosystem"). But I can't understand how anyone would think that changing the default RSS feed item count from 10 (which sounds pretty standard) to infinite could be the right thing.
A good discussion on the problems with transparent huge pages. (I turn them off at work for our data analysis machines, due to some absolutely crippling throughput issues they cause. Really need to check whether that server is already running on a 4.6+ kernel, with the supposedly improved THP behavior mentioned in this thread.)
Why does the Linux load average include processes that are blocked on swapping? (Never realized it did; I thought it used the classical definition.) You know it's good software archaeology when it's dealing with something that's still relevant today, and the search bottoms out in MACRO-10 code.
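The calculation itself is simple enough to sketch; the Linux twist is purely in what gets counted as "active". (The kernel does this in fixed-point arithmetic and the sampling details differ; this is just the shape of it, as I understand it:)

```python
# Exponentially damped load average over 5-second samples.

import math

DECAY_1MIN = math.exp(-5 / 60)  # 5-second sample interval, 1-minute window

def update_loadavg(load, runnable, uninterruptible):
    # Classical Unix would count only the runnable tasks; Linux also counts
    # uninterruptible sleep (D state), which includes waiting on swap/disk.
    active = runnable + uninterruptible
    return load * DECAY_1MIN + active * (1 - DECAY_1MIN)

load = 0.0
for _ in range(24):              # two minutes of a steady "2.0" load
    load = update_loadavg(load, runnable=1, uninterruptible=1)
print(round(load, 2))            # ~1.73, creeping up towards 2.0
```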
> font-size is the worst.
Just how hard can it be to determine which font size should be used for an element based on the CSS? Pretty damn hard, it turns out.
> To recap, we are now at four different notions of font size being inherited: ...
Why and how to deprecate a programming language.
The thesis here is that the Linux kernel isn't a monorepo, but a monotree with multiple repositories. There are many repositories (the main one maintained by Linus, subsystem-specific ones, etc.), so it's not a monorepo. But all of those repositories are rooted in the same tree, with changes flowing between the repos arbitrarily (so they're not polyrepos either, which would generally need to be totally independent of each other). Hence the need for the new term.
Unsurprisingly, GitHub doesn't support this fairly unique workflow.
Computer science paper recommendations from Fabien Giesen, with long summaries of exactly why these papers are particularly useful/interesting.
A HN comment from 2015 explaining why the 6502 instruction set encouraged an SOA layout over AOS.