Avoid wasting time fetching the secondary bucket by maintaining a Bloom filter of the keys that had to fall back to the secondary.
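Below is a minimal sketch of the idea. It assumes a two-choice table where every key has a primary and a secondary bucket; the note doesn't say how the secondary is chosen. `TwoChoiceTable`, `BloomFilter`, and the blake2b-based hashing are my own placeholders, not the article's code. The filter only ever holds keys that overflowed, so when a key misses the primary and the filter says no, the lookup never touches the secondary. To keep it short, secondary buckets are unbounded here.

```python
import hashlib
from typing import Any, Optional


def _h(tag: str, key: str) -> int:
    """Deterministic 64-bit hash of `key`, domain-separated by `tag`."""
    data = f"{tag}\x00{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> list:
        return [_h(f"bloom{i}", key) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: str) -> bool:
        # False means "definitely never overflowed"; True may be a false positive.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class TwoChoiceTable:
    """Hypothetical table: each key has a primary and a secondary bucket."""

    def __init__(self, num_buckets: int = 64, bucket_size: int = 4) -> None:
        self.buckets = [[] for _ in range(num_buckets)]
        self.bucket_size = bucket_size
        self.overflowed = BloomFilter()  # only keys that spilled to the secondary
        self.secondary_probes = 0  # how often a lookup actually read a secondary bucket

    def _bucket(self, tag: str, key: str) -> list:
        return self.buckets[_h(tag, key) % len(self.buckets)]

    def insert(self, key: str, value: Any) -> None:
        primary = self._bucket("primary", key)
        if len(primary) < self.bucket_size:
            primary.append((key, value))
        else:
            # Simplification: the secondary bucket has no capacity limit here.
            self._bucket("secondary", key).append((key, value))
            self.overflowed.add(key)

    def get(self, key: str) -> Optional[Any]:
        for k, v in self._bucket("primary", key):
            if k == key:
                return v
        if not self.overflowed.might_contain(key):
            return None  # skip the secondary fetch entirely
        self.secondary_probes += 1
        for k, v in self._bucket("secondary", key):
            if k == key:
                return v
        return None
```

Filling it past the primary capacity (say 300 keys into 64 buckets of 4) forces some overflow; lookups for absent keys then read a secondary bucket only on a Bloom false positive.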
A linear-probing hash table built for batch use that gets to basically 100% occupancy by using a dense array of keys sorted by hash code as the primary storage, plus a bit array with popcount tricks to find the index into the dense array. It's not at all obvious to me why this works: for any reasonable hash-code length, it seems like the bitmap has to waste a tremendous amount of memory, especially since their bitmap encoding seems wasteful (two bits per possible hash code, when it seems like you could get very close to one bit without any more memory accesses).
But the benchmarks claim it works, so...
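The note doesn't describe their encoding, so this is only my guess at the general shape, under my own assumptions rather than their design. It uses one bit per bucket (not their two) marking "some key hashes here", per-64-bit-word popcount prefix sums so that rank is a single popcount, and the dense array sorted by bucket. `RankProbeTable` and `bucket_of` are made-up names. Every occupied bucket before `b` holds at least one key, so `rank(b)` is a lower bound on where bucket `b`'s keys start, and the lookup linearly probes forward from there.

```python
import hashlib

WORD = 64


def bucket_of(key: str, num_bits: int) -> int:
    """Hypothetical hash: the top `num_bits` bits of a 64-bit blake2b hash."""
    h = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")
    return h >> (64 - num_bits)


class RankProbeTable:
    def __init__(self, keys: list, num_bits: int) -> None:
        self.num_bits = num_bits
        # Dense primary: (bucket, key) pairs sorted by bucket, no empty slots.
        self.dense = sorted((bucket_of(k, num_bits), k) for k in keys)
        # Occupancy bitmap: bit b is set if at least one key lands in bucket b.
        self.words = [0] * (((1 << num_bits) + WORD - 1) // WORD)
        for b, _ in self.dense:
            self.words[b // WORD] |= 1 << (b % WORD)
        # prefix[i] = number of set bits in words[0..i), so rank needs one popcount.
        self.prefix = [0]
        for w in self.words:
            self.prefix.append(self.prefix[-1] + bin(w).count("1"))

    def occupied(self, b: int) -> bool:
        return bool((self.words[b // WORD] >> (b % WORD)) & 1)

    def rank(self, b: int) -> int:
        """Number of occupied buckets strictly before bucket b."""
        w, off = divmod(b, WORD)
        return self.prefix[w] + bin(self.words[w] & ((1 << off) - 1)).count("1")

    def contains(self, key: str) -> bool:
        b = bucket_of(key, self.num_bits)
        if not self.occupied(b):
            return False  # answered from the bitmap alone
        i = self.rank(b)  # lower bound on the first index holding bucket b
        while i < len(self.dense):
            kb, k = self.dense[i]
            if kb > b:
                return False
            if kb == b and k == key:
                return True
            i += 1
        return False
```

In this sketch the probe distance equals the number of collisions accumulated before bucket `b`, so it stays short only when buckets greatly outnumber keys, which is exactly the memory trade-off that seems puzzling above.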
Schema-less FlatBuffers. The HN thread turned into a delightful pissing match between Protobuf implementors.
Converting a (part of a) row-order query engine to batched column-order.
This seems to be patient zero for database engines avoiding branch mispredictions by batching operations on homogeneous data?
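To make the row-order vs. batched column-order contrast concrete, here is a sketch of a filter written both ways. The function names are hypothetical, and since it's Python it shows the shape of the code, not the speedup. The batched version builds a selection vector: it writes every index unconditionally and only advances the output count when the predicate holds, so the inner loop has no data-dependent `if` for the branch predictor to guess wrong. Other columns of the same batch are then gathered through that vector.

```python
from typing import Iterator, List, Tuple


def row_filter(rows: Iterator[Tuple[int, int]], threshold: int) -> Iterator[Tuple[int, int]]:
    """Row-at-a-time: one call and one data-dependent branch per tuple."""
    for row in rows:
        if row[1] > threshold:
            yield row


def batch_select(column: List[int], threshold: int) -> Tuple[List[int], int]:
    """Column-at-a-time: build a selection vector without branching on the data."""
    sel = [0] * len(column)
    count = 0
    for i, v in enumerate(column):
        sel[count] = i            # always write the candidate index
        count += v > threshold    # bool adds 0 or 1; no if-statement
    return sel, count


def batch_gather(column: List[int], sel: List[int], count: int) -> List[int]:
    """Apply a selection vector to another column of the same batch."""
    return [column[sel[j]] for j in range(count)]
```

For example, with `ids = [1, 2, 3, 4, 5]` and `vals = [10, 50, 20, 70, 30]`, `batch_select(vals, 25)` selects positions 1, 3 and 4, and gathering `ids` through them gives `[2, 4, 5]`, the same rows `row_filter` yields one at a time.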
A Protobuf DOM that works entirely in place on the original input string.
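The note doesn't say how that library does it. What follows is a minimal sketch of the general idea, assuming a "DOM" can be as little as a list of (field number, wire type, payload offsets) pointing into the untouched input, with nothing copied or decoded until someone asks. It handles only the wire-format framing: tag = (field_number << 3) | wire_type, varints, 8- and 4-byte fixed fields, and length-delimited fields. Groups (wire types 3 and 4) are rejected to keep it short. `Field`, `read_varint`, and `iter_fields` are hypothetical names.

```python
from typing import Iterator, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    number: int
    wire_type: int
    start: int  # payload offset in the original buffer
    end: int    # one past the payload's last byte


def read_varint(buf: memoryview, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at `pos`; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint too long")


def iter_fields(buf: memoryview, start: int = 0, end: Optional[int] = None) -> Iterator[Field]:
    """Walk one message's fields, yielding offsets into `buf` instead of copies."""
    end = len(buf) if end is None else end
    pos = start
    while pos < end:
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:    # varint: just skip over it
            _, after = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            after = pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            after = pos + length
        elif wire_type == 5:  # fixed 32-bit
            after = pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if after > end:
            raise ValueError("truncated field")
        yield Field(number, wire_type, pos, after)
        pos = after
```

A nested message is just another `iter_fields` call over the parent field's `start`/`end`, and a string field is `buf[start:end]`, a view into the original bytes rather than a copy.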