Cache replacement algorithms are fascinating and have always been one of my favorite research and development hobbies. I really like the LHD concept, but the comparison to other algorithms makes it less informative for practical use cases than it could be. Specifically:
- It primarily compares against relatively weak, naive algorithms like LRU because (inexplicably) that is what many open source systems use. All of the high-performance closed source systems I've seen use better algorithms that are not represented here. Maybe an availability bias? Also, some great algorithms don't even have names AFAIK, just implementations, which makes them hard to reference.
- CPU cache efficiency of algorithm implementations is often highly relevant to real-world performance of cache replacement algorithms. This is frequently overlooked in the academic literature, which focuses on hit rate under various workloads. Many algorithms that produce good hit rates have poor throughput in practice due to CPU cache behavior; quite a bit of R&D at database companies focuses on minimizing CPU cache thrashing from these algorithms. A slight reduction in hit rate that substantially reduces CPU cache thrashing can be a big win. Cache throughput often comes from surprising places.
- Schedule-based replacement mechanics are ignored. Given some large number of concurrent in-flight operations against the cache, you can adaptively reschedule the order of operations to approximate Bélády's optimal algorithm. For many workloads you can get a step function improvement in cache throughput from this, and it applies to the aforementioned CPU cache efficiency as well. Designing schedule-optimized cache replacement algorithms is not trivial; the obvious formulation is NP-hard, so the challenge is in finding efficient approximations (a toy sketch of the greedy version follows this list).
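To make the scheduling idea concrete, here is a minimal greedy sketch, with names and structure entirely my own and nothing like the real designs I'm alluding to: serve whatever already hits first, then admit misses while preferring to evict keys that don't recur later in the same batch. The point is just that batching in-flight requests buys you a peek at the "future" that Bélády's algorithm needs.

```python
from collections import OrderedDict

def serve_batch(cache, capacity, batch):
    # Split the in-flight batch into hits and misses against current contents.
    hits = [k for k in batch if k in cache]
    misses = [k for k in batch if k not in cache]

    for k in hits:                          # serve hits first, refresh recency
        cache.move_to_end(k)

    for i, k in enumerate(misses):          # then admit the misses
        if k in cache:                      # an earlier miss in this batch filled it
            cache.move_to_end(k)
            continue
        if len(cache) >= capacity:
            future = misses[i + 1:]
            # Prefer evicting a key that is not referenced again in this batch
            # (the cheap greedy nod to Belady); otherwise fall back to LRU.
            victim = next((v for v in cache if v not in future), None)
            if victim is None:
                victim = next(iter(cache))
            del cache[victim]
        cache[k] = True

cache = OrderedDict()
serve_batch(cache, capacity=3, batch=["a", "b", "a", "c", "d", "b"])
print(list(cache))  # ['c', 'd', 'b'] with these toy inputs
```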
Generally though, I like the formulation of LHD. It would be interesting to see how it sits in the broader scope of real-world cache replacement algorithm implementations. This is one of those areas in database computer science where the closed source design theory is richer than what makes it into the academic literature, so it is more difficult to figure out what the state-of-the-art even looks like.
Disclaimer: I’m definitely not an expert in this area and my understanding of the paper may be off.
I wonder if Jeff Dean (and Google) have tried a neural network version of this. The core idea here seems similar to "learned index structures".
One of the HTTP caches has a priority policy where you can prioritize small files over large ones, under the presumption that for some workloads communication overhead (including setup/teardown and connection pooling) dominates the cost. Storing ten small results may be less wear and tear on the system than one big one.
That's essentially the same observation this policy is based on, but with gradations instead of a binary decision.
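As a rough illustration of what the graded version of that preference might look like (the scoring and field names here are mine, not from any particular cache or from the paper): score objects by something like expected hits per byte of cache spent and evict from the low end.

```python
import random

# Hypothetical per-object fields (hits, size_bytes, age_s); the score roughly
# means "expected hits saved per byte of cache spent".
def score(obj):
    return obj["hits"] / (obj["size_bytes"] * max(obj["age_s"], 1))

def evict_one(objects, sample_size=8):
    # Sample a few resident objects and evict the lowest value-per-byte one.
    sample = random.sample(objects, min(sample_size, len(objects)))
    victim = min(sample, key=score)
    objects.remove(victim)
    return victim

objects = [
    {"key": "thumb.jpg", "hits": 40, "size_bytes": 8_000,       "age_s": 60},
    {"key": "video.mp4", "hits": 45, "size_bytes": 900_000_000, "age_s": 60},
]
print(evict_one(objects)["key"])  # the big object loses despite a similar hit count
```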
The paper seems to miss the modified-2Q algorithm, which was in memcached as of the cited 1.4.33 (and became the default in 1.5.0). It has very different performance characteristics than straight LRU.
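For context, a simplified sketch of the classic 2Q shape (not memcached's exact modified variant, and the names are mine): a small probationary FIFO in front of a protected LRU, which is where the scan resistance that plain LRU lacks comes from.

```python
from collections import OrderedDict, deque

class TwoQ:
    def __init__(self, main_cap, probe_cap):
        self.main = OrderedDict()                 # protected LRU
        self.probe = deque(maxlen=probe_cap)      # probationary FIFO of keys
        self.main_cap = main_cap

    def access(self, key):
        if key in self.main:                      # hit in protected: refresh recency
            self.main.move_to_end(key)
        elif key in self.probe:                   # second touch: promote to protected
            self.probe.remove(key)
            if len(self.main) >= self.main_cap:
                self.main.popitem(last=False)     # evict LRU end of protected
            self.main[key] = True
        else:                                     # first touch: probation only
            self.probe.append(key)

cache = TwoQ(main_cap=3, probe_cap=3)
for k in ["a", "a", "scan1", "scan2", "scan3", "a"]:
    cache.access(k)
print(list(cache.main))  # ['a'] -- the one-shot scan never reached the protected queue
```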
Still need to give it a more thorough read, but I'm spotting other errors, e.g. FB's memcached (according to their paper) balances pages by tail age, not hit rate.
Seems interesting though.
It's not clear from the paper how the rank is updated. If anyone understands it better, could you explain?
The eviction does random sampling of N entries and is straightforward, but how does it compute the hit density online per entry?
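For concreteness, here is my reading of just the eviction side as a toy sketch (names are mine); the hit-density estimate itself, which is the part I'm asking about, is a crude placeholder.

```python
import random

SAMPLE_N = 64  # number of resident entries sampled per eviction

def hit_density(entry):
    # Crude placeholder -- the paper's actual online per-entry estimate is
    # exactly the part I'm asking about. Stand-in: hits so far over age.
    return entry["hits"] / max(entry["age"], 1)

def evict(resident_entries):
    # Sample N entries at random and evict the one with the lowest hit density.
    sample = random.sample(resident_entries, min(SAMPLE_N, len(resident_entries)))
    return min(sample, key=hit_density)

entries = [{"key": i, "hits": random.randint(0, 10), "age": random.randint(1, 100)}
           for i in range(1000)]
print(evict(entries)["key"])
```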
Could the algorithm be applied to the block cache problem with a single item size, like a filesystem block cache or a database cache?