As others have already said, if you're a programmer, please just read the original papers:
https://meltdownattack.com/meltdown.pdf (start with this one)
They are extremely well written, clear and to the point. Understanding them will take you less time than trying to get rid of all the tortured analogies and unnecessary simplifications people have been trying to make up over the past week. It's bad enough that we face the daunting task of explaining this stuff to people who don't care about computers, there's no need to perpetuate misunderstanding among those who deal with computers for a living. Just read the real thing.
And on the subject of explaining this to others, it might surprise you how far you can get if you try to honestly explain how the attacks work. I refuse to use the silly train station metaphors, so I tried to describe the basic idea of how speculative execution works in out-of-order CPUs to my parents (who can browse the Internet, with some effort, and were patient enough to listen to me for 10 minutes or so). I don't think I got the notion of return-oriented programming across very well, but the basic idea of Meltdown and side-channel timing attacks in general is actually very easy to convey on the basis of a reasonably simplified picture of a CPU - you need to explain the basic role of cache memory, virtual vs physical addressing, the TLB and the basic notion of branch prediction. That's all you need to understand the principle of how the attacks work, if not the details of the implementation.
Things you must know before you can understand Meltdown:
* The memory hierarchy (registers, cache, memory); really all programmers always need to know the memory hierarchy and Meltdown just sort of reinforces that.
* The basics of kernel memory management (kernel memory is mapped into userland processes and protected by page table permissions checks).
* Very basic assembly language (basically what a variable assignment and an "if" statement compile down to).
* The idea of pipelined CPUs, the idea that on modern CPUs the registers you see in assembly instructions are actually renamed from a larger invisible register file, and the distinction between instruction execution and retirement.
If you've got this I think you can just read the paper: https://meltdownattack.com/meltdown.pdf. It's really well written. In particular: I don't think you need to understand much about timing attacks. The Flush+Reload paper (you can just Google it, it'll be the first result) is also really well written, but you'll be fine in the Meltdown paper without having read it.
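For anyone who wants the timing side channel in runnable form before opening either paper, here is a toy sketch of the Flush+Reload idea. It is a simulation, not an exploit: the `ToyCache` class, the latency numbers and the `flush_reload` helper are all made up for illustration, and a real attack would use `clflush` and a cycle counter on actual hardware.

```python
# Toy model of a Flush+Reload style cache timing channel.
# Latencies and names are illustrative assumptions, not measurements.

HIT_CYCLES = 40     # assumed latency when the line is already cached
MISS_CYCLES = 300   # assumed latency when the line comes from DRAM
LINE_SIZE = 64      # bytes per cache line on typical x86-64 parts


class ToyCache:
    def __init__(self):
        self.lines = set()  # indices of cache lines currently present

    def flush(self, addr):
        """Evict the line holding addr (the role clflush plays on x86)."""
        self.lines.discard(addr // LINE_SIZE)

    def access(self, addr):
        """Touch addr and return the simulated latency in cycles."""
        line = addr // LINE_SIZE
        if line in self.lines:
            return HIT_CYCLES
        self.lines.add(line)
        return MISS_CYCLES


def flush_reload(cache, victim, candidates):
    """Return the candidate addresses the victim touched between flush and reload."""
    for addr in candidates:
        cache.flush(addr)                      # 1. flush
    victim(cache)                              # 2. let the victim run
    threshold = (HIT_CYCLES + MISS_CYCLES) // 2
    return [a for a in candidates              # 3. reload and time
            if cache.access(a) < threshold]


candidates = [i * 4096 for i in range(256)]
touched = flush_reload(ToyCache(), lambda c: c.access(42 * 4096), candidates)
print(touched)
```

The only thing the attacker observes is which reload was fast. That one bit per candidate is the whole channel.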
My naive expectation would have been that the CPU maintains some kind of process level isolation.
My new understanding is now that the concept of a process and isolation of processes is handled by the kernel.
This is probably a silly question, but maybe we could handle process isolation in the CPU somehow?
I don't understand one part. If you read from an arbitrary memory location (during speculative execution, I get all that) how does that read pull data from a different process? Aren't all addresses virtual until they go through the MMU and get translated to a physical address depending on the process?
Or does this work only because the kernel exists in the same virtual address space, hence KPTI as a mitigation?
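As far as I understand it, the second guess is the right one: the read never leaves the attacker's own address space. It hits kernel pages that are mapped there but marked supervisor-only, and on Linux the kernel's mapping covers physical memory, which is how other processes' data ends up reachable. A toy sketch of that picture follows. The page numbers, `PageTableEntry` and `build_page_table` are hypothetical, and real KPTI still leaves a small kernel entry stub mapped, which this ignores.

```python
from dataclasses import dataclass

# Toy model of per-process page tables with and without KPTI.
# All page numbers and names are made up for illustration.


@dataclass(frozen=True)
class PageTableEntry:
    physical_page: int
    user_accessible: bool


USER_VPAGE = 0x00400     # hypothetical user virtual page number
KERNEL_VPAGE = 0xFFFF8   # hypothetical kernel virtual page number
KERNEL_PPAGE = 0x1000    # hypothetical physical page holding kernel data


def build_page_table(user_phys, kpti):
    """Page table as seen while the process runs in user mode."""
    table = {USER_VPAGE: PageTableEntry(user_phys, True)}
    if not kpti:
        # Pre-KPTI: the kernel is mapped into every process, supervisor-only.
        table[KERNEL_VPAGE] = PageTableEntry(KERNEL_PPAGE, False)
    return table


def translate(table, vpage, in_kernel_mode=False):
    """Return (status, physical_page) for a virtual page lookup."""
    entry = table.get(vpage)
    if entry is None:
        return ("not mapped", None)
    if not entry.user_accessible and not in_kernel_mode:
        # The translation exists; only the permission bit stands in the way.
        return ("mapped, permission fault", entry.physical_page)
    return ("ok", entry.physical_page)


process_a = build_page_table(user_phys=0x2000, kpti=False)
process_b = build_page_table(user_phys=0x3000, kpti=False)
process_a_kpti = build_page_table(user_phys=0x2000, kpti=True)

print(translate(process_a, KERNEL_VPAGE))
print(translate(process_b, KERNEL_VPAGE))
print(translate(process_a_kpti, KERNEL_VPAGE))
```

Without KPTI both processes resolve the kernel page to the same physical page and are stopped only by the permission bit. With KPTI there is no translation left for a transient load to follow.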
Tried to write a Meltdown explanation for "everyday" developers. There are some loose analogies and inexact writing. (Please point out mistakes, they're mine.)
My explanation: https://blog.cloudflare.com/meltdown-spectre-non-technical/
Here is the same in 10 lines of pseudo-code
How many bits per second (or kbps, mbps, etc) of memory reading is possible with Meltdown when run from JS vs running natively?
Somewhat related, is it possible to neuter the JS engines in Firefox or Chrome so that they don't JIT JS and would doing so have any real world impact on mitigating this attack? If it relies on speedy execution to be possible maybe a solution would be to have a NeuterScript extension that deliberately slows things down.
Unfortunately, this gets some big things wrong. Meltdown is not about speculative execution. (Spectre is.) Meltdown is about out-of-order execution - no branches required. The authors are clear about this in the paper. From Section 2.1:
"In practice, CPUs supporting out-of-order execution support running operations speculatively to the extent that the processor’s out-of-order logic processes instructions before the CPU is certain whether the instruction will be needed and committed. In this paper, we refer to speculative execution in a more restricted meaning, where it refers to an instruction sequence following a branch, and use the term out-of-order execution to refer to any way of getting an operation executed before the processor has committed the results of all prior instructions."
In this explanation, the author starts by showing two different code branches, which is misleading. Meltdown does not require code branches - which is what makes it so surprising. This is the C code example from the paper:
  raise_exception();
  // the line below is never reached
  access(probe_array[data * 4096]);
No branches: you have an exception, and then in the code following that exception, you have some memory access. Despite the exception, the access happens because of out-of-order execution. The actual exploit is, in assembly:
  ; rcx = kernel address
  ; rbx = probe array
  retry:
  mov al, byte [rcx]
  shl rax, 0xc
  jz retry
  mov rbx, qword [rbx + rax]
The exception is raised on the first mov instruction, as it loads a kernel address. This exception will eventually cause the processor to abandon all of the current code it is executing, and the program will terminate from a segmentation fault. But. There is a race condition: before the processor deals with the exception, but after the memory has been accessed, the second mov instruction executes, which uses the data which caused the exception. This shouldn't matter, as execution is abandoned, but data is brought into the cache based on this value, and using side-channel attacks, we can figure out what this value was. From the paper:
"To load data from the main memory into a register, the data in the main memory is referenced using a virtual address. In parallel to translating a virtual address into a physical address, the CPU also checks the permission bits of the virtual address, i.e., whether this virtual address is user accessible or only accessible by the kernel. As already discussed in Section 2.2, this hardware-based isolation through a permission bit is considered secure and recommended by the hardware vendors. Hence, modern operating systems always map the entire kernel into the virtual address space of every user process.
As a consequence, all kernel addresses lead to a valid physical address when translating them, and the CPU can access the content of such addresses. The only difference to accessing a user space address is that the CPU raises an exception as the current permission level does not allow to access such an address. Hence, the user space cannot simply read the contents of such an address. However, Meltdown exploits the out-of-order execution of modern CPUs, which still executes instructions in the small time window between the illegal memory access and the raising of the exception."
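To make the race concrete, here is a toy simulation of the sequence described above. It is only a sketch under loud assumptions: `ToyCPU`, `run_transient_sequence` and `leak_byte` are invented names, the "cache" is a Python set, and the fault is modelled as a `PermissionError` raised after the dependent access has already happened. It does not touch real kernel memory and does not model the paper's retry loop for zero values.

```python
# Toy model of the Meltdown race: the dependent access runs before the
# fault is delivered, and the cache is not rolled back afterwards.
# Every name here is hypothetical; nothing below reads real kernel memory.

PAGE = 4096


class ToyCPU:
    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory  # addr -> byte, supervisor-only
        self.cached = set()                 # microarchitectural state
        self.registers = {}                 # architectural state

    def run_transient_sequence(self, kernel_addr):
        # Out-of-order window: the load result is forwarded to the next
        # instruction before the permission fault is acted on.
        value = self.kernel_memory[kernel_addr]   # mov al, byte [rcx]
        self.registers["rax"] = value * PAGE      # shl rax, 0xc
        self.cached.add(self.registers["rax"])    # mov rbx, qword [rbx + rax]
        # The faulting load reaches retirement: register results are dropped...
        self.registers.clear()
        # ...but the cache line fetched above stays where it is.
        raise PermissionError("segmentation fault")

    def flush_probe_array(self):
        self.cached.clear()

    def probe_is_fast(self, index):
        return index * PAGE in self.cached


def leak_byte(cpu, kernel_addr):
    cpu.flush_probe_array()
    try:
        cpu.run_transient_sequence(kernel_addr)
    except PermissionError:
        pass  # the attacker handles or suppresses the fault
    hits = [i for i in range(256) if cpu.probe_is_fast(i)]
    return hits[0] if len(hits) == 1 else None


BASE = 0xFFFF_8000_0000_0000  # hypothetical kernel address
cpu = ToyCPU({BASE + i: b for i, b in enumerate(b"secret")})
leaked = bytes(leak_byte(cpu, BASE + i) for i in range(6))
print(leaked)
```

Note that `registers` is empty after every attempt: architecturally nothing leaked. The byte comes back purely from which probe slot was cached.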
I find the paper to be very readable. They give a good overview of modern computer architecture, and then walk through all of the steps of their attack. I highly recommend reading it: https://meltdownattack.com/meltdown.pdf
Suggestion - this logo:
Should probably say "Intel" somewhere ;)
I have a couple of questions about Meltdown and the Intel chips. My understanding is that a key part of this is that upon speculative execution the page table permission checks only happen when the "transient" instruction is retired.
Was this simply a performance engineering trade off made by Intel? Would checking the PTE permissions on speculative execution result in giving up any performance gained by the speculative execution?
Cache line is 64 bytes on x86-64 if I am not mistaken, not 4096 :)
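For what it's worth, the 4096 in the paper's listing is the page size, not the line size. The probe array gets one page per possible byte value, and the paper's stated reason for the large stride is to keep the hardware prefetcher from pulling in neighbouring slots. A quick arithmetic sketch, assuming 64-byte lines and 4 KiB pages (`probe_offset` is just an illustrative helper):

```python
# Arithmetic behind the 4096-byte stride in the probe array.
# Assumes 64-byte cache lines and 4 KiB pages.

CACHE_LINE = 64
PAGE = 4096


def probe_offset(secret_byte, stride=PAGE):
    """Offset into the probe array touched for a given secret byte."""
    return secret_byte * stride


lines_per_slot = PAGE // CACHE_LINE   # cache lines separating adjacent slots
probe_array_size = 256 * PAGE         # one page per possible byte value

# Each byte value lands on its own cache line and its own page.
distinct_lines = {probe_offset(b) // CACHE_LINE for b in range(256)}
distinct_pages = {probe_offset(b) // PAGE for b in range(256)}

print(lines_per_slot, probe_array_size, len(distinct_lines), len(distinct_pages))
```

A 64-byte stride would also give 256 distinct lines, but they would be adjacent, so a single prefetch could warm several candidates at once and blur the measurement.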
Nice read anyway.
I am surprised this hasn’t been explained in terms of a vulnerability chain, i.e., broken up into parts. As soon as you have an oracle providing cache timing info, you have a vulnerability.
Basic Bayesian analysis suggests that there is more fruit to fall off the tree.