This is great, but remember that it covers Meltdown, not Spectre. Meltdown is the more immediate disaster, but Spectre is the more batshit vulnerability. You really want to get your head around:
* The branch target injection variant of Spectre if you want to get a sense of how amazing this vulnerability is: you can spoof the branch predictor to trick a target process into running arbitrary code in its address space! This is crazy!
* The misprediction variant of Spectre if you want to get a hopeless feeling in the pit of your stomach, since the implications of mispredict are that certain kinds of programs are riddled with a new kind of side channel we didn't really grok until last week, and no upcoming microcode update seems to be in the offing.
You could probably use the same Python conceit to illustrate the other two attacks; someone might take a crack at that.
(To be clear, I'm not disputing that the R-Pis aren't vulnerable to Spectre.)
This is a good overview of modern, superscalar, out-of-order, speculative CPUs that literally any programmer could easily understand. Recommended reading for every single engineer in the whole world (who doesn't already understand this stuff from reading source material, e.g. the Google Project Zero post).
I understood everything up until the "suppose we flush our cache before executing the code" part, which is probably the most important part.
There was a comment below the article that explained this part a little further:
> Imagine the value at the kernel address, which gets loaded into _w, was 0xabde3167. Then the value of _x is 0x100, and address user_mem[0x100] will end up in the cache. A subsequent load of user_mem[0x100] will be fast.
> Now imagine the value at the kernel address, which gets loaded into _w, was 0xabde3067. Then the value of _x is 0x000, and address user_mem[0x000] will end up in the cache. A subsequent load of user_mem[0x100] will be slow.
> So we can use the speed of a read from user_mem[0x100] to discriminate between the two options. Information has leaked, via a side channel, from kernel to user.
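For anyone who wants to poke at that explanation, here is a toy sketch of it in Python. Everything here is an assumption for illustration: a set stands in for the cache, the latencies are made-up constants, and the function names are mine, not the article's. It only models the logic of the side channel; it does not attack anything.

```python
# Toy model only: a Python set stands in for the cache, and the latency
# numbers are made-up constants, not measurements from real hardware.
FAST_NS, SLOW_NS = 1, 100  # assumed cache-hit / cache-miss costs


def speculative_touch(kernel_value, cache):
    """Mimic the speculated loads: _x = _w & 0x100, then touch user_mem[_x]."""
    x = kernel_value & 0x100
    cache.add(x)  # the side effect that remains even though the read never commits


def timed_load(address, cache):
    """Return the simulated latency of loading user_mem[address]."""
    return FAST_NS if address in cache else SLOW_NS


def leak_bit(kernel_value):
    """Recover bit 8 of kernel_value purely from the simulated load latency."""
    cache = set()                        # "flush our cache" first
    speculative_touch(kernel_value, cache)
    latency = timed_load(0x100, cache)   # probe user_mem[0x100]
    return 1 if latency == FAST_NS else 0


print(leak_bit(0xabde3167))  # fast probe, so the bit was 1
print(leak_bit(0xabde3067))  # slow probe, so the bit was 0
```

Repeating the same trick with a different mask each time would recover the other bits one by one.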
Ok I think I understand the subtleties of these attacks now. But: can anyone tell me why the accessibility check for protected memory doesn't happen before the cache loads the contents of RAM? If that happened then none of these attacks would be possible.
I got my computer engineering degree in 1999 and ended up going the computer science route making CRUD apps all day. I feel in my gut that some engineer, somewhere, MUST have asked this question at one of the big chip manufacturers.
Am I missing something fundamental? Is the access check too expensive? If it isn't, then can the microcode be updated to do this, or is caching/accessibility checking happening at a level above microcode? If that's the case then it would seem that pretty much all processors everywhere that do speculation without protected memory access checks are now obsolete.
With all the news about these attacks lately, this is one of the best posts I've seen in explaining to less knowledgeable people how exactly speculation causes a problem.
One question I still have that gets glossed over is how timing of instructions is captured.
The cores in (all versions of?) Raspberry Pi do speculatively execute. It's just that the window of opportunity is tiny - just a few cycles (and maybe up to twice that many instructions) - and there's (probably) no way to get an indirected side-effect.
I wouldn't write off the ability to get a useful side-effect signal. The variants widely documented are not the only possible methods of inducing speculative side-effects.
> The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in Raspberry Pi render us immune to attacks of the sort.
I didn't check, but these will almost certainly have branch prediction. What they probably lack is a predictor advanced enough to speculate on indirect branches, which AIUI is the primary vector of Spectre.
I was already on the lookout for a small ARM-based mini PC, just for doing financial transactions and record-keeping. Now that seems more pressing but I don't know of any such thing in existence.
I tried doing that on RPi 3, but the IO seemed not up to the job -- the CPU appeared to be just about tolerable, but using micro SD as a disk was too slow and prone to failure (I'd have tried an external USB disk but I believe the problems were in part because of poor I/O bandwidth). Other single board machines seemed to have better provision for disks that are up to the task I had in mind, but lack software support, so that I had little confidence in security updates, for example.
If somebody sold this I think they'd have my money tomorrow:
* An ARM mini-PC
* With a decent security update team behind it (probably the hard part?)
* That will let me run some basics: for me, a Unixy OS with Chrome/Chromium, emacs, ledger and python, without a big effort to install those and keep them up to date
* Ideally without too much anti-commodification BS (from my customer perspective) so that hardware can be swapped out if needed
Does anything like that exist?
This is a fantastic read. Timing attacks are insidious and tend to crop up in the oddest places. I first learned of them when learning how to securely compare strings (used a lot with passwords). A naive implementation means that you can easily guess if a character is correct depending on how fast the compare function returns.
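A minimal sketch of the string-compare case, in Python. `naive_equal` is a hypothetical helper written for this example; `hmac.compare_digest` is the real standard-library function, designed to avoid content-based short-circuiting.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit compare: run time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Compare without bailing out at the first differing byte."""
    return hmac.compare_digest(a, b)


print(naive_equal(b"hunter2", b"hunter2"))
print(constant_time_equal(b"hunter2", b"hunterX"))
```

Both functions return the same answers; the difference is only in how much the running time reveals about the position of the mismatch.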
This article wonderfully explains a complex context without losing a lot of relevant detail.
> In the good old days*, the speed of processors was well matched with the speed of memory access...Over the ensuing 35 years, processors have become very much faster, but memory only modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main memory.
In real-world terms, what's the fastest processor we could build today whose execution speed is reasonably matched to its main memory access speed (so it doesn't need caches, etc)?
I could imagine that a processor, with a simple design that closely matches a naive model of how CPUs work, would be very useful for high-security applications. It would be much easier to reason about up-front.
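For a sense of the gap, here is the arithmetic on the two figures quoted from the article, as a quick Python check. It assumes the worst-case 100ns access and ignores everything else about the memory system.

```python
instruction_time_ns = 0.5   # Cortex-A53 figure quoted from the article
memory_access_ns = 100.0    # worst-case main memory access, same quote

# Instruction slots that pass while one uncached memory access completes.
stalled_instructions = memory_access_ns / instruction_time_ns
print(stalled_instructions)  # 200.0
```

So a cacheless core matched to that memory would be running roughly 200 times slower per instruction than the A53 figure quoted above.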
I enjoyed reading this a lot. I wonder why the developers decided to allow reading kernel-memory in the first place. When a scalar processor reads kernel memory, it crashes. When a speculative processor reads kernel memory, it relies on the assumption that the read is never committed to prevent leakage. It takes no expert to realise this is a potentially dangerous decision (and, as becomes clear now, is only valid in the absence of a cache).
To me it would make a lot more sense to use a special value to indicate the read did not succeed and propagate this value until it is time to crash. I guess this introduces some overhead (e.g. reserve a special value); but are there any other drawbacks?
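A rough sketch of that idea in Python, to make the question concrete. This is purely a model of the proposal in this comment, not how any real CPU works; `POISON`, `load`, `and_mask` and `commit` are hypothetical names.

```python
POISON = object()  # hypothetical marker meaning "this read was not permitted"


def load(memory, address, allowed):
    """Return the data only if the access check passes, otherwise the marker."""
    return memory[address] if allowed else POISON


def and_mask(value, mask):
    """Dependent operations propagate the marker instead of computing on data."""
    return POISON if value is POISON else value & mask


def commit(value):
    """The crash is deferred until the result would become architecturally visible."""
    if value is POISON:
        raise PermissionError("illegal read reached commit")
    return value


kern_mem = [0xabde3167]
w = load(kern_mem, 0, allowed=False)
x = and_mask(w, 0x100)
print(x is POISON)  # no secret-dependent address is ever formed
```

In this model the later load of user_mem[x] has nothing secret to index with, so the cache state cannot depend on the kernel value.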
The best explanation of Meltdown I’ve read.
The best comment made on that blog post was by Eben himself:
"One almost wishes that they’d stuck with the original name for the KPTI patchset: Forcefully Unmap Complete Kernel With Interrupt Trampolines."
Now that's funny!!!
I've been wondering (and haven't seen it addressed anywhere) if these attacks could be used to get the private key out of game consoles. These days I would assume not - that the key would be in a secure enclave - but the current generation of consoles are a few years old now and maybe that's not the case.
Off topic, but I haven't seen this discussed anywhere yet. My understanding is that font files can contain complex instruction sequences to control exactly how a font is rendered. I believe Windows implements a kernel space VM to execute these instructions. I know variants 1 and 2 did not necessarily require eBPF, but it made the attack simpler because the desired instruction sequences could be injected directly into kernel space (rather than finding existing sequences in the code base). It seems that in theory font rendering could serve a similar function on some platforms.
Now... I am interested in assembly :D Any recommendations ???
Really awesome explanation
Hey, what about Intel XScale processors like the PXA2xx series?
These do have dynamic branch prediction/folding afaik, so they may be affected?
Does somebody have a spectre.c tuned for generic armv5tel for example?
Current versions of spectre.c, like this one https://gist.github.com/LionsAd/5116c9cd37f5805c797ed16fafbe... still contain "_mm_clflush" and therefore do not compile on ARM at all.
Fantastic read. Before reading the article, I assumed there were many HN readers who were extremely proud about their Raspberry Pi being invulnerable to Spectre or Meltdown.
I've done some cursory searching and not found anything, so I'll ask here: what mechanism is used to measure how long it takes to access a specific address in memory?
I assume there is some way to tell the CPU "when memory location X is read, store the current time in register Y" or some such thing. Could anyone share what that mechanism is?
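As far as I understand it, no special "timestamp on read" mechanism is needed: the attacker just reads a clock, does the load, and reads the clock again. Native proof-of-concept code typically uses a cycle counter for this (on x86, the RDTSC instruction). Below is a sketch of the same pattern in Python; `time_access` is a hypothetical helper, and interpreter overhead here swamps any real cache effect, so it shows only the shape of the measurement.

```python
import time


def time_access(buffer: bytearray, index: int) -> int:
    """Return the elapsed nanoseconds around a single read of buffer[index]."""
    start = time.perf_counter_ns()   # read the clock
    _ = buffer[index]                # the memory access being timed
    return time.perf_counter_ns() - start  # read the clock again


probe = bytearray(4096)
elapsed = time_access(probe, 256)
print(elapsed)
```

An attack would compare many such measurements against a threshold to decide whether the address was already cached.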
How many RPi users are using this board to run untrusted code?
The RPi may mitigate risk of these attacks simply in the way it is used.
None of the RISC-V chips are either:
What would finally bring this all together to me would be an example of a real world attack that would be carried out using these methods on some target, perhaps with an implementation.
Didn't ARM say the Cortex A53 is vulnerable to Meltdown?
Does anyone expect the bug revelations after this to be less severe, or is there still a chance there could be vulnerabilities that are worse than these?
Technical details aside, I find it quite amusing that the hardware in my Pi Zero is more secure than my desktop, which is two orders of magnitude more expensive.