Modern CPU Architecture: Threat or Menace? The Case Of Branch Prediction
Thomas Ptacek | September 5th, 2006 | Filed Under: Defenses, New Findings, Uncategorized
From Aciicmez, Seifert and Koc, a new localhost timing attack that currently recovers key bits from OpenSSL, this time exploiting Pentium branch prediction.
We’ve written about timing attacks before; they’re the scariest practical attacks on cryptosystems today. The basic idea is simple: the length of time a crypto algorithm takes is often related to the secret bits its handling. If you can precisely measure the amount of time a crypto operation takes, you can predict things about those secret bits. And if there is one thing computers are good at, it’s taking lots and lots of tiny measurements.
The Aciicmez et al paper is a variant on cache timing. What these attacks exploit is that an operation can take wildly differing amounts of time depending on whether they involve bits that are in the cache. If you can control the contents of the cache, or probe the system with lots of inputs to map the cache out, you can make guesses at key bits.
The Aciicmez et al attack notes that in addition to the L1 and L2 caches, modern CPUs cache branch histories to facilitate prediction. What’s happening is, as CPUs get faster, they rely more on pipelining, which involves executing small pieces of multiple instructions at the same time. Pipelining works well when the CPU can load and execute a straight line of instructions, but gets tricky when the path a program takes through the instructions is broken up by jumps and branches. If the CPU can guess which path the program will take (the direction of the branches), it can keep its pipeline full. If it guesses wrong, it has to jettison the contents of the pipeline and start over. That takes time, which can be measured.
Modern CPUs predict branches by remembering the direction of previously executed branches. These results are stored in a cache (the BTB), which on x86 processors work much like any other cache. On multi-threaded processors, the branch prediction logic (and the cache) are a shared resource; two processes executing at the same time can influence the contents of the cache. The authors use this to attack RSA in OpenSSL 0.9.7c in 3 ways:
Attackers can clear the BTB, forcing the CPU to predict branches won’t be taken, which reveals whether branches have been taken or not (because taken branches will be mispredicted, dragging speed down). To speed this up and improve fidelity, attackers can exploit knowledge of the target architecture to clear only the BTB entries they care about.
Attackers can figure out where the RSA code is, and surgically clear out the BTB to predict specific branches, directly revealing key bits.
Worst of all, attackers can clear the BTB, trigger an RSA operation, and then probe her own branches to see which branches the RSA code took, creating a trace history of the encryption.
Cache timing has been fertile ground for timing attack research:
Osvik, Shamir, and Tromer used it against AES when the cache implementation is known (as is usually the case). This is an excellent paper with multiple practical attacks and is a great introduction to what’s going on.
DJB implemented a similar attack, over a network, that relies on basic statistics instead of implementation details. We’ve written about this.
Colin Percival exploited the fact that on Intel multi-threaded chips, threads simply share the cache and can map its contents directly. Yeah, yeah, we wrote about him too.
Remote timing attacks are practical, but localhost timing attacks are easier and most of these papers exploit that. One lesson here is that untrusting parties shouldn’t share the same computing resources, which creates side channel exposure. The more general lesson is that no matter how sound the math behind your crypto is, the systems stuff is what’s going to kill you.


Add New Comment
Thanks. Your comment is awaiting approval by a moderator.
Do you already have an account? Log in and claim this comment.
Add New Comment
Trackbacks