Detecting Virtualized Rootkits
Thomas Ptacek | January 24th, 2007 | Filed Under: Defenses, Disclosure, Uncategorized
Joanna Rutkowska responded, below.
1.
Recall that hardware-assisted virtualization, as exploited by Dino’s Vitriol and Rutkowska’s BluePill, hides code in the “ring -1” hypervisor privilege level carved out by the AMD and Intel Virtualization Extensions (SVM and VT, respectively). In this environment, we can arrange for the CPU to trap to our malicious code any time someone does something to try to detect us, and emulate the innocuous response.
For example, under VT, the CPUID instruction could betray the presence of virtualization extensions. So VT traps to the hypervisor (where we’ve stuck our rootkit) when CPUID is executed, and allows us to emulate the response. From a functional perspective, you can’t look at the result and know it was emulated. You can, however, use the cycle counter to time the CPUID instruction; a trap and the software code executed to emulate CPUID will take noticeably longer than a “native” CPUID execution. “Unfortunately”, Intel has arranged for hypervisors to control the cycle counter as well; the cycles consumed during virtualization can be trivially masked with about 5 lines of C code.
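To make the masking argument concrete, here is a toy model, not real hypervisor code. It assumes a guest-visible counter equal to a host counter plus an offset the hypervisor controls; the class name `ToyCpu` and both cycle costs are invented for illustration.

```python
class ToyCpu:
    """Toy model: a host cycle counter plus a hypervisor-controlled offset."""

    NATIVE_CPUID_CYCLES = 200     # assumed cost of an untrapped CPUID
    TRAP_OVERHEAD_CYCLES = 2000   # assumed cost of VM exit + emulation + VM entry

    def __init__(self, virtualized, mask_overhead=False):
        self.host_cycles = 0
        self.tsc_offset = 0       # added to the host counter on every guest read
        self.virtualized = virtualized
        self.mask_overhead = mask_overhead

    def rdtsc(self):
        """Cycle counter as the guest sees it."""
        return self.host_cycles + self.tsc_offset

    def cpuid(self):
        """Execute CPUID; under a hypervisor it traps and is emulated."""
        self.host_cycles += self.NATIVE_CPUID_CYCLES
        if self.virtualized:
            self.host_cycles += self.TRAP_OVERHEAD_CYCLES
            if self.mask_overhead:
                # The rootkit hides the cycles it consumed from the guest.
                self.tsc_offset -= self.TRAP_OVERHEAD_CYCLES


def time_cpuid(cpu):
    """Direct timing: bracket CPUID with two counter reads."""
    start = cpu.rdtsc()
    cpu.cpuid()
    return cpu.rdtsc() - start


native = time_cpuid(ToyCpu(virtualized=False))
naive = time_cpuid(ToyCpu(virtualized=True))
masked = time_cpuid(ToyCpu(virtualized=True, mask_overhead=True))
print(native, naive, masked)
```

In this model the naive hypervisor stands out by the full trap overhead, while the masking one reports exactly the native figure. The masking itself is the single subtraction in `cpuid`.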
So the blog debate about detecting virtualization has revolved around using “trusted” clocks, such as those on an uninfected machine, to time the instructions. Of course this will work, but it’s a pain.
2.
Last September, Peter Ferrie at Symantec made this blog claim:
Anyway, I found something that you can do in the guest that the hypervisor can’t see until after it’s happened; so, it can’t hide the side effects. It doesn’t need a network connection and it doesn’t need a user to time anything. It’s also quick (it executes in one timeslice), but, I still can’t tell you how I did it.
Today, I read Ferrie’s paper describing the trick. It’s very similar to one of the (many) tricks we recently presented in a speaking engagement.
You can’t trust the TSC timing of the CPUID instruction, because the hypervisor can simply mask the cycle overhead of any VM trap. But you might be able to trust the TSC timing of side effects that don’t cause VM traps.
For instance, you can saturate any of the caches in the system, and baseline the cycles it takes to read known cached values. Then you cause a VM trap (for instance, by executing CPUID). The VM exit and entrance, along with the software that runs in the hypervisor to emulate CPUID, will evict cache entries. When you get control back from the hypervisor, you loop timing the “cached” values again; if they differ significantly, you’re virtualized.
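The saturate, baseline, trap, re-time sequence can be sketched against a simulated cache. This is a toy model only: the fully associative LRU cache, the hit and miss costs, the "more than double" threshold and the number of lines the handler touches are all assumptions, and none of it measures real hardware.

```python
from collections import OrderedDict

HIT_CYCLES = 4      # assumed cost of a read that hits the cache
MISS_CYCLES = 200   # assumed cost of a read that misses


class ToyCache:
    """Fully associative LRU cache holding `capacity` lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def read(self, address):
        """Return the cycle cost of reading `address`, updating LRU state."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return HIT_CYCLES
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = True
        return MISS_CYCLES


def probe(cache, addresses):
    """Total cycles needed to read every address once."""
    return sum(cache.read(a) for a in addresses)


def native_cpuid(cache):
    """No VM exit: nothing else touches the cache."""


def trapped_cpuid(cache, handler_lines=8):
    """VM exit: the hypervisor's handler reads its own memory."""
    for address in range(1_000_000, 1_000_000 + handler_lines):
        cache.read(address)


def looks_virtualized(trap, capacity=64):
    cache = ToyCache(capacity)
    guest_lines = list(range(capacity))
    probe(cache, guest_lines)              # saturate the cache
    baseline = probe(cache, guest_lines)   # every read is a hit
    trap(cache)                            # e.g. execute CPUID
    after = probe(cache, guest_lines)      # no trap here, so nothing to mask
    return after > 2 * baseline


print(looks_virtualized(native_cpuid), looks_virtualized(trapped_cpuid))
```

The second probe never causes a VM exit, so in this model the hypervisor gets no chance to adjust the counter around it.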
Ferrie uses the TLB cache; the TLB keeps you from having to walk the whole page directory hierarchy when you access a virtual address. There are other caches you can saturate. I could give details to Joanna, or just refer you to any of the last N blog posts we’ve written about timing attacks against cryptosystems; the differences between timing RSA and timing a hypervisor are:
Unlike the crypto attacks, hypervisor timing is done with the target on the same CPU core and the same CPU thread and the same CPU resources as the “attacker”.
Similarly, unlike in a crypto timing attack, a hypervisor timer can quiesce the whole system (for a few milliseconds), which dramatically improves the signal/noise ratio.
Unlike the crypto attacks, the “signal” you’re looking for isn’t complex; you don’t need a trace of “branches taken” in a known piece of code, averaged out to eliminate noise. You just have to observe a causal timing relationship with CPUID (or something else that causes a VM exit).
Remember also that BluePill “emulates” the SVM instructions, allowing for nested VMs. BluePill therefore has to pretend to allow code to run in “ring -1”, to complete the illusion that code is talking directly to the hardware. The only difference between “ring 0” and “ring -1” is that there are additional instructions available in “ring -1”. This is a long-winded way of saying that you can detect a surreptitious hypervisor even on systems that are “supposed to be” running hypervisors.
3.
So that’s the timing attack on hypervisors. We use the following terms:
Direct timing executes instruction sequences that are known to cause VM exits, and counts the cycles they take, looking for abnormally high results.
Indirect timing executes those same instruction sequences, but brackets them with code that saturates a chip resource, baselines the resource, and then checks that the timing remains invariant after the VM exit.
We demoed Direct Timing, against our Vitriol rootkit, at Black Hat. We’ve written several Indirect Timing test cases since then; we hoped to demo at Black Hat this year, but Ferrie has scooped us!
But, at the risk of putting both Ferrie and Rutkowska on our scent, that is far from the end of the story. Ferrie’s paper is premised on the idea that hardware virtualization confers “functional transparency” to a rootkit; that is to say, there’s no functional difference to observe if hardware virtualization is being used to hide a rootkit. (Ferrie presents this in stark contrast to software virtualization, like VMware and Bochs, which offer a myriad of observable functional “tells”.)
Not so!
Here’s a simple example. It’s not ours, but it illustrates the point nicely.
AGP, the graphics bus, wants to offer graphics chipsets access to a physically contiguous buffer of host memory. But it’s hard for a kernel to promise physically contiguous memory in the quantities needed. The kernel is more used to assembling large amounts of memory out of a bunch of physically discontiguous pages. These big chunks of memory can be “virtually” contiguous on the host, using the MMU, but that doesn’t do a graphics chipset much good.
So AGP uses a scatter/gather mechanism called the GART. The GART is a physical-to-physical memory mapping mechanism that establishes an aperture in system memory. It appears contiguous, but is actually backed by discontiguous pages and a programmable mapping table.
You can use the GART to alias physical addresses, and scan memory. Yes, attempts to program the GART can be intercepted and emulated. But they aren’t. That would be work!
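As a rough illustration of the aliasing idea, here is a toy model of a physical-to-physical remapper. It is not a driver for any real chipset: the names `ToyGart` and `ToyMemory`, the 4 KB page size, the aperture base and the frame numbers are all hypothetical.

```python
PAGE_SIZE = 4096


class ToyGart:
    """Toy remapper: a contiguous aperture backed by a programmable page table."""

    def __init__(self, aperture_base, num_pages):
        self.aperture_base = aperture_base
        self.table = [None] * num_pages   # aperture page index -> physical frame

    def map_page(self, aperture_page, physical_frame):
        self.table[aperture_page] = physical_frame

    def translate(self, address):
        """Remap addresses inside the aperture; pass all others through."""
        offset = address - self.aperture_base
        if 0 <= offset < len(self.table) * PAGE_SIZE:
            frame = self.table[offset // PAGE_SIZE]
            if frame is None:
                raise ValueError("unmapped aperture page")
            return frame * PAGE_SIZE + offset % PAGE_SIZE
        return address


class ToyMemory:
    """Byte store addressed through the remapper."""

    def __init__(self, gart):
        self.gart = gart
        self.cells = {}

    def write(self, address, value):
        self.cells[self.gart.translate(address)] = value

    def read(self, address):
        return self.cells.get(self.gart.translate(address), 0)


gart = ToyGart(aperture_base=0xE000_0000, num_pages=4)
memory = ToyMemory(gart)

memory.write(0x0012_3010, 0x5A)       # store via the ordinary physical address
gart.map_page(0, 0x123)               # point aperture page 0 at frame 0x123
gart.map_page(1, 0x500)               # the next aperture page can live anywhere
aliased = memory.read(0xE000_0010)    # same byte, read through the aperture
print(hex(aliased))
```

Repointing the table one frame at a time gives a second window onto any physical page, which is presumably what makes it usable for scanning memory that something is guarding only at its original address.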
4.
And so you have one of the central challenges of trying to install an invisible malicious hypervisor. VMware doesn’t have this problem; it just doesn’t give you a GART to play with, and you don’t care, because you know you’re being virtualized. But Joanna can’t do that! She has to perfectly emulate every detail of the CPU and the chipset, while selectively guarding and emulating behaviors that betray the presence of the hypervisor.
If this sounds suspiciously similar to the problem of passively observing network traffic between two hosts whose operating systems you don’t know on a network filled with chaff traffic designed to confuse the observer, that’s because it’s the exact same problem. But turned against the attacker. Ah, irony.
So we have the following set of tactics for detecting virtualization:
Timing Challenges
Direct Timing Challenges: how long does CPUID take?
Indirect Timing Challenges: how does CPUID impact the cache?
Functional Challenges
5.
It turns out, from where we stand, that virtualization is kind of a sucky place to hide rootkit code. There’s a jungle full of places to hide kernel code; “orphaned” kernel threads and backdoored page fault handlers are probably just scratching the surface. From what we can see, there’s no one good way to detect all kernel rootkits. But a virtualized rootkit is a different story: all you ever have to know is, “on this Pentium 4 633, when I’m in ring -1, does this system behave in a way that indicates it’s already virtualized?”.
There are some really easy functional and timing traps you can fall into. Some of them are easy to defend against, if you know they’re there. And I didn’t think of all (or most) of them. So instead of telling you about them, I’ll:
Hope I’ve gotten some credibility from this post, and
Say “95faf2cfb27b4e271a8943ad44f7d865”, which is the SHA-1 of a quick text file list of the tricks we know about. When Ferrie or Joanna publish enough of them, I’ll post the tfile. =)
It’s nonced, by the way.
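For readers unfamiliar with the trick, publishing a nonced hash is a commitment: it proves later that the file existed now, without letting anyone guess its contents by hashing candidate lists. A minimal sketch follows, assuming the nonce is simply prepended to the text before hashing; how the digest above was actually computed is not stated, and the function names and sample text are hypothetical.

```python
import hashlib
import secrets


def commit(text: bytes, nonce: bytes) -> str:
    """Digest to publish now; the nonce and text stay private until later."""
    return hashlib.sha1(nonce + text).hexdigest()


def verify(digest: str, text: bytes, nonce: bytes) -> bool:
    """Anyone can recompute the digest once text and nonce are revealed."""
    return commit(text, nonce) == digest


nonce = secrets.token_bytes(16)            # random, so the list can't be guessed
tricks = b"placeholder for the real list"  # hypothetical contents
published = commit(tricks, nonce)

print(verify(published, tricks, nonce))             # the honest reveal
print(verify(published, tricks + b" extra", nonce)) # an edited list
```

Without the nonce, someone who suspected a particular trick list could hash their guess and compare it with the published digest.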
Virtualized rootkits are detectable, without external clocks.
(By the way: Ferrie’s paper is excellent, and goes into a huge amount of detail on the errata he found in software hypervisors).


joanna
January 24th, 2007 7:45 pm
I guess I could point out some issues in Ptacek’s and Ferrie’s papers, like e.g. that they seem to ignore the fact that SVM implements so-called ASIDs - a mechanism which is *designed* to actually prevent guest’s TLB flushes during VMEXITs, or I could mention the (not-yet-released) IOMMU feature, which should make it possible for something like Blue Pill Level 2 not to worry about GART (BTW, BPL1 doesn’t need to worry about GART by definition), etc…
But that’s not the right direction for this discussion IMO. It’s more than 6 months now since I presented Blue Pill for the first time and yet we still don’t have any reliable, systematic, documented way to approach type III malware detection (and AFAIK, in case of AMD systems we also don’t have any way to prevent it - at least this applies to machines myself and my colleagues have)…
Some people work hard to find some “hacks” to prove that I was wrong saying that this technology could be used to build “100% undetectable” malware. Well, I’m pretty sure that they will find flaws in many particular implementations - in fact even we, in COSEINC, have found some already. But what does that prove? Only that the particular implementation, of either malware or the underlying virtualization technology, is buggy.
How do you guys imagine future commercial A/V products? Will they be based on many such hacks or maybe it would be better if they used some more reliable, documented and systematic approach? But there is no such solution - and there will not be any, unless CPU vendors introduce some changes, like e.g. adding a special ‘CHECK’ instruction, which I discussed during my BH presentation.
So, that’s how I see this: we have quite good, systematic methods to find type I malware, we don’t have any good ones for the type II malware, but we can imagine having them in the future, but as far as type III malware is concerned, I don’t believe we can do anything *useful* without the help of the CPU vendors…
Although, I might be wrong and the real answer could be: 95faf2cfb27b4e271a8943ad44f7d865…
Thomas Ptacek
January 24th, 2007 7:51 pm
This is pretty much the same thing the IDS vendors said, when we pointed out that overlapping fragments confused their TCP reassembly engines. “Of course there are bugs in our code. But we’ll just fix them!”. Yes, but there will always be more bugs.
The IOMMU point is well taken. I gather you haven’t used it yet, just like we don’t use LaGrande and the MPT in Vitriol. It’s worth mentioning that some of these detection techniques also present vectors for guest-hopping attacks.
The ASID tagging, TLB-preserving functionality is another point well taken. But that’s just one of the attacks. How do you turn off the branch predictor? Remember that these attacks have been tuned to pull keys out of SSL processes running in different CPU threads; hypervisor timers get to run right next to your VM exit.
I don’t have any proof about the detectability of BluePill, because you haven’t released BluePill or allowed anyone to test it. Would you like to do that? I think we’re up to the challenge!
Failing that, we’re always going to be talking in the hypothetical. But I’m curious: why do you think we can imagine reliable kernel rootkit detectors, but not reliable hypervisor detectors? Hypervisors are more disruptive than kernel threads.
Steo
January 24th, 2007 9:16 pm
Thomas,
someone somewhere will always find some way of detecting a virtualized rootkit without an external clock as you said. When it becomes apparent that it has been done someone will find a way to find it.
Until the situation is put to us we will not be able to really explore it.
Nice debate,
regards
Steo
Thomas Ptacek
January 24th, 2007 10:21 pm
Nobody can reliably predict the future, but we can use our experience and knowledge to guess where the trends are heading.
I predict that, because “detect unexpected virtualization” is a simpler goal than “detect any modification to the OS runtime”, virtualized rootkits will remain easier to detect than kernel rootkits.
one.miguel
January 24th, 2007 11:24 pm
I wonder how large Bluepill must be in order to do all those cool things…
Thomas Ptacek
January 24th, 2007 11:30 pm
My guess, not counting LWIP, is about 1500 lines of code.
Nate
January 25th, 2007 2:04 am
There is a well-known problem of “code trying to understand code”: the Halting Problem. If Tom comes up with a technique for detecting a hypervisor on CPU v1, Joanna can come up with a technique that avoids the detection (assume she can update the hardware for CPU v2).
But then Tom gets to run his old technique on all CPU v1’s and try to find a new technique for CPU v2. And then the cycle can repeat with Joanna updating her approach for CPU v3.
Assuming both people are dedicated to playing this game forever, Tom wins percentage-wise because CPU v1 is not immediately retired and he’s slowly building up a pool of CPUs with different characteristics. It doesn’t matter that his codebase is growing because he only needs one trick per CPU version. But Joanna needs to support all CPU versions and all approaches to all CPU versions (n^m). To win, Joanna would have to force old CPUs to retire more quickly than Tom can update his detection logic.
But you’re also forgetting that at some point, the CPU vendor is not incentivized to keep playing the game. If, for instance, cache timing variance became so sophisticated that the vendor had to consider flushing the cache completely, they’d probably stop there. CPU vendors have other goals (i.e. performance) that directly conflict with hiding virtualization.
You can see the kind of effort it takes to eliminate covert channels in rainbow book systems. That kind of cost is hard to justify in the commodity CPU world, and is only effective for some subset of all possible covert channels.
The way I like to say it, “Show me your undetectable hypervisor, and I’ll show you a CPU that never had any errata.” Take a look at the length of errata sheets for modern CPUs, then add in chipsets and peripherals. We’re headed the opposite direction from Joanna.
Matt
January 25th, 2007 1:35 pm
Nate:
I agree with most of your comment, but I’m not sure how the Halting Problem relates to what you’re saying. There are certainly decidability issues in v12n detection, but they’re usually discussed at a level of abstraction well above variations in CPUs.
Thomas Ptacek
January 25th, 2007 2:34 pm
The variances between a virtualized P4 and a “native” P4, which hypervisor malware needs to detect and mask, can be made data-dependent. The X86 ISA doesn’t provide a direct way to query the state of the microarchitectural details we’re playing with. In other words, without “detecting the detector” *and* analyzing its code, we think we can put the CPU into a setting where only the hypervisor detector knows what CPU state to expect after a VM-exit.
anon
February 5th, 2007 7:29 pm
erm,
are you sure that 95faf2cfb27b4e271a8943ad44f7d865 is sha-1?
/shrugs
Thomas Ptacek
February 7th, 2007 10:41 am
In retrospect, no.
Let’s use this:
d86ded8e6f086cbc86bb07d854e58e1d60680958
H
July 8th, 2007 7:33 pm
Joanna,
Remember what was discussed at Fed 06? When someone mentioned how close you are? That is still the case. Remember, it is about the data, not control of the machine. App layer, not Ring 0. Grid, not individual CPUs. Arrive/come in as a fully authorized process, via the EMSes. There is your control. Of everything.
That is what is worth making better, trust me.
Best, H
See you all in Sin City… I’ll be the one questioning Tony as to his true intentions and ‘goodness’.. just waiting to see what strangeness Dan comes up with THIS year… yeeesh….
Mark
May 15th, 2008 11:45 am
When will the Thomas Ptacek list of virtualization detection techniques be published
d86ded8e6f086cbc86bb07d854e58e1d60680958
and have you added any new ones