Archive for April, 2008
Eric Monti | April 29th, 2008 | Filed Under: Reversing
I recently worked on a project that involved embedded systems and reverse engineering. This sort of territory can be a little hairy the first few times out. I ran into some interesting challenges and discoveries along the way which I thought might be worth writing a little bit about. I can’t tell you what the target was. But, it was important. And, we beat the crap out of it. So instead, I’ll tell you what I wish it was: a networked 4-slot toaster.
Now… to make things interesting: early on, I’d discovered a vulnerability in the toaster that allowed any attacker to load their own firmware on the device. Ouch! My toast! My beautiful toast!
In order to drive home the risk (mostly to the vendor) of the firmware loading vulnerability, I was asked by my customer (also the vendor’s customer) to demonstrate the attack by actually loading malicious firmware onto the device and getting it to run.
Mind you, the request to prove this is actually pretty sane. I had little knowledge of the boot loader, or even of the firmware image format. I couldn’t say for sure that there wasn’t a code-signing feature, which would prevent the toaster from loading any image that wasn’t cryptographically signed by the vendor. That would have rendered the firmware loading attack impotent. To make things worse, the vendor was being pretty light on details. Can’t say I blame them.
So… I was tasked with demonstrating that the vendor’s firmware:
- Could be reverse engineered
- Modified
- Loaded back onto the device
- By an attacker
Think “embedded rootkit”. Embedded rootkits are sort of a holy grail: it’s easy to load a rootkit on a PC, but tricky to get one onto a wireless router or switch (or toaster). The reward for doing it, though, is that nobody ever thinks to check their router or toaster for rootkits. Initially, my bar was just going to be to demonstrate that I’d actually changed some aspect of the program with my own firmware patch, and then move on to more penetration testing.
Some embedded projects are easier than others. Sometimes you get lucky. You find a debug shell, a file-system, or a serial console. A lot of times, devices that look like black boxes have JTAG ports, to which you can attach a debugger. Devices built in the last 10 years tend to have GDB stubs, so you can target them with cross-debuggers. Some even have FireWire, through which you can DMA in and out of system memory.
This wasn’t any of those. Without reversing the firmware, I had no way to execute code on the device. For the purposes of penetration testing, I really couldn’t even see what was actually happening on my target. I hate that. All I had to work with was the firmware image file itself, offline, and some observable external behaviors of the device.
So… if I was going to make any of my objectives happen, I was basically starting purely from static analysis. Back in 2005, Tom wrote about a similar set of steps he took given a similar starting point. Without realizing it this time out, I actually followed them almost to the letter. There were a few points where our paths diverged slightly because of differences in our targets and approaches.
This is my story. Details have been fuzzed to protect the guilty toaster.
1. In Which I Get The Image
I start out with a couple megs worth of firmware image file. This file is sent to the device when you do a firmware upgrade the official way. You can usually find firmware images on the vendor’s download site, or on the CDs that came in the box.
2. In Which I Scan The Image
I feed it to deezee. Deezee is part of Matasano’s homegrown “Black Bag” toolkit. What deezee does is search through a binary file for compressed data by looking for zlib signatures, and then extract anything it finds. It’s crazy how often this seems to work on unknown binary files. As it turns out, zlib is the industry standard compression format, even for toasters.
Sure enough, deezee finds not one, but 3 distinct blobs of compressed data packed in the file. Hmm… really, I think deezee should tell me where it finds these things. Let’s fix that now. Much better.
Edit: Here’s the link to blackbag with patch applied.
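Deezee’s own source isn’t reproduced here, so as a rough illustration of the technique, here is a minimal Ruby sketch (`find_zlib_blobs` is a hypothetical name, not part of Black Bag). It treats any two bytes that form a valid RFC 1950 zlib header as a candidate, tries to inflate from that point, and reports the offset of every candidate that decompresses to a complete stream. That offset is exactly the “where did you find it” information missing above.

```ruby
require "zlib"

# Hypothetical helper (not deezee itself): scan a binary image for zlib
# streams and return the offset, compressed size and inflated bytes of each.
def find_zlib_blobs(data)
  hits = []
  off = 0
  while off < data.bytesize - 1
    cmf = data.getbyte(off)
    flg = data.getbyte(off + 1)
    # RFC 1950 header: method 8 (deflate), window size <= 32K, header
    # divisible by 31, and no preset dictionary (we couldn't supply one).
    if (cmf & 0x0f) == 8 && (cmf >> 4) <= 7 &&
       ((cmf << 8) | flg) % 31 == 0 && (flg & 0x20).zero?
      z = Zlib::Inflate.new
      begin
        out = z.inflate(data.byteslice(off, data.bytesize - off))
        if z.finished?
          hits << { offset: off, compressed_size: z.total_in, data: out }
          off += z.total_in # skip the blob so we don't re-scan its insides
          next
        end
      rescue Zlib::Error
        # Coincidental header bytes: not a real stream.
      ensure
        z.close
      end
    end
    off += 1
  end
  hits
end
```

Usage would be something like `find_zlib_blobs(File.binread("firmware.bin")).each { |h| printf("0x%08x %d bytes\n", h[:offset], h[:compressed_size]) }`, where the filename is a placeholder. Because `Zlib::Inflate` stops at the end of the stream and reports `total_in`, trailing bytes after a blob don’t cause an error, and the compressed length falls out for free.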
3. In Which I Read
With addresses for compressed chunks in hand, I open up the file in my hex editor.
Deezee found the first compressed segment around 384 bytes from the beginning of the file. Here’s where technique comes into the picture.
Deezee found three “hits” in the image. I assume they aren’t just sprinkled haphazardly throughout the image; there should be some kind of header on them. I want to know what that header looks like. Maybe it’s used for things besides compressed blobs? The way I’m going to do that is, I’m going to compare the bytes surrounding the compressed blobs for all three hits.
So, comparing the preceding 384 bytes of that and the other two hits, I see several similarities:
- A 4-byte signature. I can tell because it’s always the same four bytes, in the same place relative to the blob.
- An ASCIIZ string, apparently describing the firmware version, padded to 128 bytes. You see “padded strings” all the time in binaries; they’re usually a fixed-size field in a C struct, like “char version[128]”.
- A 16-byte ASCIIZ string of numbers describing just the version, again padded with NULs.
- 4 bytes which, when unpacked as an integer, happens to match the size of the compressed chunk. (If your hex editor won’t tell you the big and little endian 32 bit value of four highlighted bytes, get a new hex editor). This is the trick that cracks most image formats: you look for 16 or 32 bit fields that look like lengths, and try to reconcile them to the file and its features.
- 4 bytes which… hmm… could be CRC32 checksum? This is a bit of a shot in the dark, but it’s easy enough to check:
$ cat app.bin | ruby -e \
'require "zlib"; puts Zlib.crc32( STDIN.read ).to_s(16)'
a76ea2ad
And… they match! I’ll need to keep this checksum in mind when I try modifying the file. Bodes well for no code signing!
Why did I try a CRC32 checksum first? Well… first off, I assumed the field was 32 bits long. And CRCs (Cyclic Redundancy Checks) are used a lot in file formats that encapsulate other files. As you can see above, I used Zlib’s crc32 function to check my file chunk. It could just as easily have been OpenSSL’s, since it too is a standard Ruby library. Two shining examples of how common CRC32 is.
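Once the fields are identified, the length and checksum checks are easy to script. The fields are listed above but not their order or exact offsets, so this Ruby sketch assumes, purely for illustration, that they are packed big-endian in the order listed with the payload immediately after, and that the CRC32 covers the chunk bytes as stored. `parse_chunk` and `ChunkHeader` are hypothetical names.

```ruby
require "zlib"

# ASSUMED layout (a guess, not the vendor's documented format):
# 4-byte magic, 128-byte NUL-padded version string, 16-byte NUL-padded
# version number, big-endian payload length, big-endian CRC32.
HEADER_FORMAT = "a4 Z128 Z16 N N".freeze
HEADER_SIZE = 4 + 128 + 16 + 4 + 4

ChunkHeader = Struct.new(:magic, :version, :version_number, :length, :crc32)

# Parse the header at +offset+ and verify the payload's length and CRC32.
# Returns [header, payload, ok], or nil if the image is too short.
def parse_chunk(image, offset)
  return nil if image.bytesize < offset + HEADER_SIZE
  header = ChunkHeader.new(*image.byteslice(offset, HEADER_SIZE).unpack(HEADER_FORMAT))
  payload = image.byteslice(offset + HEADER_SIZE, header.length)
  ok = payload && payload.bytesize == header.length &&
       Zlib.crc32(payload) == header.crc32
  [header, payload, ok]
end
```

`unpack("Z128")` consumes the full 128-byte field but stops the returned string at the first NUL, which matches how a `char version[128]` field is laid out.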
4. In Which Other Headers Are Found
Now that I know what to look for, I find that this header also prefixes some other chunks: ones that aren’t compressed and so weren’t noticed by deezee. I also confirm that the CRC32 header field checks out for each of those chunks too. Five total, including the compressed ones. The last one is a big chunk of base64 which looks like it might be a cryptographic signature. Hmm… might it be used for code signing somehow? Looks like I still need to confirm or rule out this possibility.
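Finding the other chunks amounts to searching for the 4-byte signature and keeping only the hits whose length and CRC32 fields agree with the bytes that follow, which weeds out coincidental matches. A minimal Ruby sketch, assuming a field layout of magic, 128-byte version string, 16-byte version number, then big-endian length and CRC32 (those offsets are guesses, not the documented format; `find_chunks` is a hypothetical name):

```ruby
require "zlib"

# ASSUMED layout: the length field sits after the magic and the two
# padded version strings; the CRC32 follows it; the payload follows that.
LENGTH_FIELD_AT = 4 + 128 + 16
CHUNK_HEADER_SIZE = LENGTH_FIELD_AT + 8

# Return { offset:, length: } for each occurrence of +magic+ whose header
# length and CRC32 match the payload that follows it.
def find_chunks(image, magic)
  chunks = []
  pos = image.index(magic)
  while pos
    length, crc = image.byteslice(pos + LENGTH_FIELD_AT, 8).to_s.unpack("NN")
    body = image.byteslice(pos + CHUNK_HEADER_SIZE, length) if length && length > 0
    # A random occurrence of the magic bytes almost never carries a length
    # and CRC32 that agree with what follows it.
    if body && body.bytesize == length && Zlib.crc32(body) == crc
      chunks << { offset: pos, length: length }
    end
    pos = image.index(magic, pos + 1)
  end
  chunks
end
```

Zero-length hits are rejected on purpose: the CRC32 of an empty string is 0, so an all-zero header region would otherwise “verify”.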
5. In Which I Check For Metadata
I go back and look again at the original chunks of compressed data. Their compression headers included the original filenames (a gzip feature; a bare zlib stream has no filename field), which give me some clues as to what each is. Going on filenames, I’m thinking I’ve got phase one and two boot-loaders, and the third chunk is the actual app. I’m interested in the application. The Unix ‘file’ command doesn’t tell me anything useful about any of them, but it’s always worth a shot.
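A bare zlib stream has nowhere to store a filename, while a gzip header has an optional original-name field, so a chunk that carries one is most likely gzip-framed. Assuming that, Ruby’s standard `Zlib::GzipReader#orig_name` will read the name without decompressing anything; `gzip_original_name` is a hypothetical wrapper.

```ruby
require "zlib"
require "stringio"

# Return the original filename stored in a gzip member's header (the
# optional FNAME field), or nil if the member doesn't record one.
# Raises Zlib::GzipFile::Error if +blob+ isn't gzip at all.
def gzip_original_name(blob)
  gz = Zlib::GzipReader.new(StringIO.new(blob)) # parses the header up front
  gz.orig_name
ensure
  gz&.close
end
```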
6. Behold: Low Hanging Fruit
I run GNU strings on the application file with ‘-t x’ so I get nice hex offsets for where the strings are found. Lots of interesting stuff:
- Uh… well there’s this:
Nice ASCII art. Just a wild guess, but I think maybe this is VxWorks?
- The regular string ‘vxworks’ shows up in a lot more places too.
- “GNU ld version 2.9-mips3264-010729”. Ok so it’s MIPS. This is also good news. It means that somewhere out there, there’s probably a GNU binutils distribution that can understand this file format.
- Lots of what looks like function names close together. When do function names occur in compiled output? Two reasons: debugging code, or a symbol table. A symbol table would be a score; I make a note of that for later.
- Lots of assertion messages with references to source code line numbers xxx.c(###). Always handy. Even if you don’t have symbols for functions, with an hour of data entry in your disassembler you can usually fake them based on debug and assertion strings. You want to take the time to do this. By the time you’ve guessed even 10% of the symbols in an image, you’re usually going to be able to comprehend 90% of what the image does, without reading any assembly.
- The strings you see when the device boots up.
- Lots of error, exception, and status messages such as one typically finds in a program. Sometimes very handy, as we’ll soon see.
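When you want the same output inside a Ruby script rather than from the shell, a rough approximation of `strings -t x` is a single regular expression. This is a sketch, not a drop-in replacement: it treats printable ASCII plus tab as string characters, uses the same 4-character default minimum, and ignores the other encodings and options GNU strings supports. `find_strings` is a hypothetical name.

```ruby
# Rough stand-in for `strings -t x`: return [byte_offset, text] pairs for
# every run of at least +min+ printable ASCII (or tab) bytes.
def find_strings(data, min = 4)
  found = []
  data.b.scan(/[\x20-\x7e\t]{#{min},}/) do |s|
    # On a binary string, character offsets are byte offsets.
    found << [Regexp.last_match.begin(0), s]
  end
  found
end
```

Printing with `printf("%x %s\n", off, s)` gives hex offsets you can jump to in the hex editor, and keeping the offsets around is what makes the later string-table and symbol-table hunting scriptable.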
7. In Which Firmware Is Patched
I’ve got a pretty good idea of what the program is and how it’s rolled up in the firmware image file.
My job right now is to figure out whether I can change the firmware and load it onto the toaster. So I make a simple change to it: there’s a string which gets displayed when the system boots up. Using my hex editor, I change it to something noticeably different, being careful to keep it the same size as the original. Then I re-compress it at the same level as the original, re-roll it into the header format with the new length, and recompute the CRC32 checksum.
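The patch-and-repack cycle can be sketched in Ruby. The zlib header’s FLEVEL bits (the top two bits of its second byte) record roughly how hard the original was compressed, so they can be mapped back to a level for `Zlib::Deflate.deflate`. The rest is assumption: this sketch presumes the header keeps its magic and version fields unchanged, then carries a big-endian length and a CRC32 computed over the compressed bytes. `patch_bytes`, `reroll` and `FLEVEL_TO_LEVEL` are hypothetical names.

```ruby
require "zlib"

# FLEVEL 0..3 => an approximate zlib level (fastest, fast, default, maximum).
FLEVEL_TO_LEVEL = [Zlib::BEST_SPEED, 5, Zlib::DEFAULT_COMPRESSION, Zlib::BEST_COMPRESSION].freeze

# Replace +old+ with +new+ in place. Keeping the size identical means no
# code or data offsets in the decompressed image move.
def patch_bytes(plain, old, new)
  raise ArgumentError, "replacement must be the same size" unless old.bytesize == new.bytesize
  at = plain.index(old) or raise ArgumentError, "original string not found"
  patched = plain.dup
  patched[at, old.bytesize] = new
  patched
end

# Recompress at the original stream's level and rebuild the chunk: the
# unchanged header prefix, then the new length and CRC32 (ASSUMED to cover
# the compressed bytes), then the compressed payload.
def reroll(header_prefix, original_blob, patched_plain)
  level = FLEVEL_TO_LEVEL[original_blob.getbyte(1) >> 6]
  blob = Zlib::Deflate.deflate(patched_plain, level)
  header_prefix + [blob.bytesize, Zlib.crc32(blob)].pack("NN") + blob
end
```

The recompressed stream usually won’t be byte-identical to the vendor’s, which is why the length field has to be rewritten along with the checksum.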
I load it onto my target using the bug I found… and… sweet satisfaction! If there’s any code signing here, it doesn’t work. That spooky looking signature at the end is for something else. Knowing this, it’s well worth the effort to keep reversing the image.
8. In Which A Loader Offset Is Sought
Now, the ‘file’ command didn’t recognize the compressed image, nor did it recognize either of the apparent “boot-loaders”. If it was a well-known format like ELF or COFF, it would have.
Unfortunately, the executable headers are what tell us how to load the whole program into memory. Without a known executable format, I’ll need to find out the load offsets myself. If I don’t, when I load the program into a disassembler, none of the data and instruction offsets will make sense. You can read a binary image with broken offsets, but it’s not fun.
Headers are usually located at the beginning of the file. But I’m not seeing any strong indications of any sort of header information at all in this one. My guess: an older version of VxWorks, predating VxWorks ELF. On the bright side, firmware images aren’t usually relocatable. Unlike a Windows PE program, which the loader can rebase and patch up at load time, firmware tends to be loaded at a specific address.
I could try reversing either or both of the bootloaders to find that address, but… I might just run into the same problem walking all the way back up to the first boot-loader. To be honest, I’m not really interested in unlocking the secrets of the toaster boot loader. If I was trying to jailbreak the toaster to heat unauthorized off-brand Pop Tarts, maybe I would be. Exercise for the reader.
9. In Which I Infer A Loader Offset
I’ve got another idea. Let’s look closer at some of these strings and their surrounding data. I scroll around in my hex editor near the sections with strings until I’ve found what I’m looking for.

What we’re looking for is a list of strings preceded by their address table. At this point, I really don’t care what the strings are, just that they are recognizable text. The addresses prefixing the strings in the table are 32-bit big endian numbers pointing to the ASCIIZ strings. Only parts of the addresses match up to real addresses in the file, though, and this is key. The first entry (blue) points to 0x104DEDD8. The string actually lives at 0x004D9DD8 in the file. Subtract the two and you get 0x10005000, the load offset for this segment.
If I’m unsure of my assumptions here, I check some of the other entries in the table (red, green, and brown). Same pattern holds. We’ve got a winner.
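The arithmetic is just load base = pointer - file offset (0x104DEDD8 - 0x004D9DD8 = 0x10005000), and “check some of the other entries” can be automated by scoring a candidate base by how many table pointers land on the start of a NUL-terminated printable string. A minimal Ruby sketch; `asciiz_at?`, `score_base` and the four-character minimum are illustrative choices, not anything from the original analysis.

```ruby
# True if +off+ starts a printable, NUL-terminated string of at least +min+
# characters, i.e. something a string-table pointer could plausibly target.
def asciiz_at?(image, off, min = 4)
  return false unless off.between?(0, image.bytesize - 1)
  nul = image.index("\x00".b, off)
  return false if nul.nil? || nul - off < min
  image.byteslice(off, nul - off).each_byte.all? { |b| b.between?(0x20, 0x7e) }
end

# Count how many of the +count+ big-endian pointers in the table at
# +table_off+ hit an ASCIIZ string once +base+ is subtracted.
def score_base(image, table_off, count, base)
  image.byteslice(table_off, count * 4).unpack("N*").count do |ptr|
    asciiz_at?(image, ptr - base)
  end
end

# Candidate base from one pointer/string pair: pointer minus file offset.
candidate = 0x104DEDD8 - 0x004D9DD8 # == 0x10005000
```

A correct base should score every entry in the table, while a base that’s off by even one byte will usually score zero, since the pointers then land on the NULs between strings. The image is expected to be a binary string, as returned by `File.binread`.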
10. In Which I Sanity Check The Offset
I search for 0x10005000 and 0x5000 near the beginning of the file to confirm my assumption about the lack of a header. Nothing comes up. I’m still very certain this is my winner. The lack of a search hit suggests again that the loader decides where to load everything, though the program itself must have been linked to run at that address too.
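The same check, scripted: a Ruby sketch that looks for candidate values encoded as big-endian 32-bit words in the first few kilobytes of the image. The 4 KB window and the 32-bit encoding are assumptions; a 16-bit field would need `pack("n")` instead. `search_header_constants` is a hypothetical name.

```ruby
# Map each value in +values+ to the offsets where it appears as a big-endian
# 32-bit word within the first +window+ bytes of +image+.
def search_header_constants(image, values, window = 0x1000)
  head = image.byteslice(0, window).to_s
  values.to_h do |v|
    needle = [v].pack("N")
    hits = []
    pos = head.index(needle)
    while pos
      hits << pos
      pos = head.index(needle, pos + 1)
    end
    [v, hits]
  end
end
```

An empty result for every value is what “nothing comes up” looks like, which supports the no-header theory.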
11. In Which I Disassemble
We are now at step 11 and it is time to load the file in a disassembler.
We use IDA Pro. IDA wants to know the architecture of the file. We choose a processor type of “mipsb”; that’s MIPS, which we learned from scanning the image format for strings, and running in big-endian mode.
How do we know it’s big endian? Big endian is a reasonable first guess for non-x86 architectures, but in this case it’s more than a guess: the addresses and lengths in the file were big endian. What if we hadn’t seen a string identifying it as MIPS? I’d probably take a fragment of the file and feed it through “objdump -d” multiple times, specifying MIPS, PowerPC, ARM, and SPARC, in that order. You know you have a match when the instructions sort of make sense.
I let it load as one big segment. Once loaded in IDA, I go to “Edit -> Segments -> Rebase Segment” and enter the address I came up with in step #9. Rebasing tells the disassembler what the load offset is.
- Here I have to tell you about Thomas’ irrational fear of IDA’s rebasing. According to Tom this is based on a bad experience rebasing a 10 megabyte firmware image, waiting a day for it to finish, and only then noticing he got the address wrong. Here’s a handy trick Thomas uses instead of rebasing: feed the file to binutils “objcopy”. Objcopy is a best-kept-secret for reversing: you can feed it a file of type “binary”, and tell it to spit out an ELF file, specifying the header values. Several other tools suck at handling raw files, but rock at handling ELF. Why is his fear irrational? Well… I just disable IDA’s auto-analyze feature before I rebase. Then I do some manual poking around first to make sure I’m on the right track before turning it back on again. But… nobody tell Thomas this! He always comes up with crazy cool tricks using other tools when he gets pissed off at IDA.
12. In Which We Work Out A Symbol Table
Based on where I found strings in the image, I have a pretty good idea of the general region of memory where program data lives. Remember those function names I filed for later? Now is later.
I take the addresses of a few function name strings I find close together and search the file for the four bytes making up their (newly rebased) addresses. I’m consistently getting search hits near the end of my probable data segment, all in the same general area. Look around there and start converting every 4-byte aligned chunk starting with 0x104 to an offset. I start seeing offsets to strings of what looks like function names at 16-byte intervals. The string offsets are right next to offsets pointing way back into the code area and some pointing into the data area. I look a little closer at the hex dump of this area:

Blue is what points to the “name” of the symbol; red points to the actual thing in memory. The last column is the “type” of thing. Green things (0x500) are functions, and purple (0x700) are data. There are also some oddballs strewn about with a type of 0x900. These point way out of bounds past the end of my rebased file. Could be a segment I don’t know about from this file, or it could just be something else created at run-time. I don’t stress about this, and stick with what I know for sure right now.
I take this pattern and translate it into a structure for IDA, then find where the table begins and ends based on the pattern. This is an array, so that’s how I define it in IDA. The last element of the array is 16 bytes of 0x00 helping me (and IDA) see where it ends.
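Before (or instead of) defining the structure in IDA, the table can be walked in Ruby. This sketch assumes 16-byte big-endian entries with the name pointer at +4, the symbol’s address at +8 and the type word at +12, with the first word unused; the description above only pins down name, address and type, so adjust the offsets to your own hex dump. If this is a stock VxWorks symbol table, 0x900 may be the BSS type, which would explain why those entries point past the end of the file, but treat that as a guess. `parse_symtab`, `read_cstring` and `SymEntry` are hypothetical names.

```ruby
# ASSUMED entry layout: 16 bytes, big-endian, name pointer at +4, address at
# +8, type word at +12. An all-zero entry terminates the table.
ENTRY_SIZE = 16
NAME_AT, VALUE_AT, TYPE_AT = 4, 8, 12
TYPE_NAMES = { 0x500 => :function, 0x700 => :data }.freeze

SymEntry = Struct.new(:name, :address, :type)

# Read a NUL-terminated string at file offset +off+ (nil if out of range).
def read_cstring(image, off)
  return nil unless off.between?(0, image.bytesize - 1)
  nul = image.index("\x00".b, off) or return nil
  image.byteslice(off, nul - off)
end

# Walk the table at file offset +table_off+; +base+ is the load offset used
# to turn name pointers back into file offsets.
def parse_symtab(image, table_off, base)
  syms = []
  off = table_off
  while (entry = image.byteslice(off, ENTRY_SIZE)) && entry.bytesize == ENTRY_SIZE
    break if entry.each_byte.all?(&:zero?) # terminating all-zero entry
    name_ptr, value, type = entry.unpack("@#{NAME_AT} N @#{VALUE_AT} N @#{TYPE_AT} N")
    # Unknown types (such as 0x900) are kept as raw integers.
    syms << SymEntry.new(read_cstring(image, name_ptr - base), value, TYPE_NAMES.fetch(type, type))
    off += ENTRY_SIZE
  end
  syms
end
```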
13. In Which I Import The Table Into IDA
Things start getting recognized quickly by IDA’s auto-analysis once I tell it about all these symbol offsets. But I also want IDA to know the names of everything in the symbol table.
I write a quick Ruby script to write an IDC script to do this work for me. IDC is the scripting language built into IDA; in the 21st century, most people write their IDA tools in Python or Ruby, but it’s sometimes faster to write short scripts that use IDC as a half-way format. Run Ruby… run IDC… take a look at the results. Awesome! Just about everything was identified in this symbol table! Furthermore, everything was identified in the same segment off of one load address.
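A sketch of that Ruby-to-IDC step, taking symbols as hypothetical `[name, address, type]` triples. It uses the classic IDC calls `MakeName(ea, name)` and `MakeFunction(start, BADADDR)`, where BADADDR asks IDA to find the function’s end itself; newer IDA releases use different names (for example `set_name`), so check your version’s IDC reference. `idc_for` is a hypothetical name.

```ruby
# Generate an IDC script that names every symbol and asks IDA to create a
# function at each code symbol.
def idc_for(symbols)
  lines = ["#include <idc.idc>", "", "static main() {"]
  symbols.each do |name, addr, type|
    safe = name.gsub(/[^A-Za-z0-9_]/, "_") # keep names IDA-friendly
    lines << format('  MakeName(0x%08X, "%s");', addr, safe)
    lines << format("  MakeFunction(0x%08X, BADADDR);", addr) if type == :function
  end
  lines << "}"
  lines.join("\n") + "\n"
end
```

Writing the result with `File.write("symbols.idc", idc_for(syms))` (the filename is a placeholder) gives a file that can be run from IDA’s script menu.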
Just to make sure about that missing header I check my final results from the symbol table. Indeed, two of the function symbols point right at 0x10005000, the beginning of the program. One’s called ‘sysInit’ and the other ‘start’. There can’t be a header there, it’s a function. Though I can tell as much from the name “start”, a little googling and research tells me “sysInit” is what VxWorks usually uses as its program entry point.
Besides proving the firmware loading risk, I had what was shaping up as a target-rich source of vulnerabilities on my hands. I really wanted a look at the code running on the device for the purposes of additional vulnerability testing. Now that I had readable, cross-referenced disassembly, I had a lot more to work with for confirming the vulnerabilities I’d already found and finding new ones as I moved forward.
As Tom put it, “hilarity ensued”. Truly, vulnerability research is just as good in ‘08 as it was in ‘05. Unfortunately, I can’t talk specific details from there, but I wanted to share with our readers the road I took to this point.
Getting a full reversing of an embedded system like this is pretty satisfying and can open up lots of possibilities. The DD-WRT, OpenWRT, and NSLU2-Linux Unslung crowds have gone nuts with this sort of thing, with pretty impressive results, putting Linux distros on just about anything they can get their hands on.
I’d love to hear about some of your own embedded system reversing experiences.
Thomas Ptacek | April 24th, 2008 | Filed Under: Gatherings
ChiSec is the single best gathering of security professionals in the
Chicago metro area: it’s free of charge, free of vendors, and free of
membership. You just show up, and so do other people, and somehow, by
the power of the long tail cluetrain infoconomy, the whole thing works
itself out, as if some mysterious tipping “point”, aided by the wisdom
of the crowd and the power of thinking without thinking, is propelling
it towards a freakonomic logic of life that is made to stick.
Where was I? Moneyball! No, wait: the location. It’s at
Houlihan’s on Wacker, which is on the corner of Wacker and Michigan. This ChiSec
only: a sure-to-be-exciting discussion about why we continue to have
ChiSec at a Houlihan’s. Come armed with suggestions for alternatives!
ChiSec is next Wednesday. You do not need to RSVP.
Thomas Ptacek | April 24th, 2008 | Filed Under: Matasano
Are you a student looking for some experience in the information
security field?
Why, yes!
Consider an internship with Matasano, in Chicago or New York. This is
a paid position.
Sounds interesting. I’ve interned for security companies in the
past, and got experience making copies of TPS reports, delivering
mail, and even providing back massages to senior partners. What can I
expect from you?
At Matasano, you can expect to do those things too. But you can also
expect to:
Learn or hone reverse engineering skills
Research vulnerabilities in high-profile software
Find zero-day vulnerabilities and never talk about them!
Write reversing and security testing tools in fun languages
like Ruby or ok wait just Ruby.
Not sold yet?
No.
Consider some of the projects our interns have worked on: web
applications your mother has heard of, plus many that she hasn’t!
Hardware and RTOS systems built for CPUs that are documented only in
secret binutils distributions from India! Popular cryptosystems
deployed throughout the Fortune 500!
What’s an RTOS?
Exactly! Consider whether you’re going to learn more with us than at
any other internship:
You’ll do vulnerability research work almost exclusively.
You’ll likely get a diverse set of targets, from Win32 to
custom embedded platforms.
You’ll have opportunities to work at a very low level
(for instance, firmware and chipsets) and at very high levels
(for instance, AJAX toolkits).
You’ll get a chance to develop and promote new security
tools and techniques.
But I don’t know how to do most of this stuff, Thomas.
Can you code?
Sure, in Python.
Are you… interested in any of that RTOS-y, firmware-y, crypto-y
security stuff?
I might be if you’d tell me what it is.
Excellent! You’ll fit right in. Here are our requirements:
Strong computer programming skills, in any language. You
don’t need to be an expert C programmer, but be forewarned,
you may be one by the time you leave.
Enrollment in a computer science curriculum.
Strong written English skills.
Ability to work consistent on-site core hours in either
Chicago (we’re in the Loop) or Manhattan (we’re downtown).
Do you have any more details?
I do!
This is a salaried position.
Internships run between 10-12 weeks.
Office space and computers (we’re a Mac shop) provided.
How do I apply?
Email us at careers@matasano.com.
Dave G. | April 22nd, 2008 | Filed Under: Industry Punditry
More and more I hear people discussing coverage in terms of security testing. I am here to give you some bad news. You will rarely get a genuine answer on how much coverage you actually received. It is dependent on approach, methodology, tools and skill set.
As 51% of Wikipedia editors agree, the most common forms of coverage testing are:
- Function coverage - Has each function in the program been executed?
- Statement coverage - Has each line of the source code been executed?
- Condition coverage - Has each evaluation point (such as a true/false decision) been executed?
- Path coverage - Has every possible route through a given part of the code been executed?
- Entry/exit coverage - Has every possible call and return of the function been executed?
Each one of these will either result in too little testing or too much testing. What’s worse: it’s unlikely that any two organizations will ever be able to measure this in a way that even allows you to have a conversation about whether or not effective levels of coverage have been obtained.
None of these are going to be effective measures of how well your software has been tested. All are, however, guaranteed to increase the amount of time you spend testing.
The problem with security testing is that the devil really is in the details. And there are enough of them that the traditional QA coverage models mentioned above aren’t really effective.
For security testing, let’s add:
- Input Coverage - Has every input (e.g. form field, packet fields) been tested?
- Vuln. Class Coverage - Has every form of vulnerability been tested?
- Threat Based Coverage - Has every threat been evaluated?
By combining these with the QA measures above, maybe you have an answer that means something. It is obviously still incomplete: application, network and host state all still impact each of the above tests. Also, there are attacks that specifically relate to state and not inputs.
Now we have arrived at one of the places where the security testing world differs from the QA testing world. For the average application, you can make certain assumptions about the environment it will run in to guide the likelihood of certain states. But in the security testing world, an attacker is actively trying to induce any form of state that will cause an advantage.
So, let me ask the readers of this blog some questions:
- What is an acceptable level of coverage in a security test?
- And if you happen to own security somewhere, what would it take for you to actually find a coverage % credible? I am going to guess the M word will rear its ugly head.
- Do you ever have anything that you can even come close to measuring? There are so many states inside of real world applications, even pen test specific forms of coverage aren’t going to come close to being complete.
- If yes, can you effectively convey that to anyone in a way that will actually give them some level of assurance (the a-word of computer security)?
Caution: I am not actually saying, “Don’t try.” All I’m really saying is, “Measuring this stuff is hard, and the amount of time to do it in a credible way is probably best spent on actually testing more.”
Thomas Ptacek | April 21st, 2008 | Filed Under: Don't Believe The Hype
Despite repeated assertion, I am dubious about the standing of
“defense in depth” as a core principle for security design. It is, for
example, not one of Saltzer and Schroeder’s principles for the protection of information.
It does, however, feature prominently in the Common Criteria,
which should tell you something.
To help sort out the controversy, I enlisted the support of my
colleagues, adding “thoughtful commentary on this post” to their
quarterly MBOs. I got responses from:
Dave Goldsmith
Eric Monti
Max Caceres
Jeremy Rauch
“If it were me,” begins Dave, “I’d define defense in depth so that it
has some meaning.” “That’d be a first,” riffs Eric. Burned! Continues
Dave, “as a core principle for security design, [depth] has always
been a little odd. It’s the baker’s dozen donut of
design principles. Do the other ones and you get this for free.”
Eric agrees. “I think that when your approach begins from the basis of
‘depth’, you run the risk of covering bad design problems with
complexity instead of rooting them out.” But Eric also associates
‘depth’ with network security, not application security, cautioning
that it may make sense to employ layered security when you don’t
control the “critical design elements.” “It irks me when vendors talk
about ‘defense in depth’,” he says, but “I generally take it as good sign when
customers do.”
Well, as I’m thinking about it, “defense in depth” originates from
military thinking. Here’s an interesting Google book search: “defense
in depth” in books published between 1900 and 1980, before the term
was hijacked by infosec.
In this sense, the strategy of defense in depth succeeds or fails in
any computing setting only to the extent that the analogy to warfare
applies to that setting. Despite its attractiveness, with clean
mappings to “attackers” and “defenders”, computer security is more
like a puzzle than combat. In particular, computer security challenges
often lack the attributes of combat most applicable to “defense in
depth”:
Attrition
Exploits are not worn down by being forced to overcome multiple
defenses; the attack that hits home after an obstacle course is no
weaker than that attack, unimpeded. Jeremy agrees: “Linking computer
security to military strategy is a lot like using traditional military
tactics against terrorism. It doesn’t work.”
“But I’ve seen experienced exploit developers get worn down and
demoralized by multiple filters they need to overcome to exploit”,
replies Max. “Jumping off memory in the JVM to overcome DEP
involves effort to bypass a deliberate security mechanism, while doing
the Heap Feng Shui dance also involves effort: this time to deal
with an allocation structure that’s dynamic.”
“I know when I audit code that I have seen places where a
vulnerability would have existed if not for multiple defenses,” says
Dave. “Unfortunately, I can’t recall a single one right now.” My point
exactly. And you have to do better than that example: you have to find
the one where no single well-designed defense could have worked.
“There are no winters on the Internet,” argues Jeremy. “For a hacker,
there’s no drawn-out war. It’s either attacks of convenience, or a
slog against a specific target, with no penalties for taking your time.”
Deterrence
The presence of a countermeasure rarely gives an attacker pause,
because the failure costs the attacker little and success rewards
greatly. “The price you pay for the time expended developing an
incredibly complex exploit isn’t losing life or limb,” says Jeremy,
“All it costs you is a normal sex life. And that’s what porn is for!” No offense, Mark.
“There’s a saying,” Max notes. “You don’t need to be faster than the
cheetah, only faster than the guy running next to you.” Not many
people target OpenBSD or GRLinux when there are soft targets to strike
instead.
“I’m not sure that’s a deterrent,” replies Dave. “The people who will
stop looking are the people who can’t find things anyways.”
Delay
A typical attack (and, in particular, most of the attacks that top
enterprise threat models) executes in milliseconds. The time costs
of successive countermeasures delay the attack in sub-human
timescales.
Max concedes the point. “I agree delaying an attack is meaningless.
Delaying the research required to get there may be worth something,
though.”
“Delaying for delay’s sake is silly,” says Dave, “but the purpose
isn’t to delay. It’s to encourage the attacker to make a
mistake. It’s like the old game Berzerk. Electrically charged walls.
Evil robots. Time limits. All multiply the likelihood that the game
will end. The more you remove, the easier the game would be. Even
just sitting there as an attacker can get you caught. To maintain
access to systems, you have to do something that is ultimately
detectable (no matter how unlikely). Even just sitting there
encourages the big bouncing head of detection to roll by screaming
‘Intruder Alert! Intruder Alert!’”
Reaction
Because they rarely buy meaningful amounts of time for the defender,
countermeasures afford little opportunity for defenders to retaliate (for
instance, by involving law enforcement). In fact, the grooming
requirements of countermeasures often have the opposite effect,
forcing defenders to chase shadows or scramble to update filters.
Max isn’t sure. “If in the process of figuring out the policies of
your web app firewall I trigger 100 alerts, you may be paying more
attention to me by the time I actually get the exploit right.”
Eric isn’t having any of that. “If we ever make computers HALF as
smart and alert as one average armed soldier standing by a door, then
defense in depth may have a chance. Until then my money is on
evasion. Even with a super-smart human security expert sitting 24/7
behind an IDS today, we have no real hope of filtering and reacting in
time to security events. I totally agree about the grooming
requirements. I’ve seen these become obsessions that completely wash
out more productive uses of time.”
Predictability
The constraints on real-world combatants are far stricter than those
placed on computer attackers. Warfighters can work from a palette of
reasonable assumptions, including the fact that enemies can’t
teleport, stop time, shapeshift, or reverse the trajectories of
bullets. Analogies can be made from each of these fantasy capabilities
to real instances of real security attacks.
“That’s a stretch,” replies Max. “I do think computer attacks can be
somewhat predictable, or at least, as unpredictable as an ace war pilot.”
Eric agrees, with reservations. “There are some predictable behaviors
that can be useful to notice and incorporate into your defenses. But,
the problem is their shelf life is so short. As soon as those defenses
are known, they’re obsolete.”
“But you can do everything right in an application,” replies Jeremy,
“and still find it being used for unintended outcomes.” But isn’t the
infosec equivalent here “accepted risk”? “I suppose the concept of
accepted risk is a lot like the military concept of acceptable collateral damage, but at least in security,
the stuff at risk isn’t 18 year old kids.”
Thomas Ptacek | April 17th, 2008 | Filed Under: Defenses, Uncategorized
This is all my fault.
Many moons ago, I wrote a blog post chiding developers for
checking the return value of malloc, the C function that allocates
chunks of memory for programs to work with. When malloc fails, it
returns NULL. According to Hoyle, you’re meant to check for that
value, because malloc can fail at absolutely any time (you are not
the only program claiming memory).
I stand by that argument, and by most of the wording of that blog
post. Now about that word “most”.
Dave LeBlanc and I go back, though he may not remember that. Last
bubble, we were dev leads on competing products. We’ve taken different
career paths, and, long story short, he’s technically now more of an
authority on secure coding than I am. And I’m telling you this because
LeBlanc’s response to my last post is, in faithful paraphrase,
“are you high?”
LeBlanc thinks you’d have to be high not to check malloc returns, because:
- not checking will inevitably crash the program, and crashes are bad,
- not checking leads to the bug class Dowd found, and
- not checking leads to the bug class Dowd found.
LeBlanc is right. I am wrong. “Not checking” is bad. Let me make a
very slight semantic adjustment, so that I might be unassailably
correct (again).
Here are three (extremely contrived) code examples. The first we’ll
call “unchecked”. It simply doesn’t check the return of malloc.
#define hostile /**/
void *
_setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = malloc(SLOTS_SIZE);
    slots[slot] = id; // XXX write32 corruption
    return slots;
}
The second, we’ll call “caller-checks”. As you can see, it does.
#define hostile /**/
void *
_setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = NULL;
    if((slots = malloc(SLOTS_SIZE)))
        slots[slot] = id;
    return slots;
}
Now the third, which looks suspiciously like the first, we’ll call
“callee-checks”.
#define hostile /**/
void *
_setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = malloc(SLOTS_SIZE);
    // NOT REACHED ON FAILURE
    slots[slot] = id;
    return slots;
}
What’s the difference between the first and the third? In the third,
if malloc fails, it does not return NULL. It instead hands the
program off to a recovery regime, which, by default, safely and
immediately ends the program.
What’s the difference between caller-checks and callee-checks?
First, callee-checks is safer. You can’t screw it up. The worst you
can do is write a program that will abruptly terminate. This is far
better than the current worst-case scenario, in which manifestly
common programmer errors allow Mark Dowd to upload malicious code into
your program.
Second, callee-checks is cleaner. In the caller-checks case, not only
does “setup” need to check, but so does “setup”’s caller, and its
caller’s caller, and its caller’s caller’s caller, all the way down
to the place where your program inevitably gives up and terminates.
“But, Thomas”, you say, “not all programs do give up and abort. Some
have policies for handling out-of-memory conditions”. And so they
do. And in most cases, those policies are global, and can simply be
substituted for the default behavior of exiting the program.
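Here is a minimal sketch of callee-checks with a substitutable global policy, assuming a C project that routes ordinary allocations through one wrapper. The names `xmalloc`, `set_oom_handler`, and `default_oom_handler` are hypothetical, not taken from any particular codebase. The default policy aborts; a program with its own out-of-memory policy installs a handler once, and the wrapper still never hands NULL back to a caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Signature for a global out-of-memory policy. A handler may log,
 * flush state, exit, or longjmp; if it simply returns, xmalloc aborts. */
typedef void (*oom_handler_fn)(size_t requested);

/* Default policy: report the failure and terminate immediately. */
static void default_oom_handler(size_t requested) {
    fprintf(stderr, "out of memory allocating %zu bytes\n", requested);
    abort();
}

static oom_handler_fn oom_handler = default_oom_handler;

/* Install a program-wide policy; passing NULL restores the default. */
void set_oom_handler(oom_handler_fn fn) {
    oom_handler = fn ? fn : default_oom_handler;
}

/* Callee-checks allocation: never returns NULL to its caller. */
void *xmalloc(size_t size) {
    void *p = malloc(size ? size : 1);   /* sidestep malloc(0) ambiguity */
    if (p == NULL) {
        oom_handler(size);
        abort();                         /* handler returned: stop anyway */
    }
    return p;
}
```

A caller like `_setup` can then write `u_int32_t *slots = xmalloc(SLOTS_SIZE);` and index the result directly, with no per-call-site check to forget.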
But I will grant you that in many cases, you have a genuinely useful
recovery regime that is specific to one code-path (say, an
arena-style allocation regime for a particular user request) and no
global policy will help. So, I submit to you a fourth option, which I
will not name, and which looks suspiciously like example (2):
#define hostile /**/
void *
_setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = NULL;
    if((slots = unsafe_malloc(SLOTS_SIZE)))
        slots[slot] = id;
    return slots;
}
Did you see the difference? It’s subtle. But it’s also easy to grep
for and easy to check.
I am an advocate for checking malloc, callee-checks style. It is
simply harder to screw up, and, in the overwhelming majority of cases,
which you can check for yourself by randomly sampling Google Code
Search, it costs you nothing in terms of reliability or
functionality. Stop caller-checking malloc.
To LeBlanc’s other point about C++ and constructors throwing
exceptions, I refer him to Cargill, or back to our blog, noting
that exceptions are themselves inherently dangerous. “When a language
feature requires you to be that-language-feature-safe”, I believe I
said, “you have a security problem”.
As for his specific example: you can’t blindly throw exceptions from a
ctor. Even Meyers (MECv1, Item 9) catches that one. DON’T DATE ROBOTS!
Wes Brown | April 17th, 2008 | Filed Under: Development
In hindsight, rather than write a post about injectable virtual machine specifications, I should have started off with the rationale behind the whole concept and explained what they are to provide context to the readers. In this post, when we speak of virtual machines, we are discussing bytecode virtual machines such as UCSD Pascal’s p-Code machine, or the Java Virtual Machine.
All an exploit does by itself is open the door to attacks in the form of payloads. To do something useful, we need a payload: a block of code that is injected and then performs tasks for us. Sometimes an exploit is tightly coupled with the payload, but it is important to keep the two components distinct organizationally.
There are different classes of payloads akin to the classes of exploitable vulnerabilities. The oldest and best known is the traditional shellcode. Shellcode is commonly written in machine code, and it often spawns a command shell to allow the attacker to interact with the operating system. However, shellcode is static, inflexible, and targeted to one execution environment. Machine code needs to be written to the specific architecture of the victim. It can break with patches or other changes to the environment.
Another common payload is the syscall proxy. The attacker sends messages to the proxy to execute system calls. This is more flexible than the traditional shellcode as it allows the attacker to dynamically react to the situation in the target execution environment. A major weakness is that the driving logic is on the attacker side, and this can make it fragile. Examples of software that uses this technique include CORE IMPACT and Metasploit.
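To make the division of labor concrete, here is a minimal, hypothetical sketch of the target-side half of a syscall proxy. It assumes a made-up wire format (a call number plus three integer arguments) and, so that it runs anywhere, dispatches to a small table of libc wrappers rather than issuing raw system calls by number; the tools named above use their own protocols, which this does not reproduce.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wire format: one request per message. */
struct proxy_request {
    uint32_t call;        /* index into the dispatch table */
    int64_t  args[3];     /* raw integer arguments */
};

struct proxy_reply {
    int64_t ret;          /* return value of the proxied call */
};

/* Each handler unpacks integer arguments for one call. A real proxy
 * would invoke the system call directly; libc keeps this portable. */
static int64_t do_getpid(const int64_t *a) { (void)a; return (int64_t)getpid(); }
static int64_t do_write(const int64_t *a) {
    return (int64_t)write((int)a[0], (const void *)(intptr_t)a[1], (size_t)a[2]);
}
static int64_t do_close(const int64_t *a) { return (int64_t)close((int)a[0]); }

enum { CALL_GETPID, CALL_WRITE, CALL_CLOSE, CALL_COUNT };

static int64_t (*const dispatch_table[CALL_COUNT])(const int64_t *) = {
    [CALL_GETPID] = do_getpid,
    [CALL_WRITE]  = do_write,
    [CALL_CLOSE]  = do_close,
};

/* Target-side stub: decode one request, run the call, fill the reply.
 * Returns 0 on success, -1 for a malformed or unknown request. */
int proxy_handle(const void *msg, size_t len, struct proxy_reply *out) {
    struct proxy_request req;
    if (len != sizeof req)
        return -1;
    memcpy(&req, msg, sizeof req);
    if (req.call >= CALL_COUNT)
        return -1;
    out->ret = dispatch_table[req.call](req.args);
    return 0;
}
```

The client side, not shown, builds each `proxy_request` and decides what to send next from each `proxy_reply`. That is where the driving logic lives, and why every step costs a round trip between attacker and target.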
DLL Injection is another payload technique, and its advantage lies in leveraging the existing program code and libraries in memory. This allows easy implementation of higher level features. Logic can be placed on the target side, rather than relying on a proxy. However, it is static and it is usually Windows-specific.
Another payload type that I find very interesting is the exploit compiler: typically an intelligent compiler with retargetable backends that turns payloads written in a high level language into code for the target. A notable example of this is Dave Aitel’s CANVAS. It offers a very nice abstraction of lower level code, and is very flexible. However, capabilities are often fixed at compile-time.
This brings us to a payload type that I have been researching: injectable virtual machines, which are bytecode executing environments as a payload. The driving logic is in the bytecode which can be embedded in the payload, or transmitted remotely.
Typical advantages are:
- Compact. A well structured bytecode language is more compact than machine code. Once the cost in memory space is paid for the virtual machine, the actual program to be executed can be much smaller than equivalent machine code.
- Machine independent. A well written virtual machine can abstract enough that bytecode can execute regardless of the underlying architecture. There are some limitations here, such as the difference between syscall proxying on a Unix versus Windows system, but this can be abstracted by yet another layer.
- Dynamic. Because it is a virtual machine, ‘in flight missile repair’ can be conducted, changing the entire characteristic of the program environment. This is especially useful with one-shot exploits.
- Assimilation. Due to the inherent flexibility of virtual machines, this payload type is free to incorporate other techniques such as those mentioned earlier. A syscall proxy can be implemented, and DLL injection can be used to provide the virtual machine with functionality.
Bytecode virtual machines have a long history that dates back past the more common modern ones such as Python or Java. By looking at the early examples that ran in very constrained computing environments, we can transfer what we learn to a similar context.
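As a rough illustration of the compactness and dynamism points, here is a toy stack-based bytecode interpreter in C. The opcode set and encoding are invented for this sketch and are not the format used in this research; the point is only that a small, fixed interpreter can run programs that arrive later as data, and that each program is a handful of bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented opcode set for illustration. */
enum {
    OP_HALT = 0x00,  /* stop; result is top of stack */
    OP_PUSH = 0x01,  /* push the next byte as an immediate (0..255) */
    OP_ADD  = 0x02,  /* pop b, pop a, push a + b */
    OP_MUL  = 0x03,  /* pop b, pop a, push a * b */
    OP_DUP  = 0x04   /* duplicate top of stack */
};

#define VM_STACK_MAX 64

/* Runs `code` until OP_HALT. Returns 0 and stores the top of stack in
 * *result on success, or -1 on malformed bytecode (unknown opcode,
 * stack underflow/overflow, or running off the end of the buffer). */
int vm_run(const uint8_t *code, size_t len, int32_t *result) {
    int32_t stack[VM_STACK_MAX];
    size_t sp = 0, pc = 0;

    while (pc < len) {
        uint8_t op = code[pc++];
        switch (op) {
        case OP_HALT:
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        case OP_PUSH:
            if (pc >= len || sp == VM_STACK_MAX) return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
        case OP_MUL: {
            if (sp < 2) return -1;
            int32_t b = stack[--sp];
            int32_t a = stack[--sp];
            stack[sp++] = (op == OP_ADD) ? a + b : a * b;
            break;
        }
        case OP_DUP:
            if (sp == 0 || sp == VM_STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        default:
            return -1;
        }
    }
    return -1; /* no OP_HALT before the end of the buffer */
}
```

For example, the eight bytes `OP_PUSH 6, OP_DUP, OP_MUL, OP_PUSH 5, OP_ADD, OP_HALT` compute 6 * 6 + 5 = 41. Sending a different byte string changes the payload's behavior without touching the interpreter, which is the "in flight missile repair" property in miniature.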
This post should hopefully help provide more context for the readers to understand the raison d’etre behind injectable virtual machines and my research. As always, I welcome feedback and comments.
Thomas Ptacek | April 15th, 2008 | Filed Under: Defenses
How nasty is the Flash vulnerability Dowd found?
Combined with any DNS vulnerability or any high-profile cross-site
scripting vulnerability, the weaponized version of this attack would
probably clock in at tens of thousands of compromised browsers per
minute.
Is this a new bug class?
Sort of. It depends on what you mean by the term “class”. For example:
most researchers consider heap overflows a separate bug class from
stack overflows. In reality, though, the same underlying coding error
causes both vulnerabilities: poorly bounded copies. On the other hand,
epistemologically, integer overflows are a new bug class, because the
underlying coding error is a type violation, which creates an
unbounded copy.
See how I used the word “epistemologic” there? That means you don’t
care about the difference. Wild writes from NULL pointers are probably
their own bug class.
So this is like the heap overflow revelation in the late 90s? NULL
pointers are exploitable now?
No. Learn everything you can from Dowd’s paper and NULL pointers still
aren’t usually exploitable:
- They need to be written to, not read from; lots of fuzzer advisories trace down to loads, not stores, from NULL.
- The offset needs to be controlled by the attacker; most of the time, offsets are hardcoded (most offsets are structure references).
- The wild write needs to happen before any pointer loads that will crash the program.
Is there a pattern worth looking for here? Absolutely. Look for things
that can return NULL that have random-access indexing. Malloc is a
perfect example.
Wait a minute. Didn’t you say people shouldn’t check malloc?
Yes. This bug is a perfect case in point for why I’m right.
Consider: it is not the case that the Flash runtime never checks for
allocation failure. What happened is, the Flash developers have an
allocation checking regime that defaults to unchecked, and requires
them to audit every allocation.
The way it should work is, by default, when using the simplest, most
common allocation calls (malloc, or, in Flash, mem_Alloc), the
program should abort if malloc fails. Returning and catching NULL is
inadequate.
“But we can’t just abort when any given malloc fails! What about
user-specified sizes?” You don’t have to abort on every malloc. You
just have to abort by default. When you know you’re taking a value
from a user, or any other unsafe input, you should use “unsafeAlloc”,
which is simply malloc. Then you audit your code for the 3 places in
the whole project that use “unsafeAlloc” and make sure the checking
regime works.
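A minimal sketch of that split, assuming C: `unsafeAlloc` follows the name used above and is, as described, just `malloc`, while `alloc_user_records`, `struct record`, and `MAX_RECORDS` are hypothetical, showing the kind of explicit, auditable handling each of those few call sites needs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Named to be easy to grep for. This is plain malloc: it may return
 * NULL, and every call site must be audited for how it handles that. */
static inline void *unsafeAlloc(size_t size) {
    return malloc(size);
}

struct record {
    uint32_t id;
    uint32_t flags;
};

/* Hypothetical application limit. Capping the count also rules out
 * size_t overflow in the multiplication below on 32- and 64-bit
 * platforms. */
#define MAX_RECORDS (1u << 20)

/* Hypothetical audited call site: `count` comes from untrusted input,
 * such as a field in a file being parsed. Returns NULL on bad input or
 * allocation failure. */
struct record *alloc_user_records(uint32_t count) {
    if (count == 0 || count > MAX_RECORDS)
        return NULL;
    struct record *r = unsafeAlloc((size_t)count * sizeof *r);
    /* NULL propagates to the caller, which reports a parse error
     * instead of indexing into it. */
    return r;
}
```

Grepping for `unsafeAlloc` then finds every call site that needs this kind of review; everything else goes through an allocator that aborts.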
Doesn’t runtime security, like in Vista, solve most of these
problems?
Maybe, maybe not. Obviously it didn’t here, because Flash turned
runtime security off.
But look at the bigger picture. Runtime security measures like ASLR
and cookies and W^X memory all address the “dumb exploit” pattern. The
“dumb exploit” pattern is an artifact of hardcoded runtimes generated
by C compilers. When your exploit is shotgunned in through a dumb
runtime, you lose both predictability and control of the target
program. That’s basically what runtime security is capitalizing on:
your exploit doesn’t know where DLLs are based, and so it can’t return
directly into them.
The problem is, the hardcoded runtimes are going away. The vast
majority of code written going forward targets extremely complicated
runtimes, like the bytecode VM in ActionScript. In a bytecode VM
scenario, an exploit has much more flexibility:
- There might be 10x as many places to overwrite that will compromise the target; for instance, the abstract syntax tree objects containing method tables.
- Valuable information might be readily accessible from known relative offsets or, better still, from registers kept loaded with interpreter state.
- Just as with ActionScript, the content buffer that vectors your exploit in might be executable in the target runtime, leaving you only with the problem of compromising the verifier.
- Just as with ActionScript, there may be an extremely powerful executive running on top of the CPU, rather than just machine code instructions running directly on the CPU.
These are all ways that high-level languages make runtime security
harder.
But high-level languages are supposed to be a huge security win!
They probably are. But remember, even in the most intricate schemes
(and Javascript compiled to a bytecode VM that runs off the system
stack qualifies), high-level languages are really just glue around
low-level languages: the most interesting features in Python, Ruby,
and Javascript are implemented natively.
So, you get two interesting phenomena:
- You need to audit the runtime to make sure that the C code that implements the core language isn’t vulnerable (this is why Perl was a bad bet in 1995, when everyone was saying that buffer overflows were C’s fault).
- You need to audit all the native extensions (such as Quicktime for Java), bearing in mind that unlike a server or a client, the attack surface for a language extension is arbitrary callers with arbitrary arguments: a much more painful place to be.
Has Mark Dowd simply outclassed us? Should we pack it up and quit?
Yes. But don’t feel bad about that. You’re a human being, and he’s a
remorseless killing machine. Big Blue crushed Kasparov, and now he’s
not the prime minister of Russia! At a certain point, you have to
concede the field, moving on to games where human beings still have the
advantage. Computers haven’t solved Go, for instance. For us
researchers, I suggest we take advantage of Mark Dowd’s robotic
inability to love, and take up the arts, such as watercolors or
interpretive dance.
Thomas Ptacek | April 15th, 2008 | Filed Under: New Findings, This Old Vulnerability
The evidence is now overwhelming that Mark Dowd was, in fact, sent
back through time to kill the mother of the person who will grow up to
challenge SkyNet. Please direct your attention to Dowd’s 25-page bombshell
on a Flash bytecode attack.
Some context. Reliable Flash vulnerabilities are catastrophes. In
2008, we have lots of different browsers. We have different versions
of the OS, and we have Mac users. But we’ve only got one Flash vendor,
and everyone has Flash installed. Why do you care about Flash
exploits? Because in the field, any one of them wins a commanding
majority of browser installs for an attacker. It is the Cyberdyne
Systems Model 101 of clientsides.
So that’s pretty bad-ass. But that’s not why the fate of humanity
demands that we hunt down Dowd and dissolve him in molten
steel.
Look at the details of this attack. It’s a weaponized NULL pointer
attack that desynchronizes a bytecode verifier to slip malicious
ActionScript bytecode into the Flash runtime. If you’re not an exploit
writer, think of it this way: you know
that crazy version of Super Mario Brothers that Japan refused to ship to the US markets
because they thought the difficulty would upset and provoke us? This
is the exploit equivalent of that guy who played the perfect game of
it on YouTube.
Let’s break it down a bit:

Start with the vulnerability.
It’s an integer overflow, but not a simple one.
When the Flash runtime reads in scene data from a SWF file, there’s a
numeric field that, when bounds-checked, is interpreted as a signed
number, but when used is treated as unsigned. So there are values the
field can take that are treated as tiny and innocuous at
time-of-check, but actually evaluate as huge numbers at time-of-use.
A by-the-numbers integer overflow normally knocks the bounds checking
off a strncpy or memcpy call, turning code that carefully copies, say,
1k of memory into code that will copy 2 megs of data, splattering it
all over process memory. Not here. Instead, Flash uses the malicious
number as a count of bytes to allocate.
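The check-signed, use-unsigned mismatch is easy to reproduce in isolation. This is a hypothetical reconstruction of the pattern, not Flash's actual code; the 32-bit field and the `MAX_BYTES` limit of 1024 are assumptions for illustration, and `fixed_check` shows the obvious repair.

```c
#include <stdint.h>

#define MAX_BYTES 1024u  /* hypothetical bound */

/* Vulnerable: time-of-check reads the field as signed, so any value
 * with the high bit set looks negative and passes. (Converting such a
 * value to int32_t is implementation-defined in C; it wraps on
 * two's-complement targets.) Returns the byte count that would be
 * handed to the allocator, or 0 if the field is rejected. */
uint32_t checked_then_used(uint32_t raw) {
    if ((int32_t)raw >= (int32_t)MAX_BYTES)   /* time-of-check: signed */
        return 0;
    return raw;                               /* time-of-use: unsigned */
}

/* Fixed: check and use agree on a single unsigned interpretation. */
uint32_t fixed_check(uint32_t raw) {
    if (raw >= MAX_BYTES)
        return 0;
    return raw;
}
```

With the field set to 0xFFFFFFFF, the signed check sees -1 and passes, and the unsigned use asks the allocator for roughly 4 GB.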
When you ask Flash to allocate several gigs of memory all at once, the
allocation fails, returning NULL. Attempt to use that NULL address and
you will crash the program. This happens all the time in real code.
Many crashes are traceable to NULL pointers. And, since nothing
(usually) lives at NULL, NULL pointer crashes are usually code for
“not exploitable”.
Not this time. Flash forgets to check that allocation failed, a
ludicrously common error. It then uses that pointer with an offset
controlled by the attacker. NULL isn’t valid. NULL plus 1024 isn’t
valid. But NULL + 0x8f71ba90 is, as is NULL + N for any N that
addresses valid memory.
To this address, controlled by attackers via wild offset, Flash writes
a value that is also controlled by the attacker. This is the write32
pattern: a vulnerability that gives the attacker the means to set any
one value in memory to a value of their choosing. Game over.
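The address arithmetic behind the wild write is worth spelling out. This hypothetical sketch assumes a 32-bit target and 32-bit elements, and computes, without dereferencing anything, where `p[index]` lands when `p` is NULL, plus the inverse an attacker would use to pick an index. It does not model whether anything is actually mapped at the resulting address.

```c
#include <stdint.h>

/* Address that `((uint32_t *)base)[index] = value` would store to on a
 * 32-bit target. Plain integer arithmetic: nothing is dereferenced,
 * and uint32_t math wraps modulo 2^32 just as the pointer math would. */
uint32_t wild_store_address32(uint32_t base, uint32_t index) {
    return base + index * (uint32_t)sizeof(uint32_t);
}

/* Inverse: the index an attacker would supply to hit `target` from a
 * NULL base, or -1 if the target is not 4-byte aligned. */
int64_t index_for_target(uint32_t target) {
    return (target % 4u == 0) ? (int64_t)(target / 4u) : -1;
}
```

For instance, index 0x23DC6EA4 lands at 0x8F71BA90, the example address above, even though the base pointer is NULL.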

Except not quite.
The exploit doesn’t actually get to offset an arbitrary number of
bytes from 0. A complicated set of conditions constrains the address
it writes to and the value it gives it.
The actual write occurs via a structure offset. Flash is hardcoded
to translate your offset into another number. Working offsets, as it
turns out, will be greater than 0x80000000, and will be evenly
divisible by 12 after 4 is added to them. Note: I thought I was
hardcore when I wrote shellcode with no lowercase letters for the IMAP
vulnerability in the ’90s.
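The offset constraint can be checked mechanically. This sketch only encodes the two stated conditions (greater than 0x80000000, and divisible by 12 once 4 is added); it does not attempt to reproduce Flash's internal translation of the offset, which isn't described here. The function names are hypothetical.

```c
#include <stdint.h>

/* True if `off` meets both stated constraints. The sum is widened to
 * 64 bits so off + 4 cannot wrap for values near 2^32. */
int offset_usable(uint32_t off) {
    return off > 0x80000000u && ((uint64_t)off + 4u) % 12u == 0;
}

/* Smallest usable offset strictly greater than `after`, or 0 if none
 * exists below 2^32. */
uint32_t next_usable_offset(uint32_t after) {
    for (uint64_t off = (uint64_t)after + 1; off <= UINT32_MAX; off++)
        if (offset_usable((uint32_t)off))
            return (uint32_t)off;
    return 0;
}
```

Only one offset in twelve above 0x80000000 qualifies; the first is 0x8000000C.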
That’s not all. The value that Flash will write to the wild pointer
isn’t totally controlled by the attacker either. It’s cast up from a
16 bit integer to a 32 bit integer, and has another variable
subtracted from it. This is the point in the report where I started
giggling uncontrollably, embarrassing myself at the coffee shop.
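A sketch of why that constraint hurts: the written value comes from only 65,536 possible inputs, shifted by a quantity the attacker doesn't choose. Whether the widening is signed or unsigned, and what the subtracted variable holds, aren't stated here, so this sketch assumes sign extension and treats the subtracted quantity as a parameter `k`; both function names are hypothetical.

```c
#include <stdint.h>

/* Value stored by the wild write under the stated assumptions: a
 * 16-bit attacker value widened to 32 bits (assumed sign-extended),
 * minus a quantity k the attacker does not pick directly. Unsigned
 * arithmetic keeps the wraparound well defined. */
uint32_t written_value(uint16_t attacker16, uint32_t k) {
    int32_t widened = (int16_t)attacker16;   /* assumed sign extension */
    return (uint32_t)widened - k;
}

/* Can any 16-bit input produce `target` for this k? */
int value_reachable(uint32_t target, uint32_t k) {
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        if (written_value((uint16_t)v, k) == target)
            return 1;
    return 0;
}
```

With k = 0x10, for example, a pointer-sized value such as 0x08048000 is out of reach, so the usual trick of writing an address that points back into the attacker's buffer is off the table.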
The net result of this silliness is that it’s hard to do what
attackers normally do with a write32 vulnerability, which is to
clobber a function’s address with a pointer back to their buffer, so
that their shellcode is called when the clobbered function is
called. So Dowd’s exploit takes things in a different direction, and
manipulates the ActionScript bytecode state.
ActionScript bytecode state; yeah, about that.
ActionScript is Javascript that controls Flash animations. But the Javascript system
used by Flash is pretty advanced; for performance, it transforms
Javascript into bytecodes for a VM. For a bytecode VM, ActionScript is
pretty tight; its runtime stack is integrated with the CPU’s runtime
stack. The memory it uses to execute code is the same memory that the
Flash C-code runtime uses to manage its own state.
ActionScript is a register-based VM, meaning that its bytecode
instructions concern themselves chiefly with moving values in and out
of memory slots that simulate CPU registers. Those registers live in
the runtime stack and are accessed by indexing. Meaning, a malicious
Flash bytecode instruction can index its way to an arbitrary address
on the system stack. Game over.
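To see why unchecked register indexing is dangerous when registers share memory with the host's own state, consider this toy model. It is an invented simplification, not ActionScript's actual frame layout: one struct stands in for a stretch of the system stack, the VM's registers occupy most of it, and the slot just past them holds a value the host trusts (labeled here as a saved return address).

```c
#include <stdint.h>

#define NREGS 4

/* Toy model of a frame on the system stack: the VM's registers sit in
 * the same memory as a value the host trusts. */
struct frame {
    uint32_t slots[NREGS + 1];   /* slots[0..NREGS-1]: VM registers;
                                    slots[NREGS]: stands in for a saved
                                    return address */
};

/* What the verifier is supposed to guarantee before execution. */
int verify_setreg(uint32_t reg) {
    return reg < NREGS;
}

/* SETREG as the executive runs it: the index comes straight from the
 * bytecode and is trusted. This toy has only one slot past the
 * registers, so callers must keep reg <= NREGS; the real stack has no
 * such edge. */
void exec_setreg(struct frame *f, uint32_t reg, uint32_t value) {
    f->slots[reg] = value;
}
```

In the real attack the index reaches much further up the system stack than this single extra slot; the toy only shows that nothing but the verifier stands between a register index and the host's own data.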

Except not quite.
You can’t just inject malicious bytecodes.
Flash players have to execute bytecode sklorked directly off of web
pages, most of which are controlled by organized criminals. So Flash
doesn’t execute arbitrary bytecodes; they’re verified before
execution. The verifier ensures, among other things, that register
accesses from the bytecode stream reference valid register slots.
But. For performance, the Flash VM is broken into a two-pass system
with a verifier that validates bytecode (time-of-check) and an
executive that later evaluates it (time-of-use). And the interpretation
of bytecode differs at time-of-check and time-of-use. Here’s the
situation:
- The verifier ignores undefined bytecodes.
- The verifier keeps a table in memory that defines how long any one bytecode instruction is.
- The bytecode length table is a valid target of the NULL pointer overwrite.
- The executive has totally different machinery for interpreting bytecode.
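The desynchronization is easier to see in miniature. This is a toy model with an invented three-opcode instruction set, not Flash's real bytecode: the verifier walks instructions using a writable length table and skips the unused opcode without inspecting it, while the executive decodes with its own hardwired sizes and treats that opcode as a one-byte no-op. Corrupting one table entry, as the NULL-pointer write does in the real exploit, lets a forbidden instruction through unverified.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented toy instruction set (not Flash's real opcodes). */
enum { OP_NOP = 0x00, OP_SETREG = 0x01, OP_UNUSED = 0x02, OP_COUNT = 3 };
#define NREGS 4

/* Instruction lengths as the verifier sees them: SETREG is opcode,
 * register index, value. A writable table like this is the kind of
 * target the NULL-pointer write32 goes after. */
uint8_t verifier_len[OP_COUNT] = { 1, 3, 1 };

/* Verifier (time-of-check): bounds SETREG register indices and skips
 * every other opcode by its table length without looking inside it. */
int verify(const uint8_t *code, size_t len) {
    size_t pc = 0;
    while (pc < len) {
        uint8_t op = code[pc];
        if (op >= OP_COUNT || verifier_len[op] == 0 ||
            pc + verifier_len[op] > len)
            return 0;
        if (op == OP_SETREG && code[pc + 1] >= NREGS)
            return 0;
        pc += verifier_len[op];
    }
    return 1;
}

/* Executive (time-of-use): decodes with its own hardwired sizes and
 * trusts that verify() already bounded register indices. `mem` has
 * 256 slots so any 8-bit index stays inside this toy array; slots at
 * or above NREGS stand in for memory the bytecode must never reach. */
void execute(const uint8_t *code, size_t len, uint8_t mem[256]) {
    size_t pc = 0;
    while (pc < len) {
        switch (code[pc]) {
        case OP_NOP:
        case OP_UNUSED:              /* executive: no trailing bytes */
            pc += 1;
            break;
        case OP_SETREG:
            if (pc + 3 > len)
                return;
            mem[code[pc + 1]] = code[pc + 2];
            pc += 3;
            break;
        default:
            return;                  /* unknown opcode: stop */
        }
    }
}
```

With the default table, `verify` rejects `{OP_UNUSED, OP_SETREG, 200, 0x41}` because it reaches the SETREG and sees register 200. After `verifier_len[OP_UNUSED] = 4`, the verifier swallows those three bytes as operands of OP_UNUSED and accepts the program, while `execute` still runs the SETREG and writes slot 200.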
Clobber the right value in the length table, and you can make an
unused bytecode instruction that the verifier ignores seem much longer
than it is. The “extra” bytes slip past the verifier. But they don’t
slip past the executive, which has no idea that the unused bytecode
has trailing bytes. If those trailing bytes are themselves valid
bytecode, Flash will run them. Unverified. Giving them access to the
whole system stack. Game over.

Except not quite.
The Koopa shell on the second platform is a trap and if you touch it
you die.
Ok actually there’s no catch. Dowd’s exploit uses a NULL pointer
write32 to knock the locks off the bytecode interpreter in Flash, so
that his SWF file can run bytecode that will rewrite the system stack.
But, just to rub it in, or because this stuff just comes natural to
you when you are manufactured by a malicious cluster of supercomputers
inside SkyNet instead of nurtured by loving human parents, Dowd gives
himself additional constraints.
To wit: his exploit must (because he’s messing with us) corrupt the
Flash runtime, rewrite it to execute his trojan, and leave it running
steady as if nothing had happened. Meaning:
- His modification to the verifier can’t break existing instructions.
- His bytecode has to swap values into the stack instead of clobbering them directly.
- Portions of his shellcode have to run as both Flash bytecode and an X86 first-stage shellcode boot.

Two fun details.
First, even though IE and Firefox use different Flash builds, the
addressing inside them is compatible. The exploit works in both
places.
Second, Flash isn’t compiled with ASLR. So the attack works on
Vista.
Mass casualty. Go Flash!
Jeremy Rauch | April 14th, 2008 | Filed Under: NYSec
Third Tuesday tomorrow (4/15): it’s time for NYSec.
6PM at Pound + Pence. Pound + Pence is located at 55 Liberty St, at the corner of Liberty and Nassau. It’s easily accessed from just about any of the subway lines, the PATH, NY Waterway, etc.
We’ve been seated in different areas the last few meetings. Rather than wander aimlessly around the bar, I’d recommend asking at the front where the NYSec people are. They should send you our way.