Blackbag sub: Binary “sed”-style Record Format Fuzzing

Thomas Ptacek | October 16th, 2006 | Filed Under: Matasano, Reversing, Uncategorized

You’re starting on a new project, attacking a product with its own custom protocol. There’s a couple of things you’re going to do to get started, and then a few ways to go from there.

First, you’re going to set the product up and see it working in its natural habitat. Then, when you know it’s working, you’re going to repeat that step, this time with some form of traffic capture in place. (A year ago “capture” would have meant “sniffing” for me, but now it’s almost always a logging plugboard proxy. A good, easy-to-use [as in, no config files] proxy is one of the most valuable tools you can have).

Next, you’ll eyeball the captures to get an idea of how the protocol works. Your goal: break the TX stream up into individual messages, which you’ll later replay. You’ll do this in any of a number of ways, most of which involve a decent hex editor:

  • Looking for common Type/Length/Value patterns, such as big-endian 32 bit integers less than 65k that match up in the hex dump.

  • Looking for “signpost values”, byte sequences that occur at regular intervals, like “NTLMSSP” and “SMB” in the simplest cases.

  • Looking for ASCII strings and delimiters, like NUL characters, or prefixing length words.

  • Looking at the order in which traffic is sent, breaking up the stream at each place where the server responded.

  • Feeding bytes to the real server one-at-a-time until a response is forthcoming.

If you’re me, your aim is a directory full of little files, like “msg.0”, “msg.1”, etc, any of which you can cat onto the network and observe the results. Which is what you’re going to do next, to see if messages are simply replayable as-is.

Now you want to modify those messages, perhaps to insert Javascript into a text string, or to make a Unicode name string 10,000 characters long. You are in a maze of twisty passages:

  1. You can use the knowledge you’ve gained to attempt to write an actual client for the protocol, in C (if you’re me) or Ruby (if Dino). But no matter what language you choose, you’re making an investment that may not pay off…

  2. … a risk you can mitigate by cobbling together a “client” of sorts using Unix command line tools, output concatenation, and shell subroutines, which I promise for one-off test cases is faster than Ruby or Python.

  3. Or, you can turn the messages into “fuzzing templates”.

I’ve been playing with option (3) for the past few days and I like it a lot. You should try it. Here’s what I’ve been up to:

I started out with a really simple tool called “sub”. I posted it here a few months ago. In its original incarnation, it didn’t do much; it had three features, all based on expanding simple variables inside a binary stream:

  • ${hex:10} generated 10 random hex characters

  • ${word:7} inserted a random 7-letter dictionary word

  • ${shell:ls} inserted the output of “ls”

So, this was kind of dumb. The only meaningful feature here was “${shell}” (you can implement “${hex}” in terms of it, as well as “${word}”, which I don’t even know why I implemented in the first place). Most annoyingly, “sub” was stateless; you could write a shell expansion that would give you an LE32-length-delimited string (“echo -n string | bkb len -tx”), but there was no way to represent that directly in the “template” message, which gradually devolved back into a shell script.

The other day, I added two little features to “sub” which made it a whole lot more useful to me:

  • ${mark:bkb len -tx} consumes subsequent bytes of the input stream until it encounters a terminating string (“####” by default, but you can change that heredoc-style), then passes those bytes to a shell pipeline, whose output is the expansion of the ${mark}-data-#### escape.

  • Macros.

For all these silly Blackbag tools, “sub” is now like the rug from Lebowski: it really ties the room together. For example, the compressed base64’d unicode string of 10,000 ‘A’s with a 16 bit little-endian length field:

${lenle16}${gz:<<!}${b64:<<?}${unicode:<<.}${a:10000}.?!####

Yes, the “terminators” (each block ends with a unique one) are a total misfeature; I wrote the minimum amount of code to make this work. But in the real world, you’re never tightly nested like this; LE16 TLV’s nested in an LE32 frame are about as complicated as it gets in most situations.

Macros I wrote in about 5 minutes:

  • lengths in LE/BE 16 bit, 32 bit, and ASCII

  • hexification

  • environment variables (good for session IDs)

  • stripping/replacing NUL bytes

  • unicode and base64 encoding

  • padding to N-byte boundaries

Why are Unix shell pipelines the way to do this? By way of example, the “gz” macro is simply:

gz=mark:gzip -cf

If I wanted to waste another 5 minutes I could map the OpenSSL transforms in, but, exercise for the reader and all that.

None of these ideas are new. This is a simple “block-based fuzzing framework”, to use Aitel’s terminology. The difference here is you don’t need to write code to use it; just open up a hex editor and start mangling your binary records with escape sequences.

1 Comment so far

  • […] Here’s a trivial (kinda broken) shell pipeline that turns “sub” into a struct generator, a la SPIKE or CASL (but simpler): […]

  • Leave a reply