Reversing a “ZLib-Obfuscated?” Network Protocol

Eric Monti | May 21st, 2007 | Filed Under: Bitching About Protocols, Reversing, Uncategorized

We just wrapped up a security assessment on a commercial enterprise server/agent security product. I can’t get too specific here, but we did run into an interesting problem that we thought would be worth a post.
The application we were evaluating had a home-grown network protocol doing some interesting things worth investigating. What we were seeing from our network capture wasn’t too far from this:

00  46414b45 02000000  06060601 5e000000  |FAKE............|
10  dab624ba da73fed5  b9872696 08ea97a5  |..$..s....&.....|
20  2d626160 60c86248  61c86748 65e000b2  |-ba``.bHa.gHe...|
30  bd80ac0c 863c0605  0617b098 3450cc99  |.....<......4P..|
40  c18a2186 21802191  a1042817 c3100294  |..!.!.!...(.....|
50  89610806 92b94015  310c6e0c 990c3940  |.a....@.1.n...9@|
60  961e4332 502c8f21  0dc84f67 0000        |..C2P,.!..Og..|
6e

Just by glancing at the first 16 bytes, you can spot (1) a message signature; (2) some 4-byte little-endian word values, one of which was obviously a length value for the payload; and (3) version number of 1.6.6.6 in the middle.
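
For anyone who would rather see that in code than squint at hex, here is a minimal sketch of pulling those fields out of the first 16 bytes with Ruby's `unpack`. The method name `parse_header` and the field labels are ours, not the vendor's. Calling the second word `type` leans on the payload-type field discussed later in the post, and reading the version bytes back to front is simply what makes `06 06 06 01` come out as 1.6.6.6.

```ruby
# Sketch only: field names are our own labels for the layout inferred
# from the capture, not anything taken from the vendor's code.
def parse_header(msg)
  # a4 = 4 raw bytes, V = 32-bit little-endian unsigned integer
  signature, type, version, length = msg.unpack("a4Va4V")
  {
    signature: signature,
    type: type,
    # Assumption: the version is stored last-component-first (06 06 06 01)
    version: version.bytes.reverse.join("."),
    length: length
  }
end

# First 16 bytes of the capture shown above
header = ["46414b4502000000060606015e000000"].pack("H*")
puts parse_header(header).inspect
```

The length comes out as 94 (0x5e), and 16 header bytes plus 94 payload bytes is 110, or 0x6e, which is exactly where the dump ends.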

This looked promising, so we decided to pick it apart some more and see where it got us.

Let me just add at this point: General approaches can vary a lot when it comes to reverse engineering. As you’ll see, what we were doing was not strictly just protocol reversing. We had access to server-side binaries, which we were simultaneously disassembling to guide us at several steps. We could have just gone the strict disassembly route, but in my experience combining the two tends to yield much quicker results.

So, away we went. Or rather, got stuck next. Just past the header of the protocol was a chunk of seemingly meaningless binary data. A bit of disassembling told us that it was something compressed with .NET’s DeflateStream. Here was the real payload and it was time to write our first bit of code.

Since we were working with BlackBag (as regular readers will have noticed Matasano tends to do), our ideal tools would be small, focused ones that could run on Unix, preferably in the middle of a pipeline of several commands, so we could say things like:

% cat  | _inflate_ | hexdump -C

And if things got interesting, maybe even:

% cat  | _inflate_ | bkb sub  | _deflate_ | bkb blit

We figured we should be able to get the “inflated” stream using Zlib. So we set out to put together some Ruby to take “deflated” standard input and dump “inflated” standard output.

#!/usr/bin/env ruby

require 'zlib'
buf = STDIN.read()

zs = Zlib::Inflate.new
out = zs.inflate buf

STDOUT.write(out)

And… Fire!

% cat msg.raw |bkb shf 16 | inflate.rb|hd
./inflate.rb:7:in `inflate': incorrect header check (Zlib::DataError)
from ./inflate.rb:7

Whoops… maybe not so simple. We asked the Google! Turned out .NET’s DeflateStream doesn’t use the usual ZLIB header and footer as defined in RFC 1950; it emits a bare DEFLATE stream (RFC 1951).

Side note: Obviously this had already been tackled. Even though we didn’t try the IronPython solution linked above, I’d probably recommend using it or something like it unless you need something really quick and dirty as we did. The obvious question is: why didn’t we? We were sticking with Ruby for other reasons on this session and didn’t really need a “robust” solution just yet.

So we actually read RFC 1950 at this point. Turned out we just needed to tack on the header (and maybe the footer) ourselves.

#!/usr/bin/env ruby
require 'zlib'
# RFC 1950 header bytes: CMF 0x78 (deflate, 32K window) and FLG 0x01
header = "\x78\x01"
buf = STDIN.read()
zs = Zlib::Inflate.new
# Add the header first
zs << header
out = zs.inflate buf
STDOUT.write(out)

Um.. Fire?

$ cat msg.raw |bkb shf 16 |./inflate.rb |hd

00  b6a45b7b 499fd59d  c2917411 2f7666a2  |..[{I.....t./vf.|
10  04000000 6a006400  6f006500 08000000  |....j.d.o.e.....|
20  4a006f00 68006e00  20004400 6f006500  |J.o.h.n. .D.o.e.|
30  1b000000 43003a00  5c005000 61007400  |....C.:.\.P.a.t.|
40  68005c00 54006f00  5c005300 6f006d00  |h.\.T.o.\.S.o.m.|
50  65005c00 46006900  6c006500 2e006300  |e.\.F.i.l.e...c.|
60  6f006e00 66006900  6700                    |o.n.f.i.g.|
6a

Much better.

Those who’ve read the RFC or are already familiar with ZLib may notice we didn’t bother with the ADLER32 checksum footer. Our quick/dirty Ruby ZLib implementation didn’t seem to notice when it was missing. Honestly not sure whether this is expected behavior or not, but it suited us just fine. We really just wanted to get back to picking apart the protocol.
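
As an aside, there is a way to skip the header business entirely. This is a sketch rather than what we ran on the engagement: Ruby's Zlib binding accepts a window-bits argument, and a negative value asks zlib for a raw RFC 1951 stream with no RFC 1950 header or ADLER32 footer expected. The method name `raw_inflate` and the sample data are made up for illustration.

```ruby
require 'zlib'

# Inflate a headerless (RFC 1951) DEFLATE stream.
# Negative window bits tell zlib not to look for the RFC 1950 wrapper.
def raw_inflate(data)
  zs = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  out = zs.inflate(data)
  zs.close
  out
end

# Self-check on made-up data: build a normal zlib stream, strip its
# 2-byte header and 4-byte ADLER32 footer, and inflate what is left.
sample = "hello, agent " * 8
raw = Zlib::Deflate.deflate(sample)[2..-5]
puts raw_inflate(raw) == sample
```

Used as a filter like the scripts above, the body would be something like `STDOUT.write(raw_inflate(STDIN.read))`.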

What was “inflated” might also need to get “deflated” again, so we also whipped up a “deflater”.

#!/usr/bin/env ruby
require 'zlib'
buf = STDIN.read()
zs = Zlib::Deflate.new()
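out = zs.deflate(buf, Zlib::SYNC_FLUSH)
# Output the deflated chunk without the 2b zlib header and the last 4 bytes.
# Note: with SYNC_FLUSH and no finish, those 4 bytes are the 00 00 ff ff flush
# marker; zlib only writes the adler32 footer when the stream is finished.
STDOUT.write(out[2, out.length - 6])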
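
A variation on the script above, offered as a sketch: instead of trimming bytes off a zlib stream, ask zlib for a raw DEFLATE stream in the first place by passing negative window bits. The method name `raw_deflate` and the sample data are made up, and we have not checked this variant against the product's .NET side.

```ruby
require 'zlib'

# Deflate without the RFC 1950 wrapper, so there is nothing to trim afterwards.
def raw_deflate(data)
  # Negative window bits select a raw RFC 1951 stream (no header, no ADLER32).
  zs = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  out = zs.deflate(data, Zlib::FINISH)
  zs.close
  out
end

# Round trip on made-up data to check the stream is well formed.
sample = "C:\\Path\\To\\Some\\File.config" * 4
packed = raw_deflate(sample)
unpacked = Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(packed)
puts unpacked == sample
```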

Turned out we didn’t need to use the “deflate” script much: between protocol decoding and disassembly, we learned that one of the uncompressed 4-byte fields in the protocol’s header was the payload *type*, either *deflated* or *raw*. So, even though we confirmed our deflater worked well enough, we usually just changed the type to *raw* whenever we wanted to send something back to the server.
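
Composing a message then comes down to writing the 16-byte header in front of whatever payload you want. The sketch below is an illustration under assumptions, not the tool we used: `build_message` is a hypothetical helper, the signature and version bytes are copied from the sample capture, and the numeric value for the *raw* type is deliberately left to the caller because it is not given here.

```ruby
# Hypothetical helper: wrap a payload in the 16-byte header seen in the capture.
SIGNATURE = "FAKE"
VERSION   = [0x06, 0x06, 0x06, 0x01].pack("C4")

def build_message(type, payload)
  # a4 = signature, V = payload type, a4 = version bytes, V = payload length
  header = [SIGNATURE, type, VERSION, payload.bytesize].pack("a4Va4V")
  header + payload.b
end

# Rebuild a header like the one in the capture: type 2, 94-byte payload.
msg = build_message(2, "\x00" * 94)
puts msg[0, 16].unpack("H*").first
```

The length field is recomputed from the payload each time, so swapping in a different payload cannot leave a stale length behind.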

And in conclusion (which I have to speak about in vague terms to protect the guilty, sorry): now that we could read and compose messages, we learned this protocol was letting the agent do some truly crazy things. Things like passing entire lists of fields to insert/update directly into SQL. Without authentication.

Identifying and decompressing the protocol’s payload was the only hurdle we had to get over to proceed with other attacks. In the end, these culminated in several findings, including trivial database corruption and even injecting malicious data to capture admin privileges through the product’s console. Again… without authentication.

Moral of the story:

I try not to speculate too much about what developers’ intentions are or were when I find something like this. Hindsight is 20/20 and it’s generally a lot easier to break than build. But I couldn’t help but wonder whether they had intended to use DeflateStream as a cheap form of obfuscation here. It’s just as possible they just wanted to keep the payloads small and didn’t even consider the risks faced by the protocol at all.

Zlib is not encryption (I feel dumb even saying it). Even more so if your protocol is wide open. Authentication would have been tricky no matter what. There were inherent trust boundaries invading way into the agent. That was even more reason for this protocol to use encryption. Though frankly it wouldn’t have solved all this protocol’s problems — crypto is not an argent projectile. There were some deeper design issues lurking here.

But at the very least, it would have raised the bar for reversing. Because the ZLib “hurdle” took us all of about 20 minutes to beat.

24 Comments so far

  • Thomas Ptacek

    May 21st, 2007 11:53 pm

    The goofier the encoding, the more unbelievably horrible the underlying protocol is destined to be. Take a guess at what lurks behind the “custom crypto” protocols! My favorite: the protocol XOR’d against a repeating string.

  • anonymous

    May 22nd, 2007 1:06 am

    Instead of the focus being to obfuscate the traffic, perhaps it was just compressed to be … um … compressed.

    Compression = less data = less network traffic = less impact on deployed networks.

    I *WISH* people would use compression by default in custom protocols.

  • Thomas Ptacek

    May 22nd, 2007 8:30 am

    A. I don’t wish that. Sure, LZ compression is better than whatever ad hoc nonsense protocol developers come up with on their own (label compression, minimum bit width encoding, etc). But for almost all protocols, compression adds drama, but very little value. Minuscule amounts of data to begin with.

    B. In this case, without further documenting the nature of the target, we’re pretty sure this is a case of “Lempel-Ziv Encryption”. =)

  • Jon

    May 22nd, 2007 8:56 am

    My fav was a web app assessment where there was a bunch of Base64 goodies. These, in turn, were gzip objects, which were in turn, Java serialized objects, which included lists and so on. Who needs to worry about parsing input when ya have direct access to the memory objects :-)

  • Jeremiah Blatz

    May 22nd, 2007 9:40 am

    See, this is why people should just stick to HTTP. No, really. HTTP is bloated and sucks for real client-server apps, etc., and so on, but at least people have solved the crypto and authentication problems. The difference between ajax and a client-server app is pretty much nil on the server end, and it’s easy to avoid the “whoops, we forgot to do authentication” problem.

  • Gunnar

    May 22nd, 2007 11:41 am

    why bother with authentication when you can just “authorize” unauthenticatable requests? Look how well this approach has worked for MQ.

  • Thomas Ptacek

    May 22nd, 2007 12:34 pm

    So, I largely agree on the HTTP thing, with two caveats:

    (a) HTTP lulls people into a false sense of security (to wit: the HTTP “variants” of crappy old protocols get allowed through firewalls). If all you’re doing is base64’ing your old protocol and sticking it in an HTTP sleeve, you’re making the problem worse.

    (b) HTTP is actually fairly complicated, as protocols go; if you can use IIS or LAMP or J2EE, HTTP is a win because your stack has been audited. But particularly with appliances and agents, we’ve seen too many examples of roll-your-own that get taken out with fuzzers.

    There is no question that TLS should be a universal standard, though. If you have to do encryption, you should be using TLS, either via most-recent-OpenSSL or by Win32 schannel.

  • Mike

    May 22nd, 2007 1:05 pm

    This may be your dumbest comment yet, but seeing as how I know nothing about RE (but am interested in learning), how the hell did you determine:

    “Just by glancing at the first 16 bytes, you can spot (1) a message signature; (2) some 4-byte little-endian word values, one of which was obviously a length value for the payload; and (3) version number of 1.6.6.6 in the middle.” ?

  • John T. Hoffoss

    May 22nd, 2007 2:07 pm

    You said “silver bullet”. Please don’t. Speaking in cliche makes me want to scream. Even more so when I see an obviously intelligent person using cliched phrases to emphasize a point. If the point is worth emphasizing, use another sentence!

    Good article though. Much enjoyed, though I hope I never find myself using this product or another from this vendor.

  • Thomas Ptacek

    May 22nd, 2007 4:20 pm

    Mike, that’s a great question; can we take it on in a separate post instead of a comment?

  • Eric

    May 22nd, 2007 4:21 pm

    John,

    I’m sorry to rain on your party. You may have us over a barrel. The long and short of it is sometimes people just draw blanks. We’re on the case, though!

  • MikeP

    May 22nd, 2007 8:53 pm

    Eric, at the end of the day, isn’t it about the results you produce? After all, taking things step by step, you should come up with something great - Rome wasn’t built in a day.

    Having said that, I have to agree with John. Most security pros despise marketese (marketer-ese?) as being unclear or too verbose, but won’t hesitate to use cliches or to twist a metaphor until it screams. There’s a bit of cognitive dissonance there.

    Perhaps you could use “panacea” next time… might increase the vocabulary of your readers too. :-)

  • MikeP

    May 22nd, 2007 8:55 pm

    All that being said, thank you, this sort of thing could be very useful to me personally and I believe is doing the community a great service.

  • Thomas Ptacek

    May 22nd, 2007 9:32 pm

    You’re not ok with “argent projectile”? Would it help if I converted to “pale glimmering missile”?

  • Thomas Ptacek

    May 22nd, 2007 9:33 pm

    Eww. Sorry.

  • Eric

    May 23rd, 2007 12:43 am

    I used to get really disgusted with the security cliches abuse too. I give in now to annoyance more with myself repeating the same doctrines over and over again at length.

    Saying things like “hindsight is 20/20”, “easier to break than build”, and “security’s not a silver bullet” are just short-hand for the same rants that have been made time and time again. Don’t get me wrong. They’re good rants, just take a lot of words. I agree, it’s a bad habit. (and… my writing style annoys me too sometimes)

    Hint: you may also notice I used all of those in a big lump at the end under a heading called “Moral of the Story”… I think that’s referred to as “going with the flow” nyuk nyuk

    Anyway Mike, you raised a good point about the not-so-obviousness of how some of the analysis was presented. Stay tuned. We’re working on some more protocol reversing material. I’m glad you have enjoyed this post.

    Ahem… We’ll try and keep the cliche “fluff” “toned down” “for all our peeps” “moving forward”. (“metaphorically speaking”, that is)

    I think I’m having too much fun with this. Whatever else, though, let’s make sure we keep the pale glimmering stuff out of this.

  • bluffer

    May 23rd, 2007 3:39 am

    when taking the query by mike in a separate post it would be great if you could explain how you knew it was zlib or .nets inflate or deflate etc just by seeing the plain dump could all this be determined ?
    i ll post a wild guess for the original query by mike

    msg being fake the length may be 02 & | == 0x32 and if i continue like this the version number could be what you say 0x36 == 6 - 0x30 = 06 till 0x31 == 1 - 0x30 = 01

    06060601

  • Thomas Ptacek

    May 23rd, 2007 12:57 pm

    Part of the point Eric made was that DeflateStream and ZLIB are basically the same thing, with a minor protocol change. But yeah, identifying common encoding types would make a great post.

  • Eric

    May 23rd, 2007 1:01 pm

    bluffer,

    The program disassembly was what told us the payload was deflated. The 02 00 00 00 is little endian int for “2” which corresponded to the type in the code for a compressed payload.

    There are pure protocol analysis ways of finding these things out too, but we didn’t use them in this case.

  • Eric

    May 23rd, 2007 7:49 pm

    This is jumping ahead to some of what I would have covered in the next blog post. There’s plenty of other stuff to talk about in this area though, so we’ll still be doing more.

    Anyway.

    The payload size was 5e 00 00 00 (last field in the 16b header). Two things gave this away as payload length.

    1. 0x0000005e = 94 decimal. 94 was exactly the length between this field and the end of the message.

    2. It was located just before the chunk it described. Makes it a strong candidate for a ‘length’ because of the way deserializing code generally works. Structures out the header, then uses the length element of that structure to read X number of bytes at an offset. In this case the offset was just the end of the header.

  • Thomas Ptacek

    May 23rd, 2007 9:53 pm

    Where “pure protocol analysis” actually boils down to “check the entropy to see if it’s truly random, and then try to decompress with every known decompressor”. =)

  • Alan

    May 24th, 2007 2:34 pm

    If you have the client software on Unix, the first thing I do is use “ldd” on it. (There are similar tools for Windows. Just dump the exports.) That list gives you a good starting point for further reversing.

  • bluffer

    May 25th, 2007 5:47 am

    disassembly is what told us that the payload was deflated
    ok so it wasnt just having a plain dump and peering at it and deducing that it is deflated

    94 was exactly the length between this field and the end of the message.

    so end of message was the double null termination
    or is it some thing else like signature following that double null that was denoting a different payload stream

    brute force decompression is it really possible to subject a random stream without its associated header footer and other assorted requirements that are specific to their implementation

    i mean is it possible to evolve some thing
    like

    appendlog("decrypt.log",decompressor[i].Decompress(stream));

    where decompressor[i] would be

    type *decompressor {
    "deczlib"
    "declimpel"
    "decziev"
    };

    and stream being char stream[255] = {"\x01\x02\x03\x04\x05\….\x…\xff"}

    dont answer if you are making separate post
    try covering in that

  • Paulo Calcada

    October 29th, 2007 8:28 am

    Maybe they are only implementing the DEFLATE method (RFC 1951) and ZLIB is a RFC 1950 implementation. That’s the reason for the necessity of the additional header..

    But anyway, thanks for your doc, was very useful for me.
