Reversing a “ZLib-Obfuscated?” Network Protocol
Eric Monti | May 21st, 2007 | Filed Under: Bitching About Protocols, Reversing, Uncategorized
We just wrapped up a security assessment on a commercial enterprise server/agent security product. I can’t get too specific here, but we did run into an interesting problem that we thought would be worth a post.
The application we were evaluating had a home-grown network protocol doing some interesting things worth investigating. What we were seeing from our network capture wasn’t too far from this:
00 46414b45 02000000 06060601 5e000000 |FAKE............|
10 dab624ba da73fed5 b9872696 08ea97a5 |..$..s....&.....|
20 2d626160 60c86248 61c86748 65e000b2 |-ba``.bHa.gHe...|
30 bd80ac0c 863c0605 0617b098 3450cc99 |.....<......4P..|
40 c18a2186 21802191 a1042817 c3100294 |..!.!.!...(.....|
50 89610806 92b94015 310c6e0c 990c3940 |.a....@.1.n...9@|
60 961e4332 502c8f21 0dc84f67 0000 |..C2P,.!..Og..|
6e
Just by glancing at the first 16 bytes, you can spot (1) a message signature; (2) some 4-byte little-endian word values, one of which was obviously a length value for the payload; and (3) version number of 1.6.6.6 in the middle.
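To make that reading of the header concrete, here is a minimal Ruby sketch that splits the 16 bytes into fields. The field names and the `parse_header` helper are our own labels for illustration, not anything taken from the product, and reading the version bytes back to front as 1.6.6.6 is an assumption about how that field is meant to be displayed.

```ruby
# Sketch: split the 16-byte header from the sample capture into fields.
# Field names are hypothetical labels, not the vendor's.
Header = Struct.new(:signature, :type, :version, :length)

def parse_header(bytes)
  raise ArgumentError, 'need at least 16 bytes' if bytes.bytesize < 16

  # a4 = 4 raw bytes, V = 32-bit little-endian unsigned, C4 = 4 single bytes
  sig, type, v0, v1, v2, v3, len = bytes.unpack('a4 V C4 V')
  # Assumption: the version bytes 06 06 06 01 are displayed back to front
  Header.new(sig, type, [v3, v2, v1, v0].join('.'), len)
end

# First row of the hexdump above
sample = ['46414b4502000000060606015e000000'].pack('H*')
header = parse_header(sample)
puts header.to_a.inspect
```

The length field is easy to sanity-check against the capture: the dump is 0x6e (110) bytes long, and 110 minus the 16-byte header leaves 94, which is 0x5e.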
This looked promising, so we decided to pick it apart some more and see where it got us.
Let me just add at this point: General approaches can vary a lot when it comes to reverse engineering. As you’ll see, what we were doing was not strictly just protocol reversing. We had access to server-side binaries, which we were simultaneously disassembling to guide us at several steps. We could have just gone the strict disassembly route, but in my experience combining the two tends to yield much quicker results.
So, away we went. Or rather, got stuck next. Just past the header of the protocol was a chunk of seemingly meaningless binary data. A bit of disassembling told us that it was something compressed with .NET’s DeflateStream. Here was the real payload and it was time to write our first bit of code.
Since we were working with BlackBag (as regular readers will have noticed Matasano tends to do), our ideal tools would be small, focused ones that could run on Unix, preferably in the middle of a chain of piped commands, so we could say things like:
% cat | _inflate_ | hexdump -C
And if things got interesting, maybe even:
% cat | _inflate_ | bkb sub | _deflate_ | bkb blit
We figured we should be able to get the “inflated” stream using Zlib. So, we set out to put together some Ruby to take a “deflated” standard input and dump “inflated” standard output.
#!/usr/bin/env ruby

require 'zlib'

buf = STDIN.read()
zs = Zlib::Inflate.new
out = zs.inflate buf
STDOUT.write(out)
And… Fire!
% cat msg.raw |bkb shf 16 | inflate.rb|hd
./inflate.rb:7:in `inflate': incorrect header check (Zlib::DataError)
from ./inflate.rb:7
Whoops… maybe not so simple. We asked the Google! Turned out .NET’s DeflateStream doesn’t use the usual ZLIB header and footer as defined in RFC 1950.
Side note: Obviously this had already been tackled. Even though we didn’t try the IronPython solution linked above, I’d probably recommend using it or something like it unless you need something really quick and dirty, as we did. The obvious question is: why didn’t we? We were sticking with Ruby for other reasons on this session and didn’t really need a “robust” solution just yet.
So we actually read RFC 1950 at this point. Turned out we just needed to tack on the header (and maybe the footer) ourselves.
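For anyone who hasn't read the RFC, the two-byte header is simple enough to check by hand. The sketch below is our own illustration (the `zlib_header?` helper is a made-up name): per RFC 1950 the low nibble of the first byte is the compression method (8 means deflate), the high nibble encodes the window size, and the two bytes read as a big-endian 16-bit number must be a multiple of 31.

```ruby
# Sketch of the RFC 1950 header rules; zlib_header? is a hypothetical helper.
# cmf = first byte (method + window info), flg = second byte (flags + check bits)
def zlib_header?(cmf, flg)
  method_ok = (cmf & 0x0f) == 8          # CM = 8 means deflate
  window_ok = (cmf >> 4) <= 7            # CINFO above 7 is not allowed
  check_ok  = ((cmf << 8) | flg) % 31 == 0
  method_ok && window_ok && check_ok
end

puts zlib_header?(0x78, 0x01)   # a commonly seen zlib header
puts zlib_header?(0xda, 0xb6)   # first two payload bytes of the sample capture
```

The payload bytes in the capture fail this check, which is consistent with the "incorrect header check" error, while 0x78 0x01 passes (0x7801 is 30721, or 31 times 991).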
#!/usr/bin/env ruby

require 'zlib'

header = "\x78\x01"
buf = STDIN.read()
zs = Zlib::Inflate.new
# Add the header first
zs << header
out = zs.inflate buf
STDOUT.write(out)
Um.. Fire?
$ cat msg.raw |bkb shf 16 |./inflate.rb |hd
00 b6a45b7b 499fd59d c2917411 2f7666a2 |..[{I.....t./vf.|
10 04000000 6a006400 6f006500 08000000 |....j.d.o.e.....|
20 4a006f00 68006e00 20004400 6f006500 |J.o.h.n. .D.o.e.|
30 1b000000 43003a00 5c005000 61007400 |....C.:.\.P.a.t.|
40 68005c00 54006f00 5c005300 6f006d00 |h.\.T.o.\.S.o.m.|
50 65005c00 46006900 6c006500 2e006300 |e.\.F.i.l.e...c.|
60 6f006e00 66006900 6700 |o.n.f.i.g.|
6a
Much better.
Those who’ve read the RFC or are already familiar with ZLib may notice we didn’t bother with the ADLER32 checksum footer. Our quick/dirty Ruby ZLib implementation didn’t seem to notice when it was missing. Honestly not sure whether this is expected behavior or not, but it suited us just fine. We really just wanted to get back to picking apart the protocol.
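For reference, the footer in question is just the Adler-32 of the uncompressed data, stored big-endian in the last four bytes of the stream. The sketch below is not the script from our session; it builds a complete zlib stream from a made-up string, picks out the footer, and then inflates the stream with the footer cut off, which mirrors what our script was doing. Our guess (and it is only a guess) is that the instance-level `inflate` call hands back whatever it has decompressed so far and never gets to the point of insisting on the checksum.

```ruby
require 'zlib'

data   = 'John Doe'                          # made-up sample plaintext
stream = Zlib::Deflate.deflate(data)         # complete RFC 1950 stream
footer = stream[-4, 4].unpack1('N')          # last 4 bytes, big-endian
puts footer == Zlib.adler32(data)            # footer vs. Adler-32 of the plaintext

# Drop the footer and feed the rest to an Inflate instance.
zs  = Zlib::Inflate.new
out = zs.inflate(stream[0...-4])
puts out == data
zs.close
```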
What was “inflated” might also need to get “deflated” again, so we also whipped up a “deflater”.
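#!/usr/bin/env ruby

require 'zlib'

buf = STDIN.read()
zs = Zlib::Deflate.new()
out = zs.deflate(buf, Zlib::SYNC_FLUSH)
# Output the deflated chunk without the 2b zlib header and the trailing 4 bytes.
# Note: with SYNC_FLUSH those 4 bytes are the 00 00 ff ff flush marker; the
# adler32 footer is only written when flushing with Zlib::FINISH.
STDOUT.write(out[2, (out.length - 6)])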
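An alternative we did not use at the time: zlib can produce and consume raw RFC 1951 data directly if you pass it a negative window-bits value, which avoids adding or slicing off headers by hand. The following is a sketch under that assumption, with helper names (`raw_deflate`, `raw_inflate`) invented for illustration. Whether the product's DeflateStream accepts the output byte for byte is something we have not verified.

```ruby
require 'zlib'

# Negative window bits ask zlib for raw deflate data: no header, no footer.
def raw_deflate(data)
  zs  = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  out = zs.deflate(data, Zlib::FINISH)   # FINISH terminates the stream properly
  zs.close
  out
end

def raw_inflate(data)
  zs  = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  out = zs.inflate(data)
  zs.close
  out
end

sample = 'C:\\Path\\To\\Some\\File.config'   # made-up sample plaintext
packed = raw_deflate(sample)
puts raw_inflate(packed) == sample
```

With the default settings, the result should be the same bytes you get by taking a complete zlib stream and cutting off the 2-byte header and 4-byte footer.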
Turned out we didn’t need to use the “deflate” script much: between protocol decoding and disassembly, we learned one of the uncompressed 4-byte fields in the protocol’s header was for payload *type*, either *deflated* or *raw*. So, even though we confirmed our deflater worked well enough, we usually just changed the type to *raw* whenever we wanted to send something back to the server.
And in conclusion (which I have to speak about in vague terms to protect the guilty, sorry): now that we could read and compose messages, we learned this protocol was letting the agent do some truly crazy things. Things like passing entire lists of fields to insert/update directly into SQL. Without authentication.
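Composing a message with the type switched to *raw* then comes down to rebuilding the 16-byte header around the new payload. The sketch below uses the sanitized sample layout from the capture (signature, type, version bytes, length). The type value 2 for *deflated* matches the sample header, but the numeric value for *raw* is not something we are disclosing here, so `TYPE_RAW = 1` is purely a placeholder, and `build_message` is a hypothetical helper.

```ruby
# Assumption: TYPE_RAW's value is a placeholder; only 2 (deflated) is from the sample.
TYPE_RAW      = 1
TYPE_DEFLATED = 2

def build_message(payload, type: TYPE_RAW)
  body    = payload.b                         # work on raw bytes
  version = [6, 6, 6, 1].pack('C4')           # bytes copied from the sample header
  'FAKE'.b + [type].pack('V') + version + [body.bytesize].pack('V') + body
end

msg = build_message('hello')
puts msg.unpack1('H*')
```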
Identifying and decompressing the protocol’s payload was the only hurdle we had to get over to proceed with other attacks. In the end, these culminated in several findings, including trivial database corruption and even injecting malicious data to capture admin privileges through the product’s console. Again… without authentication.
Moral of the story:
I try not to speculate too much about what developers’ intentions are or were when I find something like this. Hindsight is 20/20 and it’s generally a lot easier to break than build. But I couldn’t help but wonder whether they had intended to use DeflateStream as a cheap form of obfuscation here. It’s just as possible they just wanted to keep the payloads small and didn’t even consider the risks faced by the protocol at all.
Zlib is not encryption (I feel dumb even saying it). Even more so if your protocol is wide open. Authentication would have been tricky no matter what. There were inherent trust boundaries invading way into the agent. That was even more reason for this protocol to use encryption. Though frankly it wouldn’t have solved all this protocol’s problems — crypto is not an argent projectile. There were some deeper design issues lurking here.
But at the very least, it would have raised the bar for reversing. Because the ZLib “hurdle” took us all of about 20 minutes to beat.


Thomas Ptacek
May 21st, 2007 11:53 pm
The goofier the encoding, the more unbelievably horrible the underlying protocol is destined to be. Take a guess at what lurks behind the “custom crypto” protocols! My favorite: the protocol XOR’d against a repeating string.
anonymous
May 22nd, 2007 1:06 am
Instead of the focus being to obfuscate the traffic, perhaps it was just compressed to be … um … compressed.
Compression = less data = less network traffic = less impact on deployed networks.
I *WISH* people would use compression by default in custom protocols.
Thomas Ptacek
May 22nd, 2007 8:30 am
A. I don’t wish that. Sure, LZ compression is better than whatever ad hoc nonsense protocol developers come up with on their own (label compression, minimum bit width encoding, etc). But for almost all protocols, compression adds drama, but very little value. Minuscule amounts of data to begin with.
B. In this case, without further documenting the nature of the target, we’re pretty sure this is a case of “Lempel-Ziv Encryption”. =)
Jon
May 22nd, 2007 8:56 am
My fav was a web app assessment where there was a bunch of Base64 goodies. These, in turn, were gzip objects, which were, in turn, Java serialized objects, which included lists and so on. Who needs to worry about parsing input when ya have direct access to the memory objects?
Jeremiah Blatz
May 22nd, 2007 9:40 am
See, this is why people should just stick to HTTP. No, really. HTTP is bloated and sucks for real client-server apps, etc., and so on, but at least people have solved the crypto and authentication problems. The difference between ajax and a client-server app is pretty much nil on the server end, and it’s easy to avoid the “whoops, we forgot to do authentication” problem.
Gunnar
May 22nd, 2007 11:41 am
Why bother with authentication when you can just “authorize” unauthenticatable requests? Look how well this approach has worked for MQ.
Thomas Ptacek
May 22nd, 2007 12:34 pm
So, I largely agree on the HTTP thing, with two caveats:
(a) HTTP lulls people into a false sense of security (to wit: the HTTP “variants” of crappy old protocols get allowed through firewalls). If all you’re doing is base64’ing your old protocol and sticking it in an HTTP sleeve, you’re making the problem worse.
(b) HTTP is actually fairly complicated, as protocols go; if you can use IIS or LAMP or J2EE, HTTP is a win because your stack has been audited. But particularly with appliances and agents, we’ve seen too many examples of roll-your-own that get taken out with fuzzers.
There is no question that TLS should be a universal standard, though. If you have to do encryption, you should be using TLS, either via most-recent-OpenSSL or by Win32 schannel.
Mike
May 22nd, 2007 1:05 pm
This may be your dumbest comment yet, but seeing as how I know nothing about RE (but am interested in learning), how the hell did you determine:
“Just by glancing at the first 16 bytes, you can spot (1) a message signature; (2) some 4-byte little-endian word values, one of which was obviously a length value for the payload; and (3) version number of 1.6.6.6 in the middle.” ?
John T. Hoffoss
May 22nd, 2007 2:07 pm
You said “silver bullet”. Please don’t. Speaking in cliche makes me want to scream. Even more so when I see an obviously intelligent person using cliched phrases to emphasize a point. If the point is worth emphasizing, use another sentence!
Good article though. Much enjoyed, though I hope I never find myself using this product or another from this vendor.
Thomas Ptacek
May 22nd, 2007 4:20 pm
Mike, that’s a great question; can we take it on in a separate post instead of a comment?
Eric
May 22nd, 2007 4:21 pm
John,
I’m sorry to rain on your party. You may have us over a barrel. The long and short of it is sometimes people just draw blanks. We’re on the case, though!
MikeP
May 22nd, 2007 8:53 pm
Eric, at the end of the day, isn’t it about the results you produce? After all, taking things step by step, you should come up with something great - Rome wasn’t built in a day.
Having said that, I have to agree with John. Most security pros despise marketese (marketer-ese?) as being unclear or too verbose, but won’t hesitate to use cliches or to twist a metaphor until it screams. There’s a bit of cognitive dissonance there.
Perhaps you could use “panacea” next time… might increase the vocabulary of your readers too.
MikeP
May 22nd, 2007 8:55 pm
All that being said, thank you, this sort of thing could be very useful to me personally and I believe is doing the community a great service.
Thomas Ptacek
May 22nd, 2007 9:32 pm
You’re not ok with “argent projectile”? Would it help if I converted to “pale glimmering missile”?
Thomas Ptacek
May 22nd, 2007 9:33 pm
Eww. Sorry.
Eric
May 23rd, 2007 12:43 am
I used to get really disgusted with security cliche abuse too. These days I give in, mostly out of annoyance with myself for repeating the same doctrines over and over again at length.
Saying things like “hindsight is 20/20”, “easier to break than build”, and “security’s not a silver bullet” are just short-hand for the same rants that have been made time and time again. Don’t get me wrong. They’re good rants, they just take a lot of words. I agree, it’s a bad habit. (and… my writing style annoys me too sometimes)
Hint: you may also notice I used all of those in a big lump at the end under a heading called “Moral of the Story”… I think that’s referred to as “going with the flow” nyuk nyuk
Anyway Mike, you raised a good point about the not-so-obviousness of how some of the analysis was presented. Stay tuned. We’re working on some more protocol reversing material. I’m glad you have enjoyed this post.
Ahem… We’ll try and keep the cliche “fluff” “toned down” “for all our peeps” “moving forward”. (“metaphorically speaking”, that is)
I think I’m having too much fun with this. Whatever else, though, let’s make sure we keep the pale glimmering stuff out of this.
bluffer
May 23rd, 2007 3:39 am
when taking the query by mike in a separate post it would be great if you could explain how you knew it was zlib or .nets inflate or deflate etc just by seeing the plain dump could all this be determined ?
i ll post a wild guess for the original query by mike
msg being fake the length may be 02 & | == 0x32 and if i continue like this the version number could be what you say 0x36 == 6 - 0x30 = 06 till 0x31 == 1 - 0x30 = 01
06060601
Thomas Ptacek
May 23rd, 2007 12:57 pm
Part of the point Eric made was that DeflateStream and ZLIB are basically the same thing, with a minor protocol change. But yeah, identifying common encoding types would make a great post.
Eric
May 23rd, 2007 1:01 pm
bluffer,
The program disassembly was what told us the payload was deflated. The 02 00 00 00 is little endian int for “2” which corresponded to the type in the code for a compressed payload.
There are pure protocol analysis ways of finding these things out too, but we didn’t use them in this case.
Eric
May 23rd, 2007 7:49 pm
This is jumping ahead to some of what I would have covered in the next blog post. There’s plenty of other stuff to talk about in this area though, so we’ll still be doing more.
Anyway.
The payload size was 5e 00 00 00 (last field in the 16b header). Two things gave this away as payload length.
1. 0x0000005e = 94 decimal. 94 was exactly the length between this field and the end of the message.
2. It was located just before the chunk it described. Makes it a strong candidate for a ‘length’ because of the way deserializing code generally works. Structures out the header, then uses the length element of that structure to read X number of bytes at an offset. In this case the offset was just the end of the header.
Thomas Ptacek
May 23rd, 2007 9:53 pm
Where “pure protocol analysis” actually boils down to “check the entropy to see if it’s truly random, and then try to decompress with every known decompressor”. =)
Alan
May 24th, 2007 2:34 pm
If you have the client software on Unix, the first thing I do is use “ldd” on it. (There are similar tools for Windows. Just dump the exports.) That list gives you a good starting point for further reversing.
bluffer
May 25th, 2007 5:47 am
disassembly is what told us that the payload was deflated
ok so it wasn’t just having a plain dump and peering at it and deducing that it is deflated
94 was exactly the length between this field and the end of the message.
so end of message was the double null termination
or is it some thing else like signature following that double null that was denoting a different payload stream
brute force decompression is it really possible to subject a random stream without its associated header footer and other assorted requirements that are specific to their implementation
i mean is it possible to evolve some thing
like
appendlog("decrypt.log",decompressor[i].Decompress(stream));
where decompressor[i] would be
type *decompressor {
"deczlib"
"declimpel"
"decziev"
};
and stream being char stream[255] = {"\x01\x02\x03\x04\x05\….\x…\xff"}
dont answer if you are making separate post
try covering in that
Paulo Calcada
October 29th, 2007 8:28 am
Maybe they are only implementing the DEFLATE method (RFC 1951) and ZLIB is a RFC 1950 implementation. That’s the reason for the necessity of the additional header.
But anyway, thanks for your doc, was very useful for me.