De-Obfuscation For the Impatient

Eric Monti | June 2nd, 2008

A classic challenge for companies that build products on high-level languages like Perl, Python, PHP, and Ruby (as well as .NET and Java) is that they are shipping their source code to their customers. Companies don’t like disclosing their source code. Language vendors want companies to use their stuff. And so we have a variety of different schemes that try to “protect” source code so that only the language runtimes can use it. Some of them, like Python’s bytecode compilation, are “built in” to the language. Others are add-on packages and extensions.

From what I can tell, they all work about equally regardless of bells and whistles. Not too well.

Take “Eve Online”, an MMORPG. Eve’s client is written, in part, in Python, and shipped in byte-compiled files. Recently, a player named “Abuser” reverse-engineered and decompiled those files, discovered some vulnerabilities, and developed a cheating bot. For MMORPGs, cheating bots are a big deal. You can read online discussions between the game developers and Abuser, posted publicly. “Abuser” went public about his efforts, but he was far from being the first to do this to Eve Online.

Recently, I needed to get access to some obfuscated Perl code. My target was “encrypted” using Perl “source filters”. Unlike Python compiled bytecode, source filters actually obfuscate regular Perl source code by encrypting it and then decrypting it at runtime. Breaking Perl source filters is pretty easy for anyone who knows how to use a debugger. I’m going to show you how I went about it using Immunity Debugger, and then how I automated it with PaiMei’s PyDbg.

First, a word about Perl source filters. Here’s an excerpt from “man perlfilter”:

NAME
perlfilter - Source Filters

DESCRIPTION

This article is about a little-known feature of Perl called
source filters. Source filters alter the program text of a
module before Perl sees it, much as a C preprocessor
alters the source text of a C program before the compiler
sees it. This article tells you more about what source
filters are, how they work, and how to write your own.

The original purpose of source filters was to let you
encrypt your program source to prevent casual piracy.
This isn't all they can do, as you'll soon learn. But first,
the basics.

...

Decryption Filters

All decryption filters work on the principle of "security
through obscurity." Regardless of how well you write a
decryption filter and how strong your encryption
algorithm, anyone determined enough can retrieve the
original source code. The reason is quite simple - once
the decryption filter has decrypted the source back to its
original form, fragments of it will be stored in the
computer's memory as Perl parses it. The source might
only be in memory for a short period of time, but anyone
possessing a debugger, skill, and lots of patience can
eventually reconstruct your program.

That said, there are a number of steps that can be taken
to make life difficult for the potential cracker. The most
important: Write your decryption filter in C and statically
link the decryption module into the Perl binary. For further
tips to make life difficult for the potential cracker, see
the file decrypt.pm in the source filters module.

At least they’re pretty honest about the limitations, pointing out that a determined attacker is inevitably going to get around any source encryption you can come up with. That last paragraph bugs me, though. They start out up-front about the limitations; they should just leave it at that. Why the caveat about making it difficult when they obviously know it’s a losing battle? Other things that specifically bother me include the phrases “make life difficult” and “lots of patience”.

Also, the bit about the decryption filter being written in C and statically linked is, in reality, a relatively minor obstacle. Not to mention, most people are likely to ignore this man-page and use third-party builds of Perl for Win32, which almost always link most of their functionality dynamically.

That said, my target happened to be a customized source filter using proprietary encryption through a dynamically linked library on Win32.

So how does one go about de-obfuscation? Better yet, how does one do it equipped with very little patience?

You can approach the problem in different ways:

  1. You could attempt to identify the encryption algorithm and find where key material is stored. This has the advantage that you can then decrypt code independent of the architecture and/or harness that runs it (e.g. the Python, PHP, or Perl interpreter). My experience is that this requires more “patience”.
  2. Use a debugger, find the inevitable place in the harness where decryption occurs, and just read the cleartext out of memory through a register that points at it. This ties you to the architecture and/or harness for the code somewhat more, but it lets you not worry about the encryption and just cut to the chase. In some cases, you may even be able to re-use the same harness for several different variations of a certain type of encryption.

I tend towards the latter approach myself. Side benefit: There are so many crackpot obfuscation encryption schemes. Not having to look too closely at them is better for one’s health.

In the case of Perl, the convention for source filtering is standard enough that my attack stands a good chance of working against other filters built similarly, with little if any modification.

Refer again to the perlfilter man-page:

The source might only be in memory for a short period of
time, but anyone possessing a debugger, skill, and lots of
patience can eventually reconstruct your program

Just to reiterate… “patience” is not one of my virtues. Friends and family remind me of this frequently. I’ve distilled the steps I took to those that really count below, but all in all, this took me an afternoon or so.

  • First I take a look at perl58.dll. I used a freeware tool called dllexp (DLL Export Viewer) to examine the functions that are exported from this DLL. Some likely candidates pop right out:

Exported source filter functions

  • Next I try running an encrypted script through perl.exe with the “-c” flag under a debugger. “-c” makes Perl check syntax without executing the script, so it’ll let me home in on just the obfuscated code in a specific file. I decide to use Immunity Debugger, since I’ve been meaning to make the switch from OllyDbg for a while now. I’m still a bit of a win32 debugging noob, but have gotten somewhat familiar with OllyDbg. It’s an easy enough transition to Immunity since it is basically Olly with Python scripting.
  • Run “ImmunityDebugger.exe perl -c obfuscated_file.pl”
  • Debugger comes up. Set a breakpoint on Perl_filter_read. By default, Immunity starts debugging at “WinMain()” aka “ModuleEntryPoint()”. At this point, perl58.dll has already been loaded, so I can set my breakpoint by symbolic name with “bp Perl_filter_read” in the debugger command-line. Yay!
  • Tell the debugger to continue execution. My breakpoint on Perl_filter_read hits. Good sign.
  • Continue till return. Examine the registers. Nothing jumps out just yet. The first hit on my breakpoint may be too early to rule this function out though.
  • Repeat the ‘continue’ and ‘continue till return’ sequence a few times. Lo and behold… I start seeing what looks like a line of “#” characters getting built up at EDX. Like a line of “####…” programmers like to use to make their comments look pretty. The comment header line is apparently being built up as this function gets called several times:

and then…

  • I keep this up until I see actual comments and code. The further I go, the more the Perl code makes sense. Since I ran with “-c”, there’s no doubt in my mind this is actual code from the obfuscated script.
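Going back to the first step in the list above: picking candidate exports out of perl58.dll does not strictly need a GUI tool. What follows is a minimal sketch, not what dllexp does internally, that walks the PE export table with nothing but the standard library and keeps names containing "filter". The helper names (`export_names`, `filter_candidates`) and the keyword are assumptions made for illustration; only `Perl_filter_read` is confirmed by the steps above.

```python
import struct
import sys


def _sections(data, table_off, count):
    """Yield (virtual_address, virtual_size, raw_pointer, raw_size) per section."""
    for i in range(count):
        vsize, vaddr, rsize, rptr = struct.unpack_from(
            "<8xIIII", data, table_off + 40 * i)
        yield vaddr, vsize, rptr, rsize


def _rva_to_offset(rva, sections):
    """Translate a relative virtual address into an offset in the file."""
    for vaddr, vsize, rptr, rsize in sections:
        if vaddr <= rva < vaddr + max(vsize, rsize):
            return rptr + (rva - vaddr)
    raise ValueError("RVA 0x%x is not inside any section" % rva)


def export_names(data):
    """Return the names in the export table of a PE image held in bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: NumberOfSections at +6, SizeOfOptionalHeader at +20.
    num_sections, opt_size = struct.unpack_from("<H12xH", data, pe_off + 6)
    opt_off = pe_off + 24
    magic = struct.unpack_from("<H", data, opt_off)[0]
    # Data directories start 96 bytes into a PE32 optional header, 112 for PE32+.
    dir_off = opt_off + (112 if magic == 0x20B else 96)
    export_rva = struct.unpack_from("<I", data, dir_off)[0]
    if export_rva == 0:
        return []  # image exports nothing
    sections = list(_sections(data, opt_off + opt_size, num_sections))
    exp_off = _rva_to_offset(export_rva, sections)
    # Export directory: NumberOfNames at +24, AddressOfNames at +32.
    num_names = struct.unpack_from("<I", data, exp_off + 24)[0]
    names_off = _rva_to_offset(
        struct.unpack_from("<I", data, exp_off + 32)[0], sections)
    names = []
    for i in range(num_names):
        name_rva = struct.unpack_from("<I", data, names_off + 4 * i)[0]
        start = _rva_to_offset(name_rva, sections)
        end = data.index(b"\x00", start)
        names.append(data[start:end].decode("ascii", "replace"))
    return names


def filter_candidates(names, keyword="filter"):
    """Keep export names that mention the keyword (an illustrative heuristic)."""
    return [n for n in names if keyword in n.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for name in filter_candidates(export_names(fh.read())):
            print(name)
```

Run against a DLL path given on the command line, it prints one candidate per line. It only reads named exports and ignores ordinals and forwarders, which is enough for eyeballing candidates.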

At this point, the rest is just refining and automating my steps. I could probably have written an ImmunityDebugger plugin or macro in Python, but I’m more familiar with doing this in PaiMei’s PyDbg, so I decided to use that instead. I need some more info before I code this, though:

  • Fiddle around some more in the debugger, examining the state of EDX and other registers at several breaks. I’m specifically looking for whole lines or chunks if possible. Perl_filter_read appears to be called recursively, decrypting in chunks until it has a full line of Perl code to return to its caller.
  • I keep continuing in the debugger until it looks like there’s a complete line at EDX at the RETN at end of function. When I find this point, I step back into the caller. Set a breakpoint at the very next instruction *after* the CALL to Perl_filter_read. Take a look at the surrounding code while I’m at it and step a little further down to some interesting looking MOV statements.

  • EDX isn’t really where the code is supposed to get returned to the caller. It’s where the code is being built up and might not be consistent enough to rely on when the function returns. I’m better off determining the “proper” return value, I think, and relying on that instead. Plus: It’ll improve my understanding of the code somewhat and may help me port to another source filter or harness sometime later if I ever need to (a long shot, admittedly).
  • Take a look at this snippet of code I googled for Perl_filter_read:
    Int32
    Perl_filter_read(pTHXo_ int idx, SV* buffer, int maxlen)
    {
        /* thin wrapper: forward to the interpreter object's method */
        return ((CPerlObj*)pPerl)->Perl_filter_read(idx, buffer, maxlen);
    }
  • See how EAX gets pushed onto the stack and passed as “buffer” in the last figure? See how ESI got unrolled back to EAX? See how I’m sitting right at the spot to receive it in EAX? I set a breakpoint right here and disable all my others.
  • Now, as I continue and break at the new spot, I’m just seeing the comments and code line by line every time. The code is appearing at several registers, but EAX is the one that I sense I should probably use. As I start combining lines of snagged code in a text editor, it’s definitely functional Perl. This is where I want to break in my PyDbg script.
  • Finally, using all this information, I write a quick script for PyDbg. It just automates all the steps I took and snags the value in memory from the EAX register at that last breakpoint. Here’s the code I ended up with. When run, it just dumps out the cleartext line by line.
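For readers who want a feel for the shape of that last step, here is a minimal sketch of a breakpoint handler in the style PyDbg uses. It is not the script from this post. It assumes, as observed above, that EAX points at a NUL-terminated line of cleartext when the breakpoint after the CALL fires, and it assumes the debugger object offers `read_process_memory(address, length)` and `context.Eax` as PyDbg does. `read_c_string`, `make_line_handler`, and `ADDRESS_AFTER_CALL` are hypothetical names; the address itself has to be found by hand as described in the steps above.

```python
DBG_CONTINUE = 0x00010002  # Win32 continue status that handlers return


def read_c_string(read_mem, address, max_len=4096, chunk=64):
    """Read a NUL-terminated byte string through a read_mem(address, length) callable."""
    out = b""
    while len(out) < max_len:
        try:
            data = read_mem(address + len(out), chunk)
        except Exception:
            if chunk == 1:
                break  # genuinely unreadable; keep what we have
            chunk = 1  # the chunk may straddle an unmapped page; go byte-wise
            continue
        if not data:
            break
        nul = data.find(b"\x00")
        if nul != -1:
            return (out + data[:nul])[:max_len]
        out += data
    return out[:max_len]


def make_line_handler(lines):
    """Build a handler that appends the string EAX points at to `lines`."""
    def on_line_ready(dbg):
        lines.append(read_c_string(dbg.read_process_memory, dbg.context.Eax))
        return DBG_CONTINUE
    return on_line_ready


# Hypothetical wiring, assuming PyDbg's bp_set(address, handler=...):
#     captured = []
#     dbg.bp_set(ADDRESS_AFTER_CALL, handler=make_line_handler(captured))
#     dbg.run()
#     # afterwards, b"".join(captured) is the recovered script text
```

Keeping the memory reader as a plain callable means the string logic can be exercised without a live process, and the same handler body would carry over to another debugger library that exposes a comparable read call.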

What have we learned?

Well, obviously, Perl source filters are just not strong protection for your source code. If you know what a CPU register is, and you know how pointers work, it’s a lot easier than the man-page implies to pinpoint the place in the Perl interpreter where cleartext source-code can be found in memory. I’ve even produced a fairly simple script that does just that.

But really, this shouldn’t be news to you. And, how likely are you to run into obfuscated Perl? Not terribly. My point isn’t that Perl is weak. The problem with Perl source filters is an example of a problem that afflicts all high-level language runtimes: if you want your code to run, you have to expose it to the interpreter. On general purpose architectures, you can’t expose something to the interpreter without exposing it to everybody else.
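The same point can be made in a few lines of Python. The sketch below is a toy analogy, not the Perl filter from this post: `protect`, `run_protected`, and `capture_sources` are names made up for illustration, and the single-byte XOR stands in for whatever cipher a real product might use. The loader has to hand cleartext to the interpreter eventually, and anyone who can stand at that hand-off point sees it.

```python
import builtins

KEY = 0x5A  # toy single-byte XOR key; a stand-in for any real cipher


def protect(source):
    """'Encrypt' source text into an opaque blob."""
    return bytes(b ^ KEY for b in source.encode("utf-8"))


def run_protected(blob, namespace=None):
    """What any decrypting loader must do: recover the text, then compile it."""
    source = bytes(b ^ KEY for b in blob).decode("utf-8")
    code = compile(source, "<protected>", "exec")
    exec(code, namespace if namespace is not None else {})


def capture_sources(func, *args):
    """Run func while recording every source passed to compile()."""
    seen = []
    original = builtins.compile

    def spy(source, *a, **kw):
        seen.append(source)  # the cleartext arrives here, cipher or no cipher
        return original(source, *a, **kw)

    builtins.compile = spy
    try:
        func(*args)
    finally:
        builtins.compile = original  # always restore the real compile()
    return seen


if __name__ == "__main__":
    blob = protect("answer = 6 * 7\n")
    for text in capture_sources(run_protected, blob):
        print(text, end="")
```

Swapping the XOR for a stronger cipher changes nothing about the capture, because the spy sits after decryption. The debugger breakpoint in this post plays the same role one layer further down.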

If you’re shipping high-level source code in any form, including bytecode, self-hosted executables, or encrypted bundles, you’re ultimately shipping your source code. Get used to that idea, or go back to writing in C.

16 Comments so far

  • Aaron Portnoy

    June 2nd, 2008 1:28 pm

    Hey Eric,

    In a couple weeks at Recon I’ll be giving a presentation with a colleague on how to do similar things with Python. We go over extracting and modifying code from .pyc and .pyd files, writing our own assembler and disassembler, as well as possible anti-reversing techniques. Although I haven’t looked at EVE yet, I will be demonstrating many cheats I’ve injected into the Pirates of the Caribbean Online MMORPG. The abstract for our talk is located here: http://recon.cx/2008/speakers.html#python.


    Aaron

  • Nate

    June 2nd, 2008 2:34 pm

    Nice. I prefer to use read/write watchpoints though since you don’t have to figure out what code is involved in descrambling the data. All you need is the input and output buffer.

  • sapphirecat

    June 2nd, 2008 10:55 pm

    The m68k had a trace bit, but as far as I remember, only line-by-line tracing (and not breakpoints) triggers it. The old versions of AmigaDOS used it to block tracing of supervisor code, but you could still execute that code without giving up control of your program, using the ‘run-to-line’ feature of the debugger.

    Effectively, unless the computer can securely watch itself (which the trace bit was a small step towards), there is no way to actually secure it.

  • Rolf

    June 3rd, 2008 9:12 am

    You’re mostly right about the lameness of script obfuscators, but:

    “The problem with Perl source filters is an example of a problem that afflicts all high-level language runtimes: if you want your code to run, you have to expose it to the interpreter.”

    For instance, PHP internally contains a function pointer called “execute” that can be replaced at runtime to allow for a custom interpreter. Ergo the PHP bytecode that you see in memory post-decryption is not “typical” bytecode and will not run if passed to the standard interpreter. That said, these protections still tend to be lame — they usually just use the standard interpreter with some XORs thrown in here and there for obfuscation.

  • Mongo

    June 3rd, 2008 9:22 pm

    > Get used to that idea, or go back to writing in C.

    or write in a high level language that can compile to C or compile directly to native code.

  • Eric Monti

    June 3rd, 2008 11:53 pm

    Rolf,

    The PHP example you pose is coincidentally one close to home. You are right, de-obfuscating encrypted intermediate bytecode is a somewhat different beast than straight source. More so if the bytecode is handled in a proprietary manner. But many of the same principles still apply.

    I chose the Perl example since it was simpler to illustrate against. However, we’ve had some similar experience with a popular PHP obfuscation framework a few months back based on crypto with a semi-proprietary intermediate bytecode format as well.

    When we got to the point where we were plucking byte-code out of memory instead of actual source code, it became a matter of decompiling the byte-code. Definitely an extra step, and I’ll concede, a somewhat non-trivial one. (having some “patience” might apply here)

    Having spent that additional time reversing, we were ultimately able to discern the program structure almost completely back to original code, including original variable names and, under the right build conditions, even comments.

    I should point out, also. We took more or less the same approach regarding the crypto; side-stepping the “encryption” altogether and shooting straight for byte-code while it was exposed in memory when fed to the engine’s interpreter.

  • Earle Martin

    June 4th, 2008 2:07 am

    Next time you may wish to try “perl -MO=Deparse obfuscated_file.pl” first.

  • lulz

    June 12th, 2008 4:47 pm

    Earle Martin, you’re not supposed to point out B::Deparse or really any other Perl info (and not Immunity Debugger info). Imagine the humor we would have lost if Eric Monti and others had known that! The “When all you have is a hammer, everything looks like a nail. Let’s show off our hammer!” Matasano approach to work time and blogging is better for everyone: more entertainment for observers and more rewarding for Matasano authors in the blogosphere.

  • Daniel Reynaud

    June 12th, 2008 5:41 pm

    I made a small attempt at formalizing the use of an interpreter as an obfuscation step, the slides of the presentation can be found here :

    http://tcv.loria.fr/slides/reynaud-obfuscation-tcv08.pdf

    The idea is that if you want to publish some high-level code without actually disclosing the source code, you “just” have to compile it to bytecodes and apply low-level obfuscations on this code. Correctly implemented, this should really add a layer of complexity.

    Looks like there is an upcoming presentation at REcon dealing with the same topic :

    http://recon.cx/2008/speakers.html#virtualmachines

  • Eric Monti

    June 18th, 2008 5:20 pm

    @Earle: B::Deparse is probably a valid approach to try for this. The reason I went completely underneath the Perl framework to an OS-level user-land debugger was that the instance of perl the target used was packaged with the target, not just the crypto extension. There was a mix of several opensource and proprietary perl extensions to wade through and I didn’t particularly want to rely on yet another 3rd party add-on which might or might not build and install, or even work at all, in the target’s perl environment. Not knocking it, just not my cup of tea.

    Alex Radocea, our super-intern, also pointed out another route I could have gone was to use Perl’s own debugging mode, “perl -d”. This, I think, is even simpler if you’re thinking of staying inside the perl framework. But had I done so I still would have had to do some reversing of the framework to disable the source-filter’s own perl debugger check.

    @lulz: Yes. My “hammer” is frequently a debugger. I don’t really have strong feelings about Immunity, gdb, Windbag, olly, softice, ida, python/ruby libraries, versus something I write myself. Given the choice, I’ll use something I know of and know will work if it’s available. (If I don’t have to write something new, so much the better).

    Repeat: We used almost the exact same method on a (technically very different) PHP obfuscation scheme.

    Looks like nail? Have hammer? Hit with hammer! Hit until nail-thing go down.

    I’m pretty ok with that actually.

    Showing off a “hammer” was not the point of the post, though. Still, I’m glad to see you’re so easily amused, lulz.

  • Thomas Ptacek

    June 18th, 2008 5:30 pm

    Like the Matasano approach of writing debuggers and then blogging about them instead of trying Deparse and giving up (“no findings, all clear!”) when it doesn’t work? We’re hiring.

  • Eric Monti

    June 18th, 2008 6:30 pm

    @Daniel:

    This filter just happened to be a funny target because of the free perl source code thrown in at the end.

    I wouldn’t suggest that reversing a proprietary bytecode format is going to be as easy as the perl source filter example I posed. But I will say it’s probably not going to be as hard as the obfuscation designer imagines it to be.

    Hands down, Perl source filters of the variety I looked at are a very naive approach to code obfuscation. As approaches get smarter, very soon the reverser ends up dealing with bytecodes or “intermediate bytecodes” somewhere a few steps up in abstraction from machine instructions. Once out of de-obfuscation (if there is any — in some cases, bytecode may be comparable to machine instructions, so why obfuscate?), if the attacker ends up staring at proprietary bytecode, their considerations will likely include designing a disassembler.

    Depending on their ultimate goal and the likelihood of re-using the attack in the future, it may make more sense to just trace machine instructions and render that back up into an abstract flow picture to make sense of.

    I’ve not ever had to take it to this point. Hmm but I could imagine cases where I might?

    For instance: Jeff Goldblum, **MY_REVERSING_IDOL**, almost certainly did this to stave off planetary annihilation in Independence Day. Take that, nasty space monsters!!!

    Anyway, there’s certainly a line somewhere between the effort of obfuscation versus the effort required to reverse it, and don’t forget to factor in the value of what is obfuscated. The line is where obfuscation may have a *partial* pay-off. If the strength of deterrence is greater than the value of what’s protected, that seems like a good pay-off.

    At what point does the obfuscator win? Not my call, but it’s an interesting question for the pundits.

  • lulz

    June 22nd, 2008 5:46 am

    Hammers are cool.

    B::Deparse isn’t another third-party tool that might not compile - it is maintained and packaged with perl. If there were technical limitations that excluded the use of it in your case, that could have made for a more interesting post. Instead your post suggests that the natural recourse, in general, is to jump to a debugger. It kind of caters to those who know little on the topic and can admire your hammer slamming, while leaving Perl users asking why you missed the obvious.

    Thomas: Come now. I advocate practical solutions; there is intelligence in simplicity. Although not very illustrious, simple solutions gained from basic research can not only be easier, but can often gain much better results (e.g. valid Perl code instead of asm) from building off the years of hard work that others have invested in an issue. I’m not saying you should not be capable of doing your own work.

  • Thomas Ptacek

    June 22nd, 2008 12:04 pm

    lulz — I take your point. I don’t think Eric can go into the details of why Deparse wasn’t going to work here (it would betray the actual target). I’m just reacting to your “there’s the Matasano way — go write your own debugger and then blog about it! Good technique if what you care about is blogging!”

    What our clients care about is that we find stuff. What our team cares about is that they aren’t forced to run and interpret Webinspect for 80 hours a week.

    Also, Eric didn’t write a debugger — he used PyDbg. He’s gotten to write debuggers on other projects. I’ve done a few. We have an intern writing one now. We’re debugger-happy. How does that sound to you, readers? We’re hiring.

  • Eric Monti

    June 22nd, 2008 4:05 pm

    lulz,

    1. I’m aware of B::Deparse and have used it in the past. It may come packaged with your stock perl distro, but it is definitely not “built-in” to Perl. I *like* shortcuts too!

    2. This should be abundantly clear by now: B::Deparse was *not part* of the perl environment I targeted. Would *you* release your own Perl dist with obfuscation packaged with Deparse?

    3. You will more than likely *never* read anything written by me describing how I installed and ran a Perl module to achieve an objective unless it is something totally out of left-field. Not saying “I’d never do that sort of thing.” I’m saying, “I probably won’t write posts about that.”

    (That said, you’ll probably read me describing the same under Ruby or Python… go figure!)

  • Eric Monti

    June 22nd, 2008 5:04 pm

    For the record… I love Perl. I blame… Tom!
