De-Obfuscation For the Impatient
Eric Monti | June 2nd, 2008 | Filed Under: Uncategorized
A classic challenge for companies that build products on high-level languages like Perl, Python, PHP, and Ruby (as well as .NET and Java) is that they are shipping their source code to their customers. Companies don’t like disclosing their source code. Language vendors want companies to use their stuff. And so we have a variety of schemes that try to “protect” source code so that only the language runtime can use it. Some of them, like Python’s bytecode compilation, are “built in” to the language. Others are add-on packages and extensions.
From what I can tell, they all work about equally regardless of bells and whistles. Not too well.
Take “Eve Online”, an MMORPG. Eve’s client is written, in part, in Python, and shipped in byte-compiled files. Recently, a player named “Abuser” reverse-engineered and decompiled those files, discovered some vulnerabilities, and developed a cheating bot. For MMORPGs, cheating bots are a big deal. You can read online discussions between the game developers and Abuser, posted publicly. “Abuser” went public about his efforts, but he was far from being the first to do this to Eve Online.
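To see how little byte-compilation hides, here is a minimal standard-library sketch. It compiles a made-up module (the source text, `API_KEY`, and `check_license` are hypothetical, not from any real client), round-trips it through `marshal` (the serialization used inside .pyc files), and lists the names and string constants that survive. It is only an illustration of the general point, not the technique used against Eve Online.

```python
import dis
import marshal

# Hypothetical stand-in for a shipped module; not taken from any real product.
SOURCE = '''
API_KEY = "hunter2"

def check_license(user):
    return user == "admin"
'''

def summarize_code(code):
    """Collect identifiers and string constants from a code object, recursively."""
    names = set(code.co_names) | set(code.co_varnames)
    names.add(code.co_name)
    strings = set()
    for const in code.co_consts:
        if isinstance(const, str):
            strings.add(const)
        elif hasattr(const, "co_code"):  # nested function or class body
            sub_names, sub_strings = summarize_code(const)
            names |= sub_names
            strings |= sub_strings
    return names, strings

# Compile, then serialize and reload the way a byte-compiled file would be.
compiled = compile(SOURCE, "shipped_module.py", "exec")
reloaded = marshal.loads(marshal.dumps(compiled))
names, strings = summarize_code(reloaded)

print(sorted(names))
print(sorted(strings))
dis.dis(reloaded)  # full disassembly, for the curious
```

Variable names, function names, and string literals all come back intact, which is most of what a decompiler needs.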
Recently, I needed to get access to some obfuscated Perl code. My target was “encrypted” using Perl “source filters”. Unlike Python compiled bytecode, source filters actually obfuscate regular Perl source code by encrypting it, then decrypting it at runtime. Breaking Perl source filters is pretty easy for anyone who knows how to use a debugger. I’m going to show you how I went about it using Immunity Debugger, and then how I automated it with PaiMei’s PyDbg.
First, a word about Perl source filters. Here’s an excerpt from “man perlfilter”:
NAME
    perlfilter - Source Filters

DESCRIPTION
    This article is about a little-known feature of Perl called source filters. Source filters alter the program text of a module before Perl sees it, much as a C preprocessor alters the source text of a C program before the compiler sees it. This article tells you more about what source filters are, how they work, and how to write your own.

    The original purpose of source filters was to let you encrypt your program source to prevent casual piracy. This isn't all they can do, as you'll soon learn. But first, the basics.

    ...

Decryption Filters
    All decryption filters work on the principle of "security through obscurity." Regardless of how well you write a decryption filter and how strong your encryption algorithm, anyone determined enough can retrieve the original source code. The reason is quite simple - once the decryption filter has decrypted the source back to its original form, fragments of it will be stored in the computer's memory as Perl parses it. The source might only be in memory for a short period of time, but anyone possessing a debugger, skill, and lots of patience can eventually reconstruct your program.

    That said, there are a number of steps that can be taken to make life difficult for the potential cracker. The most important: Write your decryption filter in C and statically link the decryption module into the Perl binary. For further tips to make life difficult for the potential cracker, see the file decrypt.pm in the source filters module.
At least they’re pretty honest about the limitations, pointing out that a determined attacker is inevitably going to get around any source encryption you can come up with. That last paragraph bugs me, though. They start out up-front about the limitations; they should just leave it at that. Why the caveat about making it difficult when they obviously know it’s a losing battle? The phrases “make life difficult” and “lots of patience” bother me in particular.
Also, the bit about the decryption filter being written in C and statically linked is, in reality, a relatively minor obstacle. Not to mention, most people are likely to ignore this man-page and use third-party builds of Perl for Win32, which almost always ship most of their functionality in dynamically linked libraries.
That said, my target happened to be a customized source filter using proprietary encryption through a dynamically linked library on Win32.
So how does one go about de-obfuscation? Better yet, how does one do it equipped with very little patience?
You can approach the problem different ways:
- You could attempt to identify the encryption algorithm and find where key material is stored. This has the advantage that you can then decrypt code independent of the architecture and/or harness that runs it (e.g. the interpreter for Python, PHP, or Perl). My experience is that this requires more “patience”.
- Use a debugger, find the inevitable place in the harness where decryption occurs, and just read the cleartext through a pointer sitting in a register. This ties you to the architecture and/or harness for the code somewhat more, but it lets you not worry about the encryption and just cut to the chase. In some cases, you may even be able to re-use the same harness for several different variations of a certain type of encryption.
I tend towards the latter approach myself. Side benefit: There are so many crackpot obfuscation encryption schemes. Not having to look too closely at them is better for one’s health.
In the case of Perl, the convention for source filtering is standard enough that my attack stands a good chance of working against other filters built similarly, with little if any modification.
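The second approach can be sketched with a toy model. Everything here is made up for illustration: `ToyFilter`, `run_interpreter`, `spy_on`, the XOR “cipher”, and the key are hypothetical stand-ins, not Perl’s API or my target’s scheme. The point is only that an observer sitting on the read function’s return value recovers the cleartext without ever looking at the key or the algorithm.

```python
import itertools

KEY = b"not-so-secret"  # hypothetical key; the observer below never reads it

def xor_bytes(data, key=KEY):
    """Toy symmetric 'cipher': XOR against a repeating key."""
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

class ToyFilter:
    """Stand-in for a decryption filter: hands cleartext to the parser line by line."""
    def __init__(self, ciphertext):
        self._lines = xor_bytes(ciphertext).splitlines(keepends=True)
        self._pos = 0

    def filter_read(self):
        if self._pos >= len(self._lines):
            return b""  # end of input
        line = self._lines[self._pos]
        self._pos += 1
        return line

def run_interpreter(filt):
    """Stand-in for the runtime: it only ever sees what filter_read returns."""
    count = 0
    while filt.filter_read():
        count += 1
    return count

def spy_on(filt, captured):
    """Wrap filter_read so each return value is copied out, like a breakpoint after the call."""
    original = filt.filter_read
    def hooked():
        line = original()
        if line:
            captured.append(line)
        return line
    filt.filter_read = hooked

source = b"#####\n# secret sauce\nprint 'hello';\n"
shipped = xor_bytes(source)  # what the customer receives

captured = []
filt = ToyFilter(shipped)
spy_on(filt, captured)
lines_parsed = run_interpreter(filt)
recovered = b"".join(captured)
print(recovered.decode("ascii"), end="")
```

Swapping the XOR for a stronger cipher changes nothing in `spy_on`, which is why this approach carries over between schemes so easily.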
Refer again to the perlfilter man-page:
The source might only be in memory for a short period of time, but anyone possessing a debugger, skill, and lots of patience can eventually reconstruct your program
Just to reiterate… “patience” is not one of my virtues. Friends and family remind me of this frequently. I’ve distilled the steps I took to those that really count below, but all in all, this took me an afternoon or so.
- First I take a look at perl58.dll. I used a freeware tool called dllexp (DLL Explorer) to examine the functions that are exported from this DLL. Some likely candidates pop right out, Perl_filter_read among them.
- Next I try running an encrypted script through perl.exe with the “-c” flag under a debugger. “-c” is Perl’s syntax check: it checks the script’s syntax and exits without running it, which lets me home in on just the obfuscated code in a specific file. I decide to use Immunity Debugger, since I’ve been meaning to make the switch from OllyDbg for a while now. I’m still a bit of a Win32 debugging noob, but have gotten somewhat familiar with OllyDbg. It’s an easy enough transition to Immunity, since it is basically a Python-scriptable version of Olly.
- Run “ImmunityDebugger.exe perl -c obfuscated_file.pl”
- Debugger comes up. Set a breakpoint on Perl_filter_read. By default, Immunity starts debugging at “WinMain()” aka “ModuleEntryPoint()”. At this point, perl58.dll has already been loaded, so I can use the symbolic name to set my breakpoint with “bp Perl_filter_read” in the debugger command line. Yay!
- Tell the debugger to continue execution. My breakpoint on Perl_filter_read hits. Good sign.
- Continue till return. Examine the registers. Nothing jumps out just yet. The first hit on my breakpoint may be too early to rule this function out though.
- Repeat the ‘continue’ and ‘continue till return’ sequence a few times. Lo and behold… I start seeing what looks like a line of “#” characters getting built up at EDX. Like a line of “####…” programmers like to use to make their comments look pretty. The comment header line is apparently being built up as this function gets called several times:
and then…
- I keep this up until I see actual comments and code. The further I go, the more the Perl code makes sense. Since I ran with “-c”, there’s no doubt in my mind this is actual code from the obfuscated script.
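The “examine the registers and see if anything jumps out” step is easy to mechanize. Here is a minimal sketch, assuming you can get a snapshot of register values and a function that reads debuggee memory. The helper names (`peek_text`, `registers_pointing_at_text`), the `FakeMemory` class, and the addresses are all hypothetical and exist only so the example runs without a debugger.

```python
import string

PRINTABLE = set(string.printable.encode("ascii"))

def peek_text(read_mem, addr, limit=256, min_len=4):
    """Return the printable NUL-terminated run at addr, or None if it isn't text."""
    try:
        data = read_mem(addr, limit)
    except Exception:
        return None  # unreadable address: the register is not a usable pointer
    text = data.split(b"\x00", 1)[0]
    if len(text) < min_len or any(b not in PRINTABLE for b in text):
        return None
    return text.decode("ascii")

def registers_pointing_at_text(registers, read_mem):
    """Map register name -> text preview for each register that looks like a string pointer."""
    found = {}
    for name, value in registers.items():
        preview = peek_text(read_mem, value)
        if preview is not None:
            found[name] = preview
    return found

class FakeMemory:
    """Tiny stand-in for debuggee memory: {base_address: bytes}."""
    def __init__(self, regions):
        self.regions = regions

    def read(self, addr, length):
        for base, blob in self.regions.items():
            if base <= addr < base + len(blob):
                return blob[addr - base:addr - base + length]
        raise ValueError("unmapped address 0x%08x" % addr)

memory = FakeMemory({
    0x00A01000: b"##########\x00garbage",       # a comment banner being built up
    0x00B02000: b"\x8b\xff\x55\x8b\xec\x00",    # machine code, not text
})
registers = {"EAX": 0x00000003, "EDX": 0x00A01000, "ESI": 0x00B02000}

hits = registers_pointing_at_text(registers, memory.read)
print(hits)
```

Run at every breakpoint hit, a check like this would flag EDX as soon as the “####” banner starts showing up, instead of relying on eyeballing the register pane.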
At this point, the rest is just refining and automating my steps. I could probably have written an Immunity Debugger plugin or macro in Python, but I’m more familiar with doing this in PaiMei’s PyDbg, so I decided to use that instead. I need some more info before I code this, though:
- Fiddle around some more in the debugger, examining the different states of EDX and other registers at several breaks. I’m specifically looking for whole lines or chunks if possible. Perl_filter_read appears to be called recursively, decrypting in chunks until it has a full line of Perl code to return to its caller.
- I keep continuing in the debugger until it looks like there’s a complete line at EDX at the RETN at end of function. When I find this point, I step back into the caller. Set a breakpoint at the very next instruction *after* the CALL to Perl_filter_read. Take a look at the surrounding code while I’m at it and step a little further down to some interesting looking MOV statements.
- EDX isn’t really where the code is supposed to get returned to the caller. It’s where the code is being built up and might not be consistent enough to rely on when the function returns. I’m better off determining the “proper” return value, I think, and relying on that instead. Plus: It’ll improve my understanding of the code somewhat and may help me port to another source filter or harness sometime later if I ever need to (a long shot, admittedly).
- Take a look at this snippet of code I googled for Perl_filter_read:
I32
Perl_filter_read(pTHXo_ int idx, SV* buffer, int maxlen)
{
    /* Forward to the interpreter object's method of the same name. */
    return ((CPerlObj*)pPerl)->Perl_filter_read(idx, buffer, maxlen);
}
- See how EAX gets pushed onto the stack and passed as “buffer” in the last figure? See how ESI got unrolled back to EAX? See how I’m sitting right at the spot to receive it in EAX? I set a breakpoint right here and disable all my others.
- Now, as I continue and break at the new spot, I’m just seeing the comments and code line by line every time. The code is appearing at several registers, but EAX is the one that I sense I should probably use. As I start combining lines of snagged code in a text editor, it’s definitely functional Perl. This is where I want to break in my PyDbg script.
- Finally, using all this information, I write a quick script for PyDbg. It just automates all the steps I took and snags the string in memory that EAX points to at that last breakpoint. Here’s the code I ended up with. When run, it just dumps out the cleartext line by line.
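The script itself isn’t reproduced in this copy of the post. As a rough illustration only (not the original code), a breakpoint handler of the kind described might look like the sketch below. It assumes, as observed above, that EAX holds a pointer to a NUL-terminated line at the breakpoint. It relies only on `dbg.context.Eax` and `dbg.read_process_memory(address, length)`, which PyDbg provides. The helper names `read_cstring` and `make_line_dumper` are hypothetical, and the fake debugger and addresses exist purely so the logic can be run without Windows or PyDbg.

```python
DBG_CONTINUE = 0x00010002  # Win32 "continue" status, the value a PyDbg handler returns

def read_cstring(read_mem, addr, chunk=64, max_len=4096):
    """Read a NUL-terminated byte string from debuggee memory, chunk by chunk."""
    out = bytearray()
    while len(out) < max_len:
        try:
            data = read_mem(addr + len(out), chunk)
        except Exception:
            break  # ran into unreadable memory; keep what we have
        if not data:
            break
        nul = data.find(b"\x00")
        if nul != -1:
            out += data[:nul]
            break
        out += data
    return bytes(out)

def make_line_dumper(sink):
    """Build a breakpoint handler that appends the string EAX points at to sink."""
    def on_breakpoint(dbg):
        pointer = dbg.context.Eax
        if pointer:
            line = read_cstring(dbg.read_process_memory, pointer)
            if line:
                sink.append(line)
        return DBG_CONTINUE
    return on_breakpoint

# --- Fake debuggee so the handler can be exercised without a real debugger ---
class FakeContext(object):
    def __init__(self):
        self.Eax = 0

class FakeDebugger(object):
    def __init__(self, base, memory):
        self.base, self.memory = base, memory
        self.context = FakeContext()

    def read_process_memory(self, addr, length):
        offset = addr - self.base
        if not 0 <= offset < len(self.memory):
            raise ValueError("unmapped address")
        return self.memory[offset:offset + length]

BASE = 0x00C00000  # made-up address
LINES = [b"#!/usr/bin/perl\n", b"use strict;\n", b"print qq{hello\\n};\n"]
dbg = FakeDebugger(BASE, b"".join(line + b"\x00" for line in LINES))

dumped = []
handler = make_line_dumper(dumped)
address = BASE
for line in LINES:
    dbg.context.Eax = address      # pretend the breakpoint just fired
    status = handler(dbg)
    address += len(line) + 1       # next line starts after the NUL
print(b"".join(dumped).decode("ascii"), end="")
```

With PyDbg, a handler of this shape would be registered on the instruction after the CALL using `bp_set(address, handler=...)`. The address itself has to come from the debugging session described above, so it is deliberately left out here.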
What have we learned?
Well, obviously, Perl source filters are just not strong protection for your source code. If you know what a CPU register is, and you know how pointers work, it’s a lot easier than the man-page implies to pinpoint the place in the Perl interpreter where cleartext source-code can be found in memory. I’ve even produced a fairly simple script that does just that.
But really, this shouldn’t be news to you. And, how likely are you to run into obfuscated Perl? Not terribly. My point isn’t that Perl is weak. The problem with Perl source filters is an example of a problem that afflicts all high-level language runtimes: if you want your code to run, you have to expose it to the interpreter. On general purpose architectures, you can’t expose something to the interpreter without exposing it to everybody else.
If you’re shipping high-level source code in any form, including bytecode, self-hosted executables, or encrypted bundles, you’re ultimately shipping your source code. Get used to that idea, or go back to writing in C.





