Reversing is Easier Than You Think
Thomas Ptacek | January 17th, 2006 | Filed Under: Reversing
Ilfak tries to correct a common misconception regarding reverse engineering:
Most of the time the goal is not to get a compilable text. What would one do with such a text? Recompile and link it to get a copy of the input binary program? […] One goal of a RET [reverse engineering tool] is to represent the program logic in the most precise and clear way. This representation should be easily understandable by a human being […] […] Another important goal of a RET is to automate program analysis and instrumentation.
A few weeks ago I wrote about the process of taking a storage appliance from a stark metal chassis and its installer CD to the guts of its embedded operating system. I can understand if you’re wondering why ending up with a huge blob of code inspires positive emotions. “Are you really going to read all that? Isn’t it a huge waste of time?”
Well, yes and no.
So I’m going to revise and extend Ilfak’s comments: another important goal of reverse engineering is to gain insight into the topology of a program: what are its entry points, and what are the broad strokes of its functionality?
This goal requires neither precision nor fidelity. I’m not working to reconstitute the blueprints (though sometimes I wind up with them); I’m trying to get a sense of the schematic, and I can often get it without reading more than a few instructions in any given function:
- is there really cryptographic authentication happening here, like the RFC claims, or is that stubbed out “until a future release”?
- which of the bytes in this proprietary protocol represent the command word? what are the available commands?
- if X is the function that handles incoming packets, Y is the function that decides whether my password is valid, and Z is the function that lowers the cooling rods, what does the landscape between X, Y, and Z look like? how likely is it that I can skip Y?
When you luck out and get a symbol table (and you often do), you can answer lots of questions like these just from the call graph, no assembly required.
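To make the "can I skip Y?" question concrete, here is a minimal sketch in Python. It assumes you have already turned the symbol table into a caller-to-callee map; the graph and every function name in it are invented for illustration, not taken from any real appliance. It searches for a call path from the packet handler to the dangerous function that never passes through the password check.

```python
from collections import deque

# Hypothetical call graph recovered from a symbol table: caller -> callees.
# All names are made up for this example.
CALL_GRAPH = {
    "handle_packet": ["parse_header", "check_password", "debug_dispatch"],
    "parse_header": [],
    "check_password": ["dispatch_command"],
    "debug_dispatch": ["dispatch_command"],
    "dispatch_command": ["lower_cooling_rods"],
    "lower_cooling_rods": [],
}


def find_path(graph, start, goal, avoid=()):
    """Breadth-first search for a call path from start to goal
    that never enters any function listed in avoid."""
    blocked = set(avoid)
    if start in blocked:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for callee in graph.get(node, []):
            if callee not in seen and callee not in blocked:
                seen.add(callee)
                queue.append(path + [callee])
    return None  # no route that avoids the blocked functions


# X = handle_packet, Y = check_password, Z = lower_cooling_rods
bypass = find_path(
    CALL_GRAPH, "handle_packet", "lower_cooling_rods", avoid={"check_password"}
)
print(bypass)
```

A hit here is a lead, not a proof: a static call graph says nothing about whether the branch is actually reachable with attacker-controlled input, and it will miss calls made through function pointers. But it tells you which handful of functions deserve a closer read.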
I just read Mike Perry and Nasko Oskov’s Reverse Engineering book on the web. It’s incomplete, but good. What I liked best about it was the attempt at covering techniques besides disassembly. Part of what makes “superficial disassembly” effective for me is the ability to augment the code with active probing.
This is the area where the book needs the most work, but also where it has the most potential. Some techniques the book misses:
- Fuzzing. I’m not a huge fan of fuzzing applications to find vulnerabilities. But in reversing scenarios, random inputs reveal the surface of an application like a sonar chirp.
- Hit tracing, like Pedram Amini’s Process Stalker or OllyDbg. In tandem with a “known good” input, hit traces allow you to use traffic generators like a lockpick; the pins are at the shear line when the trace for artificial traffic matches that of the good traffic.
- Logs! The logs themselves are helpful, sure, but they’re even more useful as signposts inside the code: the string “bad command word %.2x” pinpoints the most important region of the call graph for me.
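The lockpick analogy for hit tracing can be sketched in a few lines of Python. This assumes each trace has been exported as an ordered list of basic-block addresses; that format, and the addresses below, are assumptions made for the example and not the actual output of Process Stalker or OllyDbg. The idea is to measure how far an artificial input gets before its trace departs from the known-good one.

```python
def first_divergence(good_trace, candidate_trace):
    """Return the index at which the two traces first differ,
    or None if they are identical."""
    for i, (good, cand) in enumerate(zip(good_trace, candidate_trace)):
        if good != cand:
            return i
    if len(good_trace) != len(candidate_trace):
        # One trace is a strict prefix of the other.
        return min(len(good_trace), len(candidate_trace))
    return None


# Hypothetical basic-block addresses hit while processing each input.
good = [0x401000, 0x401020, 0x401080, 0x4010C0]
attempt_1 = [0x401000, 0x401200]            # rejected almost immediately
attempt_2 = [0x401000, 0x401020, 0x401200]  # gets one block further

for name, trace in [("attempt_1", attempt_1), ("attempt_2", attempt_2)]:
    print(name, "diverges at step", first_divergence(good, trace))
```

Each change to the generated traffic that pushes the divergence point later is another pin set. When the function returns None, the artificial traffic is being handled exactly like the good traffic, at least at the granularity the tracer records.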
The security of appliances and internal applications is incredibly bad. Nobody can look at the squirming ick underneath the rock that is a typical CORBA application and doubt that Microsoft is getting value out of the attention they’re paying to security: it’s still 1993 out there for applications that don’t sit in front of a firewall or ship with a Dell desktop. In virtually every case, a workable understanding of the protocol in play is a game-over condition for internal applications. I don’t need to reverse engineer back to compilable source; I could probably get away with reversing to header files.
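For a sense of how little "header file" knowledge that is, here is a sketch of a protocol description in Python. The wire layout, the magic value, and the command table are all hypothetical, made up for this example; the point is only that a field layout plus a list of command words is enough to start speaking the protocol.

```python
import struct

# Hypothetical wire layout, invented for illustration (big-endian):
#   2 bytes magic | 1 byte command word | 1 byte flags | 2 bytes payload length
HEADER = struct.Struct(">HBBH")
MAGIC = 0xBEEF

# Hypothetical command words, the kind of table you recover from a dispatch switch.
COMMANDS = {0x01: "STATUS", 0x02: "LOGIN", 0x10: "SHUTDOWN"}


def build_packet(command, payload=b"", flags=0):
    """Serialize a header followed by the raw payload."""
    return HEADER.pack(MAGIC, command, flags, len(payload)) + payload


def parse_packet(data):
    """Decode a packet into (command name, flags, payload)."""
    magic, command, flags, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic")
    payload = data[HEADER.size:HEADER.size + length]
    name = COMMANDS.get(command, "UNKNOWN(0x%02x)" % command)
    return name, flags, payload


packet = build_packet(0x10, b"now")
print(parse_packet(packet))
```

Nothing here required knowing how the server implements any command, only where the command word sits and which values it accepts.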

