Hand-Detouring Windows Function Calls With HT
Thomas Ptacek | November 27th, 2006 | Filed Under: Reversing, Uncategorized
Last week I wrote about the INT 3 trick for bringing a service up under a debugger, at initialization, without using gflags or editing the registry. Recall that without source code, Windows services are a pain to debug, wanting to run from the Service Controller instead of the command line.
The INT 3 trick is a single-byte binary patch to trap to the debugger. This is great if all you need is to get the service initialization code up under Olly or windbag[*]. But I’m reversing some silly (and convoluted) crypto code. I want PaiMei.
Trapping to the debugger isn’t going to help me, because PaiMei wants to do its own (manual) attach; it’s not a Just-in-Time debugger. Trouble is, services take milliseconds to start, and I can’t click that fast.
Now I have a lot of options.
The code I’m looking for is isolated in a single DLL. Can I just pull it into Python through ctypes? Ctypes is a work of art; it’s a pitch-perfect foreign-function interface for Python, which means it makes C DLLs directly callable by Python code without any heinous SWIG-style situps. Using ctypes I could just write a little wrapper program for my DLL that was not itself a service, and use PaiMei on that.
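For a plain C DLL, the ctypes route looks roughly like the sketch below. The DLL from this post isn't named, so the C runtime's strlen stands in for it; the library choice and the prototype declaration are purely illustrative.

```python
import ctypes
import sys

# Stand-in library: msvcrt.dll on Windows, the already-loaded libc elsewhere.
# With a real target you would pass the path of the DLL under study instead.
if sys.platform == "win32":
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(None)

# Declare the C prototype so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"detour")
print(length)
```

No wrapper code is compiled here: the function is looked up by its exported name and called directly, which is exactly what stops working once the exports are C++ methods.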
Unfortunately, this is a C++ DLL, and ctypes does not play well with C++ at all. The name-mangling problem is easy to get around, but the “defining classes and locating vtables” problem isn’t, especially when I don’t know what the class definitions are.
Since I’m after a DLL, I can patch the DLL import table, in the service executable, redirecting calls to an interceptor function. Simplest case: just sleep 30 seconds and then hand control over to the real function. My excuse for not going this route: it didn’t occur to me until I started writing this post.
I can use Detours. Damn, Detours is slick. It binary-patches executables on the fly to intercept arbitrary Win32 function calls. It does that by:
Adding a new section to the PE image and redirecting the DLL import table to it. Detours uses this to hold dynamically generated code and data payloads as well as to load new DLLs into the target program.

Binary patching the functions to be detoured.
The simplest way to intercept a function call is to find its call sites (by scanning through binary code looking for direct CALL instructions with your target function as an operand). But that requires tons of little binary patches and misses indirect calls (i.e., calls through function pointers). So Detours does something different:

Detours locates the target function and replaces the first few instructions with a JMP into the interceptor function (which you presumably loaded as a DLL when Detours attached). It takes the original instructions from the JMP site and moves them to a trampoline. When your interceptor is done, control is handed to the trampoline, which executes the original instructions Detours patched away. Then control is handed back to the target function.
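The patch-and-trampoline scheme just described can be modeled on a plain byte buffer. This is a minimal sketch of the idea, not Detours' actual code: the function names, offsets, and the sample prologue are all made up, and it assumes the displaced bytes are whole instructions with no position-dependent operands.

```python
import struct

JMP_LEN = 5  # opcode E9 plus a 32-bit relative displacement


def jmp_rel32(src: int, dst: int) -> bytes:
    """Encode `JMP dst` as it would appear at address `src`."""
    # The displacement is relative to the address of the next instruction.
    disp = (dst - (src + JMP_LEN)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", disp)


def install_detour(image: bytearray, target: int, hook: int,
                   trampoline: int, stolen_len: int) -> None:
    """Patch `image` in place; every argument is an offset into `image`.

    `stolen_len` must cover whole instructions and be at least JMP_LEN.
    """
    if stolen_len < JMP_LEN:
        raise ValueError("need at least 5 bytes for the JMP")
    stolen = bytes(image[target:target + stolen_len])
    # Trampoline: the displaced instructions, then a JMP back into the target.
    back = jmp_rel32(trampoline + stolen_len, target + stolen_len)
    image[trampoline:trampoline + stolen_len + JMP_LEN] = stolen + back
    # Target entry: JMP to the hook, padded out with NOPs.
    patch = jmp_rel32(target, hook) + b"\x90" * (stolen_len - JMP_LEN)
    image[target:target + stolen_len] = patch


# Hypothetical layout: target at 0x10, hook at 0x40, trampoline at 0x80.
image = bytearray(0x100)
# push ebp; mov ebp, esp; sub esp, 10h  (6 bytes, three whole instructions)
image[0x10:0x16] = bytes.fromhex("558bec83ec10")
install_detour(image, target=0x10, hook=0x40, trampoline=0x80, stolen_len=6)

print(image[0x10:0x16].hex())  # e92b00000090
print(image[0x80:0x8B].hex())  # 558bec83ec10e98bffffff
```

The hook itself is left out; in this model it would finish by jumping to offset 0x80, which replays the displaced prologue and then lands back in the target at 0x16.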
Detours is a Microsoft Research project. You can download it for free. Damn, is Detours ever slick. It comes with a “withdll.exe” utility that you can use to throw arbitrary DLLs into arbitrary Windows programs, and a whole bunch of little sample DLLs, such as “trace all WinAPI calls”. Damn, is Detours ever slick. I really have no reason to use anything else. Except… I need to set up a Windows C/C++ build environment on my target machine to use it.
I’d rather eat a bug. Bringing us to option 4:
I can do what Detours does by hand. After all, I only need to insert one Sleep() call.
This is what I actually wound up doing. To pull it off, I used HT, an open-source binary editor. Where has this program been all my life? For mainstream platforms, HT does the 80% of IDA Pro I actually care about, and adds inline hex/assembly editing. So, using HT, here’s “Back Alley Detour”:
Open the DLL in HT.
Find the function I want to insert the Sleep in.
NOP out enough instructions to make room for a 5-byte unconditional JMP.
Set a label for the following instruction, “resume”.
Find a nice long patch of debug output in the DLL’s rdata section. Set a label for the first byte, “egg”.
Go back to my NOPs, hit Ctrl-A, and enter “JMP egg”.
Go back to my egg, and:
push eax
push ecx
push edx
mov eax, 20000
push eax
call [42330h] ; &KERNEL32.Sleep
pop edx
pop ecx
pop eax
; instructions patched out of original function
jmp resume
Save. Start the service. Attach PaiMei to it. Read RSS feeds for 14 seconds.
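If you would rather see the bytes than trust the assembler, the egg and the patch at the NOP site can be built by hand. This is a sketch under assumptions: the egg, patch-site, and resume addresses and the displaced prologue bytes are invented for illustration, and only the Sleep pointer slot (42330h) and the delay come from the listing above.

```python
import struct


def jmp_rel32(src: int, dst: int) -> bytes:
    """Encode `JMP dst` (E9 rel32) as it would appear at address `src`."""
    disp = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", disp)


def build_egg(egg_addr: int, resume_addr: int, sleep_slot: int,
              millis: int, displaced: bytes) -> bytes:
    """Assemble the egg: save registers, Sleep, restore, replay, jump back."""
    body = b"".join([
        b"\x50",                                      # push eax
        b"\x51",                                      # push ecx
        b"\x52",                                      # push edx
        b"\xB8" + struct.pack("<I", millis),          # mov eax, millis
        b"\x50",                                      # push eax
        b"\xFF\x15" + struct.pack("<I", sleep_slot),  # call [sleep_slot]
        b"\x5A",                                      # pop edx
        b"\x59",                                      # pop ecx
        b"\x58",                                      # pop eax
        displaced,                                    # instructions patched out
    ])
    # The final JMP sits right after the body, so its address is egg + len(body).
    return body + jmp_rel32(egg_addr + len(body), resume_addr)


# Hypothetical addresses and prologue; only 0x42330 and 20000 are from the post.
PATCH_SITE = 0x10001000
EGG = 0x10045000
displaced = bytes.fromhex("558bec83ec10")  # 6 bytes removed from the function
RESUME = PATCH_SITE + len(displaced)

egg = build_egg(EGG, RESUME, sleep_slot=0x42330, millis=20000,
                displaced=displaced)
# JMP egg at the patch site, NOP-padded to cover the displaced instructions.
patch = jmp_rel32(PATCH_SITE, EGG) + b"\x90" * (len(displaced) - 5)

print(patch.hex())
print(egg.hex())
```

This only produces bytes. Whether code dropped into rdata actually runs depends on the section's flags and the machine's data-execution-prevention settings, so treat that part as something to check on the target box.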
I want to do more with Detours. Damn, is Detours ever slick. But for very simple interventions, installing the VC++ compiler on a test lab box is overkill.
[*] Outside of the kernel, Olly rules all over windbg.

