Hand-Detouring Windows Function Calls With HT

Thomas Ptacek | November 27th, 2006 | Filed Under: Reversing, Uncategorized

Last week I wrote about the INT 3 trick for bringing a service up under a debugger, at initialization, without using gflags or editing the registry. Recall that without source code, Windows services are a pain to debug, wanting to run from the Service Controller instead of the command line.

The INT 3 trick is a single-byte binary patch to trap to the debugger. This is great if all you need is to get the service initialization code up under Olly or windbag[*]. But I’m reversing some silly (and convoluted) crypto code. I want PaiMei.

Trapping to the debugger isn’t going to help me, because PaiMei wants to do its own (manual) attach; it’s not a Just-in-Time debugger. Trouble is, services take milliseconds to start, and I can’t click that fast.

Now I have a lot of options.

  1. The code I’m looking for is isolated in a single DLL. Can I just pull it into Python through ctypes? Ctypes is a work of art; it’s a pitch-perfect foreign-function interface for Python, which means it makes C DLLs directly callable by Python code without any heinous SWIG-style situps. Using ctypes I could just write a little wrapper program for my DLL that was not itself a service, and use PaiMei on that.

    Unfortunately, this is a C++ DLL, and ctypes does not play well with C++ at all. The name-mangling problem is easy to get around, but the “defining classes and locating vtables” problem isn’t, especially when I don’t know what the class definitions are.
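For a plain C export, the ctypes route is only a few lines. The sketch below is illustrative only: the post never names the target DLL, so it loads the platform C runtime as a stand-in and calls `strlen`, and the helper name `load_c_runtime` is made up for this example.

```python
import ctypes
import os


def load_c_runtime() -> ctypes.CDLL:
    """Load the platform C runtime as a stand-in for 'some plain C DLL'."""
    if os.name == "nt":
        return ctypes.CDLL("msvcrt")  # Windows C runtime DLL
    return ctypes.CDLL(None)  # POSIX: symbols already linked into this process


lib = load_c_runtime()

# Declare the prototype so ctypes converts the argument and return value correctly.
lib.strlen.argtypes = [ctypes.c_char_p]
lib.strlen.restype = ctypes.c_size_t

length = lib.strlen(b"detour")
print(length)
```

Nothing comparable exists for C++ member functions: there is no prototype to declare until the class layout is known, which is the problem described above.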

  2. Since I’m after a DLL, I can patch the DLL import table, in the service executable, redirecting calls to an interceptor function. Simplest case: just sleep 30 seconds and then hand control over to the real function. My excuse for not going this route: it didn’t occur to me until I started writing this post.

  3. I can use Detours. Damn, Detours is slick. It binary-patches executables on the fly to intercept arbitrary Win32 function calls. It does that by:

    1. Adding a new section to the PE image and redirecting the DLL import table to it. Detours uses this to hold dynamically generated code and data payloads as well as to load new DLLs into the target program.

      PE-edit.png

    2. Binary patching the functions to be detoured.

      The simplest way to intercept a function call is to find its call site (by scanning through binary code looking for direct CALL instructions with your target function as an operand). But that requires tons of little binary patches and misses indirect calls (ie, function pointers). So Detours does something different:

      intercept.png

      Detours locates the target function and replaces the first few instructions with a JMP into the interceptor function (which you presumably loaded as a DLL when Detours attached). It takes the original instructions from the JMP site and moves them to a trampoline. When your interceptor is done, control is handed to the trampoline, which executes the original instructions Detours patched away. Then control is handed back to the target function.

    Detours is a Microsoft Research project. You can download it for free. Damn, is Detours ever slick. It comes with a “withdll.exe” program that you can use to throw arbitrary DLLs into arbitrary Windows programs, and a whole bunch of little sample DLLs, such as “trace all WinAPI calls”. Damn, is Detours ever slick. I really have no reason to use anything else. Except… I need to set up a Windows C/C++ build environment on my target machine to use it.

    I’d rather eat a bug. Bringing us to option 4:

  4. I can do what Detours does by hand. After all, I only need to insert one Sleep() call.

    This is what I actually wound up doing. To pull it off, I used HT, an open-source binary editor. Where has this program been all my life? For mainstream platforms, HT does the 80% of IDA Pro I actually care about, and adds inline hex/assembly editing. So, using HT, here’s “Back Alley Detour”:

    1. Open the DLL in HT.

    2. Find the function I want to insert the Sleep in.

    3. NOP out enough instructions to make room for a 5-byte unconditional JMP.

    4. Set a label for the following instruction, “resume”.

    5. Find a nice long patch of debug output in a DLL rdata section. Set a label for the first byte, “egg”.

    6. Go back to my NOPs, hit CTRL-A, and enter “JMP egg”.

    7. Go back to my egg, and:

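      push eax               ; save the registers Sleep is allowed to clobber
      push ecx
      push edx
      mov eax, 20000         ; argument for Sleep (milliseconds)
      push eax
      call [42330h]          ; &KERNEL32.Sleep
      pop edx                ; restore the registers in reverse order
      pop ecx
      pop eax
      ; instructions patched out of original function
      jmp resume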

    8. Save. Start the service. Attach PaiMei to it. Read RSS feeds for 14 seconds.
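HT's assembler does the encoding for you, but the bytes behind steps 3 through 7 can be sketched in a few lines of Python. This is a rough illustration, not what HT emits verbatim: the function names are made up, the hook and egg addresses are borrowed from the OllyDbg listing in the comments below, the Sleep pointer address is the 42330h from the listing above, and it assumes the displaced instructions contain no relative branches (so they can be copied unchanged).

```python
import struct

NOP = b"\x90"


def jmp_rel32(src: int, dst: int) -> bytes:
    """Encode a 5-byte near JMP (opcode E9) located at src that lands on dst."""
    # The displacement is counted from the end of the JMP itself (src + 5).
    rel = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)


def hook_bytes(site: int, egg: int, stolen_len: int) -> bytes:
    """JMP to the egg, NOP-padded so only whole instructions are overwritten."""
    if stolen_len < 5:
        raise ValueError("need at least 5 bytes for a JMP rel32")
    return jmp_rel32(site, egg) + NOP * (stolen_len - 5)


def egg_bytes(egg: int, sleep_ptr: int, millis: int, stolen: bytes, resume: int) -> bytes:
    """Save registers, call Sleep(millis), restore, replay stolen bytes, jump back."""
    body = (
        b"\x50\x51\x52"                               # push eax / push ecx / push edx
        + b"\xB8" + struct.pack("<I", millis)         # mov eax, millis
        + b"\x50"                                     # push eax
        + b"\xFF\x15" + struct.pack("<I", sleep_ptr)  # call dword ptr [sleep_ptr]
        + b"\x5A\x59\x58"                             # pop edx / pop ecx / pop eax
        + stolen                                      # displaced original instructions
    )
    # The closing JMP sits right after the body, so its source address is egg + len(body).
    return body + jmp_rel32(egg + len(body), resume)


# Illustrative values only (hook site, egg and stolen CALL from the comment thread).
SITE, EGG = 0x020C1028, 0x020D28E0
STOLEN = bytes.fromhex("FF148541A10C02")  # the 7-byte indirect CALL being displaced
RESUME = SITE + len(STOLEN)

hook = hook_bytes(SITE, EGG, len(STOLEN))
egg = egg_bytes(EGG, 0x42330, 20000, STOLEN, RESUME)
print(hook.hex(" "))
print(egg.hex(" "))
```

The hook comes out as the same JMP-plus-two-NOPs sequence shown in the comment thread, which is a handy sanity check on the displacement arithmetic.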

I want to do more with Detours. Damn, is Detours ever slick. But for very simple interventions, installing the VC++ compiler on a test lab box is overkill.

[*] Outside of the kernel, Olly rules all over windbg.

Viewing 8 Comments

    A pretty easy way to handle this is to add a value named "Debugger" to key

    HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\NameOfService.exe

    And point it to an exe of your choice (potentially a debugger). CreateProcess() will then execute whatever you have in 'Debugger' when the SCM tries to start the service. It's a useful trick for mucking with service startup.

    cheers
    HT is very nice, but it's not 80% of IDA Pro - it doesn't even support FLIRT, flow charts, et al -- let alone all the great plug-ins. I'd like to see you put a Unix preprocessor into HT as easily. Not to mention: PaiMei (including IDAPython and pGRAPH), Nintendo DS loader module (I just bought a DS Lite for Final Fantasy III and have a DS-Xtreme on the way), Determina PDB (and the general ease of downloading MS symbol tables and importing them), IDASync, IDACompare, idastruct, IDA RPC Enumerator, BinNavi, BinDiff, BinAudit, and HBGary Inspector.

    i know you are overstating HT's value, as it barely has any features over objdump. it's nice to compare results between the 4 major toolsets (IDA Pro, OllyDbg, HT and gdb/binutils) just to make sure you are on the right track or miss something. I tend to jump from one to the other especially when I'm confused which is probably a lot more often than you are.

    Btw - thanks for speaking about Detours. I was just playing around with binary patching as talked about at Recon 2006 in Luis Miras' talk on Fixing Bugs in Binaries. I downloaded CFF Explorer V. I normally roll VS2k5 to play with my wm5 device anyways, so a MS compiler is usually around.
    I know I'm overstating HT's features, but I think you might be understating them. By "80% of IDA", I mean "cross-referenced PE disassembly with symbols", which objdump can't do.

    Although I also think objdump is highly underrated. =)
    If you are already trapping to ollydbg (sounds like you are) you might find Hernan's uhooker useful to hook the function you want via python. It is an ollydbg plugin

    http://oss.coresecurity.com/projects/uhooker.htm
    Back when I was trying to solve my binary netcat problem, Ivan and Hernan shot me a copy of this so I could pick data up directly from sockets. I like it!

    Unfortunately for me, in this case my problem is that I have a tangled mess of virtual function calls, so I don't have a specific target function to hook.
    That was then..the current version of Uhooker can hook on any address besides function calls

    There are 3 different types of hooks:
    1. Hook Before Entering the function (type 'B')
    2. Hook After the function returns (type 'A')
    3. Hook when executing reaches this address (type '*')

    http://oss.coresecurity.com/uhooker/doc/index.html

    Speaking about detours coolness... check out the third-party patch for MS06-057 from Determina
    http://www.determina.com/security.research/patc...
    This is cool stuff, Ivan.

    I'm going to note what all the cool kids already know: it is impossible to do any kind of reversing/modifying work without realizing how easy binary third-party patching has become. The project I was writing about here was hairy: inter-module, inter-thread, through virtual functions; if I was just trying to block a pathological input to stop an attack, there'd be 20 more tricks we could pull out of our bag to do that.
    ht is pretty nice
    but since you say you like ollydbg you can do all you enumerated from ollydbg itself no need for hte :)

    click in dump and hit ctrl+g and go to .rdata section
    find the place of debug output as you mention hit : (colon) and label it egg

    020D28E0 00 .

    select disassembler window
    and start typing jmp egg (assembler starts on typing or press spacebar to bring it up)

    if it is trashing some opcodes in excess of your required 5 bytes ollydbg will automatically fill with nops till the next valid sequence of opcodes

    original
    020C1028 |. FF1485 41A10C02 CALL NEAR DWORD PTR DS:[EAX*4+20CA141>

    modified
    020C1028 - E9 B3180100 JMP 020D28E0
    020C102D 90 NOP
    020C102E 90 NOP
    020C102F |. 833D 51A10C02 01 CMP DWORD PTR DS:[20CA151], 1


    after this do ctrl+g in disassembler window
    and type egg

    020D28E0 0000 ADD BYTE PTR DS:[EAX], AL
    020D28E2 0000 ADD BYTE PTR DS:[EAX], AL

    start adding the detour, including the trashed instruction, then copy to executable (selection in rdata section as well as all modifications in code section)

    and there you go all set

    btw doesn't the rdata section need to have exec characteristics set if you are executing code from it?

    nice plugin there this uhooker should be interesting to try and play with it thanks ivan for posting a link to it
