Trampolines In x64
We got a few nice features from the new architecture of x64, like larger memory addressing, more registers (so fast call is the standard up to three registers and the rest get on the stack), and of course, a wider bandwidth of 64 bits, etc. AMD had a once in a life opportunity to change the ISA (instruction set architecture) a bit and to make it much better, but instead, they only added a very few new instructions, canceled a lot, and left the decoding as hard as before. Probably they were in a crazy rush, so that time Intel had to catch up with them!
The problem we face when hooking a function is how many bytes we will need to override. I already talked about Hot Patching and branching in x86. But I have never talked at length about x64. Usually most hookers use the JMP relative instruction (0xE9), which is possibly useful for x64 as well. And once again we are limited to a range of 2GB from the JMP instruction’s address, while in x64, 2GB is really nothing much. I also searched a bit over the inet to look for more info and found some interesting approaches. I decided to talk about them here and describe how they work.
1)
JMP relative instruction, when you know in advance the difference from the hooked function to the target trampoline is less than 2GB, is a very good method, only 5 bytes.
2)
JMP RAX
This one is almost optimal, you can branch everywhere in the address space, it takes only 12 bytes. It suffers a destruction of a register. Of course, by the ABI (Application Binary Interface) which the compiler implements, some registers are defined as volatile, means you can use them almost any time without worrying or needing to restore them. Analyzing the function (using a disassembler) you may be able to know which register you can use safely. That’s a big headache though.
Note that you could replace the JMP RAX with PUSH RAX;RET, but it’s still the same size.
3)
; Only if required (when High half is non zero):
MOV [RSP+4], DWORD <High Absolute Address DWORD>
RET
This one was found on Nikolay Igotti’s blog. You split the QWORD address value into two DWORDs. The first push, although pushes a 32 bits value, really allocates 64 bits value on the stack. Then if the high half of the address is non zero, you will have to write it directly to the stack. This will write a full QWORD to the stack which you can then branch to by RETting. It takes 14 (=5+8+1) bytes, and doesn’t dirty any register.
Note that it’s possible to shave some bytes in a case where the high half is -1, by OR [RSP+4], -1 and the like.
4)
This one takes advantage of RIP relative addressing mode, check it out, yo
DQ: <Absolute Address>
This one is cool, it will branch to the full address in the QWORD value. So it takes 14 bytes as well. On the other hand, I don’t like to mix code and data. In the world of firewalls (which do tons of hookings), for instance, hooking this function twice with two different methods, will probably lead to a crash, since most hooking engines disassemble the instructions, you will get garbage beginning with the second instruction.
This method leads to an interesting idea, you can move the address value around within a range of +-2GB, and then you will need only 6 bytes. In the instruction level, anytime you mention RIP it’s a waste of 4 bytes for a required 32 bits displacement, even if it’s 0, that sucks.
Unlikely, but still possible method sometimes is, JMP <Abs Addr>. Some OS API’s let you choose the allocated address, so you can make sure it’s in the first 2GB.
5)
JMP RAX
;;;
PointerToAdderss64:
DQ <Absolute Address>
This one is similar to the former, but it can read the QWORD from everywhere. Because the addressing mode really supports 64 bit values. It takes 11 bytes. Needless to say, it dirties a register.
Note that this MOV RAX is a special instruction that supports a 64 bit addresses.
Each method has its own pros and cons. It seems you can do best by choosing a specific method according to the difference from the hooked function to the target trampoline address.
One crazy idea which I haven’t seen before is to use simple JMP relative (0xE9) instruction, and to jump into the MIDDLE of some function nearby, the function will be hooked in the middle to recover the hole (like a transparent proxy) and the hole will be filled with the full jump to any address.
Any other tricks you wanna share with us? Leave a comment.
출처: http://www.ragestorm.net/blogs/?p=107