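1. Several new 64-bit registers were added
r8, r9, r10, r11, r12, r13, r14, r15
2. The existing registers were widened to 64 bits
rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp
3. Stack instructions must use 64-bit registers
ex: (push, pop, call, ret, enter, and leave)
push eax ; this is illegal
push rax ; this is legal
4. RIP can now legally be used to address data
mov rax, qword ptr [rip+300] ; previously IP could only serve as the program counter, now it can take part in fetching data
In 64-bit mode the bases of DS/ES/SS/CS are also treated as zero, so these segment registers no longer mean much for addressing!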
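To make the [rip+300] form concrete, here is a minimal sketch in C of how the effective address is computed. The function name rip_relative_target and its parameters are made up for illustration. It relies on the fact that the displacement is added to the address of the next instruction, not the current one.

```c
#include <stdint.h>

/* Effective address of a RIP-relative operand.
   next_rip: address of the instruction FOLLOWING the one that holds the operand
   disp32:   signed 32-bit displacement encoded in the instruction
   (hypothetical helper, for illustration only) */
uint64_t rip_relative_target(uint64_t next_rip, int32_t disp32)
{
    /* Sign-extend the displacement to 64 bits, then add it to RIP. */
    return next_rip + (uint64_t)(int64_t)disp32;
}
```

For example, if the next instruction starts at 0x1000, then [rip+300] refers to address 0x1000 + 300. Because the displacement is signed, data located before the code can be reached as well.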
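5. Calling convention (Windows x64): the callee does not clean up the stack; that is the caller's job
ex:
Function(arg1, arg2, arg3, arg4, arg5): assuming arg1 through arg4 are all integers, each one is placed in its corresponding register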
RCX: 1st integer arg
RDX: 2nd integer arg
R8: 3rd integer arg
R9: 4th integer arg
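Only arg5 is placed on the stack, but the stack first reserves space for the four register arguments (8 bytes each, 0x20 bytes in total),
so the fifth parameter ends up at rsp+0x20 (measured in the caller, just before the call)!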
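The register assignment and the rsp+0x20 offset can be sketched as two small C helpers. The names int_arg_register and arg_slot_offset are hypothetical, and the sketch assumes every argument is an integer occupying one 8-byte slot, as in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Register that carries the n-th integer argument (1-based),
   or NULL when the argument is passed on the stack.
   (hypothetical helper, for illustration only) */
const char *int_arg_register(unsigned n)
{
    static const char *const regs[] = { "RCX", "RDX", "R8", "R9" };
    assert(n >= 1);
    return n <= 4 ? regs[n - 1] : NULL;
}

/* Offset from RSP, measured in the caller just before the call
   instruction, of the 8-byte slot belonging to the n-th argument.
   Slots 1 to 4 are the space reserved for the register arguments. */
unsigned arg_slot_offset(unsigned n)
{
    assert(n >= 1);
    return 8u * (n - 1);
}
```

With these helpers, int_arg_register(5) returns NULL (stack argument) and arg_slot_offset(5) returns 0x20, which matches the rsp+0x20 location described above.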
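This is different from what we were used to in 32-bit code; it is not __cdecl, __stdcall, or __fastcall.
In the example below you can see that the caller is the one who cleans up the stack: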
The stack must be kept 16-byte aligned. Since the "call" instruction
pushes an 8-byte return address, this means that every non-leaf
function is going to adjust the stack by a value of the form 16n+8 in
order to restore 16-byte alignment.
ex:
void SomeFunction(int a, int b, int c, int d, int e);
void CallThatFunction()
{
    SomeFunction(1, 2, 3, 4, 5);
    SomeFunction(6, 7, 8, 9, 10);
}
On entry to CallThatFunction, the stack looks like this:
xxxxxxx0 | .. rest of stack .. |
xxxxxxx8 | return address      | <- RSP
Due to the presence of the return address, the stack is misaligned. CallThatFunction sets up its stack frame, which might go like this:
sub rsp, 0x28
Notice that the local stack frame size is 16n+8, so that the result is a realigned stack.
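As a rough sketch of the 16n+8 rule, the helper below (a hypothetical name, aligned_frame_size) rounds the number of bytes a function needs up to the nearest value of that form. In this example the function needs 0x20 bytes of reserved space for the four register arguments plus 8 bytes for arg5, which is already 0x28.

```c
/* Smallest frame size >= needed that has the form 16n+8, so that
   RSP is 16-byte aligned again after "sub rsp, size".
   (hypothetical helper, for illustration only) */
unsigned aligned_frame_size(unsigned needed)
{
    /* Sizes of the form 16n+8 are 8, 24, 40, ...; adding 7 and clearing
       the low four bits picks the right multiple of 16 to add 8 to. */
    return ((needed + 7u) & ~15u) + 8u;
}
```

For instance, aligned_frame_size(0x28) returns 0x28, while a function that only needs the 0x20 bytes of reserved space would still have to allocate 0x28.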
xxxxxxx0 | .. rest of stack .. |
xxxxxxx8 | return address      |
xxxxxxx0 | (arg5)              |
xxxxxxx8 | (arg4 spill)        |
xxxxxxx0 | (arg3 spill)        |
xxxxxxx8 | (arg2 spill)        |
xxxxxxx0 | (arg1 spill)        | <- RSP
Now we can set up for the first call:
mov dword ptr [rsp+0x20], 5 ; output parameter 5
mov r9d, 4 ; output parameter 4
mov r8d, 3 ; output parameter 3
mov rdx, 2 ; output parameter 2
mov rcx, 1 ; output parameter 1
call SomeFunction ; Go Speed Racer!
When SomeFunction returns, the stack is not cleaned, so it still looks like it did above. To issue the second call, then, we just shove the new values into the space we already reserved:
mov dword ptr [rsp+0x20], 10 ; output parameter 5
mov r9d, 9 ; output parameter 4
mov r8d, 8 ; output parameter 3
mov rdx, 7 ; output parameter 2
mov rcx, 6 ; output parameter 1
call SomeFunction ; Go Speed Racer!
CallThatFunction is now finished and can clean its stack and return.
add rsp, 0x28
ret
Notice that you see very few "push" instructions in amd64 code, since the paradigm is for the caller to reserve parameter space and keep re-using it.
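This calling convention is best thought of as a combination of __cdecl and __fastcall:
__cdecl : after the callee returns, the caller cleans up the stack
__fastcall : in 32-bit code the two leftmost arguments are passed in ecx and edx, while in 64-bit code the first four are passed in rcx, rdx, r8, r9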
Reference
http://blogs.msdn.com/oldnewthing/archive/2004/01/14/58579.aspx