
The Windows x64 ABI (Application Binary Interface) presents some new challenges for assembly programming that don’t exist for x86. A couple of the changes that must be taken into account can be seen as very positive. First of all, there is now one and only one OS-specified calling convention. We certainly could have devised our own calling convention, as we did for x86 with its register-based convention; however, since the system calling convention was already register-based, that would have been an unnecessary complication. The other significant change is that the stack must always remain aligned on 16-byte boundaries. This seems a little onerous at first, but I’ll explain how and why it’s necessary, along with how it can actually make calling other functions from assembly code more efficient and sometimes even faster than on x86. For a detailed description of the calling convention, register usage and reservations, etc., please see Microsoft’s documentation of the x64 calling convention. Another thing that I’ll discuss is exceptions and why all of this is necessary.

For any given function there are three parts we’re going to talk about:
the prolog, body, and epilog. The prolog and epilog contain all the
setup and tear-down of the function’s “frame”. The prolog is where all
the space on the stack is reserved for local variables and, unlike
how the x86 compiler works, for the maximum amount of parameter space
needed by all the function calls within the body. The epilog does the
reverse and releases the reserved stack space just prior to returning
to the caller. The body of a function is where the user’s code is
placed, either in Pascal or, as we’ll see, the assembler code you
write.

You may be wondering why the prolog is reserving parameter space in
addition to the space needed for local variables. Why not just push the
parameters on the stack right before calling a function? While there is
technically nothing keeping the compiler from placing parameters for a
function call on the stack immediately before a call, this will have the
effect of making the exception tables larger. As I mentioned above,
exceptions in x64 are not implemented the same as in x86, which was a
stack-based linked list of records. In x64, exceptions are done using
extra data generated by the compiler that describes the stack changes
for a given function and where the handlers/finally blocks are located.
By only modifying the stack within the prolog and epilog, “unwinding”
the stack is easier and more accurate. Another side benefit is that when
passing stack parameters to functions, the space is already available
so the data merely needs to be “MOV”ed onto the stack without the need
for a PUSH. The stack also remains properly aligned, so no extra
finagling of the RSP register is necessary.

Directives

Delphi for 64-bit Windows introduced several new assembler directives, or
“pseudo-instructions”: .NOFRAME, .PARAMS, .PUSHNV, and .SAVENV. These
directives allow you to control how the compiler sets up the context
frame and ensure that the proper exception table information is
generated.

.NOFRAME

Some functions never make calls to other functions. These are called
“leaf” functions because they don’t do any further “branching” out to
other functions, so, like a tree, they represent the “leaf”. For
functions such as this, having a full stack frame may be extra overhead
you want to eliminate. While the compiler does try to eliminate the
stack frame if it can, there are times that it simply cannot figure this
out automatically. If you are certain a frame is unnecessary, you can
use this directive as a hint to the compiler.
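
As a minimal sketch of such a leaf function, assuming the standard
Windows x64 register assignments (first integer parameter in RCX, second
in RDX, result in RAX). The name AddInt64 is made up for illustration:

```pascal
// Hypothetical leaf function: returns A + B.
// Assumes the Windows x64 convention: A arrives in RCX, B in RDX,
// and the Int64 result is returned in RAX.
function AddInt64(A, B: Int64): Int64;
asm
  .NOFRAME                 // no calls, no locals, no stack changes
  LEA RAX, [RCX + RDX]     // RAX := A + B
end;
```

Because the body never touches RSP and never calls anything, there is
nothing for a frame to protect, which is the situation the directive is
meant for.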

.PARAMS <max params>

This one may be a little confusing because it does not refer to
the parameters passed into the current function. Rather, this directive
should be placed near the top of the function (preferably before any
actual CPU instructions) with a single ordinal parameter that tells the
compiler the maximum number of parameters needed by any of the function
calls within the body. This allows the compiler to reserve extra,
properly aligned stack space for passing parameters to other functions.
The number should be the largest parameter count among all the functions
you call, and it should include even those parameters that are passed in
registers. If you’re going to call a function that takes 6 parameters,
then you should use “.PARAMS 6”.
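
The arithmetic behind that reservation can be sketched in plain Pascal.
This only illustrates what the calling convention implies (8-byte slots,
never fewer than four of them, total kept at a multiple of 16).
OutgoingParamBytes is a hypothetical helper, not something the compiler
exposes, and the compiler’s actual frame layout may differ:

```pascal
// Hypothetical helper, for illustration only: bytes of outgoing-parameter
// space a caller needs when its largest call takes MaxParams parameters.
function OutgoingParamBytes(MaxParams: Integer): Integer;
const
  SlotSize   = 8;   // every parameter occupies one 64-bit slot
  MinSlots   = 4;   // space for the four register parameters is always reserved
  StackAlign = 16;  // RSP must stay 16-byte aligned
var
  Slots: Integer;
begin
  Slots := MaxParams;
  if Slots < MinSlots then
    Slots := MinSlots;
  // Round the byte count up to the next multiple of StackAlign.
  Result := (Slots * SlotSize + StackAlign - 1) and not (StackAlign - 1);
end;
```

Under these assumptions, “.PARAMS 6” corresponds to 48 bytes, and any
count of 4 or fewer corresponds to 32 bytes.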

When you use the .PARAMS directive, a pseudo-variable @Params becomes
available to simplify passing parameters to other functions. It’s fairly
easy to load up a few registers and make a call, but the x64 calling
convention also requires that callers reserve space on the stack even
for register parameters. The .PARAMS directive ensures this is the case,
so you should still use it even if you’re going to call a function in
which all parameters are passed in registers. The @Params
pseudo-variable is an array of bytes, which lets you address every byte
of the parameter area. Each parameter takes up 8 bytes (64 bits), so
parameter N starts at @Params[N*8], with the first parameter at offset
0. You generally don’t use the first 4 slots since those parameters
must be passed in registers, so you’ll start at parameter index 4, that
is, @Params[4*8]. Because the elements are bytes, you need a cast or
size override to access a parameter, such as “DWORD PTR @Params[4*8]” or
“@Params[4*8].Byte”. Using @Params saves the programmer from having to
manually calculate the offsets based on alignments and local variables.
(UPDATE: an earlier version of this post indexed @Params by parameter
number, as in “@Params[4]”. That was wrong; the index is a byte offset.
Sorry about that.)
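
Putting .PARAMS and @Params together, a call to a six-parameter function
might look like the sketch below. Sum6 and CallSum6 are hypothetical
names, and the sketch assumes the standard Windows x64 register order
(RCX, RDX, R8, R9) for the first four parameters:

```pascal
// Hypothetical callee with six parameters, written in plain Pascal.
function Sum6(A, B, C, D, E, F: Int64): Int64;
begin
  Result := A + B + C + D + E + F;
end;

// Hypothetical assembler caller: passes 1..6 to Sum6.
function CallSum6: Int64;
asm
  .PARAMS 6                        // largest call below takes 6 parameters
  MOV RCX, 1                       // parameters 0..3 travel in registers
  MOV RDX, 2
  MOV R8, 3
  MOV R9, 4
  MOV QWORD PTR @Params[4*8], 5    // parameter 4 at byte offset 32
  MOV QWORD PTR @Params[5*8], 6    // parameter 5 at byte offset 40
  CALL Sum6                        // result comes back in RAX
end;
```

Note that the stack parameters are stored with MOV into space the
prolog already reserved; no PUSH is needed and RSP never changes inside
the body.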

.PUSHNV <GPReg>, .SAVENV <XMMReg>

According to the x64 calling convention and register usage spec, there
are some registers which are considered non-volatile. This means that
these registers are guaranteed to have the same values after a function
call as they had before the call. This doesn’t mean such a register is
not available for use; it just means the called function must ensure it
is properly preserved and restored. The best place to preserve the value
is on the stack, but that means space must be reserved for it. These
directives do both jobs: they make the compiler include space for the
register in the generated prolog code, and they store the register’s
value in that reserved location. They also ensure that the function
epilog properly restores the register before cleaning up the local
frame. .PUSHNV works with the 64-bit general purpose registers RAX..R15
and .SAVENV works with the 128-bit XMM0..XMM15 SSE2 registers. See the
calling convention documentation for a description of which registers
are considered non-volatile. Even though you can specify any register,
volatile or non-volatile, as a parameter to these directives, only those
registers which are actually non-volatile will be preserved. For
instance, .PUSHNV R11 will assemble just fine, but no changes to the
frame will be made, whereas .PUSHNV R12 will place a PUSH R12
instruction right after the PUSH RBP instruction in the prolog. The
compiler will also continue to ensure that the stack remains aligned.
Remember when I talked about why the stack must remain 16-byte aligned?
One key reason is that many SSE2 instructions which operate on 128-bit
memory entities require that the memory access be aligned on a 16-byte
boundary. Because the compiler ensures this is the case, the space
reserved by the .SAVENV directive is guaranteed to be 16-byte aligned.
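
A minimal sketch of both directives in use follows. The function names
are hypothetical, and the sketch assumes the standard Windows x64
parameter registers (integers in RCX, RDX, R8; doubles in XMM0, XMM1)
with results returned in RAX and XMM0 respectively:

```pascal
// Hypothetical example: returns A * B + C using non-volatile R12 as scratch.
function MulAdd(A, B, C: Int64): Int64;
asm
  .PUSHNV R12          // prolog saves R12, epilog restores it
  MOV R12, RCX         // R12 := A
  IMUL R12, RDX        // R12 := A * B
  ADD R12, R8          // R12 := A * B + C
  MOV RAX, R12         // return value
end;

// Hypothetical example: returns X + Y using non-volatile XMM6 as scratch.
function AddViaXmm6(X, Y: Double): Double;
asm
  .SAVENV XMM6         // prolog saves XMM6 in an aligned slot, epilog restores it
  MOVAPD XMM6, XMM0    // XMM6 := X
  ADDSD XMM6, XMM1     // low double of XMM6 := X + Y
  MOVAPD XMM0, XMM6    // return value
end;
```

In both functions the body overwrites a non-volatile register freely;
the save and restore code is left entirely to the compiler-generated
prolog and epilog.
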

Writing assembler code in the new x64 world can be daunting and
frustrating due to the very strict requirements on stack alignment and
exception meta-data. By using the above directives, you are signaling
your intentions to the one thing that is pretty darn good at ensuring
all those requirements are met: the compiler. You should always place
the directives at the top of the assembler function body, before any
actual CPU instructions. This makes sure the compiler has all the
information and has everything calculated by the time it sees the
actual CPU instructions and needs to know, for example, the offset from
RBP at which a local variable is located. Also, by ensuring that all
stack manipulations happen within the prolog and epilog, the system will
be able to properly “unwind” the stack past a properly written assembler
function. Without this data, the OS unwind process could become lost
and, at best, skip exception handlers or, at worst, call the wrong one
and lead to further corruption. If the unwind process gets lost enough,
the OS may simply kill the process without any warning, similar to what
stack overflows do in 32-bit (and 64-bit).
