Writing 64-bit Intel code for Apple Platforms
Create 64-bit Intel assembly language instructions that adhere to the application binary interface (ABI) that Apple platforms support.
Overview
Prior to the introduction of Apple silicon, Macs used the Intel 64-bit architecture, often called x86-64, x86_64, AMD64, or x64. The macOS platform Application Binary Interface (ABI) for this architecture defines rules for how to call functions, manage the stack, and perform other operations. If your code includes assembly instructions, you must adhere to these rules in order for your code to interface correctly with code generated by the compilers in Xcode. Similarly, if you write a compiler, the machine instructions you generate must adhere to these rules. If you don’t adhere to them, your code may behave unexpectedly or even crash, and code that seems to work on one operating system may stop working on the next release.
Apple platforms typically follow the data representation and procedure call rules in the standard System V Processor Specific Application Binary Interface (psABI) for AMD64, using the LP64 programming model. However, when those rules conflict with the longstanding behavior of the Apple LLVM compiler (Clang) on Apple platforms, the ABI typically diverges from the psABI and follows that longstanding behavior instead. Several such divergences are described below. If you discover a divergence not described here, please report it to Apple.
Adhere to CPU feature availability
The Intel 64-bit architecture has been extended many times, adding new registers and instructions to the Instruction Set Architecture (ISA). You can leverage ISA extensions to make your code run more efficiently. Different ISA extensions are available on different Macs based on the processor used in that Mac. As a general rule, if your code uses an ISA extension on a processor that doesn’t support it, the processor raises an invalid-instruction exception and your process crashes.
It is strongly recommended that you keep your app’s installation requirements no more complex than a specific minimum operating system release. Requiring a specific set of processor extensions in order for the app to work counts as a more complex requirement. Some releases of macOS guarantee the presence of specific ISA extensions because they only support Macs that provide those extensions. If you want to use an ISA extension that isn’t guaranteed to be present on your app’s minimum macOS deployment target, test for it dynamically using the CPUID instruction and be prepared to fall back to a different implementation if it’s not available.
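The dynamic test and fallback described above might look like the following minimal sketch. It assumes a GCC- or Clang-compatible compiler (for the `<cpuid.h>` helper `__get_cpuid` and the `target` function attribute); the function names `cpu_has_popcnt`, `popcount_hw`, `popcount_portable`, and `popcount64` are hypothetical and chosen only for illustration. POPCNT is used as the example because it is not in the guaranteed set listed in this document.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang header that wraps the CPUID instruction */
#endif

/* Returns 1 if CPUID reports the POPCNT extension, 0 otherwise.
   On non-x86-64 builds this sketch simply reports 0. */
int cpu_has_popcnt(void) {
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0;                       /* CPUID leaf 1 not available */
    }
    return (int)((ecx >> 23) & 1u);     /* CPUID.01H:ECX bit 23 = POPCNT */
#else
    return 0;
#endif
}

/* Fallback that uses only baseline instructions. */
unsigned popcount_portable(uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;                     /* clear the lowest set bit */
        n++;
    }
    return n;
}

#if defined(__x86_64__)
/* The target attribute lets the compiler emit POPCNT in this function only,
   so the rest of the file stays within the baseline ISA. */
__attribute__((target("popcnt")))
unsigned popcount_hw(uint64_t x) {
    return (unsigned)__builtin_popcountll(x);
}
#endif

/* Dispatch: use the extension only when the processor reports it. */
unsigned popcount64(uint64_t x) {
#if defined(__x86_64__)
    if (cpu_has_popcnt()) {
        return popcount_hw(x);
    }
#endif
    return popcount_portable(x);
}
```

A real implementation would probably cache the result of the CPUID query instead of repeating it on every call; the sketch omits that to stay short.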
This table summarizes the ISA extensions guaranteed by different macOS releases:
| macOS release | Oldest supported processor | Available CPU features |
|---|---|---|
| All releases | | The x86-64 baseline, plus CMPXCHG16B, LAHF-SAHF, SSE3, SSSE3 |
| Sierra (10.12) | | The above, plus SSE4.1 |
More recent 64-bit Intel Macs included a large number of additional ISA extensions, but the presence of those extensions is not guaranteed by any release of macOS, and you need to test dynamically for them.
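For extensions such as AVX2 that no macOS release guarantees, a runtime check might be sketched as follows. This assumes a GCC- or Clang-compatible compiler and uses its `__builtin_cpu_init` and `__builtin_cpu_supports` builtins in place of a hand-written CPUID sequence; that choice is an assumption of this sketch, not something this document prescribes. The names `cpu_supports_avx2`, `sum_baseline`, `sum_avx2`, and `choose_sum` are hypothetical.

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__GNUC__)
#define SKETCH_HAS_X86_BUILTINS 1
#else
#define SKETCH_HAS_X86_BUILTINS 0
#endif

/* Returns 1 if the running processor reports AVX2, else 0. */
int cpu_supports_avx2(void) {
#if SKETCH_HAS_X86_BUILTINS
    __builtin_cpu_init();               /* initialize the compiler's CPU model data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

typedef long (*sum_fn)(const int *values, size_t count);

/* Baseline implementation: compiled with the file's default ISA. */
long sum_baseline(const int *values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

#if SKETCH_HAS_X86_BUILTINS
/* Same loop, but the compiler may use AVX2 instructions inside this function. */
__attribute__((target("avx2")))
long sum_avx2(const int *values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
#endif

/* Pick an implementation once, based on what the processor reports. */
sum_fn choose_sum(void) {
#if SKETCH_HAS_X86_BUILTINS
    if (cpu_supports_avx2()) {
        return sum_avx2;
    }
#endif
    return sum_baseline;
}
```

A caller would typically store the result of `choose_sum()` in a function pointer at startup and call through it afterwards, so that the check is not repeated.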
Rosetta support for 64-bit Intel processors includes all of the ISA extensions above plus the POPCNT and SSE4.2 ISA extensions.
Adhere to CPU registers’ intended purposes
Register usage for ordinary functions follows the standard psABI.
Calls to the initialization functions for C++ thread_local variables (typically starting with _ZTH and _ZTW) treat rcx, rdx, rsi, r8, r9, r10, and r11 as callee-saved registers in addition to those specified by the standard psABI.
The Swift calling convention uses several registers that don’t have special meaning in the standard psABI, depending on the signature of the Swift function and whether it is synchronous or asynchronous.
Synchronous functions fall into three areas:
- Returns a value indirectly
  The first indirect return address is passed in rax. Additional indirect return addresses are passed as normal arguments that precede all other arguments; for example, the second is in rdi, the third is in rsi, and so on.
- Context parameter that fits in a single integer register
  Examples are closures and class methods. Functions receive this context in r13. r13 is preserved by such calls; after the call, it must hold the same value that the caller passed in.
- Throws Error
  These functions use r12 for this purpose. The caller must set r12 to zero prior to the call; if r12 is non-zero after return, the function is throwing an error, and the value in r12 is that error. r12 is no longer a callee-saved register for such calls.
Asynchronous Swift functions receive the address of their async frame in r14. r14 is no longer a callee-saved register for such calls.
Handle data types and data alignment properly
Data type representations largely follow the standard psABI. However, the Apple LLVM compiler supports several types not covered in the psABI; the rules for these types are described here.
The Apple LLVM compiler allows vectors with arbitrary element counts. The storage size (in bytes) of a vector type is always rounded up to the nearest power of two. The alignment is equal to the storage size, except that it’s capped by the maximum native vector size, as determined by the current target CPU features: 64 bytes if AVX-512 is enabled, otherwise 32 bytes if AVX is enabled, otherwise 16 bytes. Note that this means that the ABI for large vector types depends on the target CPU features, and code may not interoperate between files compiled with different CPU features; this is inherited from the standard psABI.
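The size and alignment rules above reduce to a small amount of arithmetic, sketched here in C. This is a model of the rule as stated in this section, not compiler source; the enum `cpu_features` and the function names are hypothetical, and "rounded up to the nearest power of two" is taken to mean the smallest power of two that is at least the raw size.

```c
#include <stddef.h>

/* Which vector extensions are enabled for the current compilation. */
typedef enum { FEATURES_SSE_ONLY, FEATURES_AVX, FEATURES_AVX512 } cpu_features;

/* Smallest power of two that is >= n (expects n >= 1). */
size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Maximum native vector size in bytes for the enabled features. */
size_t max_native_vector_bytes(cpu_features f) {
    switch (f) {
    case FEATURES_AVX512: return 64;
    case FEATURES_AVX:    return 32;
    default:              return 16;
    }
}

/* Storage size: raw size rounded up to a power of two. */
size_t vector_storage_size(size_t elem_size, size_t count) {
    return round_up_pow2(elem_size * count);
}

/* Alignment: equal to the storage size, capped at the native vector size. */
size_t vector_alignment(size_t elem_size, size_t count, cpu_features f) {
    size_t storage = vector_storage_size(elem_size, count);
    size_t cap = max_native_vector_bytes(f);
    return storage < cap ? storage : cap;
}
```

For example, a vector of three 4-byte floats occupies 16 bytes with 16-byte alignment under this model, while a vector of eight 8-byte doubles occupies 64 bytes and is aligned to 16, 32, or 64 bytes depending on whether only SSE, AVX, or AVX-512 is enabled. That last case is the feature dependence the paragraph warns about.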
__strong- and __weak-qualified pointer types in Objective-C ARC have the same layout as the underlying reference type in non-ARC Objective-C. However, structures that contain __strong- and __weak-qualified fields have non-trivial ownership, and when the caller passes them as arguments, the callee is responsible for destroying the fields. Furthermore, you must pass and return structures that contain __weak-qualified fields indirectly, just like a non-trivially-copyable C++ class type.
Pass arguments to functions correctly
When passing arguments and returning results to functions, Apple platforms diverge from the standard psABI in the following ways:
- Integer arguments that are smaller than int are required to be promoted to int by the caller, and the callee may assume that this has been done. (This includes enumerations whose underlying type is smaller than int.) For example, if the caller passes a signed short argument in a register, the low 32 bits of the register at the moment of call must represent a value between -32,768 and 32,767 (inclusive). Similarly, if the caller passes an unsigned char argument in a register, the low 32 bits of the register at the moment of call must represent a value between 0 and 255 (inclusive). This rule also applies to return values and arguments passed on the stack.
- The classification algorithm considers vectors smaller than 8 bytes to have INTEGER class. 8-byte vectors of double are classified as MEMORY. 8-byte vectors of 64-bit integer element type are classified as INTEGER. Other 8-byte vectors are classified as SSE.
- For vectors larger than 8 bytes, the classification algorithm uses the rules from the standard psABI, including the rule that vectors larger than the maximum native vector size are classified as MEMORY. Just as with data layout, this means the calling convention for vector types larger than 16 bytes depends on the current target CPU features, and code may not interoperate between files compiled with different CPU features.
- The classification algorithm does not perform step (b) of the post-merger cleanup. Instead, after classification is otherwise complete (not during recursive classification), X87UP is converted to SSE when it does not follow X87. For example:

      typedef union { long double d; void *p; } odd_union;
      void f(odd_union u);

  The caller passes the first eight bytes of u in rdi, and passes the second eight bytes (the exponent bits of u.d) in the low bits of xmm0.
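The first two rules in the list above (integer promotion and the classification of vectors of 8 bytes or fewer) can be modeled in a few lines of C. This is an illustrative sketch of the rules as stated here, not compiler source; every type and function name in it is hypothetical, and vectors larger than 8 bytes are deliberately left to the standard psABI rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 32 bits the caller must place in the register for a signed short:
   the value sign-extended to 32 bits. */
uint32_t promoted_bits_short(int16_t v) {
    return (uint32_t)(int32_t)v;
}

/* Low 32 bits the caller must place in the register for an unsigned char:
   the value zero-extended to 32 bits. */
uint32_t promoted_bits_uchar(uint8_t v) {
    return (uint32_t)v;
}

typedef enum { ELEM_INTEGER, ELEM_FLOAT, ELEM_DOUBLE } elem_kind;

typedef enum {
    CLASS_INTEGER,
    CLASS_SSE,
    CLASS_MEMORY,
    CLASS_SEE_PSABI     /* larger than 8 bytes: not modeled by this sketch */
} arg_class;

/* Classify a vector argument of `count` elements of `elem_size` bytes each. */
arg_class classify_small_vector(elem_kind kind, size_t elem_size, size_t count) {
    size_t total = elem_size * count;
    if (total > 8) {
        return CLASS_SEE_PSABI;         /* standard psABI rules apply */
    }
    if (total < 8) {
        return CLASS_INTEGER;           /* vectors smaller than 8 bytes */
    }
    if (kind == ELEM_DOUBLE) {
        return CLASS_MEMORY;            /* 8-byte vector of double */
    }
    if (kind == ELEM_INTEGER && elem_size == 8) {
        return CLASS_INTEGER;           /* 8-byte vector of a 64-bit integer */
    }
    return CLASS_SSE;                   /* every other 8-byte vector */
}
```

Under this model a signed short holding -1 must arrive with low register bits 0xFFFFFFFF, a vector of two floats is classified as SSE, and a vector of four chars is classified as INTEGER.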
The psABI rules only apply to C, C++, and Objective-C calls. The Swift calling convention substantially differs from the psABI in ways that exceed the scope of this document to explain, beyond the register usage differences noted above.