x86-64 Assembly Purpose Guide agents through x86-64 assembly: reading compiler output, understanding the ABI, writing inline asm, and common patterns. Triggers "How do I read the assembly GCC generated?" "What are the x86-64 registers?" "What is the calling convention on Linux/macOS?" "How do I write inline assembly in C?" "How do I use SSE/AVX intrinsics?" "This assembly uses %rsp / %rbp — what does it mean?" Workflow 1. Generate and read assembly

AT&T syntax (GCC default)

gcc -S -O2 -fverbose-asm foo.c -o foo.s

Intel syntax

gcc -S -masm = intel -O2 foo.c -o foo.s

From GDB

( gdb ) disassemble /s main

with source

( gdb ) x/20i $rip

From objdump

objdump -d -M intel -S prog

Intel + source (needs -g)

x86-64 registers 64-bit 32-bit 16-bit 8-bit high 8-bit low Purpose %rax %eax %ax %ah %al Return value / accumulator %rbx %ebx %bx %bh %bl Callee-saved %rcx %ecx %cx %ch %cl 4th arg / count %rdx %edx %dx %dh %dl 3rd arg / 2nd return %rsi %esi %si — %sil 2nd arg %rdi %edi %di — %dil 1st arg %rbp %ebp %bp — %bpl Frame pointer (callee-saved) %rsp %esp %sp — %spl Stack pointer %r8 – %r11 %r8d – %r11d %r8w – %r11w — %r8b – %r11b 5th–8th args / caller-saved %r12 – %r15 %r12d – %r15d %r12w – %r15w — %r12b – %r15b Callee-saved %rip Instruction pointer %rflags %eflags Status flags %xmm0 – %xmm7 FP/SIMD args and return %xmm8 – %xmm15 Caller-saved SIMD %ymm0 – %ymm15 AVX 256-bit %zmm0 – %zmm31 AVX-512 512-bit
System V AMD64 ABI (Linux, macOS, FreeBSD) Integer/pointer argument registers (in order): %rdi, %rsi, %rdx, %rcx, %r8, %r9 Floating-point argument registers: %xmm0 – %xmm7 Return values: Integer: %rax (low), %rdx (high if 128-bit) Float: %xmm0 (low), %xmm1 (high) Caller-saved (scratch): %rax, %rcx, %rdx, %rsi, %rdi, %r8–%r11, %xmm0–%xmm15 Callee-saved (must preserve): %rbx, %rbp, %r12–%r15 Stack: 16-byte aligned before call ; call pushes 8 bytes → 16-byte aligned at function entry after prologue. Red zone: 128 bytes below %rsp may be used by leaf functions without adjusting %rsp . Not available in kernel/signal handlers.
Common instruction patterns Pattern Meaning mov %rdi, %rax Copy rdi to rax mov (%rdi), %rax Load 8 bytes from address in rdi mov %rax, 8(%rdi) Store rax to rdi+8 lea 8(%rdi), %rax Load effective address rdi+8 into rax (no memory access) push %rbx Push rbx; rsp -= 8 pop %rbx Pop into rbx; rsp += 8 call foo Push return addr; jmp foo ret Pop return addr; jmp to it xor %eax, %eax Zero rax (smaller encoding than mov $0, %rax ) test %rax, %rax Set ZF if rax == 0 (cheaper than cmp $0, %rax ) cmp $5, %rdi Set flags for rdi - 5 jl label Jump if signed less than
AT&T vs Intel syntax Feature AT&T Intel Operand order source, dest dest, source Register prefix %rax rax Immediate prefix $42 42 Memory operand 8(%rdi) [rdi+8] Size suffix movl , movq — (inferred) GCC emits AT&T by default. Use -masm=intel for Intel syntax.
Inline assembly (GCC extended asm) // Basic: increment a register int x = 5 ; asm volatile ( "incl %0" : "=r" ( x ) // outputs: =r means write-only register : "0" ( x ) // inputs: 0 means same as output 0 : // clobbers: none ) ; // CPUID example uint32_t eax , ebx , ecx , edx ; asm volatile ( "cpuid" : "=a" ( eax ) , "=b" ( ebx ) , "=c" ( ecx ) , "=d" ( edx ) : "a" ( 1 ) // input: leaf 1 ) ; // Atomic increment static inline int atomic_inc ( volatile int * p ) { int ret ; asm volatile ( "lock; xaddl %0, %1" : "=r" ( ret ) , "+m" ( * p ) : "0" ( 1 ) : "memory" ) ; return ret + 1 ; } Constraint codes: "r" — any general register "m" — memory operand "i" — immediate integer "a" , "b" , "c" , "d" — specific registers (%rax, %rbx, %rcx, %rdx) "=" prefix — output (write-only) "+" prefix — read-write "memory" clobber — tells compiler memory may be modified (barrier)
SSE/AVX intrinsics (preferred over inline asm)

include // includes all x86 SIMD headers // Add 8 floats at once with AVX __m256 a = _mm256_loadu_ps ( arr_a ) ; // load 8 floats (unaligned) __m256 b = _mm256_loadu_ps ( arr_b ) ; __m256 c = _mm256_add_ps ( a , b ) ; _mm256_storeu_ps ( result , c ) ; Check CPU support at compile time: -mavx2 or -march=native . Check at runtime: __builtin_cpu_supports("avx2") . For a full register and instruction reference, see references/reference.md .

assembly-x86

安装