LLVM IR and Tooling

Purpose

Guide agents through the LLVM IR pipeline: generating IR, running optimisation passes with opt, lowering to assembly with llc, and inspecting IR for debugging or performance work.

Triggers

- "Show me the LLVM IR for this function"
- "How do I run an LLVM optimisation pass?"
- "What does this LLVM IR instruction mean?"
- "How do I write a custom LLVM pass?"
- "Why isn't auto-vectorisation happening in LLVM?"

Workflow

1. Generate LLVM IR

Emit textual IR (.ll)

clang -O0 -emit-llvm -S src.c -o src.ll
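To try the command on something concrete, the sketch below writes a small input file and emits its IR. The file name `sum.c` and the function in it are hypothetical, chosen only for illustration. At -O0 locals are kept in stack slots, so the IR should contain several `alloca` instructions.

```shell
# Write a small C file to experiment with (hypothetical example input).
cat > sum.c <<'EOF'
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
EOF

# Emit textual IR only if clang is on PATH.
if command -v clang >/dev/null 2>&1; then
    clang -O0 -emit-llvm -S sum.c -o sum.ll
    # Count the stack slots clang created for the locals and parameters.
    grep -c 'alloca' sum.ll
else
    echo "clang not found; skipping IR generation" >&2
fi
```

The same `sum.ll` can then be used as input for the opt and llc commands in the later steps.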

Emit bitcode (.bc)

clang -O2 -emit-llvm -c src.c -o src.bc

Disassemble bitcode to text

llvm-dis src.bc -o src.ll

2. Run optimisation passes with opt

Apply a specific pass

opt -passes='mem2reg,instcombine,simplifycfg' src.ll -S -o out.ll
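One caveat when feeding -O0 IR to a pass list like this: clang marks every function `optnone` at -O0, and opt skips most passes on such functions, so the output can look unchanged. A minimal sketch of a workaround follows, assuming a hypothetical `square.c`; `-Xclang -disable-O0-optnone` keeps the unoptimised IR but omits the attribute.

```shell
# Hypothetical one-function input; any small C file would do.
cat > square.c <<'EOF'
int square(int x) { int y = x * x; return y; }
EOF

if command -v clang >/dev/null 2>&1 && command -v opt >/dev/null 2>&1; then
    # Unoptimised IR without the optnone attribute, so opt passes can act on it.
    clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S square.c -o square.ll
    opt -passes='mem2reg,instcombine,simplifycfg' square.ll -S -o square.opt.ll
    # Compare alloca counts before and after; mem2reg should remove
    # the promotable stack slots.
    grep -c 'alloca' square.ll
    grep -c 'alloca' square.opt.ll
else
    echo "clang or opt not found; skipping" >&2
fi
```

If the second count is not lower than the first, check whether the functions in the input still carry `optnone`.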

Standard optimisation pipelines

opt -passes='default<O2>' src.ll -S -o out.ll
opt -passes='default<O3>' src.ll -S -o out.ll

List available passes

opt --print-passes 2>&1 | less

Print IR before and after a pass

opt -passes='instcombine' --print-before=instcombine --print-after=instcombine src.ll -S -o out.ll 2>&1 | less

3. Lower IR to assembly with llc

Compile IR to object file

llc -filetype=obj src.ll -o src.o

Compile to assembly

llc -filetype=asm --x86-asm-syntax=intel src.ll -o src.s

Target a specific CPU

llc -mcpu=skylake -mattr=+avx2 src.ll -o src.s
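To find out which values -mcpu and -mattr accept for a given target, llc can print its own list. The sketch below assumes an x86-64 Linux triple, which is only an example; substitute the triple you are targeting. The output file name `cpu-help.txt` is likewise arbitrary.

```shell
# Assumed target triple; replace with your own.
triple=x86_64-unknown-linux-gnu

if command -v llc >/dev/null 2>&1; then
    # The CPU and feature list is written to stderr, so capture both streams.
    llc -mtriple="$triple" -mcpu=help < /dev/null > cpu-help.txt 2>&1
    head -n 20 cpu-help.txt
else
    echo "llc not found; no CPU list available" > cpu-help.txt
fi
```

If the chosen target was not built into your llc, the file will contain an error message instead of the list; `llc --version` shows which targets are registered.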

Show available targets

llc --version

4. Inspect IR

Key IR constructs to understand:

| Construct | Meaning |
|---|---|
| alloca | Stack allocation (pre-SSA; mem2reg promotes to registers) |
| load / store | Memory access |
| getelementptr (GEP) | Pointer arithmetic / field access |
| phi | SSA φ-node: merges values from predecessor blocks |
| call / invoke | Function call (invoke has exception edges) |
| icmp / fcmp | Integer/float comparison |
| br | Branch (conditional or unconditional) |
| ret | Return |
| bitcast | Reinterpret bits (no-op in codegen) |
| ptrtoint / inttoptr | Pointer↔integer (avoid where possible) |

5. Key passes

| Pass | Effect |
|---|---|
| mem2reg | Promote alloca to SSA registers |
| instcombine | Instruction combining / peephole |
| simplifycfg | CFG cleanup, dead block removal |
| loop-vectorize | Auto-vectorisation |
| slp-vectorize | Superword-level parallelism (straight-line vectorisation) |
| inline | Function inlining |
| gvn | Global value numbering (common subexpression elimination) |
| licm | Loop-invariant code motion |
| loop-unroll | Loop unrolling |
| argpromotion | Promote pointer args to values |
| sroa | Scalar Replacement of Aggregates |

6. Debugging missed optimisations

Why was a loop not vectorised?

clang -O2 -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize src.c
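A quick way to see what these remarks look like is to compile a loop that is hard to vectorise. The sketch below uses a hypothetical `prefix.c` whose loop reads the element written by the previous iteration; the file names are illustrative, and the exact remark wording varies between LLVM versions.

```shell
# Hypothetical input: each iteration depends on the one before it.
cat > prefix.c <<'EOF'
void prefix(float *a, int n) {
    for (int i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
EOF

if command -v clang >/dev/null 2>&1; then
    # Remarks are diagnostics on stderr; save them so they can be searched later.
    clang -O2 -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize \
        -c prefix.c -o prefix.o 2> prefix.remarks.txt
    cat prefix.remarks.txt
else
    echo "clang not found; skipping" >&2
fi
```

The analysis remark usually names the reason (for example a memory dependence), which tells you whether restructuring the loop or adding `restrict` is worth trying.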

Dump pass pipeline

clang -O2 -mllvm -debug-pass=Structure src.c -o /dev/null 2>&1
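The -debug-pass option belongs to the legacy pass manager, so on recent LLVM releases it tends to describe mainly the code generation pipeline. To see the middle-end pipeline that a 'default<O2>' run expands to, opt offers -print-pipeline-passes in reasonably recent versions. The sketch below assumes that flag is available in your build and uses a hypothetical `tiny.ll` as input.

```shell
# Minimal IR module to give opt something to parse (hypothetical input).
cat > tiny.ll <<'EOF'
define void @f() {
  ret void
}
EOF

if command -v opt >/dev/null 2>&1; then
    # Prints a -passes compatible string for the O2 pipeline.
    opt -passes='default<O2>' -print-pipeline-passes tiny.ll -o /dev/null
else
    echo "opt not found; skipping" >&2
fi
```

The printed string can be pasted back into -passes='...' and trimmed, which is a practical way to bisect which pass changes a given function.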