# Coding Cairo

Rules and patterns for writing efficient Cairo code. Sourced from audit findings and production profiling.

Attribution: Originally authored by feltroidprime (cairo-skills). Integrated with permission into starknet-agentic.

## When to Use

- Implementing arithmetic (modular, parity checks, quotient/remainder)
- Optimizing loops (slow iteration, repeated `.len()` calls, index-based access)
- Splitting or assembling integer limbs (u256 → u128, u32s → u128, felt252 → u96)
- Packing struct fields into storage slots
- Using BoundedInt for zero-overhead arithmetic with compile-time bounds
- Choosing integer types (u128 vs u256, BoundedInt vs native types)

Not for: profiling/benchmarking (use benchmarking-cairo if available).

## Quick Reference — All Rules
| # | Rule | Instead of | Use |
|---|------|------------|-----|
| 1 | Combined quotient+remainder | `x / m` + `x % m` | `DivRem::div_rem(x, m)` |
| 2 | Cheap loop conditions | `while i < n` | `while i != n` |
| 3 | Constant powers of 2 | `2_u32.pow(k)` | `match`-based lookup table |
| 4 | Pointer-based iteration | `*data.at(i)` in index loop | `pop_front` / `for` / `multi_pop_front` |
| 5 | Cache array length | `.len()` in loop condition | `let n = data.len();` before loop |
| 6 | Pointer-based slicing | Manual loop extraction | `span.slice(start, length)` |
| 7 | Cheap parity/halving | `index & 1`, `index / 2` | `DivRem::div_rem(index, 2)` |
| 8 | Smallest integer type | `u256` when range < 2^128 | `u128` (type encodes constraint) |
| 9 | Storage slot packing | One slot per field | `StorePacking` trait |
| 10 | BoundedInt for limbs | Bitwise ops / raw u128 math | `bounded_int::{div_rem, mul, add}` |
| 11 | Fast 2-input Poseidon | `poseidon_hash_span([x,y])` | `hades_permutation(x, y, 2)` |
## Always / Never Rules

### 1. Always use `DivRem::div_rem` — never separate `%` and `/`

Cairo computes the quotient and remainder in a single operation. Using both `%` and `/` on the same value doubles the cost.

```cairo
// BAD
let q = x / m;
let r = x % m;

// GOOD
let (q, r) = DivRem::div_rem(x, m);
```
### 2. Never use `<` or `>` in while loop conditions — use `!=`

Equality checks are cheaper than comparisons in Cairo.

```cairo
// BAD
while i < n

// GOOD
while i != n
```
### 3. Never compute `2^k` with `pow()` — use a lookup table

`u32::pow()` is expensive. Use a `match` lookup for known ranges.

```cairo
// BAD
let p = 2_u32.pow(depth.into());

// GOOD — match-based lookup
fn pow2(n: u32) -> u32 {
    match n {
        0 => 1, 1 => 2, 2 => 4, 3 => 8, 4 => 16, 5 => 32,
        6 => 64, 7 => 128, 8 => 256, 9 => 512, 10 => 1024,
        // extend as needed
        _ => core::panic_with_felt252('pow2 out of range'),
    }
}
```
### 4. Always iterate arrays with `pop_front` / `for` / `multi_pop_front` — never index-loop

Index-based access (`array.at(i)`) is more expensive than pointer-based iteration.

```cairo
// BAD
let mut i = 0;
while i != data.len() {
    let val = *data.at(i);
    i += 1;
}

// GOOD — pop_front
while let Option::Some(val) = data.pop_front()

// GOOD — for loop (equivalent)
for val in data

// GOOD — batch iteration
while let Option::Some(chunk) = data.multi_pop_front::<4>()
```
### 5. Never call `.len()` inside a loop condition — cache it

`.len()` recomputes every iteration. Store it once.

```cairo
// BAD
while i != data.len()

// GOOD
let n = data.len();
while i != n
```
### 6. Always use `span.slice()` instead of manual loop extraction

`slice()` manipulates pointers directly — no element-by-element copying.

```cairo
// BAD
let mut result = array![];
let mut i = 0;
while i != length {
    result.append(*data.at(start + i));
    i += 1;
}

// GOOD
let result = data.slice(start, length);
```
### 7. Always use `DivRem` for parity checks — never use bitwise ops

Bitwise AND is more expensive than `div_rem` in Cairo. Use `DivRem::div_rem(x, 2)` to get both the halved value and the parity in one operation.

```cairo
// BAD
let is_odd = (index & 1) == 1;
index = index / 2;

// GOOD
let (q, r) = DivRem::div_rem(index, 2);
if r == 1 { /* odd branch */ }
index = q;
```
### 8. Always use the smallest integer type that fits the value range

Use `u128` instead of `u256` when the range is known. This adds clarity and prevents intermediate overflow.

```cairo
// BAD — u256 for a value known to be < 2^128
fn deposit(value: u256)

// GOOD — type encodes the constraint
fn deposit(value: u128)
```
### 9. Always use `StorePacking` to pack small fields into one storage slot

Multiple small fields (basis points, flags, bounded amounts) can share a single `felt252` slot.

```cairo
use starknet::storage_access::StorePacking;

// Example struct — field widths (128-bit amount, 16-bit fee_bps) are
// inferred from the packing formula below.
struct MyStruct {
    amount: u128,
    fee_bps: u16,
}

const POW_2_128: felt252 = 0x100000000000000000000000000000000;

impl MyStorePacking of StorePacking<MyStruct, felt252> {
    fn pack(value: MyStruct) -> felt252 {
        value.amount.into() + value.fee_bps.into() * POW_2_128
    }

    fn unpack(value: felt252) -> MyStruct {
        let u256 { low, high } = value.into();
        MyStruct { amount: low, fee_bps: high.try_into().unwrap() }
    }
}
```
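The packing itself is plain base-2^128 positional encoding. A minimal Python model (field names and assumed widths mirror the Cairo snippet above) checks that the round trip is lossless:

```python
# Base-2^128 positional packing: the low limb holds amount, the high limb fee_bps.
# Field widths (128-bit amount, 16-bit fee_bps) are illustrative assumptions.
POW_2_128 = 1 << 128

def pack(amount: int, fee_bps: int) -> int:
    assert 0 <= amount < POW_2_128 and 0 <= fee_bps < 1 << 16
    return amount + fee_bps * POW_2_128

def unpack(value: int) -> tuple[int, int]:
    # mirror of `let u256 { low, high } = value.into()`
    return value % POW_2_128, value // POW_2_128

packed = pack(10**18, 250)
assert unpack(packed) == (10**18, 250)  # lossless round trip
```

Because the fields occupy disjoint bit ranges, pack/unpack is exact as long as each field stays within its declared width.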
### 10. Always use BoundedInt for byte cutting, limb assembly, and type conversions

Never use bitwise ops (`&`, `|`, shifts) or raw `u128`/`u256` arithmetic for splitting or combining integer limbs. Use `bounded_int::div_rem` to extract parts and `bounded_int::mul` + `bounded_int::add` to assemble them. BoundedInt tracks bounds at compile time, eliminating overflow checks.

**Assembling limbs** (e.g., 4 x u32 → u128):

```cairo
// BAD — direct u128 arithmetic (28,340 gas)
fn u32s_to_u128(d0: u32, d1: u32, d2: u32, d3: u32) -> u128 {
    d0.into() + d1.into() * POW_2_32 + d2.into() * POW_2_64 + d3.into() * POW_2_96
}

// GOOD — BoundedInt (13,840 gas, 2x faster)
fn u32s_to_u128(d0: u32, d1: u32, d2: u32, d3: u32) -> u128 {
    let d0_bi: u32_bi = upcast(d0);
    let d1_bi: u32_bi = upcast(d1);
    let d2_bi: u32_bi = upcast(d2);
    let d3_bi: u32_bi = upcast(d3);
    let r: u128_bi = add(
        add(add(d0_bi, mul(d1_bi, POW_32_UI)), mul(d2_bi, POW_64_UI)),
        mul(d3_bi, POW_96_UI),
    );
    upcast(r)
}
```

**Splitting values** (e.g., felt252 → two u96 limbs):

```cairo
// GOOD — div_rem to split, mul+add to reassemble
fn felt252_to_two_u96(value: felt252) -> (u96, u96) {
    match u128s_from_felt252(value) {
        U128sFromFelt252Result::Narrow(low) => {
            let (hi32, lo96) = bounded_int::div_rem(low, NZ_POW96_TYPED);
            (lo96, upcast(hi32))
        },
        U128sFromFelt252Result::Wide((high, low)) => {
            let (lo_hi32, lo96) = bounded_int::div_rem(low, NZ_POW96_TYPED);
            let hi64: BoundedInt<0, { POW64 - 1 }> = downcast(high).unwrap();
            (lo96, bounded_int::add(bounded_int::mul(hi64, POW32_TYPED), lo_hi32))
        },
    }
}
```

**Extracting bits** (e.g., building a 4-bit selector):

```cairo
// GOOD — div_rem by 2 extracts the LSB; the quotient is the right-shifted value
let (qu1, bit0) = bounded_int::div_rem(u1, TWO_NZ); // bit0 in [0, 1]
let (qu2, bit1) = bounded_int::div_rem(u2, TWO_NZ);
let selector = add(bit0, mul(bit1, TWO_UI)); // selector in [0, 3]
```

See `garaga/selectors.cairo` and `cairo-perfs-snippets` for production examples.
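Stripped of the compile-time bound tracking, the limb math above is ordinary positional arithmetic. A short Python model (constant names mirror the Cairo snippets) checks the assembly and split identities:

```python
# Plain-integer model of the limb math: assembly is a base-2^32 polynomial
# evaluation; splitting is a single divmod. BoundedInt adds only compile-time
# bound tracking on top of this arithmetic.
POW_2_32, POW_2_64, POW_2_96 = 1 << 32, 1 << 64, 1 << 96

def u32s_to_u128(d0: int, d1: int, d2: int, d3: int) -> int:
    # d0 is the least-significant limb
    return d0 + d1 * POW_2_32 + d2 * POW_2_64 + d3 * POW_2_96

def split_at_96(value: int) -> tuple[int, int]:
    # div_rem by 2^96: remainder = low 96-bit limb, quotient = high bits
    return divmod(value, POW_2_96)

x = u32s_to_u128(1, 2, 3, 4)
hi, lo96 = split_at_96(x)
assert hi == 4                    # top limb recovered
assert hi * POW_2_96 + lo96 == x  # mul + add reassembles the value
```

The reassembly identity (`quotient * divisor + remainder == value`) is exactly what `bounded_int::mul` + `bounded_int::add` compute after a `div_rem` split.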
### 11. Always use `hades_permutation` for 2-input Poseidon hashes

`poseidon_hash_span` has overhead for span construction. For exactly two inputs, use the permutation directly.

```cairo
// BAD
let hash = poseidon_hash_span(array![x, y].span());

// GOOD
let (hash, _, _) = hades_permutation(x, y, 2);
```
## Code Quality

- **DRY:** Extract repeated validation into helper functions. If two functions validate-then-write the same struct, extract a shared `_set_config()`.
- **`scarb fmt`:** Run before every commit.
- **`.tool-versions`:** Pin Scarb (2.15.1) and Starknet Foundry (0.56.0) versions with ASDF for reproducible builds.
- **Keep dependencies updated:** Newer Scarb/Foundry versions include gas optimizations and compiler improvements.
## BoundedInt Optimization

`BoundedInt` encodes value constraints in the type system, eliminating runtime overflow checks. Use the CLI tool (`bounded_int_calc.py` in this skill directory) to compute bounds — do NOT calculate them manually.
### Critical Architecture Decision: Avoid Downcast

The #1 optimization pitfall: converting between `u16`/`u32`/`u64` and `BoundedInt` at function boundaries.

#### The Problem

If your functions take `u16` and return `u16`, you must:

1. `downcast` the input to `BoundedInt` (expensive — requires a range check)
2. Do bounded arithmetic (cheap)
3. `upcast` the result back to `u16` (cheap but wasteful)

The `downcast` operation adds a range check that dominates the savings from bounded arithmetic. In profiling:

| Operation | Steps | Share |
|-----------|-------|-------|
| `downcast` | 161,280 | 18.86% |
| `bounded_int_div_rem` | 204,288 | 23.89% |

Total bounded approach: worse than the original!
#### The Solution: BoundedInt Throughout

Use `BoundedInt` types as function inputs AND outputs. This eliminates downcast entirely.

```cairo
// BAD: converts at every call (downcast overhead kills performance)
pub fn add_mod(a: u16, b: u16) -> u16 {
    let a: Zq = downcast(a).expect('overflow'); // EXPENSIVE
    let b: Zq = downcast(b).expect('overflow'); // EXPENSIVE
    let sum: ZqSum = add(a, b);
    let (_q, rem) = bounded_int_div_rem(sum, nz_q);
    upcast(rem)
}

// GOOD: BoundedInt in, BoundedInt out (no downcast)
pub fn add_mod(a: Zq, b: Zq) -> Zq {
    let sum: ZqSum = add(a, b);
    let (_q, rem) = bounded_int_div_rem(sum, nz_q);
    rem
}
```
#### Refactoring Strategy

When optimizing existing code:

1. **Identify the hot path** — profile to find which functions use modular arithmetic heavily
2. **Change signatures** — update function inputs/outputs to use `BoundedInt` types
3. **Propagate types outward** — callers must also use `BoundedInt`
4. **Downcast only at boundaries** — convert from u16/u32 only at system entry points (e.g., deserialization)
### Type Conversion Rules

| From | To | Operation | Cost |
|------|----|-----------|------|
| `u16` | `BoundedInt<0, 65535>` | `upcast` | Free (superset) |
| `u16` | `BoundedInt<0, 12288>` | `downcast` | Expensive (range check) |
| `BoundedInt<0, 12288>` | `u16` | `upcast` | Free (subset) |
| `BoundedInt<A, B>` | `BoundedInt<C, D>` where [A,B] ⊆ [C,D] | `upcast` | Free |
| `BoundedInt<A, B>` | `BoundedInt<C, D>` where [A,B] ⊄ [C,D] | `downcast` | Expensive |

Key insight: `upcast` only works when the target range is a superset of the source range. You cannot upcast `u32` to `BoundedInt<0, 150994944>` because the u32 max (4294967295) > 150994944.

### Prerequisites
`Scarb.toml`:

```toml
[dependencies]
corelib_imports = "0.1.2"
```

```cairo
// CORRECT imports — copy exactly
use corelib_imports::bounded_int::{
    BoundedInt, upcast, downcast, bounded_int_div_rem,
    AddHelper, MulHelper, DivRemHelper, UnitInt,
};
use corelib_imports::bounded_int::bounded_int::{SubHelper, add, sub, mul};
```
### Copy-Paste Template

Working example for modular arithmetic mod 100:

```cairo
use corelib_imports::bounded_int::{
    BoundedInt, upcast, downcast, bounded_int_div_rem,
    AddHelper, MulHelper, DivRemHelper, UnitInt,
};
use corelib_imports::bounded_int::bounded_int::{SubHelper, add, sub, mul};

type Val = BoundedInt<0, 99>;     // [0, 99]
type ValSum = BoundedInt<0, 198>; // [0, 198]
type ValConst = UnitInt<100>;     // singleton {100}

// Declare the result type of Val + Val: bounds [0+0, 99+99] = ValSum.
impl AddValImpl of AddHelper<Val, Val> {
    type Result = ValSum;
}
```
Compute bounds with `bounded_int_calc.py`:

```shell
# Addition: [a_lo, a_hi] + [b_lo, b_hi]
python3 bounded_int_calc.py add 0 12288 0 12288
# -> BoundedInt<0, 24576>

# Subtraction: [a_lo, a_hi] - [b_lo, b_hi]
python3 bounded_int_calc.py sub 0 12288 0 12288
# -> BoundedInt<-12288, 12288>

# Multiplication
python3 bounded_int_calc.py mul 0 12288 0 12288
# -> BoundedInt<0, 150994944>

# Division: quotient and remainder bounds
python3 bounded_int_calc.py div 0 24576 12289 12289
# -> DivT: BoundedInt<0, 1>, RemT: BoundedInt<0, 12288>

# Custom impl name
python3 bounded_int_calc.py mul 0 12288 0 12288 --name MulZqImpl
```
### BoundedInt Bounds Quick Reference

| Operation | Formula |
|-----------|---------|
| Add | [a_lo + b_lo, a_hi + b_hi] |
| Sub | [a_lo - b_hi, a_hi - b_lo] |
| Mul (unsigned) | [a_lo * b_lo, a_hi * b_hi] |
| Div quotient | [a_lo / b_hi, a_hi / b_lo] |
| Div remainder | [0, b_hi - 1] |
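The table is standard interval arithmetic; a small Python sketch of the same formulas reproduces the `bounded_int_calc.py` outputs shown earlier (the `mul` formula, as in the table, assumes unsigned operands):

```python
# The bounds table as interval arithmetic over (lo, hi) pairs.
def add_bounds(a, b): return (a[0] + b[0], a[1] + b[1])
def sub_bounds(a, b): return (a[0] - b[1], a[1] - b[0])
def mul_bounds(a, b): return (a[0] * b[0], a[1] * b[1])    # unsigned only
def div_bounds(a, b): return (a[0] // b[1], a[1] // b[0])  # floor division
def rem_bounds(b):    return (0, b[1] - 1)

zq = (0, 12288)
assert add_bounds(zq, zq) == (0, 24576)
assert sub_bounds(zq, zq) == (-12288, 12288)
assert mul_bounds(zq, zq) == (0, 150994944)
assert div_bounds((0, 24576), (12289, 12289)) == (0, 1)  # floors, not 2
assert rem_bounds((12289, 12289)) == (0, 12288)
```

Note how the quotient bound floors (`24576 // 12289 == 1`), which is the off-by-one trap called out in the mistakes list.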
### Negative Dividends: SHIFT Pattern

`bounded_int_div_rem` doesn't support negative lower bounds. When a subtraction produces a negative-bounded result that needs reduction, add a multiple of the modulus first:

```cairo
// sub_mod: (a - b) mod Q via SHIFT
pub fn sub_mod(a: Zq, b: Zq) -> Zq {
    let a_plus_q: BoundedInt<12289, 24577> = add(a, Q_CONST); // shift by +Q
    let diff: BoundedInt<1, 24577> = sub(a_plus_q, b); // now non-negative
    let (_q, rem) = bounded_int_div_rem(diff, nz_q());
    rem
}

// fused_sub_mul_mod: a - (b*c) mod Q via large SHIFT
// OFFSET = 12288 * Q = 151007232 (smallest multiple of Q >= max product)
pub fn fused_sub_mul_mod(a: Zq, b: Zq, c: Zq) -> Zq {
    let prod: ZqProd = mul(b, c);
    let a_offset: BoundedInt<151007232, 151019520> = add(a, OFFSET_CONST);
    let diff: BoundedInt<12288, 151019520> = sub(a_offset, prod);
    let (_q, rem) = bounded_int_div_rem(diff, nz_q());
    rem
}
```

Rule: SHIFT = `ceil(|min_possible_value| / modulus) * modulus`. Adding SHIFT preserves the result mod Q (since SHIFT ≡ 0 mod Q) while making all values non-negative.
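The SHIFT rule can be checked numerically. This Python sketch computes SHIFT for the worst case of `a - b*c` with Q = 12289 (the modulus used in the examples above) and verifies that adding it leaves the result unchanged mod Q:

```python
# SHIFT = ceil(|min_possible_value| / Q) * Q, the smallest multiple of the
# modulus that makes the worst-case dividend non-negative.
Q = 12289

def shift_for(min_possible: int, q: int) -> int:
    if min_possible >= 0:
        return 0  # already non-negative, no shift needed
    a = -min_possible
    return ((a + q - 1) // q) * q  # ceil(a / q) * q via integer division

# Worst case of a - b*c with a, b, c in [0, 12288]: 0 - 12288*12288
OFFSET = shift_for(0 - 12288 * 12288, Q)
assert OFFSET == 151007232  # = 12288 * Q, matching the Cairo comment
assert OFFSET % Q == 0      # SHIFT ≡ 0 (mod Q): the residue is unchanged
a, b, c = 5, 1000, 2000
assert (a + OFFSET - b * c) % Q == (a - b * c) % Q
assert a + OFFSET - b * c >= 0  # dividend is now non-negative
```

The same helper can size the `BoundedInt` annotation: the shifted dividend's bounds are the original bounds plus OFFSET.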
### Common BoundedInt Mistakes

- **Downcast at every function call** — the biggest performance killer. Use `BoundedInt` types throughout, not just inside arithmetic functions.
- **Trying to upcast to a narrower type** — `upcast(val: u32)` to `BoundedInt<0, 150994944>` fails because the u32 max > 150994944.
- **Wrong imports** — use the exact imports from the Prerequisites section above.
- **Wrong subtraction bounds** — it's `[a_lo - b_hi, a_hi - b_lo]`, NOT `[a_lo - b_lo, a_hi - b_hi]`.
- **Negative dividend in `bounded_int_div_rem`** — div_rem doesn't support negative lower bounds. Add a SHIFT (a multiple of the modulus) before reducing. See the SHIFT pattern above.
- **Missing intermediate types** — always annotate: `let sum: ZqSum = add(a, b);`
- **Division quotient off-by-one** — integer division floors: `24576 / 12289 = 1`, not 2.
- **Using `UnitInt` vs `BoundedInt` for constants** — use `UnitInt` (a single-value type, e.g. `UnitInt<100>` for the singleton {100}) for compile-time constants.
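The upcast-vs-downcast mistakes above all reduce to one interval-containment check: a conversion is a free upcast exactly when the target range contains the source range. A hypothetical Python sketch of that decision:

```python
# Interval containment decides whether a conversion is a free upcast or an
# expensive downcast (the "upcast to a narrower type" mistake). Ranges are
# modeled as (lo, hi) pairs; the function name is illustrative only.
def conversion(src: tuple[int, int], dst: tuple[int, int]) -> str:
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    if dst_lo <= src_lo and src_hi <= dst_hi:
        return "upcast"    # dst contains src: no range check needed
    return "downcast"      # otherwise a runtime range check is required

U16 = (0, 65535)
U32 = (0, 4294967295)
assert conversion((0, 12288), U16) == "upcast"        # Zq fits in u16
assert conversion(U16, (0, 12288)) == "downcast"      # u16 may exceed 12288
assert conversion(U32, (0, 150994944)) == "downcast"  # u32 max is too large
```

This mirrors the Type Conversion Rules table: [A,B] ⊆ [C,D] means upcast, anything else is a downcast with a range check.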