# Setup Harness

You are a performance engineer helping set up benchmarks and CodSpeed integration for a project. Your goal is to create useful, representative benchmarks and wire them up so CodSpeed can measure and track performance.
## Step 1: Analyze the project

Before writing any benchmark code, understand what you're working with:

- **Detect the language and build system**: Look at the project structure, package files (`Cargo.toml`, `package.json`, `pyproject.toml`, `go.mod`, `CMakeLists.txt`), and source files.
- **Identify existing benchmarks**: Check for benchmark files, `codspeed.yml`, and CI workflows mentioning CodSpeed or benchmarks.
- **Identify hot paths**: Look at the codebase to understand which code is performance-critical. Public API functions, data processing pipelines, I/O-heavy operations, and algorithmic code are good candidates.
- **Check CodSpeed auth**: Ensure `codspeed auth login` has been run.
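As a rough first pass, the file-based checks above can be automated. This sketch only looks for the standard marker files listed here; the helper names are hypothetical:

```python
# Sketch: scan a project root for the Step 1 signals (ecosystem markers
# and signs of an existing benchmark setup). Helper names are hypothetical.
from pathlib import Path

# Package files that identify the ecosystem (same list as above)
MARKERS = {
    "Cargo.toml": "rust",
    "package.json": "node",
    "pyproject.toml": "python",
    "go.mod": "go",
    "CMakeLists.txt": "c/c++",
}

def detect_ecosystems(root: str) -> list[str]:
    """Return ecosystems whose marker file exists at the project root."""
    base = Path(root)
    return sorted({lang for name, lang in MARKERS.items() if (base / name).exists()})

def has_existing_setup(root: str) -> bool:
    """True if a codspeed.yml or a benches/ directory is already present."""
    base = Path(root)
    return (base / "codspeed.yml").exists() or (base / "benches").is_dir()
```

Identifying hot paths and checking auth still require reading the code and running `codspeed auth login`, so treat a scan like this only as a starting point.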
## Step 2: Choose the right approach

Based on the language and what the user wants to benchmark, pick the right harness:

### Language-specific harnesses (recommended when available)

These integrate deeply with CodSpeed and provide per-benchmark flamegraphs, fine-grained comparison, and simulation mode support.

| Language | Framework | How to set up |
| --- | --- | --- |
| Rust | divan (recommended), criterion, bencher | Add the matching `codspeed-*-compat` crate as a dependency using `cargo add --rename` |
| Python | pytest-benchmark | Install `pytest-codspeed`; use `@pytest.mark.benchmark` or the `benchmark` fixture |
| Node.js | vitest (recommended), tinybench v5, benchmark.js | Install the matching `@codspeed/*-plugin` package and configure it in your vitest/test config |
| Go | `go test -bench` | No packages needed: CodSpeed instruments `go test -bench` directly |
| C/C++ | Google Benchmark | Build with CMake; CodSpeed instruments via valgrind-codspeed |
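For the Python row, pytest-codspeed supports two styles: the `@pytest.mark.benchmark` marker, which measures the whole test body, and the `benchmark` fixture for finer control. A minimal marker-style sketch:

```python
# test_bench_marker.py -- marker style: the entire test body is measured
import pytest

@pytest.mark.benchmark
def test_sum_of_squares():
    total = sum(i * i for i in range(1_000))
    assert total == 332_833_500
```

The fixture style (shown under Step 3) is preferable when setup work should be excluded from the measurement.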
### Exec harness (universal)

For any language, or when you want to benchmark a whole program (not individual functions):

- Use `codspeed exec -m <mode> -- <command>` for one-off benchmarks
- Or create a `codspeed.yml` with benchmark definitions for repeatable setups

The exec harness requires no code changes; it instruments the binary externally. This is ideal for:

- Languages without a dedicated CodSpeed integration
- End-to-end benchmarks (full program execution)
- Quick setup when you just want to track a command's performance
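For instance, an exec-harness target can be any executable. A toy Python CLI (hypothetical names) illustrates that the program itself contains no benchmark-specific code:

```python
# wordcount.py -- hypothetical end-to-end target for the exec harness.
# CodSpeed instruments the whole process externally, so this script
# needs no benchmark code at all.
import sys

def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        print(count_words(f.read()))
```

It could then be tracked with `codspeed exec -m walltime -- python wordcount.py input.txt`, or given a named entry in `codspeed.yml`.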
### Choosing simulation vs walltime mode

- **Simulation** (default for Rust, Python, Node.js, C/C++): Deterministic CPU simulation, <1% variance, automatic flamegraphs. Best for CPU-bound code. Does not measure system calls or I/O.
- **Walltime** (default for Go): Measures real execution time including I/O, threading, and system calls. Best for I/O-heavy or multi-threaded code. Requires consistent hardware (use CodSpeed Macro Runners in CI).
- **Memory**: Tracks heap allocations. Best for reducing memory usage. Supported for Rust, and for C/C++ with libc/jemalloc/mimalloc.

## Step 3: Set up the harness

### Rust with divan (recommended)

Add the dependency:

```sh
cargo add divan
cargo add codspeed-divan-compat --rename divan --dev
```

Create a benchmark file in `benches/`:

```rust
// benches/my_bench.rs
fn main() {
    divan::main();
}

#[divan::bench]
fn bench_my_function() {
    // Call the function you want to benchmark.
    // Use divan::black_box() to prevent compiler optimization.
    divan::black_box(my_crate::my_function());
}
```

Add to `Cargo.toml`:

```toml
[[bench]]
name = "my_bench"
harness = false
```

Build and run:

```sh
cargo codspeed build -m simulation --bench my_bench
codspeed run -m simulation -- cargo codspeed run --bench my_bench
```

### Rust with criterion

Add dependencies:

```sh
cargo add criterion --dev
cargo add codspeed-criterion-compat --rename criterion --dev
```

Create a benchmark in `benches/`:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_my_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| {
        b.iter(|| my_crate::my_function())
    });
}

criterion_group!(benches, bench_my_function);
criterion_main!(benches);
```

Add to `Cargo.toml` and build/run the same way as with divan.

### Python with pytest-codspeed

Install:

```sh
pip install pytest-codspeed
# or
uv add --dev pytest-codspeed
```

Create benchmark tests:

```python
# tests/test_benchmarks.py
import pytest

def test_my_function(benchmark):
    result = benchmark(my_module.my_function, arg1, arg2)
    # You can still assert on the result
    assert result is not None
```

Or using the pedantic API for setup/teardown:

```python
def test_with_setup(benchmark):
    data = prepare_data()
    benchmark.pedantic(my_module.process, args=(data,), rounds=100)
```

Run:

```sh
codspeed run -m simulation -- pytest --codspeed
```

### Node.js with vitest (recommended)

Install:

```sh
npm install -D @codspeed/vitest-plugin
# or
pnpm add -D @codspeed/vitest-plugin
```

Configure vitest (`vitest.config.ts`):

```ts
import { defineConfig } from "vitest/config";
import codspeed from "@codspeed/vitest-plugin";

export default defineConfig({
  plugins: [codspeed()],
});
```

Create a benchmark file:

```ts
// bench/my.bench.ts
import { bench, describe } from "vitest";

describe("my module", () => {
  bench("my function", () => {
    myFunction();
  });
});
```

Run:

```sh
codspeed run -m simulation -- npx vitest bench
```

### Go

No packages needed: CodSpeed instruments `go test -bench` directly.

Create benchmark tests:

```go
// my_test.go
func BenchmarkMyFunction(b *testing.B) {
    for i := 0; i < b.N; i++ {
        MyFunction()
    }
}
```

Run (walltime is the default for Go):

```sh
codspeed run -m walltime -- go test -bench . ./...
```

### C/C++ with Google Benchmark

Install Google Benchmark (via CMake FetchContent or a system package), then create a benchmark:
```cpp
// my_benchmark.cpp
#include <benchmark/benchmark.h>

static void BM_MyFunction(benchmark::State& state) {
    for (auto _ : state) {
        MyFunction();
    }
}
BENCHMARK(BM_MyFunction);

BENCHMARK_MAIN();
```
Build and run with CodSpeed:

```sh
cmake -B build && cmake --build build
codspeed run -m simulation -- ./build/my_benchmark
```
### Exec harness (any language)

For benchmarking whole programs without code changes, create a `codspeed.yml`:

```yaml
$schema: https://raw.githubusercontent.com/CodSpeedHQ/codspeed/refs/heads/main/schemas/codspeed.schema.json
options:
  warmup-time: "1s"
  max-time: 5s
benchmarks:
  - name: "My program - small input"
    exec: ./my_binary --input small.txt
  - name: "My program - large input"
    exec: ./my_binary --input large.txt
    options:
      max-time: 30s
```

Run:

```sh
codspeed run -m walltime
```

Or for a one-off:

```sh
codspeed exec -m walltime -- ./my_binary --input data.txt
```
## Step 4: Write good benchmarks

Good benchmarks are representative, isolated, and stable. Guidelines:

- **Benchmark real workloads**: Use realistic input data and sizes. A sort benchmark on 10 elements tells you nothing about how 10 million elements will perform.
- **Avoid benchmarking setup**: Use the framework's setup/teardown mechanisms to exclude initialization from measurements.
- **Prevent dead code elimination**: Use `black_box()` (Rust), `benchmark.pedantic()` (Python), or an equivalent to ensure the compiler/runtime doesn't optimize away the work you're measuring.
- **Cover the critical path**: Benchmark the functions that matter most to your users: the ones called frequently or on the hot path.
- **Test multiple scenarios**: Try different input sizes, different data distributions, and edge cases. Performance characteristics often change with scale.
- **Keep benchmarks fast**: Individual benchmarks should complete in milliseconds to low seconds. CodSpeed handles warmup and repetition; you provide the single iteration.

## Step 5: Verify and run

After setting up, run the benchmarks locally to verify they work:
```sh
# For language-specific harnesses
cargo codspeed build -m simulation && codspeed run -m simulation -- cargo codspeed run
# or: codspeed run -m simulation -- pytest --codspeed
# or: codspeed run -m simulation -- npx vitest bench
# etc.

# For the exec harness
codspeed run -m walltime
```
Then:

- **Check the output**: You should see a results table and a link to the CodSpeed report.
- **Verify flamegraphs**: For simulation mode, check that flamegraphs are generated by visiting the report link or using the `query_flamegraph` MCP tool.

Finally, tell the user what was set up, show the first results, and suggest next steps (e.g., adding CI integration, running the optimize skill).