Agent-Ready Codebase

Overview

When agents struggle with a codebase, they are reflecting and amplifying the codebase's existing weaknesses. This skill evaluates codebases against five principles that determine agent effectiveness, and provides concrete guidance to improve each one. It adapts to the project's language and stack.

Based on "AI Is Forcing Us To Write Good Code".

Mode Selection

Determine which mode to operate in based on context:

Audit

The user has an existing codebase and wants to know where it stands. Evaluate all five principles and produce a scorecard with specific findings.

Guide

The user wants to improve a specific principle or set up a new project. Provide targeted, actionable steps for their stack.

If the mode is unclear, ask.

The Five Principles

100% Test Coverage -- Force every line of code to demonstrate its behavior with an executable example

Thoughtful File Structure -- Make the filesystem a navigable interface for agents

End-to-End Types -- Eliminate illegal states and shrink the agent's search space

Fast, Ephemeral, Concurrent Dev Environments -- Keep feedback loops short and enable parallel agent workflows

Automated Enforcement -- Remove degrees of freedom from the agent via linters, formatters, and hooks

Audit Workflow

To audit a codebase, work through these steps:

1. Detect the Stack

Identify the primary language, test framework, build system, and database by examining project files (e.g. package.json, go.mod, Gemfile, pyproject.toml, Cargo.toml). This determines which tooling recommendations apply.
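The detection step can be sketched as a lookup from marker files to ecosystems. This is a minimal illustration, not part of the skill itself: the function name `detect_stacks` and the ecosystem labels are assumptions, and a real audit would also inspect the test framework, build system, and database configuration.

```python
from pathlib import Path

# Marker files named in the step above, mapped to the ecosystem each suggests.
# The labels on the right are illustrative assumptions, not an exhaustive list.
STACK_MARKERS = {
    "package.json": "JavaScript/TypeScript",
    "go.mod": "Go",
    "Gemfile": "Ruby",
    "pyproject.toml": "Python",
    "Cargo.toml": "Rust",
}


def detect_stacks(root: str) -> list[str]:
    """Return the ecosystems whose marker file exists directly under root."""
    root_path = Path(root)
    return sorted(
        stack
        for marker, stack in STACK_MARKERS.items()
        if (root_path / marker).is_file()
    )
```

A project can match more than one marker (for example a Go service with a JavaScript frontend), which is why the sketch returns a list rather than a single value.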

2. Evaluate Each Principle

Read references/checklist.md for detailed criteria per principle. For each principle, determine the current state:

Test Coverage -- Run or inspect coverage tooling. Look for CI enforcement. Report the current percentage and whether uncovered lines are identifiable.

File Structure -- Sample the directory tree. Measure file sizes. Flag catch-all files (utils, helpers, common). Assess whether filenames communicate domain purpose.

Type System -- Check for strict mode, semantic type names, API contract schemas, database constraints. Identify any/untyped gaps.

Dev Environments -- Check for single-command setup, test suite runtime, port/DB isolation, worktree or container support.

Automated Enforcement -- Check for linter/formatter configs, CI pipelines, git hooks, agent hooks.
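As one example of turning these checks into something measurable, the File Structure check could be approximated with a short scan. This is a rough sketch under stated assumptions: `audit_file_structure` is a hypothetical helper, the 500-line threshold simply mirrors the scorecard example, and the scan does not skip vendored or generated directories, which a real audit would want to exclude.

```python
from pathlib import Path

# Hypothetical defaults: tune both for the project being audited.
CATCH_ALL_STEMS = {"utils", "helpers", "common"}
MAX_LINES = 500


def audit_file_structure(root: str, max_lines: int = MAX_LINES) -> dict[str, list[str]]:
    """Collect catch-all files and oversized files found under root."""
    findings: dict[str, list[str]] = {"catch_all": [], "oversized": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # A file named utils.*, helpers.*, or common.* says nothing about its domain.
        if path.stem.lower() in CATCH_ALL_STEMS:
            findings["catch_all"].append(relative)
        # Decode leniently so binary or oddly encoded files do not abort the scan.
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            findings["oversized"].append(relative)
    return findings
```

The counts from both lists map directly onto a Key Finding such as "3 files over 500 lines, 2 catch-all utils files".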
3. Produce the Scorecard
Present findings as a table with one row per principle:
| Principle | Rating | Key Finding |
| --- | --- | --- |
| Test Coverage | Strong / Adequate / Weak | e.g. "87% coverage, no CI enforcement" |
| File Structure | Strong / Adequate / Weak | e.g. "3 files over 500 lines, 2 catch-all utils files" |
| Types | Strong / Adequate / Weak | e.g. "Strict TS, but no API schema generation" |
| Dev Environments | Strong / Adequate / Weak | e.g. "Manual 8-step setup, no concurrent support" |
| Enforcement | Strong / Adequate / Weak | e.g. "ESLint configured but not in CI" |
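If the scorecard is assembled programmatically, it may help to model each row as data so the rating can only be one of the three allowed values. The sketch below is one possible shape, not a prescribed format: `Rating`, `Finding`, and `render_scorecard` are hypothetical names, and the pipe-delimited output is an assumption about how the table is presented.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    """The three ratings used in the scorecard."""
    STRONG = "Strong"
    ADEQUATE = "Adequate"
    WEAK = "Weak"


@dataclass(frozen=True)
class Finding:
    principle: str
    rating: Rating
    key_finding: str


def render_scorecard(findings: list[Finding]) -> str:
    """Render a header plus one pipe-delimited row per principle."""
    rows = ["| Principle | Rating | Key Finding |", "| --- | --- | --- |"]
    rows += [
        f"| {item.principle} | {item.rating.value} | {item.key_finding} |"
        for item in findings
    ]
    return "\n".join(rows)
```

For example, `render_scorecard([Finding("Enforcement", Rating.WEAK, "ESLint configured but not in CI")])` yields the header, the separator, and a single Enforcement row.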
4. Prioritize Improvements
Rank the weakest principles and suggest concrete next steps for the top 2-3. Each recommendation should reference the project's actual stack and tooling.
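The ranking itself can be as simple as sorting by rating and keeping the first few entries. The following is a minimal sketch, assuming the scorecard is available as a mapping from principle name to rating string; `prioritize` and `RATING_ORDER` are hypothetical names, and ties are broken by scorecard order rather than by any notion of impact.

```python
# Lower number means weaker rating, which means higher priority.
RATING_ORDER = {"Weak": 0, "Adequate": 1, "Strong": 2}


def prioritize(scorecard: dict[str, str], limit: int = 3) -> list[str]:
    """Return up to `limit` principles that are not Strong, weakest first."""
    needs_work = [(name, rating) for name, rating in scorecard.items() if rating != "Strong"]
    # sorted() is stable, so equally rated principles keep their scorecard order.
    ranked = sorted(needs_work, key=lambda item: RATING_ORDER[item[1]])
    return [name for name, _ in ranked[:limit]]
```

In practice the tie-breaking deserves judgment: two Weak principles are rarely equally cheap to fix, so the ordering here is only a starting point for the recommendation.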
Guide Workflow
When guiding improvements to a specific principle:
1. Read references/checklist.md for the relevant section
2. Assess current state of that principle in the project
3. Provide a concrete, ordered list of changes for the project's stack
4. Where possible, show exact commands or config snippets
Key Insight: Why 100% Coverage
The most counterintuitive principle deserves emphasis. At 100% line coverage:
There is a phase change: uncovered lines are always from recent changes, removing all ambiguity about what needs testing
The coverage report becomes a simple todo list of tests still needed
It is not about proving "no bugs" -- it forces the author to demonstrate how every line behaves
Unreachable code surfaces immediately and gets deleted
Code reviews become easier because reviewers see concrete behavior examples
Once achieved, 100% is remarkably easy to maintain -- the coverage report enumerates exactly what lines need testing

Resources

references/checklist.md -- Detailed evaluation criteria for each of the five principles, including stack-specific tooling, key indicators (Strong/Adequate/Weak), and guidance. Load this file when performing an audit or providing detailed guidance on any principle.
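To illustrate the "coverage report as todo list" point from the Key Insight section, here is a small sketch that turns uncovered line numbers into todo entries. The input shape (a mapping from file path to uncovered line numbers) is an assumption; real coverage tools each report this differently, so an adapter would be needed. `to_ranges` and `coverage_todo` are hypothetical helper names.

```python
def to_ranges(lines: list[int]) -> list[tuple[int, int]]:
    """Collapse line numbers into inclusive (start, end) runs of consecutive lines."""
    ranges: list[tuple[int, int]] = []
    for line in sorted(set(lines)):
        if ranges and line == ranges[-1][1] + 1:
            # Extend the current run by one line.
            ranges[-1] = (ranges[-1][0], line)
        else:
            ranges.append((line, line))
    return ranges


def coverage_todo(uncovered: dict[str, list[int]]) -> list[str]:
    """Produce one todo entry per file and run of uncovered lines."""
    todo: list[str] = []
    for path in sorted(uncovered):
        for start, end in to_ranges(uncovered[path]):
            span = str(start) if start == end else f"{start}-{end}"
            todo.append(f"{path}:{span} needs a test")
    return todo
```

At 100% coverage this list is empty, so any entry that appears points at a recent change.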
