- Harness — Long-Running Agent Framework
- Executable protocol enabling any agent task to run continuously across multiple sessions with automatic progress recovery, task dependency resolution, failure rollback, and standardized error handling.
- Design Principles
- Design for the agent, not the human: test output, docs, and task structure are the agent's primary interface.
- Progress files ARE the context: when the context window resets, progress files + git history = full recovery.
- Premature completion is the #1 failure mode: structured task lists with explicit completion criteria prevent declaring victory early.
- Standardize everything grep-able: ERROR on the same line, structured timestamps, consistent prefixes.
- Fast feedback loops: pre-compute stats, run smoke tests before full validation.
- Idempotent everything: init scripts, task execution, and environment setup must all be safe to re-run.
- Fail safe, not fail silent: every failure must have an explicit recovery strategy.
- Commands
- /harness init # Initialize harness files in project
- /harness run # Start/resume the infinite loop
- /harness status # Show current progress and stats
- /harness add "task description" # Add a task to the list
- Activation Marker
- Hooks only take effect when the .harness-active marker file exists in the harness root (same directory as harness-tasks.json).
- /harness init and /harness run MUST create this marker: touch <harness-root>/.harness-active
- When all tasks complete (no pending/in_progress/retryable left), remove it: rm <harness-root>/.harness-active
- Without this marker, all hooks are no-ops: they exit 0 immediately.
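As a minimal sketch of the marker check, a hook could begin with a guard like the one below. The function name `harness_is_active` and the `HARNESS_ROOT` variable are illustrative assumptions, not names defined by the harness.

```shell
# Hypothetical helper: succeeds only when the activation marker exists
# next to harness-tasks.json in the given harness root.
harness_is_active() {
  root="${1:-$PWD}"
  [ -f "$root/.harness-active" ]
}

# A hook script would typically start with:
#   harness_is_active "$HARNESS_ROOT" || exit 0   # no-op when inactive
```

Keeping the check in one function means every hook applies the same rule, so removing the marker disables all hooks at once.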
- Progress Persistence (Dual-File System)
- Maintain two files in the project working directory:
- harness-progress.txt (Append-Only Log)
- Free-text log of all agent actions across sessions. Never truncate.
    [2025-07-01T10:00:00Z] [SESSION-1] INIT Harness initialized for project /path/to/project
    [2025-07-01T10:00:05Z] [SESSION-1] INIT Environment health check: PASS
    [2025-07-01T10:00:10Z] [SESSION-1] LOCK acquired (pid=12345)
    [2025-07-01T10:00:11Z] [SESSION-1] Starting [task-001] Implement user authentication (base=def5678)
    [2025-07-01T10:05:00Z] [SESSION-1] CHECKPOINT [task-001] step=2/4 "auth routes created, tests pending"
    [2025-07-01T10:15:30Z] [SESSION-1] Completed [task-001] (commit abc1234)
    [2025-07-01T10:15:31Z] [SESSION-1] Starting [task-002] Add rate limiting (base=abc1234)
    [2025-07-01T10:20:00Z] [SESSION-1] ERROR [task-002] [TASK_EXEC] Redis connection refused
    [2025-07-01T10:20:01Z] [SESSION-1] ROLLBACK [task-002] git reset --hard abc1234
    [2025-07-01T10:20:02Z] [SESSION-1] STATS tasks_total=5 completed=1 failed=1 pending=3 blocked=0 attempts_total=2 checkpoints=1
- harness-tasks.json (Structured State)
    {
      "version": 2,
      "created": "2025-07-01T10:00:00Z",
      "session_config": {
        "concurrency_mode": "exclusive",
        "max_tasks_per_session": 20,
        "max_sessions": 50
      },
      "tasks": [
        {
          "id": "task-001",
          "title": "Implement user authentication",
          "status": "completed",
          "priority": "P0",
          "depends_on": [],
          "attempts": 1,
          "max_attempts": 3,
          "started_at_commit": "def5678",
          "validation": {
            "command": "npm test -- --testPathPattern=auth",
            "timeout_seconds": 300
          },
          "on_failure": {
            "cleanup": null
          },
          "error_log": [],
          "checkpoints": [],
          "completed_at": "2025-07-01T10:15:30Z"
        },
        {
          "id": "task-002",
          "title": "Add rate limiting",
          "status": "failed",
          "priority": "P1",
          "depends_on": [],
          "attempts": 1,
          "max_attempts": 3,
          "started_at_commit": "abc1234",
          "validation": {
            "command": "npm test -- --testPathPattern=rate-limit",
            "timeout_seconds": 120
          },
          "on_failure": {
            "cleanup": "docker compose down redis"
          },
          "error_log": [
            "[TASK_EXEC] Redis connection refused"
          ],
          "checkpoints": [],
          "completed_at": null
        },
        {
          "id": "task-003",
          "title": "Add OAuth providers",
          "status": "pending",
          "priority": "P1",
          "depends_on": [
            "task-001"
          ],
          "attempts": 0,
          "max_attempts": 3,
          "started_at_commit": null,
          "validation": {
            "command": "npm test -- --testPathPattern=oauth",
            "timeout_seconds": 180
          },
          "on_failure": {
            "cleanup": null
          },
          "error_log": [],
          "checkpoints": [],
          "completed_at": null
        }
      ],
      "session_count": 1,
      "last_session": "2025-07-01T10:20:02Z"
    }
- Task statuses: pending → in_progress (transient, set only during active execution) → completed or failed. A task found as in_progress at session start means the previous session was interrupted; handle it via the Context Window Recovery Protocol.
- In concurrent mode (see Concurrency Control), tasks may also carry claim metadata: claimed_by and lease_expires_at (ISO timestamp).
- Session boundary
- A session starts when the agent begins executing the Session Start protocol and ends when a Stopping Condition is met or the context window resets. Each session gets a unique SESSION-N identifier (N = session_count after increment).
- Concurrency Control
- Before modifying harness-tasks.json, acquire an exclusive lock using portable mkdir (atomic on all POSIX systems, works on both macOS and Linux):
    # Acquire lock (fail fast if another agent is running)
    # Lock key must be stable even if invoked from a subdirectory.
    ROOT="$PWD"
    SEARCH="$PWD"
    while [ "$SEARCH" != "/" ] && [ ! -f "$SEARCH/harness-tasks.json" ]; do
      SEARCH="$(dirname "$SEARCH")"
    done
    if [ -f "$SEARCH/harness-tasks.json" ]; then
      ROOT="$SEARCH"
    fi
    PWD_HASH="$(printf '%s' "$ROOT" | (shasum -a 256 2>/dev/null || sha256sum 2>/dev/null) | awk '{print $1}' | cut -c1-16)"
    LOCKDIR="/tmp/harness-${PWD_HASH:-unknown}.lock"
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
      # Check if lock holder is still alive
      LOCK_PID=$(cat "$LOCKDIR/pid" 2>/dev/null)
      if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then
        echo "ERROR: Another harness session is active (pid=$LOCK_PID)"; exit 1
      fi
      # Stale lock: atomically reclaim via mv to avoid TOCTOU race
      STALE="$LOCKDIR.stale.$$"
      if mv "$LOCKDIR" "$STALE" 2>/dev/null; then
        rm -rf "$STALE"
        mkdir "$LOCKDIR" || { echo "ERROR: Lock contention"; exit 1; }
        echo "WARN: Removed stale lock${LOCK_PID:+ from pid=$LOCK_PID}"
      else
        echo "ERROR: Another agent reclaimed the lock"; exit 1
      fi
    fi
    echo "$$" > "$LOCKDIR/pid"
    trap 'rm -rf "$LOCKDIR"' EXIT
- Log lock acquisition: [timestamp] [SESSION-N] LOCK acquired (pid=<PID>)
- Log lock release: [timestamp] [SESSION-N] LOCK released
- Modes:
- Exclusive (default): hold the lock for the entire session (the trap EXIT handler releases it automatically). Any second session in the same state root fails fast.
- Concurrent (opt-in via session_config.concurrency_mode: "concurrent"): treat this as a state transaction lock. Hold it only while reading/modifying/writing harness-tasks.json (including .bak/.tmp) and appending to harness-progress.txt. Release it immediately before doing real work.
- Concurrent mode invariants:
- All workers MUST point at the same state root (the directory that contains harness-tasks.json). If you are using separate worktrees/clones, pin it explicitly (e.g., HARNESS_STATE_ROOT=/abs/path/to/state-root).
- Task selection is advisory; the real gate is the atomic claim under the lock: set status="in_progress", set claimed_by (stable worker id, e.g., HARNESS_WORKER_ID), set lease_expires_at. If the claim fails (already in_progress with a valid lease), pick another eligible task and retry.
- Never run two workers in the same git working directory. Use separate worktrees/clones. Otherwise rollback (git reset --hard / git clean -fd) will destroy the other workers' uncommitted work.
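The claim rule depends on deciding whether a lease is still valid. A minimal sketch follows, assuming lease_expires_at is always written in UTC in the fixed form YYYY-MM-DDTHH:MM:SSZ, so that plain string comparison orders timestamps correctly. The function names `lease_is_valid` and `is_claimed` are hypothetical.

```shell
# lease_is_valid LEASE [NOW]
# True when LEASE is a non-empty timestamp later than NOW (default: current UTC time).
lease_is_valid() {
  lease="$1"
  now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  [ -n "$lease" ] && [ "$lease" \> "$now" ]
}

# is_claimed STATUS LEASE [NOW]
# True when the task is in_progress and its lease has not expired,
# meaning another worker owns it and this worker should pick a different task.
is_claimed() {
  [ "$1" = "in_progress" ] && lease_is_valid "$2" "$3"
}
```

A worker would run this check while holding the state lock, so that reading the lease and writing the new claim happen as one transaction.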
- Infinite Loop Protocol
- Session Start (Execute Every Time)
- Read state: Read the last 200 lines of harness-progress.txt plus the full harness-tasks.json. If the JSON is unparseable, see JSON corruption recovery in Error Handling.
- Read git: Run git log --oneline -20 and git diff --stat to detect uncommitted work.
- Acquire lock (mode-dependent): Exclusive mode fails if another session is active. Concurrent mode uses the lock only for state transactions.
- Recover interrupted tasks (see Context Window Recovery below).
- Health check: Run harness-init.sh if it exists.
- Track session: Increment session_count in JSON. Check session_count against max_sessions; if reached, log STATS and STOP. Initialize the per-session task counter to 0.
- Pick next task using the Task Selection Algorithm below.
- Task Selection Algorithm
- Before selecting, run dependency validation:
- Cycle detection: For each non-completed task, walk depends_on transitively. If any task appears in its own chain, mark it failed with [DEPENDENCY] Circular dependency detected: task-A -> task-B -> task-A. Self-references (depends_on includes own id) are also cycles.
- Blocked propagation: If a task's depends_on includes a task that is failed and will never be retried (either attempts >= max_attempts OR its error_log contains a [DEPENDENCY] entry), mark the blocked task as failed with [DEPENDENCY] Blocked by failed task-XXX. Repeat until no more tasks can be propagated.
- Then pick the next task in this priority order:
- Tasks with status: "pending" where ALL depends_on tasks are completed, sorted by priority (P0 > P1 > P2), then by id (lowest first)
- Tasks with status: "failed" where attempts < max_attempts and ALL depends_on are completed, sorted by priority, then oldest failure first
- If no eligible tasks remain → log final STATS → STOP
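The two-pass pick order can be sketched with awk over a flattened view of the task list. The pipe-separated input format (id|status|priority|attempts|max_attempts|comma-separated depends_on) and the function name `pick_next` are assumptions made for this example; the harness itself stores tasks as JSON. The sketch skips cycle detection and blocked propagation, and in the retry pass it breaks priority ties by id because the flattened view carries no failure timestamp.

```shell
# pick_next: reads task lines on stdin, prints the id to run next, or NONE.
pick_next() {
  awk -F'|' '
    { ids[NR] = $1; status[$1] = $2; prio[$1] = $3
      att[$1] = $4; maxatt[$1] = $5; deps[$1] = $6; n = NR }

    # True when every dependency of task t is completed.
    function deps_done(t,    k, m, d) {
      m = split(deps[t], d, ",")
      for (k = 1; k <= m; k++)
        if (status[d[k]] != "completed") return 0
      return 1
    }

    # True when task a should be preferred over the current best b.
    function better(a, b) {
      if (b == "") return 1
      if (prio[a] != prio[b]) return (prio[a] < prio[b])
      return (a < b)
    }

    END {
      # Pass 1: pending tasks. Pass 2: failed tasks that still have attempts left.
      for (pass = 1; pass <= 2; pass++) {
        best = ""
        for (i = 1; i <= n; i++) {
          t = ids[i]
          if (pass == 1) { ok = (status[t] == "pending") }
          else { ok = (status[t] == "failed" && att[t] + 0 < maxatt[t] + 0) }
          if (ok && deps_done(t) && better(t, best)) best = t
        }
        if (best != "") { print best; exit }
      }
      print "NONE"
    }'
}
```

Fed the three tasks from the example harness-tasks.json, the sketch selects task-003 first, because pending tasks are considered before retryable failures.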
- Task Execution Cycle
- For each task, execute this exact sequence:
- Claim (atomic, under lock): Record started_at_commit = current HEAD hash. Set status to in_progress, set claimed_by, set lease_expires_at, and log Starting [<task-id>] <title> (base=<hash>). If the task is already claimed (in_progress with a valid lease), pick another eligible task and retry.
- Execute with checkpoints: Perform the work. After each significant step, log: [timestamp] [SESSION-N] CHECKPOINT [task-id] step=M/N "description of what was done". Also append to the task's checkpoints array: { "step": M, "total": N, "description": "...", "timestamp": "ISO" }. In concurrent mode, renew the lease at each checkpoint (push lease_expires_at forward).
- Validate: Run the task's validation.command with a timeout wrapper (prefer timeout; on macOS use gtimeout from coreutils). If validation.command is empty/null, log ERROR [<task-id>] [CONFIG] Missing validation.command and STOP; do not declare completion without an objective check. Before running, verify the command exists (e.g., command -v <cmd>); if it is missing, treat this as an ENV_SETUP error.
  - Command exits 0 → PASS
  - Command exits non-zero → FAIL
  - Command exceeds timeout → TIMEOUT
- Record outcome:
  - Success: set status=completed, set completed_at, log Completed [<task-id>] (commit <hash>), git commit
  - Failure: increment attempts, append the error to error_log. Verify started_at_commit exists via git cat-file -t <started_at_commit>; if missing, mark failed at max_attempts. Otherwise execute git reset --hard <started_at_commit> and git clean -fd to roll back ALL commits and remove untracked files. Execute on_failure.cleanup if defined. Log ERROR [<task-id>] [<category>] <message>. Set status=failed (Task Selection Algorithm pass 2 handles retries when attempts < max_attempts).
- Track: Increment the per-session task counter. If max_tasks_per_session is reached, log STATS and STOP.
- Continue: Immediately pick the next task (zero idle time).
- Stopping Conditions
- All tasks completed
- All remaining tasks failed at max_attempts or blocked by failed dependencies
- session_config.max_tasks_per_session reached for this session
- session_config.max_sessions reached across all sessions
- User interrupts
- Context Window Recovery Protocol
- When a new session starts and finds a task with status: "in_progress":
- Exclusive mode: treat this as an interrupted previous session and run the Recovery Protocol below.
- Concurrent mode: only recover a task if either (a) claimed_by matches this worker, or (b) lease_expires_at is in the past (stale lease). Otherwise, treat it as owned by another worker and do not modify it.
- Check git state:
    git diff --stat        # Uncommitted changes?
    git log --oneline -5   # Recent commits since task started?
    git stash list         # Any stashed work?
- Check checkpoints: Read the task's checkpoints array to determine the last completed step.
- Decision matrix (verify recent commits belong to this task by checking commit messages for the task-id):

| Uncommitted? | Recent task commits? | Checkpoints? | Action |
|---|---|---|---|
| No | No | None | Mark failed with [SESSION_TIMEOUT] No progress detected, increment attempts |
| No | No | Some | Verify file state matches checkpoint claims. If files reflect checkpoint progress, resume from last step. If not, mark failed (work was lost) |
| No | Yes | Any | Run validation.command. If passes → mark completed. If fails → git reset --hard <started_at_commit>, mark failed |
| Yes | No | Any | Run validation WITH uncommitted changes present. If passes → commit, mark completed. If fails → git reset --hard <started_at_commit> + git clean -fd, mark failed |
| Yes | Yes | Any | Commit uncommitted changes, run validation.command. If passes → mark completed. If fails → git reset --hard <started_at_commit> + git clean -fd, mark failed |
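The matrix above maps three observations to one action, which can be sketched as a lookup. The function name `recovery_action` and the action labels it prints are illustrative only; they are not identifiers defined by the harness.

```shell
# recovery_action UNCOMMITTED TASK_COMMITS CHECKPOINTS
#   UNCOMMITTED, TASK_COMMITS: "yes" or "no"
#   CHECKPOINTS: "none" or "some"
# Prints a label for the matrix row that applies.
recovery_action() {
  case "$1:$2:$3" in
    no:no:none)                echo "fail_no_progress" ;;
    no:no:some)                echo "verify_files_then_resume" ;;
    no:yes:none|no:yes:some)   echo "validate_committed" ;;
    yes:no:none|yes:no:some)   echo "validate_uncommitted" ;;
    yes:yes:none|yes:yes:some) echo "commit_then_validate" ;;
    *)                         echo "invalid_input"; return 2 ;;
  esac
}
```

For example, `recovery_action no yes some` prints `validate_committed`, which corresponds to running validation.command against the commits already made for the task.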
- Log recovery: [timestamp] [SESSION-N] RECOVERY [task-id] action="<action>" reason="<reason>"
- Error Handling & Recovery Strategies
- Each error category has a default recovery strategy:

| Category | Default Recovery | Agent Action |
|---|---|---|
| ENV_SETUP | Re-run init, then STOP if still failing | Run harness-init.sh again immediately. If it fails twice, log and stop: the environment is broken |
| CONFIG | STOP (requires human fix) | Log the config error precisely (file + field), then STOP. Do not guess or auto-mutate task metadata |
| TASK_EXEC | Rollback via git reset --hard <started_at_commit>, retry | Verify started_at_commit exists (git cat-file -t <started_at_commit>). If missing, mark failed at max_attempts. Otherwise reset, run on_failure.cleanup if defined, retry if attempts < max_attempts |
| TEST_FAIL | Rollback via git reset --hard <started_at_commit>, retry | Reset to started_at_commit, analyze test output to identify the fix, retry with targeted changes |
| TIMEOUT | Kill process, execute cleanup, retry | Wrap validation with timeout. On timeout, run on_failure.cleanup, retry (consider splitting the task if repeated) |
| DEPENDENCY | Skip task, mark blocked | Log which dependency failed, mark task as failed with dependency reason |
| SESSION_TIMEOUT | Use Context Window Recovery Protocol | New session assesses partial progress via Recovery Protocol; may result in completion or failure depending on validation |
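The Validate step and the TIMEOUT row can be sketched as a small wrapper that classifies the outcome of validation.command. This is a sketch under assumptions: the function names are hypothetical, the existence check looks only at the first word of the command, and the TIMEOUT branch relies on GNU timeout (or gtimeout) exiting with status 124 when the limit is hit.

```shell
# Pick the timeout binary: timeout, or gtimeout from coreutils on macOS.
timeout_bin() {
  if command -v timeout >/dev/null 2>&1; then echo timeout
  elif command -v gtimeout >/dev/null 2>&1; then echo gtimeout
  else return 1
  fi
}

# run_validation COMMAND TIMEOUT_SECONDS
# Prints PASS, FAIL, TIMEOUT, or ENV_SETUP and returns a matching exit status.
run_validation() {
  cmd="$1"; secs="$2"
  tbin="$(timeout_bin)" || { echo "ENV_SETUP"; return 3; }
  # Simplification: only the first word of the command is checked for existence.
  first="${cmd%% *}"
  command -v "$first" >/dev/null 2>&1 || { echo "ENV_SETUP"; return 3; }
  rc=0
  "$tbin" "$secs" sh -c "$cmd" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo "PASS" ;;
    124) echo "TIMEOUT" ;;   # GNU timeout exits 124 when the limit is reached
    *)   echo "FAIL" ;;
  esac
  return "$rc"
}
```

A caller would map the printed result to the matching error category, for instance logging TEST_FAIL on FAIL and running on_failure.cleanup on TIMEOUT.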
- JSON corruption: If harness-tasks.json cannot be parsed, check for harness-tasks.json.bak (written before each modification). If the backup exists and is valid, restore from it. If there is no valid backup, log ERROR [ENV_SETUP] harness-tasks.json corrupted and unrecoverable and STOP; task metadata (validation commands, dependencies, cleanup) cannot be reconstructed from logs alone.
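Recovery from corruption relies on a .bak copy being written before each modification. One way to sketch such a write is shown here: keep the previous file as .bak, write the new content to a temporary file in the same directory, then rename it over the target. The function name `write_state` is hypothetical, and the sketch assumes the caller already holds the state lock.

```shell
# write_state FILE: replace FILE with the content read from stdin.
write_state() {
  target="$1"
  # Keep the previous version so a bad write can be restored later.
  if [ -f "$target" ]; then
    cp "$target" "$target.bak" || return 1
  fi
  # Write to a temp file beside the target, then rename it into place.
  # A rename within one filesystem replaces the file in a single step,
  # so readers see either the old or the new content, never a partial file.
  cat > "$target.tmp" || return 1
  mv "$target.tmp" "$target"
}
```

Usage might look like `printf '%s\n' "$new_json" | write_state harness-tasks.json`, where `new_json` stands for whatever updated state the caller has built.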
- Backup protocol: Before every write to harness-tasks.json, copy the current file to harness-tasks.json.bak. Write updates atomically: write JSON to harness-tasks.json.tmp, then mv it into place (readers should never see a partial file).
- Environment Initialization
- If harness-init.sh exists in the project root, run it at every session start. The script must be idempotent.
- Example harness-init.sh:
    #!/bin/bash
    set -e
    npm install 2>/dev/null || pip install -r requirements.txt 2>/dev/null || true
    # Port 5432 does not speak HTTP, so probe the TCP port directly rather than with curl
    (exec 3<>/dev/tcp/localhost/5432) 2>/dev/null || echo "WARN: DB not reachable"
    npm test -- --bail --silent 2>/dev/null || echo "WARN: Smoke test failed"
    echo "Environment health check complete"

- Standardized Log Format
- All log entries use a grep-friendly format on a single line:
    [ISO-timestamp] [SESSION-N] <TYPE> [task-id]? [category]? message
- [task-id] and [category] are included when applicable (task-scoped entries). Session-level entries (INIT, LOCK, STATS) omit them.
- Types: INIT, Starting, Completed, ERROR, CHECKPOINT, ROLLBACK, RECOVERY, STATS, LOCK, WARN
- Error categories: ENV_SETUP, CONFIG, TASK_EXEC, TEST_FAIL, TIMEOUT, DEPENDENCY, SESSION_TIMEOUT
- Filtering:
    grep "ERROR" harness-progress.txt                      # All errors
    grep "ERROR" harness-progress.txt | grep "TASK_EXEC"   # Execution errors only
    grep "SESSION-3" harness-progress.txt                  # All session 3 activity
    grep "STATS" harness-progress.txt                      # All session summaries
    grep "CHECKPOINT" harness-progress.txt                 # All checkpoints
    grep "RECOVERY" harness-progress.txt                   # All recovery actions
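Because every filter above depends on entries sharing one shape, it can help to route all writes through a single helper. A minimal sketch follows; the function name `log_entry` and the `HARNESS_PROGRESS_FILE` override are assumptions for illustration.

```shell
# log_entry SESSION TYPE MESSAGE...
# Appends one single-line entry: [ISO-timestamp] [SESSION-N] TYPE message
log_entry() {
  session="$1"; kind="$2"; shift 2
  printf '[%s] [SESSION-%s] %s %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$session" "$kind" "$*" \
    >> "${HARNESS_PROGRESS_FILE:-harness-progress.txt}"
}

# Example calls (task id and category are passed as part of the message):
#   log_entry 1 LOCK "acquired (pid=$$)"
#   log_entry 1 ERROR "[task-002] [TASK_EXEC] Redis connection refused"
```

Appending with `>>` keeps the file append-only, matching the rule that harness-progress.txt is never truncated.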
- Session Statistics
- At session end, update harness-tasks.json: set last_session to the current timestamp. (Do NOT increment session_count here; it is incremented at Session Start.) Then append:
    [timestamp] [SESSION-N] STATS tasks_total=10 completed=7 failed=1 pending=2 blocked=0 attempts_total=12 checkpoints=23
- blocked is computed at stats time: the count of pending tasks whose depends_on includes a permanently failed task. It is not a stored status value.
- Init Command (/harness init)
- Create harness-progress.txt with an initialization entry
- Create harness-tasks.json with an empty task list and default session_config
- Optionally create a harness-init.sh template (chmod +x)
- Ask user: add harness files to .gitignore?
- Status Command (/harness status)
- Read harness-tasks.json and harness-progress.txt, then display:
- Task summary: count by status (completed, failed, pending, blocked). blocked = pending tasks whose depends_on includes a permanently failed task (computed, not a stored status).
- Per-task one-liner: [status] task-id: title (attempts/max_attempts)
- Last 5 lines from harness-progress.txt
- Session count and last session timestamp
- Does NOT acquire the lock (read-only operation).
- Add Command (/harness add)
- Append a new task to harness-tasks.json with auto-incremented id (task-NNN), status pending, default max_attempts: 3, empty depends_on, and no validation command (required before the task can be completed).
- Prompt user for optional fields: priority, depends_on, validation.command, timeout_seconds.
- Requires lock acquisition (modifies JSON).
- Tool Dependencies
- Requires: Bash, file read/write, git. All harness operations must be executed from the project root directory.
- Does NOT require: specific MCP servers, programming languages, or test frameworks.
- Concurrent mode requires isolated working directories (git worktree or separate clones). Do not run concurrent workers in the same working tree.