Harness — Long-Running Agent Framework
Executable protocol enabling any agent task to run continuously across multiple sessions with automatic progress recovery, task dependency resolution, failure rollback, and standardized error handling.
Design Principles
Design for the agent, not the human
— Test output, docs, and task structure are the agent's primary interface
Progress files ARE the context
— When context window resets, progress files + git history = full recovery
Premature completion is the #1 failure mode
— Structured task lists with explicit completion criteria prevent declaring victory early
Standardize everything grep-able
— ERROR on same line, structured timestamps, consistent prefixes
Fast feedback loops
— Pre-compute stats, run smoke tests before full validation
Idempotent everything
— Init scripts, task execution, environment setup must all be safe to re-run
Fail safe, not fail silent
— Every failure must have an explicit recovery strategy
Commands
/harness init # Initialize harness files in project
/harness run # Start/resume the infinite loop
/harness status # Show current progress and stats
/harness add "task description" # Add a task to the list
Activation Marker
Hooks only take effect when the .harness-active marker file exists in the harness root (the same directory as harness-tasks.json). /harness init and /harness run MUST create this marker:
touch /.harness-active
When all tasks complete (no pending/in_progress/retryable left), remove it:
rm /.harness-active
Without this marker, all hooks are no-ops — they exit 0 immediately
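The no-op behaviour can be sketched as a guard that each hook calls first. This is a minimal illustration only: the function name harness_active and the HARNESS_ROOT variable are assumptions, not part of the harness.

```shell
#!/usr/bin/env bash
# Hypothetical hook guard: succeeds only when the activation marker exists.
# Argument: the harness root (the directory holding harness-tasks.json).
harness_active() {
  [ -f "$1/.harness-active" ]
}

# Typical use at the top of a hook (HARNESS_ROOT is an assumed variable):
#   harness_active "$HARNESS_ROOT" || exit 0   # no marker: behave as a no-op
```

Keeping the check in one function means every hook treats a missing marker the same way.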
Progress Persistence (Dual-File System)
Maintain two files in the project working directory:
harness-progress.txt (Append-Only Log)
Free-text log of all agent actions across sessions. Never truncate.
[2025-07-01T10:00:00Z] [SESSION-1] INIT Harness initialized for project /path/to/project
[2025-07-01T10:00:05Z] [SESSION-1] INIT Environment health check: PASS
[2025-07-01T10:00:10Z] [SESSION-1] LOCK acquired (pid=12345)
[2025-07-01T10:00:11Z] [SESSION-1] Starting [task-001] Implement user authentication (base=def5678)
[2025-07-01T10:05:00Z] [SESSION-1] CHECKPOINT [task-001] step=2/4 "auth routes created, tests pending"
[2025-07-01T10:15:30Z] [SESSION-1] Completed [task-001] (commit abc1234)
[2025-07-01T10:15:31Z] [SESSION-1] Starting [task-002] Add rate limiting (base=abc1234)
[2025-07-01T10:20:00Z] [SESSION-1] ERROR [task-002] [TASK_EXEC] Redis connection refused
[2025-07-01T10:20:01Z] [SESSION-1] ROLLBACK [task-002] git reset --hard abc1234
[2025-07-01T10:20:02Z] [SESSION-1] STATS tasks_total=5 completed=1 failed=1 pending=3 blocked=0 attempts_total=2 checkpoints=1
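Entries in this shape could be written by a small append-only helper. The sketch below is an assumption about how one might do it; log_progress and its argument order are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry to the progress log.
# Usage: log_progress <log-file> <session-number> <message words...>
log_progress() {
  local file="$1" session="$2"
  shift 2
  # UTC timestamp in the same shape as the sample entries; ">>" only ever appends.
  printf '[%s] [SESSION-%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$session" "$*" >> "$file"
}
```

For example, log_progress harness-progress.txt 1 INIT "Environment health check: PASS" would add one line without touching earlier entries.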
harness-tasks.json (Structured State)
{
  "version": 2,
  "created": "2025-07-01T10:00:00Z",
  "session_config": {
    "concurrency_mode": "exclusive",
    "max_tasks_per_session": 20,
    "max_sessions": 50
  },
  "tasks": [
    {
      "id": "task-001",
      "title": "Implement user authentication",
      "status": "completed",
      "priority": "P0",
      "depends_on": [],
      "attempts": 1,
      "max_attempts": 3,
      "started_at_commit": "def5678",
      "validation": { "command": "npm test -- --testPathPattern=auth", "timeout_seconds": 300 },
      "on_failure": { "cleanup": null },
      "error_log": [],
      "checkpoints": [],
      "completed_at": "2025-07-01T10:15:30Z"
    },
    {
      "id": "task-002",
      "title": "Add rate limiting",
      "status": "failed",
      "priority": "P1",
      "depends_on": [],
      "attempts": 1,
      "max_attempts": 3,
      "started_at_commit": "abc1234",
      "validation": { "command": "npm test -- --testPathPattern=rate-limit", "timeout_seconds": 120 },
      "on_failure": { "cleanup": "docker compose down redis" },
      "error_log": ["[TASK_EXEC] Redis connection refused"],
      "checkpoints": [],
      "completed_at": null
    },
    {
      "id": "task-003",
      "title": "Add OAuth providers",
      "status": "pending",
      "priority": "P1",
      "depends_on": ["task-001"],
      "attempts": 0,
      "max_attempts": 3,
      "started_at_commit": null,
      "validation": { "command": "npm test -- --testPathPattern=oauth", "timeout_seconds": 180 },
      "on_failure": { "cleanup": null },
      "error_log": [],
      "checkpoints": [],
      "completed_at": null
    }
  ],
  "session_count": 1,
  "last_session": "2025-07-01T10:20:02Z"
}
Task statuses: pending → in_progress (transient, set only during active execution) → completed or failed. A task found as in_progress at session start means the previous session was interrupted; handle it via the Context Window Recovery Protocol.
In concurrent mode (see Concurrency Control), tasks may also carry claim metadata: claimed_by and lease_expires_at (ISO timestamp).
Session boundary: A session starts when the agent begins executing the Session Start protocol and ends when a Stopping Condition is met or the context window resets. Each session gets a unique SESSION-N identifier (N = session_count after increment).

Concurrency Control

Before modifying harness-tasks.json, acquire an exclusive lock using portable mkdir (atomic on all POSIX systems, works on both macOS and Linux):

# Acquire lock (fail fast if another agent is running)
# Lock key must be stable even if invoked from a subdirectory.
ROOT="$PWD"
SEARCH="$PWD"
while [ "$SEARCH" != "/" ] && [ ! -f "$SEARCH/harness-tasks.json" ]; do
  SEARCH="$(dirname "$SEARCH")"
done
if [ -f "$SEARCH/harness-tasks.json" ]; then
  ROOT="$SEARCH"
fi
PWD_HASH="$(printf '%s' "$ROOT" | (shasum -a 256 2>/dev/null || sha256sum 2>/dev/null) | awk '{print $1}' | cut -c1-16)"
LOCKDIR="/tmp/harness-${PWD_HASH:-unknown}.lock"
if ! mkdir "$LOCKDIR" 2>/dev/null; then
  # Check if lock holder is still alive
  LOCK_PID=$(cat "$LOCKDIR/pid" 2>/dev/null)
  if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then
    echo "ERROR: Another harness session is active (pid=$LOCK_PID)"; exit 1
  fi
  # Stale lock: atomically reclaim via mv to avoid a TOCTOU race
  STALE="$LOCKDIR.stale.$$"
  if mv "$LOCKDIR" "$STALE" 2>/dev/null; then
    rm -rf "$STALE"
    mkdir "$LOCKDIR" || { echo "ERROR: Lock contention"; exit 1; }
    echo "WARN: Removed stale lock${LOCK_PID:+ from pid=$LOCK_PID}"
  else
    echo "ERROR: Another agent reclaimed the lock"; exit 1
  fi
fi
echo "$$" > "$LOCKDIR/pid"
trap 'rm -rf "$LOCKDIR"' EXIT

Log lock acquisition: [timestamp] [SESSION-N] LOCK acquired (pid=)
Log lock release: [timestamp] [SESSION-N] LOCK released

Modes:

- Exclusive (default): hold the lock for the entire session (the trap EXIT handler releases it automatically). Any second session in the same state root fails fast.
- Concurrent (opt-in via session_config.concurrency_mode: "concurrent"): treat this as a state transaction lock. Hold it only while reading/modifying/writing harness-tasks.json (including .bak/.tmp) and appending to harness-progress.txt. Release it immediately, before doing real work.

Concurrent mode invariants:

- All workers MUST point at the same state root (the directory that contains harness-tasks.json). If you are using separate worktrees/clones, pin it explicitly (e.g., HARNESS_STATE_ROOT=/abs/path/to/state-root).
- Task selection is advisory; the real gate is atomic claim under the lock: set status="in_progress", set claimed_by (stable worker id, e.g., HARNESS_WORKER_ID), set lease_expires_at. If the claim fails (already in_progress with a valid lease), pick another eligible task and retry.
- Never run two workers in the same git working directory. Use separate worktrees/clones. Otherwise rollback (git reset --hard / git clean -fd) will destroy other workers' changes.
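Whether a lease is still valid comes down to a timestamp comparison. The sketch below assumes lease_expires_at is always written as a UTC timestamp in the fixed form YYYY-MM-DDTHH:MM:SSZ, so plain string comparison matches chronological order. The function name lease_is_stale is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: has a lease expired relative to "now"?
# Both values must be UTC timestamps in the fixed form YYYY-MM-DDTHH:MM:SSZ,
# which compare lexicographically in chronological order.
# Usage: lease_is_stale <lease_expires_at> [now]
lease_is_stale() {
  local lease_expires_at="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  [ "$lease_expires_at" \< "$now" ]
}
```

A worker would run this under the lock and claim the task only when the function succeeds or the task is not in_progress at all.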

Infinite Loop Protocol

Session Start (Execute Every Time)

1. Read state: Read the last 200 lines of harness-progress.txt plus the full harness-tasks.json. If the JSON is unparseable, see JSON corruption recovery in Error Handling.
2. Read git: Run git log --oneline -20 and git diff --stat to detect uncommitted work.
3. Acquire lock (mode-dependent): Exclusive mode fails if another session is active. Concurrent mode uses the lock only for state transactions.
4. Recover interrupted tasks (see Context Window Recovery below).
5. Health check: Run harness-init.sh if it exists.
6. Track session: Increment session_count in JSON. Check session_count against max_sessions; if reached, log STATS and STOP. Initialize the per-session task counter to 0.
7. Pick next task using the Task Selection Algorithm below.

Task Selection Algorithm

Before selecting, run dependency validation:

1. Cycle detection: For each non-completed task, walk depends_on transitively. If any task appears in its own chain, mark it failed with [DEPENDENCY] Circular dependency detected: task-A -> task-B -> task-A. Self-references (depends_on includes own id) are also cycles.
2. Blocked propagation: If a task's depends_on includes a task that is failed and will never be retried (either attempts >= max_attempts OR its error_log contains a [DEPENDENCY] entry), mark the blocked task as failed with [DEPENDENCY] Blocked by failed task-XXX. Repeat until no more tasks can be propagated.

Then pick the next task in this priority order:

1. Tasks with status: "pending" where ALL depends_on tasks are completed, sorted by priority (P0 > P1 > P2), then by id (lowest first)
2. Tasks with status: "failed" where attempts < max_attempts and ALL depends_on are completed, sorted by priority, then oldest failure first
3. If no eligible tasks remain → log final STATS → STOP
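The first selection pass (pending tasks whose dependencies are all completed) can be sketched with standard tools. To stay free of a JSON parser, this illustration assumes the tasks have already been flattened to tab-separated lines of id, status, priority, and comma-separated depends_on (or "-" when empty). That flattened format and the function name pick_next_pending are assumptions made for the example; the retry pass for failed tasks is not covered.

```shell
#!/usr/bin/env bash
# Hypothetical input: one task per line, tab-separated:
#   id <TAB> status <TAB> priority <TAB> comma-separated depends_on (or "-")
# Prints the id of the next eligible pending task, or nothing if none qualifies.
pick_next_pending() {
  awk -F'\t' '
    NR == FNR { status[$1] = $2; next }   # pass 1: remember every task status
    $2 != "pending" { next }              # pass 2: consider pending tasks only
    {
      ok = 1
      if ($4 != "-") {
        n = split($4, deps, ",")
        for (i = 1; i <= n; i++)
          if (status[deps[i]] != "completed") ok = 0
      }
      if (ok) print $3 "\t" $1            # emit "priority <TAB> id" for sorting
    }
  ' "$1" "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 | head -n 1 | cut -f2
}
```

Sorting as plain strings works here because P0, P1, P2 and zero-padded ids such as task-003 already sort in the intended order.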

Task Execution Cycle

For each task, execute this exact sequence:

1. Claim (atomic, under lock): Record started_at_commit = current HEAD hash. Set status to in_progress, set claimed_by, set lease_expires_at, log Starting [] <title> (base=). If the task is already claimed (in_progress with a valid lease), pick another eligible task and retry.
2. Execute with checkpoints: Perform the work. After each significant step, log:
   [timestamp] [SESSION-N] CHECKPOINT [task-id] step=M/N "description of what was done"
   Also append to the task's checkpoints array: { "step": M, "total": N, "description": "...", "timestamp": "ISO" }. In concurrent mode, renew the lease at each checkpoint (push lease_expires_at forward).
3. Validate: Run the task's validation.command with a timeout wrapper (prefer timeout; on macOS use gtimeout from coreutils). If validation.command is empty/null, log ERROR [] [CONFIG] Missing validation.command and STOP; do not declare completion without an objective check. Before running, verify the command exists (e.g., command -v); if it is missing, treat it as an ENV_SETUP error.
   - Command exits 0 → PASS
   - Command exits non-zero → FAIL
   - Command exceeds timeout → TIMEOUT
4. Record outcome:
   - Success: status=completed, set completed_at, log Completed [] (commit ), git commit
   - Failure: increment attempts, append error to error_log. Verify started_at_commit exists via git cat-file -t; if missing, mark failed at max_attempts. Otherwise execute git reset --hard and git clean -fd to roll back ALL commits and remove untracked files. Execute on_failure.cleanup if defined. Log ERROR [] []. Set status=failed (Task Selection Algorithm pass 2 handles retries when attempts < max_attempts)
5. Track: Increment the per-session task counter. If max_tasks_per_session is reached, log STATS and STOP.
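The Validate and Record outcome steps above can be sketched as two shell helpers. This is an illustration under stated assumptions: the names run_validation and rollback_to are hypothetical, and the mapping of exit status 124 to TIMEOUT relies on GNU timeout's documented behaviour (gtimeout is the same tool under its macOS coreutils name).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Validate and Record outcome steps.

# Run a validation command under a time limit and print PASS, FAIL, or TIMEOUT.
# Usage: run_validation <timeout-seconds> <command-string>
run_validation() {
  local secs="$1" cmd="$2" tool="" rc=0
  if command -v timeout >/dev/null 2>&1; then tool=timeout
  elif command -v gtimeout >/dev/null 2>&1; then tool=gtimeout
  else echo "no timeout or gtimeout found" >&2; return 2
  fi
  # Output is discarded in this sketch; a real run would keep it for analysis.
  "$tool" "$secs" sh -c "$cmd" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo PASS ;;
    124) echo TIMEOUT ;;   # GNU timeout exits 124 when the limit is hit
    *)   echo FAIL ;;
  esac
}

# Roll the working tree back to a task's starting commit.
# Returns non-zero, without changing anything, if the commit is unknown.
# Usage: rollback_to <started_at_commit>
rollback_to() {
  local base="$1"
  [ "$(git cat-file -t "$base" 2>/dev/null)" = "commit" ] || return 1
  git reset --hard -q "$base" && git clean -fdq
}
```

A caller would branch on the printed outcome: commit on PASS, and on FAIL or TIMEOUT call rollback_to with the recorded started_at_commit before running on_failure.cleanup.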
6. Continue: Immediately pick the next task (zero idle time)

Stopping Conditions

- All tasks completed
- All remaining tasks failed at max_attempts or blocked by failed dependencies
- session_config.max_tasks_per_session reached for this session
- session_config.max_sessions reached across all sessions
- User interrupts

Context Window Recovery Protocol

When a new session starts and finds a task with status: "in_progress":

- Exclusive mode: treat this as an interrupted previous session and run the Recovery Protocol below.
- Concurrent mode: only recover a task if either (a) claimed_by matches this worker, or (b) lease_expires_at is in the past (stale lease). Otherwise, treat it as owned by another worker and do not modify it.

1. Check git state:
   git diff --stat        # Uncommitted changes?
   git log --oneline -5   # Recent commits since task started?
   git stash list         # Any stashed work?
2. Check checkpoints: Read the task's checkpoints array to determine the last completed step.
3. Decision matrix (verify recent commits belong to this task by checking commit messages for the task-id):

| Uncommitted? | Recent task commits? | Checkpoints? | Action |
|---|---|---|---|
| No | No | None | Mark failed with [SESSION_TIMEOUT] No progress detected, increment attempts |
| No | No | Some | Verify file state matches checkpoint claims. If files reflect checkpoint progress, resume from last step. If not, mark failed (work was lost) |
| No | Yes | Any | Run validation.command. If passes → mark completed. If fails → git reset --hard, mark failed |
| Yes | No | Any | Run validation WITH uncommitted changes present. If passes → commit, mark completed. If fails → git reset --hard + git clean -fd, mark failed |
| Yes | Yes | Any | Commit uncommitted changes, run validation.command. If passes → mark completed. If fails → git reset --hard + git clean -fd, mark failed |

4. Log recovery: [timestamp] [SESSION-N] RECOVERY [task-id] action="" reason=""
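The decision matrix maps three observations to one action, which fits a case statement. The sketch below is illustrative only; the function name recovery_action and the short action labels it prints are hypothetical, chosen to mirror the five rows.

```shell
#!/usr/bin/env bash
# Hypothetical mapping of the recovery decision matrix to short action labels.
# Usage: recovery_action <uncommitted: yes|no> <task_commits: yes|no> <checkpoints: none|some>
recovery_action() {
  case "$1:$2:$3" in
    no:no:none) echo "fail-no-progress" ;;          # [SESSION_TIMEOUT], increment attempts
    no:no:some) echo "verify-files-then-resume" ;;  # mark failed instead if files do not match
    no:yes:*)   echo "validate-committed" ;;
    yes:no:*)   echo "validate-uncommitted" ;;      # commit only if validation passes
    yes:yes:*)  echo "commit-then-validate" ;;
    *)          echo "unknown"; return 1 ;;
  esac
}
```

The returned label is also a natural value for the action field of the RECOVERY log entry.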

Error Handling & Recovery Strategies

Each error category has a default recovery strategy:

| Category | Default Recovery | Agent Action |
|---|---|---|
| ENV_SETUP | Re-run init, then STOP if still failing | Run harness-init.sh again immediately. If it fails twice, log and stop; the environment is broken |
| CONFIG | STOP (requires human fix) | Log the config error precisely (file + field), then STOP. Do not guess or auto-mutate task metadata |
| TASK_EXEC | Rollback via git reset --hard, retry | Verify started_at_commit exists (git cat-file -t). If missing, mark failed at max_attempts. Otherwise reset, run on_failure.cleanup if defined, retry if attempts < max_attempts |
| TEST_FAIL | Rollback via git reset --hard, retry | Reset to started_at_commit, analyze test output to identify the fix, retry with targeted changes |
| TIMEOUT | Kill process, execute cleanup, retry | Wrap validation with timeout. On timeout, run on_failure.cleanup, retry (consider splitting the task if repeated) |
| DEPENDENCY | Skip task, mark blocked | Log which dependency failed, mark task as failed with dependency reason |
| SESSION_TIMEOUT | Use Context Window Recovery Protocol | New session assesses partial progress via Recovery Protocol; may result in completion or failure depending on validation |

JSON corruption: If harness-tasks.json cannot be parsed, check for harness-tasks.json.bak (written before each modification). If the backup exists and is valid, restore from it. If there is no valid backup, log ERROR [ENV_SETUP] harness-tasks.json corrupted and unrecoverable and STOP; task metadata (validation commands, dependencies, cleanup) cannot be reconstructed from logs alone.
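The .bak file that this recovery path relies on has to be produced on every write. One way to do that, sketched under the assumption that new JSON content arrives on stdin (write_tasks_json is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the task file with new content read from stdin.
# Usage: some_generator | write_tasks_json <path/to/harness-tasks.json>
write_tasks_json() {
  local target="$1"
  # Keep the previous version so a bad write can be recovered from.
  if [ -f "$target" ]; then
    cp "$target" "$target.bak" || return 1
  fi
  # Write to a sibling temp file, then rename it into place; a rename within
  # one directory is atomic on POSIX filesystems, so readers never see a partial file.
  cat > "$target.tmp" && mv "$target.tmp" "$target"
}
```

This sketch does not check that the new content parses as JSON; a caller would need to validate it before piping it in.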
Backup protocol: Before every write to harness-tasks.json, copy the current file to harness-tasks.json.bak. Write updates atomically: write JSON to harness-tasks.json.tmp, then mv it into place (readers should never see a partial file).

Environment Initialization

If harness-init.sh exists in the project root, run it at every session start. The script must be idempotent.

Example harness-init.sh:

#!/bin/bash
set -e
npm install 2>/dev/null || pip install -r requirements.txt 2>/dev/null || true
curl -sf http://localhost:5432 >/dev/null 2>&1 || echo "WARN: DB not reachable"
npm test -- --bail --silent 2>/dev/null || echo "WARN: Smoke test failed"
echo "Environment health check complete"

Standardized Log Format

All log entries use a grep-friendly format on a single line:

[ISO-timestamp] [SESSION-N] [task-id]? [category]? message

[task-id] and [category] are included when applicable (task-scoped entries). Session-level entries (INIT, LOCK, STATS) omit them.

Types: INIT, Starting, Completed, ERROR, CHECKPOINT, ROLLBACK, RECOVERY, STATS, LOCK, WARN
Error categories: ENV_SETUP, CONFIG, TASK_EXEC, TEST_FAIL, TIMEOUT, DEPENDENCY, SESSION_TIMEOUT

Filtering:

grep "ERROR" harness-progress.txt                      # All errors
grep "ERROR" harness-progress.txt | grep "TASK_EXEC"   # Execution errors only
grep "SESSION-3" harness-progress.txt                  # All session 3 activity
grep "STATS" harness-progress.txt                      # All session summaries
grep "CHECKPOINT" harness-progress.txt                 # All checkpoints
grep "RECOVERY" harness-progress.txt                   # All recovery actions
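Because the type and category sit at fixed positions, the same log also supports simple aggregation. The sketch below counts ERROR entries per category; error_counts is a hypothetical name, and the field positions are taken from the sample log entries earlier in this document.

```shell
#!/usr/bin/env bash
# Hypothetical report: count ERROR entries per category in a progress log.
# Prints "<category> <count>" lines, sorted by category name.
# Usage: error_counts <path/to/harness-progress.txt>
error_counts() {
  awk '
    $3 == "ERROR" {
      # The category is the first all-caps bracketed token in field 4 or 5;
      # a task id such as [task-002] is skipped because it has lowercase and digits.
      for (i = 4; i <= 5 && i <= NF; i++)
        if ($i ~ /^\[[A-Z_]+\]$/) { n[substr($i, 2, length($i) - 2)]++; break }
    }
    END { for (c in n) print c, n[c] }
  ' "$1" | sort
}
```

This only works while every entry stays on a single line with consistent prefixes, which is the point of the format.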

Session Statistics

At session end, update harness-tasks.json: set last_session to the current timestamp. (Do NOT increment session_count here; it is incremented at Session Start.) Then append:

[timestamp] [SESSION-N] STATS tasks_total=10 completed=7 failed=1 pending=2 blocked=0 attempts_total=12 checkpoints=23

blocked is computed at stats time: the count of pending tasks whose depends_on includes a permanently failed task. It is not a stored status value.

Init Command (/harness init)

1. Create harness-progress.txt with an initialization entry
2. Create harness-tasks.json with an empty task list and default session_config
3. Optionally create a harness-init.sh template (chmod +x)
4. Ask user: add harness files to .gitignore?

Status Command (/harness status)

Read harness-tasks.json and harness-progress.txt, then display:

1. Task summary: count by status (completed, failed, pending, blocked). blocked = pending tasks whose depends_on includes a permanently failed task (computed, not a stored status).
2. Per-task one-liner: [status] task-id: title (attempts/max_attempts)
3. Last 5 lines from harness-progress.txt
4. Session count and last session timestamp

Does NOT acquire the lock (read-only operation).

Add Command (/harness add)

Append a new task to harness-tasks.json with an auto-incremented id (task-NNN), status pending, default max_attempts: 3, empty depends_on, and no validation command (one is required before the task can be completed). Prompt the user for optional fields: priority, depends_on, validation.command, timeout_seconds. Requires lock acquisition (modifies JSON).

Tool Dependencies

Requires: Bash, file read/write, git. All harness operations must be executed from the project root directory.
Does NOT require: specific MCP servers, programming languages, or test frameworks.
Concurrent mode requires isolated working directories (git worktree or separate clones). Do not run concurrent workers in the same working tree.
