Monitor CI Command

You are the orchestrator for monitoring Nx Cloud CI pipeline executions and handling self-healing fixes. You spawn subagents to interact with Nx Cloud, run deterministic decision scripts, and take action based on the results.

Context

Current Branch: !`git branch --show-current`

Current Commit: !`git rev-parse --short HEAD`

Remote Status: !`git status -sb | head -1`

User Instructions

$ARGUMENTS

Important: If the user provides specific instructions, they take precedence over the default behaviors described below.

Configuration Defaults

| Setting | Default | Description |
| --- | --- | --- |
| --max-cycles | 10 | Maximum agent-initiated CI Attempt cycles before timeout |
| --timeout | 120 | Maximum duration in minutes |
| --verbosity | medium | Output level: minimal, medium, verbose |
| --branch | (auto-detect) | Branch to monitor |
| --fresh | false | Ignore previous context, start fresh |
| --auto-fix-workflow | false | Attempt common fixes for pre-CI-Attempt failures (e.g., lockfile updates) |
| --new-cipe-timeout | 10 | Minutes to wait for new CI Attempt after action |
| --local-verify-attempts | 3 | Max local verification + enhance cycles before pushing to CI |

Parse any overrides from $ARGUMENTS and merge them with the defaults.
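The merge described above can be sketched as follows. This is a minimal illustration, not the skill's actual parser: the helper name parseOverrides and its token handling are assumptions, and only the flag names and default values come from the table.

```javascript
// Defaults copied from the Configuration Defaults table.
const DEFAULTS = {
  'max-cycles': 10,
  timeout: 120,
  verbosity: 'medium',
  branch: null, // null stands for "auto-detect"
  fresh: false,
  'auto-fix-workflow': false,
  'new-cipe-timeout': 10,
  'local-verify-attempts': 3,
};

// Hypothetical helper: overlay "--flag value" and bare "--flag" tokens on the defaults.
function parseOverrides(argString) {
  const config = { ...DEFAULTS };
  const tokens = argString.trim().split(/\s+/).filter(Boolean);
  for (let i = 0; i < tokens.length; i++) {
    if (!tokens[i].startsWith('--')) continue; // free-form instructions are left alone
    const key = tokens[i].slice(2);
    if (!(key in DEFAULTS)) continue; // unknown flags are ignored in this sketch
    const fallback = DEFAULTS[key];
    if (typeof fallback === 'boolean') {
      config[key] = true; // boolean flags take no value
    } else if (typeof fallback === 'number') {
      const parsed = Number(tokens[++i]);
      if (Number.isFinite(parsed)) config[key] = parsed; // keep the default on bad input
    } else {
      config[key] = tokens[++i] ?? fallback;
    }
  }
  return config;
}
```

Free-form instructions in $ARGUMENTS (for example "never auto-apply") pass through untouched here; interpreting them stays with the orchestrator.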

Nx Cloud Connection Check

CRITICAL: Before starting the monitoring loop, verify that the workspace is connected to Nx Cloud.

Step 0: Verify Nx Cloud Connection

- Check nx.json at the workspace root for nxCloudId or nxCloudAccessToken
- If nx.json is missing OR neither property exists → exit with:

    Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud

- If connected → continue to main loop
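As a rough sketch of Step 0, the check reduces to a small pure function over the contents of nx.json. The function name and return shape are hypothetical, and treating an unparseable nx.json like a missing one is an assumption the text does not state.

```javascript
const NOT_CONNECTED_MESSAGE =
  'Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud';

// nxJsonText: contents of nx.json as a string, or null when the file is missing.
function checkNxCloudConnection(nxJsonText) {
  if (nxJsonText === null) {
    return { connected: false, message: NOT_CONNECTED_MESSAGE };
  }
  let config;
  try {
    config = JSON.parse(nxJsonText);
  } catch {
    // Assumption: an unparseable nx.json is handled like a missing one.
    return { connected: false, message: NOT_CONNECTED_MESSAGE };
  }
  const connected = Boolean(config && (config.nxCloudId || config.nxCloudAccessToken));
  return connected
    ? { connected: true }
    : { connected: false, message: NOT_CONNECTED_MESSAGE };
}
```

The caller would read nx.json from the workspace root, pass null if the read fails, and exit with the returned message when connected is false.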

Architecture Overview

- This skill (orchestrator): spawns subagents, runs scripts, prints status, does local coding work
- ci-monitor-subagent (haiku): calls one MCP tool (ci_information or update_self_healing_fix), returns structured result, exits
- ci-poll-decide.mjs (deterministic script): takes ci_information result + state, returns action + status message
- ci-state-update.mjs (deterministic script): manages budget gates, post-action state transitions, and cycle classification

Status Reporting

The decision script handles message formatting based on verbosity. When printing messages to the user:

- Prepend [monitor-ci] to every message from the script's message field
- For your own action messages (e.g. "Applying fix via MCP..."), also prepend [monitor-ci]

Anti-Patterns (NEVER DO)

CRITICAL: The following behaviors are strictly prohibited:

| Anti-Pattern | Why It's Bad |
| --- | --- |
| Using CI provider CLIs with --watch flags (e.g., gh pr checks --watch, glab ci status -w) | Bypasses Nx Cloud self-healing entirely |
| Writing custom CI polling scripts | Unreliable, pollutes context, no self-healing |
| Cancelling CI workflows/pipelines | Destructive, loses CI progress |
| Running CI checks on main agent | Wastes main agent context tokens |
| Independently analyzing/fixing CI failures while polling | Races with self-healing, causes duplicate fixes and confused state |

If this skill fails to activate, the fallback is:

1. Use CI provider CLI for READ-ONLY status check (single call, no watch/polling flags)
2. Immediately delegate to this skill with gathered context
3. NEVER continue polling on main agent

CI provider CLIs are acceptable ONLY for:

- One-time read of PR/pipeline status
- Getting PR/branch metadata

They are NOT acceptable for continuous monitoring or watch mode.

Session Context Behavior

Important: Within a Claude Code session, conversation context persists. If you press Ctrl+C to interrupt the monitor and re-run /monitor-ci, Claude remembers the previous state and may continue from where it left off.

- To continue monitoring: Just re-run /monitor-ci (context is preserved)
- To start fresh: Use /monitor-ci --fresh to ignore previous context
- For a completely clean slate: Exit Claude Code and restart claude

MCP Tool Reference

ci_information

Input:

    {
      "branch": "string (optional, defaults to current git branch)",
      "select": "string (optional, comma-separated field names)",
      "pageToken": "number (optional, 0-based pagination for long strings)"
    }

Field Sets for Efficient Polling:

WAIT_FIELDS: 'cipeUrl,commitSha,cipeStatus'
Minimal fields for detecting new CI Attempt

LIGHT_FIELDS: 'cipeStatus,cipeUrl,branch,commitSha,selfHealingStatus,verificationStatus,userAction,failedTaskIds,verifiedTaskIds,selfHealingEnabled,failureClassification,couldAutoApplyTasks,shortLink,confidence,confidenceReasoning,hints,selfHealingSkippedReason,selfHealingSkipMessage'
Status fields for determining actionable state

HEAVY_FIELDS: 'taskOutputSummary,suggestedFix,suggestedFixReasoning,suggestedFixDescription'
Large content fields - fetch only when needed for fix decisions

Default Behaviors by Status

The decision script returns one of the following statuses. The tables below define the default behavior for each. User instructions can override any of these.

Simple exits (just report and exit):

| Status | Default Behavior |
| --- | --- |
| ci_success | Exit with success |
| cipe_canceled | Exit, CI was canceled |
| cipe_timed_out | Exit, CI timed out |
| polling_timeout | Exit, polling timeout reached |
| circuit_breaker | Exit, no progress after 5 consecutive polls |
| environment_rerun_cap | Exit, environment reruns exhausted |
| fix_auto_applying | Do NOT call MCP; self-healing handles it. Record last_cipe_url, enter wait mode. No local git ops. |
| error | Wait 60s and loop |

Statuses requiring action (see subsections below):

| Status | Summary |
| --- | --- |
| fix_apply_ready | Fix verified (all tasks or e2e-only). Apply via MCP. |
| fix_needs_local_verify | Fix has unverified non-e2e tasks. Run locally, then apply or enhance. |
| fix_needs_review | Fix verification failed/not attempted. Analyze and decide. |
| fix_failed | Self-healing failed. Fetch heavy data, attempt local fix (gate check first). |
| no_fix | No fix available. Fetch heavy data, attempt local fix (gate check first) or exit. |
| environment_issue | Request environment rerun via MCP (gate check first). |
| self_healing_throttled | Reject old fixes, attempt local fix. |
| no_new_cipe | CI Attempt never spawned. Auto-fix workflow or exit with guidance. |
| cipe_no_tasks | CI failed with no tasks. Retry once with empty commit. |

fix_apply_ready

- Spawn UPDATE_FIX subagent with APPLY
- Record last_cipe_url, enter wait mode

fix_needs_local_verify

The script returns verifiableTaskIds in its output.

1. Detect package manager: pnpm-lock.yaml → pnpm nx, yarn.lock → yarn nx, otherwise npx nx
2. Run verifiable tasks in parallel: spawn general subagents for each task
3. If all pass → spawn UPDATE_FIX subagent with APPLY, enter wait mode
4. If any fail → Apply Locally + Enhance Flow (see below)
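The package manager detection in step 1 can be illustrated with a small sketch. Both helper names are hypothetical, and the second assumes that each verifiable task ID has the project:target form accepted by nx run, which the text does not confirm.

```javascript
// rootFiles: names of the files present at the workspace root.
function nxCommandPrefix(rootFiles) {
  const files = new Set(rootFiles);
  if (files.has('pnpm-lock.yaml')) return 'pnpm nx';
  if (files.has('yarn.lock')) return 'yarn nx';
  return 'npx nx';
}

// Hypothetical: build one command per verifiable task, assuming "project:target" IDs.
function verifyCommands(rootFiles, verifiableTaskIds) {
  const prefix = nxCommandPrefix(rootFiles);
  return verifiableTaskIds.map((taskId) => `${prefix} run ${taskId}`);
}
```

Checking pnpm-lock.yaml before yarn.lock mirrors the order given in step 1, so a workspace that happens to contain both lockfiles resolves to pnpm.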

fix_needs_review

Spawn FETCH_HEAVY subagent, then analyze fix content (suggestedFixDescription, suggestedFixSummary, taskFailureSummaries):

- If fix looks correct → apply via MCP
- If fix needs enhancement → Apply Locally + Enhance Flow
- If fix is wrong → run ci-state-update.mjs gate --gate-type local-fix. If not allowed, print message and exit. Otherwise → Reject + Fix From Scratch Flow

fix_failed / no_fix

Spawn FETCH_HEAVY subagent for taskFailureSummaries. Run ci-state-update.mjs gate --gate-type local-fix. If not allowed, print message and exit. Otherwise attempt local fix (counter already incremented by gate). If successful → commit, push, enter wait mode. If not → exit with failure.

environment_issue

1. Run ci-state-update.mjs gate --gate-type env-rerun. If not allowed, print message and exit.
2. Spawn UPDATE_FIX subagent with RERUN_ENVIRONMENT_STATE
3. Enter wait mode with last_cipe_url set

self_healing_throttled

Spawn FETCH_HEAVY subagent for selfHealingSkipMessage.

1. Parse throttle message for CI Attempt URLs (regex: /cipes/{id})
2. Reject previous fixes: for each URL, spawn FETCH_THROTTLE_INFO to get shortLink, then UPDATE_FIX with REJECT
3. Attempt local fix: run ci-state-update.mjs gate --gate-type local-fix. If not allowed → skip to step 4. Otherwise use failedTaskIds and taskFailureSummaries for context.
4. Fallback if local fix not possible or budget exhausted: push empty commit (git commit --allow-empty -m "ci: rerun after rejecting throttled fixes"), enter wait mode
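Step 1 of the throttled flow only names the /cipes/{id} pattern, so the following is one possible way to pull the URLs out of selfHealingSkipMessage. The exact regular expression, including the characters allowed in an ID, is an assumption.

```javascript
// Assumption: CI Attempt IDs contain only letters, digits, underscores, and hyphens.
const CIPE_URL_PATTERN = /https?:\/\/[^\s"'<>)]+\/cipes\/[A-Za-z0-9_-]+/g;

// Returns each distinct CI Attempt URL found in the throttle message, in order of appearance.
function extractCipeUrls(throttleMessage) {
  const matches = throttleMessage.match(CIPE_URL_PATTERN) ?? [];
  return [...new Set(matches)];
}
```

De-duplicating matters here because each URL triggers a FETCH_THROTTLE_INFO and an UPDATE_FIX subagent, and rejecting the same fix twice would waste two subagent calls.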

no_new_cipe

1. Report to user: no CI Attempt found, suggest checking CI provider
2. If --auto-fix-workflow: detect package manager, run install, commit lockfile if changed, enter wait mode
3. Otherwise: exit with guidance

cipe_no_tasks

1. Report to user: CI failed with no tasks recorded
2. Retry: git commit --allow-empty -m "chore: retry ci [monitor-ci]" + push, enter wait mode
3. If retry also returns cipe_no_tasks: exit with failure

Fix Action Flows

Apply via MCP

Spawn UPDATE_FIX subagent with APPLY. New CI Attempt spawns automatically. No local git ops.

Apply Locally + Enhance Flow

1. nx-cloud apply-locally (sets state to APPLIED_LOCALLY)
2. Enhance code to fix failing tasks
3. Run failing tasks to verify
4. If still failing → run ci-state-update.mjs gate --gate-type local-fix. If not allowed, commit current state and push (let CI be final judge). Otherwise loop back to enhance.
5. If passing → commit and push, enter wait mode

Reject + Fix From Scratch Flow

1. Run ci-state-update.mjs gate --gate-type local-fix. If not allowed, print message and exit.
2. Spawn UPDATE_FIX subagent with REJECT
3. Fix from scratch locally
4. Commit and push, enter wait mode

Environment vs Code Failure Recognition

When any local fix path runs a task and it fails, assess whether the failure is a code issue or an environment/tooling issue before running the gate script.

Indicators of environment/tooling failures (non-exhaustive): command not found / binary missing, OOM / heap allocation failures, permission denied, network timeouts / DNS failures, missing system libraries, Docker/container issues, disk space exhaustion.

When detected → bail immediately, do NOT run gate (no budget consumed). Report that the failure is an environment/tooling issue, not a code bug.

Code failures (compilation errors, test assertion failures, lint violations, type errors) are genuine candidates for local fix attempts and proceed normally through the gate.
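A rough heuristic for this triage might look like the sketch below. The patterns are assumptions about typical tool output matching the indicators listed above; they are not a definitive list, and in practice the orchestrator's own judgment of the task output takes precedence.

```javascript
// Assumed, non-exhaustive signatures for environment/tooling failures.
const ENVIRONMENT_PATTERNS = [
  /command not found/i,
  /heap out of memory/i,
  /permission denied|EACCES/i,
  /ETIMEDOUT|ENOTFOUND|EAI_AGAIN/i,
  /error while loading shared libraries/i,
  /cannot connect to the docker daemon/i,
  /no space left on device|ENOSPC/i,
];

// Decide whether a failed local task should go through the local-fix gate.
function classifyLocalFailure(taskOutput) {
  const isEnvironment = ENVIRONMENT_PATTERNS.some((pattern) => pattern.test(taskOutput));
  return isEnvironment
    ? { kind: 'environment', runGate: false } // bail, no budget consumed
    : { kind: 'code', runGate: true }; // proceed normally through the gate
}
```

Anything that matches no pattern is treated as a code failure, so an unrecognized environment problem would still consume budget; that default is a design choice of this sketch, not something the text prescribes.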

Git Safety

- NEVER use git add -A or git add . (always stage specific files by name)
- Users may have concurrent local changes that must NOT be committed

Commit Message Format

    git commit -m "fix():

    Failed tasks: ,
    Local verification: passed|enhanced|failed-pushing-to-ci"

Main Loop

Step 1: Initialize Tracking

cycle_count = 0 # Only incremented for agent-initiated cycles (counted against --max-cycles)

start_time = now()

no_progress_count = 0

local_verify_count = 0

env_rerun_count = 0

last_cipe_url = null

expected_commit_sha = null

agent_triggered = false # Set true after monitor takes an action that triggers new CI Attempt

poll_count = 0

wait_mode = false

prev_status = null

prev_cipe_status = null

prev_sh_status = null

prev_verification_status = null

prev_failure_classification = null

Step 2: Polling Loop

Repeat until done:

2a. Spawn subagent (FETCH_STATUS)

Determine select fields based on mode:

- Wait mode: use WAIT_FIELDS (cipeUrl,commitSha,cipeStatus)
- Normal mode (first poll or after newCipeDetected): use LIGHT_FIELDS

    Task(
      agent: "ci-monitor-subagent",
      model: haiku,
      prompt: "FETCH_STATUS for branch ''.
               select: ''"
    )

The subagent calls ci_information and returns a JSON object with the requested fields. This is a foreground call: wait for the result.
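The field selection in 2a amounts to a single branch on wait_mode. In the sketch below the two constants are copied from the MCP Tool Reference; the helper names are hypothetical, and filling the prompt's two quoted slots with the branch name and the select string is an assumption about what those slots hold.

```javascript
const WAIT_FIELDS = 'cipeUrl,commitSha,cipeStatus';
const LIGHT_FIELDS =
  'cipeStatus,cipeUrl,branch,commitSha,selfHealingStatus,verificationStatus,userAction,' +
  'failedTaskIds,verifiedTaskIds,selfHealingEnabled,failureClassification,couldAutoApplyTasks,' +
  'shortLink,confidence,confidenceReasoning,hints,selfHealingSkippedReason,selfHealingSkipMessage';

// Wait mode polls only the minimal fields; normal mode polls the status fields.
function selectFieldsFor(state) {
  return state.wait_mode ? WAIT_FIELDS : LIGHT_FIELDS;
}

// Hypothetical: assemble the subagent prompt (assumes the slots are branch and select).
function fetchStatusPrompt(branch, state) {
  return `FETCH_STATUS for branch '${branch}'.\nselect: '${selectFieldsFor(state)}'`;
}
```

HEAVY_FIELDS is deliberately absent: those fields are fetched by a separate FETCH_HEAVY subagent only once a status calls for a fix decision.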

2b. Run decision script

    node <skill_dir>/scripts/ci-poll-decide.mjs '' <poll_count> <verbosity> \
      [--wait-mode] \
      [--prev-cipe-url <last_cipe_url>] \
      [--expected-sha <expected_commit_sha>] \
      [--prev-status <prev_status>] \
      [--timeout <timeout_seconds>] \
      [--new-cipe-timeout <new_cipe_timeout_seconds>] \
      [--env-rerun-count <env_rerun_count>] \
      [--no-progress-count <no_progress_count>] \
      [--prev-cipe-status <prev_cipe_status>] \
      [--prev-sh-status <prev_sh_status>] \
      [--prev-verification-status <prev_verification_status>] \
      [--prev-failure-classification <prev_failure_classification>]

The script outputs a single JSON line: { action, code, message, delay?, noProgressCount, envRerunCount, fields?, newCipeDetected?, verifiableTaskIds? }
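To show how the tracking state maps onto that command line, here is a sketch of an argument builder. The helper is hypothetical and rests on two assumptions: that the first positional argument is the subagent's JSON result, and that the minute-based --timeout and --new-cipe-timeout settings are converted to seconds to match the timeout_seconds placeholders.

```javascript
// Hypothetical helper: build the argument list for `node` from tracking state.
function buildDecideArgs(skillDir, subagentResult, state, config) {
  const args = [
    `${skillDir}/scripts/ci-poll-decide.mjs`,
    JSON.stringify(subagentResult), // assumption: the ci_information result goes first
    String(state.poll_count),
    config.verbosity,
  ];
  if (state.wait_mode) args.push('--wait-mode');

  const optional = {
    '--prev-cipe-url': state.last_cipe_url,
    '--expected-sha': state.expected_commit_sha,
    '--prev-status': state.prev_status,
    '--timeout': config.timeout * 60, // minutes → seconds (assumed conversion)
    '--new-cipe-timeout': config['new-cipe-timeout'] * 60,
    '--env-rerun-count': state.env_rerun_count,
    '--no-progress-count': state.no_progress_count,
    '--prev-cipe-status': state.prev_cipe_status,
    '--prev-sh-status': state.prev_sh_status,
    '--prev-verification-status': state.prev_verification_status,
    '--prev-failure-classification': state.prev_failure_classification,
  };
  for (const [flag, value] of Object.entries(optional)) {
    // Flags whose state is still unset are omitted entirely.
    if (value !== null && value !== undefined) args.push(flag, String(value));
  }
  return args;
}
```

Passing the result as an argument array (for instance to Node's child_process.execFileSync('node', args)) avoids shell-quoting problems with the JSON payload.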

2c. Process script output

Parse the JSON output and update tracking state:

no_progress_count = output.noProgressCount

env_rerun_count = output.envRerunCount

prev_cipe_status = subagent_result.cipeStatus

prev_sh_status = subagent_result.selfHealingStatus

prev_verification_status = subagent_result.verificationStatus

prev_failure_classification = subagent_result.failureClassification

prev_status = output.action + ":" + (output.code || subagent_result.cipeStatus)

poll_count++

Based on action:

- action == "poll": Print output.message, sleep output.delay seconds, go to 2a
  - If output.newCipeDetected: clear wait mode (wait_mode = false)
- action == "wait": Print output.message, sleep output.delay seconds, go to 2a
- action == "done": Proceed to Step 3 with output.code
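Step 2c can be read as a reducer over the tracking state. The sketch below is one hypothetical way to express it: the function name and return shape are assumptions, while the field assignments follow the list above, and the counter resets on newCipeDetected follow the Progress Tracking notes.

```javascript
// Hypothetical reducer: fold one poll result into the tracking state and report the next step.
function processScriptOutput(state, output, subagentResult) {
  const next = {
    ...state,
    no_progress_count: output.noProgressCount,
    env_rerun_count: output.envRerunCount,
    prev_cipe_status: subagentResult.cipeStatus,
    prev_sh_status: subagentResult.selfHealingStatus,
    prev_verification_status: subagentResult.verificationStatus,
    prev_failure_classification: subagentResult.failureClassification,
    prev_status: `${output.action}:${output.code || subagentResult.cipeStatus}`,
    poll_count: state.poll_count + 1,
  };

  if (output.action === 'poll' && output.newCipeDetected) {
    next.wait_mode = false; // a new CI Attempt ends wait mode
    next.local_verify_count = 0;
    next.env_rerun_count = 0;
  }

  if (output.action === 'done') {
    return { state: next, step: 'handle', code: output.code };
  }
  // Both "poll" and "wait" print the message, sleep, and return to 2a.
  return {
    state: next,
    step: 'sleep-then-poll',
    delaySeconds: output.delay,
    message: `[monitor-ci] ${output.message}`,
  };
}
```

Returning a fresh state object rather than mutating the old one keeps each poll iteration easy to log and compare.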

Step 3: Handle Actionable Status

When the decision script returns action == "done":

1. Run cycle-check (Step 4) before handling the code
2. Check the returned code
3. Look up default behavior in the table above
4. Check if user instructions override the default
5. Execute the appropriate action
6. If action expects new CI Attempt, update tracking (see Step 3a)
7. If action results in looping, go to Step 2

Spawning subagents for actions

Several statuses require fetching heavy data or calling MCP:

- fix_apply_ready: Spawn UPDATE_FIX subagent with APPLY
- fix_needs_local_verify: Spawn FETCH_HEAVY subagent for fix details before local verification
- fix_needs_review: Spawn FETCH_HEAVY subagent → get suggestedFixDescription, suggestedFixSummary, taskFailureSummaries
- fix_failed / no_fix: Spawn FETCH_HEAVY subagent → get taskFailureSummaries for local fix context
- environment_issue: Spawn UPDATE_FIX subagent with RERUN_ENVIRONMENT_STATE
- self_healing_throttled: Spawn FETCH_HEAVY subagent → get selfHealingSkipMessage; then FETCH_THROTTLE_INFO + UPDATE_FIX for each old fix

Step 3a: Track State for New-CI-Attempt Detection

After actions that should trigger a new CI Attempt, run:

    node <skill_dir>/scripts/ci-state-update.mjs post-action \
      --action <type> \
      --cipe-url <current_cipe_url> \
      --commit-sha <git_rev_parse_HEAD>

Action types: fix-auto-applying, apply-mcp, apply-local-push, reject-fix-push, local-fix-push, env-rerun, auto-fix-push, empty-commit-push

The script returns { waitMode, pollCount, lastCipeUrl, expectedCommitSha, agentTriggered }. Update all tracking state from the output, then go to Step 2.

Step 4: Cycle Classification and Progress Tracking

When the decision script returns action == "done", run cycle-check before handling the code:

    node <skill_dir>/scripts/ci-state-update.mjs cycle-check \
      --code <code> \
      [--agent-triggered] \
      --cycle-count <cycle_count> --max-cycles <max_cycles> \
      --env-rerun-count <env_rerun_count>

The script returns { cycleCount, agentTriggered, envRerunCount, approachingLimit, message }. Update tracking state from the output.

- If approachingLimit → ask user whether to continue (with 5 or 10 more cycles) or stop monitoring
- If previous cycle was NOT agent-triggered (human pushed), log that human-initiated push was detected

Progress Tracking

- no_progress_count, circuit breaker (5 polls), and backoff reset are handled by ci-poll-decide.mjs (progress = any change in cipeStatus, selfHealingStatus, verificationStatus, or failureClassification)
- env_rerun_count reset on non-environment status is handled by ci-state-update.mjs cycle-check
- On new CI Attempt detected (poll script returns newCipeDetected) → reset local_verify_count = 0, env_rerun_count = 0

Error Handling

| Error | Action |
| --- | --- |
| Git rebase conflict | Report to user, exit |
| nx-cloud apply-locally fails | Reject fix via MCP (action: "REJECT"), then attempt manual patch (Reject + Fix From Scratch Flow) or exit |
| MCP tool error | Retry once, if fails report to user |
| Subagent spawn failure | Retry once, if fails exit with error |
| Decision script error | Treat as error status, increment no_progress_count |
| No new CI Attempt detected | If --auto-fix-workflow, try lockfile update; otherwise report to user with guidance |
| Lockfile auto-fix fails | Report to user, exit with guidance to check CI logs |

User Instruction Examples

Users can override default behaviors:

| Instruction | Effect |
| --- | --- |
| "never auto-apply" | Always prompt before applying any fix |
| "always ask before git push" | Prompt before each push |
| "reject any fix for e2e tasks" | Auto-reject if failedTaskIds contains e2e |
| "apply all fixes regardless of verification" | Skip verification check, apply everything |
| "if confidence < 70, reject" | Check confidence field before applying |
| "run 'nx affected -t typecheck' before applying" | Add local verification step |
| "auto-fix workflow failures" | Attempt lockfile updates on pre-CI-Attempt failures |
| "wait 45 min for new CI Attempt" | Override new-CI-Attempt timeout (default: 10 min) |
