monitor-ci

Installs: 211
Rank: #4147

Install

npx skills add https://github.com/nrwl/nx-ai-agents-config --skill monitor-ci
Monitor CI Command
You are the orchestrator for monitoring Nx Cloud CI pipeline executions and handling self-healing fixes. You spawn subagents to interact with Nx Cloud, run deterministic decision scripts, and take action based on the results.
Context
Current Branch: !`git branch --show-current`
Current Commit: !`git rev-parse --short HEAD`
Remote Status: !`git status -sb | head -1`
User Instructions
$ARGUMENTS
Important: If the user provides specific instructions, respect them over the default behaviors described below.
Configuration Defaults
| Setting | Default | Description |
| --- | --- | --- |
| --max-cycles | 10 | Maximum agent-initiated CI Attempt cycles before timeout |
| --timeout | 120 | Maximum duration in minutes |
| --verbosity | medium | Output level: minimal, medium, verbose |
| --branch | (auto-detect) | Branch to monitor |
| --fresh | false | Ignore previous context, start fresh |
| --auto-fix-workflow | false | Attempt common fixes for pre-CI-Attempt failures (e.g., lockfile updates) |
| --new-cipe-timeout | 10 | Minutes to wait for new CI Attempt after action |
| --local-verify-attempts | 3 | Max local verification + enhance cycles before pushing to CI |
Parse any overrides from $ARGUMENTS and merge with defaults.
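The merge of overrides with defaults can be sketched roughly as follows. This is only an illustration: `parseArguments` and `toCamelCase` are hypothetical helpers, not part of the skill, and the sketch assumes flags arrive as whitespace-separated `--flag value` tokens, with anything unrecognized treated as free-form user instructions.

```javascript
// Defaults copied from the Configuration Defaults table.
const DEFAULTS = {
  maxCycles: 10,
  timeout: 120, // minutes
  verbosity: "medium",
  branch: null, // null stands for "auto-detect"
  fresh: false,
  autoFixWorkflow: false,
  newCipeTimeout: 10, // minutes
  localVerifyAttempts: 3,
};

// Hypothetical helper: "--max-cycles" -> "maxCycles".
function toCamelCase(flag) {
  return flag.replace(/^--/, "").replace(/-([a-z])/g, (_, c) => c.toUpperCase());
}

// Hypothetical parser: recognized flags override DEFAULTS,
// every other token is kept as part of the user's instructions.
function parseArguments(argString) {
  const config = { ...DEFAULTS };
  const instructions = [];
  const tokens = argString.trim().split(/\s+/).filter(Boolean);
  for (let i = 0; i < tokens.length; i++) {
    const key = tokens[i].startsWith("--") ? toCamelCase(tokens[i]) : null;
    if (key === null || !(key in DEFAULTS)) {
      instructions.push(tokens[i]);
    } else if (typeof DEFAULTS[key] === "boolean") {
      config[key] = true; // boolean flags take no value
    } else if (typeof DEFAULTS[key] === "number") {
      config[key] = Number(tokens[++i]);
    } else {
      config[key] = tokens[++i];
    }
  }
  return { config, instructions: instructions.join(" ") };
}
```

For example, `parseArguments("--max-cycles 5 --fresh never auto-apply")` would yield a config with `maxCycles` set to 5 and `fresh` set to true, leaving "never auto-apply" as the user instruction text.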
Nx Cloud Connection Check
CRITICAL: Before starting the monitoring loop, verify the workspace is connected to Nx Cloud.
Step 0: Verify Nx Cloud Connection
- Check nx.json at workspace root for nxCloudId or nxCloudAccessToken
- If nx.json is missing OR neither property exists → exit with:
  Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud
- If connected → continue to main loop
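As a minimal sketch of the Step 0 check, assuming the caller has already read and parsed nx.json (passing `null` when the file is missing), the decision reduces to one predicate. The function name `checkNxCloudConnection` and its return shape are illustrative assumptions, not part of the skill.

```javascript
const NOT_CONNECTED_MESSAGE =
  "Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud";

// nxJson: the parsed contents of nx.json, or null if the file does not exist.
function checkNxCloudConnection(nxJson) {
  const connected =
    nxJson !== null &&
    typeof nxJson === "object" &&
    Boolean(nxJson.nxCloudId || nxJson.nxCloudAccessToken);
  // When not connected, the caller is expected to print the message and exit.
  return connected
    ? { connected: true }
    : { connected: false, message: NOT_CONNECTED_MESSAGE };
}
```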
Architecture Overview
- This skill (orchestrator): spawns subagents, runs scripts, prints status, does local coding work
- ci-monitor-subagent (haiku): calls one MCP tool (ci_information or update_self_healing_fix), returns structured result, exits
- ci-poll-decide.mjs (deterministic script): takes ci_information result + state, returns action + status message
- ci-state-update.mjs (deterministic script): manages budget gates, post-action state transitions, and cycle classification
Status Reporting
The decision script handles message formatting based on verbosity. When printing messages to the user:
- Prepend [monitor-ci] to every message from the script's message field
- For your own action messages (e.g. "Applying fix via MCP..."), also prepend [monitor-ci]
Anti-Patterns (NEVER DO)
CRITICAL: The following behaviors are strictly prohibited:

| Anti-Pattern | Why It's Bad |
| --- | --- |
| Using CI provider CLIs with --watch flags (e.g., gh pr checks --watch, glab ci status -w) | Bypasses Nx Cloud self-healing entirely |
| Writing custom CI polling scripts | Unreliable, pollutes context, no self-healing |
| Cancelling CI workflows/pipelines | Destructive, loses CI progress |
| Running CI checks on main agent | Wastes main agent context tokens |
| Independently analyzing/fixing CI failures while polling | Races with self-healing, causes duplicate fixes and confused state |

If this skill fails to activate, the fallback is:
- Use CI provider CLI for READ-ONLY status check (single call, no watch/polling flags)
- Immediately delegate to this skill with gathered context
- NEVER continue polling on main agent

CI provider CLIs are acceptable ONLY for:
- One-time read of PR/pipeline status
- Getting PR/branch metadata
- NOT for continuous monitoring or watch mode

Session Context Behavior
Important: Within a Claude Code session, conversation context persists. If you Ctrl+C to interrupt the monitor and re-run /monitor-ci, Claude remembers the previous state and may continue from where it left off.
- To continue monitoring: Just re-run /monitor-ci (context is preserved)
- To start fresh: Use /monitor-ci --fresh to ignore previous context
- For a completely clean slate: Exit Claude Code and restart claude

MCP Tool Reference
ci_information
Input:
{
  "branch": "string (optional, defaults to current git branch)",
  "select": "string (optional, comma-separated field names)",
  "pageToken": "number (optional, 0-based pagination for long strings)"
}

Field Sets for Efficient Polling:

WAIT_FIELDS: 'cipeUrl,commitSha,cipeStatus'
Minimal fields for detecting new CI Attempt

LIGHT_FIELDS: 'cipeStatus,cipeUrl,branch,commitSha,selfHealingStatus,verificationStatus,userAction,failedTaskIds,verifiedTaskIds,selfHealingEnabled,failureClassification,couldAutoApplyTasks,shortLink,confidence,confidenceReasoning,hints,selfHealingSkippedReason,selfHealingSkipMessage'
Status fields for determining actionable state

HEAVY_FIELDS: 'taskOutputSummary,suggestedFix,suggestedFixReasoning,suggestedFixDescription'
Large content fields, fetch only when needed for fix decisions

Default Behaviors by Status
The decision script returns one of the following statuses. This table defines the default behavior for each. User instructions can override any of these.

Simple exits (just report and exit):

| Status | Default Behavior |
| --- | --- |
| ci_success | Exit with success |
| cipe_canceled | Exit, CI was canceled |
| cipe_timed_out | Exit, CI timed out |
| polling_timeout | Exit, polling timeout reached |
| circuit_breaker | Exit, no progress after 5 consecutive polls |
| environment_rerun_cap | Exit, environment reruns exhausted |
| fix_auto_applying | Do NOT call MCP; self-healing handles it. Record last_cipe_url, enter wait mode. No local git ops. |
| error | Wait 60s and loop |
Statuses requiring action (see subsections below):

| Status | Summary |
| --- | --- |
| fix_apply_ready | Fix verified (all tasks or e2e-only). Apply via MCP. |
| fix_needs_local_verify | Fix has unverified non-e2e tasks. Run locally, then apply or enhance. |
| fix_needs_review | Fix verification failed/not attempted. Analyze and decide. |
| fix_failed | Self-healing failed. Fetch heavy data, attempt local fix (gate check first). |
| no_fix | No fix available. Fetch heavy data, attempt local fix (gate check first) or exit. |
| environment_issue | Request environment rerun via MCP (gate check first). |
| self_healing_throttled | Reject old fixes, attempt local fix. |
| no_new_cipe | CI Attempt never spawned. Auto-fix workflow or exit with guidance. |
| cipe_no_tasks | CI failed with no tasks. Retry once with empty commit. |
fix_apply_ready
- Spawn UPDATE_FIX subagent with APPLY
- Record last_cipe_url, enter wait mode

fix_needs_local_verify
The script returns verifiableTaskIds in its output.
- Detect package manager: pnpm-lock.yaml → pnpm nx, yarn.lock → yarn nx, otherwise npx nx
- Run verifiable tasks in parallel: spawn general subagents for each task
- If all pass → spawn UPDATE_FIX subagent with APPLY, enter wait mode
- If any fail → Apply Locally + Enhance Flow (see below)
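The package manager detection above can be sketched as a small lookup over the file names found at the workspace root. The helper names are hypothetical, and `buildVerifyCommands` additionally assumes that each entry in verifiableTaskIds is a `project:target` style ID accepted by `nx run`, which the source does not spell out.

```javascript
// rootFiles: names of files present at the workspace root.
// The lockfile precedence (pnpm, then yarn, then npx) follows the order listed above.
function detectNxCommand(rootFiles) {
  const files = new Set(rootFiles);
  if (files.has("pnpm-lock.yaml")) return "pnpm nx";
  if (files.has("yarn.lock")) return "yarn nx";
  return "npx nx";
}

// Assumption: each task ID can be passed directly to `nx run`.
// One command per task, so the tasks can be handed to separate subagents.
function buildVerifyCommands(rootFiles, verifiableTaskIds) {
  const nx = detectNxCommand(rootFiles);
  return verifiableTaskIds.map((taskId) => `${nx} run ${taskId}`);
}
```

Returning one command per task keeps the parallel fan-out explicit: the orchestrator can spawn a subagent per entry and apply the fix only if every command succeeds.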
fix_needs_review
Spawn FETCH_HEAVY subagent, then analyze fix content (suggestedFixDescription, suggestedFixSummary, taskFailureSummaries):
- If fix looks correct → apply via MCP
- If fix needs enhancement → Apply Locally + Enhance Flow
- If fix is wrong → run ci-state-update.mjs gate --gate-type local-fix. If not allowed, print message and exit. Otherwise → Reject + Fix From Scratch Flow

fix_failed / no_fix
Spawn FETCH_HEAVY subagent for taskFailureSummaries. Run ci-state-update.mjs gate --gate-type local-fix. If not allowed, print message and exit. Otherwise attempt local fix (counter already incremented by gate). If successful → commit, push, enter wait mode. If not → exit with failure.

environment_issue
- Run ci-state-update.mjs gate --gate-type env-rerun. If not allowed, print message and exit.
- Spawn UPDATE_FIX subagent with RERUN_ENVIRONMENT_STATE
- Enter wait mode with last_cipe_url set
self_healing_throttled
Spawn FETCH_HEAVY subagent for selfHealingSkipMessage.
1. Parse throttle message for CI Attempt URLs (regex: /cipes/{id})
2. Reject previous fixes: for each URL, spawn FETCH_THROTTLE_INFO to get shortLink, then UPDATE_FIX with REJECT
3. Attempt local fix: run ci-state-update.mjs gate --gate-type local-fix. If not allowed → skip to step 4. Otherwise use failedTaskIds and taskFailureSummaries for context.
4. Fallback if local fix not possible or budget exhausted: push empty commit (git commit --allow-empty -m "ci: rerun after rejecting throttled fixes"), enter wait mode
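Extracting the CI Attempt URLs from the throttle message might look like the sketch below. The source gives only the pattern `/cipes/{id}`; the assumption that an ID consists of letters, digits, underscores, or hyphens, the helper name `extractCipeUrls`, and the example.com host in the usage note are all illustrative.

```javascript
// Returns each distinct URL containing a /cipes/{id} segment, in order of first appearance.
// Assumption: IDs match [A-Za-z0-9_-]+; adjust if real IDs use other characters.
function extractCipeUrls(message) {
  const pattern = /https?:\/\/[^\s"'<>()]*\/cipes\/[A-Za-z0-9_-]+/g;
  return [...new Set(message.match(pattern) ?? [])];
}
```

De-duplicating matters here because each URL leads to a FETCH_THROTTLE_INFO call followed by an UPDATE_FIX with REJECT, and rejecting the same fix twice would waste subagent calls. A message mentioning `https://cloud.example.com/cipes/abc123` twice would therefore produce a single entry.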
no_new_cipe
- Report to user: no CI attempt found, suggest checking CI provider
- If --auto-fix-workflow: detect package manager, run install, commit lockfile if changed, enter wait mode
- Otherwise: exit with guidance

cipe_no_tasks
- Report to user: CI failed with no tasks recorded
- Retry: git commit --allow-empty -m "chore: retry ci [monitor-ci]" + push, enter wait mode
- If retry also returns cipe_no_tasks: exit with failure
Fix Action Flows
Apply via MCP
Spawn UPDATE_FIX subagent with APPLY. New CI Attempt spawns automatically. No local git ops.

Apply Locally + Enhance Flow
1. nx-cloud apply-locally (sets state to APPLIED_LOCALLY)
2. Enhance code to fix failing tasks
3. Run failing tasks to verify
4. If still failing → run ci-state-update.mjs gate --gate-type local-fix. If not allowed, commit current state and push (let CI be final judge). Otherwise loop back to enhance.
5. If passing → commit and push, enter wait mode

Reject + Fix From Scratch Flow
1. Run ci-state-update.mjs gate --gate-type local-fix. If not allowed, print message and exit.
2. Spawn UPDATE_FIX subagent with REJECT
3. Fix from scratch locally
4. Commit and push, enter wait mode
Environment vs Code Failure Recognition
When any local fix path runs a task and it fails, assess whether the failure is a code issue or an environment/tooling issue before running the gate script.
Indicators of environment/tooling failures (non-exhaustive): command not found / binary missing, OOM / heap allocation failures, permission denied, network timeouts / DNS failures, missing system libraries, Docker/container issues, disk space exhaustion.
When detected → bail immediately, do NOT run gate (no budget consumed). Report that the failure is an environment/tooling issue, not a code bug.
Code failures (compilation errors, test assertion failures, lint violations, type errors) are genuine candidates for local fix attempts and proceed normally through the gate.
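A rough heuristic for this triage could scan the task output for well-known environment error strings before the gate runs. The pattern list below is an assumption built from common shell, Node.js, and Docker error messages; it is deliberately incomplete, and `classifyFailure` is a hypothetical helper rather than anything the skill ships.

```javascript
// Illustrative, non-exhaustive signatures of environment/tooling failures.
const ENVIRONMENT_PATTERNS = [
  /command not found/i, // binary missing
  /out of memory/i, // OOM, including "JavaScript heap out of memory"
  /permission denied/i,
  /\bEACCES\b/,
  /\bETIMEDOUT\b/, // network timeout
  /\bENOTFOUND\b/, // DNS failure
  /no space left on device/i, // disk exhaustion
  /cannot connect to the docker daemon/i,
];

// Returns "environment" if any signature matches, otherwise "code".
// Only a "code" result should proceed to the local-fix gate.
function classifyFailure(taskOutput) {
  return ENVIRONMENT_PATTERNS.some((pattern) => pattern.test(taskOutput))
    ? "environment"
    : "code";
}
```

Treating everything unmatched as a code failure is a deliberate simplification: a missed environment failure costs one gate budget unit, whereas misclassifying a real code failure as environmental would stop the fix attempt entirely.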
Git Safety
- NEVER use git add -A or git add . (always stage specific files by name)
- Users may have concurrent local changes that must NOT be committed
Commit Message Format
git commit -m "fix():
Failed tasks: ,
Local verification: passed|enhanced|failed-pushing-to-ci"
Main Loop
Step 1: Initialize Tracking
cycle_count = 0 # Only incremented for agent-initiated cycles (counted against --max-cycles)
start_time = now()
no_progress_count = 0
local_verify_count = 0
env_rerun_count = 0
last_cipe_url = null
expected_commit_sha = null
agent_triggered = false # Set true after monitor takes an action that triggers new CI Attempt
poll_count = 0
wait_mode = false
prev_status = null
prev_cipe_status = null
prev_sh_status = null
prev_verification_status = null
prev_failure_classification = null
Step 2: Polling Loop
Repeat until done:
2a. Spawn subagent (FETCH_STATUS)
Determine select fields based on mode:
- Wait mode: use WAIT_FIELDS (cipeUrl,commitSha,cipeStatus)
- Normal mode (first poll or after newCipeDetected): use LIGHT_FIELDS
Task(
agent: "ci-monitor-subagent",
model: haiku,
prompt: "FETCH_STATUS for branch ''.
select: ''"
)
The subagent calls ci_information and returns a JSON object with the requested fields. This is a foreground call: wait for the result.
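The field selection in step 2a amounts to picking one of two constant strings. The sketch below copies the WAIT_FIELDS and LIGHT_FIELDS values from the MCP Tool Reference and builds an input object in the shape ci_information accepts; `buildFetchStatusInput` is a hypothetical helper name.

```javascript
// Field sets as listed in the MCP Tool Reference.
const WAIT_FIELDS = "cipeUrl,commitSha,cipeStatus";
const LIGHT_FIELDS =
  "cipeStatus,cipeUrl,branch,commitSha,selfHealingStatus,verificationStatus," +
  "userAction,failedTaskIds,verifiedTaskIds,selfHealingEnabled,failureClassification," +
  "couldAutoApplyTasks,shortLink,confidence,confidenceReasoning,hints," +
  "selfHealingSkippedReason,selfHealingSkipMessage";

// Wait mode only needs enough data to detect a new CI Attempt;
// normal mode needs the status fields that drive the decision script.
function buildFetchStatusInput(branch, waitMode) {
  return { branch, select: waitMode ? WAIT_FIELDS : LIGHT_FIELDS };
}
```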
2b. Run decision script
node <skill_dir>/scripts/ci-poll-decide.mjs '' <poll_count> <verbosity> \
  [--wait-mode] \
  [--prev-cipe-url <last_cipe_url>] \
  [--expected-sha <expected_commit_sha>] \
  [--prev-status <prev_status>] \
  [--timeout <timeout_seconds>] \
  [--new-cipe-timeout <new_cipe_timeout_seconds>] \
  [--env-rerun-count <env_rerun_count>] \
  [--no-progress-count <no_progress_count>] \
  [--prev-cipe-status <prev_cipe_status>] \
  [--prev-sh-status <prev_sh_status>] \
  [--prev-verification-status <prev_verification_status>] \
  [--prev-failure-classification <prev_failure_classification>]
The script outputs a single JSON line:
{ action, code, message, delay?, noProgressCount, envRerunCount, fields?, newCipeDetected?, verifiableTaskIds? }
2c. Process script output
Parse the JSON output and update tracking state:
no_progress_count = output.noProgressCount
env_rerun_count = output.envRerunCount
prev_cipe_status = subagent_result.cipeStatus
prev_sh_status = subagent_result.selfHealingStatus
prev_verification_status = subagent_result.verificationStatus
prev_failure_classification = subagent_result.failureClassification
prev_status = output.action + ":" + (output.code || subagent_result.cipeStatus)
poll_count++
Based on action:
- action == "poll": Print output.message, sleep output.delay seconds, go to 2a
  - If output.newCipeDetected → clear wait mode, reset wait_mode = false
- action == "wait": Print output.message, sleep output.delay seconds, go to 2a
- action == "done": Proceed to Step 3 with output.code
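The bookkeeping in step 2c can be expressed as a pure function from the previous tracking state to the next one. This is a sketch under assumptions: the camelCase state keys mirror the snake_case variables from Step 1, `applyPollResult` is a hypothetical name, and the "IN_PROGRESS" status in the usage note is only an illustrative value.

```javascript
// state: tracking state from Step 1 (camelCase keys are an assumption of this sketch).
// output: parsed JSON line from ci-poll-decide.mjs.
// subagentResult: JSON returned by the FETCH_STATUS subagent.
function applyPollResult(state, output, subagentResult) {
  const next = {
    ...state,
    noProgressCount: output.noProgressCount,
    envRerunCount: output.envRerunCount,
    prevCipeStatus: subagentResult.cipeStatus,
    prevShStatus: subagentResult.selfHealingStatus,
    prevVerificationStatus: subagentResult.verificationStatus,
    prevFailureClassification: subagentResult.failureClassification,
    prevStatus: `${output.action}:${output.code || subagentResult.cipeStatus}`,
    pollCount: state.pollCount + 1,
  };
  // A newly detected CI Attempt ends wait mode.
  if (output.action === "poll" && output.newCipeDetected) {
    next.waitMode = false;
  }
  return next;
}
```

With a state of `{ pollCount: 2, waitMode: true }`, a script output of `{ action: "poll", newCipeDetected: true, noProgressCount: 0, envRerunCount: 0 }`, and a subagent result whose cipeStatus is "IN_PROGRESS", the next state has `pollCount` 3, `waitMode` false, and `prevStatus` "poll:IN_PROGRESS".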
Step 3: Handle Actionable Status
When the decision script returns action == "done":
1. Run cycle-check (Step 4) before handling the code
2. Check the returned code
3. Look up default behavior in the table above
4. Check if user instructions override the default
5. Execute the appropriate action
6. If action expects new CI Attempt, update tracking (see Step 3a)
7. If action results in looping, go to Step 2
Spawning subagents for actions
Several statuses require fetching heavy data or calling MCP:
- fix_apply_ready: Spawn UPDATE_FIX subagent with APPLY
- fix_needs_local_verify: Spawn FETCH_HEAVY subagent for fix details before local verification
- fix_needs_review: Spawn FETCH_HEAVY subagent → get suggestedFixDescription, suggestedFixSummary, taskFailureSummaries
- fix_failed / no_fix: Spawn FETCH_HEAVY subagent → get taskFailureSummaries for local fix context
- environment_issue: Spawn UPDATE_FIX subagent with RERUN_ENVIRONMENT_STATE
- self_healing_throttled: Spawn FETCH_HEAVY subagent → get selfHealingSkipMessage; then FETCH_THROTTLE_INFO + UPDATE_FIX for each old fix

Step 3a: Track State for New-CI-Attempt Detection
After actions that should trigger a new CI Attempt, run:
node <skill_dir>/scripts/ci-state-update.mjs post-action \
  --action <type> \
  --cipe-url <current_cipe_url> \
  --commit-sha <git_rev_parse_HEAD>
Action types: fix-auto-applying, apply-mcp, apply-local-push, reject-fix-push, local-fix-push, env-rerun, auto-fix-push, empty-commit-push
The script returns { waitMode, pollCount, lastCipeUrl, expectedCommitSha, agentTriggered }. Update all tracking state from the output, then go to Step 2.

Step 4: Cycle Classification and Progress Tracking
When the decision script returns action == "done", run cycle-check before handling the code:
node <skill_dir>/scripts/ci-state-update.mjs cycle-check \
  --code <code> \
  [--agent-triggered] \
  --cycle-count <cycle_count> --max-cycles <max_cycles> \
  --env-rerun-count <env_rerun_count>
The script returns { cycleCount, agentTriggered, envRerunCount, approachingLimit, message }. Update tracking state from the output.
- If approachingLimit → ask user whether to continue (with 5 or 10 more cycles) or stop monitoring
- If previous cycle was NOT agent-triggered (human pushed), log that human-initiated push was detected

Progress Tracking
- no_progress_count, circuit breaker (5 polls), and backoff reset are handled by ci-poll-decide.mjs (progress = any change in cipeStatus, selfHealingStatus, verificationStatus, or failureClassification)
- env_rerun_count reset on non-environment status is handled by ci-state-update.mjs cycle-check
- On new CI Attempt detected (poll script returns newCipeDetected) → reset local_verify_count = 0, env_rerun_count = 0

Error Handling

| Error | Action |
| --- | --- |
| Git rebase conflict | Report to user, exit |
| nx-cloud apply-locally fails | Reject fix via MCP (action: "REJECT"), then attempt manual patch (Reject + Fix From Scratch Flow) or exit |
| MCP tool error | Retry once, if fails report to user |
| Subagent spawn failure | Retry once, if fails exit with error |
| Decision script error | Treat as error status, increment no_progress_count |
| No new CI Attempt detected | If --auto-fix-workflow, try lockfile update; otherwise report to user with guidance |
| Lockfile auto-fix fails | Report to user, exit with guidance to check CI logs |

User Instruction Examples
Users can override default behaviors:

| Instruction | Effect |
| --- | --- |
| "never auto-apply" | Always prompt before applying any fix |
| "always ask before git push" | Prompt before each push |
| "reject any fix for e2e tasks" | Auto-reject if failedTaskIds contains e2e |
| "apply all fixes regardless of verification" | Skip verification check, apply everything |
| "if confidence < 70, reject" | Check confidence field before applying |
| "run 'nx affected -t typecheck' before applying" | Add local verification step |
| "auto-fix workflow failures" | Attempt lockfile updates on pre-CI-Attempt failures |
| "wait 45 min for new CI Attempt" | Override new-CI-Attempt timeout (default: 10 min) |
