update-dataset

Repository: owid/etl
Installs: 49
Rank: #15203

Install

npx skills add https://github.com/owid/etl --skill update-dataset
Update Dataset (PR → snapshot → steps → grapher)
Use this skill to run a complete dataset update with Claude Code subagents, keep a live progress checklist, and pause for approval at a checkpoint after every numbered workflow step before continuing.
Inputs
//
Get today's date by running:
date -u +"%Y-%m-%d"
Optional trailing args:
branch: The working branch name (defaults to the current branch)
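The input-parsing step can be sketched in Python as follows. This is a minimal sketch that assumes the positional input is a step URI like the one in the example usage: an optional scheme such as data:// followed by channel/namespace/version/short_name. The helper names today_utc and parse_step_uri are hypothetical and are not part of owid/etl.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def today_utc() -> str:
    """Return today's UTC date, like: date -u +%Y-%m-%d."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d")


def parse_step_uri(
    uri: str, old_version: Optional[str] = None, branch: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Split a '[scheme://]channel/namespace/version/short_name' string.

    Hypothetical helper: the keys mirror the checklist item, but this is not
    the parser used by the etl codebase.
    """
    # Drop an optional scheme prefix such as "data://".
    path = uri.split("://", 1)[1] if "://" in uri else uri
    parts = path.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected channel/namespace/version/short_name, got {uri!r}")
    channel, namespace, version, short_name = parts
    return {
        "channel": channel,
        "namespace": namespace,
        "version": version,
        "short_name": short_name,
        "old_version": old_version,
        "branch": branch,  # None: fall back to the current branch
    }
```

With the example arguments, parse_step_uri("data://snapshot/irena/2024-11-15/renewable_power_generation_costs", "2023-11-15", "update-irena-costs") resolves channel to "snapshot", namespace to "irena", and short_name to "renewable_power_generation_costs".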
Assumptions:
All artifacts are written to workbench//.
Persist progress to workbench//progress.md and update it after each step.
Progress checklist (maintain, tick live, and persist to progress.md)
(Checkpoint rule: After you finish each item below that represents a workflow step, immediately run the CHECKPOINT procedure. Do not batch multiple steps before a checkpoint.)
Parse inputs and resolve: channel, namespace, version, short_name, old_version, branch
Clean workbench directory: delete workbench/ unless continuing an existing update
Run the ETL update workflow via the etl-update subagent (help → dry run → approval → real run)
Create or reuse draft PR and work branch
Update snapshot and compare to previous version; capture summary
Meadow step: run + fix + diff + summarize
Garden step: run + fix + diff + summarize
Grapher step: run + verify (skip diffs), or explicitly mark N/A
CHECKPOINT — present consolidated summary and request approval
If approved, commit, push, and update PR description
Optional: run indicator upgrade on staging and persist report
Draft Slack announcement and notify user to post it to #data-updates-comms
Persistence:
After ticking each item, update workbench//progress.md with the current checklist state and a timestamp.
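One way to persist the checklist, sketched in Python. The file layout (a "Last updated" line followed by Markdown task-list items) is an assumption; the skill only requires the current checklist state and a timestamp. render_progress and write_progress are hypothetical helpers.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Tuple


def render_progress(items: Iterable[Tuple[str, bool]], now: datetime) -> str:
    """Render (label, done) pairs as task-list lines under a UTC timestamp."""
    lines = [f"Last updated: {now.strftime('%Y-%m-%dT%H:%M:%SZ')}", ""]
    for label, done in items:
        lines.append(f"- [{'x' if done else ' '}] {label}")
    return "\n".join(lines) + "\n"


def write_progress(path: Path, items: Iterable[Tuple[str, bool]]) -> None:
    """Overwrite progress.md so it always mirrors the current checklist state."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_progress(items, datetime.now(timezone.utc)), encoding="utf-8")
```

Overwriting the whole file on every tick, rather than appending, keeps progress.md a snapshot of the current state, which is what a resumed run needs to read.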
CHECKPOINT (mandatory user approval)
Always performed immediately after completing each numbered workflow step (1–6). Never start the next step until approval is granted.
Procedure (each time):
Present a concise summary of what just changed, key diffs/issues resolved, and what the next step will do.
Ask exactly: Proceed? reply: yes/no
Only continue if the user replies exactly yes (case-insensitive). Any other reply counts as no; stop and wait.
On approval:
Update the progress checklist (tick the completed item) and write workbench//progress.md with a timestamp.
Commit related changes (if any) and push.
Update (or append to) the PR description: add a collapsed section titled with the step name (e.g., "Snapshot Update", "Meadow Update") containing the summary.
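The approval rule and the collapsed PR section can be sketched as two small Python helpers. is_approved and pr_section are hypothetical names. Trimming surrounding whitespace before the comparison is an assumption. The details/summary markup is the standard HTML that GitHub renders as a collapsible section.

```python
def is_approved(reply: str) -> bool:
    """Only 'yes' in any letter case counts as approval; anything else means stop."""
    # Assumption: leading/trailing whitespace and newlines from chat input are ignored.
    return reply.strip().casefold() == "yes"


def pr_section(step_name: str, summary: str) -> str:
    """Build a collapsed section (e.g. 'Snapshot Update') for the PR description."""
    # The blank line after </summary> lets Markdown inside the body render normally.
    return f"<details>\n<summary>{step_name}</summary>\n\n{summary.strip()}\n\n</details>\n"
```

Replies such as "y" or "yes please" fail the check, which matches the rule that any other reply counts as no.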
Mandatory per-step checkpoints (rule)
You MUST:
Stop after each workflow step (1–6) and run CHECKPOINT before starting the next (step 7 is optional and still requires a checkpoint if executed).
Never chain multiple steps inside a single approval.
Treat missing or ambiguous replies as no.
Workflow orchestration
Initial setup
Check whether workbench//progress.md exists to determine if you are continuing an existing update.
If starting fresh: delete the workbench/ directory if it exists.
Create a fresh workbench/ directory for artifacts.
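Here is a minimal Python sketch of the setup logic. It assumes workdir is the per-dataset workbench directory. prepare_workbench is a hypothetical helper that returns whether an existing update is being continued.

```python
import shutil
from pathlib import Path


def prepare_workbench(workdir: Path) -> bool:
    """Return True when continuing an existing update, False when starting fresh."""
    if (workdir / "progress.md").exists():
        return True  # resume: keep all existing artifacts
    if workdir.exists():
        shutil.rmtree(workdir)  # starting fresh: drop stale artifacts
    workdir.mkdir(parents=True)
    return False
```

The presence of progress.md is the only signal used to decide whether to resume, so a run that never wrote progress.md is treated as a fresh start.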
Run ETL update command (etl-update subagent)
Inputs:
// plus any required flags
CRITICAL
Run etl update ONCE for the full step URI (e.g., data://garden/namespace/old_version/short_name). Do NOT run it separately per channel (snapshot, meadow, garden, grapher). Running it once ensures that all cross-step DAG dependencies are updated together; running it per channel leaves stale version references in dag/main.yml (e.g., garden pointing to the old meadow version).
Perform the help check, dry run, approval, then the real execution; capture a summary for later PR notes.
After running, always verify dag/main.yml: grep it for the old version and confirm that all internal references between the new steps point to the new version (e.g., garden depends on the new meadow step, not the old one).
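The grep check can be complemented by a small scan that flags new-version steps whose dependencies still mention the old version. This Python sketch assumes that dag/main.yml lists each step as a key ending in a colon, with one "- dependency" item per line underneath. It deliberately avoids a YAML parser, uses plain substring matching, and ignores flow-style lists. find_stale_dependencies is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def find_stale_dependencies(
    dag_text: str, namespace: str, short_name: str, old_version: str, new_version: str
) -> List[Tuple[str, str]]:
    """Return (step, dependency) pairs where a new-version step depends on the old version."""
    old_ref = f"{namespace}/{old_version}/{short_name}"
    new_ref = f"{namespace}/{new_version}/{short_name}"
    stale: List[Tuple[str, str]] = []
    current_step: Optional[str] = None
    for raw_line in dag_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop YAML comments and indentation
        if not line:
            continue
        if line.startswith("- "):
            dependency = line[2:].strip()
            if current_step and new_ref in current_step and old_ref in dependency:
                stale.append((current_step, dependency))
        elif line.endswith(":") and "://" in line:
            current_step = line[:-1]  # a step URI key such as data://garden/...
    return stale
```

An empty result means every new-version step depends only on new-version steps. Substring matching can over-match when one short_name is a prefix of another, so treat hits as candidates to inspect.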
CHECKPOINT (stop → summarize → ask → require yes)
Create PR and integrate update via subagent (etl-pr)
Inputs:
//
Create or reuse a draft PR, set up the work branch, and incorporate the ETL update outputs
CHECKPOINT
Snapshot run & compare (snapshot-runner subagent)
Inputs:
// and
CHECKPOINT
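For tabular snapshots, the comparison with the previous version can start from a summary of header and row-count changes. This Python sketch assumes both versions are available as CSV text (for example via open(path, newline="")). Other formats, such as spreadsheets, would need their own reader. summarize_csv_change is a hypothetical helper, not the snapshot-runner subagent's actual logic.

```python
import csv
from typing import Dict, Iterable, List, Union


def summarize_csv_change(
    old_lines: Iterable[str], new_lines: Iterable[str]
) -> Dict[str, Union[int, List[str]]]:
    """Summarize column and row-count changes between two CSV sources."""
    old_reader, new_reader = csv.reader(old_lines), csv.reader(new_lines)
    old_header, new_header = next(old_reader, []), next(new_reader, [])
    return {
        "added_columns": sorted(set(new_header) - set(old_header)),
        "removed_columns": sorted(set(old_header) - set(new_header)),
        "old_rows": sum(1 for _ in old_reader),  # data rows after the header
        "new_rows": sum(1 for _ in new_reader),
    }
```

Suppose a single cost column is split into local-currency and PPP columns. That change would show up here as one removed column and two added ones, which is an early signal that the garden code and metadata YAMLs will need updating.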
Meadow step repair/verify (step-fixer subagent, channel=meadow)
Run, fix, re-run; produce diffs
Save diffs and summaries
CHECKPOINT
Garden step repair/verify (step-fixer subagent, channel=garden)
Run, fix, re-run; produce diffs
Save diffs and summaries
CHECKPOINT
Grapher step run/verify (step-fixer subagent, channel=grapher, add --grapher)
Skip diff
CHECKPOINT
Indicator upgrade (optional, staging only)
Use indicator-upgrader subagent with
CRITICAL
After the upgrader finishes, always verify that it actually worked by querying staging:
make query SQL="SELECT COUNT(*) FROM chart_dimensions cd JOIN variables v ON cd.variableId = v.id WHERE v.catalogPath LIKE '%/%'"
If the count is 0, the upgrade did not run; re-run it.
CHECKPOINT (if executed)
Slack announcement
Fill out the template at .claude/skills/update-dataset/slack-announcement-template.md using facts gathered during the update (coverage, chart count, key changes, etc.).
Ask the user if you are unsure about any details.
Save the draft to workbench//slack-announcement.md.
Tell the user: "Slack announcement drafted at workbench//slack-announcement.md. Please review and post it to #data-updates-comms."
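The staging check comes down to one COUNT query and a test for zero. The Python sketch below runs that query against an in-memory SQLite stand-in, which is an assumption made only for illustration, since staging itself is queried with make query. The LIKE pattern is passed as a bound parameter rather than hard-coded. upgraded_dimension_count and upgrade_needs_rerun are hypothetical helpers.

```python
import sqlite3

# Same shape as the staging query; the LIKE pattern is bound as a parameter.
VERIFY_SQL = (
    "SELECT COUNT(*) FROM chart_dimensions cd "
    "JOIN variables v ON cd.variableId = v.id "
    "WHERE v.catalogPath LIKE ?"
)


def upgraded_dimension_count(conn: sqlite3.Connection, catalog_path_pattern: str) -> int:
    """Count chart dimensions whose variable's catalogPath matches the pattern."""
    (count,) = conn.execute(VERIFY_SQL, (catalog_path_pattern,)).fetchone()
    return count


def upgrade_needs_rerun(count: int) -> bool:
    """Per the guardrail above: a count of 0 means the upgrade did not run."""
    return count == 0
```

Binding the pattern as a parameter avoids the shell and SQL quoting issues that come with pasting it into an inline query string.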
Guardrails and tips
DAG consistency
After etl update, always verify that all new steps in dag/main.yml reference each other with the new version. A common bug is garden depending on the old meadow or old snapshot step, which silently loads stale data.
Never return empty tables or comment out logic as a workaround; fix the parsing/transformations instead.
Column name changes: update the garden processing code and the metadata YAMLs (garden/grapher) to match schema changes.
Indexing: avoid leaking index columns from reset_index(); format tables with tb.format(["country", "year"]) as appropriate.
Metadata validation errors are guidance: update the YAML to add/remove variables as indicated.
Artifacts (expected)
workbench//snapshot-runner.md
workbench//progress.md
workbench//meadow_diff_raw.txt and meadow_diff.md
workbench//garden_diff_raw.txt and garden_diff.md
workbench//indicator_upgrade.json (if indicator-upgrader was used)
Example usage
Minimal catalog URI with explicit old version:
update-dataset data://snapshot/irena/2024-11-15/renewable_power_generation_costs 2023-11-15 update-irena-costs
Common issues when data structure changes
SILENT FAILURES WARNING: Never return empty tables or comment out code as workarounds!
Column name changes: If columns are renamed/split (e.g., single cost → local currency + PPP), update:
Python code references in the garden step
Garden metadata YAML (e.g., food_prices_for_nutrition.meta.yml)
Grapher metadata YAML (if it exists)
Index issues: Check for unwanted index columns from reset_index(); ensure proper indexing with tb.format(["country", "year"]).
Metadata validation: Use error messages as a guide; they show exactly which variables to add/remove from YAML files.
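Before the final checkpoint, a quick existence check over the expected artifacts catches skipped steps. This Python sketch takes the per-dataset workbench directory as an argument. The file names come from the Artifacts list above; missing_artifacts itself is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# File names taken from the "Artifacts (expected)" list.
EXPECTED_ARTIFACTS = [
    "snapshot-runner.md",
    "progress.md",
    "meadow_diff_raw.txt",
    "meadow_diff.md",
    "garden_diff_raw.txt",
    "garden_diff.md",
]


def missing_artifacts(workdir: Path, used_indicator_upgrader: bool = False) -> List[str]:
    """Return expected artifact names that are not present as files in workdir."""
    expected = list(EXPECTED_ARTIFACTS)
    if used_indicator_upgrader:
        expected.append("indicator_upgrade.json")  # only produced by the optional step
    return [name for name in expected if not (workdir / name).is_file()]
```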