ai-paper-reproduction
Use when
The user wants the agent to reproduce results from an AI paper's code repository.
The target is a code repository with a README, scripts, configs, or documented commands.
The goal is a minimal trustworthy run, not unlimited experimentation.
The user needs standardized outputs that another human or model can audit quickly.
The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.
Do not use when
The task is a general literature review or paper summary.
The task is to design a new model, benchmark suite, or training pipeline from scratch.
The repository is not centered on AI or does not expose a documented reproduction path.
The user primarily wants a deep code refactor rather than README-first reproduction.
The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.
The user is explicitly authorizing exploratory branch-only experimentation instead of trusted reproduction.
Success criteria
README is treated as the primary source of reproduction intent.
A minimum trustworthy target is selected and justified.
Documented inference is preferred over evaluation, and evaluation is preferred over training.
Any repo edits remain conservative, explicit, and auditable.
Assumptions, protocol deviations, and human decision points are surfaced rather than hidden.
repro_outputs/ is generated with consistent structure and stable machine-readable fields.
Final user-facing explanation is short and follows the user's language when practical.
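One way to keep repro_outputs/ consistent is to build every machine-readable record from a fixed schema and serialize it deterministically. The sketch below is a minimal illustration under assumptions: the status.json filename, the ReproStatus fields, and the enum values are hypothetical and are not the skill's defined output layout. It does show the success criteria in code form: assumptions, deviations, and human decision points become explicit fields rather than hidden prose.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List

# Hypothetical enum values; the skill only requires that such values stay in stable English.
ALLOWED_TARGETS = {"inference", "evaluation", "training_startup", "full_training"}
ALLOWED_STATUS = {"success", "partial", "blocked", "not_run"}


@dataclass
class ReproStatus:
    """Hypothetical machine-readable summary of one reproduction attempt."""
    target: str   # which documented path was attempted
    status: str   # outcome of the attempt
    command: str  # exact command that was run, copied verbatim
    assumptions: List[str] = field(default_factory=list)
    deviations: List[str] = field(default_factory=list)
    human_decisions_needed: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Reject values outside the fixed vocabularies instead of silently writing them.
        if self.target not in ALLOWED_TARGETS:
            raise ValueError(f"unknown target: {self.target!r}")
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"unknown status: {self.status!r}")


def to_json(record: ReproStatus) -> str:
    """Validate and serialize with sorted keys so identical records give identical files."""
    record.validate()
    return json.dumps(asdict(record), indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def write_status(record: ReproStatus, out_dir: Path = Path("repro_outputs")) -> Path:
    """Write the record to a hypothetical status.json inside out_dir."""
    text = to_json(record)  # validate before touching the filesystem
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "status.json"
    path.write_text(text, encoding="utf-8")
    return path
```

Sorting keys and validating enum values up front means two runs with the same outcome produce byte-identical files, which keeps diffs and automated audits simple.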
Interaction and usability policy
Keep the workflow simple enough for a new user to understand quickly.
Prefer short, concrete plans over exhaustive research.
Expose commands, assumptions, blockers, and evidence.
Avoid turning the skill into an opaque automation layer.
Preserve a low learning cost for both humans and downstream agents.
Language policy
Human-readable Markdown outputs should follow the user's language when it is clear.
If the user's language is unclear, default to concise English.
Machine-readable fields, filenames, keys, and enum values stay in stable English.
Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.
See references/language-policy.md.
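The split between localized prose and stable machine-readable content can be sketched as follows. This is a minimal illustration, assuming a hypothetical two-language label table (LABELS) and a record dict with status and command keys; only the labels are translated, while enum values and commands pass through unchanged.

```python
from typing import Dict, Optional

# Hypothetical label tables: only human-readable text is localized.
LABELS: Dict[str, Dict[str, str]] = {
    "en": {"title": "Reproduction summary", "status": "Status", "command": "Command"},
    "zh": {"title": "复现摘要", "status": "状态", "command": "命令"},
}


def resolve_language(user_lang: Optional[str]) -> str:
    """Use the user's language when it is known and supported; otherwise default to English."""
    if user_lang is not None and user_lang in LABELS:
        return user_lang
    return "en"


def render_summary(record: Dict[str, str], user_lang: Optional[str]) -> str:
    """Render short Markdown; enum values and commands are copied verbatim, never translated."""
    labels = LABELS[resolve_language(user_lang)]
    return "\n".join([
        f"# {labels['title']}",
        f"- {labels['status']}: {record['status']}",
        f"- {labels['command']}: `{record['command']}`",
    ])
```

For example, rendering a record whose status is "partial" with user_lang="zh" gives a Chinese heading and labels, while "partial" and the command text stay exactly as recorded.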
Reproduction policy
Core priority order:
documented inference
documented evaluation
documented training startup or partial verification
full training only when the user explicitly asks later
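The priority order above can be read as a ranking over documented commands. A minimal sketch, assuming the agent has already extracted commands from the README and tagged each with a kind; DocumentedCommand, the kind labels, and select_target are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower number = preferred, following the core priority order.
PRIORITY = {"inference": 0, "evaluation": 1, "training_startup": 2, "full_training": 3}


@dataclass
class DocumentedCommand:
    kind: str     # one of the PRIORITY keys (hypothetical labels)
    command: str  # command copied verbatim from the README
    source: str   # where it was documented, e.g. "README.md"


def select_target(commands: List[DocumentedCommand],
                  user_requested_full_training: bool = False) -> Optional[DocumentedCommand]:
    """Pick the most preferred documented command, or None if nothing qualifies."""
    candidates = [c for c in commands if c.kind in PRIORITY]
    if not user_requested_full_training:
        # Full training is only considered after an explicit user request.
        candidates = [c for c in candidates if c.kind != "full_training"]
    if not candidates:
        return None  # record the gap instead of inventing a command
    # min() returns the first of equally ranked items, so README order breaks ties.
    return min(candidates, key=lambda c: PRIORITY[c.kind])
```

Returning None rather than guessing keeps the "record unresolved gaps" rule visible in the selection step itself.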
Rules:
README-first: use repository files to clarify, not casually override, the README.
Aim for minimal trustworthy reproduction rather than maximum task coverage.
Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
In trusted reproduction, a documented training command should first be checked through startup verification or a short monitoring window, then paused for explicit human confirmation before broader training continues.
In explicitly authorized explore-lane execution, the training record can continue without the trusted-lane confirmation pause, but it must stay isolated from trusted conclusions.
Record unresolved gaps rather than fabricating confidence.
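The training rules above can be sketched as a bounded startup check: run the documented command for a short monitoring window, capture its log, stop it, and mark whether a human must confirm before broader training. This is a minimal sketch; run_startup_check, the window length, and the record keys are hypothetical. In the trusted lane the record always requests confirmation; an explore-lane record should be stored separately from trusted results.

```python
import subprocess
import sys
import time
from typing import List


def run_startup_check(cmd: List[str], window_s: float, lane: str = "trusted") -> dict:
    """Run a training command for at most window_s seconds and summarize what happened."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    try:
        output, _ = proc.communicate(timeout=window_s)
        outcome = "exited" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # Still running at the end of the window: keep the log as startup evidence, then stop.
        proc.terminate()
        try:
            output, _ = proc.communicate(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # the process ignored terminate(); force it
            output, _ = proc.communicate()
        outcome = "running_at_window_end"
    return {
        "lane": lane,
        "outcome": outcome,
        "returncode": proc.returncode,
        "elapsed_s": round(time.monotonic() - start, 1),
        "log_tail": (output or "")[-2000:],
        # Trusted reproduction pauses here for explicit human confirmation.
        "needs_human_confirmation": lane == "trusted",
    }


if __name__ == "__main__":
    # Stand-in command; a real run would use the README's documented training command.
    record = run_startup_check([sys.executable, "-c", "print('step 0 ok')"], window_s=5)
    print(record["outcome"], record["needs_human_confirmation"])
```

Capturing only the log tail keeps the record small while still showing early-step evidence such as the first loss values.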
Patch policy
Prefer no code changes.
Prefer safer adjustments first:
command-line arguments
environment variables
path fixes
dependency version fixes
dependency file fixes such as requirements.txt or environment.yml
Avoid changing:
model architecture
core inference semantics
core training logic
loss functions
experiment meaning
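A minimal sketch of ordering proposed changes by risk. The category labels are hypothetical; treating the "safer adjustments" as low risk and the "avoid changing" items as high risk is an assumption, and "medium" is a placeholder for anything neither list covers.

```python
from typing import List, Tuple

# Hypothetical category labels mapped from the two lists above (an assumption).
LOW_RISK = {"cli_args", "env_vars", "path_fix", "dependency_version", "dependency_file"}
HIGH_RISK = {"model_architecture", "inference_semantics", "training_logic",
             "loss_function", "experiment_meaning"}
ORDER = {"low": 0, "medium": 1, "high": 2}


def risk_level(category: str) -> str:
    if category in LOW_RISK:
        return "low"
    if category in HIGH_RISK:
        return "high"
    return "medium"


def plan_changes(changes: List[Tuple[str, str]], allow_high_risk: bool = False
                 ) -> Tuple[List[Tuple[str, str, str]], List[Tuple[str, str]]]:
    """Order (category, description) changes low-risk first; hold high-risk ones back."""
    planned: List[Tuple[str, str, str]] = []
    held: List[Tuple[str, str]] = []
    for category, description in changes:
        level = risk_level(category)
        if level == "high" and not allow_high_risk:
            held.append((category, description))  # needs explicit human approval
        else:
            planned.append((level, category, description))
    # list.sort is stable, so input order is kept within each risk level.
    planned.sort(key=lambda item: ORDER[item[0]])
    return planned, held
```

Holding high-risk items in a separate list, rather than dropping them, keeps them visible as human decision points.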
If repository files must change:
create a patch branch first using repro/YYYY-MM-DD-short-task
apply low-risk changes before medium-risk changes
avoid high-risk changes by default
commit only verified groups of changes
keep verified patch commits sparse, usually 0-2
use commit messages prefixed with repro:
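The branch and commit conventions can be checked mechanically. A minimal sketch: branch_name, commit_message, and check_commit_budget are hypothetical helpers; the four-word slug limit and the "repro: summary" message shape are assumptions beyond the repro: prefix itself.

```python
import re
from datetime import date
from typing import List


def branch_name(task: str, day: date) -> str:
    """Build a patch branch name in the repro/YYYY-MM-DD-short-task form."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    slug = "-".join(slug.split("-")[:4])  # keep the task part short (hypothetical limit)
    return f"repro/{day.isoformat()}-{slug}"


def commit_message(summary: str) -> str:
    """Prefix the summary of one verified change group with 'repro:'."""
    return f"repro: {summary.strip()}"


def check_commit_budget(messages: List[str], budget: int = 2) -> List[str]:
    """Return warnings if patch commits exceed the usual budget or lack the prefix."""
    warnings = []
    if len(messages) > budget:
        warnings.append(f"{len(messages)} patch commits; usually 0-{budget}")
    for message in messages:
        if not message.startswith("repro:"):
            warnings.append(f"missing 'repro:' prefix: {message!r}")
    return warnings
```

Returning warnings instead of raising leaves the final call to the human reviewer, in line with keeping edits auditable rather than automated away.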
Install
npx skills add https://github.com/lllllllama/ai-paper-reproduction-skill --skill ai-paper-reproduction