ICML Paper Reviewer

Enables rigorous review of ML research papers following official ICML guidelines.

Workflow

Step 1: Input Analysis & Mode Selection

Determine input type:

Complete paper

PDF/text with abstract, methodology, experiments, results → Full Review Mode

Incomplete document

Missing major sections, labeled draft/proposal, or user indicates early stage → Early-Stage Feedback Mode

Code repository

User points to folder/repo path → Repository Review Mode

For complete papers

, extract: title, abstract, main claims, methodology, experiments, results. Identify paper type: theoretical, methodological, algorithmic, empirical, bridge paper, or application-driven.

For code repositories

, first explore: read README, scan code structure, find experiment scripts/results, identify the research question and what's implemented.

Step 2: Prior Work Grounding (Critical - All Modes)

This step applies to ALL input types. Grounding in reality is essential for any meaningful feedback.

Generate 3-5 search queries based on the research topic: benchmarks/baselines, same problem, related techniques

Use WebSearch to find recent arXiv papers and published work

Fetch abstracts of 5-10 most relevant papers

Critically synthesize

:

What specific claims in this paper are already addressed by prior work?

What are the actual quantitative improvements over recent baselines?

Are claimed "novelties" actually novel given the literature?

What gaps truly exist vs. what the authors claim exists?

Critical mindset

:

Your job is to verify claims against reality, not accept them at face value

Most papers overclaim—your review should ground their contributions in what the literature actually shows

Default to skepticism: Assume claims are overstated until proven otherwise by evidence

Authors have selection bias toward their own work; you represent the community's interests

Be the critical voice that ensures published work actually advances the field

Then proceed to mode-specific evaluation.

Full Review Mode (Complete Papers)

Step 3: Systematic Evaluation

Evaluate across 7 dimensions (see

references/evaluation-criteria.md

).

Default to skepticism—require strong evidence to score highly.

Dimension

Key Questions (Answer with Literature Evidence)

Originality

Is this truly novel given recent work X, Y, Z? What specific aspects are incremental vs. novel?

Importance

Why does this problem matter? What's the real-world impact? Who will care?

Claims Support

Do experiments actually prove the claims? What alternative explanations exist?

Experimental Soundness

Are baselines from 2023+? Are comparisons fair? What's missing?

Clarity

Can I reproduce this from the paper? Are claims precisely stated?

Community Value

Will this change how people work? Or just add noise?

Prior Work Context

Are comparisons accurate? What recent work (last 2 years) is missing?

Evaluation mindset

:

Start from neutral and require evidence to move up or down

Compare every claim against what you found in the literature search

Most papers are incremental—high originality scores are rare

Weak baselines or missing comparisons are critical flaws, not minor issues

Step 4: Critical Cross-Check Against Literature

Before writing the review, explicitly verify:

Baselines check

List baselines used in paper. List baselines from your literature search of adjacent papers. What's missing?

Methodology check

How do 2-3 adjacent papers approach this problem? Does this paper follow similar methodology? If not, why not?

Claims check

List main claims. For each, cite specific evidence from experiments or proofs. If insufficient, note it.

Citations check

Which papers from your search are cited? Which are missing? Why?

Novelty check

List claimed novelties. For each, cite specific prior work that does or doesn't do this.

This step is not optional. Your review must reference specific findings from your literature search.

Step 5: Generate Review

Follow the ICML review form (see

references/review-template.md

):

Summary

- Neutral, factual (should not be disputed by authors)

Claims and Evidence

- Are claims supported?

Compare to what literature shows

Relation to Prior Work

- Proper context? Missing citations?

List specific missing papers

Strengths

- Specific and substantive,

compared to standards in adjacent work

Weaknesses

- Constructive, explain severity,

cite specific literature for comparison

Questions for Authors

- Numbered, explain impact on evaluation

Minor Issues

- Typos, suggestions

Overall Recommendation

- 1-5 scale with justification

grounded in literature comparison

Confidence Score

- 1-5 scale

Step 6: Quality Check

Verify all claims in review are substantiated

Ensure constructive tone

Check specificity of strengths/weaknesses

Confirm questions are actionable

Key Principles

Be Rigorous AND Constructive

Your primary duty is to the research community—publishing weak papers dilutes the literature.

Be honest

Don't inflate scores to be nice. If baselines are weak, say so clearly.

Be specific

Always cite which literature contradicts or supports claims.

Be fair

Criticism should be substantiated by evidence or literature.

Be actionable

Tell authors exactly what would fix the issues.

"Review the papers of others as you would wish your own to be reviewed"—with rigor, honesty, and specific feedback grounded in the literature.

Be Specific

Bad: "The experiments are weak"

Good: "Experiments compare only against [X] from 2019, but recent baselines [Y] (2024) and [Z] (2024) should be included."

Fair Novelty Assessment

Originality may arise from: creative combinations, new domains, removing restrictive assumptions, novel datasets, new problem formulations.

But

Most claimed novelty is actually incremental. Verify against literature before accepting novelty claims.

Score Calibration

Use this reference frame:

5s are rare

Reserve for papers that will clearly influence the field

4s are uncommon

Solid papers with rigorous execution and clear contributions

3s are common

Papers with merit but significant limitations

2s are common

Incremental work or work with major methodological issues

1s indicate fundamental problems

Wrong results, no contribution, or severe ethical issues

If you find yourself giving mostly 4s and 5s, you're likely being too generous. Re-calibrate against what the literature shows is standard.

Application-Driven Papers

For application-driven ML: methods should fit real-world constraints, non-standard datasets acceptable if documented, compare against domain baselines.

Rating Scales

Overall (1-5):

Use the full range. Most papers should be 2-3.

5 (Strong Accept)

Significant contribution, will be influential, no major flaws

4 (Accept)

Solid contribution, rigorous execution, minor issues only

3 (Weak Accept)

Contribution exists but limited; or good idea with execution flaws

2 (Weak Reject)

Incremental contribution insufficient for venue; or significant methodological issues
1 (Reject): Fundamental flaws, not ready, or no meaningful contribution Red flags that should lower scores : Baselines older than 2 years (unless explicitly justified) Missing comparisons to obvious related work from literature search Claims not directly supported by presented experiments Novelty claims contradicted by prior work Confidence (1-5): 5=Expert/certain, 4=Confident, 3=Fairly confident, 2=Uncertain, 1=Not in area Early-Stage Feedback Mode Use this mode for incomplete drafts, research proposals, or code repositories. Focus shifts from "accept/reject evaluation" to "constructive guidance on how to make this publishable." After completing Steps 1-2 (input analysis and prior work grounding), proceed here. Step 3: Generate Formative Feedback Use the Early-Stage Feedback Template (see references/review-template.md ). No numerical scores—focus on constructive guidance. For code repositories , additionally address: Code quality and organization Experiment design and reproducibility What's missing for a paper (baselines, ablations, analysis) References references/evaluation-criteria.md - Detailed criteria for each dimension references/review-template.md - Full template with examples references/common-issues.md - Common paper issues to identify

icml-reviewer

安装