Recursive Self-Improvement Loop

A pattern for generating higher-quality output by iterating against explicit scoring criteria.

The Pattern

generate → evaluate → diagnose → improve → repeat (until passing)

Never ship first-draft output for important content. Run the loop.

How It Works

1. Generate
   Create the initial output as you normally would.

2. Evaluate
   Score the output against each criterion (1-10). Be brutally honest.

3. Diagnose
   For any criterion scoring below threshold:
   - What specifically is weak?
   - Why does it fail?
   - What would "passing" look like?

4. Improve
   Rewrite addressing each diagnosed weakness. Don't patch; rebuild the weak sections.

5. Repeat
   Re-evaluate. Keep looping until all criteria pass threshold (usually 8/10 minimum).

Adversarial Pressure (Optional but Powerful)

After passing criteria, attack the output from a hostile perspective:

- Skeptical customer: "Why should I believe this? What's the catch?"
- Distracted scroller: "Would I stop for this? In 2 seconds?"
- Competitor: "How would a rival tear this apart?"

If it survives, ship it. If not, iterate.

Example Criteria by Use Case

Social Content

| Criterion | What to evaluate |
|---|---|
| Hook strength | First line grabs attention? Pattern interrupt? |
| Curiosity gap | Creates urge to keep reading? |
| Clarity | One clear idea? No confusion? |
| Voice match | Sounds like the target voice/brand? |
| Engagement potential | People will reply/share/save? |
| Thumb-stop power | Scroller would pause? |
| Value density | Every line earns its place? |
| CTA clarity | Clear what reader should do next? |

Adversarial test: Would a distracted, skeptical user at 11pm engage with this?

Landing Page / Web Copy

| Criterion | What to evaluate |
|---|---|
| Headline clarity | Instantly clear what this business does? |
| Value prop strength | Why choose them over competitors? |
| Benefit focus | Features translated to customer benefits? |
| CTA effectiveness | Clear, compelling action? Low friction? |
| Trust signals | Credibility established? Social proof? |
| Readability | Scannable? Short paragraphs? Clear hierarchy? |
| Objection handling | Common concerns addressed? |
| Specificity | Concrete details vs vague claims? |

Adversarial test: Would someone searching on their phone take action within 30 seconds?

Email Copy

| Criterion | What to evaluate |
|---|---|
| Subject line | Would this get opened? Stands out in inbox? |
| Opening hook | First sentence earns the second? |
| Single focus | One clear ask per email? |
| Skimmability | Can get the gist in 5 seconds? |
| CTA prominence | Action is obvious and easy? |
| Voice consistency | Matches brand/sender personality? |
| Length appropriate | No fluff, nothing missing? |
| Mobile friendly | Works on small screens? |

Adversarial test: Would a busy person with 200 unread emails act on this?

Ad Copy

| Criterion | What to evaluate |
|---|---|
| Thumb-stop power | Pattern interrupt in first 2 seconds? |
| Curiosity gap | Creates need to know more? |
| Emotional trigger | Hits a real pain point or desire? |
| Credibility | Believable? Not too good to be true? |
| CTA strength | Clear next step with low friction? |
| Persona match | Speaks directly to target audience? |
| Differentiation | Stands out from competitor ads? |
| Platform native | Fits the platform's style/format? |

Adversarial test: Would this stop YOUR scroll? Would you click?

When to Use

Always use for:
- Headlines and hooks
- CTAs and value props
- Key landing page sections
- Social posts (especially threads)
- Ad copy
- Important emails

Can skip for:
- Internal notes
- First-pass brainstorming
- Technical documentation
- Boilerplate content

Building Your Own Criteria

1. Pick one task you do repeatedly
2. Write down how YOU evaluate that output: what makes "good" vs "mid"?
3. Turn each into a pass/fail threshold; be specific ("9/10 minimum" not "make it good")
4. Add adversarial pressure: who would attack this? What would they say?
5. Save and reuse: now you have a system, not just a prompt

Quick Loop Template

Output v1
[Initial generation]

Evaluation v1
Hook strength: 6/10 - Opens weak, no pattern interrupt
Clarity: 8/10 - Clear enough
Voice match: 7/10 - Too formal
[... score all criteria]

Diagnosis
1. Hook needs a surprising stat or contrarian take
2. Voice should be more casual, shorter sentences
3. [...]

Output v2
[Revised version addressing weaknesses]

Evaluation v2
[Re-score, continue until all pass]

The loop typically adds 2-3 iterations. Worth it for anything that matters.
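
For readers who want to automate the loop, the generate → evaluate → diagnose → improve → repeat steps can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the names `run_loop`, `failing`, and `LoopResult` are hypothetical, the `generate`, `evaluate`, and `improve` callables stand in for whatever model or human step you actually use, and the `max_iterations` cap is an added assumption (the pattern itself only says to repeat until passing). The adversarial pass is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scores = Dict[str, int]  # criterion name -> score on a 1-10 scale


@dataclass
class LoopResult:
    output: str
    scores: Scores
    iterations: int  # number of improve passes that were applied
    passed: bool     # True if every criterion met the threshold


def failing(scores: Scores, threshold: int) -> List[str]:
    """Diagnose step (simplified): list criteria scoring below the threshold."""
    return [name for name, score in scores.items() if score < threshold]


def run_loop(
    generate: Callable[[], str],
    evaluate: Callable[[str], Scores],
    improve: Callable[[str, List[str]], str],
    threshold: int = 8,       # "usually 8/10 minimum"
    max_iterations: int = 5,  # assumption: safety cap, not part of the pattern
) -> LoopResult:
    output = generate()          # 1. Generate
    scores = evaluate(output)    # 2. Evaluate
    iterations = 0
    while iterations < max_iterations:
        weak = failing(scores, threshold)  # 3. Diagnose
        if not weak:
            break
        output = improve(output, weak)     # 4. Improve the weak areas
        scores = evaluate(output)          # 5. Repeat: re-evaluate
        iterations += 1
    return LoopResult(output, scores, iterations, not failing(scores, threshold))


# Toy stand-ins so the sketch runs without any model behind it.
def toy_generate() -> str:
    return "draft"


def toy_evaluate(text: str) -> Scores:
    # Each "+" added by toy_improve raises hook strength by one point.
    return {"hook_strength": min(10, 6 + text.count("+")), "clarity": 8}


def toy_improve(text: str, weak: List[str]) -> str:
    return text + "+" * len(weak)


result = run_loop(toy_generate, toy_evaluate, toy_improve)
print(result)
```

In real use, `evaluate` would score against one of the criteria tables above and `improve` would receive the diagnosed weaknesses as rewrite instructions. Passing the steps in as callables keeps the loop itself independent of any particular model or tool.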