swain-search

Collect, normalize, and cache source materials into reusable evidence pools that swain-design artifacts can reference.

Mode detection

Signal | Mode
--- | ---
No pool exists for the topic, or user says "research X" / "gather evidence" | Create: new pool
Pool exists and user provides new sources or says "add to" / "extend" | Extend: add sources to existing pool
Pool exists and user says "refresh" or sources are past TTL | Refresh: re-fetch stale sources
User asks "what pools do we have" or "find evidence about X" | Discover: search existing pools by tag
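The signal-to-mode mapping above could be sketched roughly as follows. This is only an illustration: `detect_mode`, its parameters, the phrase checks, the priority order, and the fallback to Discover are all assumptions, not part of the skill itself.

```python
from enum import Enum


class Mode(Enum):
    CREATE = "create"
    EXTEND = "extend"
    REFRESH = "refresh"
    DISCOVER = "discover"


def detect_mode(request: str, pool_exists: bool,
                has_new_sources: bool = False,
                has_stale_sources: bool = False) -> Mode:
    """Map the documented signals to a mode (hypothetical helper)."""
    text = request.lower()
    # Discovery questions apply whether or not a pool exists.
    if "what pools" in text or "find evidence" in text:
        return Mode.DISCOVER
    # No pool yet, or an explicit research request: build a new pool.
    if not pool_exists or "research" in text or "gather evidence" in text:
        return Mode.CREATE
    # Explicit phrases win over inferred signals (assumed priority).
    if "refresh" in text:
        return Mode.REFRESH
    if has_new_sources or "add to" in text or "extend" in text:
        return Mode.EXTEND
    if has_stale_sources:
        return Mode.REFRESH
    # Assumption: with no clear signal, show the user what already exists.
    return Mode.DISCOVER
```

For example, `detect_mode("add to the pool", pool_exists=True)` yields `Mode.EXTEND` under these assumptions.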

Create mode

Build a new evidence pool from scratch.

Step 1 — Gather inputs

Ask the user (or infer from context) for:

Pool ID: a slug for the topic (e.g., websocket-vs-sse). Suggest one if the context is clear.

Tags: keywords for discovery (e.g., real-time, websocket, sse)

Sources: any combination of:

Web search queries ("search for WebSocket vs SSE comparisons")

URLs (web pages, forum threads, docs)

Video/audio URLs

Local file paths

Freshness TTL overrides: optional; the defaults are fine for most pools

If invoked from swain-design (e.g., spike entering Active), the artifact context provides the topic, tags, and sometimes initial sources.

Step 2 — Collect and normalize

For each source, use the appropriate capability. Read skills/swain-search/references/normalization-formats.md for the exact markdown structure per source type.

Web search queries:

Use a web search capability to find relevant results

Select the top 3-5 most relevant results

For each: fetch the page, normalize to markdown per the web page format

If no web search capability is available, tell the user and skip

Web page URLs:

Fetch the page using a browser or page-fetching capability

Strip boilerplate (nav, ads, sidebars, cookie banners)

Normalize to markdown per the web page format

If the fetch fails, record the URL in the manifest with a failed: true flag and move on

Video/audio URLs:

Use a media transcription capability to get the transcript

Normalize to markdown per the media format (timestamps, speaker labels, key points)

If no transcription capability is available, tell the user and skip — or accept a pre-made transcript

Local files:

Use a document conversion capability (PDF, DOCX, etc.) or read directly if already markdown

Normalize per the document format

For markdown files: add frontmatter only, preserve content

Forum threads / discussions:

Fetch and normalize per the forum format (chronological, author-attributed)

Flatten nested threads to chronological order with reply-to context
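Flattening a nested thread into chronological order while keeping reply-to context might look like the sketch below. The `Post` fields and the output layout are hypothetical; the actual layout is defined in normalization-formats.md, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: str
    author: str
    timestamp: str  # assumed ISO 8601 in one time zone, so string order is time order
    body: str
    parent_id: Optional[str] = None  # None for top-level posts


def flatten_thread(posts: list[Post]) -> str:
    """Render posts oldest-first, noting whom each reply answers."""
    by_id = {p.post_id: p for p in posts}
    lines = []
    for post in sorted(posts, key=lambda p: p.timestamp):
        header = f"{post.author} ({post.timestamp})"
        parent = by_id.get(post.parent_id) if post.parent_id else None
        if parent is not None:
            # Reply-to context replaces the lost nesting.
            header += f", replying to {parent.author}"
        lines.extend([header, post.body, ""])
    return "\n".join(lines).rstrip() + "\n"
```

Sorting by timestamp discards the tree shape, so the "replying to" note is the only place the original nesting survives.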

Each normalized source file goes to sources/NNN-.md with sequential numbering.

Step 3 — Generate manifest

Create manifest.yaml following the schema in skills/swain-search/references/manifest-schema.md. Include:

Pool metadata (id, created date, tags)

Default freshness TTL per source type

One entry per source with provenance (URL/path, fetch date, content hash, type)

Compute content hashes as SHA-256 of the normalized markdown content:

    shasum -a 256 sources/001-example.md | cut -d ' ' -f1
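Where a shell is not convenient, the same digest can be computed with Python's standard library. This is a minimal sketch; the function name `content_hash` is an assumption.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the hex SHA-256 digest of the file's bytes.

    This matches the first field printed by `shasum -a 256 <file>`.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Hashing the raw bytes rather than decoded text keeps the result identical to the shell command, whatever the file's encoding or line endings.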

Step 4 — Generate synthesis

Create synthesis.md, a structured distillation of key findings across all sources.

Structure the synthesis by theme, not by source. Group related findings together, cite sources by ID, and surface:

Key findings: what the sources collectively say about the topic

Points of agreement: where sources converge

Points of disagreement: where sources conflict or present alternatives

Gaps: what the sources don't cover that might matter

Keep it concise. The synthesis is a starting point, not a comprehensive report — the user or artifact author will refine it.

Step 5 — Report

Tell the user what was created:

Evidence pool created with N sources.

docs/evidence-pools//manifest.yaml (provenance and metadata)

docs/evidence-pools//sources/ (N normalized source files)

docs/evidence-pools//synthesis.md (thematic distillation)

Reference from artifacts with:

evidence-pool: @

Extend mode

Add new sources to an existing pool.

Read the existing manifest.yaml

Collect and normalize new sources (same as Create step 2)

Number new sources sequentially after the highest existing ID

Append new entries to manifest.yaml

Update the refreshed date

Regenerate synthesis.md incorporating all sources (old + new)

Report what was added
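Numbering new sources after the highest existing ID can be sketched as below. It assumes source files start with a three-digit prefix, as in 001-example.md; the helper names and the slug argument are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming: three digits, a hyphen, anything, then ".md".
SOURCE_NAME = re.compile(r"^(\d{3})-.*\.md$")


def next_source_number(sources_dir: Path) -> int:
    """Return one more than the highest numeric prefix in sources_dir."""
    numbers = [
        int(match.group(1))
        for path in sources_dir.glob("*.md")
        if (match := SOURCE_NAME.match(path.name))
    ]
    # An empty pool starts at 1.
    return max(numbers, default=0) + 1


def source_filename(number: int, slug: str) -> str:
    """Build a zero-padded file name; the slug part is an assumption."""
    return f"{number:03d}-{slug}.md"
```

Taking the maximum rather than counting files keeps numbering stable even if an earlier source was deleted, so existing citations by ID are never reused.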

Refresh mode

Re-fetch stale sources and update changed content.

Read manifest.yaml

For each source, check if fetched date + freshness-ttl has elapsed

For stale sources:

Re-fetch the raw content

Re-normalize to markdown

Compute new content hash

If hash changed: replace the source file, update manifest entry

If hash unchanged: update only fetched date

Update refreshed date in manifest

If any content changed, regenerate synthesis.md

Report: "Refreshed N sources. M had changed content, K were unchanged."

For sources with freshness-ttl: never, skip them during refresh.
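The staleness check and the hash comparison could be sketched like this. Several details are assumptions: the TTL is treated as a number of days, a source counts as stale on or after its expiry date, and the manifest entry is a plain dictionary with hypothetical keys "hash" and "fetched". The real field layout is defined in manifest-schema.md.

```python
from datetime import date, timedelta
from typing import Union


def is_stale(fetched: date, freshness_ttl: Union[int, str], today: date) -> bool:
    """Return True when fetched + TTL has elapsed; "never" is never stale."""
    if freshness_ttl == "never":
        return False
    # Assumption: the TTL is expressed in days.
    return today >= fetched + timedelta(days=int(freshness_ttl))


def apply_refresh(entry: dict, new_hash: str, today: date) -> bool:
    """Update a manifest entry in place; return True if content changed."""
    changed = new_hash != entry["hash"]
    if changed:
        entry["hash"] = new_hash
    # The fetched date moves forward whether or not the content changed.
    entry["fetched"] = today
    return changed
```

Counting the `True` results from `apply_refresh` gives the M in the report line, and the remaining refreshed sources give K.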

Discover mode

Help the user find existing pools relevant to their topic.

Scan docs/evidence-pools/*/manifest.yaml for all pools

Match against the user's query by:

Tag match: pool tags contain query keywords

Title match: pool ID slug contains query keywords

For each match, show: pool ID, tags, source count, last refreshed date, referenced-by list

If no matches, suggest creating a new pool
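A minimal sketch of the tag and slug matching, assuming each manifest has already been parsed into a dictionary with "id" and "tags" keys. The function name and the whitespace keyword split are illustrative choices, not the skill's defined behavior.

```python
def match_pools(pools: list[dict], query: str) -> list[dict]:
    """Return pools whose tags or ID slug contain any query keyword."""
    keywords = query.lower().split()
    matches = []
    for pool in pools:
        tags = {tag.lower() for tag in pool.get("tags", [])}
        # Split the slug on hyphens so "sse" matches "websocket-vs-sse".
        slug_words = set(pool["id"].lower().split("-"))
        if any(k in tags or k in slug_words for k in keywords):
            matches.append(pool)
    return matches
```

An empty result is the cue to suggest creating a new pool.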

Graceful degradation

The skill references capabilities generically. When a capability isn't available:

Capability | Fallback
--- | ---
Web search | Skip search-based sources. Tell user: "No web search capability available; provide URLs directly or add a search MCP."
Browser / page fetcher | Try basic URL fetch. If that fails: "Can't fetch this URL; paste the content or provide a local file."
Media transcription | "No transcription capability available; provide a pre-made transcript file, or add a media conversion tool."
Document conversion | "Can't convert this file type; provide a markdown version, or add a document conversion tool."

Never fail the entire run because one capability is missing. Collect what you can, skip what you can't, and report clearly.
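The "collect what you can, skip what you can't" rule might be sketched as a planning step that partitions sources before any fetching starts. The source-type labels, capability names, and `plan_collection` itself are hypothetical; real tool names vary by installation.

```python
# Hypothetical mapping from source type to the capability it needs.
REQUIRED_CAPABILITY = {
    "search_query": "web_search",
    "url": "page_fetch",
    "media": "media_transcription",
    "file": "document_conversion",
}


def plan_collection(sources, available):
    """Split (type, value) pairs into collectable and skipped sources."""
    collect, skipped = [], []
    for source_type, value in sources:
        needed = REQUIRED_CAPABILITY[source_type]
        if needed in available:
            collect.append((source_type, value))
        else:
            # Record the reason so the final report can explain the skip.
            skipped.append((value, f"no {needed} capability available"))
    return collect, skipped
```

Because a missing capability only adds to the skipped list, one unavailable tool never aborts the run. This sketch ignores the case of local markdown files, which can be read directly without conversion.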

Capability detection

Before collecting sources, check what's available. Look for tools matching these patterns — the exact tool names vary by installation:

Web search: tools with "search" in the name (e.g., brave_web_search, bing-search-to-markdown)

Page fetching: tools with "fetch", "webpage", or "browser" in the name (e.g., fetch_content, webpage-to-markdown, browser_navigate)

Media transcription: tools with "audio", "video", or "youtube" in the name (e.g., audio-to-markdown, youtube-to-markdown)

Document conversion: tools with "pdf", "docx", "pptx", or "xlsx" in the name (e.g., pdf-to-markdown, docx-to-markdown)

Report available capabilities at the start of collection so the user knows what will and won't work.

Linking from artifacts

Artifacts reference evidence pools in frontmatter:

evidence-pool: websocket-vs-sse@abc1234

The format is @ . The commit hash pins the pool to a specific version: pools evolve over time as sources are added or refreshed, and the hash ensures reproducibility.

When creating or extending a pool, remind the user to commit and then update the referencing artifact's frontmatter with the new commit hash.
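A reference value such as websocket-vs-sse@abc1234 can be split at the @ sign. The sketch below reads the part before @ as the pool ID and the part after it as the commit hash, which is an interpretation of the example above. `PoolRef` and `parse_pool_ref` are hypothetical names.

```python
from typing import NamedTuple


class PoolRef(NamedTuple):
    pool_id: str
    commit: str


def parse_pool_ref(value: str) -> PoolRef:
    """Split an evidence-pool frontmatter value at its last '@'."""
    pool_id, sep, commit = value.strip().rpartition("@")
    if not sep or not pool_id or not commit:
        raise ValueError(f"expected a value containing '@', got {value!r}")
    return PoolRef(pool_id, commit)
```

Rejecting a value without a commit part makes an unpinned reference fail early, rather than silently resolving to whatever the pool currently contains.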
