Use a media transcription capability to get the transcript
Normalize to markdown per the media format (timestamps, speaker labels, key points)
If no transcription capability is available, tell the user and skip — or accept a pre-made transcript
Local files:
Use a document conversion capability (PDF, DOCX, etc.) or read directly if already markdown
Normalize per the document format
For markdown files: add frontmatter only, preserve content
Forum threads / discussions:
Fetch and normalize per the forum format (chronological, author-attributed)
Flatten nested threads to chronological order with reply-to context
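The flattening step can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Post` shape with nested `replies`; real forum payloads differ, and `flatten_thread` and `render_markdown` are illustrative names rather than part of the skill.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    # Hypothetical shape for a fetched forum post; adapt to the actual payload.
    post_id: str
    author: str
    posted_at: datetime
    body: str
    replies: list["Post"] = field(default_factory=list)


def flatten_thread(roots: list[Post]) -> list[tuple[Post, Optional[Post]]]:
    """Return (post, parent) pairs in chronological order.

    The parent is None for top-level posts, so reply-to context survives
    the loss of nesting.
    """
    pairs: list[tuple[Post, Optional[Post]]] = []
    stack: list[tuple[Post, Optional[Post]]] = [(root, None) for root in roots]
    while stack:
        post, parent = stack.pop()
        pairs.append((post, parent))
        stack.extend((reply, post) for reply in post.replies)
    return sorted(pairs, key=lambda pair: pair[0].posted_at)


def render_markdown(pairs: list[tuple[Post, Optional[Post]]]) -> str:
    """Render each post with its author, timestamp, and reply-to context."""
    lines = []
    for post, parent in pairs:
        header = f"{post.author} ({post.posted_at.isoformat()})"
        if parent is not None:
            header += f", replying to {parent.author}"
        lines.append(f"{header}:\n{post.body}\n")
    return "\n".join(lines)
```

Keeping the parent alongside each post means a late reply can still be read in context even when unrelated posts sit between it and the post it answers.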
Each normalized source file goes to `sources/NNN-.md` with sequential numbering.
Step 3 — Generate manifest
Create `manifest.yaml` following the schema in `skills/swain-search/references/manifest-schema.md`. Include:
Pool metadata (id, created date, tags)
Default freshness TTL per source type
One entry per source with provenance (URL/path, fetch date, content hash, type)
Compute content hashes as SHA-256 of the normalized markdown content:

    shasum -a 256 sources/001-example.md | cut -d ' ' -f1
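Where shelling out is inconvenient, the same digest can be computed with Python's standard `hashlib`. This is a minimal sketch; `content_hash` is an illustrative helper name, not something the skill defines.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes.

    Hashing the raw bytes yields the same hex string that
    `shasum -a 256 <file>` prints in its first column.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Hashing the normalized file rather than the raw download means cosmetic changes on the source page that normalize to the same markdown do not register as content changes.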
Step 4 — Generate synthesis
Create `synthesis.md`, a structured distillation of key findings across all sources.
Structure the synthesis by theme, not by source. Group related findings together, cite sources by ID, and surface:
Key findings: what the sources collectively say about the topic
Points of agreement: where sources converge
Points of disagreement: where sources conflict or present alternatives
Gaps: what the sources don't cover that might matter
Keep it concise. The synthesis is a starting point, not a comprehensive report — the user or artifact author will refine it.
Step 5 — Report
Tell the user what was created:
Evidence pool created with N sources.
`docs/evidence-pools//manifest.yaml`: provenance and metadata
`docs/evidence-pools//sources/`: N normalized source files
`docs/evidence-pools//synthesis.md`: thematic distillation
Reference from artifacts with: `evidence-pool: @`
Extend mode
Add new sources to an existing pool.
Read the existing `manifest.yaml`
Collect and normalize new sources (same as Create step 2)
Number new sources sequentially after the highest existing ID
Append new entries to `manifest.yaml`
Update the `refreshed` date
Regenerate `synthesis.md` incorporating all sources (old + new)
Report what was added
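The numbering rule can be sketched as below. It assumes source files carry a zero-padded three-digit prefix, as the `NNN` pattern and `001-example.md` suggest; `next_source_number` is a hypothetical helper name.

```python
import re
from pathlib import Path


def next_source_number(sources_dir: Path) -> int:
    """Return the number that follows the highest existing NNN- prefix.

    Files without a three-digit prefix are ignored; an empty directory
    yields 1.
    """
    numbers = [
        int(match.group(1))
        for path in sources_dir.glob("*.md")
        if (match := re.match(r"(\d{3})-", path.name))
    ]
    return max(numbers, default=0) + 1
```

Format the result with `f"{n:03d}"` to build the file-name prefix. Taking the maximum rather than counting files keeps numbering stable even if an earlier source was deleted.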
Refresh mode
Re-fetch stale sources and update changed content.
Read `manifest.yaml`
For each source, check whether its `fetched` date + `freshness-ttl` has elapsed
For stale sources:
Re-fetch the raw content
Re-normalize to markdown
Compute new content hash
If hash changed: replace the source file, update manifest entry
If hash unchanged: update only the `fetched` date
Update the `refreshed` date in manifest
If any content changed, regenerate `synthesis.md`
Report: "Refreshed N sources. M had changed content, K were unchanged."
For sources with `freshness-ttl: never`, skip them during refresh.
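The staleness check can be sketched as follows. The actual TTL encoding is defined in the manifest schema, which is not reproduced here, so this example assumes a TTL given as a number of days or the literal string `never`; `is_stale` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Union


def is_stale(fetched: date, freshness_ttl: Union[int, str], today: date) -> bool:
    """Return True when fetched + TTL has elapsed as of `today`.

    Assumption: the TTL is a day count, or the string "never" for sources
    that must be skipped during refresh.
    """
    if freshness_ttl == "never":
        return False
    return fetched + timedelta(days=int(freshness_ttl)) <= today
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.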
Discover mode
Help the user find existing pools relevant to their topic.
Scan `docs/evidence-pools/*/manifest.yaml` for all pools
Match against the user's query by:
Tag match: pool tags contain query keywords
Title match: pool ID slug contains query keywords
For each match, show: pool ID, tags, source count, last refreshed date, referenced-by list
If no matches, suggest creating a new pool
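The two matching rules can be sketched as below, operating on fields already loaded from a manifest. The text does not say whether every keyword must match, so this example assumes a single matching keyword is enough; `match_pool` is a hypothetical helper.

```python
def match_pool(pool_id: str, tags: list[str], query: str) -> bool:
    """Return True if any query keyword equals a tag or appears in the pool ID slug.

    Assumption: matching is case-insensitive and one keyword hit suffices.
    """
    keywords = query.lower().split()
    tag_set = {tag.lower() for tag in tags}
    tag_match = any(keyword in tag_set for keyword in keywords)
    title_match = any(keyword in pool_id.lower() for keyword in keywords)
    return tag_match or title_match
```

A stricter variant would replace `any` with `all`, which trades recall for precision when many pools exist.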
Graceful degradation
The skill references capabilities generically. When a capability isn't available:
| Capability | Fallback |
|---|---|
| Web search | Skip search-based sources. Tell user: "No web search capability available; provide URLs directly or add a search MCP." |
| Browser / page fetcher | Try basic URL fetch. If that fails: "Can't fetch this URL; paste the content or provide a local file." |
| Media transcription | "No transcription capability available; provide a pre-made transcript file, or add a media conversion tool." |
| Document conversion | "Can't convert this file type; provide a markdown version, or add a document conversion tool." |
Never fail the entire run because one capability is missing. Collect what you can, skip what you can't, and report clearly.
Capability detection
Before collecting sources, check what's available. Look for tools matching these patterns — the exact tool names vary by installation:
Web search: tools with "search" in the name (e.g., `brave_web_search`, `bing-search-to-markdown`)
Page fetching: tools with "fetch", "webpage", "browser" in the name (e.g., `fetch_content`, `webpage-to-markdown`, `browser_navigate`)
Media transcription: tools with "audio", "video", "youtube" in the name (e.g., `audio-to-markdown`, `youtube-to-markdown`)
Document conversion: tools with "pdf", "docx", "pptx", "xlsx" in the name (e.g., `pdf-to-markdown`, `docx-to-markdown`)
Report available capabilities at the start of collection so the user knows what will and won't work.
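Name-based detection can be sketched as a substring match over the installed tool names. The patterns below are the ones listed above; how the tool list is obtained depends on the installation, so `tool_names` is simply passed in, and `detect_capabilities` is an illustrative name.

```python
CAPABILITY_PATTERNS = {
    "web search": ("search",),
    "page fetching": ("fetch", "webpage", "browser"),
    "media transcription": ("audio", "video", "youtube"),
    "document conversion": ("pdf", "docx", "pptx", "xlsx"),
}


def detect_capabilities(tool_names: list[str]) -> dict[str, list[str]]:
    """Map each capability to the tools whose names contain one of its patterns.

    A capability with an empty list is unavailable and should trigger its
    fallback message instead of failing the run.
    """
    found: dict[str, list[str]] = {capability: [] for capability in CAPABILITY_PATTERNS}
    for name in tool_names:
        lowered = name.lower()
        for capability, patterns in CAPABILITY_PATTERNS.items():
            if any(pattern in lowered for pattern in patterns):
                found[capability].append(name)
    return found
```

Substring matching is deliberately loose, so a tool whose name happens to contain a pattern may be misclassified; treat the result as a hint to report to the user, not as a guarantee.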
Linking from artifacts
Artifacts reference evidence pools in frontmatter:

    evidence-pool: websocket-vs-sse@abc1234

The format is @. The commit hash pins the pool to a specific version: pools evolve over time as sources are added or refreshed, and the hash ensures reproducibility.
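A reference value such as `websocket-vs-sse@abc1234` can be split into its two parts as sketched below. `parse_pool_reference` is a hypothetical helper, and the validation is an assumption: the source does not say how malformed references should be handled.

```python
def parse_pool_reference(reference: str) -> tuple[str, str]:
    """Split a frontmatter reference into (pool ID, commit hash).

    Splits on the last "@" and rejects values missing either part.
    """
    pool_id, separator, commit_hash = reference.rpartition("@")
    if not separator or not pool_id or not commit_hash:
        raise ValueError(f"expected a pool ID and commit hash separated by '@', got {reference!r}")
    return pool_id, commit_hash
```

For example, `parse_pool_reference("websocket-vs-sse@abc1234")` returns `("websocket-vs-sse", "abc1234")`.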
When creating or extending a pool, remind the user to commit and then update the referencing artifact's frontmatter with the new commit hash.