llm-wiki-skill — Multi-Platform Knowledge Base Builder Skill by ara.so — Daily 2026 Skills collection. Build a persistent, interlinked personal knowledge base from URLs, PDFs, markdown files, and raw text. Based on Karpathy's llm-wiki methodology: knowledge is compiled once and maintained , not re-derived from raw docs on every query. What It Does Ingests articles, tweets, PDFs, YouTube transcripts, WeChat posts, and plain text Routes each source type to the best extraction tool automatically Generates structured wiki pages with [[bidirectional links]] Produces entity pages, topic pages, source summaries, and comparisons Outputs Obsidian-compatible local markdown files Detects orphaned pages, broken links, and contradictions via health checks Installation Recommended: Let Your Agent Install It Give your agent the repo URL and ask it to install for your platform: https://github.com/sdyckjq-lab/llm-wiki-skill Manual Installation Clone the repo anywhere, then run the installer for your platform:
Claude Code
bash install.sh --platform claude
Codex
bash install.sh --platform codex
OpenClaw
bash install.sh --platform openclaw
Auto-detect (only if one platform directory exists)
bash install.sh --platform auto
Custom target directory (OpenClaw non-standard path)
bash install.sh --platform openclaw --target-dir /path/to/your/skills Default Install Locations Platform Path Claude Code ~/.claude/skills/llm-wiki Codex ~/.codex/skills/llm-wiki OpenClaw ~/.openclaw/skills/llm-wiki Legacy Claude Setup (existing users) bash setup.sh
This is now a compatibility shim for the unified installer
Prerequisites
Check Chrome debug mode is running (needed for web extraction)
google-chrome --remote-debugging-port
9222 &
Check uv is installed (needed for WeChat + YouTube extraction)
uv --version
Install bun OR npm (one is enough for web extraction deps)
curl -fsSL https://bun.sh/install | bash
OR: npm is already available in most environments
Platform Entry Points After installation, read the platform-specific instructions: Claude Code : platforms/claude/CLAUDE.md Codex : platforms/codex/AGENTS.md OpenClaw : platforms/openclaw/README.md Knowledge Base Structure your-wiki/ ├── raw/ # Immutable source material │ ├── articles/ # Web articles │ ├── tweets/ # X/Twitter │ ├── wechat/ # WeChat posts │ ├── xiaohongshu/ # Xiaohongshu (manual paste only) │ ├── zhihu/ # Zhihu │ ├── pdfs/ # PDFs │ ├── notes/ # Notes │ └── assets/ # Images, attachments ├── wiki/ # AI-generated knowledge base │ ├── entities/ # People, concepts, tools │ ├── topics/ # Topic pages │ ├── sources/ # Source summaries │ ├── comparisons/ # Side-by-side analysis │ └── synthesis/ # Cross-source synthesis ├── index.md # Master index ├── log.md # Operation log └── .wiki-schema.md # Config Core Workflows Initialize a New Knowledge Base
Ask your agent:
"Create a new knowledge base at ~/my-wiki"
The agent will scaffold the directory structure,
generate index.md, log.md, and .wiki-schema.md
Ingest a Web Article
Agent command pattern:
"Add this article to my wiki: https://example.com/article"
Under the hood, the agent routes to baoyu-url-to-markdown:
npx baoyu-url-to-markdown https://example.com/article
raw/articles/article-slug.md Ingest a YouTube Video
Agent command pattern:
"Digest this YouTube video into my knowledge base: https://youtube.com/watch?v=abc123"
Uses youtube-transcript via uv:
uvx youtube-transcript https://youtube.com/watch?v
abc123
raw/articles/video-slug.md Ingest a WeChat Article
Agent command pattern:
"Add this WeChat article to my wiki: https://mp.weixin.qq.com/s/..."
Uses wechat-article-to-markdown via uv:
uvx wechat-article-to-markdown https://mp.weixin.qq.com/s/ .. .
raw/wechat/article-slug.md Ingest a PDF
Agent command pattern:
"Process this PDF into my wiki: /path/to/paper.pdf"
OR drag a file into the chat
No external tool needed — goes directly into main pipeline:
cp /path/to/paper.pdf raw/pdfs/paper.pdf Ingest Raw Text or Notes
Just paste text to your agent:
"Add these notes to my wiki: [paste content]"
Agent writes directly to:
echo "your content"
raw/notes/note-slug.md Batch Process a Folder
Agent command pattern:
"Process all files in ~/Downloads/research into my wiki"
Agent iterates over files and routes each by type
for f in ~/Downloads/research/* ; do
agent determines type and processes accordingly
done Health Check
Agent command pattern:
"Run a health check on my knowledge base"
Agent checks for:
- Orphaned pages (no incoming links)
- Broken [[wiki links]]
- Contradictory information across pages
- Missing source summaries
Source Routing Reference Source Type Tool Used Requires Web articles baoyu-url-to-markdown Chrome debug mode X/Twitter baoyu-url-to-markdown Chrome debug mode + X login Zhihu baoyu-url-to-markdown Chrome debug mode WeChat wechat-article-to-markdown uv YouTube youtube-transcript uv Xiaohongshu Manual paste Nothing PDF / Markdown / Text Direct pipeline Nothing Source registry lives at: scripts/source-registry.tsv Routing logic lives at: scripts/source-registry.sh Wiki Page Conventions Entity Page (wiki/entities/andrej-karpathy.md)
Andrej Karpathy
Overview Former OpenAI/Tesla researcher, creator of llm-wiki methodology.
Key Ideas
[[llm-wiki]] — compile knowledge once, maintain over time
[[nanoGPT]] — minimal GPT implementation for education
Sources
[[sources/llm-wiki-gist-2024]]
[[sources/karpathy-interview-2023]]
Related
[[topics/language-models]]
[[entities/openai]] Topic Page (wiki/topics/retrieval-augmented-generation.md)
Retrieval-Augmented Generation
Summary ...
Key Entities
[[entities/langchain]]
[[entities/llamaindex]]
Comparisons
[[comparisons/rag-vs-finetuning]]
Sources
[[sources/rag-paper-2020]] Source Summary (wiki/sources/article-slug.md)
Source: Article Title
- **
- URL
- **
-
https://example.com/article
- **
- Date ingested
- **
-
2026-04-10
- **
- Type
- **
- web article
Key Points 1. ... 2. ...
Entities Mentioned
[[entities/...]]
Raw**: [[raw/articles/article-slug]] Troubleshooting Chrome / Web Extraction Fails
Start Chrome with remote debugging enabled
google-chrome --remote-debugging-port
9222 --no-first-run &
Verify it's running
curl http://localhost:9222/json/version
For X/Twitter: make sure you're logged in on that Chrome session
Then retry the extraction
WeChat or YouTube Extraction Fails
Install uv if missing
curl -LsSf https://astral.sh/uv/install.sh | sh source $HOME /.cargo/env
Re-run installer to pick up uv
bash install.sh --platform claude
or your platform
Verify uv tools work
uvx youtube-transcript --help uvx wechat-article-to-markdown --help bun/npm Dependency Install Fails
The installer auto-selects bun or npm — check which is available
which bun && echo "bun found" which npm && echo "npm found"
Manually install web extraction deps with npm
npm install -g baoyu-url-to-markdown Codex Legacy Path Compatibility
Old path still supported automatically:
~/.Codex/skills
capital C — installer handles both
~/.codex/skills
lowercase — new default
Agent Can't Find Installed Skill
Verify install location for your platform
ls ~/.claude/skills/llm-wiki/
Claude Code
ls ~/.codex/skills/llm-wiki/
Codex
ls ~/.openclaw/skills/llm-wiki/
OpenClaw
Re-run installer if directory is missing
bash install.sh --platform < your-platform
Source Registry Lookup
Check registered sources and routing
cat scripts/source-registry.tsv
Test routing for a URL
bash
scripts/source-registry.sh route
"https://mp.weixin.qq.com/s/example"
Key Design Principles
Compile once, maintain
— wiki pages are living documents, not ephemeral answers
Bidirectional links
— every entity and topic links to related nodes with
[[name]]
Immutable raw
— source files in
raw/
are never modified after ingestion
Graceful degradation
— if a tool fails, agent prompts for manual paste instead of crashing
Platform-agnostic
— same knowledge base works across all supported agents
Obsidian-compatible
— open
your-wiki/
directly in Obsidian at any time
Quick Reference for Agents
Initialize wiki: "Create a new wiki at