Claude History Ingest: Conversation Mining

You are extracting knowledge from the user's past Claude Code conversations and distilling it into the Obsidian wiki. Conversations are rich but messy; your job is to find the signal and compile it.

This skill can be invoked directly or via the wiki-history-ingest router (/wiki-history-ingest claude).

Before You Start

- Read .env to get OBSIDIAN_VAULT_PATH and CLAUDE_HISTORY_PATH (defaults to ~/.claude)
- Read .manifest.json at the vault root to check what's already been ingested
- Read index.md at the vault root to know what the wiki already contains

Ingest Modes

Append Mode (default)

Check .manifest.json for each source file (conversation JSONL, memory file). Only process:

- Files not in the manifest (new conversations, new memory files, new projects)
- Files whose modification time is newer than their ingested_at in the manifest

This is usually what you want: the user ran a few new sessions and wants to capture the delta.

Full Mode

Process everything regardless of manifest. Use after a wiki-rebuild or if the user explicitly asks.

Claude Code Data Layout

Claude Code stores everything under ~/.claude/. Here is the actual structure:

    ~/.claude/
    ├── projects/                          # Per-project directories
    │   ├── -Users-name-project-a/         # Path-derived name (slashes → dashes)
    │   │   ├── .jsonl                     # Conversation data (JSONL)
    │   │   └── memory/                    # Structured memories
    │   │       ├── MEMORY.md              # Memory index
    │   │       ├── user_.md               # User profile memories
    │   │       ├── feedback_.md           # Workflow feedback memories
    │   │       └── project_.md            # Project context memories
    │   ├── -Users-name-project-b/
    │   │   └── ...
    ├── sessions/                          # Session metadata (JSON)
    │   └── .json                          # {pid, sessionId, cwd, startedAt, kind, entrypoint}
    ├── history.jsonl                      # Global session history
    ├── tasks/                             # Subagent task data
    ├── plans/                             # Saved plans
    └── settings.json

Key data sources ranked by value:

1. Memory files (projects//memory/.md): Pre-distilled, already wiki-friendly. These contain the user's preferences, project decisions, and feedback. Gold.
2. Conversation JSONL (projects//.jsonl): Full conversation transcripts. Rich but noisy.
3. Session metadata (sessions/.json): Tells you which project, when, and what CWD.

Step 1: Survey and Compute Delta

Scan CLAUDE_HISTORY_PATH and compare against .manifest.json:

Find all projects

Glob: ~/.claude/projects/*/

Find memory files (highest value)

Glob: ~/.claude/projects//memory/.md

Find conversation JSONL files

Glob: ~/.claude/projects//.jsonl

Build an inventory and classify each file:

- New: not in manifest → needs ingesting
- Modified: in manifest but file is newer → needs re-ingesting
- Unchanged: in manifest and not modified → skip in append mode

Report to the user: "Found X projects, Y conversations, Z memory files. Delta: A new, B modified."

Step 2: Ingest Memory Files First

Memory files are already structured with YAML frontmatter:

    ---
    name: memory-name
    description: one-line description
    type: user | feedback | project | reference
    ---

    Memory content here.
For each memory file:

- Read it and parse the frontmatter
- user type → feeds into an entity page about the user, or concept pages about their domain
- feedback type → feeds into skills pages (workflow patterns, what works, what doesn't)
- project type → feeds into entity pages for the project
- reference type → feeds into reference pages pointing to external resources

The MEMORY.md index file in each project is a quick summary; read it first to decide which individual memory files are worth reading in full.
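The routing rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the `ROUTES` table, `parse_memory`, and `route` are hypothetical names, and the parser assumes the frontmatter is delimited by `---` lines and contains only flat `key: value` pairs, as in the example. A real YAML parser would be more robust.

```python
# Hypothetical mapping from the memory `type` field to a wiki destination,
# paraphrasing the routing rules listed above.
ROUTES = {
    "user": "entity or concept page",
    "feedback": "skills page",
    "project": "project entity page",
    "reference": "reference page",
}


def parse_memory(text: str) -> tuple[dict[str, str], str]:
    """Split a memory file into (frontmatter fields, body).

    Assumes `---` delimiters and flat `key: value` lines only.
    Files without a complete frontmatter block are returned as plain text.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the memory body.
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter found


def route(meta: dict[str, str]) -> str:
    """Pick a destination category from the `type` field."""
    return ROUTES.get(meta.get("type", ""), "unclassified")
```

Unknown or missing types fall through to "unclassified" so they can be reviewed by hand rather than silently misfiled.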
Step 3: Parse Conversation JSONL
Each JSONL file is one conversation session. Each line is a JSON object:
    {
      "type": "user|assistant|progress|file-history-snapshot",
      "message": {
        "role": "user|assistant",
        "content": "text string"
      },
      "uuid": "...",
      "timestamp": "2026-03-15T10:30:00.000Z",
      "sessionId": "...",
      "cwd": "/path/to/project",
      "version": "2.1.59"
    }

For assistant messages, content may be an array of content blocks:

    {
      "content": [
        {"type": "thinking", "text": "..."},
        {"type": "text", "text": "The actual response..."},
        {"type": "tool_use", "name": "Read", "input": {...}}
      ]
    }
What to extract from conversations:

- Filter to type: "user" and type: "assistant" entries only
- For assistant entries, extract text blocks (skip thinking and tool_use; those are noise)
- The cwd field tells you which project this conversation belongs to
- The project directory name (e.g., -Users-name-Documents-projects-my-app) tells you the project path

Skip these:

- type: "progress" (internal agent progress updates)
- type: "file-history-snapshot" (file state tracking)
- Subagent conversations (under subagents/ subdirectories), unless the user specifically asks
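The filtering rules above can be sketched as a small Python reader. It is only an illustration under the field names shown in this section (`type`, `message.content`, `cwd`); `extract_text` and `iter_turns` are hypothetical helper names, and real session files may contain shapes not covered here, so malformed lines are skipped rather than treated as errors.

```python
import json
from pathlib import Path
from typing import Iterator


def extract_text(entry: dict) -> str:
    """Return the readable text of one user/assistant entry, or ''."""
    if entry.get("type") not in ("user", "assistant"):
        return ""  # drops progress, file-history-snapshot, and anything else
    content = (entry.get("message") or {}).get("content", "")
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return ""
    # Array form: keep only text blocks; thinking and tool_use are dropped.
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return "\n".join(p for p in parts if p)


def iter_turns(path: Path) -> Iterator[tuple[str, str, str]]:
    """Yield (type, cwd, text) for each non-empty turn in one session file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a truncated or corrupt line
            if not isinstance(entry, dict):
                continue
            text = extract_text(entry)
            if text:
                yield entry["type"], entry.get("cwd", ""), text
```

Callers that want to honor the subagent rule can simply avoid passing paths that sit under a `subagents/` directory.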
Step 4: Cluster by Topic
Don't create one wiki page per conversation. Instead:

- Group extracted knowledge by topic across conversations
- A single conversation about "debugging auth + setting up CI" → two separate topics
- Three conversations across different days about "React performance" → one merged topic
- The project directory name gives you a natural first-level grouping
Step 5: Distill into Wiki Pages
Each Claude project maps to a project directory in the vault. The project directory name from ~/.claude/projects/ encodes the original path; decode it to get a clean project name:
-Users/Documents/projects/my-Project → myproject
-Users/Documents/projects/Another-app → anotherapp
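Decoding the directory name on its own is ambiguous, because a dash that came from a slash looks the same as a dash inside a folder name. One possible workaround, sketched below, is to cross-check against the `cwd` field from the conversation JSONL. The helper names are hypothetical, and the sketch assumes the only transformation is slashes to dashes, as the layout comment states; if Claude Code rewrites other characters too, `encode_path` would need to match that.

```python
from pathlib import PurePosixPath
from typing import Optional


def encode_path(cwd: str) -> str:
    """Encode an absolute path as described in the layout: slashes become dashes."""
    return cwd.replace("/", "-")


def project_name(dir_name: str, cwd: str) -> Optional[str]:
    """Return the last component of `cwd` if `cwd` encodes to `dir_name`.

    Returns None when the two do not match, for example when the session
    later changed into a subdirectory of the project.
    """
    cwd = cwd.rstrip("/")
    if encode_path(cwd) != dir_name:
        return None
    return PurePosixPath(cwd).name
```

For example, `project_name("-Users-name-Documents-projects-my-app", "/Users/name/Documents/projects/my-app")` returns `"my-app"`, keeping the dash that a naive split on dashes would lose. Any further normalization of the name (lowercasing, slugging) is left to the caller.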
Project-specific vs. global knowledge
| What you found | Where it goes | Example |
|---|---|---|
| Project architecture decisions | projects//concepts/ | projects/my-project/concepts/main-architecture.md |
| Project-specific debugging | projects//skills/ | projects/my-project/skills/api-rate-limiting.md |
| General concept the user learned | concepts/ (global) | concepts/react-server-components.md |
| Recurring problem across projects | skills/ (global) | skills/debugging-hydration-errors.md |
| A tool/service used | entities/ (global) | entities/vercel-functions.md |
| Patterns across many conversations | synthesis/ (global) | synthesis/common-debugging-patterns.md |
For each project with content, create or update the project overview page at projects//.md, named after the project, not _project.md. Obsidian's graph view uses the filename as the node label, so _project.md makes every project show up as _project in the graph. Naming it .md gives each project a distinct, readable node name.
Important: Distill the knowledge, not the conversation. Don't write "In a conversation on March 15, the user asked about X." Write the knowledge itself, with the conversation as a source attribution.

Write a summary: frontmatter field on every new/updated page. It should be 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. wiki-query's cheap retrieval path reads this field to avoid opening page bodies.

Mark provenance per the convention in llm-wiki (Provenance Markers section):

- Memory files are mostly extracted: the user wrote them by hand and they're already distilled. Treat memory-derived claims as extracted unless you're stitching together claims from multiple memory files.
- Conversation distillation is mostly inferred. You're synthesizing a coherent claim from many turns of dialogue, often filling in implicit reasoning. Apply ^[inferred] liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
- Use ^[ambiguous] when the user changed their mind across sessions or when assistant and user contradicted each other and the resolution is unclear.
- Write a provenance: frontmatter block on every new/updated page summarizing the rough mix.
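The constraints on the summary: field (1–2 sentences, ≤200 chars) are easy to check mechanically before a page is written. The sketch below is one way to do that; `summary_problems` is a hypothetical helper, and its sentence count is a rough heuristic based on terminal punctuation, so abbreviations such as "e.g." may be overcounted.

```python
import re

MAX_SUMMARY_CHARS = 200  # limit stated in the guidance above


def summary_problems(summary: str) -> list[str]:
    """Return a list of rule violations; an empty list means the summary passes."""
    problems: list[str] = []
    text = summary.strip()
    if not text:
        problems.append("empty")
    if len(text) > MAX_SUMMARY_CHARS:
        problems.append(f"too long ({len(text)} chars)")
    # Heuristic: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if len(sentences) > 2:
        problems.append("more than two sentences")
    return problems
```

A page whose summary fails the check can be flagged for rewriting instead of being truncated automatically, since a cut-off summary rarely answers "what is this page about?".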
Step 6: Update Manifest, Journal, and Special Files
Update .manifest.json

For each source file processed (conversation JSONL, memory file), add/update its entry with:

- ingested_at, size_bytes, modified_at
- source_type: "claude_conversation" or "claude_memory"
- project: the decoded project name
- pages_created and pages_updated lists

Also update the projects section of the manifest:

    {
      "project-name": {
        "source_path": "~/.claude/projects/-Users-...",
        "vault_path": "projects/project-name",
        "last_ingested": "TIMESTAMP",
        "conversations_ingested": 5,
        "conversations_total": 8,
        "memory_files_ingested": 3
      }
    }

Create journal entry + update special files

Update index.md and log.md per the standard process:

    - [TIMESTAMP] CLAUDE_HISTORY_INGEST projects=N conversations=M pages_updated=X pages_created=Y mode=append|full

Privacy

- Distill and synthesize; don't copy raw conversation text verbatim
- Skip anything that looks like secrets, API keys, passwords, tokens
- If you encounter personal/sensitive content, ask the user before including it
- The user's conversations may reference other people; be thoughtful about what goes in the wiki

Reference

See references/claude-data-format.md for more details on the data structures.
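The manifest bookkeeping that bookends the process (the delta classification in Step 1 and the per-file entry in Step 6) might look like the following in Python. This is a sketch under assumptions the skill does not spell out: per-file entries are taken to live in a mapping keyed by the source path string (the `sources` argument is hypothetical), and timestamps are assumed to be ISO 8601 strings with an explicit UTC offset. Only the entry field names come from the list in Step 6.

```python
from datetime import datetime, timezone
from pathlib import Path


def classify(path: Path, sources: dict[str, dict]) -> str:
    """Return 'new', 'modified', or 'unchanged' for one source file."""
    entry = sources.get(str(path))
    if entry is None:
        return "new"
    # Assumes ingested_at carries a UTC offset, e.g. 2026-03-15T10:30:00+00:00.
    ingested = datetime.fromisoformat(entry["ingested_at"])
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return "modified" if modified > ingested else "unchanged"


def make_entry(
    path: Path,
    source_type: str,
    project: str,
    pages_created: list[str],
    pages_updated: list[str],
) -> dict:
    """Build a manifest entry with the fields listed in Step 6."""
    stat = path.stat()
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "modified_at": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "source_type": source_type,  # "claude_conversation" or "claude_memory"
        "project": project,
        "pages_created": pages_created,
        "pages_updated": pages_updated,
    }
```

In append mode only files classified as "new" or "modified" would be processed; full mode would skip `classify` and rebuild every entry.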
