Knowledge Base Manager

Build and maintain high-quality knowledge bases for AI systems and human consumption.

Core Principle

Knowledge Base = Structured Information + Quality Curation + Accessibility

A knowledge base is not just a data dump—it's curated, validated, versioned information designed to answer questions and enable reasoning.

When to Use Knowledge Bases

Use Knowledge Bases When:

✅ Need to answer factual questions consistently

✅ Information changes frequently and needs version control

✅ Multiple sources need to be unified and reconciled

✅ Provenance and citation tracking is critical

✅ Building AI systems that need grounded, verifiable information

✅ Organizational knowledge needs to be preserved and searchable

✅ Complex domain with interconnected concepts

Don't Use Knowledge Bases When:

❌ Static documentation is sufficient (use docs + search)

❌ No one will maintain/update it (knowledge rot guaranteed)

❌ Simple FAQ covers all questions (<50 items)

❌ Information doesn't change (static site faster/cheaper)

❌ Team lacks resources for curation

Knowledge Base Types: Decision Framework

1. Document-Based Knowledge Base (RAG)

What it is:

Collection of documents, chunked and embedded for semantic search

Best for:

Technical documentation

Support articles, FAQs

Policy documents

Research papers

Blog content

User manuals

Strengths:

Easy to add new documents

Preserves full context

Natural for text-heavy content

Weaknesses:

Hard to query relationships ("Who works where?")

Duplicate information across documents

Difficult to keep facts consistent

Use: rag-implementer skill + vector-database-mcp

2. Entity-Based Knowledge Base (Knowledge Graph)

What it is:

Network of entities (people, places, things) connected by relationships

Best for:

Organizational charts

Product catalogs with relationships

Social networks

Recommendation systems

Fraud detection

Supply chain tracking

Strengths:

Excellent for "how are X and Y related?" queries

Consistent facts (one source of truth)

Powerful traversal ("friends of friends")

Weaknesses:

Upfront modeling required (ontology design)

Harder to add unstructured information

Learning curve for graph queries

Use: knowledge-graph-builder skill + graph-database-mcp

3. Hybrid Knowledge Base (RAG + Graph)

What it is:

Documents for unstructured knowledge + Graph for structured entities/relationships

Best for:

Enterprise knowledge management

Research with citations and relationships

Medical systems (documents + patient/drug relationships)

Legal systems (cases + precedents + entities)

E-commerce (products + specs + relationships)

Strengths:

Best of both worlds

Flexible for different knowledge types

Rich querying capabilities

Weaknesses:

Most complex to build and maintain

Requires expertise in both RAG and graphs

Higher infrastructure costs

Use: Both rag-implementer + knowledge-graph-builder skills

Decision Tree: Which KB Type?

What kind of knowledge do you have?
├─ Mostly unstructured text (docs, articles, content)?
│   └─ Document-Based KB (RAG)
│      Use: rag-implementer skill
│
├─ Mostly structured entities with relationships?
│   └─ Entity-Based KB (Graph)
│      Use: knowledge-graph-builder skill
│
└─ Mix of both?
    └─ Hybrid KB (RAG + Graph)
       Use: Both skills + This skill for integration
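
The decision tree can also be written down as a small function, which is handy when the audit in Phase 1 produces a rough estimate of how much of the knowledge is free text. This is only a sketch: `KbType`, `KnowledgeProfile`, `chooseKbType`, and the 80% cut-off for "mostly" are all assumptions made for illustration, not part of this skill.

```typescript
// Hypothetical names throughout; nothing here is defined by the skill itself.
type KbType = 'document' | 'entity' | 'hybrid';

interface KnowledgeProfile {
  // Share of the knowledge that is unstructured text, from 0 to 1 (an assumed measure).
  unstructuredShare: number;
}

// Assumed reading of "mostly": at least `mostly` of the content falls on one side.
function chooseKbType(profile: KnowledgeProfile, mostly = 0.8): KbType {
  if (profile.unstructuredShare >= mostly) return 'document';
  if (profile.unstructuredShare <= 1 - mostly) return 'entity';
  return 'hybrid';
}
```

Anything between the two cut-offs falls through to the hybrid branch, mirroring the "Mix of both?" leaf of the tree.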

6-Phase Knowledge Base Implementation

Phase 1: Knowledge Audit & Architecture

Goal: Understand what knowledge exists and how to structure it

Actions:

Inventory existing knowledge sources

Internal: databases, documents, wikis, Slack, emails

External: public data, APIs, third-party sources

Tribal: SME interviews, recorded conversations

Classify knowledge types

Factual: Verifiable facts ("Product X costs $50")

Procedural: How-to knowledge ("How to deploy")

Conceptual: Definitions and explanations

Relationship: Connections between entities

Choose KB architecture

Document-based? Entity-based? Hybrid?

Decision: Use framework above

Define knowledge schema

For documents: metadata fields (source, date, author, category)

For entities: ontology (entity types, relationship types, properties)

Validation:

All knowledge sources inventoried and prioritized

KB architecture chosen and justified

Schema defined and validated with users

Success metrics established

Phase 2: Knowledge Curation & Ingestion

Goal: Transform raw information into high-quality knowledge

Actions:

Extract knowledge from sources

Automated: scraping, API ingestion, file parsing

Manual: expert input, annotation, validation

Clean and normalize

Remove duplicates

Standardize formats

Fix inconsistencies

Enrich with metadata

Structure knowledge

For documents: chunk intelligently (semantic boundaries)

For entities: extract entities, relationships, properties

Add provenance

Source URL or reference

Last updated timestamp

Author/contributor

Confidence score (if applicable)
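
The "chunk intelligently" and "add provenance" steps above might look like the sketch below. It assumes blank lines are acceptable semantic boundaries and that a character budget is a good enough size limit; `ProvenancedChunk`, `chunkByParagraph`, and `maxChars` are hypothetical names, not part of any library.

```typescript
// Hypothetical chunk shape: content plus the provenance fields listed above.
interface ProvenancedChunk {
  content: string;
  source: string;      // source URL or reference
  updated_at: string;  // last updated timestamp (ISO 8601 assumed)
  author: string;
}

// Packs whole paragraphs into chunks of at most `maxChars` characters.
// A single paragraph longer than `maxChars` is kept intact as one oversized chunk.
function chunkByParagraph(
  text: string,
  maxChars: number,
  provenance: Omit<ProvenancedChunk, 'content'>,
): ProvenancedChunk[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: ProvenancedChunk[] = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current === '' ? paragraph : `${current}\n\n${paragraph}`;
    if (candidate.length > maxChars && current !== '') {
      // Adding this paragraph would exceed the budget: close the current chunk.
      chunks.push({ content: current, ...provenance });
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current !== '') chunks.push({ content: current, ...provenance });
  return chunks;
}
```

Every chunk carries the same provenance as its parent document, so a retrieved passage can always be traced back to its source.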

Curation Best Practices:

Single Source of Truth: One canonical answer per question

Deduplication: Merge similar knowledge entries

Conflict Resolution: When sources disagree, establish priority rules

Metadata Richness: More metadata = better filtering and search
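
One way to combine the single-source-of-truth and conflict-resolution practices is a fixed source ranking with recency as the tie-breaker. The sketch below assumes that rule; the `FactEntry` shape, the source names in `SOURCE_PRIORITY`, and `resolveConflicts` are all hypothetical.

```typescript
// Hypothetical entry shape: `key` identifies what the fact is about.
interface FactEntry {
  key: string;
  value: string;
  source: string;
  updated_at: string; // ISO 8601 in UTC, so string comparison matches time order
}

// Assumed priority rule: lower number wins. The source names are examples only.
const SOURCE_PRIORITY: Record<string, number> = {
  'pricing-db': 0,
  wiki: 1,
  slack: 2,
};

function rankOf(source: string): number {
  // Sources missing from the table rank last.
  return SOURCE_PRIORITY[source] ?? Number.MAX_SAFE_INTEGER;
}

// Keeps one canonical entry per key: best-ranked source first, newest on ties.
function resolveConflicts(entries: FactEntry[]): Map<string, FactEntry> {
  const canonical = new Map<string, FactEntry>();
  for (const entry of entries) {
    const existing = canonical.get(entry.key);
    if (existing === undefined) {
      canonical.set(entry.key, entry);
      continue;
    }
    const byRank = rankOf(entry.source) - rankOf(existing.source);
    const wins = byRank < 0 || (byRank === 0 && entry.updated_at > existing.updated_at);
    if (wins) canonical.set(entry.key, entry);
  }
  return canonical;
}
```

Keeping the ranking in one table makes the priority rules reviewable, which matters more than the exact ordering chosen here.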

Validation:

Knowledge extracted and structured

Quality metrics above threshold (accuracy >95%)

Provenance tracked for all entries

Sample queries return relevant results

Phase 3: Storage & Retrieval Setup

Goal: Implement technical infrastructure for knowledge access

Architecture Patterns:

For Document-Based KB:

// Vector database for semantic search
interface DocumentKB {
  store: 'Pinecone' | 'Weaviate' | 'pgvector'
  chunks: {
    content: string
    embedding: number[]
    metadata: {
      source: string
      title: string
      updated_at: string
      category: string
    }
  }[]
}

For Entity-Based KB:

// Graph database for relationship queries
interface EntityKB {
  store: 'Neo4j' | 'ArangoDB'
  nodes: {
    id: string
    type: 'Person' | 'Organization' | 'Product' | 'Concept'
    properties: Record<string, any>
  }[]
  relationships: {
    from: string
    to: string
    type: string
    properties: Record<string, any>
  }[]
}

For Hybrid KB:

// Both vector DB + graph DB
interface HybridKB {
  vectorDB: DocumentKB
  graphDB: EntityKB
  linker: {
    // Links documents to entities mentioned in them
    linkDocumentToEntities(docId: string): string[]
    // Links entities to documents that mention them
    linkEntityToDocuments(entityId: string): string[]
  }
}
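
For illustration, the linking layer can be prototyped in memory before committing to a storage design. The sketch below keeps two maps so lookups work in both directions; the class name `InMemoryLinker` and its `link` method are assumptions, while the two lookup methods follow the `linker` shape above.

```typescript
// Hypothetical in-memory linker; a real system would persist these links.
class InMemoryLinker {
  private readonly docToEntities = new Map<string, Set<string>>();
  private readonly entityToDocs = new Map<string, Set<string>>();

  // Records that a document mentions an entity, updating both directions.
  link(docId: string, entityId: string): void {
    this.addTo(this.docToEntities, docId, entityId);
    this.addTo(this.entityToDocs, entityId, docId);
  }

  // Entities mentioned in the given document.
  linkDocumentToEntities(docId: string): string[] {
    const ids = this.docToEntities.get(docId);
    return ids === undefined ? [] : Array.from(ids);
  }

  // Documents that mention the given entity.
  linkEntityToDocuments(entityId: string): string[] {
    const ids = this.entityToDocs.get(entityId);
    return ids === undefined ? [] : Array.from(ids);
  }

  private addTo(index: Map<string, Set<string>>, key: string, value: string): void {
    const existing = index.get(key);
    if (existing === undefined) {
      index.set(key, new Set([value]));
    } else {
      existing.add(value);
    }
  }
}
```

Using sets means linking the same pair twice is harmless, which simplifies re-ingestion.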

Actions:

Choose database(s)

Document: Pinecone, Weaviate, pgvector

Entity: Neo4j, ArangoDB

Hybrid: Both + linking layer

Implement search/query layer

Vector similarity search (for documents)

Graph traversal (for entities)

Hybrid queries (combining both)

Add caching and optimization

Cache frequent queries

Optimize for common access patterns
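
To make "vector similarity search" concrete, here is a brute-force version that runs in memory. It is a sketch for understanding and small test sets only: a vector database such as those listed above would replace the linear scan with an index. `EmbeddedChunk`, `cosineSimilarity`, and `topK` are hypothetical names.

```typescript
// Hypothetical minimal chunk: text plus its embedding vector.
interface EmbeddedChunk {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and returns the k most similar.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The scan is linear in the number of chunks, so it works as a correctness baseline for a real index rather than as a production path.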

Validation:

Database deployed and accessible

Search/query functionality working

Performance meets requirements (<100ms for most queries)

Phase 4: Quality Control & Validation

Goal: Ensure knowledge base accuracy and reliability

Quality Metrics:

Accuracy: % of correct answers to test questions

Coverage: % of user questions answerable

Freshness: Average age of knowledge

Consistency: % of conflicts/contradictions

Source Quality: % from authoritative sources

Validation Strategies:

1. Test Question Sets

Create 100+ test questions with known correct answers:

interface TestQuestion {
  question: string
  expected_answer: string
  category: string
  difficulty: 'easy' | 'medium' | 'hard'
}
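
A test question set becomes useful once it is scored automatically. The sketch below assumes answers can be compared by exact match after light normalization, which only suits short factual answers; longer answers would need a fuzzier comparison. `AnswerFn`, `normalizeAnswer`, and `accuracyPercent` are hypothetical, and the synchronous answer function stands in for whatever query layer the KB exposes.

```typescript
interface TestQuestion {
  question: string;
  expected_answer: string;
  category: string;
  difficulty: 'easy' | 'medium' | 'hard';
}

// Hypothetical stand-in for the KB's query layer.
type AnswerFn = (question: string) => string;

// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalizeAnswer(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Percentage (0-100) of questions whose answer matches the expected one.
function accuracyPercent(questions: TestQuestion[], answer: AnswerFn): number {
  if (questions.length === 0) return 0;
  const correct = questions.filter(
    (q) => normalizeAnswer(answer(q.question)) === normalizeAnswer(q.expected_answer),
  ).length;
  return (correct / questions.length) * 100;
}
```

The result can be compared directly against the accuracy target used elsewhere in this phase.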

2. Human Review

Sample random knowledge entries

Subject matter expert validation

User feedback loops

3. Automated Checks

Duplicate Detection: Find near-identical entries

Conflict Detection: Find contradictory facts

Staleness Detection: Flag outdated information

Citation Validation: Verify sources still exist
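
Two of these checks are simple enough to sketch directly. The example below flags stale entries by age and finds near-duplicates by word overlap (Jaccard similarity), which is one possible technique among several; the `CheckedEntry` shape, the function names, and the thresholds passed in are assumptions for illustration.

```typescript
// Hypothetical minimal entry shape for the checks below.
interface CheckedEntry {
  id: string;
  content: string;
  updated_at: string; // ISO 8601 timestamp
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Staleness: IDs of entries last updated more than `maxAgeDays` before `now`.
function findStale(entries: CheckedEntry[], maxAgeDays: number, now: Date): string[] {
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  return entries.filter((e) => Date.parse(e.updated_at) < cutoff).map((e) => e.id);
}

function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a: string, b: string): number {
  const left = wordSet(a);
  const right = wordSet(b);
  if (left.size === 0 && right.size === 0) return 1;
  let shared = 0;
  left.forEach((w) => {
    if (right.has(w)) shared++;
  });
  return shared / (left.size + right.size - shared);
}

// Duplicate detection: pairs of entry IDs whose similarity reaches the threshold.
// Compares every pair, so cost grows quadratically with the number of entries.
function findNearDuplicates(entries: CheckedEntry[], threshold: number): [string, string][] {
  const pairs: [string, string][] = [];
  entries.forEach((first, i) => {
    entries.slice(i + 1).forEach((second) => {
      if (jaccard(first.content, second.content) >= threshold) {
        pairs.push([first.id, second.id]);
      }
    });
  });
  return pairs;
}
```

Word overlap catches reworded copies of short facts but not paraphrases; embedding similarity is a common alternative when that matters.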

4. Continuous Monitoring

interface KBHealthMetrics {
  accuracy_score: number // 0-100
  coverage_score: number // % questions answered
  freshness_score: number // avg days since update
  consistency_score: number // % no conflicts
  user_satisfaction: number // feedback rating
}
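
A monitoring job can turn these metrics into alerts by comparing them with the targets this skill uses: accuracy above 90%, coverage above 80%, average age under 30 days, and under 5% conflicting information (read here as a consistency score above 95). The function name `failingMetrics` is hypothetical, and `user_satisfaction` is left unchecked because no rating scale is defined for it.

```typescript
interface KBHealthMetrics {
  accuracy_score: number;    // 0-100
  coverage_score: number;    // % questions answered
  freshness_score: number;   // avg days since update
  consistency_score: number; // % no conflicts
  user_satisfaction: number; // feedback rating (scale not specified)
}

// Returns the names of metrics that miss their target; an empty array means healthy.
function failingMetrics(metrics: KBHealthMetrics): string[] {
  const failures: string[] = [];
  if (metrics.accuracy_score <= 90) failures.push('accuracy_score');
  if (metrics.coverage_score <= 80) failures.push('coverage_score');
  if (metrics.freshness_score >= 30) failures.push('freshness_score');
  // "<5% conflicting information" is interpreted as more than 95% conflict-free.
  if (metrics.consistency_score <= 95) failures.push('consistency_score');
  return failures;
}
```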

Actions:

Run test question validation (target: >90% accuracy)

Conduct human review (sample 10% of entries)

Fix detected issues (duplicates, conflicts, staleness)

Establish monitoring dashboards

Validation:

Accuracy >90% on test questions

Coverage >80% of user questions

<5% conflicting information

Monitoring dashboard operational

Phase 5: Versioning & Evolution

Goal: Track knowledge changes over time and enable rollback

Why Versioning Matters:

Knowledge changes (facts update, policies change)

Need audit trail (who changed what when)

Rollback capability (undo bad updates)

Historical queries ("What was policy on X in 2023?")

Versioning Strategies:

1. Snapshot Versioning

interface KnowledgeEntry {
  id: string
  content: string
  version: number
  created_at: string
  updated_at: string
  updated_by: string
  changelog: string
  previous_version?: string // ID of prior version
}
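
With snapshot versioning, an update writes a new entry that points back at the one it replaces, and rollback follows that pointer. The sketch below assumes each snapshot is stored under its own ID of the form `<base>@v<version>` in a plain `Map`; that ID scheme and the helper names (`snapshotId`, `baseIdOf`, `reviseEntry`, `previousSnapshot`) are illustrative choices.

```typescript
interface KnowledgeEntry {
  id: string;
  content: string;
  version: number;
  created_at: string;
  updated_at: string;
  updated_by: string;
  changelog: string;
  previous_version?: string; // ID of prior version
}

// Assumed ID scheme: "<base>@v<version>", for example "pricing@v2".
function snapshotId(baseId: string, version: number): string {
  return `${baseId}@v${version}`;
}

function baseIdOf(id: string): string {
  const at = id.lastIndexOf('@v');
  return at === -1 ? id : id.slice(0, at);
}

// Writes a new snapshot linked to the current one; the old snapshot is kept.
function reviseEntry(
  store: Map<string, KnowledgeEntry>,
  currentId: string,
  content: string,
  updatedBy: string,
  changelog: string,
  now: string,
): KnowledgeEntry {
  const current = store.get(currentId);
  if (current === undefined) throw new Error(`Unknown entry: ${currentId}`);
  const next: KnowledgeEntry = {
    id: snapshotId(baseIdOf(currentId), current.version + 1),
    content,
    version: current.version + 1,
    created_at: current.created_at,
    updated_at: now,
    updated_by: updatedBy,
    changelog,
    previous_version: current.id,
  };
  store.set(next.id, next);
  return next;
}

// Rollback lookup: resolves the snapshot that the given one replaced, if any.
function previousSnapshot(
  store: Map<string, KnowledgeEntry>,
  id: string,
): KnowledgeEntry | undefined {
  const entry = store.get(id);
  if (entry === undefined || entry.previous_version === undefined) return undefined;
  return store.get(entry.previous_version);
}
```

Because old snapshots are never overwritten, the chain of `previous_version` pointers doubles as the audit trail.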

2. Event Sourcing

interface KnowledgeEvent {
  event_id: string
  entity_id: string
  event_type: 'created' | 'updated' | 'deleted'
  timestamp: string
  changes: {
    field: string
    old_value: any
    new_value: any
  }[]
  author: string
}
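
Event sourcing answers historical queries by replaying events up to a point in time. The sketch below assumes timestamps are ISO 8601 strings in UTC, so they sort correctly as text, and uses `unknown` rather than `any` for field values; `stateAt` is a hypothetical helper name.

```typescript
interface KnowledgeEvent {
  event_id: string;
  entity_id: string;
  event_type: 'created' | 'updated' | 'deleted';
  timestamp: string; // ISO 8601 in UTC (assumed), so string order is time order
  changes: { field: string; old_value: unknown; new_value: unknown }[];
  author: string;
}

// Rebuilds an entity's fields as of `asOf` by replaying its events in order.
// Returns undefined if the entity did not exist yet or had been deleted.
function stateAt(
  events: KnowledgeEvent[],
  entityId: string,
  asOf: string,
): Record<string, unknown> | undefined {
  const relevant = events
    .filter((ev) => ev.entity_id === entityId && ev.timestamp <= asOf)
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));

  let state: Record<string, unknown> | undefined = undefined;
  for (const ev of relevant) {
    if (ev.event_type === 'deleted') {
      state = undefined;
      continue;
    }
    state = state ?? {};
    for (const change of ev.changes) {
      state[change.field] = change.new_value;
    }
  }
  return state;
}
```

A query such as "What was policy on X in 2023?" then becomes a call to `stateAt` with a 2023 timestamp.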

3. Git-Style Versioning

Treat knowledge like code

Commit-based changes

Branch for experimental knowledge

Merge when validated

Actions:

Implement version tracking

Add changelog for all updates

Create rollback mechanism

Build version comparison tools

Validation:

All changes tracked with versions

Rollback tested and working

Historical queries supported

Audit trail complete

Phase 6: Maintenance & Governance

Goal: Keep knowledge base healthy long-term

Maintenance Tasks:

Daily:

Monitor for errors and failures

Review user feedback

Address urgent corrections

Weekly:

Review new content submissions

Update time-sensitive knowledge

Run automated quality checks

Monthly:

Audit knowledge freshness

Review and resolve conflicts

Analyze usage patterns

Update stale content

Quarterly:

Comprehensive quality audit

Schema/ontology review

Performance optimization

User satisfaction survey

Governance Framework:

1. Roles & Responsibilities

Knowledge Owners: Domain experts responsible for content

Curators: Review and approve changes

Contributors: Submit new knowledge

Consumers: Use knowledge and provide feedback

2. Change Process

Submit → Review → Approve → Publish → Monitor
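
If the change process is enforced in tooling, it can be modeled as a linear sequence of stages. The sketch below assumes the strictly linear flow shown above, with no rejection or rework path; the stage names and `advanceStage` are hypothetical.

```typescript
// Hypothetical stage names mirroring Submit → Review → Approve → Publish → Monitor.
type ChangeStage = 'submitted' | 'in_review' | 'approved' | 'published' | 'monitored';

// Each stage maps to the stage that follows it; the last stage has no successor.
const NEXT_STAGE: Record<ChangeStage, ChangeStage | null> = {
  submitted: 'in_review',
  in_review: 'approved',
  approved: 'published',
  published: 'monitored',
  monitored: null,
};

// Moves a change one step forward, refusing to go past the final stage.
function advanceStage(stage: ChangeStage): ChangeStage {
  const next = NEXT_STAGE[stage];
  if (next === null) throw new Error(`No stage after "${stage}"`);
  return next;
}
```

Because stages can only move forward one step at a time, a change cannot be published without passing through review and approval.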

3. Quality Standards

Minimum source quality requirements

Citation requirements

Update frequency requirements

Conflict resolution process

Actions:

Establish maintenance schedule

Assign roles and responsibilities

Create governance documentation

Train team on processes

Validation:

Maintenance schedule in place

Governance documented and communicated

Team trained on processes

Quality trending upward

Knowledge Base Anti-Patterns

❌ Anti-Pattern 1: Data Dump Without Curation

Problem: Ingesting everything without quality filtering

Impact: Low signal-to-noise ratio, poor search results, user frustration

Solution: Curate before ingesting. Quality > Quantity

❌ Anti-Pattern 2: No Version Control

Problem: Knowledge changes but no history is tracked

Impact: Can't audit changes, can't roll back errors, no accountability

Solution: Implement versioning from Phase 5

❌ Anti-Pattern 3: Stale Knowledge

Problem: Knowledge base is outdated but no one knows

Impact: AI systems answer from outdated facts, users get wrong answers

Solution: Freshness monitoring + scheduled updates

❌ Anti-Pattern 4: Duplicate Information

Problem: Same fact stored in multiple places, which drift out of sync

Impact: Conflicting answers, confused users

Solution: Deduplication + single source of truth

❌ Anti-Pattern 5: No Provenance

Problem: Knowledge without source citations

Impact: Can't verify accuracy, can't trace errors

Solution: Always track source + timestamp + author

Integration with Other Skills

With rag-implementer

Use for document-based portion of hybrid KB

Follow RAG implementation phases

Integrate vector search with KB queries

With knowledge-graph-builder

Use for entity-based portion of hybrid KB

Follow graph design patterns

Integrate graph traversal with KB queries

With data-engineer

For ETL pipelines (extract, transform, load knowledge)

For data quality monitoring

For performance optimization

With quality-auditor

For automated quality checks

For testing and validation

For continuous monitoring

With technical-writer

For knowledge documentation

For user guides on KB usage

For governance documentation

Tools & Technologies

Document-Based KB Stack

Vector DB: Pinecone, Weaviate, pgvector

Embeddings: OpenAI, Cohere, custom

Search: Semantic + keyword hybrid

Entity-Based KB Stack

Graph DB: Neo4j, ArangoDB

Query: Cypher, AQL

Visualization: Neo4j Bloom, Gephi

Curation Tools

Deduplication: Custom algorithms, fuzzy matching

Conflict Detection: Rule-based, ML-based

Validation: Test question sets, human review

Monitoring

Metrics: Custom dashboard (Grafana)

Logging: Structured logging of queries/updates

Alerts: Freshness, accuracy, error rate alerts

Success Metrics

Knowledge Quality

Accuracy: >90% on test questions

Coverage: >80% of user questions answered

Freshness: <30 days average age

Consistency: <5% conflicting information

User Satisfaction

Relevance: >85% query results rated relevant

Usefulness: >80% users find KB valuable

Speed: <100ms median query time

Operational Health

Uptime: >99.9%

Update frequency: Weekly minimum

Team engagement: Regular contributions

Common Pitfalls & Solutions

Pitfall 1: "Build it and they will come"

Problem: No user validation, KB doesn't meet needs

Solution: Start with user research, validate continuously

Pitfall 2: Perfectionism

Problem: Waiting to launch until KB is "perfect"

Solution: Launch with 80% coverage, iterate based on usage

Pitfall 3: Over-engineering

Problem: Building complex hybrid system when simple docs would work

Solution: Start simple, add complexity only when needed

Pitfall 4: Maintenance neglect

Problem: Build once, never update

Solution: Establish maintenance schedule from day 1

Quick Start Checklist

Before you start:
Read this entire skill
Review rag-implementer if using document KB
Review knowledge-graph-builder if using entity KB
Have clear use case and success metrics

Phase 1 - Architecture (Week 1):
Inventory knowledge sources
Choose KB type (document/entity/hybrid)
Define schema/ontology
Set up infrastructure

Phase 2 - Initial Build (Weeks 2-3):
Ingest and curate initial knowledge
Implement search/query functionality
Create test question set
Validate with users

Phase 3 - Iterate (Ongoing):
Add more knowledge based on usage
Monitor quality metrics
Fix issues as discovered
Establish maintenance cadence

Related Resources

Skills: rag-implementer, knowledge-graph-builder, data-engineer, quality-auditor
MCPs: vector-database-mcp, graph-database-mcp, knowledge-base-mcp, semantic-search-mcp
Patterns: STANDARDS/architecture-patterns/rag-pattern.md, knowledge-base-pattern.md (coming soon)
Integrations: INTEGRATIONS/pinecone/, INTEGRATIONS/graph-databases/neo4j/

Further Reading
The Knowledge Graph Cookbook
Building Knowledge Bases with LLMs
RAG: Retrieval-Augmented Generation
Knowledge Management Best Practices
Remember: A knowledge base is only as good as its curation. Invest in quality from day 1, establish maintenance processes, and iterate based on user feedback. The goal is not to have all knowledge—it's to have the right knowledge, well-organized, and easily accessible.
