# ai-rag

Installs: 61 · Rank: #12272

## Install

```shell
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill ai-rag
```

# RAG & Search Engineering — Complete Reference

Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.

This skill covers:

- RAG: chunking, contextual retrieval, grounding, adaptive/self-correcting systems
- Search: BM25, vector search, hybrid fusion, ranking pipelines
- Evaluation: recall@k, nDCG, MRR, groundedness metrics

## Modern Best Practices (Jan 2026)

- Separate retrieval quality from answer quality; evaluate both (RAG: https://arxiv.org/abs/2005.11401).
- Default to hybrid retrieval (sparse + dense) with reranking when precision matters (DPR: https://arxiv.org/abs/2004.04906).
- Use a failure taxonomy to debug systematically (Seven Failure Points in RAG: https://arxiv.org/abs/2401.05856).
- Treat freshness/invalidation as first-class; staleness is a correctness bug, not a UX issue.
- Add grounding gates: answerability checks, citation coverage checks, and refusal-on-missing-context defaults.
- Threat-model RAG: retrieved text is untrusted input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
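Hybrid retrieval typically merges the sparse and dense result lists with Reciprocal Rank Fusion. A minimal sketch, assuming each retriever returns a ranked list of document IDs; `rrf_fuse` is an illustrative name and k=60 is the conventional damping constant, not a tuned value:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by sum of 1/(k + rank) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score = better; ties broken by dict insertion order.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]   # sparse (lexical) ranking
dense_hits = ["d3", "d1", "d4"]  # dense (vector) ranking
fused = rrf_fuse([bm25_hits, dense_hits])  # fused == ["d1", "d3", "d2", "d4"]
```

RRF needs only ranks, not comparable scores, which is why it is a robust default for fusing heterogeneous retrievers.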

Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.
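Bounded context, one element of the default posture above, can be as simple as a greedy budget cut over scored chunks. A sketch, with word counts standing in for tokens and `pack_context` as an illustrative name:

```python
def pack_context(chunks: list[tuple[str, float]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks until the word budget is spent.

    chunks: (text, relevance_score) pairs from the retriever/reranker.
    budget: maximum total words allowed in the packed context.
    """
    packed: list[str] = []
    used = 0
    for text, _score in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = len(text.split())
        if used + cost > budget:
            continue  # skip chunks that would overflow the budget
        packed.append(text)
        used += cost
    return packed
```

A fixed budget keeps the pipeline deterministic and avoids the "stuff everything" failure mode called out later in this reference.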

Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.

## Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + corpus size; else: fine-tune/caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |

## Decision Tree: RAG Architecture Selection

```
Building RAG system: [Architecture Path]
├─ Document type?
│   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
│   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
│   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
│
├─ Retrieval accuracy low?
│   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
│   ├─ Noisy results? → Add reranker + better metadata filters
│   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
│
├─ Dataset size?
│   ├─ <100k chunks? → Flat index (exact search)
│   ├─ 100k-10M? → HNSW (low latency)
│   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
│
└─ Production quality?
    └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
```
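The retrieval-plane metrics named in the quick reference (recall@k, MRR, nDCG) are small enough to implement directly. A sketch over ranked ID lists, assuming binary relevance judgments:

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant hit (0.0 if none found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking over DCG of the ideal ranking."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

Running these over a fixed test set on every change is the regression gate the table refers to.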

## Core Concepts (Vendor-Agnostic)

- Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
- Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
- Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
- Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

## Implementation Practices (Tooling Examples)

- Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.
- Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
- Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
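The retrieval API contract above can be sketched as plain dataclasses. Field names here are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict[str, str] = field(default_factory=dict)  # metadata/ACL filters
    trace_id: str = ""  # propagated through every pipeline stage for telemetry

@dataclass(frozen=True)
class Evidence:
    doc_id: str    # stable ID, reusable as a citation target
    chunk_id: str
    score: float
    text: str

@dataclass(frozen=True)
class RetrievalResponse:
    request: RetrievalRequest   # echo the request for reproducibility
    evidence: list[Evidence]
```

Freezing the dataclasses keeps request/response pairs immutable, which makes cache keys and trace logs trustworthy.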

## Do / Avoid

### Do

- Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
- Do enforce document-level ACLs at retrieval time (not only at generation time).
- Do include citations with stable IDs and verify citation coverage in tests.
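The citation-coverage check above can live in the test suite as a simple helper. This sketch assumes citations appear inline as bracketed markers like [doc-123]; the format and the `citation_coverage` name are assumptions to adapt to your output contract:

```python
import re

def citation_coverage(answer: str, evidence_ids: set[str]) -> float:
    """Fraction of answer sentences that cite at least one retrieved evidence ID."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = 0
    for sentence in sentences:
        ids = set(re.findall(r"\[([^\]]+)\]", sentence))  # e.g. "[doc-1]" -> "doc-1"
        if ids & evidence_ids:
            cited += 1
    return cited / len(sentences)
```

A regression gate can then assert a minimum coverage (e.g., 1.0 for compliance-critical answers) and fail the build when generation drifts.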

### Avoid

- Avoid shipping RAG without a test set and regression gate.
- Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.
- Avoid mixing corpora without metadata and tenant isolation.

## When to Use This Skill

Use this skill when the user asks:

- "Help me design a RAG pipeline."
- "How should I chunk this document?"
- "Optimize retrieval for my use case."
- "My RAG system is hallucinating — fix it."
- "Choose the right vector database / index type."
- "Create a RAG evaluation framework."
- "Debug why retrieval gives irrelevant results."

## Tool/Model Recommendation Protocol

When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

### Triggers

- "What's the best vector database for [use case]?"
- "What should I use for [chunking/embedding/reranking]?"
- "What's the latest in RAG development?"
- "Current best practices for [retrieval/grounding/evaluation]?"
- "Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"
- "[Vector DB A] vs [Vector DB B]?"
- "Best embedding model for [use case]?"
- "What RAG framework should I use?"

### Required Checks

1. Read `data/sources.json` and start from sources with `"add_as_web_search": true`.
2. Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
3. If browsing isn't available, state assumptions and give a verification checklist.

### What to Report

After checking, provide:

- Current landscape: what vector DBs/embeddings are popular NOW (not 6 months ago)
- Emerging trends: techniques gaining traction (late interaction, agentic RAG, graph RAG)
- Deprecated/declining: approaches or tools losing relevance
- Recommendation: based on fresh data, not just static knowledge

### Example Topics (verify with current sources)

- Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
- Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
- Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
- RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
- Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
- Evaluation (RAGAS, TruLens, DeepEval, BEIR)

## Related Skills

For adjacent topics, reference these skills:

- ai-llm - Prompting, fine-tuning, instruction datasets
- ai-agents - Agentic RAG workflows and tool routing
- ai-llm-inference - Serving performance, quantization, batching
- ai-mlops - Deployment, monitoring, security, privacy, and governance
- ai-prompt-engineering - Prompt patterns for RAG generation phase

## Templates

- System Design (Start Here): RAG System Design
- Chunking & Ingestion: Basic Chunking, Code Chunking, Long Document Chunking
- Embedding & Indexing: Index Configuration, Metadata Schema
- Retrieval & Reranking: Retrieval Pipeline, Hybrid Search, Reranking, Ranking Pipeline, Reranker
- Context Packaging & Grounding: Context Packing, Grounding
- Evaluation: RAG Evaluation, RAG Test Set, Search Evaluation, Search Test Set
- Search Configuration: BM25 Configuration, HNSW Configuration, IVF Configuration, Hybrid Configuration
- Query Rewriting: Query Rewrite

## Navigation

### Resources

- references/advanced-rag-patterns.md
- references/agentic-rag-patterns.md
- references/bm25-tuning.md
- references/chunking-patterns.md
- references/chunking-strategies.md
- references/rag-evaluation-guide.md
- references/rag-troubleshooting.md
- references/contextual-retrieval-guide.md
- references/distributed-search-slos.md
- references/grounding-checklists.md
- references/hybrid-fusion-patterns.md
- references/index-selection-guide.md
- references/multilingual-domain-patterns.md
- references/pipeline-architecture.md
- references/query-rewriting-patterns.md
- references/ranking-pipeline-guide.md
- references/retrieval-patterns.md
- references/search-debugging.md
- references/search-evaluation-guide.md
- references/user-feedback-learning.md
- references/vector-search-patterns.md

### Templates

- assets/context/template-context-packing.md
- assets/context/template-grounding.md
- assets/design/rag-system-design.md
- assets/chunking/template-basic-chunking.md
- assets/chunking/template-code-chunking.md
- assets/chunking/template-long-doc-chunking.md
- assets/retrieval/template-retrieval-pipeline.md
- assets/retrieval/template-hybrid-search.md
- assets/retrieval/template-reranking.md
- assets/eval/template-rag-eval.md
- assets/eval/template-rag-testset.jsonl
- assets/eval/template-search-eval.md
- assets/eval/template-search-testset.jsonl
- assets/indexing/template-index-config.md
- assets/indexing/template-metadata-schema.md
- assets/query/template-query-rewrite.md
- assets/ranking/template-ranking-pipeline.md
- assets/ranking/template-reranker.md
- assets/search/template-bm25-config.md
- assets/search/template-hnsw-config.md
- assets/search/template-ivf-config.md
- assets/search/template-hybrid-config.md

### Data

- data/sources.json — Curated external references

Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.
