rag-agent-builder

Installs: 66
Rank: #11529

Install

npx skills add https://github.com/qodex-ai/ai-agent-skills --skill rag-agent-builder

RAG Agent Builder

Build powerful Retrieval-Augmented Generation (RAG) applications that enhance LLM capabilities with external knowledge sources, enabling accurate, contextualized AI responses.

Quick Start

Get started with RAG implementations in the examples and utilities:

Examples: See examples/ directory for complete implementations:

basic_rag.py - Simple chunk-embed-retrieve-generate pipeline
retrieval_strategies.py - Hybrid search, reranking, and filtering
agentic_rag.py - Agent-controlled retrieval with iterative refinement

Utilities: See scripts/ directory for helper modules:

embedding_management.py - Embedding generation, normalization, and caching
vector_db_manager.py - Vector database abstraction and factory
rag_evaluation.py - Retrieval and answer quality metrics

Overview

RAG systems combine three key components:

1. Document Retrieval - Find relevant information from knowledge bases
2. Context Integration - Pass retrieved context to the LLM
3. Response Generation - Generate answers grounded in the retrieved information

This skill covers building production-ready RAG applications with various frameworks and approaches.

Core Concepts

What is RAG?

RAG augments LLM knowledge with external data:

Without RAG: The LLM relies only on its training data, which may be outdated or limited.
With RAG: The LLM combines current, custom knowledge with its training knowledge.

When to Use RAG

- Document Q&A: Answer questions about PDFs, books, reports
- Knowledge Base Search: Query internal documentation, wikis
- Enterprise Search: Search proprietary company data
- Context-Specific Assistants: Customer support, HR assistants
- Fact-Heavy Applications: Legal docs, medical records, financial data

When RAG Might Not Be Needed

- General knowledge questions (ChatGPT-like)
- Real-time data that changes constantly (use tools instead)
- Very simple lookup tasks (use database queries)

Architecture Patterns

Basic RAG Pipeline

Indexing: Documents → Chunks → Embeddings → Vector DB
Querying: User Question → Embedding → Retrieval (from Vector DB) → Context → LLM → Answer

Advanced RAG Patterns

1. Agentic RAG
- Agent decides what to retrieve and when
- Can refine queries iteratively
- Better for complex reasoning

2. Hierarchical RAG
- Multi-level document structure
- Search at different levels of detail
- More flexible organization

3. Hybrid Search RAG
- Combines keyword search (BM25) + semantic search (embeddings)
- Captures both exact matches and meaning
- Better for mixed query types

4. Corrective RAG (CRAG)
- Evaluates retrieved documents for relevance
- Retrieves additional sources if needed
- Ensures high-quality context

Implementation Components

1. Document Processing

Chunking Strategies:

# Simple fixed-size chunks
chunks = split_text(doc, chunk_size=1000, overlap=100)

# Semantic chunks (group by meaning)
chunks = semantic_chunking(doc, max_tokens=512)

# Hierarchical chunks (different levels)
chapters = split_by_heading(doc)
chunks = split_each_chapter(chapters, size=1000)
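The fixed-size strategy above can be sketched in a few lines. This is one possible implementation of the `split_text` helper named in the pseudocode, not the skill's actual code. It assumes `chunk_size` and `overlap` are measured in characters, which the source does not specify.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.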

Key Considerations:

- Chunk size affects retrieval quality and cost
- Overlap helps maintain context between chunks
- Semantic chunking preserves meaning better

2. Embedding Generation

Popular Embedding Models:

- OpenAI: text-embedding-3-small, text-embedding-3-large
- Open Source: all-MiniLM-L6-v2, all-mpnet-base-v2
- Domain-Specific: Domain-trained embeddings for specialized knowledge

Best Practices:

- Use the same embedding model for indexing documents and for embedding queries
- Store embeddings as normalized (unit-length) vectors
- Re-embed documents when they change

3. Vector Databases

Popular Options:

- Pinecone: Managed, serverless, easy to scale
- Weaviate: Open-source, self-hosted, flexible
- Milvus: Open-source, high performance
- Chroma: Lightweight, good for prototypes
- Qdrant: Production-grade, high-performance

Selection Criteria:

- Scale requirements (data volume, queries per second)
- Latency needs (real-time vs batch)
- Cost considerations
- Deployment preferences (managed vs self-hosted)

4. Retrieval Strategies

Retrieval Methods:

# Similarity search (most common)
results = vector_db.query(question_embedding, k=5)
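To make the similarity search concrete, here is a minimal in-memory sketch of what a call like `vector_db.query(question_embedding, k=5)` does conceptually. It assumes cosine similarity as the metric and a plain dict in place of a vector database. `normalize` and `top_k` are illustrative names, not part of any library.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, doc_vecs, k=5):
    """Return the k (doc_id, cosine_similarity) pairs most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for doc_id, vec in doc_vecs.items():
        d = normalize(vec)
        # For unit vectors, the dot product equals the cosine similarity.
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], docs, k=2))
```

Real vector databases replace this linear scan with approximate nearest-neighbor indexes, but the ranking idea is the same.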

# Hybrid search (keyword + semantic)
keyword_results = bm25.search(question, k=3)
semantic_results = vector_db.query(embedding, k=3)
results = combine_and_rank(keyword_results, semantic_results)
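The source does not say how `combine_and_rank` merges the two result lists. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the skill's method. It takes ranked lists of document IDs; the constant `k=60` is the conventional default for RRF, and the function name is illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of doc IDs into one list using RRF scores."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more score the higher it ranks in each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    keyword_results = ["d1", "d2", "d3"]
    semantic_results = ["d3", "d1", "d4"]
    print(reciprocal_rank_fusion([keyword_results, semantic_results]))
```

RRF only needs ranks, so it avoids having to put BM25 scores and cosine similarities on a common scale.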

# Reranking (improve relevance)
retrieved = initial_retrieval(query)
reranked = rerank_by_relevance(retrieved, query)

Retrieval Parameters:

- k (number of results): Balance between context and relevance
- Similarity threshold: Filter out low-relevance results
- Diversity: Return varied results vs best matches

5. Context Integration

Context Window Management:

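# Fit retrieved documents into the context window
def prepare_context(retrieved_docs, max_tokens=3000):
    # tokenize() is assumed to return the token list for your model's tokenizer
    context = ""
    for doc in retrieved_docs:
        candidate = context + "\n\n" + doc if context else doc
        if len(tokenize(candidate)) > max_tokens:
            break  # docs are assumed sorted by relevance, so stop at the first overflow
        context = candidate
    return context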

Prompt Design:

You are a helpful assistant. Answer the question based on the provided context.

Context:
{context}

Question: {user_question}

Answer:
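A minimal sketch of filling this prompt in code, assuming the retrieved context is substituted directly under the "Context:" label. The placeholder name `{context}` and the helper `build_prompt` are assumptions for illustration; `{user_question}` comes from the template above.

```python
PROMPT_TEMPLATE = """You are a helpful assistant. Answer the question based on the provided context.

Context:
{context}

Question: {user_question}

Answer:"""


def build_prompt(context: str, user_question: str) -> str:
    """Fill the template with the retrieved context and the user's question."""
    return PROMPT_TEMPLATE.format(
        context=context.strip(),
        user_question=user_question.strip(),
    )


if __name__ == "__main__":
    print(build_prompt("Paris is the capital of France.", "What is the capital of France?"))
```

Ending the prompt with "Answer:" nudges the model to begin its reply immediately rather than restating the question.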

6. Response Generation

Generation Strategies:

- Direct Generation: LLM answers from context
- Summarization: Summarize multiple retrieved docs first
- Fact-Grounding: Ensure answer cites sources
- Iterative Refinement: Refine based on user feedback

Implementation Patterns

Pattern 1: Basic RAG

Simplest RAG implementation:

1. Split documents into chunks
2. Generate embeddings for each chunk
3. Store in vector database
4. Retrieve top-k similar chunks for query
5. Pass to LLM with context

Pros: Simple, fast, works well for straightforward QA
Cons: May miss relevant context, no refinement
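The five steps can be wired together in a toy, dependency-free sketch. Everything here is a stand-in chosen for illustration: `embed` uses word counts instead of a real embedding model, `InMemoryIndex` replaces a vector database, and `llm` is any callable that maps a prompt string to an answer.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stores (chunk, embedding) pairs and retrieves by cosine similarity."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def query(self, question, k=2):
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def answer(question, index, llm, k=2):
    """Retrieve top-k chunks, build a prompt, and pass it to the LLM callable."""
    context = "\n".join(index.query(question, k=k))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
    return llm(prompt)


if __name__ == "__main__":
    index = InMemoryIndex()
    for chunk in ["The Eiffel Tower is in Paris.", "Bananas are yellow."]:
        index.add(chunk)
    # An echo "LLM" that returns the prompt, so the retrieved context is visible.
    print(answer("Where is the Eiffel Tower?", index, llm=lambda p: p, k=1))
```

Swapping `embed` for a real model and `InMemoryIndex` for a vector database leaves the shape of `answer` unchanged.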

Pattern 2: Agentic RAG

Agent controls retrieval:

1. Agent receives user question
2. Decides whether to retrieve documents
3. Formulates retrieval query (may differ from original)
4. Retrieves relevant documents
5. Can iterate or use tools
6. Generates final answer

Pros: Better for complex questions, iterative improvement
Cons: More complex, higher costs

Pattern 3: Corrective RAG (CRAG)

Validates retrieved documents:

1. Retrieve documents for the question
2. Grade each document for relevance
3. If relevance is poor:
   - Try a different retrieval strategy
   - Expand the search scope
   - Retrieve from different sources
4. Generate the answer from validated context

Pros: Higher quality answers, adapts to failures
Cons: More API calls, slower
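A rough sketch of the grade-then-fall-back flow follows. The grader here is a deliberately simple word-overlap score used only for illustration; CRAG-style systems usually grade with an LLM or a trained evaluator. `primary` and `fallback` are hypothetical retriever callables taking `(question, k)` and returning a list of document strings.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def grade(question, doc):
    """Toy relevance grade: fraction of question words that appear in the doc."""
    q = tokens(question)
    return len(q & tokens(doc)) / len(q) if q else 0.0


def corrective_retrieve(question, primary, fallback, threshold=0.5, k=3):
    """Keep documents that pass the grade; try the fallback retriever if none do."""
    docs = [d for d in primary(question, k) if grade(question, d) >= threshold]
    if not docs:
        # Poor relevance from the first strategy: retrieve from a different source.
        docs = [d for d in fallback(question, k) if grade(question, d) >= threshold]
    return docs


if __name__ == "__main__":
    primary = lambda q, k: ["Bananas are yellow."]
    fallback = lambda q, k: ["The refund policy allows returns within 30 days."]
    print(corrective_retrieve("refund policy returns", primary, fallback))
```

An empty result signals that no source produced validated context, which the caller can turn into an "I don't know" answer instead of a guess.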

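Popular Frameworks

LangChain

# Import paths follow the legacy langchain package; newer releases move these
# classes into langchain_community, langchain_openai, and langchain_pinecone.
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import Pinecone
from langchain.chains import RetrievalQA

# Load documents
loader = PyPDFLoader("document.pdf")
docs = loader.load()

# Create RAG chain ("rag-demo" is a placeholder for an existing Pinecone index)
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone.from_documents(docs, embeddings, index_name="rag-demo")
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

answer = qa.run("What is the document about?")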

LlamaIndex

# Import path for llama-index 0.10+; older releases used
# `from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader`.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
response = index.as_query_engine().query("What is the main topic?")

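CrewAI with RAG

from crewai import Agent, Task, Crew
from tools import retrieval_tool  # your own module exposing a retrieval tool

researcher = Agent(
    role="Research Assistant",
    goal="Research topics using knowledge base",
    backstory="Answers questions from the indexed knowledge base",  # placeholder text
    tools=[retrieval_tool],
)

research_task = Task(
    description="Research the topic: {topic}",
    expected_output="A short summary with sources",  # placeholder text
    agent=researcher,
)

# {topic} in the task description is filled from the kickoff inputs
crew = Crew(agents=[researcher], tasks=[research_task])
result = crew.kickoff(inputs={"topic": "vector databases"})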

Best Practices

Document Preparation
✓ Clean and normalize text (remove headers, footers)
✓ Preserve document structure when possible
✓ Add metadata (source, date, category)
✓ Handle PDFs with OCR if scanned
✓ Test chunk sizes for your domain

Embedding Strategy
✓ Use same embedding model for indexing and queries
✓ Fine-tune embeddings for domain-specific needs
✓ Normalize embeddings for consistency
✓ Monitor embedding quality metrics

Retrieval Optimization
✓ Tune k (number of results) for your use case
✓ Use reranking for quality improvement
✓ Implement relevance filtering
✓ Monitor retrieval precision and recall
✓ Cache frequently retrieved documents

Generation Quality
✓ Include source citations in answers
✓ Prompt LLM to indicate confidence
✓ Ask the LLM to cite specific documents
✓ Generate summaries for long contexts
✓ Validate answers against context

Monitoring & Evaluation
✓ Track retrieval metrics (precision, recall, MRR)
✓ Monitor answer quality and relevance
✓ Log failed retrievals for improvement
✓ Collect user feedback
✓ Iterate based on failures

Common Challenges & Solutions

Challenge: Irrelevant Retrieval

Solutions:

- Improve chunking strategy
- Use a better embedding model
- Add document metadata to queries
- Implement reranking
- Use hybrid search

Challenge: Context Too Large

Solutions:

- Reduce chunk size
- Retrieve fewer results (smaller k)
- Summarize retrieved context
- Use hierarchical retrieval
- Filter by relevance score

Challenge: Missing Information

Solutions:

- Increase k (retrieve more)
- Improve embedding model
- Improve preprocessing
- Use multiple search strategies
- Add document hierarchy

Challenge: Slow Performance

Solutions:

- Use managed vector database
- Cache embeddings
- Batch process documents
- Optimize chunk size
- Use smaller embedding model for speed

Evaluation Metrics

Retrieval Metrics:

- Precision: % of retrieved docs that are relevant
- Recall: % of relevant docs that are retrieved
- MRR (Mean Reciprocal Rank): Average, over queries, of 1/rank of the first relevant result
- NDCG (Normalized DCG): Quality of the ranking, weighting relevant results near the top more heavily
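The first three retrieval metrics are short enough to compute directly. The sketch below assumes each query has a ranked list of retrieved document IDs and a set of IDs judged relevant. The function names are illustrative and are not taken from the skill's rag_evaluation.py.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved docs that are relevant."""
    top = retrieved[:k]
    return sum(1 for d in top if d in relevant) / len(top) if top else 0.0


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in set(retrieved[:k]) if d in relevant) / len(relevant)


def mean_reciprocal_rank(runs):
    """runs: list of (retrieved_ids, relevant_id_set) pairs, one per query."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(runs) if runs else 0.0


if __name__ == "__main__":
    retrieved = ["a", "b", "c", "d"]
    relevant = {"b", "d", "e"}
    print(precision_at_k(retrieved, relevant, k=2))
    print(recall_at_k(retrieved, relevant, k=4))
    print(mean_reciprocal_rank([(retrieved, relevant), (["x", "y"], {"x"})]))
```

A query with no relevant result in its list contributes zero to MRR, which makes the metric sensitive to complete retrieval misses.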

Answer Quality Metrics:

- Relevance: Does the answer address the question?
- Correctness: Is the answer factually accurate?
- Grounding: Is the answer supported by the context?
- User Satisfaction: Would the user find the answer helpful?

Advanced Techniques

1. Query Expansion

# Expand query with related terms
expanded_query = query + " " + synonym_expansion(query)
results = retrieve(expanded_query)
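As a rough illustration, `synonym_expansion` could be backed by a simple lookup table. The `SYNONYMS` dictionary below is a made-up example; in practice the related terms might come from a thesaurus, a domain glossary, or an LLM.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["automobile", "vehicle"], "buy": ["purchase"]}


def synonym_expansion(query, synonyms=SYNONYMS):
    """Return the synonyms of the query's words as a space-separated string."""
    extra = []
    for word in query.lower().split():
        for syn in synonyms.get(word, []):
            if syn not in extra:  # keep each added term once
                extra.append(syn)
    return " ".join(extra)


def expand_query(query, synonyms=SYNONYMS):
    """Append related terms to the original query, if any were found."""
    extra = synonym_expansion(query, synonyms)
    return f"{query} {extra}" if extra else query


if __name__ == "__main__":
    print(expand_query("buy car"))
```

Keeping the original query text first preserves the user's wording for exact-match (keyword) retrieval while the added terms widen recall.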

2. Document Compression

# Compress retrieved docs before passing to LLM
compressed = compress_documents(retrieved_docs, query)
context = format_context(compressed)

3. Active Retrieval

# Iteratively refine retrieval based on LLM output
query = user_question
for _ in range(max_iterations):  # bounded so the loop always terminates
    results = retrieve(query)
    answer = generate_with_context(results)
    if answer_complete(answer):
        break
    query = refine_query(answer)
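The same loop can be written as a self-contained function with the retrieval, generation, completeness check, and refinement steps passed in as callables. All four callables and the `max_iterations` parameter are assumptions made for this sketch. Their signatures also differ slightly from the pseudocode: `generate` receives the original question, and `refine` receives the current query as well as the answer.

```python
def active_retrieval(question, retrieve, generate, is_complete, refine, max_iterations=3):
    """Retrieve, generate, and refine the query until the answer is judged complete."""
    query = question
    answer = ""
    for _ in range(max_iterations):
        results = retrieve(query)
        answer = generate(question, results)
        if is_complete(answer):
            break
        # Use the partial answer to steer the next retrieval round.
        query = refine(query, answer)
    return answer


if __name__ == "__main__":
    # Stub components: the "answer" is complete once the query mentions "detail".
    result = active_retrieval(
        "q",
        retrieve=lambda query: [query],
        generate=lambda question, results: " ".join(results),
        is_complete=lambda answer: "detail" in answer,
        refine=lambda query, answer: query + " detail",
    )
    print(result)
```

The iteration cap bounds both latency and API cost, which matters because each round adds one retrieval and one generation call.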

4. Multi-Modal RAG

# Retrieve both text and images
text_results = text_retriever.query(question)
image_results = image_retriever.query(question)
context = combine_multimodal(text_results, image_results)

Resources & References

Key Papers
- "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al.)
- "REALM: Retrieval-Augmented Language Model Pre-Training" (Guu et al.)

Frameworks
- LangChain: https://python.langchain.com/
- LlamaIndex: https://www.llamaindex.ai/
- Haystack: https://haystack.deepset.ai/

Vector Databases
- Pinecone: https://www.pinecone.io/
- Weaviate: https://weaviate.io/
- Qdrant: https://qdrant.tech/

Embedding Models
- OpenAI: https://platform.openai.com/docs/guides/embeddings
- Hugging Face: https://huggingface.co/models?pipeline_tag=sentence-similarity

Next Steps

1. Choose your stack: Decide on framework (LangChain, LlamaIndex, etc.)
2. Prepare documents: Process and chunk your knowledge base
3. Select embeddings: Choose embedding model for your domain
4. Pick vector DB: Select storage solution for scale
5. Build pipeline: Implement retrieval and generation
6. Evaluate: Test on sample questions and iterate
7. Monitor: Track quality metrics in production
