- LangChain Architecture
- Master the LangChain framework for building sophisticated LLM applications with agents, chains, memory, and tool integration.
- Do not use this skill when
- The task is unrelated to langchain architecture
- You need a different domain or tool outside this scope
- Instructions
- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open
- resources/implementation-playbook.md
- .
- Use this skill when
- Building autonomous AI agents with tool access
- Implementing complex multi-step LLM workflows
- Managing conversation memory and state
- Integrating LLMs with external data sources and APIs
- Creating modular, reusable LLM application components
- Implementing document processing pipelines
- Building production-grade LLM applications
- Core Concepts
- 1. Agents
- Autonomous systems that use LLMs to decide which actions to take.
- Agent Types:
- ReAct
-
- Reasoning + Acting in interleaved manner
- OpenAI Functions
-
- Leverages function calling API
- Structured Chat
-
- Handles multi-input tools
- Conversational
-
- Optimized for chat interfaces
- Self-Ask with Search
-
- Decomposes complex queries
- 2. Chains
- Sequences of calls to LLMs or other utilities.
- Chain Types:
- LLMChain
-
- Basic prompt + LLM combination
- SequentialChain
-
- Multiple chains in sequence
- RouterChain
-
- Routes inputs to specialized chains
- TransformChain
-
- Data transformations between steps
- MapReduceChain
-
- Parallel processing with aggregation
- 3. Memory
- Systems for maintaining context across interactions.
- Memory Types:
- ConversationBufferMemory
-
- Stores all messages
- ConversationSummaryMemory
-
- Summarizes older messages
- ConversationBufferWindowMemory
-
- Keeps last N messages
- EntityMemory
-
- Tracks information about entities
- VectorStoreMemory
-
- Semantic similarity retrieval
- 4. Document Processing
- Loading, transforming, and storing documents for retrieval.
- Components:
- Document Loaders
-
- Load from various sources
- Text Splitters
-
- Chunk documents intelligently
- Vector Stores
-
- Store and retrieve embeddings
- Retrievers
-
- Fetch relevant documents
- Indexes
- Organize documents for efficient access 5. Callbacks Hooks for logging, monitoring, and debugging. Use Cases: Request/response logging Token usage tracking Latency monitoring Error handling Custom metrics collection Quick Start from langchain . agents import AgentType , initialize_agent , load_tools from langchain . llms import OpenAI from langchain . memory import ConversationBufferMemory
Initialize LLM
llm
OpenAI ( temperature = 0 )
Load tools
tools
load_tools ( [ "serpapi" , "llm-math" ] , llm = llm )
Add memory
memory
ConversationBufferMemory ( memory_key = "chat_history" )
Create agent
agent
initialize_agent ( tools , llm , agent = AgentType . CONVERSATIONAL_REACT_DESCRIPTION , memory = memory , verbose = True )
Run agent
result
agent . run ( "What's the weather in SF? Then calculate 25 * 4" ) Architecture Patterns Pattern 1: RAG with LangChain from langchain . chains import RetrievalQA from langchain . document_loaders import TextLoader from langchain . text_splitter import CharacterTextSplitter from langchain . vectorstores import Chroma from langchain . embeddings import OpenAIEmbeddings
Load and process documents
loader
TextLoader ( 'documents.txt' ) documents = loader . load ( ) text_splitter = CharacterTextSplitter ( chunk_size = 1000 , chunk_overlap = 200 ) texts = text_splitter . split_documents ( documents )
Create vector store
embeddings
OpenAIEmbeddings ( ) vectorstore = Chroma . from_documents ( texts , embeddings )
Create retrieval chain
qa_chain
RetrievalQA . from_chain_type ( llm = llm , chain_type = "stuff" , retriever = vectorstore . as_retriever ( ) , return_source_documents = True )
Query
result
qa_chain ( { "query" : "What is the main topic?" } ) Pattern 2: Custom Agent with Tools from langchain . agents import Tool , AgentExecutor from langchain . agents . react . base import ReActDocstoreAgent from langchain . tools import tool @tool def search_database ( query : str ) -
str : """Search internal database for information."""
Your database search logic
return f"Results for: { query } " @tool def send_email ( recipient : str , content : str ) -
str : """Send an email to specified recipient."""
Email sending logic
return f"Email sent to { recipient } " tools = [ search_database , send_email ] agent = initialize_agent ( tools , llm , agent = AgentType . ZERO_SHOT_REACT_DESCRIPTION , verbose = True ) Pattern 3: Multi-Step Chain from langchain . chains import LLMChain , SequentialChain from langchain . prompts import PromptTemplate
Step 1: Extract key information
extract_prompt
PromptTemplate ( input_variables = [ "text" ] , template = "Extract key entities from: {text}\n\nEntities:" ) extract_chain = LLMChain ( llm = llm , prompt = extract_prompt , output_key = "entities" )
Step 2: Analyze entities
analyze_prompt
PromptTemplate ( input_variables = [ "entities" ] , template = "Analyze these entities: {entities}\n\nAnalysis:" ) analyze_chain = LLMChain ( llm = llm , prompt = analyze_prompt , output_key = "analysis" )
Step 3: Generate summary
summary_prompt
PromptTemplate ( input_variables = [ "entities" , "analysis" ] , template = "Summarize:\nEntities: {entities}\nAnalysis: {analysis}\n\nSummary:" ) summary_chain = LLMChain ( llm = llm , prompt = summary_prompt , output_key = "summary" )
Combine into sequential chain
overall_chain
SequentialChain ( chains = [ extract_chain , analyze_chain , summary_chain ] , input_variables = [ "text" ] , output_variables = [ "entities" , "analysis" , "summary" ] , verbose = True ) Memory Management Best Practices Choosing the Right Memory Type
For short conversations (< 10 messages)
from langchain . memory import ConversationBufferMemory memory = ConversationBufferMemory ( )
For long conversations (summarize old messages)
from langchain . memory import ConversationSummaryMemory memory = ConversationSummaryMemory ( llm = llm )
For sliding window (last N messages)
from langchain . memory import ConversationBufferWindowMemory memory = ConversationBufferWindowMemory ( k = 5 )
For entity tracking
from langchain . memory import ConversationEntityMemory memory = ConversationEntityMemory ( llm = llm )
For semantic retrieval of relevant history
from langchain . memory import VectorStoreRetrieverMemory memory = VectorStoreRetrieverMemory ( retriever = retriever ) Callback System Custom Callback Handler from langchain . callbacks . base import BaseCallbackHandler class CustomCallbackHandler ( BaseCallbackHandler ) : def on_llm_start ( self , serialized , prompts , ** kwargs ) : print ( f"LLM started with prompts: { prompts } " ) def on_llm_end ( self , response , ** kwargs ) : print ( f"LLM ended with response: { response } " ) def on_llm_error ( self , error , ** kwargs ) : print ( f"LLM error: { error } " ) def on_chain_start ( self , serialized , inputs , ** kwargs ) : print ( f"Chain started with inputs: { inputs } " ) def on_agent_action ( self , action , ** kwargs ) : print ( f"Agent taking action: { action } " )
Use callback
agent . run ( "query" , callbacks = [ CustomCallbackHandler ( ) ] ) Testing Strategies import pytest from unittest . mock import Mock def test_agent_tool_selection ( ) :
Mock LLM to return specific tool selection
mock_llm
Mock ( ) mock_llm . predict . return_value = "Action: search_database\nAction Input: test query" agent = initialize_agent ( tools , mock_llm , agent = AgentType . ZERO_SHOT_REACT_DESCRIPTION ) result = agent . run ( "test query" )
Verify correct tool was selected
assert "search_database" in str ( mock_llm . predict . call_args ) def test_memory_persistence ( ) : memory = ConversationBufferMemory ( ) memory . save_context ( { "input" : "Hi" } , { "output" : "Hello!" } ) assert "Hi" in memory . load_memory_variables ( { } ) [ 'history' ] assert "Hello!" in memory . load_memory_variables ( { } ) [ 'history' ] Performance Optimization 1. Caching from langchain . cache import InMemoryCache import langchain langchain . llm_cache = InMemoryCache ( ) 2. Batch Processing
Process multiple documents in parallel
- from
- langchain
- .
- document_loaders
- import
- DirectoryLoader
- from
- concurrent
- .
- futures
- import
- ThreadPoolExecutor
- loader
- =
- DirectoryLoader
- (
- './docs'
- )
- docs
- =
- loader
- .
- load
- (
- )
- def
- process_doc
- (
- doc
- )
- :
- return
- text_splitter
- .
- split_documents
- (
- [
- doc
- ]
- )
- with
- ThreadPoolExecutor
- (
- max_workers
- =
- 4
- )
- as
- executor
- :
- split_docs
- =
- list
- (
- executor
- .
- map
- (
- process_doc
- ,
- docs
- )
- )
- 3. Streaming Responses
- from
- langchain
- .
- callbacks
- .
- streaming_stdout
- import
- StreamingStdOutCallbackHandler
- llm
- =
- OpenAI
- (
- streaming
- =
- True
- ,
- callbacks
- =
- [
- StreamingStdOutCallbackHandler
- (
- )
- ]
- )
- Resources
- references/agents.md
-
- Deep dive on agent architectures
- references/memory.md
-
- Memory system patterns
- references/chains.md
-
- Chain composition strategies
- references/document-processing.md
-
- Document loading and indexing
- references/callbacks.md
-
- Monitoring and observability
- assets/agent-template.py
-
- Production-ready agent template
- assets/memory-config.yaml
-
- Memory configuration examples
- assets/chain-example.py
-
- Complex chain examples
- Common Pitfalls
- Memory Overflow
-
- Not managing conversation history length
- Tool Selection Errors
-
- Poor tool descriptions confuse agents
- Context Window Exceeded
-
- Exceeding LLM token limits
- No Error Handling
-
- Not catching and handling agent failures
- Inefficient Retrieval
- Not optimizing vector store queries Production Checklist Implement proper error handling Add request/response logging Monitor token usage and costs Set timeout limits for agent execution Implement rate limiting Add input validation Test with edge cases Set up observability (callbacks) Implement fallback strategies Version control prompts and configurations