Reduce tokens while preserving info (summarization)
Isolate
Split across sub-agents (partitioning)
Anti-Patterns
Exhaustive context over curated context
Critical info in middle positions
No compaction triggers before limits
Single agent for parallelizable tasks
Tools without clear descriptions
Guidelines
Place critical info at beginning/end of context
Implement compaction at 70-80% utilization
Use sub-agents for context isolation, not role-play
Design tools with 4-question framework (what, when, inputs, returns)
Optimize for tokens-per-task, not tokens-per-request
Validate with probe-based evaluation
Monitor KV-cache hit rates in production
Start minimal, add complexity only when proven necessary
Scripts
context_analyzer.py
- Context health analysis, degradation detection
compression_evaluator.py
- Compression quality evaluation