Caveman Token Optimizer Skill by ara.so — Daily 2026 Skills collection. A Claude Code skill and Codex plugin that makes AI agents respond in compressed caveman-speak — cutting ~65% of output tokens on average (up to 87%) while keeping full technical accuracy. No pleasantries. No filler. Just answer. What It Does Caveman mode strips: Pleasantries: "Sure, I'd be happy to help!" → gone Hedging: "It might be worth considering" → gone Articles (a, an, the) → gone Verbose transitions → gone Caveman keeps: All code blocks (written normally) Technical terms (exact: useMemo , polymorphism , middleware ) Error messages (quoted exactly) Git commits and PR descriptions (normal) Same fix. 75% less word. Brain still big. Install Claude Code (npx) npx skills add JuliusBrussee/caveman Claude Code (plugin system) claude plugin marketplace add JuliusBrussee/caveman claude plugin install caveman@caveman Codex Clone the repo Open Codex inside the repo Run /plugins Search Caveman Install plugin Install once. Works in all sessions after that. Manual / Local git clone https://github.com/JuliusBrussee/caveman.git cd caveman pip install -e . Usage — Trigger Commands Claude Code /caveman # enable default (full) caveman mode /caveman lite # professional brevity, grammar intact /caveman full # default — drop articles, use fragments /caveman ultra # maximum compression, telegraphic Codex $caveman $caveman lite $caveman full $caveman ultra Natural language triggers Any of these phrases activate caveman mode: "talk like caveman" "caveman mode" "less tokens please" "be concise" Disable /caveman off
or say: "stop caveman" / "normal mode"
Level sticks until changed or session ends. Intensity Levels Level Trigger Style Example Lite /caveman lite Drop filler, keep grammar "Component re-renders because inline object prop creates new reference each cycle. Wrap in useMemo ." Full /caveman full Drop articles, use fragments "New object ref each render. Inline prop = new ref = re-render. Wrap in useMemo ." Ultra /caveman ultra Telegraphic, abbreviate everything "Inline obj prop → new ref → re-render. useMemo ." Benchmark Results Real token counts from Claude API (reproducible via benchmarks/ directory): Task Normal Caveman Saved React re-render bug 1180 159 87% Auth middleware fix 704 121 83% PostgreSQL pool setup 2347 380 84% Git rebase vs merge 702 292 58% Async/await refactor 387 301 22% Docker multi-stage build 1042 290 72% Average 1214 294 65% Important: Caveman only affects output tokens. Thinking/reasoning tokens are untouched. Caveman make mouth smaller, not brain. Reproducing Benchmarks git clone https://github.com/JuliusBrussee/caveman.git cd caveman/benchmarks
Set your Anthropic API key
export ANTHROPIC_API_KEY = your_key_here
Run benchmark suite
python run_benchmarks.py
Compare normal vs caveman responses
python compare.py
--task
react-rerender
python compare.py
--task
auth-middleware
python compare.py
--all
Code Examples — What Caveman Mode Changes
Before (normal, 69 tokens)
The reason your React component is re-rendering is likely because
you're creating a new object reference on each render cycle. When
you pass an inline object as a prop, React's shallow comparison
sees it as a different object every time, which triggers a
re-render. I'd recommend using useMemo to memoize the object.
After (caveman full, 19 tokens)
New object ref each render. Inline object prop = new ref = re-render.
Wrap in useMemo.
Code blocks stay normal — caveman not stupid
Caveman explains in grunt, but code stays clean:
"Token expiry check broken. Fix:"
def verify_token ( token : str ) -
bool : payload = jwt . decode ( token , SECRET_KEY , algorithms = [ "HS256" ] )
Was: payload["exp"] < time.time()
Fix:
return payload [ "exp" ]
= time . time ( ) What Caveman Preserves vs. Removes
Tokens caveman REMOVES (waste):
filler_phrases
[ "I'd be happy to help you with that" ,
8 tokens gone
"The reason this is happening is because" ,
7 tokens gone
"I would recommend that you consider" ,
7 tokens gone
"Sure, let me take a look at that" ,
8 tokens gone
"Great question!" ,
2 tokens gone
"Certainly!" ,
1 token gone
]
Things caveman KEEPS (substance):
preserved
[ "code blocks" ,
always normal
"technical_terms" ,
exact spelling preserved
"error_messages" ,
quoted verbatim
"variable_names" ,
exact
"git_commits" ,
normal prose
"pr_descriptions" ,
normal prose
] Integration Pattern — Using in a Project If you want caveman-style compression in your own Claude API calls: import anthropic client = anthropic . Anthropic ( )
uses ANTHROPIC_API_KEY env var
Load the caveman SKILL.md as a system prompt addition
with open ( "path/to/caveman/SKILL.md" , "r" ) as f : caveman_skill = f . read ( ) response = client . messages . create ( model = "claude-opus-4-5" , max_tokens = 1024 , system = f" { caveman_skill } \n\nRespond in caveman mode: full intensity." , messages = [ { "role" : "user" , "content" : "Why is my React component re-rendering?" } ] ) print ( response . content [ 0 ] . text )
→ "New object ref each render. Inline prop = new ref = re-render. useMemo fix."
print ( f"Tokens used: { response . usage . output_tokens } " )
~19 vs ~69
Session Workflow
Start session with caveman
/caveman full
Ask technical questions normally — agent responds in caveman
Why does my Docker build take so long? → "Layer cache miss. COPY before RUN npm install. Fix order:" [code block shown normally]
Switch intensity mid-session
/caveman lite
Turn off for PR description writing
/caveman off
Write a PR description for this auth fix → [normal, professional prose]
Back to caveman
/caveman Troubleshooting Caveman mode not activating:
Verify plugin installed
claude plugin list | grep caveman
Reinstall
- claude plugin remove caveman
- claude plugin
- install
- caveman@caveman
- Savings lower than expected:
- Caveman only compresses
- output
- tokens — input tokens unchanged
- Tasks with heavy code output (like Docker setup) see less savings since code is preserved verbatim
- Reasoning/thinking tokens not affected — savings show in visible response only
- Ultra mode gets maximum compression; switch if full mode feels verbose
- Need normal mode for specific output:
- /caveman off # for PR descriptions, user-facing docs, formal reports
- /caveman # re-enable after
- Benchmarking your own tasks:
- cd
- benchmarks/
- export
- ANTHROPIC_API_KEY
- =
- your_key_here
- python run_benchmarks.py
- --task
- "your custom task description"
- Why It Works
- Backed by a March 2026 paper
- "Brevity Constraints Reverse Performance Hierarchies in Language Models"
- constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks. Verbose not always better. TOKENS SAVED ████████ 65% avg (up to 87%) TECHNICAL ACCURACY ████████ 100% RESPONSE SPEED ████████ faster (less to generate) READABILITY ████████ better (no wall of text) Key Files caveman/ ├── SKILL.md # the skill definition loaded by Claude Code ├── benchmarks/ │ ├── run_benchmarks.py # reproduce token count results │ └── compare.py # side-by-side comparison tool ├── plugin.json # Codex plugin manifest └── README.md Links Repo: https://github.com/JuliusBrussee/caveman Homepage: https://juliusbrussee.github.io/caveman/ Also by Julius: Blueprint — spec-driven dev for Claude Code One rock. That it. 🪨