Tools
Cut AI Token Costs by 65% with Caveman: The Viral Skill That Makes Claude Code Speak Caveman
Caveman is the viral 66K-star GitHub repo that slashes Claude Code token usage by 65% by making the AI speak like a caveman — same technical accuracy, dramatically lower costs.

Cut AI Token Costs by 65% with Caveman: The Viral Skill That Makes Claude Code Speak Caveman
A single GitHub repository has taken the AI coding world by storm, amassing over 66,000 stars in under two months. It's called **Caveman**, and its tagline says it all: *"why use many token when few do trick."*
Caveman is a skill for Claude Code — and 30+ other AI coding agents like Codex, Gemini CLI, Cursor, and Copilot — that compresses every reply by stripping filler language while preserving full technical accuracy. The benchmark results are striking: an average **65% reduction in output tokens**, with some tasks seeing up to 87% savings. And since most AI coding tools bill by the token, that saving goes straight to your bottom line.
What Makes Caveman Different from "Just Ask It to Be Concise"
You might wonder: *can't I just tell my AI assistant to be brief?* Surprisingly, no — at least not with the same effectiveness. The Caveman project ran a three-arm evaluation harness comparing:
1. **Baseline** — default verbose responses
2. **Terse** — a simple "answer concisely" instruction
3. **Caveman skill** — the full compressed-response ruleset
Caveman consistently outperformed a simple terse instruction, cutting **3x more tokens** on average. The difference comes down to how the skill structures the system prompt — it's not just asking for brevity, it's enforcing a specific communication pattern that the model can follow consistently.
A March 2026 paper, *"Brevity Constraints Reverse Performance Hierarchies in Language Models"* (arXiv:2604.00025), found that constraining large models to brief responses actually **improved accuracy by 26 points** on certain benchmarks. Verbose isn't always better. Sometimes fewer words equals more correct answers.
How Caveman Works
Caveman works as a skill file that your AI coding agent reads at startup. When installed, it changes the agent's output style from this:
**Normal Claude (69 tokens):** "The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time..."
**Caveman Claude (19 tokens):** "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
Same fix. 75% fewer tokens. Full technical accuracy maintained.
The skill offers four compression levels:
| Level | Style | Token Savings |
|-------|-------|---------------|
| **Lite** | Drops filler words and pleasantries | ~30-40% |
| **Full** | Full caveman — fragments, no fluff | ~65% (default) |
| **Ultra** | Telegraphic — extreme compression | ~75-80% |
| **Wenyan** | Classical Chinese — even shorter | Variable |
Switch between them with a single command: `/caveman ultra` or `/caveman lite`.
Real-World Benchmarks
These savings are not theoretical. The project publishes reproducible benchmarks from real Claude API calls:
| Task | Normal Tokens | Caveman Tokens | Saved |
|------|:------------:|:-------------:|:----:|
| Explain React re-render bug | 1,180 | 159 | **87%** |
| Fix auth middleware token expiry | 704 | 121 | **83%** |
| Docker multi-stage build | 1,042 | 290 | **72%** |
| Debug PostgreSQL race condition | 1,200 | 232 | **81%** |
| Implement React error boundary | 3,454 | 456 | **87%** |
| **Average across 10 tasks** | **1,214** | **294** | **65%** |
Beyond output compression, Caveman also includes **caveman-compress** — a sub-skill that rewrites memory files like `CLAUDE.md` into caveman-speak. On real project files, this cuts input context by an average of **46%**, saving tokens on every subsequent session.
How to Install Caveman
Installation takes about 30 seconds with a single command:
```bash
macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
```
```powershell
Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
```
The installer detects your installed AI coding agents (Claude Code, Codex, Cursor, Gemini CLI, Copilot, and 30+ more) and configures Caveman for each one it finds.
**Requirements:** Node.js ≥ 18. The script is safe to re-run — it detects existing installations and upgrades in place.
After installation, trigger caveman mode in any session by typing `/caveman`. Return to normal mode with `normal mode`.
Built-In Commands You Get
| Command | What It Does |
|---------|-------------|
| `/caveman [lite|full|ultra|wenyan]` | Set compression level for the session |
| `/caveman-commit` | Generate conventional commit messages (≤50 chars) |
| `/caveman-review` | One-line PR comments: `L42: 🔴 bug: user null. Add guard.` |
| `/caveman-stats` | Show session token usage, lifetime savings, and USD cost avoided |
| `/caveman-compress <file>` | Rewrite a memory file in caveman style |
The statusline badge shows `[CAVEMAN] ⛏ 12.4k` — your lifetime tokens saved — updating with every `/caveman-stats` run.
Price Comparison: What 65% Token Savings Means
Let's put the savings in concrete terms. If you're a solo developer running Claude Code for 8 hours per day:
- **Average output per session:** ~12,000 tokens
- **Without Caveman:** ~3,000 tokens saved per session → ~9,000 tokens per session
- **With Caveman (65% reduction):** ~3,900 output tokens per session
- **Monthly savings on output tokens alone:** Potentially hundreds of thousands of tokens
For teams with multiple developers, the savings multiply. And if you're using a self-hosted model where you pay for inference hardware, the throughput improvement is even more valuable — caveman output runs **~3x faster** because there's less text to generate per turn.
Cavemon and the Self-Hosted Ecosystem
Caveman isn't locked into Claude Code. The installer supports **30+ agents** including Codex, Gemini CLI, Cursor, Windsurf, Cline, GitHub Copilot, and the self-hosted gateway OpenClaw. For self-hosted setups, Caveman is particularly valuable because:
1. **Lower token usage = lower VRAM pressure** — less output context held in memory
2. **Faster responses** — 3x speedup means quicker iterations
3. **Better context management** — smaller output means the context window fills slower, reducing the need for compression mid-session
4. **Works with local models** — if you're running DeepSeek or Llama through a local inference server, caveman-style prompts work the same way
For those running self-hosted AI coding agents, we recommend pairing Caveman with Graphify for knowledge graphs and MemPalace for persistent memory — the three tools together dramatically reduce both cost and friction.
The Bottom Line
Caveman has become the fastest-growing AI developer tool of 2026 because it addresses a real pain point: AI token costs add up fast, and simple "be concise" instructions don't actually work well. By providing a structured, benchmarked compression system that preserves technical accuracy, Caveman delivers genuine savings without sacrificing quality.
If you're a heavy user of AI coding tools, the 30-second install is a no-brainer. Try it on your next session — your wallet will thank you.
**Links:**
