token-efficiency

Installs: 558
Rank: #1975

Install

npx skills add https://github.com/delphine-l/claude_global --skill token-efficiency

Token Efficiency Expert

This skill provides token optimization strategies for cost-effective Claude Code usage across all projects. These guidelines help minimize token consumption while maintaining high-quality assistance.

Core Principle

ALWAYS follow these optimization guidelines by default unless the user explicitly requests verbose output or full file contents.

Default assumption: Users prefer efficient, cost-effective assistance.

Model Selection Strategy

Use the right model for the task to optimize cost and performance:

Opus - For Learning and Deep Understanding

Use Opus when:

🎓 Learning new codebases - Understanding architecture, code structure, design patterns
📚 Broad exploration - Identifying key files, understanding repository organization
🔍 Deep analysis - Analyzing complex algorithms, performance optimization
📖 Reading and understanding - When you need to comprehend existing code before making changes
🧠 Very complex debugging - Only when Sonnet can't solve it or the issue is architectural

Why Opus: More powerful reasoning for understanding complex systems and relationships

Example prompts:

"Use Opus to understand the architecture of this codebase" "Switch to Opus - I need help understanding how this component works" "Use Opus for this deep dive into the authentication system"

Sonnet - For Regular Development Tasks (DEFAULT)

Use Sonnet (default) for:

✏️ Writing code - Creating new files, implementing features
🔧 Editing and fixing - Updating configurations, fixing bugs
🐛 Debugging - Standard debugging, error analysis, troubleshooting (use Sonnet unless very complex)
🧪 Testing - Writing tests, running test suites
📝 Documentation - Writing READMEs, comments, docstrings
🚀 Deployment tasks - Running builds, deploying code
💬 General questions - Quick clarifications, simple explanations

Why Sonnet: Faster and more cost-effective for straightforward tasks, handles most debugging well

Example workflow:

  1. [Opus] Learn codebase structure and identify key components (one-time)
  2. [Sonnet] Implement the feature based on understanding
  3. [Sonnet] Debug and fix issues as they arise
  4. [Sonnet] Write tests and documentation
  5. [Opus] Only if stuck on architectural or very complex issues
  6. [Sonnet] Final cleanup and deployment

Cost Optimization Strategy

Typical session pattern:

1. Start with Opus - Spend 10-15 minutes understanding the codebase (one-time investment)
2. Switch to Sonnet - Use it for ALL implementation, debugging, and routine work
3. Return to Opus - Only when explicitly needed for deep architectural understanding

Savings example:

2 hours of work = 120 minutes
Opus for learning: 15 minutes (~5K tokens)
Sonnet for everything else: 105 minutes (~15K tokens)
vs. all Opus: ~40K tokens
Savings: ~50% of token cost

Remember: Sonnet is very capable - use it by default, including for debugging. Only escalate to Opus when the problem requires deep architectural insight.

Skills and Token Efficiency

Common Misconception

Myth: Having many skills in .claude/skills/ increases token usage.

Reality: Skills use progressive disclosure - Claude loads them intelligently:

At session start: Claude sees only skill descriptions (minimal tokens)
When activated: Full skill content is loaded only for skills being used
Unused skills: Consume almost no tokens (just the description line)

Example Token Usage

.claude/skills/
├── vgp-pipeline/          # ~50 tokens (description only)
├── galaxy-tool-wrapping/  # ~40 tokens (description only)
├── token-efficiency/      # ~30 tokens (description only)
└── python-testing/        # ~35 tokens (description only)

Total overhead: ~155 tokens for 4 skills (just descriptions)

When skill activated: Additional 2,000-5,000 tokens loaded for that specific skill
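The description line Claude sees typically comes from the skill's SKILL.md frontmatter. As a minimal sketch (assuming the usual SKILL.md layout with name/description frontmatter; the skill name and wording below are placeholders), only the description is loaded until the skill is activated:

```bash
mkdir -p .claude/skills/my-skill
cat > .claude/skills/my-skill/SKILL.md << 'EOF'
---
name: my-skill
description: One-line summary loaded at session start (a few dozen tokens)
---

Full instructions go here; they are only loaded when the skill is activated.
EOF
```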

Implication for Centralized Skills

It's safe to symlink multiple skills to a project!

Link 10+ skills from $CLAUDE_METADATA → only ~500 tokens of overhead
Only activate skills you need by mentioning them by name
Example: "Use the vgp-pipeline skill to check status" → loads only that skill

Best practice:

Link all potentially useful skills

ln -s $CLAUDE_METADATA/skills/vgp-pipeline .claude/skills/vgp-pipeline
ln -s $CLAUDE_METADATA/skills/galaxy-tool-wrapping .claude/skills/galaxy-tool-wrapping
ln -s $CLAUDE_METADATA/skills/python-testing .claude/skills/python-testing
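If the central repository keeps one directory per skill, a short loop (a sketch assuming that layout under $CLAUDE_METADATA/skills/) links everything at once instead of one ln -s per skill:

```bash
mkdir -p .claude/skills
for d in "$CLAUDE_METADATA"/skills/*/; do
    name=$(basename "$d")
    # Skip anything already linked or present
    [ -e ".claude/skills/$name" ] || ln -s "$d" ".claude/skills/$name"
done
```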

Activate selectively during session

"Use the vgp-pipeline skill to debug this workflow" # Only VGP skill fully loaded

Token waste comes from:

❌ Reading large log files unnecessarily
❌ Running verbose commands
❌ Reading unchanged files multiple times

NOT from:

✅ Having many skills available
✅ Well-organized skill directories
✅ Using centralized skill repositories

Token Optimization Rules

1. Use Quiet/Minimal Output Modes

For commands with --quiet, --silent, or -q flags:

❌ DON'T: Use verbose mode by default

command --verbose

✅ DO: Use quiet mode by default

command --quiet
command -q
command --silent

Common commands with quiet modes:

grep -q (quiet, exit status only)
git --quiet or git -q
curl -s or curl --silent
wget -q
make -s (silent)
Custom scripts with --quiet flags

When to use verbose: Only when user explicitly asks for detailed output.

2. NEVER Read Entire Log Files

Log files can be 50-200K tokens. ALWAYS filter before reading.

❌ NEVER DO THIS:

Read: /var/log/application.log Read: debug.log Read: error.log

✅ ALWAYS DO ONE OF THESE:

Option 1: Read only the end (most recent)

Bash: tail -100 /var/log/application.log

Option 2: Filter for errors/warnings

Bash: grep -A 10 -iE "error|fail|warning" /var/log/application.log | head -100

Option 3: Specific time range (if timestamps present)

Bash: grep "2025-01-15" /var/log/application.log | tail -50

Option 4: Count occurrences first

Bash: grep -c "ERROR" /var/log/application.log            # See if there are many errors
Bash: grep "ERROR" /var/log/application.log | tail -20    # Then read the recent ones
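A minimal sketch that ties these options together: check the size first, then fall back to the tail plus recent errors (the path and thresholds are illustrative):

```bash
log=/var/log/application.log
lines=$(wc -l < "$log")
if [ "$lines" -lt 1000 ]; then
    cat "$log"                                          # Small enough to read in full
else
    tail -100 "$log"                                    # Most recent activity
    grep -iE "error|fail|warning" "$log" | tail -50     # Recent problems only
fi
```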

Exceptions: Only read full log if:

User explicitly says "read the full log"
Filtered output lacks necessary context
Log is known to be small (<1000 lines)

3. Check Lightweight Sources First

Before reading large files, check if info is available in smaller sources:

For Git repositories:

✅ Check status first (small output)

Bash: git status --short
Bash: git log --oneline -10

❌ Don't immediately read

Read: .git/logs/HEAD # Can be large

For Python/Node projects:

✅ Check package info (small files)

Bash: cat package.json | jq '.dependencies'
Bash: cat requirements.txt | head -20

❌ Don't immediately read

Read: node_modules/   # Huge directory
Read: venv/           # Large virtual environment

For long-running processes:

✅ Check process status

Bash: ps aux | grep python
Bash: top -b -n 1 | head -20

❌ Don't read full logs immediately

Read: /var/log/syslog

4. Use Grep Instead of Reading Files

When searching for specific content:

❌ DON'T: Read file then manually search

Read: large_file.py # 30K tokens

Then manually look for "def my_function"

✅ DO: Use Grep to find it

Grep: "def my_function" large_file.py

Then only read relevant sections if needed

Advanced grep usage:

Find with context

Bash: grep -A 5 -B 5 "pattern" file.py # 5 lines before/after

Case-insensitive search

Bash: grep -i "error" logfile.txt

Recursive search in directory

Bash: grep -r "TODO" src/ | head -20

Count matches

Bash: grep -c "import" *.py

5. Read Files with Limits

If you must read a file, use offset and limit parameters:

✅ Read first 100 lines to understand structure

Read: large_file.py (limit: 100)

✅ Read specific section

Read: large_file.py (offset: 500, limit: 100)

✅ Read just the imports/header

Read: script.py (limit: 50)

For very large files:

Check file size first

Bash: wc -l large_file.txt

Output: 50000 lines

Then read strategically

Bash: head -100 large_file.txt               # Beginning
Bash: tail -100 large_file.txt               # End
Bash: sed -n '1000,1100p' large_file.txt     # Specific middle section
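The size check and strategic sampling can be bundled into one small helper (a sketch; adjust the line counts to taste):

```bash
preview() {
    local f=$1
    echo "== $f ($(wc -l < "$f") lines) =="
    head -50 "$f"
    echo "..."
    tail -20 "$f"
}

preview large_file.txt
```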

Reading Large Test Output Files:

For Galaxy tool_test_output.json files (can be 30K+ lines):

Read summary first (top of file)

Read(file_path, limit=10) # Just get summary section

Then read specific test results

Read(file_path, offset=140, limit=120) # Target specific test

Search for patterns

Bash("grep -n 'test_index' tool_test_output.json") # Find test boundaries

Token savings:

Full file: ~60K tokens
Targeted reads: ~5K tokens
Savings: 55K tokens (92%)

6. Use Bash Commands Instead of Reading Files

CRITICAL OPTIMIZATION: For file operations, use bash commands directly instead of reading files into Claude's context.

Reading files costs tokens. Bash commands don't.

Copy File Contents

❌ DON'T: Read and write (costs tokens for file content)

Read: source_file.txt
Write: destination_file.txt (with content from source_file.txt)

✅ DO: Use cp command (zero token cost for file content)

Bash: cp source_file.txt destination_file.txt

Token savings: 100% of file content

Replace Text in Files

❌ DON'T: Read, edit, write (costs tokens for entire file)

Read: config.yaml
Edit: config.yaml (old_string: "old_value", new_string: "new_value")

✅ DO: Use sed in-place (zero token cost for file content)

Bash: sed -i '' 's/old_value/new_value/g' config.yaml

or

Bash: sed -i.bak 's/old_value/new_value/g' config.yaml # with backup

For literal strings with special characters

Bash: sed -i '' 's|old/path|new/path|g' config.yaml # Use | as delimiter

Token savings: 100% of file content

macOS vs Linux compatibility:

macOS (BSD sed) - requires empty string after -i

sed -i '' 's/old/new/g' file.txt

Linux (GNU sed) - no argument needed

sed -i 's/old/new/g' file.txt

Cross-platform solution (works everywhere):

sed -i.bak 's/old/new/g' file.txt && rm file.txt.bak

OR detect OS:

if [[ "$OSTYPE" == "darwin"* ]]; then sed -i '' 's/old/new/g' file.txt else sed -i 's/old/new/g' file.txt fi

Portable alternative (no -i flag):

sed 's/old/new/g' file.txt > file.tmp && mv file.tmp file.txt

Why this matters: Scripts using sed -i will fail on macOS with cryptic errors like "can't read /pattern/..." if the empty string is omitted. Always use sed -i '' for macOS compatibility or sed -i.bak for cross-platform safety.
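A small wrapper function makes the portability decision once, so the rest of a script can stay identical on both platforms (a sketch of the OS-detection approach above):

```bash
sed_inplace() {
    # Usage: sed_inplace 's/old/new/g' file [file ...]
    if [[ "$OSTYPE" == darwin* ]]; then
        sed -i '' "$@"
    else
        sed -i "$@"
    fi
}

sed_inplace 's/old_value/new_value/g' config.yaml
```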

Append to Files

❌ DON'T: Read and write entire file

Read: log.txt
Write: log.txt (with existing content + new line)

✅ DO: Use echo or append

Bash: echo "New log entry" >> log.txt Bash: cat >> log.txt << 'EOF' Multiple lines of content EOF

Token savings: 100% of existing file content

Delete Lines from Files

❌ DON'T: Read, filter, write

Read: data.txt
Write: data.txt (without lines containing "DELETE")

✅ DO: Use sed or grep

Bash: sed -i '' '/DELETE/d' data.txt

or

Bash: grep -v "DELETE" data.txt > data_temp.txt && mv data_temp.txt data.txt

Extract Specific Lines

❌ DON'T: Read entire file to get a few lines

Read: large_file.txt (find lines 100-110)

✅ DO: Use sed or awk

Bash: sed -n '100,110p' large_file.txt
Bash: awk 'NR>=100 && NR<=110' large_file.txt
Bash: head -110 large_file.txt | tail -11

Rename Files in Bulk

❌ DON'T: Read directory, loop in Claude, execute renames

Read directory listing... For each file: mv old_name new_name

✅ DO: Use bash loop or rename command

Bash: for f in *.txt; do mv "$f" "${f%.txt}.md"; done
Bash: rename 's/\.txt$/.md/' *.txt   # if the rename command is available

Merge Files

❌ DON'T: Read multiple files and write combined

Read: file1.txt
Read: file2.txt
Write: combined.txt

✅ DO: Use cat

Bash: cat file1.txt file2.txt > combined.txt

or append

Bash: cat file2.txt >> file1.txt

Count Lines/Words/Characters

❌ DON'T: Read file to count

Read: document.txt

Then count lines manually

✅ DO: Use wc

Bash: wc -l document.txt   # Lines
Bash: wc -w document.txt   # Words
Bash: wc -c document.txt   # Characters

Check if File Contains Text

❌ DON'T: Read file to search

Read: config.yaml

Then search for text

✅ DO: Use grep with exit code

Bash: grep -q "search_term" config.yaml && echo "Found" || echo "Not found"

or just check exit code

Bash: grep -q "search_term" config.yaml # Exit 0 if found, 1 if not

Sort File Contents

❌ DON'T: Read, sort in memory, write

Read: unsorted.txt
Write: sorted.txt (with sorted content)

✅ DO: Use sort command

Bash: sort unsorted.txt > sorted.txt
Bash: sort -u unsorted.txt > sorted_unique.txt    # Unique sorted
Bash: sort -n numbers.txt > sorted_numbers.txt    # Numeric sort

Remove Duplicate Lines

❌ DON'T: Read and deduplicate manually

Read: file_with_dupes.txt
Write: file_no_dupes.txt

✅ DO: Use sort -u or uniq

Bash: sort -u file_with_dupes.txt > file_no_dupes.txt

or preserve order

Bash: awk '!seen[$0]++' file_with_dupes.txt > file_no_dupes.txt

Find and Replace Across Multiple Files

❌ DON'T: Read each file, edit, write back

Read: file1.py
Edit: file1.py (replace text)
Read: file2.py
Edit: file2.py (replace text)

... repeat for many files

✅ DO: Use sed with find or loop

Bash: find . -name "*.py" -exec sed -i '' 's/old_text/new_text/g' {} +

or

Bash: for f in *.py; do sed -i '' 's/old_text/new_text/g' "$f"; done
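To avoid rewriting (and touching the timestamps of) files that don't contain the pattern, one option is to pre-filter with grep -l before running sed. This is a sketch using the cross-platform -i.bak form described above:

```bash
grep -rl "old_text" --include="*.py" . |
while IFS= read -r f; do
    sed -i.bak 's/old_text/new_text/g' "$f" && rm "$f.bak"
done
```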

Create File with Template Content

❌ DON'T: Use Write tool for static content

Write: template.txt (with multi-line template)

✅ DO: Use heredoc or echo

Bash: cat > template.txt << 'EOF'
Multi-line
template content
EOF

or for simple content

Bash: echo "Single line content" > file.txt

When to Break These Rules

Still use Read/Edit/Write when:

Complex logic required: Conditional edits based on file structure
Code-aware changes: Editing within functions, preserving indentation
Validation needed: Need to verify content before changing
Interactive review: User needs to see content before approving changes
Multi-step analysis: Need to understand code structure first

Example where Read/Edit is better:

Changing function signature requires understanding context

Read: module.py
Edit: module.py (update specific function while preserving structure)

Example where bash is better:

Simple text replacement

Bash: sed -i '' 's/old_api_url/new_api_url/g' config.py

Token Savings Examples

Example 1: Update 10 config files

Wasteful approach:

Read: config1.yaml   # 5K tokens
Edit: config1.yaml
Read: config2.yaml   # 5K tokens
Edit: config2.yaml

... repeat 10 times = 50K tokens

Efficient approach:

Bash: for f in config*.yaml; do sed -i '' 's/old/new/g' "$f"; done

Token cost: ~100 tokens for command, 0 for file content

Savings: 49,900 tokens (99.8%)

Example 2: Copy configuration

Wasteful approach:

Read: template_config.yaml    # 10K tokens
Write: project_config.yaml    # 10K tokens

Total: 20K tokens

Efficient approach:

Bash: cp template_config.yaml project_config.yaml

Token cost: ~50 tokens

Savings: 19,950 tokens (99.75%)

Example 3: Append log entry

Wasteful approach:

Read: application.log    # 50K tokens (large file)
Write: application.log   # 50K tokens

Total: 100K tokens

Efficient approach:

Bash: echo "[$(date)] Log entry" >> application.log

Token cost: ~50 tokens

Savings: 99,950 tokens (99.95%)

Find CSV Column Indices

❌ DON'T: Read entire CSV file to find column numbers

Read: large_table.csv (100+ columns, thousands of rows)

Then manually count columns

✅ DO: Extract and number header row

Bash: head -1 file.csv | tr ',' '\n' | nl

✅ DO: Find specific columns by pattern

Bash: head -1 VGP-table.csv | tr ',' '\n' | nl | grep -i "chrom"

Output shows column numbers and names:

54 num_chromosomes

106 total_number_of_chromosomes

122 num_chromosomes_haploid

How it works:

head -1: Get the header row only
tr ',' '\n': Convert comma-separated values to one per line
nl: Number the lines (gives the column index)
grep -i: Filter by pattern (case-insensitive)

Use case: Quickly identify which columns contain needed data in wide tables (100+ columns).

Token savings: 100% of file content - Only see column headers, not data rows.
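Wrapped as a tiny helper, the same pipeline can be reused across tables (a sketch; it assumes a plain comma-delimited header with no embedded commas):

```bash
csv_cols() {
    # Usage: csv_cols file.csv [pattern]
    head -1 "$1" | tr ',' '\n' | nl | grep -i "${2:-.}"
}

csv_cols VGP-table.csv chrom
```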

Python Data Filtering Pattern

✅ Create separate filtered files rather than overwriting

Read original

import csv

species_data = []
with open('data.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['accession'] and row['chromosome_count']:  # Filter criteria
            species_data.append(row)

Write to NEW file with descriptive suffix

output_file = 'data_filtered.csv'  # Not 'data.csv'
with open(output_file, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(species_data)

Benefits:

Preserves original data for comparison
Clear naming indicates filtering was applied
Can generate multiple filtered versions
Easier to debug and verify filtering logic

Handling Shell Aliases in Python Scripts

Problem: Python's subprocess.run() doesn't expand shell aliases.

❌ FAILS if 'datasets' is an alias

subprocess.run(['datasets', 'summary', ...])

Error: [Errno 2] No such file or directory: 'datasets'

Solution: Use full path to executable

Find full path

type -a datasets

Output: datasets is an alias for ~/Workdir/ncbi_tests/datasets

echo ~/Workdir/ncbi_tests/datasets # Expand ~

Output: /Users/delphine/Workdir/ncbi_tests/datasets

Use full path in script

datasets_cmd = '/Users/delphine/Workdir/ncbi_tests/datasets'
subprocess.run([datasets_cmd, 'summary', ...])

Alternative: Use shell=True (but avoid for security reasons with user input)
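A hedged shell-side alternative is to resolve the executable once and hand the result to the script, rather than hard-coding a user-specific path. command -v only finds real executables on PATH (it returns nothing for a pure alias, in which case the type -a lookup above is still needed); the script name below is hypothetical.

```bash
# Resolve a real executable on PATH; empty if 'datasets' is only an alias
DATASETS_CMD=$(command -v datasets || true)
echo "datasets resolves to: ${DATASETS_CMD:-<alias or not found>}"

# Pass the path via the environment instead of editing the Python code
DATASETS_CMD="$DATASETS_CMD" python fetch_summaries.py   # hypothetical script
```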

Key Principle for File Operations

Ask yourself first:

Can this be done with cp, mv, sed, awk, grep?
Is the change purely textual (not logic-dependent)?
Do I need to see the file content, or just modify it?

If answers are YES, YES, NO → Use bash commands, not Read/Edit/Write

7. Filter Command Output

For commands that produce large output:

❌ DON'T: Capture all output

Bash: find / -name "*.py" # Could return 10,000+ files

✅ DO: Limit or filter output

Bash: find /specific/path -name "*.py" | head -50
Bash: find . -name "*.py" -type f | wc -l                     # Count first
Bash: find . -name "*.py" -type f | grep "test" | head -20    # Filter

❌ DON'T: Run verbose commands without filtering

Bash: ls -laR / # Recursive listing of entire filesystem!

✅ DO: Limit scope and depth

Bash: ls -la
Bash: find . -maxdepth 2 -type f
Bash: tree -L 2   # Limit tree depth

8. Summarize, Don't Dump

When explaining command output or file contents:

User: "What's in this directory?"

❌ BAD RESPONSE: [Paste entire 5K token ls -la output with 500 files]

✅ GOOD RESPONSE: "This directory contains 487 files. Key items:
- 235 Python files (*.py)
- 142 test files in tests/
- 89 config files (*.yaml, *.json)
- Main entry point: main.py
- Documentation in docs/

Would you like to see specific files or file types?"
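That kind of summary can be produced directly from the shell without pasting a full listing (a sketch; it counts files by extension under the current directory):

```bash
echo "Total files: $(find . -type f | wc -l)"
find . -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -10
```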

For code:

User: "What does this script do?"

❌ BAD: [Read entire 500-line file, paste all code]

✅ GOOD:
1. Read: script.py (limit: 50)   # Just the header/imports
2. Grep: "^def " script.py       # List all functions
3. Summarize: "This script has 5 main functions:
   - parse_args(): Command-line argument parsing
   - load_data(): Reads CSV files
   - process_data(): Applies transformations
   - validate_output(): Checks results
   - main(): Orchestrates the workflow

Would you like details on any specific function?"

9. Use Head/Tail for Large Output

When commands produce large output:

✅ Limit output length

Bash: cat large_file.txt | head -100
Bash: cat large_file.txt | tail -100
Bash: docker logs container_name | tail -50

✅ Sample from middle

Bash: cat large_file.txt | head -500 | tail -100 # Lines 400-500

✅ Check size before reading

Bash: wc -l file.txt

If > 1000 lines, use head/tail

10. Use JSON/Data Tools Efficiently

For JSON, YAML, XML files:

❌ DON'T: Read entire file

Read: large_config.json # Could be 50K tokens

✅ DO: Extract specific fields

Bash: cat large_config.json | jq '.metadata'
Bash: cat large_config.json | jq 'keys'      # Just see top-level keys
Bash: cat config.yaml | yq '.database.host'

For XML

Bash: xmllint --xpath '//database/host' config.xml

For CSV files:

❌ DON'T: Read entire CSV

Read: large_data.csv # Could be millions of rows

✅ DO: Sample and analyze

Bash: head -20 large_data.csv    # See header and sample rows
Bash: wc -l large_data.csv       # Count rows
Bash: csvstat large_data.csv     # Get statistics (if csvkit is installed)

11. Optimize Code Reading

For understanding codebases:

✅ STEP 1: Get overview

Bash: find . -name "*.py" | head -20                   # List files
Bash: grep -r "^class " --include="*.py" | head -20    # List classes
Bash: grep -r "^def " --include="*.py" | wc -l         # Count functions

✅ STEP 2: Read structure only

Read: main.py (limit: 100) # Just imports and main structure

✅ STEP 3: Search for specific code

Grep: "class MyClass" src/

✅ STEP 4: Read only relevant sections

Read: src/mymodule.py (offset: 150, limit: 50) # Just the relevant class

❌ DON'T: Read entire files sequentially

Read: file1.py   # 30K tokens
Read: file2.py   # 30K tokens
Read: file3.py   # 30K tokens

12. Use Task Tool for Exploratory Searches

When exploring a codebase to understand patterns or find information (not needle queries for specific files):

❌ Inefficient approach (many tool calls, large context):

Direct grep through many files

Grep(pattern="some_pattern", path=".", output_mode="content")

Followed by multiple Read calls to understand context

Read("file1.py") Read("file2.py")

Followed by more Grep calls for related patterns

Grep(pattern="related_pattern", path=".", output_mode="content")

Results in dozens of tool calls and accumulating context

✅ Efficient approach (single consolidated response):

Use Task tool with Explore subagent

Task(
    subagent_type="Explore",
    description="Research how Galaxy API works",
    prompt="""Explore the codebase to understand how Galaxy API calls are made.
    I need to know:
    - Which files contain API call patterns
    - How authentication is handled
    - Common error handling patterns
    Return a summary with file locations and key patterns."""
)

When to use Task/Explore:

"How does X work in this codebase?" "Where are errors from Y handled?" "What is the structure of Z?" Searching for patterns across multiple files Need context from multiple locations Exploring unfamiliar codebases

When to use direct tools instead:

"Read file at specific path X" → Use Read "Find class definition Foo" → Use Glob("**/foo.py") or Grep("class Foo") "Search for specific string in file X" → Use Grep(pattern, path="file.py") You know exactly which file to check

Token savings:

Task tool: ~5-10K tokens for a consolidated response
Direct exploration: ~30-50K tokens (many tool calls + context accumulation)
Savings: 70-80% for exploratory searches

Example comparison:

❌ Inefficient: Exploring workflow patterns manually

Grep("workflow", output_mode="content") # 15K tokens Read("workflow1.py") # 20K tokens Read("workflow2.py") # 18K tokens Grep("error handling", output_mode="content") # 12K tokens

Total: ~65K tokens

✅ Efficient: Using Task tool

Task(
    subagent_type="Explore",
    description="Understand workflow error handling",
    prompt="Explore how workflows handle errors. Return patterns and file locations."
)

Total: ~8K tokens (single consolidated response)

Savings: 88%

13. Efficient Scientific Literature Searches

When searching for data across multiple species (karyotypes, traits, etc.):

❌ Inefficient: Sequential searches

for species in species_list:
    search(species)   # One at a time

✅ Efficient: Parallel searches in batches

Make 5 searches simultaneously

WebSearch("species1 karyotype") WebSearch("species2 karyotype") WebSearch("species3 karyotype") WebSearch("species4 karyotype") WebSearch("species5 karyotype")

Benefits:

5x faster for the user
Same token usage per search
Better user experience
Allows quick progress saves before session limits

Best practices:

Batch 3-5 related searches together
Group by taxonomy or data type
Save results immediately after each batch
Document "not found" species to avoid re-searching

Dealing with Session Interruptions

When user warns about daily limits:

1. Immediately save progress:
   - Write findings to a file
   - Update the CSV/database with confirmed data
   - Create a detailed progress document

2. Document search status:
   - Which species were searched
   - Which were confirmed / not found
   - Which remain to search
   - Next steps in priority order

3. Create a resume file with:
   - Current totals
   - Completed work
   - Pending tasks with priorities
   - Recommendations for the next session

Example: PROGRESS_YYYYMMDD.md file with clear resumption instructions
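A sketch of generating such a resume file from the shell; the section headings are illustrative, and the resume flag for comprehensive_search.py is hypothetical:

```bash
cat > "PROGRESS_$(date +%Y%m%d).md" << 'EOF'
# Session progress

## Completed
- Species searched: ...
- Confirmed / not found: ...

## Pending (priority order)
1. ...

## How to resume
- Re-run: python comprehensive_search.py --resume   # hypothetical flag
EOF
```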

Search Term Iteration

When initial searches fail, refine systematically:

First try: Specific scientific terms

"Anas acuta karyotype 2n"

Second try: Common name + scientific

"northern pintail Anas acuta chromosome number"

Third try: Genus-level patterns

"Anas genus karyotype waterfowl"

Fourth try: Family-level studies

"Anatidae chromosome evolution cytogenetics"

Don't: Keep searching the same terms repeatedly
Do: Escalate to higher taxonomic levels or comparative studies

Token Savings Examples

Example 1: Status Check

Scenario: User asks "What's the status of my application?"

❌ Wasteful approach (50K tokens):

Read: /var/log/app.log         # 40K tokens
Bash: systemctl status myapp   # 10K tokens

✅ Efficient approach (3K tokens):

Bash: systemctl status myapp --no-pager | head -20   # 1K tokens
Bash: tail -50 /var/log/app.log                      # 2K tokens

Savings: 94%

Example 2: Debugging Errors

Scenario: User says "My script is failing, help debug"

❌ Wasteful approach (200K tokens):

Read: debug.log     # 150K tokens
Read: script.py     # 30K tokens
Read: config.json   # 20K tokens

✅ Efficient approach (8K tokens):

Bash: tail -100 debug.log                               # 3K tokens
Bash: grep -iE "error|traceback" debug.log | tail -50   # 2K tokens
Grep: "def main" script.py                              # 1K tokens
Read: script.py (offset: 120, limit: 50)                # 2K tokens (just the failing function)

Savings: 96%

Example 3: Code Review

Scenario: User asks "Review this codebase"

❌ Wasteful approach (500K tokens):

Read: file1.py
Read: file2.py
Read: file3.py
Read: file4.py

... reads 20+ files

✅ Efficient approach (20K tokens):

Bash: find . -name "*.py" | head -30                    # 1K
Bash: cloc .                                            # Lines-of-code summary - 1K
Bash: grep -r "^class " --include="*.py" | head -20     # 2K
Bash: grep -r "^def " --include="*.py" | wc -l          # 1K
Read: main.py (limit: 100)                              # 3K
Read: README.md                                         # 5K
Grep: "TODO|FIXME|XXX" -r .                             # 2K

Then ask user what specific areas to review

Savings: 96%

When to Override These Guidelines

Override efficiency rules when:

User explicitly requests full output:

"Show me the entire log file" "Read the full source code" "I don't care about token cost"

Filtered output lacks necessary context:

Error message references line numbers not in the filtered output
Need to understand the full data flow
Debugging requires seeing the complete state

File is known to be small:

File is < 200 lines
Config files with minimal content
Small documentation files

Learning code structure and architecture (IMPORTANT):

User is exploring a new codebase to understand its organization
Learning coding patterns, idioms, or best practices from existing code
Understanding how modules/classes are structured
Studying implementation approaches for educational purposes
Reading example code or reference implementations
Initial exploration phase before making changes

Key indicators for learning mode:

User says: "help me understand this codebase", "how does X work?", "show me how this is implemented"
User asks conceptual questions: "what patterns are used?", "how is this organized?"
User wants to learn from the code, not just debug or modify it
User is new to the project or technology

In learning mode:

✅ DO: Read full files to show complete patterns and structure
✅ DO: Read multiple related files to show how components interact
✅ DO: Show full function/class implementations as examples
✅ DO: Explain code in detail with context

⚠️ BALANCE: Still use strategic efficiency (don't read 50 files at once)
- Apply strategic file selection (see section below)
- Read 2-5 key files fully to establish understanding
- Use grep to find other relevant examples
- Summarize patterns found across many files

After learning phase, return to efficient mode for implementation.

In cases 1-3, explain to the user:

"This will use approximately [X]K tokens. Should I proceed? Or would you prefer a filtered/summarized view first?"

In learning mode (case 4), prioritize understanding over token efficiency, but still be strategic about which files to read fully (see Strategic File Selection below).

Strategic File Selection for Learning Mode

When entering learning mode, first determine if this is broad exploration or targeted learning, then apply the appropriate strategy.

Learning Mode Types

Type 1: Broad Exploration - "Help me understand this codebase", "How is this organized?" → Use repository-based strategies below (identify type, read key files)

Type 2: Targeted Pattern Learning - "How do I implement X?", "Show me examples of Y" → Use targeted concept search (see Targeted Pattern Learning section below)

Targeted Pattern Learning

When user asks about a specific technique or pattern, use this focused approach instead of broad exploration.

Examples of Targeted Learning Queries

"How do variable number of outputs work in Galaxy wrappers?"
"Show me how to fetch invocation data from the Galaxy API"
"How do I implement conditional parameters in Galaxy tools?"
"How does error handling work in this codebase?"
"Show me examples of async function patterns"
"How are tests structured for workflow X?"

Targeted Learning Workflow

STEP 1: Identify the Specific Concept

Extract the key concept from user's question:

User: "How do variable number of outputs work in Galaxy wrappers?" → Concept: "variable number of outputs" OR "dynamic outputs" → Context: "Galaxy tool wrappers" → File types: ".xml" (Galaxy tool wrappers)

User: "How to fetch invocation data from Galaxy API?" → Concept: "fetch invocation" OR "invocation data" OR "get invocation" → Context: "Galaxy API calls" → File types: ".py" with Galaxy API usage

STEP 2: Search for Examples

Use targeted searches to find relevant code:

```bash
# For Galaxy variable outputs example
grep -rE "discover_datasets|collection_type.list" --include="*.xml" . | head -20
grep -r "" --include="*.xml" -A 10 . | grep -iE "collection|discover"

# For Galaxy invocation fetching
grep -r "invocation" --include="*.py" -B 2 -A 5 . | head -50
grep -rlE "show_invocation|get_invocation" --include="*.py" .

# For conditional parameters
grep -rl "<conditional" --include="*.xml" . | head -10

# For error handling patterns
grep -rlE "try:|except|raise" --include="*.py" . | xargs grep -l "class.*Error"
```

STEP 3: Rank and Select Examples

Selection criteria (in priority order):

Documentation/Comments - Files with good comments explaining the pattern

Find well-documented examples

grep -r "pattern-keyword" --include=".py" -B 5 | grep -E "^\s#|^\s*\"\"\"" | wc -l

Simplicity - Simpler examples are better for learning

Find shorter files (likely simpler)

grep -rl "pattern-keyword" --include="*.py" | xargs wc -l | sort -n | head -5

Recency - Recent code shows current best practices

Find recent examples

grep -rl "pattern-keyword" --include="*.py" | xargs ls -lt | head -5

Multiple variations - Show different approaches if they exist

Compare different implementations

grep -r "pattern-keyword" --include="*.py" -l | head -3

STEP 4: Read Examples Fully

Read 2-3 selected examples completely to understand the pattern:

Example: Variable outputs in Galaxy

After finding: tools/tool1.xml, tools/tool2.xml, tools/advanced.xml

Read: tools/tool1.xml      # Simple example
Read: tools/tool2.xml      # Standard example
Read: tools/advanced.xml   # Complex variation (if needed)

STEP 5: Extract and Explain the Pattern

After reading examples, explain:

The core pattern - How it works conceptually
Required elements - What's needed to implement it
Common variations - Different ways to use it
Common pitfalls - What to avoid
Best practices - Recommended approach

Targeted Learning Examples

Example 1: Variable Number of Outputs in Galaxy Wrappers

User query: "How do variable number of outputs work in Galaxy wrappers?"

Execution:

Step 1: Identify concept

Concept: dynamic outputs, discover_datasets

File type: Galaxy XML wrappers (*.xml)

Step 2: Search for examples

grep -r "discover_datasets" --include="*.xml" -l

Output: tools/samtools/samtools_merge.xml, tools/kraken2/kraken2.xml, ...

Step 3: Rank examples

ls -lt tools/samtools/samtools_merge.xml tools/kraken2/kraken2.xml

Pick: samtools_merge.xml (recent, well-maintained)

Step 4: Read example fully

Read: tools/samtools/samtools_merge.xml

Step 5: Find another variation

grep -r "collection_type.list" --include=".xml" -l | head -1 Read: tools/example/collection_output.xml

Explanation provided:

I found two main approaches for variable outputs in Galaxy:

1. discover_datasets pattern (samtools_merge.xml:45-52):
   - Use discover_datasets in the output section
   - Specify the pattern and format
   - Galaxy auto-detects files matching the pattern

2. output_collection pattern (collection_output.xml:78-85):
   - Define the collection type (list, paired, etc.)
   - Use discover_datasets within the collection
   - For grouped/structured outputs

Key requirements:
- Set the discover_datasets pattern attribute
- Specify the format for discovered files
- Optional: use the directory attribute for subdirectories

Would you like me to show a specific implementation for your use case?

Example 2: Fetching Invocation Data from Galaxy API

User query: "How do I fetch invocation data using the Galaxy API?"

Execution:

Step 1: Identify concept

Concept: fetch invocation, show_invocation, invocation data

Context: Galaxy API, bioblend usage

File type: Python files with bioblend/API calls

Step 2: Search for examples

grep -r "show_invocation|get_invocation" --include="*.py" -l

Output: galaxy_client.py, orchestrator.py, check_status.py

grep -r "invocations.show|.show_invocation" --include="*.py" -B 3 -A 8

Shows actual usage patterns

Step 3: Rank examples

galaxy_client.py - likely has core API wrapper functions

orchestrator.py - shows usage in context

Pick both for different perspectives

Step 4: Read examples

Read: batch_vgp_run/galaxy_client.py

Focus on invocation-related functions

grep -n "def.*invocation" batch_vgp_run/galaxy_client.py

Shows: check_invocation_complete (line 250), rerun_failed_invocation (line 847)

Read: batch_vgp_run/galaxy_client.py (offset: 245, limit: 60)
Read: batch_vgp_run/galaxy_client.py (offset: 840, limit: 70)

Explanation provided:

I found the pattern for fetching invocation data (galaxy_client.py:250-285):

Basic invocation fetch:

```python
invocation = gi.invocations.show_invocation(invocation_id)
state = invocation['state']  # 'ok', 'running', 'failed', 'cancelled'
```

Detailed invocation with steps:

```python
invocation = gi.invocations.show_invocation(invocation_id, include_workflow_steps=True)
steps = invocation.get('steps', {})
for step_id, step_data in steps.items():
    step_state = step_data['state']
    job_id = step_data.get('job_id')
```

Key fields available:

invocation['state'] - Overall invocation status
invocation['steps'] - Individual step details
invocation['workflow_id'] - Associated workflow
invocation['history_id'] - Target history
invocation['update_time'] - Last update timestamp

Common patterns:

Poll for completion (galaxy_client.py:250-285)
Get failed step details (galaxy_client.py:320-340)
Rerun with modifications (galaxy_client.py:847-899)

Which pattern do you need for your use case?


Example 3: Conditional Parameters in Galaxy Tools

User query: "How do conditional parameters work in Galaxy tool wrappers?"

Execution:

Step 1: Identify concept

Concept: conditional parameters, when expression
File type: Galaxy XML wrappers

Step 2: Search

grep -rl "<conditional" --include="*.xml" . | head -10
grep -r "<conditional" --include="*.xml" -A 15 . | head -50

Step 3: Find a simple example first

grep -rl "<conditional" --include="*.xml" . | xargs wc -l | sort -n | head -3
Pick the shortest file with conditionals

Step 4: Read examples

Read: tools/simple-tool/simple_conditional.xml      # Simple case
Read: tools/complex-tool/advanced_conditional.xml   # Nested case

Explanation provided:

Conditional parameters in Galaxy (simple_conditional.xml:34-58):

Basic structure (the surrounding conditional and its select parameter are reconstructed from the command block below):

```xml
<conditional name="output_choice">
    <param name="output_type" type="select" label="Output type">
        <option value="single">Single file</option>
        <option value="collection">Collection</option>
    </param>
    <when value="single">
        <param name="format" type="select" label="Format">
            <option value="txt">Text</option>
            <option value="csv">CSV</option>
        </param>
    </when>
    <when value="collection">
        <param name="collection_type" type="select" label="Collection type">
            <option value="list">List</option>
            <option value="paired">Paired</option>
        </param>
    </when>
</conditional>
```

In the command block (Cheetah syntax):

#if $output_choice.output_type == "single":
    --format ${output_choice.format}
#else:
    --collection-type ${output_choice.collection_type}
#end if

Advanced: Nested conditionals (advanced_conditional.xml:67-120):

Conditionals can contain other conditionals
Each is independent
Access nested values: ${outer.inner.value}

Would you like to see nested examples or specific use case?


When to Use Targeted Learning

Use targeted learning when the user:
- ✅ Asks "how do I..." about a specific feature
- ✅ Requests "show me examples of X"
- ✅ Wants to learn a specific pattern/technique
- ✅ Has a focused technical question
- ✅ References a specific concept/API/feature

Don't use it for:
- ❌ "Understand this codebase" (use broad exploration)
- ❌ "What does this project do?" (use documentation reading)
- ❌ "Debug this error" (use debugging mode, not learning mode)


Key Principles for Targeted Learning

1. Search first, read second
   - Use grep to find relevant examples
   - Rank by quality/simplicity/recency
   - Then read selected examples fully

2. Read 2-3 examples, not 20
   - A simple example (minimal working code)
   - A standard example (common usage)
   - A complex example (advanced features) - optional

3. Extract the pattern
   - Don't just show code, explain the pattern
   - Highlight key elements and structure
   - Show variations and alternatives

4. Provide context
   - Where this pattern is used
   - When to use it vs. alternatives
   - Common pitfalls and best practices

5. Confirm understanding
   - Ask if the user needs a specific variation
   - Offer to show related patterns
   - Check whether the explanation answered their question

General Exploration vs Targeted Learning

When user says → Use this approach:

| User Request | Approach | Strategy |
|---|---|---|
| "Help me understand this codebase" | General Exploration | Identify repo type → Read key files |
| "How is this project organized?" | General Exploration | Read docs → Entry points → Architecture |
| "Show me how to implement X" | Targeted Learning | Search for X → Read examples → Extract pattern |
| "How does feature Y work?" | Targeted Learning | Grep for Y → Find best examples → Explain |
| "What patterns are used here?" | General Exploration | Read core files → Identify patterns |
| "How do I use API method Z?" | Targeted Learning | Search for Z usage → Show examples |

Broad Repository Exploration

When entering broad exploration mode, first identify the repository context, then apply the appropriate exploration strategy.

STEP 1: Identify Repository Type

Ask these questions or check indicators:

```bash

Check for multiple independent tools/packages

ls -d */ | wc -l                          # Many directories at root level?
ls recipes/ tools/ packages/ 2>/dev/null  # Collection structure?

Check for submission/contribution guidelines

ls -la | grep -iE "contrib|guideline|submiss"
cat CONTRIBUTING.md README.md 2>/dev/null | grep -iE "structure|organization|layout"

Check for monolithic vs modular structure

find . -name "setup.py" -o -name "package.json" -o -name "Cargo.toml" | wc -l

1 = monolithic, many = multi-package

Check for specific patterns

ls -la | grep -E "recipes/|tools/|workflows/|plugins/|examples/"

Repository type indicators:

Tool Library / Recipe Collection (bioconda, tool collections)
- Multiple independent directories at the same level
- Each subdirectory is self-contained
- Examples: recipes/tool1/, recipes/tool2/, workflows/workflow-a/
- Indicator files: recipes/, tools/, packages/, multiple meta.yaml or package.json

Monolithic Application (single integrated codebase)
- One main entry point
- Hierarchical module structure
- Shared dependencies and utilities
- Examples: src/, lib/, single setup.py, main.py
- Indicator files: single setup.py, main.py, __init__.py, src/ directory

Framework / SDK (extensible system)
- Core framework + plugins/extensions
- Base classes and interfaces
- Examples: core/, plugins/, extensions/, base/
- Indicator files: core/, plugins/, documentation on extending

Example / Template Repository
- Multiple example implementations
- Each directory shows a different pattern
- Examples: examples/, samples/, templates/
- Indicator files: examples/, a README in each subdirectory

STEP 2: Apply a Context-Specific Strategy

Strategy A: Tool Library / Recipe Collection

Goal: Learn the pattern from representative examples

Approach:

1. Find most recently modified (shows current best practices)

ls -lt recipes/ | head -10 # or tools/, workflows/, etc.

2. Find most common patterns

find recipes/ -name "meta.yaml" -o -name "*.xml" | head -1 | xargs dirname

3. Read submission guidelines first

cat CONTRIBUTING.md README.md | grep -A 20 -iE "structure|format|template"

4. Read 2-3 representative examples

Pick: 1 recent, 1 complex, 1 simple

ls -lt recipes/ | head -3

Files to read (in order):

CONTRIBUTING.md or submission guidelines → Learn the required structure
A recent tool/recipe → Current best practices
A well-established tool/recipe → Proven patterns
A template or example → Base structure

Example:

For bioconda-style repository

Read: CONTRIBUTING.md
ls -lt recipes/ | head -5                 # Pick a recent one
Read: recipes/recent-tool/meta.yaml
Read: recipes/established-tool/meta.yaml  # Compare patterns

Strategy B: Monolithic Application

Goal: Understand execution flow and architecture

Approach:

1. Find entry point

find . -name "main.py" -o -name "app.py" -o -name "run*.py" | grep -v test | head -5

2. Find most imported modules (core components)

grep -r "^import|^from" --include=".py" . | \ sed 's/.import //' | cut -d' ' -f1 | cut -d'.' -f1 | \ sort | uniq -c | sort -rn | head -10

3. Find orchestrators/managers

find . -name "manager.py" -o -name "orchestrator.py" -o -name "*controller.py"

4. Check recent changes (active development areas)

git log --name-only --pretty=format: --since="1 month ago" | \ sort | uniq -c | sort -rn | head -10

Files to read (in order):

README.md → Overview and architecture
Entry point (main.py, run_all.py) → Execution flow
Core orchestrator/manager → Main logic
Most-imported utility module → Common patterns
One domain-specific module → Implementation details

Example:

For Python application

Read: README.md
Read: main.py                                # Entry point
grep -r "^from.*import" main.py | head -10   # See what it imports
Read: src/orchestrator.py                    # Core component
Read: src/utils.py                           # Common utilities

Strategy C: Framework / SDK

Goal: Understand core abstractions and extension points

Approach:

1. Find base classes and interfaces

grep -r "^class.Base|^class.Interface|^class.Abstract" --include=".py" | head -10

2. Find core module

ls -la | grep -E "core/|base/|framework/"

3. Find plugin/extension examples

ls -la | grep -E "plugins?/|extensions?/|examples?/"

4. Check documentation for architecture

find . -name "*.md" | xargs grep -l -i "architecture|design|pattern" | head -5

Files to read (in order):

Architecture documentation → Design philosophy
Base/core classes → Fundamental abstractions
A simple plugin/extension → How to extend
A complex plugin/extension → Advanced patterns

Example:

For plugin-based framework

Read: docs/architecture.md
Read: core/base.py                # Base classes
Read: plugins/simple-example/     # How to extend
Read: plugins/advanced-example/   # Advanced usage

Strategy D: Example / Template Repository

Goal: Learn different patterns and use cases

Approach:

1. List all examples

ls -d examples/*/ samples/*/ templates/*/

2. Read index/catalog if available

cat examples/README.md examples/INDEX.md

3. Pick representative examples

- Simple/basic example

- Medium complexity

- Advanced/complete example

Files to read (in order):

examples/README.md → Overview of the examples
Basic example → Minimal working pattern
Advanced example → Full-featured pattern
Compare differences → Learn the progression

STEP 3: Execution Strategy Template

For ANY repository type, use this workflow:

PHASE 1: Context Discovery (always token-efficient)

ls -la                                      # Repository structure
cat README.md                               # Overview
ls -la .github/ docs/ | head -20            # Find documentation
cat CONTRIBUTING.md 2>/dev/null | head -50  # Submission guidelines

PHASE 2: Identify Type (ask user if unclear)

"I see this repository has [X structure]. Is this: A) A tool library where each tool is independent? B) A monolithic application with integrated components? C) A framework with core + plugins? D) A collection of examples/templates?

This helps me choose the best files to learn from."

PHASE 3: Strategic Reading (based on type)

[Apply the appropriate strategy A/B/C/D from above]
Read 2-5 key files fully
Grep for patterns across the remaining files

PHASE 4: Summarize and Confirm

"Based on [files read], I understand: - Pattern/architecture: [summary] - Key components: [list] - Common patterns: [examples]

Is this the area you want to focus on, or should I explore [other aspect]?"

File Selection Priorities (General Rules)

Priority 1: Documentation

README.md, CONTRIBUTING.md, docs/architecture.md

These explain intent, not just implementation

Priority 2: Entry Points

Monolithic: main.py, app.py, run.py, __main__.py

Library: Most recent example in collection

Priority 3: Core Components

Most imported modules

grep -r "import" | cut -d: -f2 | sort | uniq -c | sort -rn

"Manager", "Controller", "Orchestrator", "Core", "Base"

find . -name "manager" -o -name "core" -o -name "base"

Priority 4: Representative Examples

Recent files (current best practices)

ls -lt directory/ | head -5

Medium complexity (not too simple, not too complex)

wc -l */*.py | sort -n | awk 'NR > 10 && NR < 20'

Priority 5: Active Development Areas

Git history (if available)

git log --name-only --since="1 month ago" --pretty=format: | sort | uniq -c | sort -rn

Practical Examples

Example 1: Learning bioconda recipe patterns

Step 1: Identify type

ls recipes/ | wc -l

Output: 3000+ → Tool library

Step 2: Check guidelines

Read: CONTRIBUTING.md # Learn structure requirements

Step 3: Find representative recipes

ls -lt recipes/ | head -5 # Get recent ones

Pick one that was updated recently (current practices)

Read: recipes/recent-tool/meta.yaml

Pick one established recipe for comparison

Read: recipes/samtools/meta.yaml

Step 4: Summarize pattern

"I see bioconda recipes follow this structure: - Jinja2 variables at top - package/source/build/requirements/test/about sections - Current practice: use pip install for Python packages - sha256 checksums required Should I look at any specific type of recipe (Python/R/compiled)?"

Example 2: Learning VGP pipeline orchestration

Step 1: Identify type

ls *.py

Output: run_all.py, orchestrator.py → Monolithic application

Step 2: Read entry point

Read: run_all.py

Step 3: Find core components

grep "^from batch_vgp_run import" run_all.py

Shows: orchestrator, galaxy_client, workflow_manager

Step 4: Read core orchestrator

Read: batch_vgp_run/orchestrator.py # Full file to understand flow

Step 5: Read supporting modules selectively

grep "def run_species_workflows" batch_vgp_run/orchestrator.py -A 5 Read: batch_vgp_run/galaxy_client.py # Key helper functions

Example 3: Learning Galaxy workflow patterns

Step 1: Identify type

ls -d */ # Shows category directories

Output: transcriptomics/, genome-assembly/, etc. → Example collection

Step 2: Read guidelines

Read: .github/CONTRIBUTING.md

Step 3: Pick representative workflows

ls -lt transcriptomics/   # Recent workflows
Read: transcriptomics/recent-workflow/workflow.ga
Read: transcriptomics/recent-workflow/README.md

Step 4: Compare with another category

Read: genome-assembly/example-workflow/workflow.ga

Step 5: Extract common patterns

grep -r "\"format-version\"" . | head -5 grep -r "\"creator\"" . | head -5

Key Principle for Learning Mode

Balance understanding with efficiency:

✅ Read 2-5 strategic files fully (based on context)
✅ Use grep/head/tail for pattern discovery across many files
✅ Ask the user which aspect to focus on after the initial exploration
✅ Summarize findings before reading more

Don't:

❌ Read 20+ files sequentially without a strategy
❌ Read files without understanding their role
❌ Ignore repository context and documentation

Quick Reference Card

Model Selection (First Priority):

🎓 Learning/Understanding → Use Opus
🔧 Development/Debugging/Implementation → Use Sonnet (default)

Before ANY file operation, ask yourself:

Can I use bash commands instead? (cp, sed, awk, grep) → 99%+ token savings
Is this a simple text operation? → Use sed/awk, not Read/Edit
Am I copying/merging files? → Use cp/cat, not Read/Write
Can I check metadata first? (file size, line count, modification time)
Can I filter before reading? (grep, head, tail)
Can I read just the structure? (first 50 lines, function names)
Can I summarize instead of showing raw data?
Does the user really need the full content?

Default strategy for file operations:

FIRST: Try bash commands

cp source.txt dest.txt                   # Instead of Read + Write
sed -i '' 's/old/new/g' file.txt         # Instead of Read + Edit
cat file1.txt file2.txt > combined.txt   # Instead of Read + Read + Write
echo "text" >> file.txt                  # Instead of Read + Write (append)

ONLY IF NEEDED: Read files

wc -l file.txt                       # Check size first
head -20 file.txt                    # Read a sample
grep "pattern" file.txt | head -50   # Filter before reading

LAST RESORT: Full file read

Only when you need to understand code structure or complex logic

Cost Impact

Conservative estimate for typical usage:

| Approach | Tokens/Week | Claude Pro | Claude Team | Notes |
|---|---|---|---|---|
| Wasteful (Read/Edit/Write everything) | 500K | ⚠️ At risk of limits | ✅ OK | Reading files unnecessarily |
| Moderate (filtered reads only) | 200K | ✅ Comfortable | ✅ Very comfortable | Grep/head/tail usage |
| Efficient (bash commands + filters) | 30-50K | ✅ Very comfortable | ✅ Excellent | Using cp/sed/awk instead of Read |

Applying these rules reduces costs by 90-95% on average.

Bash commands optimization alone:

File operations: 99%+ token savings (e.g., 50K tokens → 50 tokens)
Most impactful single optimization
Zero learning curve (standard bash commands)

Implementation

This skill automatically applies these optimizations when:

Reading log files
Executing commands with large output
Navigating codebases
Debugging errors
Checking system status

You can always override by saying:

"Show me the full output" "Read the entire file" "I want verbose mode" "Don't worry about tokens" Managing Long-Running Background Processes Best Practices for Background Tasks

When running scripts that take hours, properly manage background processes to prevent resource leaks and enable clean session transitions:

  1. Run in background with Bash tool run_in_background: true

  2. Document the process in status files:

Background Processes

  • Script: comprehensive_search.py
  • Process ID: Available via BashOutput tool
  • Status: Running (~6% complete)
  • How to check: BashOutput tool with bash_id

  3. Kill cleanly before session end:

Before ending session:

1. Kill all background processes

KillShell(shell_id="abc123")

2. Create resume documentation (see claude-collaboration skill)

3. Document current progress (files, counts, status)

4. Save intermediate results

  4. Design scripts to be resumable (see the Python Environment Management skill):

Check for existing output files (skip if present)
Load existing results and append new ones
Save progress incrementally (not just at the end)
Track completion status in a structured format

Pre-Interruption Checklist

Before ending a session with running processes:

✅ Check background process status
✅ Kill all background processes cleanly
✅ Create resume documentation (RESUME_HERE.md)
✅ Document current progress with metrics
✅ Save intermediate results to disk
✅ Verify the resume commands in the documentation

Token Efficiency Benefit

Properly managing background processes:

Prevents context pollution - Old process output doesn't leak into new sessions
Enables clean handoff - Resume docs allow a fresh session without re-explaining
Avoids redundant work - Resumable scripts don't repeat completed tasks

Repository Organization for Long Projects

Problem

Data enrichment and analysis projects generate many intermediate files, scripts, and logs that clutter the root directory, making it hard to:

Find the current working dataset
Identify which scripts are actively used
Navigate the project structure
Maintain focus on important files

Solution: Organize Early and Often

Create dedicated subfolders at project start:

mkdir -p python_scripts/ logs/ tables/

Organization strategy:

python_scripts/ - All analysis and processing scripts (16+ scripts in the VGP project)
logs/ - All execution logs from script runs (38+ logs in the VGP project)
tables/ - Intermediate results, old versions, and archived data
Root directory - Only the main working dataset and current outputs

Benefits:

Reduces cognitive load when scanning the directory
Makes git status cleaner and more readable
Easier to exclude intermediate files from version control
Faster file navigation with autocomplete
Professional project structure for collaboration

When to organize:

At project start (ideal)
After accumulating 5+ scripts or logs (acceptable)
Before sharing the project with collaborators (essential)

Example cleanup script:

Move all Python scripts

mkdir -p python_scripts
mv *.py python_scripts/

Move all logs

mkdir -p logs
mv *.log logs/

Move intermediate tables (keep main dataset in root)

mkdir -p tables
mv *_intermediate.csv *_backup.csv *_old.csv tables/

Token efficiency impact:

Cleaner ls output (fewer lines to process)
Easier to target specific directories with Glob
Reduced cognitive overhead when navigating
Faster file location with autocomplete

Summary

Core motto: Right model. Bash over Read. Filter first. Read selectively. Summarize intelligently.

Model selection (highest impact):

Use Opus for learning/understanding (a one-time investment)
Use Sonnet for development/debugging/implementation (the default)
This alone can save ~50% of cost vs. using Opus for everything

Primary optimization rule:

Use bash commands for file operations (cp, sed, awk, grep) instead of Read/Edit/Write
This alone can save 99%+ of tokens on file operations

Secondary rules:

Filter before reading (grep, head, tail)
Read with limits when needed
Summarize instead of showing raw output
Use quiet modes for commands
Strategic file selection for learning

By following these guidelines, users can get 5-10x more value from their Claude subscription while maintaining high-quality assistance.
