Sanitize Git Repository
This skill provides guidance for systematically identifying and replacing sensitive information in git repositories, including API keys, tokens, passwords, and other credentials.
When to Use This Skill
Sanitizing a repository before sharing or open-sourcing
Removing accidentally committed secrets from a codebase
Replacing hardcoded credentials with placeholders
Auditing a repository for sensitive information
Preparing code for security review
Recommended Approach
Phase 1: Comprehensive Discovery
Build a complete inventory of all sensitive values before making any changes. This prevents the common mistake of discovering additional secrets mid-process.
Common Secret Patterns to Search:
Type
Pattern/Prefix
Example
AWS Access Keys
AKIA[A-Z0-9]{16}
AKIAIOSFODNN7EXAMPLE
AWS Secret Keys
40-char base64 strings
Often near access keys
GitHub Tokens
ghp_
,
gho_
,
ghs_
,
ghr_
ghp_xxxxxxxxxxxx
Huggingface Tokens
hf_
hf_xxxxxxxxxxxx
Generic API Keys
api[_-]?key
,
apikey
Varies
Bearer Tokens
bearer
,
token
Varies
Private Keys
-----BEGIN.PRIVATE KEY-----
RSA/EC keys
Database URLs
postgres://
,
mysql://
,
mongodb://
Connection strings
Password Fields
password
,
passwd
,
pwd
Varies
Discovery Strategy:
Search for known prefixes and patterns using regex
Search for common variable names (
API_KEY
,
SECRET
,
TOKEN
,
PASSWORD
,
CREDENTIAL
)
Check configuration files (
.env
,
config.
,
settings.
,
.json
,
.yaml
,
.yml
)
Examine test fixtures and mock data files
Check embedded data within other formats (JSON encoded in strings, base64 encoded content)
Important Locations to Check:
Configuration files and environment templates
Test fixtures and sample data
Documentation and README files
CI/CD configuration files
Docker and deployment configurations
JSON/YAML data files (secrets may be embedded in structured data)
Git diff outputs or patch files stored in the repository
Phase 2: Inventory Documentation
Before making changes, create a documented list of:
Each unique sensitive value found
All file locations where each value appears
The exact string to match (including surrounding context if needed for uniqueness)
The placeholder to use for replacement
Placeholder Conventions:
Use descriptive placeholders that indicate the type:
,
,
Maintain consistency across all replacements
Match the format specified by the user if provided
Phase 3: Systematic Replacement
Execute replacements methodically:
Work through the inventory one secret at a time
For each secret, replace ALL occurrences across ALL files
Verify each replacement immediately after making it
Use exact string matching to avoid unintended modifications
Replacement Best Practices:
Read files before editing to ensure exact string matching
Handle whitespace and formatting precisely
Consider secrets embedded in JSON or other structured formats (may require escaping)
Use batch replacements when the same value appears multiple times in one file
Phase 4: Verification
After all replacements:
Re-run all original discovery searches to confirm no secrets remain
Search for partial matches of sensitive values
Run any provided test suites to validate the sanitization
Check that placeholders are properly formatted
Common Pitfalls
1. Incomplete Initial Discovery
Problem:
Secrets discovered incrementally during the replacement process, requiring backtracking.
Prevention:
Invest time upfront in comprehensive searching using multiple patterns and checking all file types, including data files and embedded content.
2. Secrets in Unexpected Locations
Problem:
Secrets appear in JSON files, embedded strings, test fixtures, or encoded data that aren't found by simple searches.
Prevention:
Search recursively through all file types. Check JSON and YAML files specifically. Look for base64-encoded content that might contain secrets.
3. Exact String Matching Failures
Problem:
Edit operations fail due to whitespace differences or character encoding issues.
Prevention:
Always read the target file first to understand exact formatting. Copy strings exactly as they appear in the file, including whitespace.
4. Missing Occurrences
Problem:
Multiple occurrences of the same secret exist, but only some are replaced.
Prevention:
After each replacement, immediately verify by searching for the original value again. Use replace-all functionality when available.
5. Git History Considerations
Problem:
Secrets remain in git history even after being removed from current files.
Prevention:
Clarify with the user whether git history sanitization is required. If so, tools like
git filter-branch
or
BFG Repo-Cleaner
may be needed (this is a separate, more complex operation).
6. Tool Availability Assumptions
Problem:
Specialized search tools (like ripgrep) may not be available in all environments.
Prevention:
Have fallback approaches ready. Standard
grep -r
works in most environments. Test tool availability before relying on it.
Verification Checklist
Before considering the task complete:
All known secret patterns have been searched for
Configuration files have been examined
JSON/YAML data files have been checked
Test fixtures and mock data have been reviewed
Each identified secret has been replaced in all locations
Verification searches confirm no original secrets remain
Placeholders follow a consistent format
Any provided tests pass successfully
Search Commands Reference
Standard grep (widely available):
Search for AWS access keys
grep
-rn
"AKIA[A-Z0-9]{16}"
.
--exclude-dir
=
.git
Search for common token prefixes
grep
-rn
"ghp_|gho_|ghs_|hf_"
.
--exclude-dir
=
.git
grep
-rn
-i
"password|passwd|pwd"
.
--exclude-dir
=
.git
Search for API key patterns
grep
-rn
-i
"api[_-]\?key|apikey"
.
--exclude-dir
=
.git
Always exclude the
.git
directory from searches to avoid noise from git internals.