Folder Organization Best Practices
Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.
When to Use This Skill Setting up new projects Reorganizing existing projects Establishing team conventions Creating reproducible research structures Managing data-intensive projects Core Principles Predictability - Standard locations for common file types Scalability - Structure grows gracefully with project Discoverability - Easy for others (and future you) to navigate Separation of Concerns - Code, data, documentation, outputs separated Version Control Friendly - Large/generated files excluded appropriately Standard Project Structure Research/Analysis Projects project-name/ ├── README.md # Project overview and getting started ├── .gitignore # Exclude data, outputs, env files ├── environment.yml # Conda environment (or requirements.txt) ├── data/ # Input data (often gitignored) │ ├── raw/ # Original, immutable data │ ├── processed/ # Cleaned, transformed data │ └── external/ # Third-party data ├── notebooks/ # Jupyter notebooks for exploration │ ├── 01-exploration.ipynb │ ├── 02-analysis.ipynb │ └── figures/ # Notebook-generated figures ├── src/ # Source code (reusable modules) │ ├── init.py │ ├── data_processing.py │ ├── analysis.py │ └── visualization.py ├── scripts/ # Standalone scripts and workflows │ ├── download_data.sh │ └── run_pipeline.py ├── tests/ # Unit tests │ └── test_analysis.py ├── docs/ # Documentation │ ├── methods.md │ └── references.md ├── results/ # Analysis outputs (gitignored) │ ├── figures/ │ ├── tables/ │ └── models/ └── config/ # Configuration files └── analysis_config.yaml
Development Projects project-name/ ├── README.md ├── .gitignore ├── setup.py # Package configuration ├── requirements.txt # or pyproject.toml ├── src/ │ └── package_name/ │ ├── init.py │ ├── core.py │ └── utils.py ├── tests/ │ ├── test_core.py │ └── test_utils.py ├── docs/ │ ├── api.md │ └── usage.md ├── examples/ # Example usage │ └── example_workflow.py └── .github/ # CI/CD workflows └── workflows/ └── tests.yml
Bioinformatics/Workflow Projects project-name/ ├── README.md ├── data/ │ ├── raw/ # Raw sequencing data │ ├── reference/ # Reference genomes, annotations │ └── processed/ # Workflow outputs ├── workflows/ # Galaxy .ga or Snakemake files │ ├── preprocessing.ga │ └── assembly.ga ├── config/ │ ├── workflow_params.yaml │ └── sample_sheet.tsv ├── scripts/ # Helper scripts │ ├── submit_workflow.py │ └── quality_check.py ├── results/ # Final outputs │ ├── figures/ │ ├── tables/ │ └── reports/ └── logs/ # Workflow execution logs
File Naming Conventions General Rules
Use lowercase with hyphens or underscores
✅ data-analysis.py or data_analysis.py ❌ DataAnalysis.py or data analysis.py
Be descriptive but concise
✅ process-telomere-data.py ❌ script.py or process_all_the_telomere_sequencing_data_from_experiments.py
Use consistent separators
Choose either hyphens or underscores and stick with it Convention: hyphens for file names, underscores for Python modules
Include version/date for important outputs
✅ report-2026-01-23.pdf or model-v2.pkl ❌ report-final-final-v3.pdf Numbered Sequences
For sequential files (notebooks, scripts), use zero-padded numbers:
notebooks/ ├── 01-data-exploration.ipynb ├── 02-quality-control.ipynb ├── 03-statistical-analysis.ipynb └── 04-visualization.ipynb
Data Files
Include metadata in filename when possible:
data/raw/ ├── sample-A_hifi_reads_2026-01-15.fastq.gz ├── sample-B_hifi_reads_2026-01-15.fastq.gz └── reference_genome_v3.fasta
Directory Management Best Practices What to Version Control
DO commit:
Source code Documentation Configuration files Small test datasets (<1MB) Requirements/environment files README files
DON'T commit:
Large data files (use .gitignore) Generated outputs Environment directories (venv/, conda-env/) Logs Temporary files API keys/secrets .gitignore Template
Python
pycache/ .py[cod] $py.class .venv/ venv/ *.egg-info/
Jupyter
.ipynb_checkpoints/ *.ipynb_checkpoints
Data
data/raw/ data/processed/ .fastq.gz .bam *.vcf.gz
Outputs
results/ outputs/ .png .pdf *.html
Logs
logs/ *.log
Environment
.env environment.local.yml
OS
.DS_Store Thumbs.db
Data Organization Raw Data is Sacred Never modify raw data - Always keep originals untouched Store in data/raw/ and make it read-only if possible Document data provenance (where it came from, when downloaded) Processed Data Hierarchy data/ ├── raw/ # Original, immutable ├── interim/ # Intermediate processing steps ├── processed/ # Final, analysis-ready data └── external/ # Third-party data
Documentation Standards README.md Essentials
Every project should have a README with:
Project Name
Brief description
Installation
How to set up the environment
Usage
How to run the analysis/code
Project Structure
Brief overview of directories
Data
Where data lives and how to access it
Results
Where to find outputs
Code Documentation Docstrings for all functions/classes Comments for complex logic CHANGELOG.md for tracking changes TODO.md for tracking work (gitignored or removed before merge) Common Anti-Patterns to Avoid
❌ Flat structure with everything in root
project/ ├── script1.py ├── script2.py ├── data.csv ├── output1.png ├── output2.png └── final_really_final_v3.xlsx
❌ Ambiguous naming
notebooks/ ├── notebook1.ipynb ├── test.ipynb ├── analysis.ipynb └── analysis_new.ipynb
❌ Mixed concerns
project/ ├── src/ │ ├── analysis.py │ ├── data.csv # Data in source code directory │ └── figure1.png # Output in source code directory
Cleanup and Maintenance Regular Maintenance Tasks Archive old branches - Delete merged feature branches Clean temp files - Remove TODO.md, NOTES.md from completed work Update documentation - Keep README current with changes Review .gitignore - Ensure large files aren't tracked Organize notebooks - Rename/renumber as project evolves End-of-Project Checklist README complete and accurate Code documented Tests passing Large files gitignored Working files removed (TODO.md, scratch notebooks) Final outputs in results/ Environment files current License added (if applicable) Integration with Other Skills
This skill works well with:
python-environment - Environment setup and management claude-collaboration - Team workflow best practices jupyter-notebook-analysis - Notebook organization standards Templates and Tools Quick Project Setup
Create standard research project structure
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config touch README.md .gitignore environment.yml
Cookiecutter Templates
Consider using cookiecutter for standardized project templates:
cookiecutter-data-science - Data science projects cookiecutter-research - Research projects Custom team templates References and Resources Cookiecutter Data Science A Quick Guide to Organizing Computational Biology Projects Good Enough Practices in Scientific Computing