Folder Organization Best Practices

Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.

When to Use This Skill

- Setting up new projects
- Reorganizing existing projects
- Establishing team conventions
- Creating reproducible research structures
- Managing data-intensive projects

Core Principles

- Predictability: standard locations for common file types
- Scalability: the structure grows gracefully with the project
- Discoverability: easy for others (and future you) to navigate
- Separation of Concerns: code, data, documentation, and outputs are kept separate
- Version Control Friendly: large and generated files are excluded appropriately

Standard Project Structure

Research/Analysis Projects

    project-name/
    ├── README.md                  # Project overview and getting started
    ├── .gitignore                 # Exclude data, outputs, env files
    ├── environment.yml            # Conda environment (or requirements.txt)
    ├── data/                      # Input data (often gitignored)
    │   ├── raw/                   # Original, immutable data
    │   ├── processed/             # Cleaned, transformed data
    │   └── external/              # Third-party data
    ├── notebooks/                 # Jupyter notebooks for exploration
    │   ├── 01-exploration.ipynb
    │   ├── 02-analysis.ipynb
    │   └── figures/               # Notebook-generated figures
    ├── src/                       # Source code (reusable modules)
    │   ├── __init__.py
    │   ├── data_processing.py
    │   ├── analysis.py
    │   └── visualization.py
    ├── scripts/                   # Standalone scripts and workflows
    │   ├── download_data.sh
    │   └── run_pipeline.py
    ├── tests/                     # Unit tests
    │   └── test_analysis.py
    ├── docs/                      # Documentation
    │   ├── methods.md
    │   └── references.md
    ├── results/                   # Analysis outputs (gitignored)
    │   ├── figures/
    │   ├── tables/
    │   └── models/
    └── config/                    # Configuration files
        └── analysis_config.yaml

Development Projects

    project-name/
    ├── README.md
    ├── .gitignore
    ├── setup.py                   # Package configuration
    ├── requirements.txt           # or pyproject.toml
    ├── src/
    │   └── package_name/
    │       ├── __init__.py
    │       ├── core.py
    │       └── utils.py
    ├── tests/
    │   ├── test_core.py
    │   └── test_utils.py
    ├── docs/
    │   ├── api.md
    │   └── usage.md
    ├── examples/                  # Example usage
    │   └── example_workflow.py
    └── .github/                   # CI/CD workflows
        └── workflows/
            └── tests.yml

Bioinformatics/Workflow Projects

    project-name/
    ├── README.md
    ├── data/
    │   ├── raw/                   # Raw sequencing data
    │   ├── reference/             # Reference genomes, annotations
    │   └── processed/             # Workflow outputs
    ├── workflows/                 # Galaxy .ga or Snakemake files
    │   ├── preprocessing.ga
    │   └── assembly.ga
    ├── config/
    │   ├── workflow_params.yaml
    │   └── sample_sheet.tsv
    ├── scripts/                   # Helper scripts
    │   ├── submit_workflow.py
    │   └── quality_check.py
    ├── results/                   # Final outputs
    │   ├── figures/
    │   ├── tables/
    │   └── reports/
    └── logs/                      # Workflow execution logs

File Naming Conventions

General Rules

1. Use lowercase with hyphens or underscores

   ✅ data-analysis.py or data_analysis.py
   ❌ DataAnalysis.py or data analysis.py

2. Be descriptive but concise

   ✅ process-telomere-data.py
   ❌ script.py or process_all_the_telomere_sequencing_data_from_experiments.py

3. Use consistent separators

   - Choose either hyphens or underscores and stick with it
   - Convention: hyphens for file names, underscores for Python modules

4. Include version/date for important outputs

   ✅ report-2026-01-23.pdf or model-v2.pkl
   ❌ report-final-final-v3.pdf

Numbered Sequences

For sequential files (notebooks, scripts), use zero-padded numbers:

    notebooks/
    ├── 01-data-exploration.ipynb
    ├── 02-quality-control.ipynb
    ├── 03-statistical-analysis.ipynb
    └── 04-visualization.ipynb
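The general rules and the zero-padded numbering above can be checked mechanically. The following is a minimal sketch, not a definitive linter: the function name `naming_problems`, the two regular expressions, and the two-digit prefix width are assumptions made for illustration. The "descriptive but concise" rule is a judgment call and is not checked.

```python
import re

# Assumed rule set, derived from the conventions above: lowercase letters,
# digits, dots, hyphens, and underscores only.
NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9._-]*$")
# Assumed prefix for sequenced files: two digits followed by a hyphen.
SEQUENCE_PATTERN = re.compile(r"^\d{2}-")


def naming_problems(filename: str, sequenced: bool = False) -> list[str]:
    """Return the convention violations for one file name (empty if none)."""
    problems = []
    if " " in filename:
        problems.append("contains spaces")
    if filename != filename.lower():
        problems.append("contains uppercase letters")
    # Only report stray characters when no more specific problem was found.
    if not problems and not NAME_PATTERN.match(filename):
        problems.append("contains characters other than a-z, 0-9, '.', '-', '_'")
    if sequenced and not SEQUENCE_PATTERN.match(filename):
        problems.append("missing zero-padded numeric prefix such as '01-'")
    return problems
```

Calling `naming_problems("01-data-exploration.ipynb", sequenced=True)` returns an empty list, while `naming_problems("data analysis.py")` reports the space.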

Data Files

Include metadata in filename when possible:

    data/raw/
    ├── sample-A_hifi_reads_2026-01-15.fastq.gz
    ├── sample-B_hifi_reads_2026-01-15.fastq.gz
    └── reference_genome_v3.fasta
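One benefit of putting metadata in the filename is that scripts can recover it without opening the file. The sketch below assumes a layout of `<sample>_<description>_<YYYY-MM-DD>.<extension>`, inferred from the two sample files above; the layout, the `DataFileName` class, and `parse_data_filename` are illustrative assumptions, not a fixed standard. Names that do not follow the layout, such as `reference_genome_v3.fasta`, yield `None`.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed layout: <sample>_<description>_<YYYY-MM-DD>.<extension>
_DATA_NAME = re.compile(
    r"^(?P<sample>[^_]+)_(?P<description>.+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})\.(?P<extension>.+)$"
)


@dataclass(frozen=True)
class DataFileName:
    sample: str
    description: str
    file_date: date
    extension: str


def parse_data_filename(filename: str) -> Optional[DataFileName]:
    """Split a data file name into its metadata fields, or return None."""
    match = _DATA_NAME.match(filename)
    if match is None:
        return None
    try:
        parsed_date = date.fromisoformat(match["date"])
    except ValueError:
        # Looks like a date but is not a valid calendar day.
        return None
    return DataFileName(
        sample=match["sample"],
        description=match["description"],
        file_date=parsed_date,
        extension=match["extension"],
    )
```

Because the sample field ends at the first underscore, this layout only works if sample identifiers use hyphens rather than underscores internally.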

Directory Management Best Practices

What to Version Control

DO commit:

- Source code
- Documentation
- Configuration files
- Small test datasets (<1MB)
- Requirements/environment files
- README files

DON'T commit:

- Large data files (use .gitignore)
- Generated outputs
- Environment directories (venv/, conda-env/)
- Logs
- Temporary files
- API keys/secrets

.gitignore Template

    # Python
    __pycache__/
    *.py[cod]
    *$py.class
    .venv/
    venv/
    *.egg-info/

    # Jupyter
    .ipynb_checkpoints/
    *.ipynb_checkpoints

    # Data
    data/raw/
    data/processed/
    *.fastq.gz
    *.bam
    *.vcf.gz

    # Outputs
    results/
    outputs/
    *.png
    *.pdf
    *.html

    # Logs
    logs/
    *.log

    # Environment
    .env
    environment.local.yml

    # OS
    .DS_Store
    Thumbs.db
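A quick scan for oversized files can help confirm that the ignore rules cover everything they should. The sketch below walks a project directory and lists files at or above the 1 MB guideline mentioned under "DO commit". The function name `find_large_files` and the set of skipped directories are assumptions. The scan looks at the working tree only and does not consult Git, so files that are already ignored will also be reported.

```python
from pathlib import Path

# Threshold taken from the "small test datasets (<1MB)" guideline above.
DEFAULT_LIMIT_BYTES = 1_000_000
# Assumed set of directories that are never candidates for committing.
SKIPPED_DIRS = {".git", ".venv", "venv"}


def find_large_files(root, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (relative_path, size) pairs for files of at least limit_bytes."""
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if any(part in SKIPPED_DIRS for part in relative.parts):
            continue
        if path.is_file():
            size = path.stat().st_size
            if size >= limit_bytes:
                large.append((relative, size))
    # Largest files first, so the worst offenders are at the top.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Running `find_large_files(".")` from the project root before a commit gives a short list to compare against the ignore rules.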

Data Organization

Raw Data is Sacred

- Never modify raw data: always keep originals untouched
- Store in data/raw/ and make it read-only if possible
- Document data provenance (where it came from, when downloaded)

Processed Data Hierarchy

    data/
    ├── raw/          # Original, immutable
    ├── interim/      # Intermediate processing steps
    ├── processed/    # Final, analysis-ready data
    └── external/     # Third-party data
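The read-only rule for raw data can be enforced with file permissions, and a checksum manifest makes later modification detectable. The following is a minimal sketch under stated assumptions: the function `protect_raw_data` and the manifest name `MANIFEST.sha256` are hypothetical choices, and the manifest complements a written provenance note rather than replacing it.

```python
import hashlib
import stat
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_raw_data(raw_dir, manifest_name="MANIFEST.sha256"):
    """Write a checksum manifest for raw_dir and remove write permission."""
    raw_dir = Path(raw_dir)
    manifest = raw_dir / manifest_name  # hypothetical file name
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    lines = []
    files = sorted(p for p in raw_dir.rglob("*") if p.is_file() and p != manifest)
    for path in files:
        relative = path.relative_to(raw_dir).as_posix()
        # "<hash>  <path>" is the two-column layout that sha256sum prints.
        lines.append(f"{sha256_of(path)}  {relative}\n")
        # Clear the write bits for user, group, and others.
        path.chmod(path.stat().st_mode & ~write_bits)
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest
```

The manifest itself stays writable so the function can be rerun after new raw files arrive; existing files are hashed again, so an unexpected change shows up as a diff in the manifest.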

Documentation Standards

README.md Essentials

Every project should have a README with:

    # Project Name

    Brief description

    ## Installation

    How to set up the environment

    ## Usage

    How to run the analysis/code

    ## Project Structure

    Brief overview of directories

    ## Data

    Where data lives and how to access it

    ## Results

    Where to find outputs

Code Documentation

- Docstrings for all functions/classes
- Comments for complex logic
- CHANGELOG.md for tracking changes
- TODO.md for tracking work (gitignored or removed before merge)

Common Anti-Patterns to Avoid

❌ Flat structure with everything in root

    project/
    ├── script1.py
    ├── script2.py
    ├── data.csv
    ├── output1.png
    ├── output2.png
    └── final_really_final_v3.xlsx

❌ Ambiguous naming

    notebooks/
    ├── notebook1.ipynb
    ├── test.ipynb
    ├── analysis.ipynb
    └── analysis_new.ipynb

❌ Mixed concerns

    project/
    ├── src/
    │   ├── analysis.py
    │   ├── data.csv       # Data in source code directory
    │   └── figure1.png    # Output in source code directory

Cleanup and Maintenance

Regular Maintenance Tasks

- Archive old branches: delete merged feature branches
- Clean temp files: remove TODO.md, NOTES.md from completed work
- Update documentation: keep README current with changes
- Review .gitignore: ensure large files aren't tracked
- Organize notebooks: rename/renumber as the project evolves

End-of-Project Checklist

- README complete and accurate
- Code documented
- Tests passing
- Large files gitignored
- Working files removed (TODO.md, scratch notebooks)
- Final outputs in results/
- Environment files current
- License added (if applicable)

Integration with Other Skills

This skill works well with:

- python-environment: environment setup and management
- claude-collaboration: team workflow best practices
- jupyter-notebook-analysis: notebook organization standards

Templates and Tools

Quick Project Setup

    # Create standard research project structure
    mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
    touch README.md .gitignore environment.yml
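The `{raw,processed,external}` brace expansion in the shell commands above is a bash/zsh feature and is not available in every shell. A portable alternative is a short Python script. The sketch below creates the same directories and empty files; the function name `create_research_project` is an assumption, and the lists mirror the shell commands rather than the fuller tree shown under Standard Project Structure.

```python
from pathlib import Path

# Directories and empty files mirror the shell commands above.
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "data/external",
    "notebooks",
    "scripts",
    "src",
    "tests",
    "docs",
    "results",
    "config",
]
FILES = ["README.md", ".gitignore", "environment.yml"]


def create_research_project(root):
    """Create the standard research layout under root; safe to rerun."""
    root = Path(root)
    for directory in DIRECTORIES:
        # parents=True also creates root itself if it does not exist yet.
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() leaves the content of existing files untouched.
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_research_project("project-name")` is idempotent: existing directories and files are left as they are.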

Cookiecutter Templates

Consider using cookiecutter for standardized project templates:

- cookiecutter-data-science: data science projects
- cookiecutter-research: research projects
- Custom team templates

References and Resources

- Cookiecutter Data Science
- A Quick Guide to Organizing Computational Biology Projects
- Good Enough Practices in Scientific Computing
