ClickHouse Best Practices Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact. Official docs: ClickHouse Best Practices IMPORTANT: How to Apply This Skill Before answering ClickHouse questions, follow this priority order: Check for applicable rules in the rules/ directory If rules exist: Apply them and cite them in your response using "Per rule-name ..." If no rule exists: Use the LLM's ClickHouse knowledge or search documentation If uncertain: Use web search for current best practices Always cite your source: rule name, "general ClickHouse guidance", or URL Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance. For Formal Reviews When performing a formal review of schemas, queries, or data ingestion: Review Procedures For Schema Reviews (CREATE TABLE, ALTER TABLE) Read these rule files in order: rules/schema-pk-plan-before-creation.md - ORDER BY is immutable rules/schema-pk-cardinality-order.md - Column ordering in keys rules/schema-pk-prioritize-filters.md - Filter column inclusion rules/schema-types-native-types.md - Proper type selection rules/schema-types-minimize-bitwidth.md - Numeric type sizing rules/schema-types-lowcardinality.md - LowCardinality usage rules/schema-types-avoid-nullable.md - Nullable vs DEFAULT rules/schema-partition-low-cardinality.md - Partition count limits rules/schema-partition-lifecycle.md - Partitioning purpose Check for: PRIMARY KEY / ORDER BY column order (low-to-high cardinality) Data types match actual data ranges LowCardinality applied to appropriate string columns Partition key cardinality bounded (100-1,000 values) ReplacingMergeTree has version column if used For Query Reviews (SELECT, JOIN, aggregations) Read these rule files: rules/query-join-choose-algorithm.md - Algorithm selection rules/query-join-filter-before.md - Pre-join filtering rules/query-join-use-any.md - ANY vs regular JOIN rules/query-index-skipping-indices.md - Secondary index usage rules/schema-pk-filter-on-orderby.md - Filter alignment with ORDER BY Check for: Filters use ORDER BY prefix columns JOINs filter tables before joining (not after) Correct JOIN algorithm for table sizes Skipping indices for non-ORDER BY filter columns For Insert Strategy Reviews (data ingestion, updates, deletes) Read these rule files: rules/insert-batch-size.md - Batch sizing requirements rules/insert-mutation-avoid-update.md - UPDATE alternatives rules/insert-mutation-avoid-delete.md - DELETE alternatives rules/insert-async-small-batches.md - Async insert usage rules/insert-optimize-avoid-final.md - OPTIMIZE TABLE risks Check for: Batch size 10K-100K rows per INSERT No ALTER TABLE UPDATE for frequent changes ReplacingMergeTree or CollapsingMergeTree for update patterns Async inserts enabled for high-frequency small batches Output Format Structure your response as follows:
Rules Checked
rule-name-1- Compliant / Violation foundrule-name-2- Compliant / Violation found ...
Findings
Violations
rule-name: Description of the issue- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
Compliant
rule-name: Brief note on why it's correct
Recommendations
- [Prioritized list of changes, citing rules]
- Rule Categories by Priority
- Priority
- Category
- Impact
- Prefix
- Rule Count
- 1
- Primary Key Selection
- CRITICAL
- schema-pk-
- 4
- 2
- Data Type Selection
- CRITICAL
- schema-types-
- 5
- 3
- JOIN Optimization
- CRITICAL
- query-join-
- 5
- 4
- Insert Batching
- CRITICAL
- insert-batch-
- 1
- 5
- Mutation Avoidance
- CRITICAL
- insert-mutation-
- 2
- 6
- Partitioning Strategy
- HIGH
- schema-partition-
- 4
- 7
- Skipping Indices
- HIGH
- query-index-
- 1
- 8
- Materialized Views
- HIGH
- query-mv-
- 2
- 9
- Async Inserts
- HIGH
- insert-async-
- 2
- 10
- OPTIMIZE Avoidance
- HIGH
- insert-optimize-
- 1
- 11
- JSON Usage
- MEDIUM
- schema-json-
- 1
- Quick Reference
- Schema Design - Primary Key (CRITICAL)
- schema-pk-plan-before-creation
- - Plan ORDER BY before table creation (immutable)
- schema-pk-cardinality-order
- - Order columns low-to-high cardinality
- schema-pk-prioritize-filters
- - Include frequently filtered columns
- schema-pk-filter-on-orderby
- - Query filters must use ORDER BY prefix
- Schema Design - Data Types (CRITICAL)
- schema-types-native-types
- - Use native types, not String for everything
- schema-types-minimize-bitwidth
- - Use smallest numeric type that fits
- schema-types-lowcardinality
- - LowCardinality for <10K unique strings
- schema-types-enum
- - Enum for finite value sets with validation
- schema-types-avoid-nullable
- - Avoid Nullable; use DEFAULT instead
- Schema Design - Partitioning (HIGH)
- schema-partition-low-cardinality
- - Keep partition count 100-1,000
- schema-partition-lifecycle
- - Use partitioning for data lifecycle, not queries
- schema-partition-query-tradeoffs
- - Understand partition pruning trade-offs
- schema-partition-start-without
- - Consider starting without partitioning
- Schema Design - JSON (MEDIUM)
- schema-json-when-to-use
- - JSON for dynamic schemas; typed columns for known
- Query Optimization - JOINs (CRITICAL)
- query-join-choose-algorithm
- - Select algorithm based on table sizes
- query-join-use-any
- - ANY JOIN when only one match needed
- query-join-filter-before
- - Filter tables before joining
- query-join-consider-alternatives
- - Dictionaries/denormalization vs JOIN
- query-join-null-handling
- - join_use_nulls=0 for default values
- Query Optimization - Indices (HIGH)
- query-index-skipping-indices
- - Skipping indices for non-ORDER BY filters
- Query Optimization - Materialized Views (HIGH)
- query-mv-incremental
- - Incremental MVs for real-time aggregations
- query-mv-refreshable
- - Refreshable MVs for complex joins
- Insert Strategy - Batching (CRITICAL)
- insert-batch-size
- - Batch 10K-100K rows per INSERT
- Insert Strategy - Async (HIGH)
- insert-async-small-batches
- - Async inserts for high-frequency small batches
- insert-format-native
- - Native format for best performance
- Insert Strategy - Mutations (CRITICAL)
- insert-mutation-avoid-update
- - ReplacingMergeTree instead of ALTER UPDATE
- insert-mutation-avoid-delete
- - Lightweight DELETE or DROP PARTITION
- Insert Strategy - Optimization (HIGH)
- insert-optimize-avoid-final
- - Let background merges work
- When to Apply
- This skill activates when you encounter:
- CREATE TABLE
- statements
- ALTER TABLE
- modifications
- ORDER BY
- or
- PRIMARY KEY
- discussions
- Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
- Rule File Structure
- Each rule file in
- rules/
- contains:
- YAML frontmatter
-
- title, impact level, tags
- Brief explanation
-
- Why this rule matters
- Incorrect example
-
- Anti-pattern with explanation
- Correct example
-
- Best practice with explanation
- Additional context
- Trade-offs, when to apply, references Full Compiled Document For the complete guide with all rules expanded inline: AGENTS.md Use AGENTS.md when you need to check multiple rules quickly without reading individual files.