clickhouse-best-practices

安装量: 1.4K
排名: #1051

安装

npx skills add https://github.com/clickhouse/agent-skills --skill clickhouse-best-practices

ClickHouse Best Practices Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact. Official docs: ClickHouse Best Practices IMPORTANT: How to Apply This Skill Before answering ClickHouse questions, follow this priority order: Check for applicable rules in the rules/ directory If rules exist: Apply them and cite them in your response using "Per rule-name ..." If no rule exists: Use the LLM's ClickHouse knowledge or search documentation If uncertain: Use web search for current best practices Always cite your source: rule name, "general ClickHouse guidance", or URL Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance. For Formal Reviews When performing a formal review of schemas, queries, or data ingestion: Review Procedures For Schema Reviews (CREATE TABLE, ALTER TABLE) Read these rule files in order: rules/schema-pk-plan-before-creation.md - ORDER BY is immutable rules/schema-pk-cardinality-order.md - Column ordering in keys rules/schema-pk-prioritize-filters.md - Filter column inclusion rules/schema-types-native-types.md - Proper type selection rules/schema-types-minimize-bitwidth.md - Numeric type sizing rules/schema-types-lowcardinality.md - LowCardinality usage rules/schema-types-avoid-nullable.md - Nullable vs DEFAULT rules/schema-partition-low-cardinality.md - Partition count limits rules/schema-partition-lifecycle.md - Partitioning purpose Check for: PRIMARY KEY / ORDER BY column order (low-to-high cardinality) Data types match actual data ranges LowCardinality applied to appropriate string columns Partition key cardinality bounded (100-1,000 values) ReplacingMergeTree has version column if used For Query Reviews (SELECT, JOIN, aggregations) Read these rule files: rules/query-join-choose-algorithm.md - Algorithm selection rules/query-join-filter-before.md - Pre-join filtering rules/query-join-use-any.md - ANY vs regular JOIN rules/query-index-skipping-indices.md - Secondary index usage rules/schema-pk-filter-on-orderby.md - Filter alignment with ORDER BY Check for: Filters use ORDER BY prefix columns JOINs filter tables before joining (not after) Correct JOIN algorithm for table sizes Skipping indices for non-ORDER BY filter columns For Insert Strategy Reviews (data ingestion, updates, deletes) Read these rule files: rules/insert-batch-size.md - Batch sizing requirements rules/insert-mutation-avoid-update.md - UPDATE alternatives rules/insert-mutation-avoid-delete.md - DELETE alternatives rules/insert-async-small-batches.md - Async insert usage rules/insert-optimize-avoid-final.md - OPTIMIZE TABLE risks Check for: Batch size 10K-100K rows per INSERT No ALTER TABLE UPDATE for frequent changes ReplacingMergeTree or CollapsingMergeTree for update patterns Async inserts enabled for high-frequency small batches Output Format Structure your response as follows:

Rules Checked

  • rule-name-1 - Compliant / Violation found
  • rule-name-2 - Compliant / Violation found ...

Findings

Violations

  • rule-name: Description of the issue
  • Current: [what the code does]
  • Required: [what it should do]
  • Fix: [specific correction]

Compliant

  • rule-name: Brief note on why it's correct

Recommendations

[Prioritized list of changes, citing rules]
Rule Categories by Priority
Priority
Category
Impact
Prefix
Rule Count
1
Primary Key Selection
CRITICAL
schema-pk-
4
2
Data Type Selection
CRITICAL
schema-types-
5
3
JOIN Optimization
CRITICAL
query-join-
5
4
Insert Batching
CRITICAL
insert-batch-
1
5
Mutation Avoidance
CRITICAL
insert-mutation-
2
6
Partitioning Strategy
HIGH
schema-partition-
4
7
Skipping Indices
HIGH
query-index-
1
8
Materialized Views
HIGH
query-mv-
2
9
Async Inserts
HIGH
insert-async-
2
10
OPTIMIZE Avoidance
HIGH
insert-optimize-
1
11
JSON Usage
MEDIUM
schema-json-
1
Quick Reference
Schema Design - Primary Key (CRITICAL)
schema-pk-plan-before-creation
- Plan ORDER BY before table creation (immutable)
schema-pk-cardinality-order
- Order columns low-to-high cardinality
schema-pk-prioritize-filters
- Include frequently filtered columns
schema-pk-filter-on-orderby
- Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
schema-types-native-types
- Use native types, not String for everything
schema-types-minimize-bitwidth
- Use smallest numeric type that fits
schema-types-lowcardinality
- LowCardinality for <10K unique strings
schema-types-enum
- Enum for finite value sets with validation
schema-types-avoid-nullable
- Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
schema-partition-low-cardinality
- Keep partition count 100-1,000
schema-partition-lifecycle
- Use partitioning for data lifecycle, not queries
schema-partition-query-tradeoffs
- Understand partition pruning trade-offs
schema-partition-start-without
- Consider starting without partitioning
Schema Design - JSON (MEDIUM)
schema-json-when-to-use
- JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
query-join-choose-algorithm
- Select algorithm based on table sizes
query-join-use-any
- ANY JOIN when only one match needed
query-join-filter-before
- Filter tables before joining
query-join-consider-alternatives
- Dictionaries/denormalization vs JOIN
query-join-null-handling
- join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
query-index-skipping-indices
- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
query-mv-incremental
- Incremental MVs for real-time aggregations
query-mv-refreshable
- Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
insert-batch-size
- Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
insert-async-small-batches
- Async inserts for high-frequency small batches
insert-format-native
- Native format for best performance
Insert Strategy - Mutations (CRITICAL)
insert-mutation-avoid-update
- ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-delete
- Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
insert-optimize-avoid-final
- Let background merges work
When to Apply
This skill activates when you encounter:
CREATE TABLE
statements
ALTER TABLE
modifications
ORDER BY
or
PRIMARY KEY
discussions
Data type selection questions
Slow query troubleshooting
JOIN optimization requests
Data ingestion pipeline design
Update/delete strategy questions
ReplacingMergeTree or other specialized engine usage
Partitioning strategy decisions
Rule File Structure
Each rule file in
rules/
contains:
YAML frontmatter
title, impact level, tags
Brief explanation
Why this rule matters
Incorrect example
Anti-pattern with explanation
Correct example
Best practice with explanation
Additional context
Trade-offs, when to apply, references Full Compiled Document For the complete guide with all rules expanded inline: AGENTS.md Use AGENTS.md when you need to check multiple rules quickly without reading individual files.
返回排行榜