altinity-expert-clickhouse-index-analysis

安装量: 38
排名: #18591

安装

npx skills add https://github.com/altinity/skills --skill altinity-expert-clickhouse-index-analysis
Diagnostics
Run all queries from the file checks.sql and analyze the results.
Deep Dive Queries (Placeholder-Based)
EXPLAIN Index Usage for Specific Query
EXPLAIN
indexes
=
1
{query_without_format}
Look for:
PrimaryKey
condition should not be
true
(means no filtering)
Granules: X/Y
ratio shows selectivity (low X/Y = good)
Skip
indexes should reduce parts/granules further
Column Cardinality Analysis
SELECT
{
columns
}
APPLY
uniq
FROM
{
database
}
.
{
table
}
WHERE
{time_column}
>
now
(
)
-
INTERVAL
{days}
DAY
Optimal ORDER BY ordering:
Low cardinality columns first, high cardinality last.
Query Pattern WHERE Columns Extraction
WITH
any
(
query
)
AS
q
,
arrayJoin
(
extractAll
(
query
,
'\b(?:PRE)?WHERE\s+(.*?)\s+(?:GROUP BY|ORDER BY|UNION|SETTINGS|FORMAT|$)'
)
)
AS
w
,
arrayFilter
(
x
-
>
(
position
(
w
,
extract
(
x
,
'\.([^]+`|[^\.]+)$'
)
)
>
0
)
,
columns
)
AS
c
,
arrayJoin
(
c
)
AS
c2
SELECT
c2
,
count
(
)
AS
usage_count
FROM
system
.
query_log
WHERE
event_time
>=
now
(
)
-
toIntervalDay
(
{days}
)
AND
arrayExists
(
x
-
>
x
LIKE
'%{table}%'
,
tables
)
AND
query
ILIKE
'SELECT%'
AND
type
=
'QueryFinish'
GROUP
BY
c2
ORDER
BY
usage_count
DESC
FORMAT PrettyCompactMonoBlock
Normalized WHERE Clause Patterns
WITH
arrayJoin
(
extractAll
(
normalizeQuery
(
query
)
,
'\b(?:PRE)?WHERE\s+(.*?)\s+(?:GROUP BY|ORDER BY|UNION|SETTINGS|FORMAT|$)'
)
)
AS
w
SELECT
w
AS
where_pattern
,
count
(
)
AS
frequency
FROM
system
.
query_log
WHERE
event_time
>=
now
(
)
-
toIntervalDay
(
{days}
)
AND
arrayExists
(
x
-
>
x
LIKE
'%{table}%'
,
tables
)
AND
query
ILIKE
'SELECT%'
AND
type
=
'QueryFinish'
GROUP
BY
w
ORDER
BY
frequency
DESC
LIMIT
20
Granule Selectivity from Query Log
SELECT
query_id
,
normalized_query_hash
,
selected_parts
,
selected_marks
,
read_rows
,
round
(
read_rows
/
nullIf
(
selected_marks
,
0
)
)
AS
rows_per_mark
,
query_duration_ms
,
formatReadableSize
(
read_bytes
)
AS
read_bytes
FROM
system
.
query_log
WHERE
event_time
>=
now
(
)
-
toIntervalDay
(
{days}
)
AND
arrayExists
(
x
-
>
x
LIKE
'%{table}%'
,
tables
)
AND
query
ILIKE
'SELECT%'
AND
type
=
'QueryFinish'
ORDER
BY
selected_marks
DESC
LIMIT
20
High
selected_marks
/ total marks = poor index utilization.
Analysis Workflow
Step 1: Check Current Indexes
-- Table structure with ORDER BY, PRIMARY KEY, indexes
SHOW
CREATE
TABLE
{
database
}
.
{
table
}
-- Skipping indexes
SELECT
name
,
type
,
expr
,
granularity
FROM
system
.
data_skipping_indices
WHERE
database
=
'{database}'
AND
table
=
'{table}'
Step 2: Extract Query Patterns
Run the WHERE column extraction and normalized pattern queries to understand:
Which columns appear most frequently in WHERE clauses
What condition combinations are common
Step 3: Check Column Cardinalities
Compare cardinalities of columns in:
Current ORDER BY key
Frequently filtered columns from Step 2
Step 4: Evaluate Index Alignment
Query Pattern
Index Support
Action
Filters on ORDER BY prefix
✅ Good
None
Filters on non-ORDER BY cols
⚠️ Skip index?
Add bloom_filter or projection
Time range + entity
⚠️ Check order
Time in ORDER BY or partition?
High-cardinality first in ORDER BY
❌ Bad
Reorder (low→high cardinality)
ORDER BY Design Guidelines
Column Order Principles
Lowest cardinality first
- maximizes granule skipping
Most frequently filtered
- columns in WHERE should be in ORDER BY
Time column considerations:
If most queries filter on time ranges → include in ORDER BY (possibly with lower resolution like
toDate(ts)
)
If partition key handles time filtering → may not need in ORDER BY
Common Anti-Patterns
Anti-Pattern
Problem
Fix
High-cardinality UUID first
No granule skipping
Move after low-cardinality columns
DateTime64 microseconds first
Too granular
Use
toDate()
or
toStartOfHour()
Column in WHERE not in ORDER BY
Full scan
Add to ORDER BY or create projection
Bloom filter on ORDER BY column
Redundant
Remove skip index
Time not in ORDER BY or partition
Range queries scan all
Add
toDate(ts)
to ORDER BY prefix
Cardinality Ordering Example
Given cardinalities:
entity_type
6
entity
18,588
cast_hash
335,620 Recommended ORDER BY: (entity_type, entity, cast_hash, ...) Skipping Index Guidelines When Skip Indexes Help Column NOT in ORDER BY Column values correlate with physical data order Low false-positive rate for the index type When Skip Indexes Don't Help Column already in ORDER BY prefix (use PRIMARY KEY instead) Column values randomly distributed (no correlation with ORDER BY) Very high cardinality with set/bloom_filter
返回排行榜