# find-hypertable-candidates

Installs: 39
Rank: #18466

## Install

```bash
npx skills add https://github.com/timescale/pg-aiguide --skill find-hypertable-candidates
```

## PostgreSQL Hypertable Candidate Analysis

Identify tables that would benefit from conversion to TimescaleDB hypertables. After identification, use the companion "migrate-postgres-tables-to-hypertables" skill for configuration and migration.

### TimescaleDB Benefits

Performance gains: 90%+ compression, fast time-based queries, improved insert performance, efficient aggregations, continuous aggregates for materialization (dashboards, reports, analytics), and automatic data management (retention, compression).

Best for insert-heavy patterns:

- Time-series data (sensors, metrics, monitoring)
- Event logs (user events, audit trails, application logs)
- Transaction records (orders, payments, financial)
- Sequential data (auto-incrementing IDs with timestamps)
- Append-only datasets (immutable records, historical)

Requirements: large volumes (1M+ rows), time-based queries, infrequent updates.

### Step 1: Database Schema Analysis

#### Option A: From Database Connection

**Table statistics and size**

```sql
-- Get all tables with row counts and insert/update patterns
WITH table_stats AS (
    SELECT schemaname,
           tablename,
           n_tup_ins AS total_inserts,
           n_tup_upd AS total_updates,
           n_tup_del AS total_deletes,
           n_live_tup AS live_rows,
           n_dead_tup AS dead_rows
    FROM pg_stat_user_tables
),
table_sizes AS (
    SELECT schemaname,
           tablename,
           pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
           pg_total_relation_size(schemaname || '.' || tablename) AS total_size_bytes
    FROM pg_tables
    WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
)
SELECT ts.schemaname,
       ts.tablename,
       ts.live_rows,
       tsize.total_size,
       tsize.total_size_bytes,
       ts.total_inserts,
       ts.total_updates,
       ts.total_deletes,
       ROUND(CASE WHEN ts.live_rows > 0
                  THEN (ts.total_inserts::numeric / ts.live_rows) * 100
                  ELSE 0 END, 2) AS insert_ratio_pct
FROM table_stats ts
JOIN table_sizes tsize
  ON ts.schemaname = tsize.schemaname
 AND ts.tablename = tsize.tablename
ORDER BY tsize.total_size_bytes DESC;
```

Look for:

- Mostly insert-heavy patterns (few updates/deletes)
- Large tables (1M+ rows or 100MB+)

**Index patterns**

```sql
-- Identify common query dimensions
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
ORDER BY tablename, indexname;
```

Look for:

- Multiple indexes on timestamp/created_at columns → time-based queries
- Composite (entity_id, timestamp) indexes → good candidates
- Time-only indexes → time-range filtering is common

**Query patterns (if pg_stat_statements is available)**

```sql
-- Check availability
SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements');

-- Analyze expensive queries for candidate tables
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%your_table_name%'
ORDER BY total_exec_time DESC
LIMIT 20;
```

✅ Good patterns: time-based WHERE clauses, entity filtering combined with time-based qualifiers, GROUP BY time_bucket, range queries over time

❌ Poor patterns: non-time lookups with no time-based qualifier in the same query (WHERE email = ...)

**Constraints**

```sql
-- Check migration compatibility
SELECT conname, contype, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'your_table_name'::regclass;
```

Compatibility (a primary-key helper query follows this list):

- Primary keys (p): must include the partition column, or ask the user whether the key can be modified
- Foreign keys (f): plain→hypertable and hypertable→plain are OK; hypertable→hypertable is NOT supported
- Unique constraints (u): must include the partition column, or ask the user whether the constraint can be modified
- Check constraints (c): usually OK
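To apply the primary-key rule above, here is a minimal sketch that lists a table's primary-key columns so you can confirm the intended partition column appears among them. `your_table_name` is a placeholder, as in the queries above; the query uses only standard PostgreSQL catalogs:

```sql
-- List primary-key columns for a candidate table; for hypertable
-- conversion the partition column must appear in this list.
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum = ANY (i.indkey)
WHERE i.indrelid = 'your_table_name'::regclass
  AND i.indisprimary;
```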

#### Option B: From Code Analysis

**✅ GOOD Patterns**

**Append-only logging**

```sql
INSERT INTO events (user_id, event_time, data) VALUES (...);
```

**Time-series collection**

```sql
INSERT INTO metrics (device_id, timestamp, value) VALUES (...);
```

**Time-based queries**

```sql
SELECT * FROM metrics
WHERE timestamp >= NOW() - INTERVAL '24 hours';
```

**Time aggregations**

```sql
SELECT DATE_TRUNC('day', timestamp), COUNT(*)
FROM metrics
GROUP BY 1;
```
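Step 1 lists GROUP BY time_bucket as a good query pattern, and the Benefits section mentions continuous aggregates; both are sketched below against the `metrics` table from the examples above. This assumes `metrics` has already been converted to a hypertable, and the view name `metrics_daily` is hypothetical:

```sql
-- Daily rollup with TimescaleDB's time_bucket, the hypertable-friendly
-- counterpart of the DATE_TRUNC query above.
SELECT time_bucket('1 day', timestamp) AS bucket, COUNT(*)
FROM metrics
GROUP BY bucket;

-- The same rollup maintained incrementally as a continuous aggregate.
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', timestamp) AS bucket, COUNT(*) AS row_count
FROM metrics
GROUP BY bucket;
```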

**❌ POOR Patterns**

**Frequent updates to historical records**

```sql
UPDATE users SET email = ..., updated_at = NOW() WHERE id = ...;
```

**Non-time lookups**

```sql
SELECT * FROM users WHERE email = ...;
```

**Small reference tables**

```sql
SELECT * FROM countries ORDER BY name;
```

**Schema Indicators**

✅ GOOD:

- Has a timestamp/timestamptz column
- Multiple indexes on timestamp-based columns
- Composite (entity_id, timestamp) indexes

❌ POOR:

- Mostly indexes on non-time columns (email, name, status, etc.)
- Columns you expect to be updated over time (updated_at, updated_by, status, etc.)
- Unique constraints on non-time fields
- Frequent updated_at modifications
- Small static tables

**Special Case: ID-Based Tables**

Sequential-ID tables can be candidates if:

- The pattern is insert-mostly: updates are infrequent or touch only recent records (such as an order status moving ordered → processing → delivered; once an order is delivered, it is unlikely to be updated again)
- IDs correlate with time, as with serial/auto-incrementing IDs and GENERATED ALWAYS AS IDENTITY (a quick correlation check is sketched at the end of this guide)
- The ID is the primary query dimension
- Recent data is accessed more often (frequently the case in e-commerce, finance, etc.)
- Time-based reporting is common (e.g. monthly or daily summaries/analytics)

```sql
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,             -- Can partition by ID
    user_id BIGINT,
    created_at TIMESTAMPTZ DEFAULT NOW()  -- For sparse indexes
);
```

Note: for ID-based tables that also have a time column (created_at, ordered_at, etc.), you can partition by ID and use sparse indexes on the time column. See the migrate-postgres-tables-to-hypertables skill for details.

### Step 2: Candidacy Scoring (8+ points = good candidate)

**Time-Series Characteristics (5+ points needed)**

- Has a timestamp/timestamptz column: 3 points
- Data inserted chronologically: 2 points
- Queries filter by time: 2 points
- Time aggregations common: 2 points

**Scale & Performance (3+ points recommended)**

- Large table (1M+ rows or 100MB+): 2 points
- High insert volume: 1 point
- Infrequent updates to historical data: 1 point
- Range queries common: 1 point
- Aggregation queries: 2 points

**Data Patterns (bonus)**

- Contains an entity ID for segmentation (device_id, user_id, product_id, symbol, etc.): 1 point
- Numeric measurements: 1 point
- Log/event structure: 1 point

For example, the sensor_readings table below scores 3 + 2 + 2 + 2 = 9 on time-series characteristics alone, clearing the 8-point bar before scale and bonus points are counted.

### Common Patterns

**✅ GOOD Candidates**

✅ Event/Log Tables (user_events, audit_logs)

```sql
CREATE TABLE user_events (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT,
    event_type TEXT,
    event_time TIMESTAMPTZ DEFAULT NOW(),
    metadata JSONB
);
-- Partition by id, segment by user_id, enable minmax sparse_index on event_time
```

✅ Sensor/IoT Data (sensor_readings, telemetry)

```sql
CREATE TABLE sensor_readings (
    device_id TEXT,
    timestamp TIMESTAMPTZ,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION
);
-- Partition by timestamp, segment by device_id, minmax sparse indexes on temperature and humidity
```

✅ Financial/Trading (stock_prices, transactions)

```sql
CREATE TABLE stock_prices (
    symbol VARCHAR(10),
    price_time TIMESTAMPTZ,
    open_price DECIMAL,
    close_price DECIMAL,
    volume BIGINT
);
-- Partition by price_time, segment by symbol, minmax sparse indexes on open_price, close_price, and volume
```

✅ System Metrics (monitoring_data)

```sql
CREATE TABLE system_metrics (
    hostname TEXT,
    metric_time TIMESTAMPTZ,
    cpu_usage DOUBLE PRECISION,
    memory_usage BIGINT
);
-- Partition by metric_time, segment by hostname, minmax sparse indexes on cpu_usage and memory_usage
```

**❌ POOR Candidates**

❌ Reference Tables (countries, categories)

```sql
CREATE TABLE countries (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    code CHAR(2)
);
-- Static data, no time component
```

❌ User Profiles (users, accounts)

```sql
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255),
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ
);
-- Accessed by ID, frequently updated; has a timestamp, but the primary
-- query dimension is id or email
```

❌ Settings/Config (user_settings)

```sql
CREATE TABLE user_settings (
    user_id BIGINT PRIMARY KEY,
    theme VARCHAR(20),       -- Changes: light -> dark -> auto
    language VARCHAR(10),    -- Changes: en -> es -> fr
    notifications JSONB,     -- Frequent preference updates
    updated_at TIMESTAMPTZ
);
-- Accessed by user_id, frequently updated; has a timestamp, but the primary
-- query dimension is user_id
```

### Analysis Output Requirements

For each candidate table provide:

- Score: based on the criteria above (8+ = strong candidate)
- Pattern: insert vs. update ratio
- Access: time-based vs. entity lookups
- Size: current size and growth rate
- Queries: time-range, aggregations, point lookups

Focus on insert-heavy patterns with time-based or sequential access. Tables scoring 8+ points are strong candidates for conversion.
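Two illustrative sketches to close out. First, the correlation check referenced in the Special Case section: `corr` is a standard PostgreSQL aggregate, and a coefficient near 1.0 suggests sequential IDs track insertion time closely (the `orders` table is the example from above):

```sql
-- Sanity-check that IDs correlate with time before partitioning by ID.
SELECT corr(id::double precision,
            extract(epoch FROM created_at)::double precision) AS id_time_corr
FROM orders;
```

Second, a hypothetical per-table summary in the required output format; every number here is invented for illustration:

```
Table: public.sensor_readings
Score: 12 (strong candidate)
Pattern: ~99% inserts, <1% updates/deletes
Access: time-range filters, segmented by device_id
Size: 4.2 GB, growing ~250 MB/month
Queries: daily aggregations, 24-hour range scans, few point lookups
```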
