database-designer

安装量: 42
排名: #17288

安装

npx skills add https://github.com/borghei/claude-skills --skill database-designer
Database Designer - POWERFUL Tier Skill
Overview
A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.
Core Competencies
Schema Design & Analysis
Normalization Analysis
Automated detection of normalization levels (1NF through BCNF)
Denormalization Strategy
Smart recommendations for performance optimization
Data Type Optimization
Identification of inappropriate types and size issues
Constraint Analysis
Missing foreign keys, unique constraints, and null checks
Naming Convention Validation
Consistent table and column naming patterns
ERD Generation
Automatic Mermaid diagram creation from DDL
Index Optimization
Index Gap Analysis
Identification of missing indexes on foreign keys and query patterns
Composite Index Strategy
Optimal column ordering for multi-column indexes
Index Redundancy Detection
Elimination of overlapping and unused indexes
Performance Impact Modeling
Selectivity estimation and query cost analysis
Index Type Selection
B-tree, hash, partial, covering, and specialized indexes
Migration Management
Zero-Downtime Migrations
Expand-contract pattern implementation
Schema Evolution
Safe column additions, deletions, and type changes
Data Migration Scripts
Automated data transformation and validation
Rollback Strategy
Complete reversal capabilities with validation
Execution Planning
Ordered migration steps with dependency resolution
Database Design Principles
Normalization Forms
First Normal Form (1NF)
Atomic Values
Each column contains indivisible values
Unique Column Names
No duplicate column names within a table
Uniform Data Types
Each column contains the same type of data
Row Uniqueness
No duplicate rows in the table
Example Violation:
-- BAD: Multiple phone numbers in one column
CREATE
TABLE
contacts
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
,
phones
VARCHAR
(
200
)
-- "123-456-7890, 098-765-4321"
)
;
-- GOOD: Separate table for phone numbers
CREATE
TABLE
contacts
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
)
;
CREATE
TABLE
contact_phones
(
id
INT
PRIMARY
KEY
,
contact_id
INT
REFERENCES
contacts
(
id
)
,
phone_number
VARCHAR
(
20
)
,
phone_type
VARCHAR
(
10
)
)
;
Second Normal Form (2NF)
1NF Compliance
Must satisfy First Normal Form
Full Functional Dependency
Non-key attributes depend on the entire primary key
Partial Dependency Elimination
Remove attributes that depend on part of a composite key
Example Violation:
-- BAD: Student course table with partial dependencies
CREATE
TABLE
student_courses
(
student_id
INT
,
course_id
INT
,
student_name
VARCHAR
(
100
)
,
-- Depends only on student_id
course_name
VARCHAR
(
100
)
,
-- Depends only on course_id
grade
CHAR
(
1
)
,
PRIMARY
KEY
(
student_id
,
course_id
)
)
;
-- GOOD: Separate tables eliminate partial dependencies
CREATE
TABLE
students
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
)
;
CREATE
TABLE
courses
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
)
;
CREATE
TABLE
enrollments
(
student_id
INT
REFERENCES
students
(
id
)
,
course_id
INT
REFERENCES
courses
(
id
)
,
grade
CHAR
(
1
)
,
PRIMARY
KEY
(
student_id
,
course_id
)
)
;
Third Normal Form (3NF)
2NF Compliance
Must satisfy Second Normal Form
Transitive Dependency Elimination
Non-key attributes should not depend on other non-key attributes
Direct Dependency
Non-key attributes depend directly on the primary key
Example Violation:
-- BAD: Employee table with transitive dependency
CREATE
TABLE
employees
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
,
department_id
INT
,
department_name
VARCHAR
(
100
)
,
-- Depends on department_id, not employee id
department_budget
DECIMAL
(
10
,
2
)
-- Transitive dependency
)
;
-- GOOD: Separate department information
CREATE
TABLE
departments
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
,
budget
DECIMAL
(
10
,
2
)
)
;
CREATE
TABLE
employees
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
,
department_id
INT
REFERENCES
departments
(
id
)
)
;
Boyce-Codd Normal Form (BCNF)
3NF Compliance
Must satisfy Third Normal Form
Determinant Key Rule
Every determinant must be a candidate key
Stricter 3NF
Handles anomalies not covered by 3NF
Denormalization Strategies
When to Denormalize
Read-Heavy Workloads
High query frequency with acceptable write trade-offs
Performance Bottlenecks
Join operations causing significant latency
Aggregation Needs
Frequent calculation of derived values
Caching Requirements
Pre-computed results for common queries
Common Denormalization Patterns
Redundant Storage
-- Store calculated values to avoid expensive joins
CREATE
TABLE
orders
(
id
INT
PRIMARY
KEY
,
customer_id
INT
REFERENCES
customers
(
id
)
,
customer_name
VARCHAR
(
100
)
,
-- Denormalized from customers table
order_total
DECIMAL
(
10
,
2
)
,
-- Denormalized calculation
created_at
TIMESTAMP
)
;
Materialized Aggregates
-- Pre-computed summary tables
CREATE
TABLE
customer_statistics
(
customer_id
INT
PRIMARY
KEY
,
total_orders
INT
,
lifetime_value
DECIMAL
(
12
,
2
)
,
last_order_date
DATE
,
updated_at
TIMESTAMP
)
;
Index Optimization Strategies
B-Tree Indexes
Default Choice
Best for range queries, sorting, and equality matches
Column Order
Most selective columns first for composite indexes
Prefix Matching
Supports leading column subset queries
Maintenance Cost
Balanced tree structure with logarithmic operations
Hash Indexes
Equality Queries
Optimal for exact match lookups
Memory Efficiency
Constant-time access for single-value queries
Range Limitations
Cannot support range or partial matches
Use Cases
Primary keys, unique constraints, cache keys
Composite Indexes
-- Query pattern determines optimal column order
-- Query: WHERE status = 'active' AND created_date > '2023-01-01' ORDER BY priority DESC
CREATE
INDEX
idx_task_status_date_priority
ON
tasks
(
status
,
created_date
,
priority
DESC
)
;
-- Query: WHERE user_id = 123 AND category IN ('A', 'B') AND date_field BETWEEN '...' AND '...'
CREATE
INDEX
idx_user_category_date
ON
user_activities
(
user_id
,
category
,
date_field
)
;
Covering Indexes
-- Include additional columns to avoid table lookups
CREATE
INDEX
idx_user_email_covering
ON
users
(
email
)
INCLUDE
(
first_name
,
last_name
,
status
)
;
-- Query can be satisfied entirely from the index
-- SELECT first_name, last_name, status FROM users WHERE email = 'user@example.com';
Partial Indexes
-- Index only relevant subset of data
CREATE
INDEX
idx_active_users_email
ON
users
(
email
)
WHERE
status
=
'active'
;
-- Index for recent orders only
CREATE
INDEX
idx_recent_orders_customer
ON
orders
(
customer_id
,
created_at
)
WHERE
created_at
>
CURRENT_DATE
-
INTERVAL
'30 days'
;
Query Analysis & Optimization
Query Patterns Recognition
Equality Filters
Single-column B-tree indexes
Range Queries
B-tree with proper column ordering
Text Search
Full-text indexes or trigram indexes
Join Operations
Foreign key indexes on both sides
Sorting Requirements
Indexes matching ORDER BY clauses
Index Selection Algorithm
1. Identify WHERE clause columns
2. Determine most selective columns first
3. Consider JOIN conditions
4. Include ORDER BY columns if possible
5. Evaluate covering index opportunities
6. Check for existing overlapping indexes
Data Modeling Patterns
Star Schema (Data Warehousing)
-- Central fact table
CREATE
TABLE
sales_facts
(
sale_id
BIGINT
PRIMARY
KEY
,
product_id
INT
REFERENCES
products
(
id
)
,
customer_id
INT
REFERENCES
customers
(
id
)
,
date_id
INT
REFERENCES
date_dimension
(
id
)
,
store_id
INT
REFERENCES
stores
(
id
)
,
quantity
INT
,
unit_price
DECIMAL
(
8
,
2
)
,
total_amount
DECIMAL
(
10
,
2
)
)
;
-- Dimension tables
CREATE
TABLE
date_dimension
(
id
INT
PRIMARY
KEY
,
date_value
DATE
,
year
INT
,
quarter
INT
,
month
INT
,
day_of_week
INT
,
is_weekend
BOOLEAN
)
;
Snowflake Schema
-- Normalized dimension tables
CREATE
TABLE
products
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
200
)
,
category_id
INT
REFERENCES
product_categories
(
id
)
,
brand_id
INT
REFERENCES
brands
(
id
)
)
;
CREATE
TABLE
product_categories
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
,
parent_category_id
INT
REFERENCES
product_categories
(
id
)
)
;
Document Model (JSON Storage)
-- Flexible document storage with indexing
CREATE
TABLE
documents
(
id UUID
PRIMARY
KEY
,
document_type
VARCHAR
(
50
)
,
data
JSONB
,
created_at
TIMESTAMP
DEFAULT
NOW
(
)
,
updated_at
TIMESTAMP
DEFAULT
NOW
(
)
)
;
-- Index on JSON properties
CREATE
INDEX
idx_documents_user_id
ON
documents
USING
GIN
(
(
data
-
>>
'user_id'
)
)
;
CREATE
INDEX
idx_documents_status
ON
documents
(
(
data
-
>>
'status'
)
)
WHERE
document_type
=
'order'
;
Graph Data Patterns
-- Adjacency list for hierarchical data
CREATE
TABLE
categories
(
id
INT
PRIMARY
KEY
,
name
VARCHAR
(
100
)
,
parent_id
INT
REFERENCES
categories
(
id
)
,
level
INT
,
path
VARCHAR
(
500
)
-- Materialized path: "/1/5/12/"
)
;
-- Many-to-many relationships
CREATE
TABLE
relationships
(
id UUID
PRIMARY
KEY
,
from_entity_id UUID
,
to_entity_id UUID
,
relationship_type
VARCHAR
(
50
)
,
created_at
TIMESTAMP
,
INDEX
(
from_entity_id
,
relationship_type
)
,
INDEX
(
to_entity_id
,
relationship_type
)
)
;
Migration Strategies
Zero-Downtime Migration (Expand-Contract Pattern)
Phase 1: Expand
-- Add new column without constraints
ALTER
TABLE
users
ADD
COLUMN
new_email
VARCHAR
(
255
)
;
-- Backfill data in batches
UPDATE
users
SET
new_email
=
email
WHERE
id
BETWEEN
1
AND
1000
;
-- Continue in batches...
-- Add constraints after backfill
ALTER
TABLE
users
ADD
CONSTRAINT
users_new_email_unique
UNIQUE
(
new_email
)
;
ALTER
TABLE
users
ALTER
COLUMN
new_email
SET
NOT
NULL
;
Phase 2: Contract
-- Update application to use new column
-- Deploy application changes
-- Verify new column is being used
-- Remove old column
ALTER
TABLE
users
DROP
COLUMN
email
;
-- Rename new column
ALTER
TABLE
users
RENAME
COLUMN
new_email
TO
email
;
Data Type Changes
-- Safe string to integer conversion
ALTER
TABLE
products
ADD
COLUMN
sku_number
INTEGER
;
UPDATE
products
SET
sku_number
=
CAST
(
sku
AS
INTEGER
)
WHERE
sku
~
'^[0-9]+$'
;
-- Validate conversion success before dropping old column
Partitioning Strategies
Horizontal Partitioning (Sharding)
-- Range partitioning by date
CREATE
TABLE
sales_2023
PARTITION
OF
sales
FOR
VALUES
FROM
(
'2023-01-01'
)
TO
(
'2024-01-01'
)
;
CREATE
TABLE
sales_2024
PARTITION
OF
sales
FOR
VALUES
FROM
(
'2024-01-01'
)
TO
(
'2025-01-01'
)
;
-- Hash partitioning by user_id
CREATE
TABLE
user_data_0
PARTITION
OF
user_data
FOR
VALUES
WITH
(
MODULUS
4
,
REMAINDER
0
)
;
CREATE
TABLE
user_data_1
PARTITION
OF
user_data
FOR
VALUES
WITH
(
MODULUS
4
,
REMAINDER
1
)
;
Vertical Partitioning
-- Separate frequently accessed columns
CREATE
TABLE
users_core
(
id
INT
PRIMARY
KEY
,
email
VARCHAR
(
255
)
,
status
VARCHAR
(
20
)
,
created_at
TIMESTAMP
)
;
-- Less frequently accessed profile data
CREATE
TABLE
users_profile
(
user_id
INT
PRIMARY
KEY
REFERENCES
users_core
(
id
)
,
bio
TEXT
,
preferences JSONB
,
last_login
TIMESTAMP
)
;
Connection Management
Connection Pooling
Pool Size
CPU cores × 2 + effective spindle count
Connection Lifetime
Rotate connections to prevent resource leaks
Timeout Settings
Connection, idle, and query timeouts
Health Checks
Regular connection validation Read Replicas Strategy -- Write queries to primary INSERT INTO users ( email , name ) VALUES ( 'user@example.com' , 'John Doe' ) ; -- Read queries to replicas (with appropriate read preference) SELECT * FROM users WHERE status = 'active' ; -- Route to read replica -- Consistent reads when required SELECT * FROM users WHERE id = LAST_INSERT_ID ( ) ; -- Route to primary Caching Layers Cache-Aside Pattern def get_user ( user_id ) :

Try cache first

user

cache . get ( f"user: { user_id } " ) if user is None :

Cache miss - query database

user

db . query ( "SELECT * FROM users WHERE id = %s" , user_id )

Store in cache

cache
.
set
(
f"user:
{
user_id
}
"
,
user
,
ttl
=
3600
)
return
user
Write-Through Cache
Consistency
Always keep cache and database in sync
Write Latency
Higher due to dual writes
Data Safety
No data loss on cache failures
Cache Invalidation Strategies
TTL-Based
Time-based expiration
Event-Driven
Invalidate on data changes
Version-Based
Use version numbers for consistency
Tag-Based
Group related cache entries
Database Selection Guide
SQL Databases
PostgreSQL
Strengths
ACID compliance, complex queries, JSON support, extensibility
Use Cases
OLTP applications, data warehousing, geospatial data
Scale
Vertical scaling with read replicas
MySQL
Strengths
Performance, replication, wide ecosystem support
Use Cases
Web applications, content management, e-commerce
Scale
Horizontal scaling through sharding
NoSQL Databases
Document Stores (MongoDB, CouchDB)
Strengths
Flexible schema, horizontal scaling, developer productivity
Use Cases
Content management, catalogs, user profiles
Trade-offs
Eventual consistency, complex queries limitations
Key-Value Stores (Redis, DynamoDB)
Strengths
High performance, simple model, excellent caching
Use Cases
Session storage, real-time analytics, gaming leaderboards
Trade-offs
Limited query capabilities, data modeling constraints
Column-Family (Cassandra, HBase)
Strengths
Write-heavy workloads, linear scalability, fault tolerance
Use Cases
Time-series data, IoT applications, messaging systems
Trade-offs
Query flexibility, consistency model complexity
Graph Databases (Neo4j, Amazon Neptune)
Strengths
Relationship queries, pattern matching, recommendation engines
Use Cases
Social networks, fraud detection, knowledge graphs
Trade-offs
Specialized use cases, learning curve
NewSQL Databases
Distributed SQL (CockroachDB, TiDB, Spanner)
Strengths
SQL compatibility with horizontal scaling
Use Cases
Global applications requiring ACID guarantees
Trade-offs
Complexity, latency for distributed transactions
Tools & Scripts
Schema Analyzer
Input
SQL DDL files, JSON schema definitions
Analysis
Normalization compliance, constraint validation, naming conventions
Output
Analysis report, Mermaid ERD, improvement recommendations
Index Optimizer
Input
Schema definition, query patterns
Analysis
Missing indexes, redundancy detection, selectivity estimation
Output
Index recommendations, CREATE INDEX statements, performance projections
Migration Generator
Input
Current and target schemas
Analysis
Schema differences, dependency resolution, risk assessment
Output
Migration scripts, rollback plans, validation queries
Best Practices
Schema Design
Use meaningful names
Clear, consistent naming conventions
Choose appropriate data types
Right-sized columns for storage efficiency
Define proper constraints
Foreign keys, check constraints, unique indexes
Consider future growth
Plan for scale from the beginning
Document relationships
Clear foreign key relationships and business rules
Performance Optimization
Index strategically
Cover common query patterns without over-indexing
Monitor query performance
Regular analysis of slow queries
Partition large tables
Improve query performance and maintenance
Use appropriate isolation levels
Balance consistency with performance
Implement connection pooling
Efficient resource utilization
Security Considerations
Principle of least privilege
Grant minimal necessary permissions
Encrypt sensitive data
At rest and in transit
Audit access patterns
Monitor and log database access
Validate inputs
Prevent SQL injection attacks
Regular security updates
Keep database software current Conclusion Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution. The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.
返回排行榜