- Database Designer - POWERFUL Tier Skill
- Overview
- A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.
- Core Competencies
- Schema Design & Analysis
- Normalization Analysis
-
- Automated detection of normalization levels (1NF through BCNF)
- Denormalization Strategy
-
- Smart recommendations for performance optimization
- Data Type Optimization
-
- Identification of inappropriate types and size issues
- Constraint Analysis
-
- Missing foreign keys, unique constraints, and null checks
- Naming Convention Validation
-
- Consistent table and column naming patterns
- ERD Generation
-
- Automatic Mermaid diagram creation from DDL
- Index Optimization
- Index Gap Analysis
-
- Identification of missing indexes on foreign keys and query patterns
- Composite Index Strategy
-
- Optimal column ordering for multi-column indexes
- Index Redundancy Detection
-
- Elimination of overlapping and unused indexes
- Performance Impact Modeling
-
- Selectivity estimation and query cost analysis
- Index Type Selection
-
- B-tree, hash, partial, covering, and specialized indexes
- Migration Management
- Zero-Downtime Migrations
-
- Expand-contract pattern implementation
- Schema Evolution
-
- Safe column additions, deletions, and type changes
- Data Migration Scripts
-
- Automated data transformation and validation
- Rollback Strategy
-
- Complete reversal capabilities with validation
- Execution Planning
-
- Ordered migration steps with dependency resolution
- Database Design Principles
- Normalization Forms
- First Normal Form (1NF)
- Atomic Values
-
- Each column contains indivisible values
- Unique Column Names
-
- No duplicate column names within a table
- Uniform Data Types
-
- Each column contains the same type of data
- Row Uniqueness
-
- No duplicate rows in the table
- Example Violation:
- -- BAD: Multiple phone numbers in one column
- CREATE
- TABLE
- contacts
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- ,
- phones
- VARCHAR
- (
- 200
- )
- -- "123-456-7890, 098-765-4321"
- )
- ;
- -- GOOD: Separate table for phone numbers
- CREATE
- TABLE
- contacts
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- )
- ;
- CREATE
- TABLE
- contact_phones
- (
- id
- INT
- PRIMARY
- KEY
- ,
- contact_id
- INT
- REFERENCES
- contacts
- (
- id
- )
- ,
- phone_number
- VARCHAR
- (
- 20
- )
- ,
- phone_type
- VARCHAR
- (
- 10
- )
- )
- ;
- Second Normal Form (2NF)
- 1NF Compliance
-
- Must satisfy First Normal Form
- Full Functional Dependency
-
- Non-key attributes depend on the entire primary key
- Partial Dependency Elimination
-
- Remove attributes that depend on part of a composite key
- Example Violation:
- -- BAD: Student course table with partial dependencies
- CREATE
- TABLE
- student_courses
- (
- student_id
- INT
- ,
- course_id
- INT
- ,
- student_name
- VARCHAR
- (
- 100
- )
- ,
- -- Depends only on student_id
- course_name
- VARCHAR
- (
- 100
- )
- ,
- -- Depends only on course_id
- grade
- CHAR
- (
- 1
- )
- ,
- PRIMARY
- KEY
- (
- student_id
- ,
- course_id
- )
- )
- ;
- -- GOOD: Separate tables eliminate partial dependencies
- CREATE
- TABLE
- students
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- )
- ;
- CREATE
- TABLE
- courses
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- )
- ;
- CREATE
- TABLE
- enrollments
- (
- student_id
- INT
- REFERENCES
- students
- (
- id
- )
- ,
- course_id
- INT
- REFERENCES
- courses
- (
- id
- )
- ,
- grade
- CHAR
- (
- 1
- )
- ,
- PRIMARY
- KEY
- (
- student_id
- ,
- course_id
- )
- )
- ;
- Third Normal Form (3NF)
- 2NF Compliance
-
- Must satisfy Second Normal Form
- Transitive Dependency Elimination
-
- Non-key attributes should not depend on other non-key attributes
- Direct Dependency
-
- Non-key attributes depend directly on the primary key
- Example Violation:
- -- BAD: Employee table with transitive dependency
- CREATE
- TABLE
- employees
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- ,
- department_id
- INT
- ,
- department_name
- VARCHAR
- (
- 100
- )
- ,
- -- Depends on department_id, not employee id
- department_budget
- DECIMAL
- (
- 10
- ,
- 2
- )
- -- Transitive dependency
- )
- ;
- -- GOOD: Separate department information
- CREATE
- TABLE
- departments
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- ,
- budget
- DECIMAL
- (
- 10
- ,
- 2
- )
- )
- ;
- CREATE
- TABLE
- employees
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- ,
- department_id
- INT
- REFERENCES
- departments
- (
- id
- )
- )
- ;
- Boyce-Codd Normal Form (BCNF)
- 3NF Compliance
-
- Must satisfy Third Normal Form
- Determinant Key Rule
-
- Every determinant must be a candidate key
- Stricter 3NF
-
- Handles anomalies not covered by 3NF
- Denormalization Strategies
- When to Denormalize
- Read-Heavy Workloads
-
- High query frequency with acceptable write trade-offs
- Performance Bottlenecks
-
- Join operations causing significant latency
- Aggregation Needs
-
- Frequent calculation of derived values
- Caching Requirements
-
- Pre-computed results for common queries
- Common Denormalization Patterns
- Redundant Storage
- -- Store calculated values to avoid expensive joins
- CREATE
- TABLE
- orders
- (
- id
- INT
- PRIMARY
- KEY
- ,
- customer_id
- INT
- REFERENCES
- customers
- (
- id
- )
- ,
- customer_name
- VARCHAR
- (
- 100
- )
- ,
- -- Denormalized from customers table
- order_total
- DECIMAL
- (
- 10
- ,
- 2
- )
- ,
- -- Denormalized calculation
- created_at
- TIMESTAMP
- )
- ;
- Materialized Aggregates
- -- Pre-computed summary tables
- CREATE
- TABLE
- customer_statistics
- (
- customer_id
- INT
- PRIMARY
- KEY
- ,
- total_orders
- INT
- ,
- lifetime_value
- DECIMAL
- (
- 12
- ,
- 2
- )
- ,
- last_order_date
- DATE
- ,
- updated_at
- TIMESTAMP
- )
- ;
- Index Optimization Strategies
- B-Tree Indexes
- Default Choice
-
- Best for range queries, sorting, and equality matches
- Column Order
-
- Most selective columns first for composite indexes
- Prefix Matching
-
- Supports leading column subset queries
- Maintenance Cost
-
- Balanced tree structure with logarithmic operations
- Hash Indexes
- Equality Queries
-
- Optimal for exact match lookups
- Memory Efficiency
-
- Constant-time access for single-value queries
- Range Limitations
-
- Cannot support range or partial matches
- Use Cases
-
- Primary keys, unique constraints, cache keys
- Composite Indexes
- -- Query pattern determines optimal column order
- -- Query: WHERE status = 'active' AND created_date > '2023-01-01' ORDER BY priority DESC
- CREATE
- INDEX
- idx_task_status_date_priority
- ON
- tasks
- (
- status
- ,
- created_date
- ,
- priority
- DESC
- )
- ;
- -- Query: WHERE user_id = 123 AND category IN ('A', 'B') AND date_field BETWEEN '...' AND '...'
- CREATE
- INDEX
- idx_user_category_date
- ON
- user_activities
- (
- user_id
- ,
- category
- ,
- date_field
- )
- ;
- Covering Indexes
- -- Include additional columns to avoid table lookups
- CREATE
- INDEX
- idx_user_email_covering
- ON
- users
- (
- )
- INCLUDE
- (
- first_name
- ,
- last_name
- ,
- status
- )
- ;
- -- Query can be satisfied entirely from the index
- -- SELECT first_name, last_name, status FROM users WHERE email = 'user@example.com';
- Partial Indexes
- -- Index only relevant subset of data
- CREATE
- INDEX
- idx_active_users_email
- ON
- users
- (
- )
- WHERE
- status
- =
- 'active'
- ;
- -- Index for recent orders only
- CREATE
- INDEX
- idx_recent_orders_customer
- ON
- orders
- (
- customer_id
- ,
- created_at
- )
- WHERE
- created_at
- >
- CURRENT_DATE
- -
- INTERVAL
- '30 days'
- ;
- Query Analysis & Optimization
- Query Patterns Recognition
- Equality Filters
-
- Single-column B-tree indexes
- Range Queries
-
- B-tree with proper column ordering
- Text Search
-
- Full-text indexes or trigram indexes
- Join Operations
-
- Foreign key indexes on both sides
- Sorting Requirements
-
- Indexes matching ORDER BY clauses
- Index Selection Algorithm
- 1. Identify WHERE clause columns
- 2. Determine most selective columns first
- 3. Consider JOIN conditions
- 4. Include ORDER BY columns if possible
- 5. Evaluate covering index opportunities
- 6. Check for existing overlapping indexes
- Data Modeling Patterns
- Star Schema (Data Warehousing)
- -- Central fact table
- CREATE
- TABLE
- sales_facts
- (
- sale_id
- BIGINT
- PRIMARY
- KEY
- ,
- product_id
- INT
- REFERENCES
- products
- (
- id
- )
- ,
- customer_id
- INT
- REFERENCES
- customers
- (
- id
- )
- ,
- date_id
- INT
- REFERENCES
- date_dimension
- (
- id
- )
- ,
- store_id
- INT
- REFERENCES
- stores
- (
- id
- )
- ,
- quantity
- INT
- ,
- unit_price
- DECIMAL
- (
- 8
- ,
- 2
- )
- ,
- total_amount
- DECIMAL
- (
- 10
- ,
- 2
- )
- )
- ;
- -- Dimension tables
- CREATE
- TABLE
- date_dimension
- (
- id
- INT
- PRIMARY
- KEY
- ,
- date_value
- DATE
- ,
- year
- INT
- ,
- quarter
- INT
- ,
- month
- INT
- ,
- day_of_week
- INT
- ,
- is_weekend
- BOOLEAN
- )
- ;
- Snowflake Schema
- -- Normalized dimension tables
- CREATE
- TABLE
- products
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 200
- )
- ,
- category_id
- INT
- REFERENCES
- product_categories
- (
- id
- )
- ,
- brand_id
- INT
- REFERENCES
- brands
- (
- id
- )
- )
- ;
- CREATE
- TABLE
- product_categories
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- ,
- parent_category_id
- INT
- REFERENCES
- product_categories
- (
- id
- )
- )
- ;
- Document Model (JSON Storage)
- -- Flexible document storage with indexing
- CREATE
- TABLE
- documents
- (
- id UUID
- PRIMARY
- KEY
- ,
- document_type
- VARCHAR
- (
- 50
- )
- ,
- data
- JSONB
- ,
- created_at
- TIMESTAMP
- DEFAULT
- NOW
- (
- )
- ,
- updated_at
- TIMESTAMP
- DEFAULT
- NOW
- (
- )
- )
- ;
- -- Index on JSON properties
- CREATE
- INDEX
- idx_documents_user_id
- ON
- documents
- USING
- GIN
- (
- (
- data
- -
- >>
- 'user_id'
- )
- )
- ;
- CREATE
- INDEX
- idx_documents_status
- ON
- documents
- (
- (
- data
- -
- >>
- 'status'
- )
- )
- WHERE
- document_type
- =
- 'order'
- ;
- Graph Data Patterns
- -- Adjacency list for hierarchical data
- CREATE
- TABLE
- categories
- (
- id
- INT
- PRIMARY
- KEY
- ,
- name
- VARCHAR
- (
- 100
- )
- ,
- parent_id
- INT
- REFERENCES
- categories
- (
- id
- )
- ,
- level
- INT
- ,
- path
- VARCHAR
- (
- 500
- )
- -- Materialized path: "/1/5/12/"
- )
- ;
- -- Many-to-many relationships
- CREATE
- TABLE
- relationships
- (
- id UUID
- PRIMARY
- KEY
- ,
- from_entity_id UUID
- ,
- to_entity_id UUID
- ,
- relationship_type
- VARCHAR
- (
- 50
- )
- ,
- created_at
- TIMESTAMP
- ,
- INDEX
- (
- from_entity_id
- ,
- relationship_type
- )
- ,
- INDEX
- (
- to_entity_id
- ,
- relationship_type
- )
- )
- ;
- Migration Strategies
- Zero-Downtime Migration (Expand-Contract Pattern)
- Phase 1: Expand
- -- Add new column without constraints
- ALTER
- TABLE
- users
- ADD
- COLUMN
- new_email
- VARCHAR
- (
- 255
- )
- ;
- -- Backfill data in batches
- UPDATE
- users
- SET
- new_email
- =
- WHERE
- id
- BETWEEN
- 1
- AND
- 1000
- ;
- -- Continue in batches...
- -- Add constraints after backfill
- ALTER
- TABLE
- users
- ADD
- CONSTRAINT
- users_new_email_unique
- UNIQUE
- (
- new_email
- )
- ;
- ALTER
- TABLE
- users
- ALTER
- COLUMN
- new_email
- SET
- NOT
- NULL
- ;
- Phase 2: Contract
- -- Update application to use new column
- -- Deploy application changes
- -- Verify new column is being used
- -- Remove old column
- ALTER
- TABLE
- users
- DROP
- COLUMN
- ;
- -- Rename new column
- ALTER
- TABLE
- users
- RENAME
- COLUMN
- new_email
- TO
- ;
- Data Type Changes
- -- Safe string to integer conversion
- ALTER
- TABLE
- products
- ADD
- COLUMN
- sku_number
- INTEGER
- ;
- UPDATE
- products
- SET
- sku_number
- =
- CAST
- (
- sku
- AS
- INTEGER
- )
- WHERE
- sku
- ~
- '^[0-9]+$'
- ;
- -- Validate conversion success before dropping old column
- Partitioning Strategies
- Horizontal Partitioning (Sharding)
- -- Range partitioning by date
- CREATE
- TABLE
- sales_2023
- PARTITION
- OF
- sales
- FOR
- VALUES
- FROM
- (
- '2023-01-01'
- )
- TO
- (
- '2024-01-01'
- )
- ;
- CREATE
- TABLE
- sales_2024
- PARTITION
- OF
- sales
- FOR
- VALUES
- FROM
- (
- '2024-01-01'
- )
- TO
- (
- '2025-01-01'
- )
- ;
- -- Hash partitioning by user_id
- CREATE
- TABLE
- user_data_0
- PARTITION
- OF
- user_data
- FOR
- VALUES
- WITH
- (
- MODULUS
- 4
- ,
- REMAINDER
- 0
- )
- ;
- CREATE
- TABLE
- user_data_1
- PARTITION
- OF
- user_data
- FOR
- VALUES
- WITH
- (
- MODULUS
- 4
- ,
- REMAINDER
- 1
- )
- ;
- Vertical Partitioning
- -- Separate frequently accessed columns
- CREATE
- TABLE
- users_core
- (
- id
- INT
- PRIMARY
- KEY
- ,
- VARCHAR
- (
- 255
- )
- ,
- status
- VARCHAR
- (
- 20
- )
- ,
- created_at
- TIMESTAMP
- )
- ;
- -- Less frequently accessed profile data
- CREATE
- TABLE
- users_profile
- (
- user_id
- INT
- PRIMARY
- KEY
- REFERENCES
- users_core
- (
- id
- )
- ,
- bio
- TEXT
- ,
- preferences JSONB
- ,
- last_login
- TIMESTAMP
- )
- ;
- Connection Management
- Connection Pooling
- Pool Size
-
- CPU cores × 2 + effective spindle count
- Connection Lifetime
-
- Rotate connections to prevent resource leaks
- Timeout Settings
-
- Connection, idle, and query timeouts
- Health Checks
- Regular connection validation Read Replicas Strategy -- Write queries to primary INSERT INTO users ( email , name ) VALUES ( 'user@example.com' , 'John Doe' ) ; -- Read queries to replicas (with appropriate read preference) SELECT * FROM users WHERE status = 'active' ; -- Route to read replica -- Consistent reads when required SELECT * FROM users WHERE id = LAST_INSERT_ID ( ) ; -- Route to primary Caching Layers Cache-Aside Pattern def get_user ( user_id ) :
Try cache first
user
cache . get ( f"user: { user_id } " ) if user is None :
Cache miss - query database
user
db . query ( "SELECT * FROM users WHERE id = %s" , user_id )
Store in cache
- cache
- .
- set
- (
- f"user:
- {
- user_id
- }
- "
- ,
- user
- ,
- ttl
- =
- 3600
- )
- return
- user
- Write-Through Cache
- Consistency
-
- Always keep cache and database in sync
- Write Latency
-
- Higher due to dual writes
- Data Safety
-
- No data loss on cache failures
- Cache Invalidation Strategies
- TTL-Based
-
- Time-based expiration
- Event-Driven
-
- Invalidate on data changes
- Version-Based
-
- Use version numbers for consistency
- Tag-Based
-
- Group related cache entries
- Database Selection Guide
- SQL Databases
- PostgreSQL
- Strengths
-
- ACID compliance, complex queries, JSON support, extensibility
- Use Cases
-
- OLTP applications, data warehousing, geospatial data
- Scale
-
- Vertical scaling with read replicas
- MySQL
- Strengths
-
- Performance, replication, wide ecosystem support
- Use Cases
-
- Web applications, content management, e-commerce
- Scale
-
- Horizontal scaling through sharding
- NoSQL Databases
- Document Stores (MongoDB, CouchDB)
- Strengths
-
- Flexible schema, horizontal scaling, developer productivity
- Use Cases
-
- Content management, catalogs, user profiles
- Trade-offs
-
- Eventual consistency, complex queries limitations
- Key-Value Stores (Redis, DynamoDB)
- Strengths
-
- High performance, simple model, excellent caching
- Use Cases
-
- Session storage, real-time analytics, gaming leaderboards
- Trade-offs
-
- Limited query capabilities, data modeling constraints
- Column-Family (Cassandra, HBase)
- Strengths
-
- Write-heavy workloads, linear scalability, fault tolerance
- Use Cases
-
- Time-series data, IoT applications, messaging systems
- Trade-offs
-
- Query flexibility, consistency model complexity
- Graph Databases (Neo4j, Amazon Neptune)
- Strengths
-
- Relationship queries, pattern matching, recommendation engines
- Use Cases
-
- Social networks, fraud detection, knowledge graphs
- Trade-offs
-
- Specialized use cases, learning curve
- NewSQL Databases
- Distributed SQL (CockroachDB, TiDB, Spanner)
- Strengths
-
- SQL compatibility with horizontal scaling
- Use Cases
-
- Global applications requiring ACID guarantees
- Trade-offs
-
- Complexity, latency for distributed transactions
- Tools & Scripts
- Schema Analyzer
- Input
-
- SQL DDL files, JSON schema definitions
- Analysis
-
- Normalization compliance, constraint validation, naming conventions
- Output
-
- Analysis report, Mermaid ERD, improvement recommendations
- Index Optimizer
- Input
-
- Schema definition, query patterns
- Analysis
-
- Missing indexes, redundancy detection, selectivity estimation
- Output
-
- Index recommendations, CREATE INDEX statements, performance projections
- Migration Generator
- Input
-
- Current and target schemas
- Analysis
-
- Schema differences, dependency resolution, risk assessment
- Output
-
- Migration scripts, rollback plans, validation queries
- Best Practices
- Schema Design
- Use meaningful names
-
- Clear, consistent naming conventions
- Choose appropriate data types
-
- Right-sized columns for storage efficiency
- Define proper constraints
-
- Foreign keys, check constraints, unique indexes
- Consider future growth
-
- Plan for scale from the beginning
- Document relationships
-
- Clear foreign key relationships and business rules
- Performance Optimization
- Index strategically
-
- Cover common query patterns without over-indexing
- Monitor query performance
-
- Regular analysis of slow queries
- Partition large tables
-
- Improve query performance and maintenance
- Use appropriate isolation levels
-
- Balance consistency with performance
- Implement connection pooling
-
- Efficient resource utilization
- Security Considerations
- Principle of least privilege
-
- Grant minimal necessary permissions
- Encrypt sensitive data
-
- At rest and in transit
- Audit access patterns
-
- Monitor and log database access
- Validate inputs
-
- Prevent SQL injection attacks
- Regular security updates
- Keep database software current Conclusion Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution. The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.