Database Schema Design

When to use this skill

Lists specific situations where this skill should be triggered:

- New Project: designing the database schema for a new application
- Schema Refactoring: redesigning an existing schema for performance or scalability
- Relationship Definition: implementing 1:1, 1:N, and N:M relationships between tables
- Migration: applying schema changes safely
- Performance Issues: optimizing indexes and schema to resolve slow queries

Input Format

The required and optional input information to collect from the user:

Required Information

- Database Type: PostgreSQL, MySQL, MongoDB, SQLite, etc.
- Domain Description: what data will be stored (e.g., e-commerce, blog, social media)
- Key Entities: core data objects (e.g., User, Product, Order)

Optional Information

- Expected Data Volume: Small (<10K rows), Medium (10K-1M), Large (>1M) (default: Medium)
- Read/Write Ratio: Read-heavy, Write-heavy, Balanced (default: Balanced)
- Transaction Requirements: whether ACID is required (default: true)
- Sharding/Partitioning: whether data must be distributed across shards or partitions (default: false)

Input Example

Design a database for an e-commerce platform:
- DB: PostgreSQL
- Entities: User, Product, Order, Review
- Relationships:
  - A User can have multiple Orders
  - An Order contains multiple Products (N:M)
  - A Review is linked to a User and a Product
- Expected data: 100,000 users, 10,000 products
- Read-heavy (frequent product lookups)

Instructions

Specifies the step-by-step task sequence to follow precisely.

Step 1: Define Entities and Attributes

Identify core data objects and their attributes.

Tasks:
- Extract nouns from business requirements → entities
- List each entity's attributes (columns)
- Determine data types (VARCHAR, INTEGER, TIMESTAMP, JSON, etc.)
- Designate Primary Keys (UUID vs. auto-increment ID)

Example (E-commerce):

Users
- id: UUID PRIMARY KEY
- email: VARCHAR(255) UNIQUE NOT NULL
- username: VARCHAR(50) UNIQUE NOT NULL
- password_hash: VARCHAR(255) NOT NULL
- created_at: TIMESTAMP DEFAULT NOW()
- updated_at: TIMESTAMP DEFAULT NOW()

Products
- id: UUID PRIMARY KEY
- name: VARCHAR(255) NOT NULL
- description: TEXT
- price: DECIMAL(10, 2) NOT NULL
- stock: INTEGER DEFAULT 0
- category_id: UUID REFERENCES Categories(id)
- created_at: TIMESTAMP DEFAULT NOW()

Orders
- id: UUID PRIMARY KEY
- user_id: UUID REFERENCES Users(id)
- total_amount: DECIMAL(10, 2) NOT NULL
- status: VARCHAR(20) DEFAULT 'pending'
- created_at: TIMESTAMP DEFAULT NOW()

OrderItems (junction table)
- id: UUID PRIMARY KEY
- order_id: UUID REFERENCES Orders(id) ON DELETE CASCADE
- product_id: UUID REFERENCES Products(id)
- quantity: INTEGER NOT NULL
- price: DECIMAL(10, 2) NOT NULL (unit price at the time of the order)

Step 2: Design Relationships and Normalization

Define relationships between tables and apply normalization.
Tasks:
- 1:1 relationship: Foreign Key + UNIQUE constraint
- 1:N relationship: Foreign Key on the "many" side
- N:M relationship: create a junction table
- Determine the normalization level (1NF to 3NF)

Decision Criteria:
- OLTP systems → normalize to 3NF (data integrity)
- OLAP/analytics systems → denormalization allowed (query performance)
- Read-heavy → minimize JOINs with partial denormalization
- Write-heavy → full normalization to eliminate redundancy

Example (ERD Mermaid):

```mermaid
erDiagram
    Users ||--o{ Orders : places
    Orders ||--|{ OrderItems : contains
    Products ||--o{ OrderItems : "ordered in"
    Categories ||--o{ Products : categorizes
    Users ||--o{ Reviews : writes
    Products ||--o{ Reviews : "reviewed by"

    Users {
        uuid id PK
        string email UK
        string username UK
        string password_hash
        timestamp created_at
    }
    Products {
        uuid id PK
        string name
        decimal price
        int stock
        uuid category_id FK
    }
    Orders {
        uuid id PK
        uuid user_id FK
        decimal total_amount
        string status
        timestamp created_at
    }
    OrderItems {
        uuid id PK
        uuid order_id FK
        uuid product_id FK
        int quantity
        decimal price
    }
```

Step 3: Establish Indexing Strategy

Design indexes for query performance.

Tasks:
- Primary Keys are indexed automatically
- Columns frequently used in WHERE clauses → add indexes
- Foreign Keys used in JOINs → add indexes (PostgreSQL does not index foreign-key columns automatically)
- Consider composite indexes for multi-column filters (WHERE col1 = ? AND col2 = ?)
- UNIQUE indexes (email, username, etc.)
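The composite-index task can be checked against a real query planner. The sketch below is a minimal illustration, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite has no UUID type or `gen_random_uuid()`, so integer keys and TEXT timestamps are used). It creates an index shaped like `idx_orders_status_created` from this step's example and asks `EXPLAIN QUERY PLAN` whether a query filtering on `status` and sorting by `created_at` can use it. The helper name `plan_for` is hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def plan_for(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN details joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); keep only the detail text.
    return " | ".join(str(row[3]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")
# Equality-filtered column (status) first, then the sort column (created_at).
conn.execute(
    "CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC)"
)

query = "SELECT id FROM orders WHERE status = ? ORDER BY created_at DESC LIMIT 20"
plan = plan_for(conn, query, ("pending",))
print(plan)
```

Putting the equality column first lets the index both narrow the rows to one `status` and return them already ordered by `created_at`, so no separate sort step is needed.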
Checklist:
- Indexes on frequently queried columns
- Indexes on Foreign Key columns
- Composite index column order optimized (columns compared with = first, then range or ORDER BY columns)
- Avoid excessive indexes (each extra index slows down INSERT/UPDATE)

Example (PostgreSQL):

```sql
-- Primary Keys (auto-indexed)
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,  -- UNIQUE = auto-indexed
    username VARCHAR(50) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Foreign Keys + explicit indexes
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users (id) ON DELETE CASCADE,
    total_amount DECIMAL(10, 2) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_status ON orders (status);  -- redundant once idx_orders_status_created exists (leftmost prefix)
CREATE INDEX idx_orders_created_at ON orders (created_at);

-- Composite index (status and created_at frequently queried together)
CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC);

-- Products table (assumes a categories table already exists, as created in Step 5)
CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INTEGER DEFAULT 0 CHECK (stock >= 0),
    category_id UUID REFERENCES categories (id),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_products_category ON products (category_id);
CREATE INDEX idx_products_price ON products (price);  -- price range search
CREATE INDEX idx_products_name ON products (name);    -- exact-match lookups and sorting by name

-- Full-text search (PostgreSQL)
CREATE INDEX idx_products_name_fts ON products USING GIN (to_tsvector('english', name));
CREATE INDEX idx_products_description_fts ON products USING GIN (to_tsvector('english', description));
```

Step 4: Set Up Constraints and Triggers

Add constraints to ensure data integrity.

Tasks:
- NOT NULL: required columns
- UNIQUE: columns that must be unique
- CHECK: value range constraints (e.g., price >= 0)
- Foreign Key + CASCADE option
- Set default values

Example:

```sql
CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INTEGER DEFAULT 0 CHECK (stock >= 0),
    discount_percent INTEGER CHECK (discount_percent >= 0 AND discount_percent <= 100),
    category_id UUID REFERENCES categories (id) ON DELETE SET NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Trigger: auto-update updated_at
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION requires PostgreSQL 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER update_products_updated_at
    BEFORE UPDATE ON products
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();
```

Step 5: Write Migration Scripts

Write migrations that safely apply schema changes.
Tasks:
- UP migration: apply changes
- DOWN migration: roll changes back
- Wrap each migration in a transaction (PostgreSQL supports transactional DDL; MySQL implicitly commits most DDL statements)
- Prevent data loss (use ALTER TABLE carefully)

Example (SQL migration):

```sql
-- migrations/001_create_initial_schema.up.sql
BEGIN;

-- gen_random_uuid() is built in from PostgreSQL 13; earlier versions get it from pgcrypto.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE categories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(100) UNIQUE NOT NULL,
    parent_id UUID REFERENCES categories (id)
);

CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INTEGER DEFAULT 0 CHECK (stock >= 0),
    category_id UUID REFERENCES categories (id),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_products_category ON products (category_id);
CREATE INDEX idx_products_price ON products (price);

COMMIT;
```

```sql
-- migrations/001_create_initial_schema.down.sql
BEGIN;

DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS users CASCADE;

COMMIT;
```

Output format

Defines the exact format that deliverables should follow.

Basic Structure

```
project/
├── database/
│   ├── schema.sql              # full schema
│   ├── migrations/
│   │   ├── 001_create_users.up.sql
│   │   ├── 001_create_users.down.sql
│   │   ├── 002_create_products.up.sql
│   │   └── 002_create_products.down.sql
│   ├── seeds/
│   │   └── sample_data.sql     # test data
│   └── docs/
│       ├── ERD.md              # Mermaid ERD diagram
│       └── SCHEMA.md           # schema documentation
└── README.md
```

ERD Diagram (Mermaid Format)

Database Schema

Entity Relationship Diagram

```mermaid
erDiagram
    Users ||--o{ Orders : places
    Orders ||--|{ OrderItems : contains
    Products ||--o{ OrderItems : "ordered in"

    Users {
        uuid id PK
        string email UK
        string username UK
    }
    Products {
        uuid id PK
        string name
        decimal price
    }
```

Table Descriptions

users

- Purpose: Store user account information
- Indexes: email, username
- Estimated rows: 100,000

products

- Purpose: Product catalog
- Indexes: category_id, price, name
- Estimated rows: 10,000

Constraints

Specifies mandatory rules and prohibited actions.

Mandatory Rules (MUST)

- Primary Key Required
  - Define a Primary Key on every table
  - Guarantees unique record identification
  - Gives foreign keys a target, which referential integrity depends on
- Explicit Foreign Keys
  - Tables with relationships must define Foreign Keys
  - Specify the ON DELETE behavior (CASCADE, SET NULL, etc.)
  - Prevents orphan records
- Use NOT NULL Appropriately
  - Required columns must be NOT NULL
  - State explicitly whether each column is nullable
  - Providing defaults is recommended

Prohibited Actions (MUST NOT)

- Avoid EAV Pattern Abuse
  - Use the Entity-Attribute-Value pattern only in special cases
  - It makes queries dramatically more complex
  - It degrades performance
- Excessive Denormalization
  - Be careful when denormalizing for performance
  - Duplicated data can drift out of sync
  - Risk of update anomalies
- No Plaintext Storage of Sensitive Data
  - Never store passwords, card numbers, etc. in plaintext
  - Hash passwords with a salted, deliberately slow password-hashing algorithm (e.g., bcrypt, scrypt, Argon2); encrypt or tokenize other sensitive data
  - Plaintext storage creates legal liability
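As an illustration of the hashing rule, here is a minimal sketch, assuming Python's standard-library `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or newer). The `salt$hash` storage format, the cost parameters, and the function names `hash_password` / `verify_password` are illustrative assumptions, not a prescribed scheme; a maintained bcrypt or Argon2 library is an equally common choice.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (about 16 MiB of memory per hash);
# tune them for your hardware and latency budget.
SCRYPT_N, SCRYPT_R, SCRYPT_P, DKLEN = 2**14, 8, 1, 32


def hash_password(password: str) -> str:
    """Return 'salt_hex$hash_hex', small enough for a VARCHAR(255) password_hash column."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt,
        n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=DKLEN,
    )
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex),
        n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=DKLEN,
    )
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)


stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("wrong guess", stored))                   # False
```

Because each call draws a new salt, two users with the same password get different stored values, and the database never holds anything that can be reversed into the original password.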

Security Rules

- Principle of Least Privilege: grant application DB accounts only the permissions they need
- SQL Injection Prevention: use prepared statements / parameterized queries
- Encrypt Sensitive Columns: consider encrypting personally identifiable information at rest
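A minimal sketch of the parameterized-query rule, assuming Python's built-in `sqlite3` module (its placeholder is `?`; other drivers differ, e.g. psycopg2 uses `%s`). The `users` table and the helper name `find_user_id` are illustrative. The value travels to the database separately from the SQL text, so quote characters or SQL keywords inside it are matched literally instead of being executed.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))


def find_user_id(conn: sqlite3.Connection, email: str) -> Optional[int]:
    """Look up a user by email using a bound parameter, never string formatting."""
    # Unsafe alternative (do not do this): f"SELECT id FROM users WHERE email = '{email}'"
    row = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None


print(find_user_id(conn, "alice@example.com"))  # 1
print(find_user_id(conn, "' OR '1'='1"))        # None: matched as a literal string
```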

Examples

Demonstrates how to apply the skill through real-world use cases.

Example 1: Blog Platform Schema

Situation: Database design for a Medium-style blog platform

User Request:

Design a PostgreSQL schema for a blog platform:
- Users can write multiple posts
- Posts can have multiple tags (N:M)
- Users can like and bookmark posts
- Comment feature (with nested replies)

Final Result:

```sql
-- Users
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Posts
CREATE TABLE posts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    author_id UUID NOT NULL REFERENCES users (id) ON DELETE CASCADE,
    title VARCHAR(255) NOT NULL,
    slug VARCHAR(255) UNIQUE NOT NULL,
    content TEXT NOT NULL,
    published_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_posts_author ON posts (author_id);
CREATE INDEX idx_posts_published ON posts (published_at);
-- No separate index on slug: its UNIQUE constraint already creates one.

-- Tags
CREATE TABLE tags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(50) UNIQUE NOT NULL,
    slug VARCHAR(50) UNIQUE NOT NULL
);

-- Post-Tag relationship (N:M)
CREATE TABLE post_tags (
    post_id UUID REFERENCES posts (id) ON DELETE CASCADE,
    tag_id UUID REFERENCES tags (id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);

-- The composite primary key already serves lookups by post_id (its leftmost column),
-- so only tag_id needs its own index.
CREATE INDEX idx_post_tags_tag ON post_tags (tag_id);

-- Likes
CREATE TABLE post_likes (
    user_id UUID REFERENCES users (id) ON DELETE CASCADE,
    post_id UUID REFERENCES posts (id) ON DELETE CASCADE,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, post_id)
);

CREATE INDEX idx_post_likes_post ON post_likes (post_id);  -- per-post counts and cascading deletes

-- Bookmarks
CREATE TABLE post_bookmarks (
    user_id UUID REFERENCES users (id) ON DELETE CASCADE,
    post_id UUID REFERENCES posts (id) ON DELETE CASCADE,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, post_id)
);

CREATE INDEX idx_post_bookmarks_post ON post_bookmarks (post_id);  -- cascading deletes from posts

-- Comments (self-referencing for nested comments)
CREATE TABLE comments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    post_id UUID NOT NULL REFERENCES posts (id) ON DELETE CASCADE,
    author_id UUID NOT NULL REFERENCES users (id) ON DELETE CASCADE,
    parent_comment_id UUID REFERENCES comments (id) ON DELETE CASCADE,  -- NULL for top-level comments
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_comments_post ON comments (post_id);
CREATE INDEX idx_comments_author ON comments (author_id);
CREATE INDEX idx_comments_parent ON comments (parent_comment_id);
```
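The self-referencing `comments` table stores each reply's parent, and a recursive CTE can load a whole thread in one query. The sketch below is a minimal illustration, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL, with integer ids instead of UUIDs and only the columns the query needs. Both databases support `WITH RECURSIVE`; the zero-padded `printf()` path used for depth-first ordering is SQLite-specific (PostgreSQL would need something like `lpad()` or an array path), and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        parent_comment_id INTEGER REFERENCES comments (id) ON DELETE CASCADE,
        content TEXT NOT NULL
    );
    INSERT INTO comments (id, post_id, parent_comment_id, content) VALUES
        (1, 10, NULL, 'Great post'),
        (2, 10, 1,    'Agreed'),
        (3, 10, 2,    'Me too'),
        (4, 10, NULL, 'Question'),
        (5, 10, 4,    'Answer');
""")

THREAD_SQL = """
WITH RECURSIVE thread (id, content, depth, path) AS (
    -- Anchor: top-level comments of one post
    SELECT id, content, 0, printf('%06d', id)
    FROM comments
    WHERE post_id = ? AND parent_comment_id IS NULL
    UNION ALL
    -- Recursive step: replies to rows already in the thread
    SELECT c.id, c.content, t.depth + 1, t.path || '/' || printf('%06d', c.id)
    FROM comments AS c
    JOIN thread AS t ON c.parent_comment_id = t.id
)
SELECT id, depth, content FROM thread ORDER BY path
"""

rows = conn.execute(THREAD_SQL, (10,)).fetchall()
for comment_id, depth, content in rows:
    # Indent replies by depth to show the tree shape.
    print("  " * depth + f"[{comment_id}] {content}")
```

Sorting by the materialized path places every reply directly under its parent, so the application can render the tree without issuing one query per nesting level.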

Example 2: MongoDB Schema (NoSQL)

Situation: MongoDB schema for a real-time chat app

User Request:

Design a MongoDB schema for a real-time chat app.
Reads are very frequent, and message history needs to be retrieved quickly.

Final Result:

```javascript
// Schema sketches: field values show types, not literal data.

// users collection
{
  _id: ObjectId,
  username: String,      // indexed, unique
  email: String,         // indexed, unique
  avatar_url: String,
  status: String,        // 'online', 'offline', 'away'
  last_seen: Date,
  created_at: Date
}

// conversations collection (denormalized - read-optimized)
{
  _id: ObjectId,
  participants: [        // indexed
    { user_id: ObjectId, username: String, avatar_url: String }
  ],
  last_message: {        // denormalized for fast recent-message retrieval
    content: String,
    sender_id: ObjectId,
    sent_at: Date
  },
  unread_counts: {       // unread message count per participant
    "user_id_1": 5,
    "user_id_2": 0
  },
  created_at: Date,
  updated_at: Date
}

// messages collection
{
  _id: ObjectId,
  conversation_id: ObjectId,  // indexed
  sender_id: ObjectId,
  content: String,
  attachments: [
    { type: String, url: String, filename: String }  // type: 'image', 'file', 'video'
  ],
  read_by: [ObjectId],        // array of user IDs who have read the message
  sent_at: Date,              // indexed
  edited_at: Date
}

// Indexes
db.users.createIndex({ username: 1 }, { unique: true });
db.users.createIndex({ email: 1 }, { unique: true });
db.conversations.createIndex({ "participants.user_id": 1 });
db.conversations.createIndex({ updated_at: -1 });
db.messages.createIndex({ conversation_id: 1, sent_at: -1 });
db.messages.createIndex({ sender_id: 1 });
```

Design Highlights:

- Denormalization for read optimization (embedding last_message)
- Indexes on frequently accessed fields
- Array fields (participants, read_by)

Best practices

Quality Improvement

Naming Convention Consistency

- Use snake_case for table and column names (e.g., users, post_tags, created_at)
- Be consistent with plurals and singulars (e.g., plural table names, singular column names)

Consider Soft Delete

- Use logical deletion instead of physical deletion for important data
- deleted_at TIMESTAMP (NULL = active, NOT NULL = deleted)
- Allows recovery of accidentally deleted data
- Preserves an audit trail
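A minimal soft-delete sketch, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL (both support partial unique indexes and views). The table, view, and helper names are illustrative. One design point worth showing: a plain `UNIQUE` on `email` would stop a user from re-registering after their old account was soft-deleted, so uniqueness is enforced only among active rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        deleted_at TEXT  -- NULL = active, NOT NULL = soft-deleted
    );
    -- Emails must be unique among active rows only.
    CREATE UNIQUE INDEX uq_users_email_active ON users (email)
        WHERE deleted_at IS NULL;
    -- Application reads go through this view so deleted rows stay hidden.
    CREATE VIEW active_users AS
        SELECT id, email FROM users WHERE deleted_at IS NULL;
""")


def soft_delete_user(user_id: int) -> None:
    # Mark the row instead of deleting it; the data remains recoverable.
    conn.execute(
        "UPDATE users SET deleted_at = CURRENT_TIMESTAMP "
        "WHERE id = ? AND deleted_at IS NULL",
        (user_id,),
    )


def restore_user(user_id: int) -> None:
    # Fails with IntegrityError if an active row already uses the same email.
    conn.execute("UPDATE users SET deleted_at = NULL WHERE id = ?", (user_id,))


conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
soft_delete_user(1)
# Allowed: row 1 is soft-deleted, so it no longer counts toward uniqueness.
conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
print(conn.execute("SELECT id, email FROM active_users").fetchall())  # [(2, 'a@example.com')]
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])       # 2
```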

Timestamps Required

- Include created_at and updated_at in most tables
- Supports data tracking and debugging
- Enables time-series analysis
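Keeping `updated_at` accurate usually needs a trigger, like the PostgreSQL `BEFORE UPDATE` trigger in Step 4. SQLite cannot assign to `NEW`, so this minimal sketch (assuming Python's built-in `sqlite3` module as a stand-in) uses a common SQLite workaround instead: an `AFTER UPDATE` trigger that writes the timestamp back, guarded by a `WHEN` clause so an update that sets `updated_at` explicitly is left alone. Table and trigger names are illustrative; the `CHECK (price >= 0)` constraint mirrors Step 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price NUMERIC NOT NULL CHECK (price >= 0),
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Refresh updated_at after any UPDATE that did not change it explicitly.
    -- SQLite's recursive triggers are off by default, so the inner UPDATE
    -- does not fire this trigger again.
    CREATE TRIGGER trg_products_updated_at
    AFTER UPDATE ON products
    FOR EACH ROW
    WHEN NEW.updated_at IS OLD.updated_at
    BEGIN
        UPDATE products SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
    END;
""")

# Seed a row with an old timestamp so the refresh is easy to see.
conn.execute(
    "INSERT INTO products (id, name, price, created_at, updated_at) "
    "VALUES (1, 'Mug', 9.99, '2000-01-01 00:00:00', '2000-01-01 00:00:00')"
)
conn.execute("UPDATE products SET price = 12.50 WHERE id = 1")
row = conn.execute("SELECT created_at, updated_at FROM products WHERE id = 1").fetchone()
print(row)  # created_at unchanged, updated_at now set to the current time
```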

Efficiency Improvements

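Partial Indexes

- Minimize index size with conditional indexes that cover only the rows queries actually need

```sql
-- Index only published posts (can replace the full idx_posts_published index from Example 1)
CREATE INDEX idx_posts_published ON posts (published_at)
WHERE published_at IS NOT NULL;
```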
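A partial index only helps queries whose `WHERE` clause implies the index predicate. This minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL, builds the same partial index and checks the plan for a date-range query: a comparison such as `published_at >= ?` can only be true for non-NULL values, so it implies `published_at IS NOT NULL` and the planner may use the index. The sample rows are invented, and the plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        published_at TEXT  -- NULL while the post is unpublished
    );
    -- Only rows with a publication date are stored in this index.
    CREATE INDEX idx_posts_published ON posts (published_at)
        WHERE published_at IS NOT NULL;
    INSERT INTO posts (title, published_at) VALUES
        ('Draft', NULL),
        ('Old post', '2024-01-15 09:00:00'),
        ('New post', '2025-01-01 12:00:00');
""")

# The range condition implies "published_at IS NOT NULL", so the partial index qualifies.
recent = "SELECT title FROM posts WHERE published_at >= ? ORDER BY published_at DESC"
params = ("2024-06-01",)

plan = " | ".join(
    str(row[3]) for row in conn.execute("EXPLAIN QUERY PLAN " + recent, params)
)
print(plan)
print(conn.execute(recent, params).fetchall())  # [('New post',)]
```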

Materialized Views

- Cache complex aggregate queries as materialized views

Partitioning

- Partition large tables by date or range
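For date-based partitioning in PostgreSQL, the parent table is declared with `PARTITION BY RANGE` and each period gets its own child table. The minimal Python sketch below only generates the DDL strings (it does not connect to a database); the `orders` table, the monthly granularity, and the helper names are illustrative assumptions. Range bounds are inclusive at `FROM` and exclusive at `TO`, so consecutive months do not overlap.

```python
from datetime import date
from typing import List

# Parent table: on a partitioned table, primary/unique keys must include the partition key.
PARENT_DDL = """
CREATE TABLE orders (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
"""


def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partition_ddl(parent: str, first: date, months: int) -> List[str]:
    """Generate CREATE TABLE ... PARTITION OF statements, one per month."""
    statements = []
    start = first.replace(day=1)
    for _ in range(months):
        end = next_month(start)
        name = f"{parent}_{start:%Y_%m}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
        start = end
    return statements


for stmt in monthly_partition_ddl("orders", date(2024, 11, 15), 3):
    print(stmt)
```

Creating partitions ahead of time (for example from a scheduled job) avoids inserts failing because no partition covers the new month.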

Common Issues

Issue 1: N+1 Query Problem

- Symptom: multiple DB calls where a single query would suffice
- Cause: individual lookups in a loop instead of a JOIN
- Solution:

```sql
-- ❌ Bad example: N+1 queries
SELECT * FROM posts;               -- 1 time
-- for each post:
SELECT * FROM users WHERE id = ?;  -- N times

-- ✅ Good example: 1 query
SELECT posts.*, users.username, users.avatar_url
FROM posts
JOIN users ON posts.author_id = users.id;
```
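To make the difference concrete, this minimal sketch counts round trips for both approaches, assuming Python's built-in `sqlite3` module as a stand-in database. The `query` wrapper is a hypothetical helper that simply counts each statement the way a driver call would cost one round trip; the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL, avatar_url TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES users (id),
        title TEXT NOT NULL
    );
    INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts (id, author_id, title) VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 1, 'C');
""")

round_trips = 0


def query(sql: str, params: tuple = ()) -> list:
    """Run one statement and count it as one database round trip."""
    global round_trips
    round_trips += 1
    return conn.execute(sql, params).fetchall()


def posts_with_authors_n_plus_1() -> list:
    posts = query("SELECT id, author_id, title FROM posts ORDER BY id")  # the "1"
    result = []
    for _post_id, author_id, title in posts:
        # One extra query per post: the "N".
        (username,) = query("SELECT username FROM users WHERE id = ?", (author_id,))[0]
        result.append((title, username))
    return result


def posts_with_authors_join() -> list:
    return query(
        "SELECT posts.title, users.username FROM posts "
        "JOIN users ON posts.author_id = users.id ORDER BY posts.id"
    )


round_trips = 0
slow = posts_with_authors_n_plus_1()
slow_trips = round_trips

round_trips = 0
fast = posts_with_authors_join()
fast_trips = round_trips

print(slow, slow_trips)  # [('A', 'alice'), ('B', 'bob'), ('C', 'alice')] 4
print(fast, fast_trips)  # [('A', 'alice'), ('B', 'bob'), ('C', 'alice')] 1
```

The results are identical, but the N+1 version's round trips grow with the number of posts while the JOIN stays at one.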

Issue 2: Slow JOINs Due to Unindexed Foreign Keys

- Symptom: JOIN queries are very slow
- Cause: missing index on a Foreign Key column (PostgreSQL does not create one automatically)
- Solution:

```sql
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
CREATE INDEX idx_order_items_product_id ON order_items (product_id);
```
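The effect of a foreign-key index can be seen in the query plan. This minimal sketch assumes Python's built-in `sqlite3` module as a stand-in (SQLite, like PostgreSQL, does not index foreign-key columns automatically). It plans the per-user lookup that a JOIN performs before and after creating `idx_orders_user_id`; the `plan` helper is hypothetical and the exact plan text varies by SQLite version.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the EXPLAIN QUERY PLAN detail column as one string."""
    return " | ".join(
        str(row[3]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        total_amount NUMERIC NOT NULL
    );
""")

# The lookup a JOIN performs for each user: find that user's orders.
lookup = "SELECT id, total_amount FROM orders WHERE user_id = ?"

before = plan(conn, lookup, (1,))
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = plan(conn, lookup, (1,))

print("before:", before)  # a full scan of orders
print("after: ", after)   # a search through idx_orders_user_id
```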

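Issue 3: UUID vs Auto-increment Performance

- Symptom: insert performance degrades when using UUID Primary Keys
- Cause: random (v4) UUIDs scatter inserts across the whole B-tree index, causing page splits and fragmentation
- Solution:
  - PostgreSQL: use time-ordered UUIDv7 values (uuidv7() is built in from PostgreSQL 18; earlier versions need an extension such as pg_uuidv7, which provides uuid_generate_v7(), or application-side generation)
  - MySQL: use UUID_TO_BIN(UUID(), 1), which reorders the time fields of a version-1 UUID so values are roughly sequential (store them in BINARY(16))
  - Or consider using an auto-increment BIGINT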
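If the database cannot generate time-ordered UUIDs itself, the application can. This minimal sketch builds a UUIDv7 following the RFC 9562 layout (48-bit Unix-millisecond timestamp, 4-bit version, 2-bit variant, remaining bits random) with only the Python standard library. The function name `uuid7` and its optional `unix_ms` parameter are illustrative; the sketch does not guarantee ordering between IDs created within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: timestamp in the high bits, so later IDs sort later."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127-80: timestamp
    value |= 0x7 << 76                            # bits 79-76: version 7
    value |= rand_a << 64                         # bits 75-64: rand_a
    value |= 0b10 << 62                           # bits 63-62: RFC variant
    value |= rand_b                               # bits 61-0: rand_b
    return uuid.UUID(int=value)


ids = [uuid7(1_700_000_000_000 + i) for i in range(3)]
for u in ids:
    print(u, u.version)
```

Because the timestamp occupies the most significant bits, new keys land at the right-hand edge of the primary-key index, much like an auto-increment value, while remaining globally unique.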

References

Official Documentation

- PostgreSQL Documentation
- MySQL Documentation
- MongoDB Schema Design Best Practices

Tools

- dbdiagram.io: ERD diagram creation
- PgModeler: PostgreSQL modeling tool
- Prisma: ORM + migrations

Learning Resources

- Database Design Course (freecodecamp)
- Use The Index, Luke: SQL indexing guide

Metadata

- Current Version: 1.0.0
- Last Updated: 2025-01-01
- Compatible Platforms: Claude, ChatGPT, Gemini
