PostgreSQL Table Design

Use this skill when

Designing a schema for PostgreSQL

Selecting data types and constraints

Planning indexes, partitions, or RLS policies

Reviewing tables for scale and maintainability

Do not use this skill when

You are targeting a non-PostgreSQL database

You only need query tuning without schema changes

You require a DB-agnostic modeling guide

Instructions

Capture entities, access patterns, and scale targets (rows, QPS, retention).

Choose data types and constraints that enforce invariants.

Add indexes for real query paths and validate with

EXPLAIN

.

Plan partitioning or RLS where required by scale or access control.

Review migration impact and apply changes safely.

Safety

Avoid destructive DDL on production without backups and a rollback plan.

Use migrations and staging validation before applying schema changes.

Core Rules

Define a

PRIMARY KEY

for reference tables (users, orders, etc.). Not always needed for time-series/event/log data. When used, prefer

BIGINT GENERATED ALWAYS AS IDENTITY

; use

UUID

only when global uniqueness/opacity is needed.

Normalize first (to 3NF)

to eliminate data redundancy and update anomalies; denormalize

only

for measured, high-ROI reads where join performance is proven problematic. Premature denormalization creates maintenance burden.

Add

NOT NULL

everywhere it’s semantically required; use

DEFAULT

s for common values.

Create

indexes for access paths you actually query

PK/unique (auto),

FK columns (manual!)

, frequent filters/sorts, and join keys.

Prefer

TIMESTAMPTZ

for event time;

NUMERIC

for money;

TEXT

for strings;

BIGINT

for integer values,

DOUBLE PRECISION

for floats (or

NUMERIC

for exact decimal arithmetic).

PostgreSQL “Gotchas”

Identifiers

unquoted → lowercased. Avoid quoted/mixed-case names. Convention: use

snake_case

for table/column names.

Unique + NULLs

UNIQUE allows multiple NULLs. Use

UNIQUE (...) NULLS NOT DISTINCT

(PG15+) to restrict to one NULL.

FK indexes

PostgreSQL

does not

auto-index FK columns. Add them.

No silent coercions

length/precision overflows error out (no truncation). Example: inserting 999 into

NUMERIC(2,0)

fails with error, unlike some databases that silently truncate or round.

Sequences/identity have gaps

(normal; don't "fix"). Rollbacks, crashes, and concurrent transactions create gaps in ID sequences (1, 2, 5, 6...). This is expected behavior—don't try to make IDs consecutive.

Heap storage

no clustered PK by default (unlike SQL Server/MySQL InnoDB);

CLUSTER

is one-off reorganization, not maintained on subsequent inserts. Row order on disk is insertion order unless explicitly clustered.

MVCC

updates/deletes leave dead tuples; vacuum handles them—design to avoid hot wide-row churn.

Data Types

IDs

:

BIGINT GENERATED ALWAYS AS IDENTITY

preferred (

GENERATED BY DEFAULT

also fine);

UUID

when merging/federating/used in a distributed system or for opaque IDs. Generate with

uuidv7()

(preferred if using PG18+) or

gen_random_uuid()

(if using an older PG version).

Integers

prefer

BIGINT

unless storage space is critical;

INTEGER

for smaller ranges; avoid

SMALLINT

unless constrained.

Floats

prefer

DOUBLE PRECISION

over

REAL

unless storage space is critical. Use

NUMERIC

for exact decimal arithmetic.

Strings

prefer

TEXT

; if length limits needed, use

CHECK (LENGTH(col) <= n)

instead of

VARCHAR(n)

; avoid

CHAR(n)

. Use

BYTEA

for binary data. Large strings/binary (>2KB default threshold) automatically stored in TOAST with compression. TOAST storage:

PLAIN

(no TOAST),

EXTENDED

(compress + out-of-line),

EXTERNAL

(out-of-line, no compress),

MAIN

(compress, keep in-line if possible). Default

EXTENDED

usually optimal. Control with

ALTER TABLE tbl ALTER COLUMN col SET STORAGE strategy

and

ALTER TABLE tbl SET (toast_tuple_target = 4096)

for threshold. Case-insensitive: for locale/accent handling use non-deterministic collations; for plain ASCII use expression indexes on

LOWER(col)

(preferred unless column needs case-insensitive PK/FK/UNIQUE) or

CITEXT

.

Money

:

NUMERIC(p,s)

(never float).

Time

:

TIMESTAMPTZ

for timestamps;

DATE

for date-only;

INTERVAL

for durations. Avoid

TIMESTAMP

(without timezone). Use

now()

for transaction start time,

clock_timestamp()

for current wall-clock time.

Booleans

:

BOOLEAN

with

NOT NULL

constraint unless tri-state values are required.

Enums

:

CREATE TYPE ... AS ENUM

for small, stable sets (e.g. US states, days of week). For business-logic-driven and evolving values (e.g. order statuses) → use TEXT (or INT) + CHECK or lookup table.

Arrays

:

TEXT[]

,

INTEGER[]

, etc. Use for ordered lists where you query elements. Index with

GIN

for containment (

@>

,

<@

) and overlap (

&&

) queries. Access:

arr[1]

(1-indexed),

arr[1:3]

(slicing). Good for tags, categories; avoid for relations—use junction tables instead. Literal syntax:

'{val1,val2}'

or

ARRAY[val1,val2]

.

Range types

:

daterange

,

numrange

,

tstzrange

for intervals. Support overlap (

&&

), containment (

@>

), operators. Index with

GiST

. Good for scheduling, versioning, numeric ranges. Pick a bounds scheme and use it consistently; prefer

[)

(inclusive/exclusive) by default.

Network types

:

INET

for IP addresses,

CIDR

for network ranges,

MACADDR

for MAC addresses. Support network operators (

<<

,

>>

,

&&

).

Geometric types

:

POINT

,

LINE

,

POLYGON

,

CIRCLE

for 2D spatial data. Index with

GiST

. Consider

PostGIS

for advanced spatial features.

Text search

:

TSVECTOR

for full-text search documents,

TSQUERY

for search queries. Index

tsvector

with

GIN

. Always specify language:

to_tsvector('english', col)

and

to_tsquery('english', 'query')

. Never use single-argument versions. This applies to both index expressions and queries.

Domain types

:

CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+$')

for reusable custom types with validation. Enforces constraints across tables.

Composite types

:

CREATE TYPE address AS (street TEXT, city TEXT, zip TEXT)

for structured data within columns. Access with

(col).field

syntax.

JSONB

preferred over JSON; index with

GIN

. Use only for optional/semi-structured attrs. ONLY use JSON if the original ordering of the contents MUST be preserved.

Vector types

:

vector

type by

pgvector

for vector similarity search for embeddings.

Do not use the following data types

DO NOT use

timestamp

(without time zone); DO use

timestamptz

instead.

DO NOT use

char(n)

or

varchar(n)

; DO use

text

instead.

DO NOT use

money

type; DO use

numeric

instead.

DO NOT use

timetz

type; DO use

timestamptz

instead.

DO NOT use

timestamptz(0)

or any other precision specification; DO use

timestamptz

instead

DO NOT use

serial

type; DO use

generated always as identity

instead.

Table Types

Regular

default; fully durable, logged.

TEMPORARY

session-scoped, auto-dropped, not logged. Faster for scratch work.

UNLOGGED

persistent but not crash-safe. Faster writes; good for caches/staging.

Row-Level Security

Enable with

ALTER TABLE tbl ENABLE ROW LEVEL SECURITY

. Create policies:

CREATE POLICY user_access ON orders FOR SELECT TO app_users USING (user_id = current_user_id())

. Built-in user-based access control at the row level.

Constraints

PK

implicit UNIQUE + NOT NULL; creates a B-tree index.

FK

specify

ON DELETE/UPDATE

action (

CASCADE

,

RESTRICT

,

SET NULL

,

SET DEFAULT

). Add explicit index on referencing column—speeds up joins and prevents locking issues on parent deletes/updates. Use

DEFERRABLE INITIALLY DEFERRED

for circular FK dependencies checked at transaction end.

UNIQUE

creates a B-tree index; allows multiple NULLs unless

NULLS NOT DISTINCT

(PG15+). Standard behavior:

(1, NULL)

and

(1, NULL)

are allowed. With

NULLS NOT DISTINCT

only one

(1, NULL)

allowed. Prefer

NULLS NOT DISTINCT

unless you specifically need duplicate NULLs.

CHECK

row-local constraints; NULL values pass the check (three-valued logic). Example:

CHECK (price > 0)

allows NULL prices. Combine with

NOT NULL

to enforce:

price NUMERIC NOT NULL CHECK (price > 0)

.

EXCLUDE

prevents overlapping values using operators.

EXCLUDE USING gist (room_id WITH =, booking_period WITH &&)

prevents double-booking rooms. Requires appropriate index type (often GiST).

Indexing

B-tree

default for equality/range queries (

,

<

,

>

,

BETWEEN

,

ORDER BY

)

Composite

order matters—index used if equality on leftmost prefix (

WHERE a = ? AND b > ?

uses index on

(a,b)

, but

WHERE b = ?

does not). Put most selective/frequently filtered columns first.

Covering

:

CREATE INDEX ON tbl (id) INCLUDE (name, email)

- includes non-key columns for index-only scans without visiting table.

Partial

for hot subsets (

WHERE status = 'active'

→

CREATE INDEX ON tbl (user_id) WHERE status = 'active'

). Any query with

status = 'active'

can use this index.

Expression

for computed search keys (

CREATE INDEX ON tbl (LOWER(email))

). Expression must match exactly in WHERE clause:

WHERE LOWER(email) = 'user@example.com'

.

GIN

JSONB containment/existence, arrays (

@>

,

?

), full-text search (

@@

)

GiST

ranges, geometry, exclusion constraints

BRIN

very large, naturally ordered data (time-series)—minimal storage overhead. Effective when row order on disk correlates with indexed column (insertion order or after

CLUSTER

).

Partitioning

Use for very large tables (>100M rows) where queries consistently filter on partition key (often time/date).

Alternate use: use for tables where data maintenance tasks dictates e.g. data pruned or bulk replaced periodically

RANGE

common for time-series (

PARTITION BY RANGE (created_at)

). Create partitions:

CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')

.

TimescaleDB

automates time-based or ID-based partitioning with retention policies and compression.

LIST

for discrete values (

PARTITION BY LIST (region)

). Example:

FOR VALUES IN ('us-east', 'us-west')

.

HASH

for even distribution when no natural key (

PARTITION BY HASH (user_id)

). Creates N partitions with modulus.

Constraint exclusion

requires

CHECK

constraints on partitions for query planner to prune. Auto-created for declarative partitioning (PG10+).

Prefer declarative partitioning or hypertables. Do NOT use table inheritance.

Limitations

no global UNIQUE constraints—include partition key in PK/UNIQUE. FKs from partitioned tables not supported; use triggers.

Special Considerations

Update-Heavy Tables

Separate hot/cold columns

—put frequently updated columns in separate table to minimize bloat.

Use

fillfactor=90

to leave space for HOT updates that avoid index maintenance.

Avoid updating indexed columns

—prevents beneficial HOT updates.

Partition by update patterns

—separate frequently updated rows in a different partition from stable data.

Insert-Heavy Workloads

Minimize indexes

—only create what you query; every index slows inserts.

Use

COPY

or multi-row

INSERT

instead of single-row inserts.

UNLOGGED tables

for rebuildable staging data—much faster writes.

Defer index creation

for bulk loads—>drop index, load data, recreate indexes.

Partition by time/hash

to distribute load.

TimescaleDB

automates partitioning and compression of insert-heavy data.

Use a natural key for primary key

such as a (timestamp, device_id) if enforcing global uniqueness is important many insert-heavy tables don't need a primary key at all.

If you do need a surrogate key,

Prefer

BIGINT GENERATED ALWAYS AS IDENTITY

over

UUID

.

Upsert-Friendly Design

Requires UNIQUE index

on conflict target columns—

ON CONFLICT (col1, col2)

needs exact matching unique index (partial indexes don't work).

Use

EXCLUDED.column

to reference would-be-inserted values; only update columns that actually changed to reduce write overhead.

DO NOTHING

faster

than

DO UPDATE

when no actual update needed.

Safe Schema Evolution

Transactional DDL

most DDL operations can run in transactions and be rolled back—

BEGIN; ALTER TABLE...; ROLLBACK;

for safe testing.

Concurrent index creation

:

CREATE INDEX CONCURRENTLY

avoids blocking writes but can't run in transactions.

Volatile defaults cause rewrites

adding

NOT NULL

columns with volatile defaults (e.g.,

now()

,

gen_random_uuid()

) rewrites entire table. Non-volatile defaults are fast.

Drop constraints before columns

:

ALTER TABLE DROP CONSTRAINT

then

DROP COLUMN

to avoid dependency issues.

Function signature changes

:

CREATE OR REPLACE

with different arguments creates overloads, not replacements. DROP old version if no overload desired.

Generated Columns

... GENERATED ALWAYS AS () STORED

for computed, indexable fields. PG18+ adds

VIRTUAL

columns (computed on read, not stored).

Extensions

pgcrypto

:

crypt()

for password hashing.

uuid-ossp

alternative UUID functions; prefer

pgcrypto

for new projects.

pg_trgm

fuzzy text search with

%

operator,

similarity()

function. Index with GIN for

LIKE '%pattern%'

acceleration.

citext

case-insensitive text type. Prefer expression indexes on

LOWER(col)

unless you need case-insensitive constraints.

btree_gin

/

btree_gist

enable mixed-type indexes (e.g., GIN index on both JSONB and text columns).

hstore

key-value pairs; mostly superseded by JSONB but useful for simple string mappings.

timescaledb

essential for time-series—automated partitioning, retention, compression, continuous aggregates.

postgis

comprehensive geospatial support beyond basic geometric types—essential for location-based applications.

pgvector

vector similarity search for embeddings.

pgaudit

audit logging for all database activity.
JSONB Guidance
Prefer
JSONB
with
GIN
index.
Default:
CREATE INDEX ON tbl USING GIN (jsonb_col);
→ accelerates:
Containment
jsonb_col @> '{"k":"v"}'
Key existence
jsonb_col ? 'k'
,
any/all keys
?|
,
?&
Path containment
on nested docs
Disjunction
jsonb_col @> ANY(ARRAY['{"status":"active"}', '{"status":"pending"}'])
Heavy
@>
workloads: consider opclass
jsonb_path_ops
for smaller/faster containment-only indexes:
CREATE INDEX ON tbl USING GIN (jsonb_col jsonb_path_ops);
Trade-off: loses support for key existence ( ? , ?| , ?& ) queries—only supports containment ( @> ) Equality/range on a specific scalar field: extract and index with B-tree (generated column or expression): ALTER TABLE tbl ADD COLUMN price INT GENERATED ALWAYS AS ((jsonb_col->>'price')::INT) STORED; CREATE INDEX ON tbl (price); Prefer queries like WHERE price BETWEEN 100 AND 500 (uses B-tree) over WHERE (jsonb_col->>'price')::INT BETWEEN 100 AND 500 without index. Arrays inside JSONB: use GIN + @> for containment (e.g., tags). Consider jsonb_path_ops if only doing containment. Keep core relations in tables; use JSONB for optional/variable attributes. Use constraints to limit allowed JSONB values in a column e.g. config JSONB NOT NULL CHECK(jsonb_typeof(config) = 'object') Examples Users CREATE TABLE users ( user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , email TEXT NOT NULL UNIQUE , name TEXT NOT NULL , created_at TIMESTAMPTZ NOT NULL DEFAULT now ( ) ) ; CREATE UNIQUE INDEX ON users ( LOWER ( email ) ) ; CREATE INDEX ON users ( created_at ) ; Orders CREATE TABLE orders ( order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY , user_id BIGINT NOT NULL REFERENCES users ( user_id ) , status TEXT NOT NULL DEFAULT 'PENDING' CHECK ( status IN ( 'PENDING' , 'PAID' , 'CANCELED' ) ) , total NUMERIC ( 10 , 2 ) NOT NULL CHECK ( total

0 ) , created_at TIMESTAMPTZ NOT NULL DEFAULT now ( ) ) ; CREATE INDEX ON orders ( user_id ) ; CREATE INDEX ON orders ( created_at ) ; JSONB CREATE TABLE profiles ( user_id BIGINT PRIMARY KEY REFERENCES users ( user_id ) , attrs JSONB NOT NULL DEFAULT '{}' , theme TEXT GENERATED ALWAYS AS ( attrs -

'theme' ) STORED ) ; CREATE INDEX profiles_attrs_gin ON profiles USING GIN ( attrs ) ;

postgresql

安装

default for equality/range queries (