Database Design Patterns Every Developer Should Know

A database design pattern is a reusable way to organize tables, relationships, and indexes in a relational database. Getting it wrong early is expensive. The pattern you pick sets your query performance ceiling, your data integrity guarantees, and how much pain your next schema migration will cause.

This guide walks through six patterns you'll run into in most production systems, the trade-offs behind each, and the anti-patterns that cause real outages.

What are database design patterns?

A database design pattern is a repeatable solution to a common data modeling problem. It's a structural decision: how tables relate to each other, how data gets split or combined, and where you draw the boundaries between entities. Some patterns optimize for write safety (normalization), some for read speed (star schema), some for flexibility (EAV, JSONB columns), and some for keeping one customer's data away from another's (multi-tenancy). Most real databases mix several of these together.

Normalization (1NF through 3NF)

Normalization removes redundant data by splitting it across related tables. The goal is to store each fact exactly once, so an update only ever happens in one place.

First Normal Form (1NF): Every column holds a single value. No comma-separated lists, no arrays stuffed into a VARCHAR. Each row is uniquely identifiable by a primary key.

Second Normal Form (2NF): Every non-key column depends on the entire primary key, not just part of it. This matters when you have composite keys. If an order_items table is keyed on (order_id, product_id), then product_name depends only on product_id and belongs in a separate products table.

Third Normal Form (3NF): Non-key columns depend only on the primary key, not on other non-key columns. If a customers table has city and state, and city determines state, then state should move to a cities table. That removes transitive dependencies.

When to denormalize

3NF is the default target for transactional systems. But read-heavy workloads (dashboards, reports, search) often need denormalized data to dodge expensive multi-table joins. Denormalization means deliberately adding redundancy: storing total_orders_count directly on a customer row instead of running COUNT(*) against the orders table on every request.

The rule of thumb: start normalized, and denormalize only when you have measured evidence that a specific query is too slow.

Star schema and snowflake schema

These two are for analytical workloads, not transactional ones.

Star schema puts a central fact table (holding measures like revenue, quantity, or duration) in the middle, surrounded by denormalized dimension tables (date, product, customer, region). Every query is one join from fact to dimension. Simple, and fast.

Snowflake schema normalizes those dimension tables into sub-dimensions. A location dimension splits into city → state → country tables. You save storage and improve data integrity, at the cost of more joins.

	Star schema	Snowflake schema
Dimension tables	Denormalized	Normalized into sub-tables
Query joins	Fewer	More
Storage	Higher (redundant data)	Lower
Query speed	Faster	Slower
Best for	Dashboards, ad-hoc BI	Complex hierarchies, storage-sensitive environments

Modern columnar warehouses (Snowflake, BigQuery, Redshift) have optimized away much of the old performance penalty for snowflake schemas, so the choice these days is less about speed and more about how your team thinks about the data. Star schema is still the default for most analytics use cases, mostly because it's easier to understand and query.

Entity-Attribute-Value (EAV)

EAV stores data in three columns: entity_id, attribute_name, and value. Instead of a wide table with one column per attribute, each attribute becomes its own row.

-- EAV table
CREATE TABLE product_attributes (
    entity_id   INT REFERENCES products(id),
    attribute   VARCHAR(100),
    value       VARCHAR(255)
);

-- A T-shirt has size and color; a laptop has cpu and ram_gb.
-- Both go in the same table.
INSERT INTO product_attributes VALUES (1, 'size', 'XL');
INSERT INTO product_attributes VALUES (1, 'color', 'blue');
INSERT INTO product_attributes VALUES (2, 'cpu', 'M3 Pro');
INSERT INTO product_attributes VALUES (2, 'ram_gb', '36');

When it looks attractive: Product catalogs with hundreds of optional attributes. Clinical records with thousands of possible fields per patient.

Why it usually disappoints: Any query that retrieves more than one attribute needs self-joins or pivots. Filtering is slow because value is always VARCHAR, so there's no type checking and no numeric indexing. Magento 1.x used EAV for product data and was notorious for poor query performance at scale. PostgreSQL benchmarks show JSONB running over 50,000x faster than EAV for unindexed queries, in a database 3x smaller.

The modern alternative: typed columns for the core fields, plus a JSONB column for the variable remainder.

CREATE TABLE products (
    id          SERIAL PRIMARY KEY,
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    category_id INT REFERENCES categories(id),
    attributes  JSONB DEFAULT '{}'
);

A good test: if you find yourself writing WHERE attributes->>'status' = 'active' in most of your queries, then status wants to be a proper column, not something buried in JSON.

Polymorphic associations

A polymorphic association uses a single foreign key column plus a "type" column to point at rows in several different tables.

CREATE TABLE comments (
    id              SERIAL PRIMARY KEY,
    body            TEXT,
    commentable_id  INT,          -- could point to posts, photos, or videos
    commentable_type VARCHAR(50)  -- 'Post', 'Photo', or 'Video'
);

You see this a lot in Rails and Django ORMs. The catch: the database can't enforce a foreign key constraint, because commentable_id points to different tables depending on commentable_type. So no CASCADE deletes, no referential integrity checks, and orphaned records that pile up quietly.

GitLab's engineering documentation explicitly recommends against polymorphic associations. The alternative is a separate association table per type.

CREATE TABLE post_comments (
    id      SERIAL PRIMARY KEY,
    body    TEXT,
    post_id INT REFERENCES posts(id) ON DELETE CASCADE
);

CREATE TABLE photo_comments (
    id       SERIAL PRIMARY KEY,
    body     TEXT,
    photo_id INT REFERENCES photos(id) ON DELETE CASCADE
);

More tables, but the database enforces every relationship for you. For systems that need to hold their data integrity over the long run, that's almost always the better trade.

Multi-tenant patterns

Multi-tenancy is about how you isolate one customer's data from another's when several organizations share the same application. There are three standard approaches, each with a different cost-versus-isolation trade-off.

Shared schema (pool model): All tenants share the same tables, and a tenant_id column on every table keeps the data apart. Cheapest and simplest, but one missing WHERE tenant_id = ? filter can leak data across tenants.

Schema-per-tenant: Each tenant gets its own schema inside the same database instance. PostgreSQL and SQL Server support this; MySQL doesn't. Bytebase's analysis found that it introduces complexity comparable to database-per-tenant without the matching isolation benefit.

Database-per-tenant (silo model): Each tenant gets a fully separate database. Maximum isolation, highest operational cost. Running schema migrations across hundreds of databases needs dedicated tooling.

	Shared schema	Schema-per-tenant	Database-per-tenant
Cost	Lowest	Medium	Highest
Data isolation	Weakest	Medium	Strongest
Per-tenant customization	None	Some	Full
Migration complexity	One migration	Per-schema	Per-database
Best for	Startups, MVPs	Mid-tier SaaS	Enterprise, regulated industries

Start with shared schema unless you have a compliance requirement (HIPAA, GDPR data residency) that demands physical isolation. PostgreSQL's row-level security can add a database-enforced isolation layer on top of your application-level tenant_id filtering.

Anti-patterns to avoid

These are the design decisions that turn into production incidents.

The God Table. One table stores everything: users, orders, products, logs. It starts at 50 columns and grows to 200. Splitting it into focused tables typically cuts query times by 80% or more.

Missing primary keys. Always use a surrogate primary key, either an auto-increment integer or a UUID. Natural keys like name plus date of birth lead to collisions and mixed-up records.

Over-normalization. You can overshoot in the other direction too: splitting data across so many tables that a simple query needs 10+ joins. One fintech startup decomposed currency codes, transaction types, and timestamp components into separate lookup tables, and the system buckled under peak load.

Storing money as FLOAT. Floating-point arithmetic introduces rounding errors. 0.1 + 0.2 = 0.30000000000000004 in most languages. Use DECIMAL(10,2) for monetary values.

No indexes on filtered columns. Index the columns that show up in your WHERE clauses and JOIN conditions, and use EXPLAIN to confirm. Too many indexes slow down writes; too few mean full table scans.

Wrong data types. Dates stored as VARCHAR block range queries. Enums stored as freeform strings invite typos. Use DATE/TIMESTAMP for times, DECIMAL for money, and ENUM or lookup tables for fixed value sets.

Getting the pattern right at the schema design stage is a lot cheaper than fixing it in production.

How Bytebase helps enforce design patterns

Bytebase is a database DevSecOps platform that catches pattern violations before they reach production. Its SQL review policy can enforce rules like:

Every table must have a primary key
Column naming must follow conventions (e.g., _id suffix for foreign keys, _at suffix for timestamps)
CASCADE operations must be explicitly approved
Specific column types are required for specific data (TIMESTAMP for dates, DECIMAL for currency)

These rules run automatically on every schema change before it's applied, so the violations get caught in review rather than during a 3 AM incident.

Back to blog

Database Design Patterns Every Developer Should Know

What are database design patterns?

Normalization (1NF through 3NF)

When to denormalize

Star schema and snowflake schema

Entity-Attribute-Value (EAV)

Polymorphic associations

Multi-tenant patterns

Anti-patterns to avoid

How Bytebase helps enforce design patterns

Related articles

Top 10 Database Schema Design Best Practices

What is a Database Schema?

Multi-Tenant Database Architecture Patterns Explained

Primary Key vs Foreign Key: What Are the Differences?

DDL vs DML in SQL: What is Data Definition Language and Data Manipulation Language?

How to Handle Database Migration / Schema Change?

Explore the standard for database governance

Solutions & Platform

Resources

Change Management

Access & Security

AI & Agents

Comparisons

Company

Legal