Skip to main content

Database Design Patterns Every Developer Should Know

Adela · Mar 11, 2026

A database design pattern is a reusable way to organize tables, relationships, and indexes in a relational database. Getting it wrong early is expensive. The pattern you pick sets your query performance ceiling, your data integrity guarantees, and how much pain your next schema migration will cause.

This guide walks through six patterns you'll run into in most production systems, the trade-offs behind each, and the anti-patterns that cause real outages.

What are database design patterns?

A database design pattern is a repeatable solution to a common data modeling problem. It's a structural decision: how tables relate to each other, how data gets split or combined, and where you draw the boundaries between entities. Some patterns optimize for write safety (normalization), some for read speed (star schema), some for flexibility (EAV, JSONB columns), and some for keeping one customer's data away from another's (multi-tenancy). Most real databases mix several of these together.

Normalization (1NF through 3NF)

Normalization removes redundant data by splitting it across related tables. The goal is to store each fact exactly once, so an update only ever happens in one place.

First Normal Form (1NF): Every column holds a single value. No comma-separated lists, no arrays stuffed into a VARCHAR. Each row is uniquely identifiable by a primary key.

Second Normal Form (2NF): Every non-key column depends on the entire primary key, not just part of it. This matters when you have composite keys. If an order_items table is keyed on (order_id, product_id), then product_name depends only on product_id and belongs in a separate products table.

Third Normal Form (3NF): Non-key columns depend only on the primary key, not on other non-key columns. If a customers table has city and state, and city determines state, then state should move to a cities table. That removes transitive dependencies.

When to denormalize

3NF is the default target for transactional systems. But read-heavy workloads (dashboards, reports, search) often need denormalized data to dodge expensive multi-table joins. Denormalization means deliberately adding redundancy: storing total_orders_count directly on a customer row instead of running COUNT(*) against the orders table on every request.

The rule of thumb: start normalized, and denormalize only when you have measured evidence that a specific query is too slow.

Star schema and snowflake schema

These two are for analytical workloads, not transactional ones.

Star schema puts a central fact table (holding measures like revenue, quantity, or duration) in the middle, surrounded by denormalized dimension tables (date, product, customer, region). Every query is one join from fact to dimension. Simple, and fast.

Snowflake schema normalizes those dimension tables into sub-dimensions. A location dimension splits into city → state → country tables. You save storage and improve data integrity, at the cost of more joins.

Star schemaSnowflake schema
Dimension tablesDenormalizedNormalized into sub-tables
Query joinsFewerMore
StorageHigher (redundant data)Lower
Query speedFasterSlower
Best forDashboards, ad-hoc BIComplex hierarchies, storage-sensitive environments

Modern columnar warehouses (Snowflake, BigQuery, Redshift) have optimized away much of the old performance penalty for snowflake schemas, so the choice these days is less about speed and more about how your team thinks about the data. Star schema is still the default for most analytics use cases, mostly because it's easier to understand and query.

Entity-Attribute-Value (EAV)

EAV stores data in three columns: entity_id, attribute_name, and value. Instead of a wide table with one column per attribute, each attribute becomes its own row.

-- EAV table
CREATE TABLE product_attributes (
    entity_id   INT REFERENCES products(id),
    attribute   VARCHAR(100),
    value       VARCHAR(255)
);

-- A T-shirt has size and color; a laptop has cpu and ram_gb.
-- Both go in the same table.
INSERT INTO product_attributes VALUES (1, 'size', 'XL');
INSERT INTO product_attributes VALUES (1, 'color', 'blue');
INSERT INTO product_attributes VALUES (2, 'cpu', 'M3 Pro');
INSERT INTO product_attributes VALUES (2, 'ram_gb', '36');

When it looks attractive: Product catalogs with hundreds of optional attributes. Clinical records with thousands of possible fields per patient.

Why it usually disappoints: Any query that retrieves more than one attribute needs self-joins or pivots. Filtering is slow because value is always VARCHAR, so there's no type checking and no numeric indexing. Magento 1.x used EAV for product data and was notorious for poor query performance at scale. PostgreSQL benchmarks show JSONB running over 50,000x faster than EAV for unindexed queries, in a database 3x smaller.

The modern alternative: typed columns for the core fields, plus a JSONB column for the variable remainder.

CREATE TABLE products (
    id          SERIAL PRIMARY KEY,
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    category_id INT REFERENCES categories(id),
    attributes  JSONB DEFAULT '{}'
);

A good test: if you find yourself writing WHERE attributes->>'status' = 'active' in most of your queries, then status wants to be a proper column, not something buried in JSON.

Polymorphic associations

A polymorphic association uses a single foreign key column plus a "type" column to point at rows in several different tables.

CREATE TABLE comments (
    id              SERIAL PRIMARY KEY,
    body            TEXT,
    commentable_id  INT,          -- could point to posts, photos, or videos
    commentable_type VARCHAR(50)  -- 'Post', 'Photo', or 'Video'
);

You see this a lot in Rails and Django ORMs. The catch: the database can't enforce a foreign key constraint, because commentable_id points to different tables depending on commentable_type. So no CASCADE deletes, no referential integrity checks, and orphaned records that pile up quietly.

GitLab's engineering documentation explicitly recommends against polymorphic associations. The alternative is a separate association table per type.

CREATE TABLE post_comments (
    id      SERIAL PRIMARY KEY,
    body    TEXT,
    post_id INT REFERENCES posts(id) ON DELETE CASCADE
);

CREATE TABLE photo_comments (
    id       SERIAL PRIMARY KEY,
    body     TEXT,
    photo_id INT REFERENCES photos(id) ON DELETE CASCADE
);

More tables, but the database enforces every relationship for you. For systems that need to hold their data integrity over the long run, that's almost always the better trade.

Multi-tenant patterns

Multi-tenancy is about how you isolate one customer's data from another's when several organizations share the same application. There are three standard approaches, each with a different cost-versus-isolation trade-off.

Shared schema (pool model): All tenants share the same tables, and a tenant_id column on every table keeps the data apart. Cheapest and simplest, but one missing WHERE tenant_id = ? filter can leak data across tenants.

Schema-per-tenant: Each tenant gets its own schema inside the same database instance. PostgreSQL and SQL Server support this; MySQL doesn't. Bytebase's analysis found that it introduces complexity comparable to database-per-tenant without the matching isolation benefit.

Database-per-tenant (silo model): Each tenant gets a fully separate database. Maximum isolation, highest operational cost. Running schema migrations across hundreds of databases needs dedicated tooling.

Shared schemaSchema-per-tenantDatabase-per-tenant
CostLowestMediumHighest
Data isolationWeakestMediumStrongest
Per-tenant customizationNoneSomeFull
Migration complexityOne migrationPer-schemaPer-database
Best forStartups, MVPsMid-tier SaaSEnterprise, regulated industries

Start with shared schema unless you have a compliance requirement (HIPAA, GDPR data residency) that demands physical isolation. PostgreSQL's row-level security can add a database-enforced isolation layer on top of your application-level tenant_id filtering.

Anti-patterns to avoid

These are the design decisions that turn into production incidents.

The God Table. One table stores everything: users, orders, products, logs. It starts at 50 columns and grows to 200. Splitting it into focused tables typically cuts query times by 80% or more.

Missing primary keys. Always use a surrogate primary key, either an auto-increment integer or a UUID. Natural keys like name plus date of birth lead to collisions and mixed-up records.

Over-normalization. You can overshoot in the other direction too: splitting data across so many tables that a simple query needs 10+ joins. One fintech startup decomposed currency codes, transaction types, and timestamp components into separate lookup tables, and the system buckled under peak load.

Storing money as FLOAT. Floating-point arithmetic introduces rounding errors. 0.1 + 0.2 = 0.30000000000000004 in most languages. Use DECIMAL(10,2) for monetary values.

No indexes on filtered columns. Index the columns that show up in your WHERE clauses and JOIN conditions, and use EXPLAIN to confirm. Too many indexes slow down writes; too few mean full table scans.

Wrong data types. Dates stored as VARCHAR block range queries. Enums stored as freeform strings invite typos. Use DATE/TIMESTAMP for times, DECIMAL for money, and ENUM or lookup tables for fixed value sets.

Getting the pattern right at the schema design stage is a lot cheaper than fixing it in production.

How Bytebase helps enforce design patterns

Bytebase is a database DevSecOps platform that catches pattern violations before they reach production. Its SQL review policy can enforce rules like:

These rules run automatically on every schema change before it's applied, so the violations get caught in review rather than during a 3 AM incident.

Further readings

Back to blog

Explore the standard for database development