Skip to main content

How to Handle Database Migration / Schema Change?

Tianzhou · May 13, 2026

A series of articles about database migration / schema change:

  1. What is a Database Schema?
  2. What is a Database Schema Migration?
  3. How to Handle Database Migration / Schema Change? (this one)
  4. Top Database Schema Change Tools in 2023

Ad-hoc data change is the unplanned kind. A wrong tax rate. A stuck row. A one-line UPDATE you write under time pressure.

Schema change is the other kind. You open a PR with a new column. Tests pass. CI is green. You merge. At 2am the deploy fires the migration against a 200M-row table and the lock holds for forty minutes. The on-call engineer wakes up to a saturated replica and a half-applied ALTER. Planning is the thing that keeps this from happening.

This post is for the engineer writing the migration. The mistakes that put prod at risk. The four phases that take it out. The controls that enforce them.

The unplanned counterpart to this post is How to Handle Ad-hoc Database Data Changes?. Together they cover both classes of change that touch production data.

What counts as schema change

Anything that alters the structure your application reads against:

  • Adding, dropping, or renaming a column.
  • Adding, dropping, or renaming a table.
  • Adding or dropping an index.
  • Changing a column type, default, or nullability.
  • Adding or modifying a constraint, trigger, or stored procedure.

Each one is a DDL statement. Each one ships as a versioned migration file. Each one lands in every environment in order.

Where it breaks down

The mistakes here start in a pull request — and they share a shape.

  1. You haven't reviewed the lock. Your ALTER looks small in the PR. On the 200M-row table in prod, it rewrites the whole thing under an exclusive lock. The PR didn't tell you because the PR doesn't know the row count.
  2. The migration is coupled to the deploy. The new code starts only after the migration finishes. If the migration fails halfway, the deploy is wedged — and the rollback is two changes, not one.
  3. The new code doesn't run against the old schema. You rename a column. The old pods are still serving requests against the old name. For the seconds between migration and deploy, half of prod is 500ing.
  4. The migration has no reverse. DROP COLUMN ran. There is no down-migration. Whatever was in the column is gone, and the only path back is last night's backup.
  5. You ran the same script by hand in each environment. Dev got version 47. Staging got 47. Prod got 47 — except you fixed a typo between staging and prod and now they aren't the same schema.
  6. The migration ran on one tenant, not all of them. You have thirty regional databases. You applied the change to twenty-nine. Tenant 30 throws schema errors all weekend.
  7. The migration is back. You wrote this ALTER last month. It got reverted in a hotfix and lost in a rebase. Now you're re-applying it without remembering why it got pulled.

Plan. Review. Approve. Deploy.

Four phases close the gaps. Each one runs before the SQL reaches a database.

Plan

You write the migration as a versioned file in the repo — one statement per change, numbered, immutable once merged. The file declares its target: schema-only, online-capable, transaction-safe. Automated SQL review runs at PR time: missing index on a foreign key, unsafe DROP, naming-convention violation, backward-incompatible rename. Every finding shows up as a PR comment before a human reviewer opens the file.

Review

A human approver sees the statement, the lint results, and the affected table's row count and size from the live schema. A 10K-row ALTER on an internal table reads differently from a 200M-row ALTER on invoices. Review is where the lock behavior gets caught — and where online-DDL gets chosen for the tables that need it.

Approve

Your approval matches your risk. A nullable column added to a small staging table routes to a peer. A DROP TABLE in prod routes to the table owner, a DBA, and security. Same lane. Different gates. Approval requests reach approvers through IM — a direct message, not a chat thread they'll scroll past.

Deploy

Deployment runs through a multi-environment pipeline — dev, test, staging, prod — with the migration applied in order and each stage gated. The platform's service account holds the credential; your identity stays on the record, not on the connection. Long-running ALTERs route through online-DDL (gh-ost, pt-osc, or the engine's native equivalent) so the table stays writable. Each application is logged with the version, the executor, the duration, and the affected row count. For multi-tenant deployments, one approved migration fans out across the database group in a single batch.

A migration through the workflow

You need to add a tax_region_id column to invoices with a foreign key to tax_regions. The table holds 240M rows in prod.

Plan. You write the migration as version 0247_invoices_add_tax_region_id.sql:

ALTER TABLE invoices
  ADD COLUMN tax_region_id INTEGER REFERENCES tax_regions(id);

CREATE INDEX CONCURRENTLY idx_invoices_tax_region_id
  ON invoices(tax_region_id);

You open a PR. SQL review flags two things: the column is nullable (intentional — backfill comes later), and the index uses CONCURRENTLY (correct for the row count). The migration declares itself online-capable; the platform confirms it can run without an exclusive lock.

Review. The DBA reads the PR. The 240M-row count is shown next to the table name. She confirms the index path is CONCURRENTLY, that the foreign key is NOT VALID (deferred validation), and that no application code reads the column yet. Approved.

Approve. The platform scores it as moderate risk — prod, large table, additive change, online-capable. It routes to the DBA and the on-call platform engineer. A staging-only version of the same migration would route to a single peer.

Deploy. The platform applies the migration to dev, then test, then staging — each stage running its own row-count check and timing the apply. Staging takes 11 minutes; prod is projected at 90. The prod deploy runs in the next maintenance window. The migration logs version 0247, executor bb-svc, duration 87 minutes, rows touched 0 (additive). The next migration, 0248, backfills tax_region_id and runs through the ad-hoc lane.

The same migration without this workflow: you merged the PR, the deploy hook ran it inline, the ALTER took the lock you didn't expect, and traffic queued behind it until the on-call paged you.

One migration. Every tenant.

The single-database case is the easy one. The hard one: you have thirty regional databases, or one schema per customer, and the migration has to land in every one of them in order.

Running it by hand is where the drift starts. You applied version 247 to twenty-nine of thirty. You'll find tenant 30 next week when the application throws a schema error from a region with low traffic.

One approved migration closes it. One batch deployment across the database group. One consolidated record. Each target reports its own duration and outcome. If three tenants fail, those three halt and the rest continue — the record shows which ran, which didn't, and where drift now exists. Re-running picks up only the targets that didn't complete.

Where the lane ends

The schema-change workflow gates planned DDL. Some things look adjacent but belong elsewhere.

  • Ad-hoc data fixes. A one-row UPDATE does not belong in a versioned migration. Run it through the ad-hoc lane — request, authorize, execute, record — and keep migrations for structural change.
  • Recurring data jobs. A nightly cleanup or rollup is not a migration. Put it in a job scheduler with its own owner and observability.
  • Production hotfixes outside the pipeline. When the pipeline is broken and the schema needs to change anyway, the hand-edit is the failure mode. Treat the bypass as an incident, and reconcile drift back through a migration the same day.
  • Read-path investigation. Querying production to understand the schema is its own workflow — different approval shape, no commit.

Tools that fit

Four approaches cover the shape. Pick the one that matches your team's size and your audit tolerance.

  • ORM migrations. Rails, Django, Prisma, ActiveRecord. Migrations live with the code, run automatically at deploy. Suitable for small teams with one database and one deploy pipeline. Coupled to code release; review is whatever the PR review catches.
  • Migration tool. Liquibase, Flyway, Atlas, Sqitch. Migrations live as versioned files in a dedicated repo; the tool tracks applied versions. Decouples migration from deploy. No built-in human review, no risk routing, no multi-env orchestration.
  • DBaaS console. Apply DDL through the managed-database UI. Single environment, no version control, no review. Acceptable for a prototype, not for a team.
  • Change-management platform. PR with SQL review, risk-routed approval, multi-environment rollout, online-DDL routing, batch fan-out, and a versioned change log share one record. Drift between environments is detectable because the platform knows what ran where.
CapabilityORMMigration ToolDBaaS ConsolePlatform
Versioned, repo-tracked migrations
Automated SQL review at PR time
Human review with table-size context
Approval routed by risk
Multi-environment rollout pipelinePartial
Online-DDL for large-table ALTERs
Batch fan-out across tenant databases
Drift detection across environments

ORM-led teams trade decoupling for setup speed. Migration-tool-led teams trade orchestration for credential simplicity. Platform-led teams pay the cost of consolidation and get one record per migration — with a deploy path and a drift check attached.

Summary

Schema change is the planned lane. The plan is the thing that keeps it from breaking prod. Run it through a SQL client and your safety net is whatever the deploy hook catches at 2am.

Four phases close the gap. Plan. Review. Approve. Deploy. SQL review catches the unsafe DROP and the missing index. Risk routes the approval to the right person. The pipeline applies the migration across every environment in order. Online-DDL keeps the long ALTERs writable. The unplanned counterpart, How to Handle Ad-hoc Database Data Changes?, covers the other lane.

The 6 levels of database automation places this work on the maturity curve. Most teams reach Level 3 when schema change runs through a reviewed pipeline. The next step is folding ad-hoc change into the same record — that's Level 4.

Back to blog

Explore the standard for database development