
From Schema as Code to Schema as Context

Tianzhou · May 8, 2026

For over two decades, the industry has been telling teams to treat database schemas like application code: version them, review them, ship them through a pipeline. That advice still holds. But a new consumer has shown up at the table, and it doesn't read pull requests the way your DBA does.

LLMs and AI agents are increasingly the ones generating SQL, proposing migrations, and executing data changes. For these actors, a versioned migration file is nowhere near enough. They need context.

Schema as Code Was a Necessary First Step

The Database-as-Code movement (let's set aside the migration-based vs. state-based debate for now) brought real discipline to schema management. Tools like Liquibase and Flyway gave teams version-controlled migrations, applied through CI/CD pipelines, and a single source of truth, much as Infrastructure as Code did. Bytebase extended this with review workflows, access control, and governance.

Here's the catch. A CREATE TABLE statement tells you the shape of the data. It tells you nothing about who is allowed to see or modify that data, which columns hold PII or financial or health records, what masking rules kick in when different roles query it, or what approval a change must clear before it touches production.

For a human developer, that knowledge lives in tribal memory and old Slack threads. For an AI agent, if it isn't codified, it doesn't exist.

Agents Need More than DDL

Point a text-to-SQL agent at a production database and the typical setup feeds it a schema dump (table names, column types, maybe a few comments) and asks it to generate queries.

This works for demos. It falls apart in production.

The agent doesn't know that hr.employees.salary is Confidential and should be masked. It doesn't know that customer.ssn needs a compliance-mandated masking algorithm. It has no idea that querying payments requires just-in-time access approval that expires in one hour.
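To make the gap concrete, here is a minimal sketch of the metadata a schema dump leaves out, using the three examples above. The `ObjectContext` class, its field names, and the masking labels are hypothetical and chosen for illustration; they are not a Bytebase format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectContext:
    """Hypothetical record of what a schema dump does not say about an object."""
    classification: Optional[str] = None      # e.g. "confidential"
    masking: Optional[str] = None             # illustrative label for a masking rule
    jit_access_minutes: Optional[int] = None  # how long an approved grant lasts


# Context for the objects named in the text. The masking labels are assumptions.
CONTEXT = {
    "hr.employees.salary": ObjectContext(classification="confidential", masking="full"),
    "customer.ssn": ObjectContext(masking="compliance-algorithm"),
    "payments": ObjectContext(jit_access_minutes=60),
}


def annotate(name: str) -> str:
    """Render one line of context that could be placed next to the DDL in a prompt."""
    ctx = CONTEXT.get(name)
    if ctx is None:
        return f"{name}: no context recorded"
    notes = []
    if ctx.classification:
        notes.append(f"classification={ctx.classification}")
    if ctx.masking:
        notes.append(f"masking={ctx.masking}")
    if ctx.jit_access_minutes:
        notes.append(f"just-in-time access, expires after {ctx.jit_access_minutes} min")
    return f"{name}: " + "; ".join(notes)


if __name__ == "__main__":
    for name in CONTEXT:
        print(annotate(name))
```

None of these lines can be derived from a CREATE TABLE statement, which is the point: the agent only sees them if someone writes them down.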

The real risk isn't that the agent writes a bad query. It's that the agent writes a correct query that should never have run in the first place.

Context Is Everything Surrounding the Schema

The schema (tables, columns, indexes, constraints) is the structural skeleton. Context is everything that gives it meaning and keeps it safe:

  • Data Classification. Every column tagged with its sensitivity level (public, internal, confidential, or restricted) as structured, machine-readable metadata that drives downstream policies.

  • Dynamic Data Masking. Full masking, partial masking, or custom algorithms applied at query time based on who, or what, is asking. This is the enforcement boundary that stops sensitive data from leaking into an LLM's context window.

  • Access Control. Fine-grained, role-based access beyond database-native GRANTs. Project-level scoping, environment-level restrictions, and just-in-time access with automatic expiration.

  • SQL Review Policies. 200+ lint rules covering anti-pattern detection, naming conventions, and performance guardrails, serving as the automated reviewer when an agent generates or proposes SQL.

  • Change Workflows. The process itself, codified: which changes need DBA approval, which environments need staged rollout, and what the rollback plan is. These workflows are what keep autonomous systems from making irreversible mistakes.
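As a rough illustration of how classification can drive masking at query time, the sketch below masks a result row according to the caller's clearance. The level ordering, the choice of partial masking for confidential and full masking for restricted, and all function names are assumptions made for the example; a real platform would apply its own rules inside the query path.

```python
# Sensitivity levels from least to most sensitive (ordering assumed for this sketch).
LEVELS = ["public", "internal", "confidential", "restricted"]


def full_mask(value) -> str:
    """Hide the value entirely, including its length."""
    return "******"


def partial_mask(value, visible: int = 4) -> str:
    """Keep only the last `visible` characters."""
    text = str(value)
    if len(text) <= visible:
        return "*" * len(text)
    return "*" * (len(text) - visible) + text[-visible:]


def mask_row(row: dict, classification: dict, clearance: str) -> dict:
    """Return a copy of `row` with columns above the caller's clearance masked."""
    limit = LEVELS.index(clearance)
    masked = {}
    for column, value in row.items():
        # Unclassified columns are treated as the most sensitive level.
        level = classification.get(column, "restricted")
        if LEVELS.index(level) <= limit:
            masked[column] = value
        elif level == "confidential":
            masked[column] = partial_mask(value)
        else:
            masked[column] = full_mask(value)
    return masked


if __name__ == "__main__":
    row = {"name": "Ada", "card": "4111111111111111", "ssn": "123-45-6789"}
    levels = {"name": "public", "card": "confidential", "ssn": "restricted"}
    print(mask_row(row, levels, clearance="internal"))
```

Because masking happens before the row leaves the platform, the caller (human or agent) never holds the raw value, so it cannot end up in a prompt.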

Audit trails (every query, every change, every access request) aren't something you codify. They fall out at runtime as a byproduct of enforcing the policies above, giving you accountability over what your agents actually did.
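A small sketch shows how that byproduct can arise. It models a just-in-time grant that expires after one hour and an access check that records every decision it makes. The `Grant` type, the function names, and the shape of the audit record are hypothetical, not a description of any product's internals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Grant:
    principal: str        # who (or what agent) was approved
    resource: str         # what they may query
    expires_at: datetime  # automatic expiration


def issue_grant(principal: str, resource: str, now: datetime,
                ttl: timedelta = timedelta(hours=1)) -> Grant:
    """Create a time-boxed grant; the one-hour default mirrors the example in the text."""
    return Grant(principal, resource, now + ttl)


def check_access(grants: List[Grant], principal: str, resource: str,
                 now: datetime, audit_log: List[Dict]) -> bool:
    """Decide whether access is allowed and record the decision either way."""
    allowed = any(
        g.principal == principal and g.resource == resource and now < g.expires_at
        for g in grants
    )
    # The audit entry is not configured anywhere; it is written because a policy was enforced.
    audit_log.append({
        "at": now.isoformat(),
        "principal": principal,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed


if __name__ == "__main__":
    start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
    grants = [issue_grant("agent-1", "payments", start)]
    log: List[Dict] = []
    print(check_access(grants, "agent-1", "payments", start + timedelta(minutes=30), log))
    print(check_access(grants, "agent-1", "payments", start + timedelta(minutes=61), log))
    print(log)
```

The denied request is logged as carefully as the approved one, which is what makes the trail useful when reviewing what an agent attempted.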

Codifying Context: Terraform, API, and Your Own Format

You can't hand an LLM a PDF of your security policies and expect it to comply. You need machine-readable, version-controlled policy definitions enforced programmatically.

Bytebase manages all the context layers above and exposes them through its Terraform Provider and API, so the same CI/CD pipeline that provisions a database also provisions who can access it, what masking rules apply, and what review process governs changes. For teams that want to go further, the API lets you build your own context layer in whatever format your agents consume best: pull classification taxonomies as JSON, export masking policies as structured data, or serialize the whole thing as YAML, TOML, or Markdown.
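Here is one way the "your own format" step might look once the metadata is in hand. The sketch starts from a plain in-memory dictionary, which stands in for whatever you pulled from the API (no API calls are shown, and the dictionary's shape is an assumption), and serializes it as JSON and as a Markdown table using only the Python standard library.

```python
import json

# Stand-in for context already fetched from your platform; the shape is hypothetical.
CONTEXT = {
    "hr.employees.salary": {"classification": "confidential", "masking": "full"},
    "hr.employees.name": {"classification": "internal"},
}


def to_json(context: dict) -> str:
    """Serialize for agents that consume structured input."""
    return json.dumps(context, indent=2, sort_keys=True)


def to_markdown(context: dict) -> str:
    """Serialize as a Markdown table for agents that do better with prose-like prompts."""
    lines = [
        "| Column | Classification | Masking |",
        "| --- | --- | --- |",
    ]
    for column in sorted(context):
        meta = context[column]
        lines.append(
            f"| {column} | {meta.get('classification', '-')} | {meta.get('masking', '-')} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_json(CONTEXT))
    print(to_markdown(CONTEXT))
```

Both outputs come from the same source of truth, so the format can change per agent without the policy itself drifting.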

When everything is codified, policies are either enforced at the platform level (masking at query time, access denied before the query runs) or available as structured metadata the agent reasons about before it acts.
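The second path, metadata the agent reasons about, could look something like the sketch below: before writing any SQL, the agent consults the context and either asks for access or drops sensitive columns from its plan. The `plan_query` function, the `requires_jit_access` key, and the rule that confidential and restricted columns are omitted are all assumptions for illustration. This complements platform enforcement; it does not replace it.

```python
from typing import Dict, List

# Levels the agent avoids selecting on its own initiative (an assumed policy).
SENSITIVE = {"confidential", "restricted"}


def plan_query(table: str, columns: List[str], context: Dict[str, dict],
               has_access: bool) -> dict:
    """Decide what to do before generating SQL, based only on codified metadata."""
    table_meta = context.get(table, {})
    if table_meta.get("requires_jit_access") and not has_access:
        # Do not generate the query at all; ask for approval first.
        return {"action": "request_access", "table": table}

    selected, omitted = [], []
    for column in columns:
        meta = context.get(f"{table}.{column}", {})
        if meta.get("classification") in SENSITIVE:
            omitted.append(column)
        else:
            selected.append(column)
    return {"action": "run", "select": selected, "omitted": omitted}


if __name__ == "__main__":
    context = {
        "payments": {"requires_jit_access": True},
        "hr.employees.salary": {"classification": "confidential"},
    }
    print(plan_query("payments", ["amount"], context, has_access=False))
    print(plan_query("hr.employees", ["name", "salary"], context, has_access=True))
```

If the agent gets this wrong, the platform-level controls still apply, which is why the two paths work best together.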

Schema as Code got us version control. Schema as Context is what gets us AI-readiness. The schema dump told the model what your data looks like. The question now is whether you'll also tell it what your data is allowed to do.
