
Static Data Masking Tools (and How to Roll Your Own)

Tianzhou · May 25, 2026

| Update History | Comment |
| --- | --- |
| 2026/05/25 | Initial version. |

Static data masking (SDM) rewrites sensitive data into a sanitized copy, so dev, test, and staging never hold real PII. The copy is permanent and irreversible — there is no path back to the original values. This post compares the tools that do it, and shows how to build your own when none of them fit.

Perforce Delphix

Engines: Oracle, SQL Server, PostgreSQL, MySQL, and Db2 (z/OS and iSeries), plus mainframe, PaaS, and file sources — bundled "Standard" connectors, with add-on "Select" connectors for the long tail.

Delphix is the enterprise option. It runs automated sensitive-data discovery, then applies deterministic masking algorithms that preserve referential integrity within and across sources. The masking is irreversible: production values become realistic but fictitious. It pairs masking with data virtualization, so masked copies ship to downstream environments without provisioning full physical storage. On-prem and cloud.

Fits a heterogeneous estate with a formal refresh pipeline and compliance-evidence requirements. Priced and operated as an enterprise platform.

Tonic.ai

Engines: PostgreSQL, MySQL, SQL Server, Oracle, and MongoDB, plus Snowflake, BigQuery, Spark, Salesforce, and flat files.

Tonic Structural is built for developer test data. It de-identifies, subsets, and synthesizes structured and semi-structured data, turning a production database into a referentially intact test set. It runs as SaaS or self-hosted. Synthetic generation is first-class — useful when production data cannot leave its boundary even masked.
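To make "referentially intact" concrete, here is a minimal sketch of subsetting in plain Python. It is not how Tonic Structural works internally; the `subset` function, the `customers`/`orders` shapes, and the `keep_every` parameter are all hypothetical, chosen only to show the idea of following a foreign key when shrinking a dataset.

```python
def subset(customers, orders, keep_every=2):
    """Keep every Nth customer and only the orders that reference a kept customer."""
    kept_customers = [c for i, c in enumerate(customers) if i % keep_every == 0]
    kept_ids = {c["id"] for c in kept_customers}
    # Drop child rows whose parent was dropped, so no foreign key dangles.
    kept_orders = [o for o in orders if o["customer_id"] in kept_ids]
    return kept_customers, kept_orders


# Illustrative data: six customers, two orders each.
customers = [{"id": i} for i in range(1, 7)]
orders = [{"id": 100 + i, "customer_id": (i % 6) + 1} for i in range(12)]

small_customers, small_orders = subset(customers, orders)
remaining_ids = {c["id"] for c in small_customers}
dangling = [o for o in small_orders if o["customer_id"] not in remaining_ids]
print(len(small_customers), len(small_orders), len(dangling))
```

A real schema has many tables and deeper reference chains, so the same walk has to be repeated along every foreign key, which is where a dedicated tool earns its keep.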

Fits teams that want realistic test data wired into development, and cases where synthesis matters more than mirroring production.

greenmask

Engines: PostgreSQL (production-ready); MySQL in beta.

greenmask is the open-source option (Apache-2.0). It works as a logical-dump proxy: it produces backups compatible with pg_restore, masking columns on the way through. Deterministic transformers use hash functions for consistent output, so referential integrity holds, and it supports database subsetting (including cyclic and polymorphic references) and synthetic generation. A single stateless binary, storage-agnostic across local and S3-compatible targets. It ships no scheduler of its own — you invoke the commands from cron, CI, or an orchestrator like Airflow.
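The reason hashing preserves referential integrity can be shown in a few lines. The sketch below is not greenmask's implementation; it assumes a keyed hash (HMAC-SHA256), a hypothetical `MASKING_KEY`, and a made-up `mask_email` helper that lowercases input before hashing.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would come from a secret
# store and must never be copied into the masked environment.
MASKING_KEY = b"example-key-not-for-real-use"


def mask_email(value: str, key: bytes = MASKING_KEY) -> str:
    """Map an email to a stable, fictitious address using a keyed hash."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


# The same input masks to the same output wherever it appears,
# so a join on the masked column still matches.
users = [{"id": 1, "email": "Alice@corp.com"}]
orders = [{"order_id": 10, "customer_email": "alice@corp.com"}]

masked_users = [{**u, "email": mask_email(u["email"])} for u in users]
masked_orders = [
    {**o, "customer_email": mask_email(o["customer_email"])} for o in orders
]
print(masked_users[0]["email"] == masked_orders[0]["customer_email"])
```

Using a keyed hash rather than a bare one is a design choice in this sketch: without the key, anyone holding a list of candidate emails could hash them and match the masked values.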

Fits Postgres shops that want masking inside a dump/restore workflow without a commercial platform.

Roll your own

No open-source tool covers static masking end to end across engines. greenmask handles the Postgres dump path; beyond that, teams build their own. "Build your own" means reproducing the five components a full SDM solution bundles:

  1. Discovery. Find and classify every sensitive column, and re-scan as the schema changes. Miss one and real PII ships downstream. Often the first piece to outgrow a script — see Top Open Source Sensitive Data Discovery Tools.
  2. Policy. Decide how each column is masked — an algorithm per type, deterministic where joins and foreign keys must survive — and keep the rules consistent as tables are added.
  3. Transformation engine. The component that reads production, applies the masks, and writes the target. Performance and reliability live here: full-table rewrites at volume, runs that are idempotent and restartable after a failure, and referential integrity held across the whole dataset.
  4. Scheduling. Run the refresh on a cadence, trigger it from upstream events, manage dependencies. A homegrown job leans on cron, CI, or an orchestrator.
  5. Audit logging. Record what ran, which columns were masked, and prove to an auditor that no real PII reached non-production.
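Components 2 and 3 from the list above can be sketched together in a small script. This is an illustration under stated assumptions, not a production engine: it uses in-memory SQLite for both source and target, and the `POLICY` mapping, the `stable_token` and `mask_table` helpers, the key, and the table names are all hypothetical.

```python
import hashlib
import hmac
import sqlite3

KEY = b"example-key-not-for-real-use"  # hypothetical; keep real keys out of non-production


def stable_token(value, prefix):
    """Deterministic replacement: equal inputs give equal outputs, so joins survive."""
    digest = hmac.new(KEY, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:10]}"


# Policy: one rule per sensitive column. Table and column names are illustrative.
POLICY = {
    ("customers", "email"): lambda v: stable_token(v, "mail") + "@example.com",
    ("customers", "full_name"): lambda v: stable_token(v, "name"),
    ("orders", "customer_email"): lambda v: stable_token(v, "mail") + "@example.com",
}


def mask_table(source, target, table):
    """Copy one table from source to target, masking columns named in POLICY."""
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    placeholders = ", ".join("?" for _ in columns)
    masked_rows = [
        tuple(
            POLICY[(table, col)](val) if (table, col) in POLICY and val is not None else val
            for col, val in zip(columns, row)
        )
        for row in cursor
    ]
    # Delete-then-insert inside one transaction, so rerunning after a
    # failure leaves the target table in the same state.
    with target:
        target.execute(f"DELETE FROM {table}")
        target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", masked_rows)
    return len(masked_rows)


SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_email TEXT, total REAL);
"""
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.executescript(SCHEMA)
target.executescript(SCHEMA)
source.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace', 'ada@corp.com')")
source.execute("INSERT INTO orders VALUES (10, 'ada@corp.com', 42.0)")

for table in ("customers", "orders"):
    mask_table(source, target, table)
    mask_table(source, target, table)  # a rerun does not duplicate rows

joined = target.execute(
    "SELECT c.email, o.total FROM customers c "
    "JOIN orders o ON c.email = o.customer_email"
).fetchall()
print(joined)
```

The sketch loads each table into memory and rewrites it whole, which is fine for a demo and exactly what stops working at volume; batching, checkpointing, and cross-engine type handling are the parts a script tends to grow into.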

A script can cover one or two of these, typically policy and transformation, for a small, stable, single-engine schema. The build-vs-buy line is the other three: discovery, scheduling, and audit are what you end up reinventing as the estate grows.
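A rough sketch of what homegrown discovery and audit look like shows why they get reinvented. Everything here is an assumption for illustration: the name-based `SENSITIVE_NAME` pattern, the `discover` and `audit_record` helpers, and the SQLite schema. Real discovery would also sample column values, since names alone miss free-text fields.

```python
import json
import re
import sqlite3
from datetime import datetime, timezone

# Naive name-based heuristic; it will produce false positives and negatives.
SENSITIVE_NAME = re.compile(r"(email|phone|ssn|name|address|birth)", re.IGNORECASE)


def discover(conn):
    """Return the set of (table, column) pairs whose names look sensitive."""
    found = set()
    tables = [
        r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for table in tables:
        for row in conn.execute(f"PRAGMA table_info({table})"):
            column = row[1]  # PRAGMA table_info rows are (cid, name, type, ...)
            if SENSITIVE_NAME.search(column):
                found.add((table, column))
    return found


def audit_record(discovered, policy_columns):
    """Build a record of which sensitive columns the policy covers and which it misses."""
    uncovered = sorted(discovered - policy_columns)
    return {
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "masked_columns": sorted(discovered & policy_columns),
        "uncovered_columns": uncovered,
        "ok": not uncovered,
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, full_name TEXT, email TEXT);
CREATE TABLE tickets (id INTEGER, contact_phone TEXT, body TEXT);
""")

# The policy predates the tickets table, so its phone column is not covered.
policy_columns = {("customers", "full_name"), ("customers", "email")}
record = audit_record(discover(conn), policy_columns)
print(json.dumps(record, indent=2))
```

Run before each refresh, a check like this can block the job when `ok` is false, which is the schema-drift case: a table added after the policy was written.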

Comparison

Measured against the five components, plus engine coverage, referential integrity, subsetting, and license:

| | Delphix | Tonic.ai | greenmask | Roll your own |
| --- | --- | --- | --- | --- |
| Engines | Oracle, SQL Server, Postgres, MySQL, Db2 (+ more) | Postgres, MySQL, SQL Server, Oracle, Mongo, Snowflake, BigQuery | Postgres (MySQL beta) | Anything you script |
| Discovery | Automated | Automated | Define rules yourself | Manual |
| Policy | Centralized UI | Centralized UI | YAML config | Hand-coded |
| Referential integrity | Across sources | Across tables | Deterministic | DIY |
| Subsetting / synthetic | Subsetting | Both | Both | Only if you build it |
| Scheduling | Built-in | Built-in (cron) | External (cron / CI) | External |
| Audit logging | Built-in reports | Audit trail | None | Only if you build it |
| License | Commercial | Commercial (SaaS / self-hosted) | Open source (Apache-2.0) | Free (your time) |

The masking itself is not the split — all four transform data. The surrounding components are. Delphix and Tonic bundle discovery, scheduling, and audit; greenmask gives you the engine but leaves those to your pipeline; a homegrown script leaves you all five. The more engines, tables, and compliance scrutiny you carry, the more those bundled components justify a commercial tool.


Static masking protects the copies that leave production. It does nothing for the live database, where the requirement flips — mask at read time, by role, without altering the stored data. That is dynamic data masking, a separate control. Bytebase handles that side: queries route through its SQL Editor and results are masked before they leave it, one policy across every engine. It is not a static masking tool — pair it with one of the above for non-production.
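The contrast with static masking is easiest to see in code. The sketch below is a generic illustration of read-time, role-based masking, not Bytebase's implementation; the `UNMASKED_ROLES` policy, the role names, and both helper functions are hypothetical.

```python
from typing import Any

# Hypothetical policy: which roles may see each column unmasked.
UNMASKED_ROLES = {
    "email": {"dba", "support_lead"},
    "ssn": {"dba"},
}


def mask_value(value: Any) -> Any:
    """Hide all but the last two characters; fully hide short values."""
    if value is None:
        return None
    text = str(value)
    visible = 2 if len(text) > 4 else 0
    return "*" * (len(text) - visible) + text[len(text) - visible:]


def apply_read_time_masking(rows, role):
    """Return masked copies of result rows; the stored rows are not modified."""
    masked = []
    for row in rows:
        masked.append({
            col: val
            if col not in UNMASKED_ROLES or role in UNMASKED_ROLES[col]
            else mask_value(val)
            for col, val in row.items()
        })
    return masked


stored = [{"id": 1, "email": "ada@corp.com", "ssn": "123-45-6789"}]
developer_view = apply_read_time_masking(stored, "developer")
dba_view = apply_read_time_masking(stored, "dba")
print(developer_view)
print(dba_view)
```

The stored rows stay intact and the decision is made per query and per role, whereas a static masking run rewrites the data once and the original values are gone from the copy.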
