# Static Data Masking Tools (and How to Roll Your Own)

> The static data masking tools that produce safe non-production data — Delphix, Tonic.ai, and greenmask compared — plus how to roll your own when no tool fits.

Tianzhou | 2026-05-25 | Source: https://www.bytebase.com/blog/static-data-masking-tools/

---

| Update History | Comment          |
| -------------- | ---------------- |
| 2026/05/25     | Initial version. |

Static data masking (SDM) rewrites sensitive data into a sanitized copy, so dev, test, and staging never hold real PII. The copy is permanent and irreversible — there is no path back to the original values. This post compares the tools that do it, and shows how to build your own when none of them fit.

## Perforce Delphix

**Engines:** Oracle, SQL Server, PostgreSQL, MySQL, and Db2 (z/OS and iSeries), plus mainframe, PaaS, and file sources — bundled "Standard" connectors, with add-on "Select" connectors for the long tail.

[Delphix](https://www.perforce.com/products/delphix/data-masking) is the enterprise option. It runs automated sensitive-data discovery, then applies deterministic masking algorithms that preserve referential integrity within and across sources. The masking is irreversible: production values become realistic but fictitious. It pairs masking with data virtualization, so masked copies ship to downstream environments without provisioning full physical storage. On-prem and cloud.

Fits a heterogeneous estate with a formal refresh pipeline and compliance-evidence requirements. Priced and operated as an enterprise platform.

## Tonic.ai

**Engines:** PostgreSQL, MySQL, SQL Server, Oracle, and MongoDB, plus Snowflake, BigQuery, Spark, Salesforce, and flat files.

[Tonic Structural](https://www.tonic.ai/) is built for developer test data. It de-identifies, subsets, and synthesizes structured and semi-structured data, turning a production database into a referentially intact test set. It runs as SaaS or self-hosted. Synthetic generation is first-class — useful when production data cannot leave its boundary even masked.
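To make "referentially intact" concrete, here is a minimal subsetting sketch using Python's built-in `sqlite3`. It is not how Tonic implements subsetting. The `users` and `orders` tables, their columns, and the `subset` helper are all hypothetical. The idea is to copy a slice of parent rows, then copy only the child rows that point at them. The sketch does no masking, so the subset still holds real values until a masking step runs.

```python
import sqlite3

def subset(source: sqlite3.Connection, target: sqlite3.Connection, user_limit: int) -> None:
    """Copy the first `user_limit` users and only the orders that reference them."""
    target.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    target.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "user_id INTEGER REFERENCES users(id), total REAL)"
    )
    users = source.execute(
        "SELECT id, name FROM users ORDER BY id LIMIT ?", (user_limit,)
    ).fetchall()
    target.executemany("INSERT INTO users VALUES (?, ?)", users)

    # Follow the foreign key: keep only children whose parent made the cut.
    ids = [row[0] for row in users]
    placeholders = ",".join("?" for _ in ids)
    orders = source.execute(
        f"SELECT id, user_id, total FROM orders WHERE user_id IN ({placeholders})", ids
    ).fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()

# Hypothetical demo data standing in for production.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
source.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 5.0), (11, 2, 7.5), (12, 3, 9.0), (13, 1, 2.0)],
)
source.commit()

target = sqlite3.connect(":memory:")
subset(source, target, user_limit=2)
```

Real schemas have longer foreign-key chains, cycles, and polymorphic references, which is where a hand-written version like this stops scaling.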

Fits teams that want realistic test data wired into development, and cases where synthesis matters more than mirroring production.

## greenmask

**Engines:** PostgreSQL (production-ready); MySQL in beta.

[greenmask](https://github.com/GreenmaskIO/greenmask) is the open-source option (Apache-2.0). It works as a logical-dump proxy: it produces backups compatible with `pg_restore`, masking columns on the way through. Deterministic transformers use hash functions for consistent output, so referential integrity holds, and it supports database subsetting (including cyclic and polymorphic references) and synthetic generation. A single stateless binary, storage-agnostic across local and S3-compatible targets. It ships no scheduler of its own — you invoke the commands from cron, CI, or an orchestrator like Airflow.
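The link between deterministic hashing and referential integrity can be sketched in a few lines of Python. This is an illustration of the general technique, not greenmask's transformer code. The key, the `mask_email` helper, and the output format are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical key. In practice it would come from a secret store and would
# never be shipped to non-production alongside the masked data.
MASKING_KEY = b"example-key-not-for-real-use"

def mask_email(value: str, key: bytes = MASKING_KEY) -> str:
    """Map an email to a stable, fictitious address using a keyed hash."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"

# The same input yields the same output wherever it appears, so a join on the
# masked column still matches across tables.
users = [{"id": 1, "email": "Alice@Corp.com"}]
orders = [{"order_id": 10, "customer_email": "alice@corp.com"}]

masked_users = [{**u, "email": mask_email(u["email"])} for u in users]
masked_orders = [
    {**o, "customer_email": mask_email(o["customer_email"])} for o in orders
]
```

A keyed hash (HMAC) is used here rather than a bare hash so that someone without the key cannot recompute the mapping from a list of guessed inputs. Truncating the digest to 12 hex characters keeps values short at the cost of a small collision risk on very large tables.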

Fits Postgres shops that want masking inside a dump/restore workflow without a commercial platform.

## Roll your own

No open-source tool covers static masking end to end across engines. greenmask handles the Postgres dump path; beyond that, teams build their own. "Build your own" means reproducing the five components a full SDM solution bundles:

1. **Discovery.** Find and classify every sensitive column, and re-scan as the schema changes. Miss one and real PII ships downstream. Often the first piece to outgrow a script — see [Top Open Source Sensitive Data Discovery Tools](/blog/top-open-source-sensitive-data-discovery-tools/).
2. **Policy.** Decide how each column is masked — an algorithm per type, deterministic where joins and foreign keys must survive — and keep the rules consistent as tables are added.
3. **Transformation engine.** The component that reads production, applies the masks, and writes the target. Performance and reliability live here: full-table rewrites at volume, runs that are idempotent and restartable after a failure, and referential integrity held across the whole dataset.
4. **Scheduling.** Run the refresh on a cadence, trigger it from upstream events, manage dependencies. A homegrown job leans on cron, CI, or an orchestrator.
5. **Audit logging.** Record what ran, which columns were masked, and prove to an auditor that no real PII reached non-production.
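As a rough sketch of how discovery and policy fit together, the Python below classifies columns by name and looks up a masking rule for each sensitive type. The patterns, rule names, and the `classify` and `plan` helpers are hypothetical. Name matching alone misses columns with unhelpful names, which is why real discovery tools usually sample values as well.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical name-based classifiers, checked in order.
CLASSIFIERS = {
    "email": re.compile(r"email", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "name": re.compile(r"(first|last|full)_?name", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social_security", re.IGNORECASE),
}

# Hypothetical policy: one masking rule per sensitive type.
# "ssn" is left out on purpose to show the gap check in plan().
POLICY = {
    "email": "deterministic_hash",
    "phone": "random_digits",
    "name": "fake_name",
}

def classify(column: str) -> Optional[str]:
    """Return the first sensitive type whose pattern matches the column name."""
    for label, pattern in CLASSIFIERS.items():
        if pattern.search(column):
            return label
    return None

def plan(schema: Dict[str, List[str]]):
    """Split sensitive columns into those with a rule and those without one."""
    rules: Dict[Tuple[str, str], str] = {}
    gaps: List[Tuple[str, str, str]] = []
    for table, columns in schema.items():
        for column in columns:
            label = classify(column)
            if label is None:
                continue  # not flagged as sensitive by name
            if label in POLICY:
                rules[(table, column)] = POLICY[label]
            else:
                gaps.append((table, column, label))
    return rules, gaps

schema = {
    "users": ["id", "email", "full_name", "ssn", "created_at"],
    "orders": ["id", "contact_phone", "total"],
}
rules, gaps = plan(schema)
```

Here `gaps` comes back holding `("users", "ssn", "ssn")`: a column that discovery flagged but policy does not cover. One reasonable design is to fail the refresh whenever `gaps` is non-empty, so a new sensitive column blocks the run instead of shipping unmasked.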

```mermaid
flowchart TB
    Prod[("Production<br/>real values")]
    Discovery["1. Discovery"]
    Policy["2. Policy"]
    Engine["3. Transformation engine"]
    Schedule["4. Scheduling"]
    Audit["5. Audit log"]
    NonProd[("Non-production<br/>masked copy")]

    Prod -->|scan schema| Discovery
    Discovery -->|sensitive columns| Policy
    Policy -->|masking rules| Engine
    Schedule -->|trigger| Engine
    Prod ==>|read data| Engine
    Engine ==>|masked data| NonProd
    Engine -->|record| Audit

    style Engine fill:#e0f2fe,stroke:#0369a1
```

A script can cover two of these, policy and the transformation engine, for a small, stable, single-engine schema. The build-vs-buy line is the other three: discovery, scheduling, and audit are what you end up reinventing as the estate grows.
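For a sense of scale, a minimal homegrown refresh might look like the Python sketch below. It uses `sqlite3` as a stand-in for both production and the target. The `users` table, the key, and the `mask` and `refresh` helpers are hypothetical. The policy is hard-coded (one column, one rule), and the audit record is just a dictionary the caller has to store somewhere.

```python
import hashlib
import hmac
import json
import sqlite3
from datetime import datetime, timezone

KEY = b"example-key-not-for-real-use"  # hypothetical; load from a secret store

def mask(value: str) -> str:
    """Deterministic, keyed replacement for an email address."""
    digest = hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"

def refresh(source: sqlite3.Connection, target: sqlite3.Connection) -> dict:
    """Rebuild the masked copy in full, so a rerun gives the same result."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)"
    )
    rows = source.execute("SELECT id, email FROM users").fetchall()
    with target:  # commits on success, rolls back the DELETE/INSERT on error
        target.execute("DELETE FROM users")
        target.executemany(
            "INSERT INTO users (id, email) VALUES (?, ?)",
            [(row_id, mask(email)) for row_id, email in rows],
        )
    # Minimal audit record: what ran, which columns, how many rows.
    return {
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "table": "users",
        "masked_columns": ["email"],
        "rows": len(rows),
    }

# Hypothetical demo data standing in for production.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
source.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "alice@corp.com"), (2, "bob@corp.com")]
)
source.commit()

target = sqlite3.connect(":memory:")
record = refresh(source, target)
print(json.dumps(record))
```

The script does not discover new sensitive columns, schedule itself, or keep its audit records anywhere durable. Those are the parts that grow into a project.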

## Comparison

Measured against the five components:

|                        | Delphix                    | Tonic.ai                        | greenmask                | Roll your own        |
| ---------------------- | -------------------------- | ------------------------------- | ------------------------ | -------------------- |
| Engines                | Oracle, SQL Server, Postgres, MySQL, Db2 (+ more) | Postgres, MySQL, SQL Server, Oracle, Mongo, Snowflake, BigQuery | Postgres (MySQL beta) | Anything you script |
| Discovery              | Automated                  | Automated                       | Define rules yourself    | Manual               |
| Policy                 | Centralized UI             | Centralized UI                  | YAML config              | Hand-coded           |
| Referential integrity  | Across sources             | Across tables                   | Deterministic            | DIY                  |
| Subsetting / synthetic | Subsetting                 | Both                            | Both                     | Only if you build it |
| Scheduling             | Built-in                   | Built-in (cron)                 | External (cron / CI)     | External             |
| Audit logging          | Built-in reports           | Audit trail                     | None                     | Only if you build it |
| License                | Commercial                 | Commercial (SaaS / self-hosted) | Open source (Apache-2.0) | Free (your time)     |

The masking itself is not the split — all four transform data. The surrounding components are. Delphix and Tonic bundle discovery, scheduling, and audit; greenmask gives you the engine but leaves those to your pipeline; a homegrown script leaves you all five. The more engines, tables, and compliance scrutiny you carry, the more those bundled components justify a commercial tool.

---

Static masking protects the copies that leave production. It does nothing for the live database, where the requirement flips — mask at read time, by role, without altering the stored data. That is dynamic data masking, a separate control. [Bytebase](https://docs.bytebase.com/security/data-masking/overview/) handles that side: queries route through its SQL Editor and results are masked before they leave it, one policy across every engine. It is not a static masking tool — pair it with one of the above for non-production.