← Blog

CREATE OR REPLACE TABLE on Delta feels safe. Here's when it isn't.

June 18, 2026 · 5 min read

On Databricks, CREATE OR REPLACE TABLE is the recommended way to rebuild a table — and that’s precisely what makes it dangerous. Unlike DROP + CREATE, it’s a single transaction on the Delta log: the table’s history survives, time travel keeps working, and the docs are right to prefer it.

So engineers internalize “replace is safe, Delta keeps history,” and then one of three things quietly proves the rule has edges. This post is about those edges: where the safety net actually ends, what’s genuinely unrecoverable, and how to review for it.

What the Delta log actually protects

When you run CREATE OR REPLACE TABLE, Delta writes a new version that swaps in the new data and schema. The old version’s data files are not deleted — they’re dereferenced. That’s why this works:

CREATE OR REPLACE TABLE sales.orders AS SELECT * FROM staging.orders_rebuilt;

-- "oops" — and yet:
DESCRIBE HISTORY sales.orders;            -- the REPLACE is just version N
RESTORE TABLE sales.orders TO VERSION AS OF (N - 1);   -- back like nothing happened

This is real protection and it’s better than almost any other warehouse gives you. But it protects exactly one thing: the table’s contents at past versions. Three production-critical things live outside that protection.

Edge 1: the safety net is on a timer

Dereferenced files survive until VACUUM removes them — by default, anything older than 7 days of retention. Most production workspaces run VACUUM on a schedule (or have predictive optimization doing it for them), because not vacuuming has a real storage cost.

So RESTORE after a bad replace isn’t “we have backups.” It’s “we have up to 7 days, minus however long it takes anyone to notice.” A bad replace discovered during a month-end reconciliation — three weeks later — restores nothing:

RESTORE TABLE sales.orders TO VERSION AS OF 41;
-- => [DELTA_MISSING_FILES_FOR_VERSION] ... files referenced by version 41
--    have been removed by VACUUM

The failure mode isn’t the replace. It’s the replace plus latency of discovery, and discovery latency on data bugs is measured in weeks.

Edge 2: streaming readers don’t ride along

Any Structured Streaming query reading the table as a source treats a replace as what it is — a non-append change — and fails:

StreamingQueryException: [DELTA_SOURCE_TABLE_IGNORE_CHANGES]
Detected a data update ... This is currently not supported.

The honest fix is a new checkpoint plus a full reprocess of the source — expensive, but correct. The tempting fix is the one in every forum answer: set skipChangeCommits (or legacy ignoreChanges) and move on. Now the stream silently skips the replaced data: no error, no gap report, just downstream tables that disagree with upstream forever after.

And if the object you replace is itself a streaming table (Lakeflow declarative pipelines), it’s worse: the checkpoint that tracked what’s been ingested is gone, full stop. The rebuilt table re-reads from earliest — duplicating everything, or missing everything past the source’s own retention. There is no RESTORE for a checkpoint. This is the Databricks twin of dropping a Snowflake stream: the data may be recoverable; the cursor never is.

Edge 3: managed tables make DROP a file deletion

The neighbor mistake, worth naming because the same review should catch it. For an external table, DROP TABLE removes metadata; the files stay in your storage. For a managed table, the files belong to the catalog — Unity Catalog gives you a short UNDROP TABLE grace window (about a week), and then deletes the data permanently.

Two statements that look identical in a PR diff:

DROP TABLE sales.orders_ext;   -- metadata only; files untouched
DROP TABLE sales.orders;       -- managed: files deleted after the UNDROP window

A reviewer can’t tell which is which from the SQL alone. The information that decides whether this is reversible — MANAGED vs EXTERNAL — isn’t in the statement; it’s in the catalog.

The pattern across all three

Each edge has the same shape: the statement looks identical to a safe one, and the thing that makes it unsafe lives somewhere else — in the VACUUM schedule, in a consumer’s checkpoint, in the table’s managed/external flag. Which is why “a senior engineer reviews the SQL in Slack” structurally cannot catch these. The reviewer is being asked to recall invisible state.

The checklist version:

  1. Before replacing a table, list its streaming consumers — every checkpoint pointed at it is state you’re about to invalidate.
  2. Treat RESTORE as a 7-day countdown, not a backup. If discovery of a data bug can take longer than your VACUUM retention (it can), a replace needs a pre-step: CREATE TABLE sales.orders_pre_replace SHALLOW CLONE sales.orders.
  3. Never DROP without knowing managed vs external. DESCRIBE EXTENDED first, every time.
  4. Never set skipChangeCommits to make an error go away. The error is the correct behavior.

Making the invisible state visible

This class of problem — the danger isn’t in the statement, it’s in the state around it — is what I built DDT for: schema-as-files for Unity Catalog, with a compare engine that classifies every proposed change using the catalog state, not just the SQL text. The same diff line classifies differently depending on what it actually touches:

$ ddt compare --project ./Lakehouse --target prod

  ~ TABLE  sales.orders            replace (managed, 2 streaming readers)  [UNRECOVERABLE]
  - TABLE  sales.orders_ext        drop   (external; files retained)       [DESTRUCTIVE]
  + TABLE  sales.orders_v2                                                 [SAFE]

  1 UNRECOVERABLE change — publish refused.
  Remediation: SHALLOW CLONE before replace; restart readers from new checkpoint.

Unrecoverable changes refuse to deploy without an explicit opt-in gate — the review question stops being “did the human remember the checkpoint” and becomes mechanical. The deterministic core is free, runs offline with no workspace credentials needed for compare, and there’s a 60-second demo: npm i -g @ddt-tools/cli.

Tool aside, the portable lesson: on Delta, recoverability is a property of the surrounding state, not the statement. Review the state.


Independent tool, not affiliated with or endorsed by Databricks Inc. “Databricks”, “Unity Catalog”, and “Delta Lake” are trademarks of Databricks Inc.


Ship schema changes against Snowflake or Databricks?

SDT and DDT put a safety classifier on every change — data-loss operations refuse by default. Free tier, runs local-only.

Install SDT Install DDT See pricing