[adapter] Detect concurrent dependency mutation in PlanValidity by mtabebe · Pull Request #37078 · MaterializeInc/materialize

mtabebe · 2026-06-16T16:48:33Z

Problem:
ALTER CONNECTION ... ROTATE KEYS is a read-modify-write with a wide window between the catalog read and the catalog write. The slow steps in the middle are SSH key generation and secret-store I/O, all running in a spawned task. If ALTER CONNECTION SET (...) commits inside that window, the blind catalog write at the end of ROTATE overwrites the SET's change with the pre-SET create_sql. The SET silently disappears.

Note:
ROTATE KEYS is one of the few DDLs exempt from the global serialized_ddl lock (see must_serialize_ddl in command_handler.rs), which is what opens the race window in the first place. Other ALTER CONNECTION variants take the lock and can't race against themselves.

Solution:
PlanValidity already runs at every off-thread to on-thread hop and catches dropped dependencies. Extend it to also catch content changes: capture a hash of each dependency's create_sql at plan time, compare against the live hash in check, and return a new ConcurrentDependencyMutation error (SQLSTATE 40001) on mismatch. The user retries the statement.

PlanValidity::new and extend_dependencies now take &Catalog so they can read entries to hash; all callsites are updated.

Testing:
Regression test that triggers the wait point

Closes SQL-272.

ggevay

LGTM, just a test comment.

(At first glance I was worried about the performance cost the hashing introduces, but PlanValidity is not used by peeks nowadays that we have frontend_peek, and other statements are not that performance-critical, so it's fine.)

ggevay · 2026-06-17T10:53:58Z

How about a fail_point! between rotate_keys_ensure and rotate_keys_finish? I think that would make this test deterministic, plus it would dramatically simplify it.

def- · 2026-06-17T15:13:03Z

It would be great if we don't add a new mzcompose.py for this specific test. If you want a real race, maybe test/race-condition, otherwise what Gabor says works great.

def-

With this extra test:

diff --git a/test/race-condition-rotate-keys/mzcompose.py b/test/race-condition-rotate-keys/mzcompose.py
index 1674d8dd56..e13b7cae7d 100644
--- a/test/race-condition-rotate-keys/mzcompose.py
+++ b/test/race-condition-rotate-keys/mzcompose.py
@@ -27,8 +27,21 @@ instead of silently reverting. The test asserts two things:
   - zero silent reverts across the loop, and
   - at least one OCC conflict, so the test fails if the race window moves and
     we stop actually exercising the fix.
+
+Workflows:
+  - `default`: the SQL-272 regression test described above.
+  - `peek-rename-conflict`: proves Finding 1 from the PR review -- the same
+    `create_sql`-hash check spuriously fails a SELECT staged through the
+    Coordinator when a dependency is RENAMEd during the off-thread optimize
+    window, even though a rename can't invalidate the plan. It also documents
+    the real scope: in the default config simple SELECTs use frontend peek
+    sequencing and bypass PlanValidity, so the regression only bites the
+    Coordinator peek path (frontend sequencing off, or a fallback query shape).
+    RED on the PR branch; would go GREEN once the check is scoped to the DDL
+    paths that need it.
 """

+import threading
 import time
 from textwrap import dedent

@@ -190,3 +203,234 @@ def workflow_default(c: Composition, parser: WorkflowArgumentParser) -> None:
             "iterations; saw zero. Either the race window has moved, the "
             "stagger is wrong, or the fix in PlanValidity has regressed."
         )
+
+
+def _race_select_vs_rename(
+    c: Composition, iterations: int, join_width: int
+) -> tuple[int, list[str], int]:
+    """Run a tight `SELECT` loop against a tight `RENAME` loop on one of the
+    SELECT's dependencies, and report how many SELECTs spuriously failed with
+    `ConcurrentDependencyMutation` / SQLSTATE 40001.
+
+    The caller decides which sequencing path the SELECT takes by setting
+    `enable_frontend_peek_sequencing` *before* calling this (the value is read
+    when a session is created, so the SELECT connection opened here picks up the
+    current setting).
+
+    Returns `(selects_ok, spurious_conflict_messages, renames_committed)`.
+    """
+    join_width = max(2, join_width)
+    # A wide self-join over a 3-row table: cheap to *execute*, expensive to
+    # *optimize*. We need the optimizer's off-thread `spawn_blocking` window
+    # (peek.rs `peek_optimize`) to be wide enough that a concurrently-committed
+    # RENAME reliably lands inside it; the join width buys that window without
+    # inflating data volume. (~50ms/SELECT at width 8.)
+    rels = ", ".join(f"rename_race.t AS r{i}" for i in range(join_width))
+    preds = " AND ".join(f"r0.a = r{i}.a" for i in range(1, join_width))
+    # A dedicated schema dropped with CASCADE makes setup idempotent regardless
+    # of what name the table was left under by a previous run's RENAME loop.
+    c.sql(dedent(f"""
+            DROP SCHEMA IF EXISTS rename_race CASCADE;
+            CREATE SCHEMA rename_race;
+            CREATE TABLE rename_race.t (a int);
+            INSERT INTO rename_race.t VALUES (1), (2), (3);
+            CREATE VIEW rename_race.v AS SELECT count(*) AS c FROM {rels} WHERE {preds};
+            """))
+
+    # The RENAME loop renames the base table to a *fresh* name every iteration
+    # (t_1, t_2, ...). Renaming to brand-new names -- rather than flipping
+    # between two -- means the dependency's `create_sql` never cycles back to a
+    # value the SELECT captured at plan time, so any rename that commits inside
+    # the optimize window is guaranteed to mismatch. (Flipping between two names
+    # tends to cancel out: ~2 renames land per window, an even number, leaving
+    # `create_sql` back where it started.)
+    #
+    # `v` is never renamed, so `SELECT c FROM rename_race.v` always resolves from
+    # the client's side; the only rename-induced failure the SELECT can observe
+    # is the spurious `create_sql`-hash 40001. We never DROP anything mid-loop,
+    # so a `ConcurrentDependencyDrop` can't fire either.
+    stop = threading.Event()
+    renames_committed = 0
+    rename_err: Exception | None = None
+
+    def renamer() -> None:
+        nonlocal renames_committed, rename_err
+        cur_name = "t"
+        i = 0
+        try:
+            # Autocommit is on (see Composition.sql_connection), so each RENAME
+            # is its own committed catalog transaction that bumps the catalog's
+            # transient revision -- exactly what forces `check` off its fast path.
+            with c.sql_cursor(reuse_connection=False) as cur:
+                while not stop.is_set():
+                    i += 1
+                    nxt = f"t_{i}"
+                    cur.execute(
+                        f"ALTER TABLE rename_race.{cur_name} RENAME TO {nxt}".encode()
+                    )
+                    cur_name = nxt
+                    renames_committed += 1
+        except Exception as e:
+            rename_err = e
+
+    t_rename = PropagatingThread(target=renamer, name="renamer")
+    t_rename.start()
+
+    spurious_conflicts: list[str] = []
+    selects_ok = 0
+    try:
+        # Fresh connection so the SELECTs run concurrently with the renamer and
+        # so the session picks up the current `enable_frontend_peek_sequencing`.
+        with c.sql_cursor(reuse_connection=False) as cur:
+            for i in range(iterations):
+                try:
+                    cur.execute(b"SELECT c FROM rename_race.v")
+                    cur.fetchall()
+                    selects_ok += 1
+                except Exception as e:
+                    msg = str(e)
+                    sqlstate = getattr(e, "sqlstate", None)
+                    if OCC_CONFLICT_MESSAGE in msg or sqlstate == "40001":
+                        # The bug: a benign concurrent RENAME spuriously failed a
+                        # plain SELECT.
+                        spurious_conflicts.append(
+                            f"iteration {i}: sqlstate={sqlstate}: {msg}"
+                        )
+                    else:
+                        # Any other error is not what this test is about; surface
+                        # it loudly rather than letting it look like the bug.
+                        raise Exception(
+                            f"iteration {i}: SELECT failed with an unexpected error: {e}"
+                        )
+    finally:
+        stop.set()
+        t_rename.join()
+
+    if rename_err is not None:
+        raise Exception(f"RENAME loop failed unexpectedly: {rename_err}")
+
+    return selects_ok, spurious_conflicts, renames_committed
+
+
+def workflow_peek_rename_conflict(
+    c: Composition, parser: WorkflowArgumentParser
+) -> None:
+    """
+    Demonstrates the blast radius of the SQL-272 fix (PR #37078 review,
+    Finding 1): the `create_sql`-hash check was added to the *shared*
+    `PlanValidity::check`, which runs for every plan staged through the
+    Coordinator -- not just the read-modify-write DDL (ROTATE KEYS) it was meant
+    for.
+
+    A SELECT staged through the Coordinator has an off-thread optimize hop
+    (`spawn_blocking` in peek.rs `peek_optimize`). `PlanValidity::new` captures a
+    hash of each dependency's `create_sql` before that hop; `check` re-compares
+    it after. A peek references its dependencies by stable id, so a concurrent,
+    *benign* `ALTER ... RENAME` of a dependency leaves the optimized dataflow
+    100% valid -- yet the rename rewrites that dependency's `create_sql` (see
+    `Op::RenameItem` in catalog/transact.rs, which also rewrites every
+    dependent's `create_sql`), the hash changes, and the SELECT fails with
+    `ConcurrentDependencyMutation` / SQLSTATE 40001. Before PR #37078 the SELECT
+    succeeded: the old `PlanValidity` only failed on *dropped* dependencies.
+
+    IMPORTANT SCOPE REFINEMENT (discovered while building this test): in the
+    default configuration, simple SELECTs are sequenced in the Adapter Frontend
+    (`enable_frontend_peek_sequencing`, default true) and *never touch*
+    `PlanValidity`, so they are NOT affected. The regression only reaches the
+    Coordinator peek path, which is used when frontend peek sequencing is
+    disabled (a supported config) or when a query shape falls back to the old
+    sequencer. RENAME also can't race a lock-taking DDL like CREATE
+    INDEX/MV/VIEW: those serialize behind the global `serialized_ddl` lock (see
+    `must_serialize_ddl`), so the affected staged operations are the lock-exempt
+    ones -- peeks, SUBSCRIBE, and the purification/exempt DDLs.
+
+    The test runs the same SELECT-vs-RENAME race twice:
+      * Phase A, frontend peek sequencing ON (the default): documents that plain
+        SELECTs are unaffected (expected zero spurious failures).
+      * Phase B, frontend peek sequencing OFF (Coordinator peek path): asserts
+        the SELECT never fails with 40001. This is RED on the PR branch (the bug)
+        and would go GREEN once the hash check is scoped to the DDL paths that
+        actually need it.
+    """
+    parser.add_argument(
+        "--iterations",
+        default=100,
+        type=int,
+        help="Number of SELECTs per phase to run against the racing RENAME loop. "
+        "At the default join width the Coordinator path fails roughly half of "
+        "them, so even a few dozen reliably reproduces the bug.",
+    )
+    parser.add_argument(
+        "--join-width",
+        default=8,
+        type=int,
+        help="Number of self-joined copies of the base table in the view. A "
+        "wider join makes the optimizer's off-thread window longer, so a "
+        "concurrently-committed RENAME reliably lands inside it. Execution stays "
+        "instant because the base table is tiny -- the cost is all in planning.",
+    )
+    args = parser.parse_args()
+
+    c.up("materialized")
+
+    # Phase A: default config. Simple SELECTs go through frontend peek
+    # sequencing, which does not use PlanValidity, so the hash check can't fire.
+    c.sql(
+        "ALTER SYSTEM RESET enable_frontend_peek_sequencing",
+        user="mz_system",
+        port=6877,
+    )
+    ok_a, spurious_a, renames_a = _race_select_vs_rename(
+        c, args.iterations, args.join_width
+    )
+    print(
+        f"--- Phase A (enable_frontend_peek_sequencing = ON, the default): "
+        f"selects_ok={ok_a}, renames_committed={renames_a}, "
+        f"spurious_40001={len(spurious_a)} "
+        f"(expected 0: frontend peek sequencing bypasses PlanValidity)"
+    )
+
+    # Phase B: Coordinator peek path. New sessions pick up the changed value, and
+    # `_race_select_vs_rename` opens a fresh SELECT connection, so the SELECTs
+    # are staged through the Coordinator and run `PlanValidity::check`.
+    c.sql(
+        "ALTER SYSTEM SET enable_frontend_peek_sequencing = false",
+        user="mz_system",
+        port=6877,
+    )
+    ok_b, spurious_b, renames_b = _race_select_vs_rename(
+        c, args.iterations, args.join_width
+    )
+    c.sql(
+        "ALTER SYSTEM RESET enable_frontend_peek_sequencing",
+        user="mz_system",
+        port=6877,
+    )
+    print(
+        f"--- Phase B (enable_frontend_peek_sequencing = OFF, Coordinator path): "
+        f"selects_ok={ok_b}, renames_committed={renames_b}, "
+        f"spurious_40001={len(spurious_b)}"
+    )
+
+    # Guard against a silently-ineffective test: if the RENAME loop never
+    # actually committed, the race was never exercised and a green result means
+    # nothing.
+    if renames_b == 0:
+        raise Exception("Phase B committed zero renames; the race was never exercised.")
+
+    if spurious_b:
+        sample = "\n  ".join(spurious_b[:10])
+        raise Exception(
+            f"Finding 1 reproduced: {len(spurious_b)} of {args.iterations} plain "
+            "SELECTs on the Coordinator peek path failed with "
+            "ConcurrentDependencyMutation / SQLSTATE 40001 because a concurrent, "
+            "benign ALTER ... RENAME changed a dependency's create_sql during the "
+            "SELECT's off-thread optimize window.\n"
+            "A rename leaves the optimized dataflow valid (dependencies are "
+            "referenced by stable id, not name), so this failure is spurious; "
+            "before PR #37078 the SELECT succeeded. The create_sql-hash check in "
+            "PlanValidity::check should not run for read-only peek plans.\n"
+            f"(For contrast, Phase A on the default frontend peek path saw "
+            f"{len(spurious_a)} such failures.)\n"
+            f"Sample failures:\n  {sample}"
+        )

We're seeing a bunch of concurrent dependency mutation errors which shouldn't fail:

  iteration 5: sqlstate=40001: catalog item 'u1027' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 6: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 8: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 11: sqlstate=40001: catalog item 'u1027' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 12: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 13: sqlstate=40001: catalog item 'u1027' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 14: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 16: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 17: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.
  iteration 19: sqlstate=40001: catalog item 'u1028' was concurrently modified
HINT:  Another session modified one of this statement's dependencies before it could commit. Retry the statement.

def- · 2026-06-17T16:22:02Z

(The nightly failures look unrelated)

def- · 2026-06-18T08:01:57Z

Triggered a nightly run with the new test: https://buildkite.com/materialize/nightly/builds/16822 Edit: The test doesn't seem to work:

ALTER CONNECTION ROTATE KEYS vs SET race (SQL-272): test timed out

def-

The PR description is now wrong about the regression test location.

Triggered another nightly run because the test just timed out in previous runs: https://buildkite.com/materialize/nightly/builds/16859
Edit: Failed for me locally, works with this diff:

diff --git a/src/adapter/src/coord/sequencer/inner/secret.rs b/src/adapter/src/coord/sequencer/inner/secret.rs
index 71862615e0..09913a39c9 100644
--- a/src/adapter/src/coord/sequencer/inner/secret.rs
+++ b/src/adapter/src/coord/sequencer/inner/secret.rs
@@ -385,6 +385,13 @@ impl Coordinator {
                     name: entry.name,
                     to_item,
                 }];
+
+                // pause between the secret-store write and the catalog write so a concurrent ALTER
+                // CONNECTION SET can commit.  Runs in the off-thread ensure task, not the
+                // coordinator main loop, so the pause won't freeze the coordinator (which would
+                // block that SET).
+                fail::fail_point!("rotate_keys_between_ensure_and_finish");
+
                 let stage = SecretStage::RotateKeysFinish(RotateKeysSecretFinish { validity, ops });
                 Ok(Box::new(stage))
             }
@@ -398,10 +405,6 @@ impl Coordinator {
         session: &Session,
         RotateKeysSecretFinish { ops, validity: _ }: RotateKeysSecretFinish,
     ) -> Result<StageResult<Box<SecretStage>>, AdapterError> {
-        // Test-only seam: lets a regression test pause ROTATE here so a
-        // concurrent ALTER CONNECTION SET can commit before `catalog_transact`
-        // re-validates dependencies.
-        fail::fail_point!("rotate_keys_between_ensure_and_finish");
         self.catalog_transact(Some(session), ops).await?;
         Ok(StageResult::Response(ExecuteResponse::AlteredObject(
             ObjectType::Connection,
diff --git a/test/race-condition/mzcompose.py b/test/race-condition/mzcompose.py
index f41a4491df..cfcc6cf68b 100644
--- a/test/race-condition/mzcompose.py
+++ b/test/race-condition/mzcompose.py
@@ -1407,8 +1407,10 @@ def toxiproxy_start(c: Composition, jitter: int) -> None:
     assert r.status_code == 200, r


-# Failpoint that pauses `rotate_keys_finish` between the secret-store write and
-# the catalog write. Defined in src/adapter/src/coord/sequencer/inner/secret.rs.
+# Pauses ROTATE in its off-thread `rotate_keys_ensure` task, between the
+# secret-store write and the catalog write. Must fire off the coordinator main
+# loop; pausing on-loop would freeze the coordinator and block this test's SET.
+# Defined in src/adapter/src/coord/sequencer/inner/secret.rs.
 ROTATE_KEYS_FAILPOINT = "rotate_keys_between_ensure_and_finish"

 # Substring of the Display impl of `AdapterError::ConcurrentDependencyMutation`

def- · 2026-06-23T13:24:31Z

         skip: "Not stable yet, not clear if this is a product issue"

+       - id: race-condition-rotate-keys
+         label: "ALTER CONNECTION ROTATE KEYS vs SET race (SQL-272)"


Suggested change

label: "ALTER CONNECTION ROTATE KEYS vs SET race (SQL-272)"

label: "ALTER CONNECTION ROTATE KEYS vs SET race"

Problem: `ALTER CONNECTION ... ROTATE KEYS` is a read-modify-write with a wide window between the catalog read and the catalog write. The slow steps in the middle are SSH key generation and secret-store I/O, all running in a spawned task. If `ALTER CONNECTION SET (...)` commits inside that window, the blind catalog write at the end of ROTATE overwrites the SET's change with the pre-SET `create_sql`. The SET silently disappears. Note: ROTATE KEYS is one of the few DDLs exempt from the global `serialized_ddl` lock (see `must_serialize_ddl` in `command_handler.rs`), which is what opens the race window in the first place. Other ALTER CONNECTION variants take the lock and can't race against themselves. Solution: `PlanValidity` already runs at every off-thread to on-thread hop and catches dropped dependencies. Extend it to also catch content changes: capture a hash of each dependency's `create_sql` at plan time, compare against the live hash in `check`, and return a new `ConcurrentDependencyMutation` error (SQLSTATE 40001) on mismatch. The user retries the statement. `PlanValidity::new` and `extend_dependencies` now take `&Catalog` so they can read entries to hash; all callsites are updated. Testing: Regression test that triggers the wait point Closes SQL-272.

mtabebe · 2026-06-23T15:25:52Z

Ah I updated the commit message but that didn't update the PR

mtabebe force-pushed the ma/sql-272 branch 4 times, most recently from 656cd63 to 38d6107 Compare June 16, 2026 20:42

mtabebe marked this pull request as ready for review June 16, 2026 22:27

mtabebe requested review from a team as code owners June 16, 2026 22:27

ggevay self-requested a review June 17, 2026 08:11

ggevay approved these changes Jun 17, 2026

View reviewed changes

def- reviewed Jun 17, 2026

View reviewed changes

def- requested changes Jun 17, 2026

View reviewed changes

mtabebe force-pushed the ma/sql-272 branch from 38d6107 to d798cd8 Compare June 17, 2026 16:59

mtabebe force-pushed the ma/sql-272 branch 4 times, most recently from b09d535 to 3fc560e Compare June 22, 2026 21:17

mtabebe requested a review from def- June 23, 2026 01:48

def- requested changes Jun 23, 2026

View reviewed changes

mtabebe force-pushed the ma/sql-272 branch from 3fc560e to ec958d2 Compare June 23, 2026 15:25

mtabebe requested a review from def- June 23, 2026 15:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[adapter] Detect concurrent dependency mutation in PlanValidity#37078

[adapter] Detect concurrent dependency mutation in PlanValidity#37078
mtabebe wants to merge 1 commit into
MaterializeInc:mainfrom
mtabebe:ma/sql-272

mtabebe commented Jun 16, 2026 •

edited

Loading

Uh oh!

ggevay left a comment

Uh oh!

ggevay Jun 17, 2026

Uh oh!

def- Jun 17, 2026

Uh oh!

def- left a comment

Uh oh!

def- commented Jun 17, 2026

Uh oh!

def- commented Jun 18, 2026 •

edited

Loading

Uh oh!

def- left a comment •

edited

Loading

Uh oh!

def- Jun 23, 2026

Uh oh!

mtabebe commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	label: "ALTER CONNECTION ROTATE KEYS vs SET race (SQL-272)"
	label: "ALTER CONNECTION ROTATE KEYS vs SET race"

Conversation

mtabebe commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ggevay left a comment

Choose a reason for hiding this comment

Uh oh!

ggevay Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

def- Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

def- left a comment

Choose a reason for hiding this comment

Uh oh!

def- commented Jun 17, 2026

Uh oh!

def- commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

def- left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

def- Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

mtabebe commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mtabebe commented Jun 16, 2026 •

edited

Loading

def- commented Jun 18, 2026 •

edited

Loading

def- left a comment •

edited

Loading