Skip to content

fix(lstar): separate protocol rejections from internal assertions#1180

Merged
tcoratger merged 1 commit into
leanEthereum:mainfrom
tcoratger:fix/distinguish-rejection-from-assertion
Jul 4, 2026
Merged

fix(lstar): separate protocol rejections from internal assertions#1180
tcoratger merged 1 commit into
leanEthereum:mainfrom
tcoratger:fix/distinguish-rejection-from-assertion

Conversation

@tcoratger

Copy link
Copy Markdown
Collaborator

What

Addresses #1173: protocol rejections (validation failures every client must reproduce) and internal assertions (invariants that can never fire) were indistinguishable at the type level, because SpecRejectionError subclassed AssertionError and bare asserts reachable from input raised the same type. Worse, python -O strips bare asserts but never the typed raise SpecRejectionError(...), so the reference's own accept/reject set depended on an interpreter flag.

A four-specialist review (fork-choice safety, API design, tests, docs) classified every reachable assert and settled the fix below.

The exception hierarchy

SpecRejectionError now subclasses a new SpecError(Exception) root instead of AssertionError. The reason code — not the Python base class — is the client contract. This mirrors how SSZValueError deliberately subclasses Exception (not ValueError) so Pydantic does not rewrap it and destroy the payload, and it makes the rejection safe to raise inside a validator too.

Blast radius is tiny: only one catch site relied on the old base — the block-production loop caught AssertionError to swallow a WRONG_PROPOSER slot-race rejection. It is narrowed to except SpecRejectionError, which is also strictly better: genuine internal invariant errors now propagate instead of being silently logged as "skipped." Every other consumer already catches SpecRejectionError by name. The errors.py docstring's old "keeps existing rejection handlers working" rationale — the backward-compat shim the repo rule forbids — is removed.

The convention

Documented in errors.py: any unhandled exception during block import or the state transition means the input is invalid, so a transliterating client does not depend on the Python exception type. The companion discipline — a bare assert marks only a provably-unreachable invariant, every reachable validity check is a typed rejection — is what actually neutralizes the -O hazard.

The two concrete asserts the issue cites

  • state_transition.py justification-index assert — a provable invariant. The already-justified skip upstream guarantees the target sits above the finalized slot, so the index is always in range. Kept, and documented as an internal invariant that is never a client reject path.
  • state_transition.py finalization-rebase assert — the real defect. On a state rebuilt from untrusted bytes, a tracked justification root can legitimately be absent from the slot map. The old code asserted (and under -O would raise KeyError on the next line). Such a tally is off the canonical chain and can never justify a later slot, so the correct, Lean-model-matching behavior is to drop it, folded into the existing pruning comprehension. This makes the transition total on all decoded states and removes the interpreter-flag divergence. Locked with a new consensus vector that feeds a phantom off-chain root and asserts the tally is dropped with no crash.

The anchor isinstance guards are documented as by-construction type-narrowing checks (the type checker relies on them), not a client reject path.

Deferred: gossip REJECT / IGNORE taxonomy

The issue's second suggestion — a gossip REJECT (penalize peer) vs IGNORE (drop silently) taxonomy, e.g. routing future-slot and duplicate to IGNORE — is deliberately not in this PR. It is a networking-layer change that wires gossip validation outcomes to peer scoring (node/sync + peer manager), independent of the spec's rejection contract fixed here. It is worth doing as its own change; flagging it so it is not lost. RejectionReason plus the existing bufferable-reason set is the right foundation for it.

Coordination

The fork_choice.py asserts (state/anchor invariants) referenced by the sibling issue #1176 are handled in the still-open PR #1179 and are untouched here to avoid conflict.

Credit

Found via the Lean 4 formalization of this spec (NyxFoundation/formal-leanSpec), which already classifies each reachable assert.

Testing

New finalization vector fills green (13 finalization vectors pass, deterministic); the updated validator-service unit test passes; just check is clean.

Refs #1173

🤖 Generated with Claude Code

Protocol rejections and internal invariant asserts shared one exception
type: the rejection error subclassed AssertionError, and bare asserts
reachable from input raised the same type. A transliterating client could
not tell a validation failure every client must reproduce from an
invariant that can never fire, and python -O strips bare asserts while
keeping the typed raises, so the reference's accept/reject set depended
on an interpreter flag.

Changes:

- The rejection error now subclasses a new spec-error root instead of
  AssertionError, so the reason code, not the Python base class, is the
  client contract. This also makes it safe to raise inside a pydantic
  validator, matching how the SSZ error is defined. The one catch site
  that relied on the old base is narrowed to catch the rejection directly.

- Document the convention that any unhandled exception during block
  import or the state transition means the input is invalid, with the
  companion rule that a bare assert marks only a provably-unreachable
  invariant while every reachable validity check raises a typed rejection.

- The justification-index assert is kept and documented as a
  provably-unreachable invariant; the already-justified skip upstream
  guarantees an in-range index.

- The finalization rebase now drops a tracked tally whose root is off the
  canonical chain instead of asserting. On a state rebuilt from untrusted
  bytes such a root can be absent from the slot map; the tally is inert,
  so dropping it makes the transition total and removes a -O KeyError
  divergence. Matches the Lean model. Locked with a new vector.

- The anchor type guards are documented as by-construction internal
  checks, not a client rejection path.

Refs leanEthereum#1173

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tcoratger tcoratger merged commit 6a2714a into leanEthereum:main Jul 4, 2026
14 checks passed
adust09 added a commit to NyxFoundation/formal-leanSpec that referenced this pull request Jul 5, 2026
)

Open the Validator domain: LeanSpec/Validator/Service.lean models the
attestation arm of ValidatorService.run (node/validator/service.py,
post leanEthereum/leanSpec#1180). Upstream guards double voting with
the service-wide _attested_slots set - one attestation pass covers
every validator the node manages - and the gate silently skips an
already-attested slot rather than failing, retrying gated slots on
later passes. The retention prune (max(0, slot - 4)) bounds the set.

The runtime around the gate (async clock loop, sync service, gossip
publishers, counters) is IO; the gate's inputs from it - the current
interval and the _is_synced_for_duties verdict - enter as arguments.

no_double_vote: the gate never fires for an attested slot.
attested_after_duty: a fired duty records its slot through the prune.
no_double_vote_after: once fired, never again for that slot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant