Propagate BufferExec input panics instead of silently truncating output#150
Open
Tristan1900 wants to merge 1 commit into
Open
Conversation
|
Author
|
Codex Review: Didn't find any major issues. 👍 Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
A panic while polling the input inside MemoryBufferedStream's producer task was caught by the tokio task harness, dropping the sender; the consumer then read the closed channel as a clean EOF and silently truncated this partition's output. Wrap the input poll in catch_unwind and forward the panic as a DataFusionError over the existing error channel, so it propagates and fails the query instead of being swallowed. Found in staging: a SinglePartitioned aggregate emitting a 16-byte FixedSizeBinary group-key array past Arrow's 2 GiB i32-offset limit panicked, yet the query "succeeded" with partial (under-counted) results. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
15bd2c0 to
afdd4ea
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
MemoryBufferedStream(the stream behindBufferExec) spawns a producer task that polls the input and feeds a channel. If a panic unwinds out of the input poll, the tokio task harness catches it, the sender drops, and the consumer reads the closed channel as a clean EOF — so the partition's output is silently truncated and the query "succeeds" with partial results.This wraps the input poll in
catch_unwindand forwards the panic as aDataFusionErrorover the existing error channel (poll_nextalready surfacesSome(Err(..))). The query now fails loudly instead of returning a wrong answer, and only the offending query fails (not the process).Why
Found in staging: a
SinglePartitionedaggregate (gby=[__bhandle__, date_bin]) emitted a 16-byteFixedSizeBinarygroup-key column for ~458M groups — past Arrow's 2 GiBi32-offset limit — and panicked inGroupValuesRows::emit. The panic was swallowed here, so the query returned under-counted results instead of erroring.The oversized emit itself is a separate, upstream-shaped fix (chunk the aggregate emit to
batch_sizeso a single group-key array can't exceed 2 GiB). This PR only stops the silent failure mode.🤖 Generated with Claude Code