Skip to content

Propagate BufferExec input panics instead of silently truncating output#150

Open
Tristan1900 wants to merge 1 commit into
DataDog:branch-53from
Tristan1900:wenqi.mou/bufferexec-propagate-input-panic
Open

Propagate BufferExec input panics instead of silently truncating output#150
Tristan1900 wants to merge 1 commit into
DataDog:branch-53from
Tristan1900:wenqi.mou/bufferexec-propagate-input-panic

Conversation

@Tristan1900

@Tristan1900 Tristan1900 commented Jun 26, 2026

Copy link
Copy Markdown

What

MemoryBufferedStream (the stream behind BufferExec) spawns a producer task that polls the input and feeds a channel. If a panic unwinds out of the input poll, the tokio task harness catches it, the sender drops, and the consumer reads the closed channel as a clean EOF — so the partition's output is silently truncated and the query "succeeds" with partial results.

This wraps the input poll in catch_unwind and forwards the panic as a DataFusionError over the existing error channel (poll_next already surfaces Some(Err(..))). The query now fails loudly instead of returning a wrong answer, and only the offending query fails (not the process).

Why

Found in staging: a SinglePartitioned aggregate (gby=[__bhandle__, date_bin]) emitted a 16-byte FixedSizeBinary group-key column for ~458M groups — past Arrow's 2 GiB i32-offset limit — and panicked in GroupValuesRows::emit. The panic was swallowed here, so the query returned under-counted results instead of erroring.

The oversized emit itself is a separate, upstream-shaped fix (chunk the aggregate emit to batch_size so a single group-key array can't exceed 2 GiB). This PR only stops the silent failure mode.

🤖 Generated with Claude Code

@datadog-prod-us1-5

datadog-prod-us1-5 Bot commented Jun 26, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 2 Pipeline jobs failed

Datafusion extended tests | Run sqllogictests with the sqlite test suite   View in Datadog   GitHub Actions

Rust | build and run with wasm-pack   View in Datadog   GitHub Actions

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: afdd4ea | Docs | Datadog PR Page | Give us feedback!

@Tristan1900

Copy link
Copy Markdown
Author

@codex

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

Reviewed commit: 15bd2c0de0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Tristan1900 Tristan1900 marked this pull request as ready for review June 26, 2026 19:55
A panic while polling the input inside MemoryBufferedStream's producer
task was caught by the tokio task harness, dropping the sender; the
consumer then read the closed channel as a clean EOF and silently
truncated this partition's output. Wrap the input poll in catch_unwind
and forward the panic as a DataFusionError over the existing error
channel, so it propagates and fails the query instead of being swallowed.

Found in staging: a SinglePartitioned aggregate emitting a 16-byte
FixedSizeBinary group-key array past Arrow's 2 GiB i32-offset limit
panicked, yet the query "succeeded" with partial (under-counted) results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Tristan1900 Tristan1900 force-pushed the wenqi.mou/bufferexec-propagate-input-panic branch from 15bd2c0 to afdd4ea Compare June 26, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant