Support pushing down empty projections into joins #20191

Merged
alamb merged 2 commits into apache:main from restatedev:empty-projection-pushdown on Feb 12, 2026

@jackkleeman

Which issue does this PR close?

- Closes #20190.

Rationale for this change

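Projection pushdown should embed empty projections (projections that select no columns) into `HashJoinExec`, instead of leaving a separate `ProjectionExec` above the join.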

What changes are included in this PR?

  1. `try_embed_projection` should embed empty projections.
  2. `build_batch_empty_build_side` should support empty schemas.
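
As a rough illustration of what the two changes mean together, here is a minimal, dependency-free sketch. `ToyBatch` and `apply_projection` are hypothetical stand-ins, not DataFusion types. The sketch only assumes the `Option<Vec<usize>>` projection convention, where `None` keeps every column and `Some(vec![])` keeps none, and shows that a zero-column result still has to carry its row count.

```rust
/// Hypothetical stand-in for a join output batch (not a DataFusion type):
/// a list of columns plus an explicit row count.
#[derive(Debug, Clone, PartialEq)]
struct ToyBatch {
    columns: Vec<Vec<i64>>,
    num_rows: usize,
}

/// Hypothetical helper: apply an embedded projection to a batch.
/// `None` means "emit all columns"; `Some(indices)` emits only those
/// columns, so `Some(vec![])` emits zero columns.
fn apply_projection(batch: &ToyBatch, projection: &Option<Vec<usize>>) -> ToyBatch {
    match projection {
        None => batch.clone(),
        Some(indices) => ToyBatch {
            columns: indices.iter().map(|&i| batch.columns[i].clone()).collect(),
            // The row count is carried separately, so it survives even
            // when no columns are selected.
            num_rows: batch.num_rows,
        },
    }
}

fn main() {
    let joined = ToyBatch {
        columns: vec![vec![1, 2, 3], vec![10, 20, 30]],
        num_rows: 3,
    };

    let all = apply_projection(&joined, &None);
    let none = apply_projection(&joined, &Some(vec![]));

    println!("all columns:  {} columns, {} rows", all.columns.len(), all.num_rows);
    println!("zero columns: {} columns, {} rows", none.columns.len(), none.num_rows);
}
```

Keeping the row count alongside an empty column list is what would let a row-counting query over a join work without materialising any column.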

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions Bot added the core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) labels Feb 6, 2026
@jackkleeman jackkleeman force-pushed the empty-projection-pushdown branch from 261c92b to dc95119 February 6, 2026 16:48
@github-actions github-actions Bot added the sqllogictest (SQL Logic Tests (.slt)) label Feb 6, 2026

@kosiew kosiew left a comment

@jackkleeman

Thanks for working on this.

Left some comments.

Comment thread datafusion/core/tests/physical_optimizer/projection_pushdown.rs
Comment thread datafusion/physical-plan/src/joins/utils.rs Outdated
Comment thread datafusion/physical-plan/src/projection.rs
@jackkleeman jackkleeman requested a review from kosiew February 11, 2026 10:37

@kosiew kosiew left a comment

LGTM

Comment thread datafusion/physical-plan/src/joins/utils.rs Outdated
@jackkleeman jackkleeman force-pushed the empty-projection-pushdown branch from a59eed0 to de897ca February 12, 2026 08:43
@jackkleeman jackkleeman force-pushed the empty-projection-pushdown branch from de897ca to 95affa3 February 12, 2026 08:46

alamb commented Feb 12, 2026

Thanks @jackkleeman and @kosiew 🚀

@alamb alamb added this pull request to the merge queue Feb 12, 2026
Merged via the queue into apache:main with commit af5f470 Feb 12, 2026
32 checks passed
@jackkleeman jackkleeman deleted the empty-projection-pushdown branch February 12, 2026 22:33
jackkleeman added a commit to restatedev/datafusion that referenced this pull request Feb 17, 2026
- Closes apache#20190.

We should push down empty projections into HashJoinExec

1. try_embed_projection should embed empty projections
2. build_batch_empty_build_side should support empty schemas

Yes

No
jackkleeman added a commit to restatedev/datafusion that referenced this pull request Feb 17, 2026
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
## Which issue does this PR close?


- Closes apache#20190.

## Rationale for this change

We should push down empty projections into HashJoinExec

## What changes are included in this PR?

1. try_embed_projection should embed empty projections
2. build_batch_empty_build_side should support empty schemas

## Are these changes tested?

Yes

## Are there any user-facing changes?

No
pull Bot pushed a commit to buraksenn/datafusion that referenced this pull request Jun 25, 2026
…LoopJoinExec` (apache#23082)

## Which issue does this PR close?

- Closes apache#23083 

## Rationale for this change

`HashJoinExec` and `NestedLoopJoinExec` carry `projection:
Option<Vec<usize>>` but the proto field is `repeated uint32`. Proto3
can't tell `None` from `Some(vec![])`, and the decoder treats both as
`None`. `Some(vec![])` is reachable in real plans —
`try_embed_projection` produces it for `SELECT count(1) … JOIN …`
(apache#20191) — and after a round-trip the join silently switches from "emit
zero columns" to "emit all columns".
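
To make the failure mode concrete, the sketch below models a `repeated uint32` field as a plain `Vec<u32>`. `naive_encode` and `naive_decode` are hypothetical helpers written for illustration; they are an assumed simplification of the pre-fix behaviour, not the actual DataFusion proto code.

```rust
/// Hypothetical pre-fix encoder: a repeated field has no notion of
/// "absent", so both `None` and `Some(vec![])` become an empty list.
fn naive_encode(projection: &Option<Vec<usize>>) -> Vec<u32> {
    match projection {
        None => Vec::new(),
        Some(v) => v.iter().map(|i| *i as u32).collect(),
    }
}

/// Hypothetical pre-fix decoder: an empty list is read back as `None`.
fn naive_decode(wire: &[u32]) -> Option<Vec<usize>> {
    if wire.is_empty() {
        None
    } else {
        Some(wire.iter().map(|i| *i as usize).collect())
    }
}

fn main() {
    let zero_columns: Option<Vec<usize>> = Some(vec![]);
    let all_columns: Option<Vec<usize>> = None;

    // Both values have the same wire representation...
    assert_eq!(naive_encode(&zero_columns), naive_encode(&all_columns));
    // ...so "emit zero columns" comes back as "emit all columns".
    assert_eq!(naive_decode(&naive_encode(&zero_columns)), None);

    println!(
        "Some(vec![]) decoded as {:?}",
        naive_decode(&naive_encode(&zero_columns))
    );
}
```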

`FilterExec` has a workaround for the same limitation; these two execs
were missed.

## What changes are included in this PR?
Encode `Some(vec![])` as the single-element sentinel `[u32::MAX]` (never
a valid column index); recognise it on decode. Everything else goes
through unchanged.

```rust
// encode
projection: match exec.projection.as_ref() {
    None => Vec::new(),
    Some(v) if v.is_empty() => vec![u32::MAX],
    Some(v) => v.iter().map(|x| *x as u32).collect(),
},

// decode
let projection = match hashjoin.projection.as_slice() {
    [] => None,
    [u32::MAX] => Some(Vec::new()),
    indices => Some(indices.iter().map(|i| *i as usize).collect()),
};
```

Applied symmetrically to `HashJoinExec` and `NestedLoopJoinExec`.
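
The same match arms can be exercised outside DataFusion. The following standalone sketch wraps them in two hypothetical functions, `encode_projection` and `decode_projection` (names chosen here, not taken from the PR), and checks that all three cases survive a round trip.

```rust
/// Hypothetical wrapper around the encode arm shown above.
fn encode_projection(projection: &Option<Vec<usize>>) -> Vec<u32> {
    match projection {
        // No projection: leave the repeated field empty.
        None => Vec::new(),
        // Empty projection: use the single-element sentinel.
        Some(v) if v.is_empty() => vec![u32::MAX],
        // Ordinary projection: encode the indices unchanged.
        Some(v) => v.iter().map(|x| *x as u32).collect(),
    }
}

/// Hypothetical wrapper around the decode arm shown above.
fn decode_projection(wire: &[u32]) -> Option<Vec<usize>> {
    match wire {
        [] => None,
        [u32::MAX] => Some(Vec::new()),
        indices => Some(indices.iter().map(|i| *i as usize).collect()),
    }
}

fn main() {
    let cases: Vec<Option<Vec<usize>>> = vec![None, Some(vec![]), Some(vec![0, 2])];
    for case in &cases {
        let wire = encode_projection(case);
        let back = decode_projection(&wire);
        // Every case, including the empty projection, round-trips.
        assert_eq!(&back, case);
        println!("{:?} -> {:?} -> {:?}", case, wire, back);
    }
}
```

The sentinel only works because `u32::MAX` is assumed never to be a real column index; a single-column projection on that index would be misread as the empty projection.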


## Are these changes tested?

Yes, a round-trip test case was added.

## Are there any user-facing changes?


Labels

core (Core DataFusion crate), physical-plan (Changes to the physical-plan crate), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Empty ProjectionExec is not embedded into HashJoinExec by projection pushdown

3 participants