Support transparent ExecutionPlan downcasts#22559
Conversation
2128754 to
cae7bf1
Compare
|
I'll review as soon as it's marked ready, but at a high level looks good so far |
259871e to
6563792
Compare
6563792 to
fc8f083
Compare
@timsaucer Should be ready for review now. I just renamed the API to |
gabotechs
left a comment
There was a problem hiding this comment.
👍 Even if it's not super pretty to have that extra method there, it gets the job done without introducing any breaking API change, and it's landed in a way that only people caring about implementing wrapper types need to care about this, without getting in the way of everyone else, so +1 on my side.
Before moving forward, let's wait for at least @timsaucer to give his opinion here.
| match self.downcast_delegate() { | ||
| Some(delegate) => delegate.is::<T>(), | ||
| None => (self as &dyn Any).is::<T>(), | ||
| } |
There was a problem hiding this comment.
Won't this prevent you from downcasting to the wrapper type?
There was a problem hiding this comment.
Yes, and this is intentional to mimic the old as_any behavior. I've updated the PR description to be clear about this choice.
There was a problem hiding this comment.
It seems like the earlier PR would still work for all use cases, right? Because you're always choosing to downcast to a specific type, it seems like it would enable downcasting to either (or many, depending on levels) type.
There was a problem hiding this comment.
The previous PR would stop on the first match, rather than go all the way to the leaf child. I agree this is being pedantic, as "real world" cases would probably never have a case where you'd need to match the leaf child but get interrupted by an intermediate node in a "wrapper chain". However, I still find the current implementation clearer and easier to reason about: the leaf node is what is returned.
There was a problem hiding this comment.
Should we revert the hew :Any instead? I think it was just supposed to be a cleanup but we didn't realize it would have downstream impacts 🤔
There was a problem hiding this comment.
Also I wonder if there is some way to avoid using as_any / delegating logic at all ... But to do that we would have to remove all uses of as_any / downcasting I suspect
There was a problem hiding this comment.
Should we revert the new
:Anyinstead? I think it was just supposed to be a cleanup but we didn't realize it would have downstream impacts 🤔
@alamb Honestly I prefer the new API with downcast_delegate: it's explicit and clearer.
Also I wonder if there is some way to avoid using as_any / delegating logic at all ... But to do that we would have to remove all uses of as_any / downcasting I suspect
That would be a pretty big breaking change as well, as the is a check is quite widespread.
gene-bordegaray
left a comment
There was a problem hiding this comment.
looks good
I think this is feasible and good as we are gonna get 👍
fc8f083 to
41a9861
Compare
|
Do we need to also open a PR to target |
@timsaucer I opened #22565 on |
| match self.downcast_delegate() { | ||
| Some(delegate) => delegate.is::<T>(), | ||
| None => (self as &dyn Any).is::<T>(), | ||
| } |
There was a problem hiding this comment.
Should we revert the hew :Any instead? I think it was just supposed to be a cleanup but we didn't realize it would have downstream impacts 🤔
| match self.downcast_delegate() { | ||
| Some(delegate) => delegate.is::<T>(), | ||
| None => (self as &dyn Any).is::<T>(), | ||
| } |
There was a problem hiding this comment.
Also I wonder if there is some way to avoid using as_any / delegating logic at all ... But to do that we would have to remove all uses of as_any / downcasting I suspect
| /// instrumentation. The default implementation returns `None`, meaning this | ||
| /// plan's concrete type is used for type introspection. | ||
| /// | ||
| /// The `is` and `downcast_ref` helpers follow the returned delegate instead |
There was a problem hiding this comment.
It might be good to add a specific example here (like a wrapper node that wants to observe changes but still be treated as the underlying node when DataFusion checks for patterns using downcast 🤔
There was a problem hiding this comment.
There's the unit test with DowncastDelegatingExec which already shows a possible implementation. Would you also like an inline comment?
There was a problem hiding this comment.
Yeah, I was thinking an inline comment so that if someone else is reading (just) the ExecutionPlan docs they can more quickly understand if they need to worry about implementing this method or not (probably not)
There was a problem hiding this comment.
done: I amended the commit to be clearer in the comment.
41a9861 to
1851c8a
Compare
|
Looks good to me -- thank you everyone |
## Which issue does this PR close? - Refs #22557. - Companion PR to #22559, targeting `branch-54`. ## Rationale for this change DataFusion 54 changed `ExecutionPlan` downcasting to use the `Any` supertrait directly. That removes `ExecutionPlan::as_any`, which had also served as a customization point for wrapper nodes: wrappers could identify as themselves internally while exposing the wrapped plan type to normal downcast-based inspection. This PR adds an explicit `ExecutionPlan::downcast_delegate()` hook for wrapper nodes that want their public `ExecutionPlan` downcast identity to be delegated to another plan. The proposed behavior intentionally preserves the old `as_any` override semantics: when a node opts into downcast delegation, intermediate delegating wrappers are invisible to `dyn ExecutionPlan::is::<T>()` and `downcast_ref::<T>()`. ## What changes are included in this PR? - Adds `ExecutionPlan::downcast_delegate()` with a default implementation returning `None`. - Updates `dyn ExecutionPlan::is::<T>()` and `downcast_ref::<T>()` to delegate to `downcast_delegate()` when present, otherwise use the current concrete plan type. - Documents that `downcast_delegate()` is only for type introspection and is independent from `children()` / plan traversal. - Adds tests for direct and nested downcast-delegating wrappers, including that intermediate delegating wrappers remain invisible to normal downcast-based inspection. ## Are these changes tested? Yes. - `cargo test -p datafusion-physical-plan execution_plan_downcast` - `cargo test -p datafusion-physical-plan --lib` - `cargo fmt --all -- --check` - `git diff --check` ## Are there any user-facing changes? Yes. This adds a new public `ExecutionPlan` trait method with a default implementation, and it changes `ExecutionPlan` downcast helpers to honor wrappers that explicitly opt into delegating public downcast identity.
apache#23154) ## Which issue does this PR close? N/A. This is a follow-up to apache#22559 and apache#21263, but there is no dedicated issue for this specific bug. ## Rationale for this change apache#21263 removed `ExecutionPlan::as_any()` now that `ExecutionPlan` has `Any` as a supertrait. Most call sites were migrated from code like this: ```rust plan.as_any().downcast_ref::<SomeExec>() ``` to the new helper: ```rust plan.downcast_ref::<SomeExec>() ``` That distinction matters after apache#22559. `ExecutionPlan::downcast_ref` is now the public downcast operation for execution plans: it follows `ExecutionPlan::downcast_delegate()` before falling back to raw `Any`. This lets wrapper plans keep their own concrete type private while still presenting the wrapped plan's normal public identity. For example, tracing or instrumentation wrappers can implement `downcast_delegate()` so callers that ask "is this a `FilterExec` / `EmptyExec` / `ProjectionExec`?" get the same answer they would have received without the wrapper. The wrapper remains visible only to code that intentionally performs a raw concrete-type `Any` check. This problem came up while trying to instrument distributed plans through [`datafusion-distributed`](https://github.com/datafusion-contrib/datafusion-distributed). In that setting, physical plans may be wrapped for tracing/instrumentation and then serialized for distributed execution. The wrapper is meant to be transparent for normal plan inspection, but serialization still needs to recognize the delegated built-in plan underneath it. The physical plan proto serializer still had one leftover mechanical migration from the old `as_any()` world: ```rust let plan = plan.as_ref() as &dyn Any; ``` That line bypasses `ExecutionPlan::downcast_ref` entirely. As a result, a delegating wrapper around a built-in execution plan is not serialized as the built-in plan. Instead, the serializer sees only the wrapper's concrete type, fails all built-in `Exec` checks, and falls through to the extension codec. With the default extension codec this produces an unsupported-plan error, even though the wrapped plan itself is serializable. This is particularly relevant for transparent wrappers such as [`InstrumentedExec`](https://github.com/datafusion-contrib/datafusion-tracing/blob/main/datafusion-tracing/src/instrumented_exec.rs) in [`datafusion-contrib/datafusion-tracing`](https://github.com/datafusion-contrib/datafusion-tracing), which intentionally delegate public downcasts to the wrapped plan. ## What changes are included in this PR? - Changes physical plan proto serialization to bind the plan as `&dyn ExecutionPlan` instead of `&dyn Any`. - Leaves the existing serializer downcast chain intact, so each `plan.downcast_ref::<...>()` now dispatches through the `ExecutionPlan` helper and therefore honors `downcast_delegate()`. - Adds a regression test with a small wrapper execution plan that delegates public downcasts to an inner `EmptyExec`. - Audited other `as_ref() as &dyn Any` patterns. The remaining hits are either non-`ExecutionPlan` traits, `PhysicalExpr` checks, UDF tests, or the intentional raw `Any` fallback inside the `ExecutionPlan` helper itself. ## Are these changes tested? Yes. ```bash cargo fmt --all cargo fmt --all --check cargo test -p datafusion-proto --test proto_integration serialize_uses_downcast_delegate cargo clippy --all-targets --all-features -- -D warnings ``` ## Are there any user-facing changes? Yes, as a bug fix. Physical plan proto serialization now honors the documented `ExecutionPlan::downcast_delegate()` behavior. Transparent wrapper plans can serialize as their delegated built-in execution plan when appropriate instead of requiring an extension codec for the wrapper itself.
Which issue does this PR close?
Rationale for this change
DataFusion 54 changed
ExecutionPlandowncasting to use theAnysupertrait directly. That removesExecutionPlan::as_any, which had also served as a customization point for wrapper nodes: wrappers could identify as themselves internally while exposing the wrapped plan type to normal downcast-based inspection.This draft PR proposes one possible fix: add an explicit
ExecutionPlan::downcast_delegate()hook for wrapper nodes that want their publicExecutionPlandowncast identity to be delegated to another plan.The proposed behavior intentionally preserves the old
as_anyoverride semantics: when a node opts into downcast delegation, intermediate delegating wrappers are invisible todyn ExecutionPlan::is::<T>()anddowncast_ref::<T>(). A different "dual identity" model, where wrappers can downcast both as themselves and as their delegates, may also be useful, but that would be a new behavior rather than a compatibility fix.This is only a suggested implementation for the linked issue. Other solution proposals, including different API names or a different shape for the hook, are very welcome.
What changes are included in this PR?
ExecutionPlan::downcast_delegate()with a default implementation returningNone.dyn ExecutionPlan::is::<T>()anddowncast_ref::<T>()to delegate todowncast_delegate()when present, otherwise use the current concrete plan type.downcast_delegate()is only for type introspection and is independent fromchildren()/ plan traversal.An alternative API shape would make every plan expose an explicit downcast target. A concrete spelling could make the self-target case explicit:
That frames every plan as having a public downcast target, which is close to the old
as_anymental model. A simpler conceptual version would befn downcast_target(&self) -> &dyn ExecutionPlanwith the default target beingself, but the real helper still needs an explicit self-target base case to avoid recursing forever. This PR keeps the primary proposal as an explicitOption-based opt-in.Are these changes tested?
Yes.
cargo test -p datafusion-physical-plan execution_plan_downcastcargo test -p datafusion-physical-plan --libcargo fmt --all -- --checkgit diff --checkAre there any user-facing changes?
Yes. This adds a new public
ExecutionPlantrait method with a default implementation, and it changesExecutionPlandowncast helpers to honor wrappers that explicitly opt into delegating public downcast identity.