Skip to content

perf: avoid repeat format in calc_func_dependencies_for_project#12305

Merged
lewiszlw merged 1 commit into
apache:mainfrom
haohuaijin:improve-plan-time
Sep 4, 2024
Merged

perf: avoid repeat format in calc_func_dependencies_for_project#12305
lewiszlw merged 1 commit into
apache:mainfrom
haohuaijin:improve-plan-time

Conversation

@haohuaijin

@haohuaijin haohuaijin commented Sep 3, 2024

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #12304

Rationale for this change

see #12304; the reason for the performance degradation is that we format the same expr in a loop.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added the logical-expr Logical plan and expressions label Sep 3, 2024
@haohuaijin

Copy link
Copy Markdown
Contributor Author

the benchmarks are below, they compare main and this pr
image
the below benchmarks are compared main, datafusion v41.0.0 and this pr
image

@andygrove andygrove left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haohuaijin

.map(|i| vec![i])
.unwrap_or(vec![])),
Expr::Alias(alias) => {
let name = format!("{}", alias.expr);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprise that rust compile doesn't optimize on this 😮

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe because the rust doesn't know the length of inputs_fields at compile time, it doesn't optimize this case.

let flat_name = qualifier
.map(|t| format!("{}.{}", t, f.name()))
.unwrap_or(f.name().clone());
.unwrap_or_else(|| f.name().clone());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments passed to unwrap_or are eagerly evaluated; if you are passing the result of a function call, it is recommended to use unwrap_or_else, which is lazily evaluated.

👍

@lewiszlw lewiszlw merged commit 6bbad7e into apache:main Sep 4, 2024
@haohuaijin haohuaijin deleted the improve-plan-time branch September 4, 2024 08:51
@haohuaijin haohuaijin restored the improve-plan-time branch September 4, 2024 15:34
@haohuaijin haohuaijin deleted the improve-plan-time branch September 19, 2024 03:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

significant degraded performance of select * from large tables in main compared to datafusion v41.0.0

6 participants