feat: decimal support for gcd and lcm#22655
Conversation
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
|
@neilconway , @Jefffrey I'd appreciate hearing your thoughts on this PR |
| query error DataFusion error: Arrow error: Compute error: Signed integer overflow in GCD\(0, \-9223372036854775808\) | ||
| select gcd(0, -9223372036854775808); | ||
|
|
||
| # gcd decimal |
There was a problem hiding this comment.
could we add test cases for:
- negative number inputs
- decimals with different scale/precision inputs
- decimal point
1.23inputs
There was a problem hiding this comment.
I think we're still missing cases for decimals like 1.23 (non whole numbers)
There was a problem hiding this comment.
A test for non-integers is already here https://github.com/apache/datafusion/pull/22655/changes#diff-a3dd4ed33ba3cb7e6f13d3d50f84f93d654a13ef250dcb53e75373a8567b4847R712 . Am I missing something special about 1.23 ?
There was a problem hiding this comment.
I wanted to see a case where we calculate gcd on non-integers; the test case you linked still operates on whole numbers since it cases to decimals with 0 scale
e.g. from postgres
postgres=# select gcd(15.3, 2.9);
gcd
-----
0.1
(1 row)There was a problem hiding this comment.
Ah, that's true. Introduced scale, now it matches the Postgres test case
|
|
||
| /// Computes a binary math function for input arrays using a specified function | ||
| /// and applies rescaling to given precision and scale. | ||
| /// Deprecated, use [`calculate_binary_decimal_math_cast`] instead. |
There was a problem hiding this comment.
instead of just stating its deprecated might be better to explicitly mark it as so #[deprecated]
There was a problem hiding this comment.
Since the original function calculate_binary_math_cast is public, we cannot track its usage outside, and removal is only possible with deprecation. It shouldn't have been declared as public in the first place - so the new function, too.
What I can see as a plan:
- Deprecate it explicitly with the macro, as you suggest
- Port some usages in the
datafusionrepo in the next PR - Mark the new function as
pub(crate)as it is intended to be used locally only
There was a problem hiding this comment.
A second thought - marking it as deprecated fails a clippy check. Rolled back to an informational comment until the PR removes its usage in (2)
There was a problem hiding this comment.
I think we should still deprecate it, and for the clippy check just throw a #[expect(deprecated)] to suppress them (and hopefully fix them later)
There was a problem hiding this comment.
Makes sense. Since there are only a few instances in a single file, I introduced the change now - anyway, it will be changed. Sorry for the PR creeping into more files.
|
Thank you! This PR is more than enough. |
Which issue does this PR close?
Rationale for this change
A binary gcd and lcm UDF in the datafusion-functions crate supports only Int64, but not Decimals. Adding missing support for decimals.
What changes are included in this PR?
common.rsto avoid inter-UDF dependencycalculate_binary_mathfor Decimals, updated it to accept a target type instead of rawDecimal128Type::DATA_TYPE- it causes scaling issues for these UDFs, see Improve scale support for binary decimal operations #19621A bit more on (4). The driving force is this failing example:
Previously in #19874, I suggested a more complicated solution to extend
calculate_binary_math. However, it only affected gcd/lcm and could be considered overkill. This PR extends these functions with an extra parametercast_targetforcalculate_binary_decimal_mathto perform a proper cast to the actual type used, rather than to the defaultDecimal128Type::DATA_TYPE- it is much lighter.Are these changes tested?
Are there any user-facing changes?
No