Skip to content

feat: decimal support for gcd and lcm#22655

Merged
Jefffrey merged 12 commits into
apache:mainfrom
theirix:gcd-lcm-decimal
Jun 16, 2026
Merged

feat: decimal support for gcd and lcm#22655
Jefffrey merged 12 commits into
apache:mainfrom
theirix:gcd-lcm-decimal

Conversation

@theirix

@theirix theirix commented May 30, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

A binary gcd and lcm UDF in the datafusion-functions crate supports only Int64, but not Decimals. Adding missing support for decimals.

What changes are included in this PR?

  1. Updated gcd and lcm functions to add decimal support. The integer path is more performant and stays intact. For decimals, the Euclidean algorithm is used for GCD
  2. Added coercion rules: casting to decimals if any argument is decimal; otherwise, stay with ints as before
  3. Common functionality extracted to common.rs to avoid inter-UDF dependency
  4. In order to use calculate_binary_math for Decimals, updated it to accept a target type instead of raw Decimal128Type::DATA_TYPE - it causes scaling issues for these UDFs, see Improve scale support for binary decimal operations #19621

A bit more on (4). The driving force is this failing example:

query R
select gcd(2::decimal(38, 0), 3::decimal(38, 0));
----
1

Previously in #19874, I suggested a more complicated solution to extend calculate_binary_math. However, it only affected gcd/lcm and could be considered overkill. This PR extends these functions with an extra parameter cast_target for calculate_binary_decimal_math to perform a proper cast to the actual type used, rather than to the default Decimal128Type::DATA_TYPE - it is much lighter.

Are these changes tested?

  • Added unit test for UDFs with decimals for array and scalar paths
  • Added unit tests for the gcd/lcm math itself
  • Added new SLT tests for decimals

Are there any user-facing changes?

No

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 30, 2026
@github-actions

github-actions Bot commented May 30, 2026

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-functions v54.0.0 (current)
       Built [  30.573s] (current)
     Parsing datafusion-functions v54.0.0 (current)
      Parsed [   0.084s] (current)
    Building datafusion-functions v54.0.0 (baseline)
       Built [  29.103s] (baseline)
     Parsing datafusion-functions v54.0.0 (baseline)
      Parsed [   0.086s] (baseline)
    Checking datafusion-functions v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.576s] 223 checks: 221 pass, 2 fail, 0 warn, 30 skip

--- failure function_marked_deprecated: function #[deprecated] added ---

Description:
A function is now #[deprecated]. Downstream crates will get a compiler warning when using this function.
        ref: https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-deprecated-attribute
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/function_marked_deprecated.ron

Failed in:
  function datafusion_functions::utils::calculate_binary_decimal_math in /home/runner/work/datafusion/datafusion/datafusion/functions/src/utils.rs:157

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/function_missing.ron

Failed in:
  function datafusion_functions::math::gcd::compute_gcd, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/c37ba366f681d4072190d15e2308e91220f34258/datafusion/functions/src/math/gcd.rs:175

     Summary semver requires new major version: 1 major and 1 minor checks failed
    Finished [  61.906s] datafusion-functions
    Building datafusion-sqllogictest v54.0.0 (current)
       Built [ 179.887s] (current)
     Parsing datafusion-sqllogictest v54.0.0 (current)
      Parsed [   0.027s] (current)
    Building datafusion-sqllogictest v54.0.0 (baseline)
       Built [ 179.980s] (baseline)
     Parsing datafusion-sqllogictest v54.0.0 (baseline)
      Parsed [   0.025s] (baseline)
    Checking datafusion-sqllogictest v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.117s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 363.498s] datafusion-sqllogictest

@github-actions github-actions Bot added the auto detected api change Auto detected API change label May 30, 2026
@theirix theirix marked this pull request as ready for review May 30, 2026 20:55
@theirix

theirix commented May 31, 2026

Copy link
Copy Markdown
Contributor Author

@neilconway , @Jefffrey I'd appreciate hearing your thoughts on this PR

query error DataFusion error: Arrow error: Compute error: Signed integer overflow in GCD\(0, \-9223372036854775808\)
select gcd(0, -9223372036854775808);

# gcd decimal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add test cases for:

  • negative number inputs
  • decimals with different scale/precision inputs
  • decimal point 1.23 inputs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're still missing cases for decimals like 1.23 (non whole numbers)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to see a case where we calculate gcd on non-integers; the test case you linked still operates on whole numbers since it cases to decimals with 0 scale

e.g. from postgres

postgres=# select gcd(15.3, 2.9);
 gcd
-----
 0.1
(1 row)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's true. Introduced scale, now it matches the Postgres test case

Comment thread datafusion/functions/src/utils.rs Outdated
Comment thread datafusion/functions/src/utils.rs Outdated

/// Computes a binary math function for input arrays using a specified function
/// and applies rescaling to given precision and scale.
/// Deprecated, use [`calculate_binary_decimal_math_cast`] instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of just stating its deprecated might be better to explicitly mark it as so #[deprecated]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the original function calculate_binary_math_cast is public, we cannot track its usage outside, and removal is only possible with deprecation. It shouldn't have been declared as public in the first place - so the new function, too.

What I can see as a plan:

  • Deprecate it explicitly with the macro, as you suggest
  • Port some usages in the datafusion repo in the next PR
  • Mark the new function as pub(crate) as it is intended to be used locally only

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second thought - marking it as deprecated fails a clippy check. Rolled back to an informational comment until the PR removes its usage in (2)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still deprecate it, and for the clippy check just throw a #[expect(deprecated)] to suppress them (and hopefully fix them later)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Since there are only a few instances in a single file, I introduced the change now - anyway, it will be changed. Sorry for the PR creeping into more files.

Comment thread datafusion/functions/src/utils.rs Outdated
Comment thread datafusion/functions/src/math/gcd.rs Outdated
Comment thread datafusion/functions/src/math/gcd.rs
Comment thread datafusion/functions/src/math/gcd.rs Outdated
Comment thread datafusion/functions/src/math/lcm.rs
@Jefffrey Jefffrey added this pull request to the merge queue Jun 16, 2026
@Jefffrey

Copy link
Copy Markdown
Contributor

thanks @theirix

do we still want #19874 after this?

@theirix

theirix commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Thank you! This PR is more than enough.

Merged via the queue into apache:main with commit a1e88e2 Jun 16, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Decimal support for gcd and lcm

2 participants