[SPARK-57486][SQL] Reuse AnyTimestampNanoType for nanosecond-precision timestamp type checks#56540
[SPARK-57486][SQL] Reuse AnyTimestampNanoType for nanosecond-precision timestamp type checks#56540MaxGekk wants to merge 1 commit into
Conversation
|
@uros-b @stevomitric Could you review this PR, please. |
| * A nanosecond-precision timestamp type (`TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)`, `p` in [7, 9]). | ||
| */ | ||
| private[sql] abstract class AnyTimestampNanoType extends DatetimeType { | ||
| def precision: Int |
There was a problem hiding this comment.
Is this currently used at all?
None of the converted call sites read .precision polymorphically, and the sibling abstraction AnyTimeType deliberately does not expose precision (even though TimeType has it).
There was a problem hiding this comment.
The refactor is behavior-preserving: AnyTimestampNanoType has exactly the two nanos subclasses, so case _: AnyTimestampNanoType is set-identical to the old explicit pair at every converted site, case ordering is preserved, and imports/MiMa are handled (CSVTable is the only named-import change).
A few non-blocking notes:
- the unused abstract
def precisionon the parent deviates slightly from the AnyTimeType precedent it cites; - and two Cast.scala pairs (L116-117, L264-265) plus the vectorized Java checks(ColumnVectorUtils/ConstantColumnVector/WritableColumnVector) are identical-treatment sites the dedup could also cover — behavior is unchanged either way.
| * A nanosecond-precision timestamp type (`TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)`, `p` in [7, 9]). | ||
| */ | ||
| private[sql] abstract class AnyTimestampNanoType extends DatetimeType { | ||
| def precision: Int |
…n timestamp type checks
### What changes were proposed in this pull request?
SPARK-57469 introduced the `AnyTimestampNanoType` abstraction (an `AbstractDataType`
plus the `AnyTimestampNanoTypeExpression` extractor), but it was used in only one
place while many sites across catalyst/core/hive still spelled out the
nanosecond-precision timestamp type pair explicitly, e.g.
`case _: TimestampNTZNanosType | _: TimestampLTZNanosType => ...`.
This PR centralizes that check:
- Follow the existing `AnyTimeType` design: add a companion
`abstract class AnyTimestampNanoType extends DatetimeType { def precision: Int }`
and make `TimestampNTZNanosType` / `TimestampLTZNanosType` extend it, so the
abstraction can be used as a plain type pattern `case _: AnyTimestampNanoType`.
- Simplify `AnyTimestampNanoType.acceptsType` and `AnyTimestampNanoTypeExpression`
to `isInstanceOf[AnyTimestampNanoType]`.
- Replace the explicit `TimestampNTZNanosType` + `TimestampLTZNanosType` pair-checks
across type-coercion, codegen, projections, hashing, and data-source
`supportDataType` checks with `case _: AnyTimestampNanoType`.
### Why are the changes needed?
The pair-checks are duplicated logic that must be kept in sync whenever the set of
nanosecond timestamp types changes. Reusing the abstraction removes the duplication.
### Does this PR introduce _any_ user-facing change?
No. The change is behavior-preserving: no user-facing change and no new functionality.
Because the nanosecond timestamp types are an unreleased preview feature, changing
their superclass has no binary-compatibility impact (verified with MiMa).
### How was this patch tested?
- `build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile`.
- `./dev/scalastyle` and `dev/mima` pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
f8a9a1e to
2127571
Compare
|
Thanks @uros-b @LuciferYang @stevomitric for the reviews! Addressed the feedback in the latest push (also rebased on the current master):
Re-verified: |
|
Merging to master/4.x. Thank you, @LuciferYang @uros-b @stevomitric for review. |
…n timestamp type checks ### What changes were proposed in this pull request? [SPARK-57469](https://issues.apache.org/jira/browse/SPARK-57469) introduced the `AnyTimestampNanoType` abstraction (an `AbstractDataType` plus the `AnyTimestampNanoTypeExpression` extractor), but it was used in only one place. Meanwhile, many sites across catalyst/core/hive still discriminated on the nanosecond-precision timestamp types by spelling out the pair explicitly, e.g.: ```scala case _: TimestampNTZNanosType | _: TimestampLTZNanosType => ... t.isInstanceOf[TimestampNTZNanosType] || t.isInstanceOf[TimestampLTZNanosType] ``` This PR centralizes that check: - Follow the existing `AnyTimeType` design: add a companion `abstract class AnyTimestampNanoType extends DatetimeType { def precision: Int }` and make `TimestampNTZNanosType` / `TimestampLTZNanosType` extend it (their `precision` constructor `val` implements the abstract `def`). This lets the abstraction be used as a plain type pattern `case _: AnyTimestampNanoType`. - Simplify `AnyTimestampNanoType.acceptsType` and `AnyTimestampNanoTypeExpression` to `isInstanceOf[AnyTimestampNanoType]`. - Replace the explicit `TimestampNTZNanosType` + `TimestampLTZNanosType` pair-checks across type-coercion, codegen, projections, hashing, and data-source `supportDataType` checks with `case _: AnyTimestampNanoType`. Sites that treat the two types differently (distinct converters/ops, per-type instance construction, the `EXTRACT` error-message builder, parsers, physical-type maps) are intentionally left unchanged. This is a sub-task of [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP: Timestamps with nanosecond precision). ### Why are the changes needed? The pair-checks are duplicated logic that must be kept in sync whenever the set of nanosecond timestamp types changes. Reusing the abstraction removes the duplication and gives a single place that defines "is a nanosecond-precision timestamp type". ### Does this PR introduce _any_ user-facing change? No. The change is behavior-preserving: no user-facing change and no new functionality. Because the nanosecond timestamp types are an unreleased preview feature, changing their superclass has no binary-compatibility impact (verified with MiMa). ### How was this patch tested? - Compiled the affected modules: `build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile`. - `./dev/scalastyle` and `dev/mima` pass. - Existing nanosecond-timestamp test coverage continues to apply (no behavior change). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor Closes #56540 from MaxGekk/nanos-reuse-anytimestamptype. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit bfe75db) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
SPARK-57469 introduced the
AnyTimestampNanoTypeabstraction (anAbstractDataTypeplus theAnyTimestampNanoTypeExpressionextractor), but it was used in only one place. Meanwhile, many sites across catalyst/core/hive still discriminated on the nanosecond-precision timestamp types by spelling out the pair explicitly, e.g.:This PR centralizes that check:
AnyTimeTypedesign: add a companionabstract class AnyTimestampNanoType extends DatetimeType { def precision: Int }and makeTimestampNTZNanosType/TimestampLTZNanosTypeextend it (theirprecisionconstructorvalimplements the abstractdef). This lets the abstraction be used as a plain type patterncase _: AnyTimestampNanoType.AnyTimestampNanoType.acceptsTypeandAnyTimestampNanoTypeExpressiontoisInstanceOf[AnyTimestampNanoType].TimestampNTZNanosType+TimestampLTZNanosTypepair-checks across type-coercion, codegen, projections, hashing, and data-sourcesupportDataTypechecks withcase _: AnyTimestampNanoType.Sites that treat the two types differently (distinct converters/ops, per-type instance construction, the
EXTRACTerror-message builder, parsers, physical-type maps) are intentionally left unchanged.This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
Why are the changes needed?
The pair-checks are duplicated logic that must be kept in sync whenever the set of nanosecond timestamp types changes. Reusing the abstraction removes the duplication and gives a single place that defines "is a nanosecond-precision timestamp type".
Does this PR introduce any user-facing change?
No. The change is behavior-preserving: no user-facing change and no new functionality. Because the nanosecond timestamp types are an unreleased preview feature, changing their superclass has no binary-compatibility impact (verified with MiMa).
How was this patch tested?
build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile../dev/scalastyleanddev/mimapass.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor