[SPARK-57486][SQL] Reuse AnyTimestampNanoType for nanosecond-precision timestamp type checks by MaxGekk · Pull Request #56540 · apache/spark

MaxGekk · 2026-06-16T07:39:07Z

What changes were proposed in this pull request?

SPARK-57469 introduced the AnyTimestampNanoType abstraction (an AbstractDataType plus the AnyTimestampNanoTypeExpression extractor), but it was used in only one place. Meanwhile, many sites across catalyst/core/hive still discriminated on the nanosecond-precision timestamp types by spelling out the pair explicitly, e.g.:

case _: TimestampNTZNanosType | _: TimestampLTZNanosType => ...
t.isInstanceOf[TimestampNTZNanosType] || t.isInstanceOf[TimestampLTZNanosType]

This PR centralizes that check:

- Follow the existing AnyTimeType design: add a companion abstract class AnyTimestampNanoType extends DatetimeType { def precision: Int } and make TimestampNTZNanosType / TimestampLTZNanosType extend it (their precision constructor val implements the abstract def). This lets the abstraction be used as a plain type pattern case _: AnyTimestampNanoType.
- Simplify AnyTimestampNanoType.acceptsType and AnyTimestampNanoTypeExpression to isInstanceOf[AnyTimestampNanoType].
- Replace the explicit TimestampNTZNanosType + TimestampLTZNanosType pair-checks across type-coercion, codegen, projections, hashing, and data-source supportDataType checks with case _: AnyTimestampNanoType.

Sites that treat the two types differently (distinct converters/ops, per-type instance construction, the EXTRACT error-message builder, parsers, physical-type maps) are intentionally left unchanged.
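A minimal sketch of the pattern described above, using simplified stand-in classes rather than Spark's real type hierarchy. The wrapper object `NanoTypeSketch`, the helpers `isNanoBefore` / `isNanoAfter`, and the reduced `DataType` tree are assumptions made for illustration; only the names `AnyTimestampNanoType`, `TimestampNTZNanosType`, `TimestampLTZNanosType`, `DatetimeType`, and `acceptsType` come from the description.

```scala
// Illustrative stand-ins only; these are NOT Spark's actual classes.
object NanoTypeSketch {
  sealed abstract class DataType
  abstract class DatetimeType extends DataType
  case object StringType extends DataType

  // Shared abstract parent, as proposed in the description.
  abstract class AnyTimestampNanoType extends DatetimeType {
    def precision: Int
  }

  // The constructor val `precision` implements the abstract def.
  final case class TimestampNTZNanosType(precision: Int) extends AnyTimestampNanoType
  final case class TimestampLTZNanosType(precision: Int) extends AnyTimestampNanoType

  // Before: every call site spells out the pair explicitly.
  def isNanoBefore(t: DataType): Boolean = t match {
    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => true
    case _ => false
  }

  // After: a single type pattern on the shared parent.
  def isNanoAfter(t: DataType): Boolean = t match {
    case _: AnyTimestampNanoType => true
    case _ => false
  }

  // Companion-style check reduced to one isInstanceOf test
  // (hypothetical signature, simplified from the description).
  object AnyTimestampNanoType {
    def acceptsType(other: DataType): Boolean =
      other.isInstanceOf[AnyTimestampNanoType]
  }
}
```

In this sketch both helpers return the same result for every input, which is the sense in which the refactor is meant to be behavior-preserving.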

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

Why are the changes needed?

The pair-checks are duplicated logic that must be kept in sync whenever the set of nanosecond timestamp types changes. Reusing the abstraction removes the duplication and gives a single place that defines "is a nanosecond-precision timestamp type".

Does this PR introduce any user-facing change?

No. The change is behavior-preserving: no user-facing change and no new functionality. Because the nanosecond timestamp types are an unreleased preview feature, changing their superclass has no binary-compatibility impact (verified with MiMa).

How was this patch tested?

- Compiled the affected modules: build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile.
- ./dev/scalastyle and dev/mima pass.
- Existing nanosecond-timestamp test coverage continues to apply (no behavior change).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

MaxGekk · 2026-06-16T10:19:27Z

@uros-b @stevomitric Could you review this PR, please?

uros-b · 2026-06-16T12:53:36Z

+ * A nanosecond-precision timestamp type (`TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)`, `p` in [7, 9]).
+ */
+private[sql] abstract class AnyTimestampNanoType extends DatetimeType {
+  def precision: Int


Is this currently used at all?

None of the converted call sites read .precision polymorphically, and the sibling abstraction AnyTimeType deliberately does not expose precision (even though TimeType has it).

uros-b

Thank you @MaxGekk, LGTM.

stevomitric

LGTM.

LuciferYang

The refactor is behavior-preserving: AnyTimestampNanoType has exactly the two nanos subclasses, so case _: AnyTimestampNanoType is set-identical to the old explicit pair at every converted site, case ordering is preserved, and imports/MiMa are handled (CSVTable is the only named-import change).

A few non-blocking notes:

- the unused abstract def precision on the parent deviates slightly from the AnyTimeType precedent it cites;
- two Cast.scala pairs (L116-117, L264-265) plus the vectorized Java checks (ColumnVectorUtils/ConstantColumnVector/WritableColumnVector) are identical-treatment sites the dedup could also cover. Behavior is unchanged either way.

LuciferYang · 2026-06-16T13:06:26Z

+ * A nanosecond-precision timestamp type (`TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)`, `p` in [7, 9]).
+ */
+private[sql] abstract class AnyTimestampNanoType extends DatetimeType {
+  def precision: Int



MaxGekk · 2026-06-16T17:15:45Z

Thanks @uros-b @LuciferYang @stevomitric for the reviews! Addressed the feedback in the latest push (also rebased on the current master):

- Dropped def precision from the abstract class -- AnyTimestampNanoType now mirrors AnyTimeType exactly as an empty abstract parent, since nothing reads .precision polymorphically.
- Switched the vectorized Java checks (WritableColumnVector, ConstantColumnVector, ColumnVectorUtils) to instanceof AnyTimestampNanoType (the abstract class is public in bytecode, so the Java reference is fine). The ColumnVectorUtils arm testing PhysicalTimestamp*NanosType stays, since the physical types don't share this supertype.
- Collapsed the (StringType, *Nanos) => true pairs in Cast.scala. I kept the remaining Cast (from, to) arms explicit because they map each nanos type to a different counterpart (e.g. (TimestampNTZType, NTZNanos) vs (TimestampType, LTZNanos), and (NTZNanos, TimestampNTZType) vs (LTZNanos, TimestampType)); collapsing only the identical ones would leave the cast matrices half-abstract.

Re-verified: build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile, ./dev/scalastyle, and dev/mima all pass.
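The distinction drawn for Cast.scala above can be sketched as follows. This is a heavily reduced, hypothetical `canCast` over stand-in types, not Spark's actual cast matrix: the object name `CastArmsSketch`, the function shape, and in particular the `false` fallback for every other pair are assumptions made only so the example is self-contained.

```scala
// Illustrative stand-ins only; NOT Spark's real Cast.scala or type classes.
object CastArmsSketch {
  sealed abstract class DataType
  case object StringType extends DataType
  case object TimestampType extends DataType     // microsecond, local time zone
  case object TimestampNTZType extends DataType  // microsecond, no time zone

  // Empty abstract parent, mirroring the final shape described above.
  abstract class AnyTimestampNanoType extends DataType
  final case class TimestampNTZNanosType(precision: Int) extends AnyTimestampNanoType
  final case class TimestampLTZNanosType(precision: Int) extends AnyTimestampNanoType

  def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
    // Both nanos types are treated identically here, so one arm suffices.
    case (StringType, _: AnyTimestampNanoType) => true

    // Each nanos type maps to a different counterpart, so these stay explicit.
    case (TimestampNTZType, _: TimestampNTZNanosType) => true
    case (TimestampType, _: TimestampLTZNanosType) => true
    case (_: TimestampNTZNanosType, TimestampNTZType) => true
    case (_: TimestampLTZNanosType, TimestampType) => true

    // Assumption of this sketch only: everything else is rejected.
    case _ => false
  }
}
```

Only the first arm benefits from the shared parent; the remaining arms pair each nanos type with its own counterpart and would not read more clearly if collapsed.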

MaxGekk · 2026-06-16T20:09:54Z

Merging to master/4.x. Thank you, @LuciferYang @uros-b @stevomitric, for the review.

Closes #56540 from MaxGekk/nanos-reuse-anytimestamptype.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit bfe75db)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

MaxGekk requested review from HyukjinKwon, LuciferYang and cloud-fan June 16, 2026 11:23

uros-b reviewed Jun 16, 2026

uros-b approved these changes Jun 16, 2026

stevomitric reviewed Jun 16, 2026

LuciferYang approved these changes Jun 16, 2026

MaxGekk force-pushed the nanos-reuse-anytimestamptype branch from f8a9a1e to 2127571 Compare June 16, 2026 17:14

MaxGekk closed this in bfe75db Jun 16, 2026

MaxGekk wants to merge 1 commit into apache:master from MaxGekk:nanos-reuse-anytimestamptype