Skip to content

[SPARK-57486][SQL] Reuse AnyTimestampNanoType for nanosecond-precision timestamp type checks#56540

Closed
MaxGekk wants to merge 1 commit into
apache:masterfrom
MaxGekk:nanos-reuse-anytimestamptype
Closed

[SPARK-57486][SQL] Reuse AnyTimestampNanoType for nanosecond-precision timestamp type checks#56540
MaxGekk wants to merge 1 commit into
apache:masterfrom
MaxGekk:nanos-reuse-anytimestamptype

Conversation

@MaxGekk

@MaxGekk MaxGekk commented Jun 16, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

SPARK-57469 introduced the AnyTimestampNanoType abstraction (an AbstractDataType plus the AnyTimestampNanoTypeExpression extractor), but it was used in only one place. Meanwhile, many sites across catalyst/core/hive still discriminated on the nanosecond-precision timestamp types by spelling out the pair explicitly, e.g.:

case _: TimestampNTZNanosType | _: TimestampLTZNanosType => ...
t.isInstanceOf[TimestampNTZNanosType] || t.isInstanceOf[TimestampLTZNanosType]

This PR centralizes that check:

  • Follow the existing AnyTimeType design: add a companion abstract class AnyTimestampNanoType extends DatetimeType { def precision: Int } and make TimestampNTZNanosType / TimestampLTZNanosType extend it (their precision constructor val implements the abstract def). This lets the abstraction be used as a plain type pattern case _: AnyTimestampNanoType.
  • Simplify AnyTimestampNanoType.acceptsType and AnyTimestampNanoTypeExpression to isInstanceOf[AnyTimestampNanoType].
  • Replace the explicit TimestampNTZNanosType + TimestampLTZNanosType pair-checks across type-coercion, codegen, projections, hashing, and data-source supportDataType checks with case _: AnyTimestampNanoType.

Sites that treat the two types differently (distinct converters/ops, per-type instance construction, the EXTRACT error-message builder, parsers, physical-type maps) are intentionally left unchanged.

This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).

Why are the changes needed?

The pair-checks are duplicated logic that must be kept in sync whenever the set of nanosecond timestamp types changes. Reusing the abstraction removes the duplication and gives a single place that defines "is a nanosecond-precision timestamp type".

Does this PR introduce any user-facing change?

No. The change is behavior-preserving: no user-facing change and no new functionality. Because the nanosecond timestamp types are an unreleased preview feature, changing their superclass has no binary-compatibility impact (verified with MiMa).

How was this patch tested?

  • Compiled the affected modules: build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile.
  • ./dev/scalastyle and dev/mima pass.
  • Existing nanosecond-timestamp test coverage continues to apply (no behavior change).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

@MaxGekk

MaxGekk commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

@uros-b @stevomitric Could you review this PR, please.

* A nanosecond-precision timestamp type (`TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)`, `p` in [7, 9]).
*/
private[sql] abstract class AnyTimestampNanoType extends DatetimeType {
def precision: Int

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this currently used at all?

None of the converted call sites read .precision polymorphically, and the sibling abstraction AnyTimeType deliberately does not expose precision (even though TimeType has it).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@uros-b uros-b left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @MaxGekk, LGTM.

@stevomitric stevomitric left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@LuciferYang LuciferYang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor is behavior-preserving: AnyTimestampNanoType has exactly the two nanos subclasses, so case _: AnyTimestampNanoType is set-identical to the old explicit pair at every converted site, case ordering is preserved, and imports/MiMa are handled (CSVTable is the only named-import change).

A few non-blocking notes:

  1. the unused abstract def precision on the parent deviates slightly from the AnyTimeType precedent it cites;
  2. and two Cast.scala pairs (L116-117, L264-265) plus the vectorized Java checks(ColumnVectorUtils/ConstantColumnVector/WritableColumnVector) are identical-treatment sites the dedup could also cover — behavior is unchanged either way.

* A nanosecond-precision timestamp type (`TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)`, `p` in [7, 9]).
*/
private[sql] abstract class AnyTimestampNanoType extends DatetimeType {
def precision: Int

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

…n timestamp type checks

### What changes were proposed in this pull request?
SPARK-57469 introduced the `AnyTimestampNanoType` abstraction (an `AbstractDataType`
plus the `AnyTimestampNanoTypeExpression` extractor), but it was used in only one
place while many sites across catalyst/core/hive still spelled out the
nanosecond-precision timestamp type pair explicitly, e.g.
`case _: TimestampNTZNanosType | _: TimestampLTZNanosType => ...`.

This PR centralizes that check:
- Follow the existing `AnyTimeType` design: add a companion
  `abstract class AnyTimestampNanoType extends DatetimeType { def precision: Int }`
  and make `TimestampNTZNanosType` / `TimestampLTZNanosType` extend it, so the
  abstraction can be used as a plain type pattern `case _: AnyTimestampNanoType`.
- Simplify `AnyTimestampNanoType.acceptsType` and `AnyTimestampNanoTypeExpression`
  to `isInstanceOf[AnyTimestampNanoType]`.
- Replace the explicit `TimestampNTZNanosType` + `TimestampLTZNanosType` pair-checks
  across type-coercion, codegen, projections, hashing, and data-source
  `supportDataType` checks with `case _: AnyTimestampNanoType`.

### Why are the changes needed?
The pair-checks are duplicated logic that must be kept in sync whenever the set of
nanosecond timestamp types changes. Reusing the abstraction removes the duplication.

### Does this PR introduce _any_ user-facing change?
No. The change is behavior-preserving: no user-facing change and no new functionality.
Because the nanosecond timestamp types are an unreleased preview feature, changing
their superclass has no binary-compatibility impact (verified with MiMa).

### How was this patch tested?
- `build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile`.
- `./dev/scalastyle` and `dev/mima` pass.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
@MaxGekk MaxGekk force-pushed the nanos-reuse-anytimestamptype branch from f8a9a1e to 2127571 Compare June 16, 2026 17:14
@MaxGekk

MaxGekk commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks @uros-b @LuciferYang @stevomitric for the reviews! Addressed the feedback in the latest push (also rebased on the current master):

  • Dropped def precision from the abstract class -- AnyTimestampNanoType now mirrors AnyTimeType exactly as an empty abstract parent, since nothing reads .precision polymorphically.
  • Switched the vectorized Java checks (WritableColumnVector, ConstantColumnVector, ColumnVectorUtils) to instanceof AnyTimestampNanoType (the abstract class is public in bytecode, so the Java reference is fine). The ColumnVectorUtils arm testing PhysicalTimestamp*NanosType stays, since the physical types don't share this supertype.
  • Collapsed the (StringType, *Nanos) => true pairs in Cast.scala. I kept the remaining Cast (from, to) arms explicit because they map each nanos type to a different counterpart (e.g. (TimestampNTZType, NTZNanos) vs (TimestampType, LTZNanos), and (NTZNanos, TimestampNTZType) vs (LTZNanos, TimestampType)); collapsing only the identical ones would leave the cast matrices half-abstract.

Re-verified: build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile, ./dev/scalastyle, and dev/mima all pass.

@MaxGekk

MaxGekk commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Merging to master/4.x. Thank you, @LuciferYang @uros-b @stevomitric for review.

@MaxGekk MaxGekk closed this in bfe75db Jun 16, 2026
MaxGekk added a commit that referenced this pull request Jun 16, 2026
…n timestamp type checks

### What changes were proposed in this pull request?
[SPARK-57469](https://issues.apache.org/jira/browse/SPARK-57469) introduced the `AnyTimestampNanoType` abstraction (an `AbstractDataType` plus the `AnyTimestampNanoTypeExpression` extractor), but it was used in only one place. Meanwhile, many sites across catalyst/core/hive still discriminated on the nanosecond-precision timestamp types by spelling out the pair explicitly, e.g.:

```scala
case _: TimestampNTZNanosType | _: TimestampLTZNanosType => ...
t.isInstanceOf[TimestampNTZNanosType] || t.isInstanceOf[TimestampLTZNanosType]
```

This PR centralizes that check:
- Follow the existing `AnyTimeType` design: add a companion `abstract class AnyTimestampNanoType extends DatetimeType { def precision: Int }` and make `TimestampNTZNanosType` / `TimestampLTZNanosType` extend it (their `precision` constructor `val` implements the abstract `def`). This lets the abstraction be used as a plain type pattern `case _: AnyTimestampNanoType`.
- Simplify `AnyTimestampNanoType.acceptsType` and `AnyTimestampNanoTypeExpression` to `isInstanceOf[AnyTimestampNanoType]`.
- Replace the explicit `TimestampNTZNanosType` + `TimestampLTZNanosType` pair-checks across type-coercion, codegen, projections, hashing, and data-source `supportDataType` checks with `case _: AnyTimestampNanoType`.

Sites that treat the two types differently (distinct converters/ops, per-type instance construction, the `EXTRACT` error-message builder, parsers, physical-type maps) are intentionally left unchanged.

This is a sub-task of [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP: Timestamps with nanosecond precision).

### Why are the changes needed?
The pair-checks are duplicated logic that must be kept in sync whenever the set of nanosecond timestamp types changes. Reusing the abstraction removes the duplication and gives a single place that defines "is a nanosecond-precision timestamp type".

### Does this PR introduce _any_ user-facing change?
No. The change is behavior-preserving: no user-facing change and no new functionality. Because the nanosecond timestamp types are an unreleased preview feature, changing their superclass has no binary-compatibility impact (verified with MiMa).

### How was this patch tested?
- Compiled the affected modules: `build/sbt sql-api/compile catalyst/compile sql/compile hive/compile avro/compile`.
- `./dev/scalastyle` and `dev/mima` pass.
- Existing nanosecond-timestamp test coverage continues to apply (no behavior change).

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor

Closes #56540 from MaxGekk/nanos-reuse-anytimestamptype.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit bfe75db)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants