
[SPARK-57485][BUILD] Exclude package-private Scala types from the generated Javadoc#56538

Open: cloud-fan wants to merge 2 commits into apache:master from cloud-fan:genjavadoc-exclude-package-private

@cloud-fan

Contributor

What changes were proposed in this pull request?

Spark publishes both a Scaladoc and a Javadoc API site. The Javadoc is generated from Scala sources by genjavadoc, and it currently exposes a large number of internal types that the Scaladoc correctly hides.

The root cause: a top-level `private[x]` Scala type (e.g. `private[spark] trait SupportsDelegationToken`) compiles to a JVM-public symbol. genjavadoc emits a public Java stub for it even with `-P:genjavadoc:strictVisibility=true`, and the Javadoc `-public` option can't filter it out because the stub genuinely is public. Scaladoc, by contrast, honors the access qualifier and drops these types.

This PR adds a filter to `JavaUnidoc / unidoc / unidocAllSources` (alongside the existing `ignoreUndocumentedPackages`) that drops a generated stub `<module>/target/java/<pkg>/<Name>.java` if and only if every top-level Scala declaration of `<Name>` in that package is package-private (`private[...]`, or bare `private`). A public class with a `private[...]` companion object (e.g. `SparkConf`: a public class with a `private[spark]` object) is kept, because the class itself is public.
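The keep/drop rule above can be sketched as follows. This is a minimal illustration, not the PR's build code: `TopDecl`, `StubFilterRule`, `stubKey` and `shouldDrop` are hypothetical names, and keeping a stub when no Scala declaration is found is an assumed conservative default.

```scala
// Hypothetical model of one top-level Scala declaration in a package:
// its simple name and whether it is package-private (`private` or `private[...]`).
final case class TopDecl(name: String, packagePrivate: Boolean)

object StubFilterRule {
  /** Hypothetical helper: map a stub path relative to `<module>/target/java`,
    * e.g. "org/apache/spark/SparkConf.java", to (package, simple name).
    */
  def stubKey(relPath: String): Option[(String, String)] = {
    val parts = relPath.split('/').toSeq
    parts.lastOption.filter(_.endsWith(".java")).map { file =>
      (parts.init.mkString("."), file.stripSuffix(".java"))
    }
  }

  /** Drop the stub for `name` iff every top-level declaration of `name` in the
    * package is package-private. With no matching declaration the stub is kept
    * (an assumed conservative default).
    */
  def shouldDrop(name: String, declsInPackage: Seq[TopDecl]): Boolean = {
    val matching = declsInPackage.filter(_.name == name)
    matching.nonEmpty && matching.forall(_.packagePrivate)
  }
}
```

For example, a package whose only declaration of `SupportsDelegationToken` is a `private[spark]` trait yields `shouldDrop(...) == true`, while a package declaring a public `class SparkConf` plus a `private[spark] object SparkConf` yields `false`, matching the public-class-with-private-companion keep case.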

Why are the changes needed?

The published Javadoc lists ~1.3k internal types (e.g. `BarrierCoordinator`, `ContextCleaner`, `ExecutorAllocationManager`, scheduler RPC messages, `SupportsDelegationToken`) that are `private[spark]` in source and absent from the Scaladoc. This misleads Java users about the public API surface and makes the two API docs disagree on which types are public. Filtering them aligns the Java API doc with the Scala one (the formats still differ, but the coverage now matches) without touching genuinely Java-authored public APIs.

Does this PR introduce any user-facing change?

No code or runtime change. The only user-facing effect is on the generated Javadoc site: top-level `private[spark]` (and other package-private) Scala types no longer appear as public Java classes. Genuinely public APIs, including Java-authored ones (`src/main/java`, e.g. the DataSource V2 connector interfaces) and Java-friendly wrappers such as `org.apache.spark.api.java.JavaRDD`, are unaffected.

How was this patch tested?

  • Validated, against the already-generated `*/target/java` stubs across core, sql/core, sql/api, sql/catalyst, mllib and streaming, that the filter selects exactly the package-private stubs: it drops the `private[spark]` leaks (`SupportsDelegationToken`, `StructuredStreamingIdAwareSchedulerLogging`, `InternalAccumulator`; ~1.3k in total) while keeping public types and the public-class-with-private-companion cases (`SparkConf`, `SparkContext`, `TaskContext`, `RDD`).
  • Confirmed that the build definition compiles via `build/sbt reload`.
  • A full `build/sbt unidoc` run is the end-to-end integration check; this PR relies on CI's docs build for that.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Isaac)

This pull request and its description were written by Isaac.

…erated Javadoc

genjavadoc emits top-level `private[x]` Scala types (e.g. `private[spark] trait
Foo`) as public Java stubs even with `-P:genjavadoc:strictVisibility=true`, and
the Javadoc `-public` option can't drop them because the stub really is public
(`private[spark]` compiles to JVM-public). ScalaDoc honors the qualifier and
hides such types, so the published Javadoc covers ~1.3k internal types the
Scaladoc does not. Filter these stubs out of `JavaUnidoc / unidocAllSources` so
the Java API doc matches the Scala one.

Co-authored-by: Isaac

Broaden privateTopTypeRe to match top-level `private` (no brackets) and
declarations where a modifier precedes the access qualifier (e.g.
`final private[streaming] class DStreamGraph`). A bare top-level `private`
type is private to its enclosing package and compiles to a JVM-public
symbol, so it leaks into the Javadoc exactly like `private[spark]`
(e.g. ContextCleaner's `CleanAccum`, `rdd.DefaultPartitionCoalescer`).
publicTopTypeRe is unchanged, so the public-class-with-private-companion
keep-case (e.g. SparkConf) is preserved.
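The broadened matching described in this commit can be approximated with two regexes. This is a sketch under stated assumptions: only the names `privateTopTypeRe` and `publicTopTypeRe` come from the commit message; the patterns, the modifier list, the "top-level means column 0" heuristic and `TopTypeScanner` itself are hypothetical, and the sketch scans one source string rather than grouping declarations per package.

```scala
import scala.util.matching.Regex

object TopTypeScanner {
  // Modifiers that may appear before or after the access qualifier (assumed list).
  private val mods = """(?:(?:final|sealed|abstract|case|implicit)\s+)*"""

  // Top-level (column 0) `private` or `private[x]` type, allowing modifiers on
  // either side, e.g. `final private[streaming] class DStreamGraph`.
  val privateTopTypeRe: Regex =
    ("(?m)^" + mods + """private(?:\[\w+\])?\s+""" + mods + """(?:class|trait|object)\s+(\w+)""").r

  // Top-level type with no access modifier at all, i.e. a public declaration.
  val publicTopTypeRe: Regex =
    ("(?m)^" + mods + """(?:class|trait|object)\s+(\w+)""").r

  private def names(re: Regex, src: String): Set[String] =
    re.findAllMatchIn(src).map(_.group(1)).toSet

  /** Names that have a package-private declaration and no public one in `src`. */
  def packagePrivateOnly(src: String): Set[String] =
    names(privateTopTypeRe, src) -- names(publicTopTypeRe, src)
}
```

Fed illustrative declarations modeled on the examples above (`class SparkConf(...)`, `private[spark] object SparkConf`, `final private[streaming] class DStreamGraph`, `private case class CleanAccum(...)`), `packagePrivateOnly` returns `Set("DStreamGraph", "CleanAccum")`: the public class keeps `SparkConf` out of the drop set, which is the keep case this commit preserves.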

Co-authored-by: Isaac

cloud-fan force-pushed the genjavadoc-exclude-package-private branch from 09f351a to 5c58d6d on June 16, 2026 21:07.


4 participants