[SPARK-57467][SCHEDULER] Reuse identical resource profiles when available #56516

Open
psavalle wants to merge 2 commits into apache:master from psavalle:pa/resource-profile-get-or-add

psavalle commented Jun 15, 2026

What changes were proposed in this pull request?

When registering a resource profile, check if one with the same requests already exists. If so, return it, so that the same executors can be reused.

ResourceProfileManager already has a method, getEquivalentProfile, but it is only called in DAGScheduler.mergeResourceProfilesForStage when stageResourceProfiles.size > 1. Even there, it appears prone to a race condition, because the lock is not held across both steps (checking for an existing profile, then adding one if needed).
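The check-then-add race described above can be illustrated with a small, self-contained sketch. This is a toy model, not Spark's code: `Profile`, `ProfileRegistry`, and `getOrAdd` are hypothetical names, and the single-lock approach shown is an assumption about one way to make the lookup and the registration atomic, not a description of what this PR actually implements.

```scala
// Toy model only: these are NOT Spark's ResourceProfile or
// ResourceProfileManager classes. All names here are hypothetical.
import scala.collection.mutable

// A profile has an id; equivalence is decided on the requests alone.
final case class Profile(id: Int, requests: Map[String, Long])

final class ProfileRegistry {
  private val lock = new Object
  private val profiles = mutable.LinkedHashMap.empty[Int, Profile]
  private var nextId = 0

  // The lookup and the insertion happen under the same lock, so two
  // callers with equal requests cannot both miss the lookup and then
  // register two separate profiles.
  def getOrAdd(requests: Map[String, Long]): Profile = lock.synchronized {
    profiles.values.find(_.requests == requests).getOrElse {
      val created = Profile(nextId, requests)
      nextId += 1
      profiles(created.id) = created
      created
    }
  }

  // Number of distinct profiles registered so far.
  def size: Int = lock.synchronized(profiles.size)
}
```

In this sketch, two calls to `getOrAdd` with equal request maps return the same `Profile`, while a call with different requests allocates a new id. If the lookup and the insertion were instead guarded separately, two threads could both find no match and each register a profile, which is the situation the paragraph above is concerned about.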

Why are the changes needed?

Currently, every time a profile is registered, new executors need to spin up, even if the resource requests are the same. This is especially problematic with Spark Connect, where different isolated sessions may try to register the same profile but all end up with isolated executors.

We could also consider putting this behavior behind a configuration flag, if it is desirable to retain the previous behavior by default.

Does this PR introduce any user-facing change?

Previously, equivalent resource profiles would get different profile IDs, and separate executors.

With this change, equivalent resource profiles will get the same ID, and share executors.

How was this patch tested?

Added tests

Was this patch authored or co-authored using generative AI tooling?

Yes, modified initial changes provided by Claude.

Generated-by: 2.1.159 (Claude Code)

@psavalle

Author

Not really sure who could have a look, maybe @wbo4958? Thank you!

Comment thread core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
uros-b commented Jun 17, 2026

Member

Regarding the PR description, it seems that there actually is an observable behavior change. Repeated CreateResourceProfile calls with identical resources now return the same profile ID instead of a new one each time. So, equivalent profiles now share executors under dynamic allocation instead of each getting a separate pool.

If there is a user-facing change, it should be noted, and we will need an appropriate release note.

uros-b commented Jun 17, 2026

Member

In any case, I don't have full context here - so @tgravescs @wbo4958 @HyukjinKwon @Ngone51 please review this PR.

@psavalle

Author

Makes sense, I've updated the PR description, thank you.

mridulm commented Jun 18, 2026

Contributor

+CC @max2718281 PTAL

@uros-b uros-b requested a review from tgravescs June 18, 2026 06:47