feat(aggregation): Add SDMGradWeighting by KhusPatel4450 · Pull Request #728 · SimplexLab/TorchJD

KhusPatel4450 · 2026-06-10T13:54:36Z

Adds SDMGradWeighting from "Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms" (NeurIPS 2023).

It mirrors MoDoWeighting's structure: the user computes the cross-batch matrix A = J_1 @ J_2.T from two independent mini-batches (via autojac.jac) and passes it to the weighting. The weighting runs the inner simplex-projected solve (matching the official OptMN-Lab/LibMTL momentum-SGD inner loop), tracks w across calls, and returns the direction-augmented weights so the parameter update is the usual losses.backward(weights).
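
To make the inner solve concrete, here is a minimal sketch in plain Python (no torch, so this is not the TorchJD code). It assumes the inner problem is the minimization over the simplex of ½‖Gᵀw + λ·g0‖², with g0 the mean gradient, whose gradient with respect to w is A·w + λ·rowmean(A). The names `project_to_simplex` and `solve_weights`, and the warm-start argument `w0`, are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:  # holds for a prefix of k; keep the last valid threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def solve_weights(A, lam=0.3, lr=10.0, momentum=0.5, n_iter=20, w0=None):
    """Projected momentum-SGD on the simplex (assumed form of the inner loop)."""
    m = len(A)
    # Warm start from the previous call's w if given, else start uniform.
    w = list(w0) if w0 is not None else [1.0 / m] * m
    row_mean = [sum(row) / m for row in A]
    buf = [0.0] * m  # momentum buffer
    for _ in range(n_iter):
        # Assumed gradient: A @ w + lam * rowmean(A).
        grad = [sum(A[i][j] * w[j] for j in range(m)) + lam * row_mean[i]
                for i in range(m)]
        buf = [momentum * b + g for b, g in zip(buf, grad)]
        w = project_to_simplex([w_i - lr * b for w_i, b in zip(w, buf)])
    return w


# Toy 2-task matrix; a small lr is used here so the iterates settle.
A = [[1.0, 0.0], [0.0, 4.0]]
w = solve_weights(A, lam=0.3, lr=0.1, momentum=0.5, n_iter=200)
```

The toy call deliberately uses a small step size on an unnormalized matrix. How the lr=10 default behaves depends on the scale of A.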

Two points worth a look:

(1+λ) normalization — the returned weights are (w_S + λ·w̃)/(1+λ) (sum to 1), matching the official implementation and LibMTL (g = (gw + λ·g0)/(1+λ)).
Defaults — lr=10, momentum=0.5, n_iter=20 follow the official OptMN-Lab class; lamda=0.3 follows the official run.sh experiments and LibMTL (their class default 0.6 is overridden to 0.3 in their own experiments).
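
A quick check of the first point, as a sketch: assuming w̃ is the uniform vector (which is what taking g0 as the mean of the gradients corresponds to), the combined weights sum to 1 whenever w_S does. `combine_weights` is a hypothetical helper for illustration, not part of TorchJD.

```python
def combine_weights(w, lam):
    """Return (w + lam * uniform) / (1 + lam), with uniform = 1/m per task."""
    m = len(w)
    uniform = 1.0 / m  # assumption: w-tilde is the uniform weighting
    return [(w_i + lam * uniform) / (1.0 + lam) for w_i in w]


w_s = [0.7, 0.2, 0.1]  # a point on the simplex, e.g. from the inner solve
combined = combine_weights(w_s, lam=0.3)
total = sum(combined)  # (1 + lam) / (1 + lam), up to rounding
```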

Includes unit tests, docs, a NOTICES entry, and a CHANGELOG entry.

ValerianRey

Looking good already!

My biggest concern is to understand where the official implementation differs from the paper, and to make sure that we're implementing the right version.

As far as I understand, here are the differences between the paper and the official implementation:

Aliasing of the Jacobians in the official implementation, as you reported on Discord, meaning that only the third Jacobian is actually used. => Clearly a bug that we don't want to reproduce, but that we do want to report. This bug doesn't seem to be in LibMTL though.
Division by scale in both the official implementation and LibMTL:
In the official implementation:

scale = torch.mean(torch.sqrt(torch.diag(GG) + 1e-4))
GG = GG / scale.pow(2)

In LibMTL:

GG_diag = torch.diag(GG)
GG_diag = torch.where(GG_diag < 0, torch.zeros_like(GG_diag), GG_diag)
scale = torch.mean(torch.sqrt(GG_diag))
GG = GG / (scale.pow(2) + 1e-8)

I don't see that in the paper's algorithm (but maybe it's in the appendices or somewhere else). Should we do that too? Is it actually a form of normalization that could be implemented elsewhere?

The final update in the official impl (and in LibMTL) is:

g0 = torch.mean(zeta_grads, dim=1)
gw = torch.sum(zeta_grads * w, dim=1)
g = (gw + lamda * g0) / (1 + lamda)

I don't understand why they divide by (1 + lambda). Should we do that too? It seems you did, but we need to understand why. EDIT: I just read your PR message saying that this is to make the weights sum to 1. It makes sense. Maybe we can just keep it this way then! Also, this is just equivalent to a constant (unless lambda changes) LR factor, so it doesn't matter much and it's better to be equivalent to existing implementations there.
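
One possible reading of the division by scale in the second point, offered as a sketch rather than as the authors' rationale: dividing the Gramian by the squared mean gradient norm makes it (up to the epsilon) insensitive to a global rescaling of the gradients. The code below is a plain-Python transcription of the LibMTL variant quoted above. The helper names `gramian` and `normalize_gramian` and the toy Jacobian are hypothetical.

```python
import math


def gramian(J):
    """Return J @ J.T for a list-of-rows Jacobian."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in J] for r1 in J]


def normalize_gramian(GG, eps=1e-8):
    """Divide GG by the squared mean of the per-task gradient norms."""
    # Clamp negative diagonal entries to zero, as the LibMTL snippet does.
    diag = [max(GG[i][i], 0.0) for i in range(len(GG))]
    scale = sum(math.sqrt(d) for d in diag) / len(diag)
    return [[x / (scale ** 2 + eps) for x in row] for row in GG]


J = [[3.0, 4.0], [0.0, 1.0]]                      # two toy task gradients (rows)
J_big = [[1000.0 * x for x in row] for row in J]  # same gradients, rescaled

N = normalize_gramian(gramian(J))
N_big = normalize_gramian(gramian(J_big))
# N and N_big agree up to the effect of eps, so a fixed inner-loop lr
# sees the same problem regardless of the gradient scale.
```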

KhusPatel4450 · 2026-06-10T17:41:52Z

Hello,

This should address all of your comments. The previous commit contains most of the code changes; the new one only fixes the test error.

I moved _projection2simplex out of both MoDoWeighting and SDMGradWeighting into aggregation._utils.simplex so that it's shared, and moved the known-values test to tests/unit/aggregation/_utils/test_simplex.py, which keeps things cleaner.

ValerianRey · 2026-06-10T18:14:07Z

Very cool, thanks for fixing everything. All my comments have been addressed, except that I still don't understand why they normalize with scale in LibMTL and in the official implementation (see point 2 of my main review message). Any idea about this?

KhusPatel4450 · 2026-06-10T18:21:51Z

Regarding your comment: I think the main reason is that it keeps the inner loop numerically stable regardless of the gradient scale. As for whether we should add it: we could just add a note that users should normalize A themselves in certain situations. Sometimes it might not be beneficial, so leaving that choice up to the user makes the most sense to me.

ValerianRey · 2026-06-10T19:47:53Z

> Regarding your comment: I think the main reason is that it keeps the inner loop numerically stable regardless of the gradient scale. As for whether we should add it: we could just add a note that users should normalize A themselves in certain situations. Sometimes it might not be beneficial, so leaving that choice up to the user makes the most sense to me.

Actually I asked Claude and it gave me a pretty good answer: this is actually mentioned in appendix 6.1, second paragraph. It also seems to pair well with the division by (1 + lambda) at the end. So I think we should actually add this in our implementation of SDMGrad. Should be easy to add, but the manually computed examples will probably need to be updated.

Source: https://claude.ai/share/a7ae2ca8-6952-4388-a81e-ec8f5e82d4cc

KhusPatel4450 · 2026-06-10T21:29:51Z

Hello, after looking at the Claude conversation, I added the scale normalization. It is in the new commit.

feat(aggregation): Add SDMGradWeighting

c409c4a

KhusPatel4450 requested review from a team, PierreQuinton and ValerianRey as code owners June 10, 2026 13:54

KhusPatel4450 added the "package: aggregation" and "cc: feat" labels Jun 10, 2026

ValerianRey requested changes Jun 10, 2026

KhusPatel4450 added 2 commits June 10, 2026 13:29

refactor(aggregation): address PR review comments on SDMGradWeighting

f3c3d48

fix(aggregation): use eye_ helper to respect DTYPE in test_two_consecutive_steps

db86b53

Merge branch 'main' into feat/sdmgrad-weighting

21a8c53

ValerianRey mentioned this pull request Jun 10, 2026

Aggregator tracker #665

Open

feat(aggregation): add scale normalization to SDMGradWeighting

6f1c9d2

ValerianRey approved these changes Jun 11, 2026

ValerianRey merged commit 3e5b88c into SimplexLab:main Jun 11, 2026
17 checks passed

ValerianRey merged 5 commits into SimplexLab:main from KhusPatel4450:feat/sdmgrad-weighting
