
fix(verl): build transform attention masks from sequence lengths #517

Open

JasonWei05 wants to merge 1 commit into main from fix/verl-collapse

Conversation

@JasonWei05 (Collaborator) commented Apr 29, 2026

Summary

This PR changes attention-mask construction in the verl transform to derive from sequence lengths rather than token values. _pad_sequence_batch already determines where padding is inserted: prompts are left-padded and responses are right-padded, so the attention mask should mirror that padding layout directly. The previous batch != pad_token_id shortcut is equivalent in the common case, but it relies on the tokenizer-specific invariant that pad_token_id only appears in padded positions; length-based masking removes that unnecessary assumption.
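
To make the layout concrete, here is a minimal sketch of length-based mask construction under the same padding convention (left-padded prompts, right-padded responses). The function and tensor names are illustrative and do not correspond to the actual code in rllm/experimental/verl/transform.py:

```python
import torch

def masks_from_lengths(prompt_lens, resp_lens, max_prompt_len, max_resp_len):
    """Illustrative sketch: build attention masks from per-sequence lengths.

    Prompts are left-padded, so the last `prompt_len` positions are real;
    responses are right-padded, so the first `resp_len` positions are real.
    """
    pos = torch.arange(max_prompt_len)
    # Left padding: position i is real iff i >= max_prompt_len - prompt_len.
    prompt_mask = pos.unsqueeze(0) >= (max_prompt_len - prompt_lens).unsqueeze(1)

    pos = torch.arange(max_resp_len)
    # Right padding: position i is real iff i < resp_len.
    resp_mask = pos.unsqueeze(0) < resp_lens.unsqueeze(1)

    return torch.cat([prompt_mask, resp_mask], dim=1).long()

# Example: two sequences with prompt lengths 2 and 4, response lengths 3 and 1.
mask = masks_from_lengths(torch.tensor([2, 4]), torch.tensor([3, 1]), 4, 5)
# -> [[0, 0, 1, 1, 1, 1, 1, 0, 0],
#     [1, 1, 1, 1, 1, 0, 0, 0, 0]]
```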

Type of change

  • Feature
  • Fix
  • Docs
  • Refactor
  • Example / Project
  • Infra / CI

What changed

  • Replaced value-based attention-mask construction in rllm/experimental/verl/transform.py with length-based mask construction.
  • Preserved existing prompt left-padding and response right-padding behavior.
  • Added a synthetic regression test showing mask construction follows sequence length rather than token value (see the sketch below).
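
A sketch of the kind of assertion such a test can make (synthetic values; the actual test in tests/unified_trainer/test_verl_transform.py may differ):

```python
import torch

def test_mask_follows_lengths_not_token_values():
    pad_token_id = 0
    max_resp_len = 5
    resp_lens = torch.tensor([3])
    # A response whose real tokens happen to include the pad id in-sequence.
    responses = torch.tensor([[7, pad_token_id, 9, pad_token_id, pad_token_id]])

    pos = torch.arange(max_resp_len)
    length_mask = (pos.unsqueeze(0) < resp_lens.unsqueeze(1)).long()
    value_mask = (responses != pad_token_id).long()

    # Length-based masking keeps the real in-sequence token at position 1 ...
    assert length_mask.tolist() == [[1, 1, 1, 0, 0]]
    # ... whereas value-based masking would drop it.
    assert value_mask.tolist() == [[1, 0, 1, 0, 0]]
```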

Validation

  • pre-commit run --all-files
  • Targeted tests: pytest ...
  • Manual validation performed
  • Not run (reason below)

Validation details:

  • uv run --no-project --with pytest python -m pytest tests/unified_trainer/test_verl_transform.py -q passed: 7 passed, 1 warning.
  • uv run --no-project --with pre-commit pre-commit run ruff-format --files rllm/experimental/verl/transform.py tests/unified_trainer/test_verl_transform.py passed.
  • uv run --no-project --with pre-commit pre-commit run ruff --files rllm/experimental/verl/transform.py tests/unified_trainer/test_verl_transform.py passed.
  • git diff --check passed.

Breaking changes / migration notes

  • None

Docs / examples

  • Not needed
  • Updated docs
  • Updated examples
  • Follow-up docs needed

Related issues / PRs

  • Fixes #
  • Related to #
  • Stacked on / depends on #

Screenshots / logs

@luyuzhe111 (Collaborator) commented:

@JasonWei05 thanks for the PR! Could you share a concrete example for this issue:

"but it is possible, especially when collapsed multi-turn responses include copied prompt/tool-observation tokens. The transform previously built masks with batch != pad_token_id, which can incorrectly mark real in-sequence tokens as padding"

having a hard time grasping exactly what happened.

@JasonWei05 (Collaborator, Author) commented:

Yeah I agree that I shouldn't claim this as the motivating case. The simpler reason for the change is that padding is positional: _pad_sequence_batch decides exactly where padding is inserted, with prompts left-padded and responses right-padded. The attention mask should mirror those lengths directly. Using batch != pad_token_id is an unnecessary tokenizer-specific assumption. I will update the PR description to make this a correctness/robustness cleanup rather than a claim about a common failure mode.
