
fix(verl): build transform attention masks from sequence lengths #517

Open

JasonWei05 wants to merge 1 commit into main from fix/verl-collapse

Conversation

@JasonWei05 (Collaborator) commented Apr 29, 2026

Summary

This PR changes attention-mask construction in the verl transform to derive from sequence lengths rather than token values. _pad_sequence_batch already determines where padding is inserted: prompts are left-padded and responses are right-padded, so the attention mask should mirror that padding layout directly. The previous batch != pad_token_id shortcut is equivalent in the common case, but it relies on the tokenizer-specific invariant that pad_token_id only appears in padded positions; length-based masking removes that unnecessary assumption.
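
To make the layout concrete, here is a minimal sketch of length-based mask construction under the same padding convention (left-padded prompts, right-padded responses). The function and tensor names are illustrative and do not correspond to the actual code in rllm/experimental/verl/transform.py:

```python
import torch

def masks_from_lengths(prompt_lens, resp_lens, max_prompt_len, max_resp_len):
    """Illustrative sketch: build attention masks from per-sequence lengths.

    Prompts are left-padded, so the last `prompt_len` positions are real;
    responses are right-padded, so the first `resp_len` positions are real.
    """
    pos = torch.arange(max_prompt_len)
    # Left padding: position i is real iff i >= max_prompt_len - prompt_len.
    prompt_mask = pos.unsqueeze(0) >= (max_prompt_len - prompt_lens).unsqueeze(1)

    pos = torch.arange(max_resp_len)
    # Right padding: position i is real iff i < resp_len.
    resp_mask = pos.unsqueeze(0) < resp_lens.unsqueeze(1)

    return torch.cat([prompt_mask, resp_mask], dim=1).long()

# Example: two sequences with prompt lengths 2 and 4, response lengths 3 and 1.
mask = masks_from_lengths(torch.tensor([2, 4]), torch.tensor([3, 1]), 4, 5)
# -> [[0, 0, 1, 1, 1, 1, 1, 0, 0],
#     [1, 1, 1, 1, 1, 0, 0, 0, 0]]
```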

Type of change

  • Feature
  • Fix
  • Docs
  • Refactor
  • Example / Project
  • Infra / CI

What changed

  • Replaced value-based attention-mask construction in rllm/experimental/verl/transform.py with length-based mask construction.
  • Preserved existing prompt left-padding and response right-padding behavior.
  • Added a synthetic regression test showing mask construction follows sequence length rather than token value (see the sketch below).
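
A sketch of the kind of assertion such a test can make (synthetic values; the actual test in tests/unified_trainer/test_verl_transform.py may differ):

```python
import torch

def test_mask_follows_lengths_not_token_values():
    pad_token_id = 0
    max_resp_len = 5
    resp_lens = torch.tensor([3])
    # A response whose real tokens happen to include the pad id in-sequence.
    responses = torch.tensor([[7, pad_token_id, 9, pad_token_id, pad_token_id]])

    pos = torch.arange(max_resp_len)
    length_mask = (pos.unsqueeze(0) < resp_lens.unsqueeze(1)).long()
    value_mask = (responses != pad_token_id).long()

    # Length-based masking keeps the real in-sequence token at position 1 ...
    assert length_mask.tolist() == [[1, 1, 1, 0, 0]]
    # ... whereas value-based masking would drop it.
    assert value_mask.tolist() == [[1, 0, 1, 0, 0]]
```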

Validation

  • pre-commit run --all-files
  • Targeted tests: pytest ...
  • Manual validation performed
  • Not run (reason below)

Validation details:

  • uv run --no-project --with pytest python -m pytest tests/unified_trainer/test_verl_transform.py -q passed: 7 passed, 1 warning.
  • uv run --no-project --with pre-commit pre-commit run ruff-format --files rllm/experimental/verl/transform.py tests/unified_trainer/test_verl_transform.py passed.
  • uv run --no-project --with pre-commit pre-commit run ruff --files rllm/experimental/verl/transform.py tests/unified_trainer/test_verl_transform.py passed.
  • git diff --check passed.

Breaking changes / migration notes

  • None

Docs / examples

  • Not needed
  • Updated docs
  • Updated examples
  • Follow-up docs needed

Related issues / PRs

  • Fixes #
  • Related to #
  • Stacked on / depends on #

Screenshots / logs

@luyuzhe111 (Collaborator) commented:

@JasonWei05 thanks for the PR! Could you share a concrete example for this issue:

"but it is possible, especially when collapsed multi-turn responses include copied prompt/tool-observation tokens. The transform previously built masks with batch != pad_token_id, which can incorrectly mark real in-sequence tokens as padding"

having a hard time grasping exactly what happened.

@JasonWei05 (Collaborator, Author) commented:

Yeah I agree that I shouldn't claim this as the motivating case. The simpler reason for the change is that padding is positional: _pad_sequence_batch decides exactly where padding is inserted, with prompts left-padded and responses right-padded. The attention mask should mirror those lengths directly. Using batch != pad_token_id is an unnecessary tokenizer-specific assumption. I will update the PR description to make this a correctness/robustness cleanup rather than a claim about a common failure mode.
