fix(verl): build transform attention masks from sequence lengths #517
JasonWei05 wants to merge 1 commit into
Conversation
@JasonWei05 thanks for the PR! Could you share a concrete example for this issue? You wrote "but it is possible, especially when collapsed multi-turn responses include copied prompt/tool-observation tokens. The transform previously built masks with `batch != pad_token_id`, which can incorrectly mark real in-sequence tokens as padding", and I'm having a hard time grasping exactly what happened.
Yeah, I agree that I shouldn't claim this as the motivating case. The simpler reason for the change is that padding is positional: `_pad_sequence_batch` already decides where padding goes (prompts on the left, responses on the right), so the mask can be derived from the sequence lengths directly instead of from token values.
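A minimal sketch of what I mean by positional (the shapes and variable names here are hypothetical, not the actual `transform.py` code):

```python
import torch

max_prompt_len, max_response_len = 4, 4
prompt_len, response_len = 2, 3   # real (unpadded) lengths for one sample

pos = torch.arange(max_prompt_len)
# prompts are left-padded: real tokens occupy the rightmost positions
prompt_mask = (pos >= max_prompt_len - prompt_len).long()   # tensor([0, 0, 1, 1])

pos = torch.arange(max_response_len)
# responses are right-padded: real tokens occupy the leftmost positions
response_mask = (pos < response_len).long()                 # tensor([1, 1, 1, 0])

# the full attention mask mirrors the padding layout, with no reference
# to token values at all
attention_mask = torch.cat([prompt_mask, response_mask])    # tensor([0, 0, 1, 1, 1, 1, 1, 0])
```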
Summary
This PR changes the verl transform to build attention masks from sequence lengths rather than token values.
`_pad_sequence_batch` already determines where padding is inserted: prompts are left-padded and responses are right-padded. The attention mask should mirror that padding layout directly. The previous `batch != pad_token_id` shortcut is equivalent in the common case, but it relies on the tokenizer-specific invariant that `pad_token_id` only appears in padded positions; length-based masking removes that unnecessary assumption.
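A hedged illustration of that invariant breaking (the token ids and lengths below are invented for demonstration, not taken from a real tokenizer or from `transform.py`):

```python
import torch

pad_token_id = 0
# One right-padded response whose real content happens to contain the pad id
# (e.g., a tokenizer that reuses eos as pad). Real tokens: [5, 9, 0, 7],
# followed by two genuinely padded positions.
batch = torch.tensor([[5, 9, 0, 7, 0, 0]])
lengths = torch.tensor([4])

value_mask = (batch != pad_token_id).long()
# tensor([[1, 1, 0, 1, 0, 0]])  <- the real token at index 2 is masked out

length_mask = (torch.arange(batch.shape[1]) < lengths[:, None]).long()
# tensor([[1, 1, 1, 1, 0, 0]])  <- mirrors the actual padding layout
```

Type of change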
What changed
- Updated `rllm/experimental/verl/transform.py` with length-based mask construction.

Validation
- `pre-commit run --all-files`
- `pytest ...`

Validation details:
- `uv run --no-project --with pytest python -m pytest tests/unified_trainer/test_verl_transform.py -q` passed: `7 passed, 1 warning`.
- `uv run --no-project --with pre-commit pre-commit run ruff-format --files rllm/experimental/verl/transform.py tests/unified_trainer/test_verl_transform.py` passed.
- `uv run --no-project --with pre-commit pre-commit run ruff --files rllm/experimental/verl/transform.py tests/unified_trainer/test_verl_transform.py` passed.
- `git diff --check` passed.

Breaking changes / migration notes
Docs / examples
Related issues / PRs
Screenshots / logs