feat: add hf_template tokenize_and_mask method + verl SFTTrainer compat for RLLMSFTDataset #485
Merged
Conversation
1. `RLLMSFTDataset.__init__` now accepts `processor` and `max_samples` kwargs, matching verl's `create_sft_dataset()` call signature. Without this, using `RLLMSFTDataset` as `custom_cls` with verl's `SFTTrainer(config)` crashes with a `TypeError`.
2. Add an `hf_template` tokenization method that uses `tokenizer.apply_chat_template()` directly instead of rLLM's `ChatTemplateParser`. The existing `cumulative`/`stepwise` methods render tool calls as JSON-in-XML, which is wrong for models with a native XML tool-call format (e.g. Qwen3-Coder); `hf_template` produces the model's native format. Config: `data.rllm.tokenize_and_mask_method: hf_template`
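A minimal sketch of the signature fix, assuming the names described above; the class body and the other constructor arguments are illustrative, not the actual rLLM code:

```python
# Sketch: accept the extra kwargs verl's create_sft_dataset() passes,
# so instantiating the dataset as custom_cls no longer raises TypeError.
# Only the processor/max_samples kwargs are taken from the PR description;
# everything else here is illustrative.
class RLLMSFTDataset:
    def __init__(self, parquet_files, tokenizer, config,
                 processor=None, max_samples=None):
        self.parquet_files = parquet_files
        self.tokenizer = tokenizer
        self.config = config
        self.processor = processor      # unused for text-only SFT; kept for compat
        self.max_samples = max_samples  # lets the caller cap the dataset size

# verl-style instantiation now succeeds instead of raising TypeError:
ds = RLLMSFTDataset(["train.parquet"], tokenizer=None, config={},
                    processor=None, max_samples=128)
```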
Collaborator

Looks good to me!
Summary

Two changes to `RLLMSFTDataset`:

1. **verl SFTTrainer compatibility:** `RLLMSFTDataset.__init__` now accepts `processor` and `max_samples` kwargs, matching the call signature of verl's `create_sft_dataset()`. Without this, using `RLLMSFTDataset` as `custom_cls` in verl's `SFTTrainer(config)` crashes with `TypeError: unexpected keyword argument`.

2. **`hf_template` tokenization method:** the existing `cumulative` and `stepwise` methods use rLLM's `ChatTemplateParser` to render messages, which renders tool calls as JSON embedded in XML tags. But models like Qwen3-Coder expect their native XML tool-call format. The new `hf_template` method uses `tokenizer.apply_chat_template()` directly, producing the model's native format. It uses an incremental prefix-diff approach to isolate each message's tokens for correct loss masking.
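The prefix-diff idea can be sketched as follows. This is not the PR's actual implementation: the function and variable names are hypothetical, and a toy renderer and tokenizer stand in for `tokenizer.apply_chat_template()` and a real tokenizer so the mechanics are visible:

```python
def tokenize_and_mask(messages, render, tokenize):
    """Prefix-diff masking sketch: render the chat over messages[:i+1] and
    attribute the newly appended tokens to message i, so template-inserted
    special tokens are masked together with the message they belong to."""
    input_ids, loss_mask = [], []
    prev_ids = []
    for i, msg in enumerate(messages):
        cur_ids = tokenize(render(messages[: i + 1]))
        # The rendered template must be prefix-stable for the diff to be valid.
        assert cur_ids[: len(prev_ids)] == prev_ids
        new_ids = cur_ids[len(prev_ids):]
        keep = int(msg["role"] == "assistant")  # loss only on assistant tokens
        input_ids += new_ids
        loss_mask += [keep] * len(new_ids)
        prev_ids = cur_ids
    return input_ids, loss_mask

# Toy stand-ins for apply_chat_template(..., tokenize=False) and a tokenizer:
render = lambda msgs: "".join(f"<{m['role']}>{m['content']}</{m['role']}>" for m in msgs)
tokenize = lambda text: [ord(c) for c in text]

msgs = [{"role": "user", "content": "hi"},
        {"role": "assistant", "content": "ok"}]
ids, mask = tokenize_and_mask(msgs, render, tokenize)
```

In the real method, `render` would be the tokenizer's chat template applied with `tokenize=False` and `tokenize` the tokenizer itself; masked positions are then typically set to `-100` in the labels so the loss ignores non-assistant tokens.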
Config: `data.rllm.tokenize_and_mask_method: hf_template`

Files changed

- `rllm/trainer/verl/sft_dataset.py` (~40 lines)

Test plan

- `SFTTrainer(config)` + `custom_cls: RLLMSFTDataset` works
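For reference, the dotted config key `data.rllm.tokenize_and_mask_method` can be written as a YAML fragment; the surrounding nesting below is inferred from the key path, not copied from verl's config files:

```yaml
data:
  rllm:
    # Use the HF chat template directly instead of rLLM's ChatTemplateParser
    tokenize_and_mask_method: hf_template
```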