
feat: add hf_template tokenize_and_mask method + verl SFTTrainer compat for RLLMSFTDataset#485

Merged
kylemontgomery1 merged 1 commit into rllm-org:main from yifannnwu:feat/sft-hf-template on Apr 4, 2026

Conversation

@yifannnwu
Contributor

Summary

Two changes to RLLMSFTDataset:

  1. verl SFTTrainer compatibility: RLLMSFTDataset.__init__ now accepts processor and
    max_samples kwargs, matching the call signature of verl's create_sft_dataset(). Without
    this, using RLLMSFTDataset as custom_cls in verl's SFTTrainer(config) crashes with
    TypeError: unexpected keyword argument.

  2. hf_template tokenization method: The existing cumulative and stepwise methods use
    rLLM's ChatTemplateParser to render messages, which serializes tool calls as JSON wrapped in XML tags:

    <tool_call>
    {"name": "func", "arguments": {"key": "val"}}
    </tool_call>
    

    But models like Qwen3-Coder expect native XML format:

    <tool_call>
    <function=func>
    <parameter=key>val</parameter>
    </function>
    </tool_call>
    

    The new hf_template method uses tokenizer.apply_chat_template() directly, producing
    the model's native format. It uses an incremental prefix-diff approach to isolate each
    message's tokens for correct loss masking.

    Config: data.rllm.tokenize_and_mask_method: hf_template
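The prefix-diff masking mentioned above can be sketched as follows. This is illustrative only: `render` is a toy stand-in for `tokenizer.apply_chat_template(..., tokenize=True)`, and the function names are assumptions, not rLLM's actual implementation.

```python
# Sketch of incremental prefix-diff loss masking: re-render the conversation
# one message at a time, and attribute the new suffix tokens to that message.
IGNORE_INDEX = -100  # standard HF convention for tokens excluded from the loss

def render(messages):
    # Toy stand-in for tokenizer.apply_chat_template(tokenize=True);
    # "tokens" here are just whitespace-split words.
    text = ""
    for m in messages:
        text += f"<|{m['role']}|> {m['content']} <|end|> "
    return text.split()

def tokenize_and_mask(messages):
    input_ids, labels = [], []
    prev = []
    for i, msg in enumerate(messages):
        cur = render(messages[: i + 1])
        # Tokens contributed by this message = suffix beyond the previous prefix.
        new_tokens = cur[len(prev):]
        input_ids.extend(new_tokens)
        if msg["role"] == "assistant":
            labels.extend(new_tokens)  # train on assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(new_tokens))  # mask everything else
        prev = cur
    return input_ids, labels

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello there"},
]
ids, labels = tokenize_and_mask(msgs)
```

Because each message's tokens are isolated as the diff between consecutive prefixes, the mask stays correct even when the chat template inserts role headers or tool-call wrappers around a message.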

Files changed

  • rllm/trainer/verl/sft_dataset.py (~40 lines)

Test plan

  • Validated on 100 samples — tool calls correctly render in native XML format
  • Text-only samples render identically under all methods
  • SFT training with verl's SFTTrainer(config) + custom_cls: RLLMSFTDataset works
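Putting both changes together, a config fragment might look like the following. The `custom_cls` key names follow verl's dataset-override convention as I understand it; treat the exact paths as assumptions, not verified config.

```yaml
# Hypothetical verl SFT config fragment combining both changes in this PR.
data:
  custom_cls:
    path: rllm/trainer/verl/sft_dataset.py  # module containing the dataset class
    name: RLLMSFTDataset                    # used in place of verl's default dataset
  rllm:
    tokenize_and_mask_method: hf_template   # new method added by this PR
```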


@kylemontgomery1
Collaborator

Looks good to me!

@kylemontgomery1 kylemontgomery1 merged commit e5b81c1 into rllm-org:main Apr 4, 2026
0 of 2 checks passed