docs(AGENTS): warn against raw tokenizer.encode on chat-tuned models#706
Merged
dphuang2 merged 1 commit on May 14, 2026
Conversation
Add a pitfall note clarifying that calling `tokenizer.encode` on the prompt directly (instead of `apply_chat_template` or a cookbook renderer) produces OOD prompt tokens for chat-tuned models like gpt-oss-120b, Llama-3-Instruct, Qwen-Instruct, etc. Empirically, the sampler and trainer disagree by 5×+ on KL, with max per-token ratios in the tens on such inputs, which silently breaks PPO/CISPO/GRPO importance ratios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
derek-tml approved these changes on May 14, 2026
Summary
Add a pitfall note to `AGENTS.md` (and `CLAUDE.md` via its symlink) clarifying that calling `tokenizer.encode(prompt)` directly on a chat-tuned model, instead of `apply_chat_template` or a cookbook renderer, produces OOD prompt tokens.

For models like `gpt-oss-120b`, `Llama-3-Instruct`, `Qwen-Instruct`, etc., the sampler and trainer take subtly different code paths on OOD inputs, and per-token sampler/trainer logprob KL can inflate by 5×+ with max ratios in the tens. This silently breaks PPO/CISPO/GRPO importance ratios.
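A minimal sketch of the pitfall and the fix, assuming an HF-style tokenizer (the prompt string is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
prompt = "What is the capital of France?"  # illustrative

# ❌ Pitfall: raw encode bypasses the chat template, so the model sees the
# prompt without the role headers / special tokens it was trained on.
bad_ids = tokenizer.encode(prompt)

# ✅ Fix: wrap the prompt in a messages list and let the chat template add
# the role structure; add_generation_prompt=True appends the assistant
# header so sampling starts in-distribution.
messages = [{"role": "user", "content": prompt}]
good_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)
```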
Empirical evidence

Small repro on `openai/gpt-oss-120b` with `loss_fn=cispo`, sampling-path runs, comparing `tokenizer.encode` against `apply_chat_template`.

(Forced-completion paths require both prompt and completion to be in the chat format; applying the template to only the prompt half makes that path worse, which the note acknowledges by guiding users to renderers / proper `messages` lists rather than ad-hoc string concatenation.)
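What "both halves through the renderer" can look like, as a hedged sketch (hypothetical `prompt`/`completion` strings, `tokenizer` as in the sketch above; assumes the rendered prompt is a strict prefix of the rendered full conversation, which holds for most templates but is worth asserting):

```python
prompt = "What is the capital of France?"
completion = "Paris."
messages = [{"role": "user", "content": prompt}]

# Prompt half: template plus assistant header.
prompt_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)

# Full conversation: same template with the assistant turn included,
# never ad-hoc string concatenation.
full_ids = tokenizer.apply_chat_template(
    messages + [{"role": "assistant", "content": completion}], tokenize=True
)

# Completion tokens are the suffix; assert the prefix assumption.
assert full_ids[: len(prompt_ids)] == prompt_ids
completion_ids = full_ids[len(prompt_ids):]
```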
Does the doc change actually shift agent behavior?
Ran a small A/B with fresh agents (general-purpose subagents, no other context), N=2 per condition. Both conditions embed the full `AGENTS.md` in the prompt; the only difference is whether the new pitfall hunk is present. The task is identical and uses a naive framing: what an author actually thinks when writing a tokenization helper, not "I am writing a logprob-parity script".

| Condition | Trial | Agent's tokenization | Result |
| --- | --- | --- | --- |
| Before | 1 | `return tokenizer.encode(prompt).ids` | ❌ |
| Before | 2 | `return tokenizer.encode(prompt)` | ❌ |
| After | 1 | `apply_chat_template(messages, tokenize=True, add_generation_prompt=True)["input_ids"]` | ✅ |
| After | 2 | `apply_chat_template(messages, tokenize=True, add_generation_prompt=True)["input_ids"]` | ✅ |
Both *Before* trials reproduced the raw-encode bug (via slightly different APIs: HF's `PreTrainedTokenizer.encode` and the tokenizers-library `.encode().ids`, but the same conceptual failure). Both *After* trials reached for the chat template and correctly unwrapped `["input_ids"]` from the `BatchEncoding`, the exact wrinkle the pitfall's parenthetical calls out.

The difference is categorical, not subtle. N=2 isn't statistical proof, but the four outputs are opposite enough to suggest the note actually shifts behavior on the framing that produces real-world bugs.
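For reference, the unwrap the *After* agents landed on looks roughly like this (a sketch; note that recent `transformers` versions return a plain id list from `apply_chat_template(..., tokenize=True)` unless `return_dict=True` is passed):

```python
# Dict-style output requires return_dict=True; enc is then a BatchEncoding.
enc = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_dict=True
)
input_ids = enc["input_ids"]  # the wrinkle: unwrap, don't pass enc itself
```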
Test plan
- Fresh-agent A/B (documented above): with the pitfall hunk present, agent behavior flips from raw `encode` to `apply_chat_template` on the naive-framing task.
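For context on the repro numbers in the Empirical evidence section, the sampler/trainer parity check they refer to has roughly this shape (hypothetical helper; `sampler_lps` and `trainer_lps` stand in for aligned per-token logprobs of the same sampled completion):

```python
import math

def parity_report(sampler_lps: list[float], trainer_lps: list[float]) -> dict:
    """Compare logprobs the sampler recorded at generation time against the
    trainer's recompute of the same tokens. Hypothetical helper."""
    assert len(sampler_lps) == len(trainer_lps)
    # Per-token importance ratio pi_trainer / pi_sampler, as consumed by
    # PPO/CISPO/GRPO objectives.
    ratios = [math.exp(t - s) for s, t in zip(sampler_lps, trainer_lps)]
    # Single-sample per-token estimate of KL(sampler || trainer).
    kl = sum(s - t for s, t in zip(sampler_lps, trainer_lps)) / len(sampler_lps)
    return {"kl": kl, "max_ratio": max(ratios)}
```

On in-distribution prompt tokens this stays near `kl ≈ 0` and `max_ratio ≈ 1`; the raw-encode runs described above inflate both.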