Releases: rllm-org/rllm
rLLM: v0.3.0-pre
What's Changed
- [feature] add support for on policy distillation by @kylemontgomery1 in #356
- [Bugfix] Make sure the tracking logger is now explicitly finished by trainers by @listar2000 in #358
- Add Geo3K Tinker training example with VLM support by @BrianChen26 in #357
- Support custom metric for SDK by @thwu1 in #362
- fix: typo in bibtex by @tongyx361 in #363
- Fireworks API returns raw token ids to avoid re-tokenization by @1stprinciple in #301
- overlong_filter in stepwise RL by @LianShuQuan in #365
- Verifiers and Prime Intellect Environment Integration & Example by @alt-glitch in #367
- Add test_train.sh for docs build by @alt-glitch in #374
- Resolve Tinker error in advantage computation by @listar2000 in #375
- Merge nightly branch into main by @listar2000 in #377
- [Fix] Add a few patches to unified trainer by @listar2000 in #383
- [Doc improvements] Add the missing multi-GPU configuration options to the evaluation scripts in the quick-start documentation. by @SyncLionPaw in #384
- [Fix] Further enhancements to the Unified Trainer by @listar2000 in #385
- [Experimental] Fixing Tinker rollout prompt ids by @listar2000 in #390
- Reset workflow with task and uid before workflow execution by @JasonWei05 in #389
- Fully Async Trainer by @thwu1 in #394
- Add LLM-in-Sandbox to awesome projects by @cdxeve in #395
- [Fix] All content in examples/search was empty by @Fulin-Gao in #396
- [Feat] Important updates to the experimental unified trainer by @listar2000 in #398
- FinQA Release by @mananroongta in #393
- Refactor unified trainer async flow and add precomputed-advantage + Tinker rollout fixes by @listar2000 in #401
- feat(tracking): integrate changes needed for the rllm-ui by @Chanbinski in #402
- Fixed the qwen parser to handle case when model is non-thinking but user put disable_thinking=false as config by @jeewoo-lee in #405
- Enable TrajectoryGroup's role-specific RL advantage estimator via unified trainer hook by @listar2000 in #408
- Move Tinker backend to rllm.trainer.tinker and deprecate legacy trainer APIs under rllm.trainer.deprecated by @listar2000 in #409
- Refactor RL advantage estimators and add REINFORCE++ baseline/RLOO support by @listar2000 in #410
- Extend On-Policy Distillation support with unified trainer example by @BrianChen26 in #406
- [Doc] Unified trainer docs by @listar2000 in #414
- [Fix] Put doc images into assets by @listar2000 in #415
- [Fix] Hot-fix advantage calculation unpacking issue by @listar2000 in #416
- Add deprecation notices in docs by @listar2000 in #417
- feat(tracking): UILogger with non-blocking background worker by @Chanbinski in #419
- feat(rllm-model-gateway): add standalone proxy server for rLLM by @luyuzhe111 in #412
- added modal deploy script and training code by @jeewoo-lee in #421
- Add Claude Code GitHub Workflow by @jeffreysijuntan in #423
- feat(cli): port non-blocking UILogger, simplify --ui flag, and support eval UI logging by @Chanbinski in #424
- Add PR and issue templates by @listar2000 in #427
- Fix ChatTemplateParser import in SFT trainer by @listar2000 in #428
- feat(cli): add `rllm login` command for UI authentication by @Chanbinski in #425
- Add unified trainer OPD docs page and training curve asset by @BrianChen26 in #430
- fix(openai): respect custom ChatTemplateParser from AgentExecutionEngine by @rajatbeladiya in #420
- fix(rewards): add sympy timeout to prevent Ray stalls by @rajatbeladiya in #433
- fix(ray): auto-attach to existing cluster to avoid missing actors by @rajatbeladiya in #436
- feat(model-gateway): separate worker base URL from API path prefix by @luyuzhe111 in #429
- feat(ui): progressive batched uploads, session URL, and registration nudge by @Chanbinski in #440
- [Feature]: rLLM CLI, AgentFlow Framework, Model Gateway & Plugin System by @jeffreysijuntan in #438
- fix(metrics): include dropped workflow episodes in denominators by @rajatbeladiya in #442
- docs: sync rllm-ui.md with latest README content by @Chanbinski in #444
- [feature]: `@rllm.rollout` and `@rllm.evaluator` decorators + cookbook examples by @jeffreysijuntan in #445
- add steps[i].chat_completions; token_warning_threshold from agent_args by @LianShuQuan in #446
- feat(engine): replace TinkerBackendServer with in-process local handler by @luyuzhe111 in #448
- Dev rllm telemetry by @boredbichon67 in #449
- Fix Pydantic error by @avinashreddydev in #450
- feat(tinker): add tool-use support for renderer path and fix checkpoint auto-resume by @luyuzhe111 in #451
- Upgrade MiniMax provider to M2.7 models by @octo-patch in #452
- feat: add cross-episode Store for sharing state across workflow instances by @listar2000 in #453
- fix: replace bare except with except Exception in taco.py by @harshadkhetpal in #455
- Add Vision-DeepResearch to awesome projects by @Osilly in #456
- Verl 0.7.1 upgrade by @listar2000 in #457
- feat(engine): add RemoteAgentFlowEngine for remote agent runtimes by @luyuzhe111 in #441
- Further fixes to ensure compatibility with Verl 0.7.1 by @listar2000 in #462
- Fix multiprocessing.Manager() server process leak in code reward evaluation by @dubin555 in #411
- feat(verl): add megatron-compatible batch padding and agentcore + verl math example by @luyuzhe111 in #463
- fix: verl transform robustness + NCCL dynamic batch sync patch by @listar2000 in #466
- feat(verl): propagate rollout log probs through transform pipeline by @luyuzhe111 in #467
- chore: add megatron dependency install script by @luyuzhe111 in #472
- fix: support verl 0.7.1 EngineWorker in agent_workflow_trainer by @yifannnwu in #474
- Fix: make sure agent import in init is lazy by @listar2000 in #479
- Fix #447: norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_... by @JiwaniZakir in #471
- fix: update verl import paths for verl 0.7.1+ compatibility by @Lidang-Jiang in #480
- Support multiple MCP servers in MCPEnvironment by @taivu1998 in #476
- fix: migrate VerlBackend to new EngineWorker path (verl 0.7.1) by @listar2000 in #483
- fix: handle signal.signal ValueError in non-main threads by @yifannnwu in #484
- fix(trainer): supplement dfed770 by adding missing update_weights in … by @MarkJoson in #469
- feat: add hf_template tokenize_and_mask method + verl SFTTrainer compat for RLLMSFTDataset by @yifannnwu in #485
- fix: resolve CI failures — E501 lint, tinker test deps, disable Claude actions by @listar2000 in #486
- style: auto-format 21 files to fix ruff-format pre-commit failures by @listar2000 in #487
- Integrate fully async training to UnifiedTrainer by @kylemontgomery1 in #481
- fix(verl): disable vllm compile cache to work around corruption bug by @luyuzhe111 in #490
- Fix: Misc fixes for verl 0.7.1 by @kylemontgomery1 in #496
- fix(verl): move legacy worker override to launcher before worker class selection by @luyuzhe111 in #499
- chore(scripts): simplify and pin megatron dependencies b...
rLLM: v0.2.1.post1
What's Changed
- update docs & add curves by @thwu1 in #343
- Fix import for colorful_print in agent_sdk_engine.py and agent_sdk_trainer.py by @wht0703 in #345
- Unblock sdk installation by overriding dependencies by @wht0703 in #348
- [Doc] Update README and fix a few installation related issues by @listar2000 in #347
- fix: keyerror completion_ids by @kxfan2002 in #353
- Fix: Enable GPU acceleration for dense retrieval in search agent by @Gitsamshi in #349
New Contributors
- @wht0703 made their first contribution in #345
- @kxfan2002 made their first contribution in #353
- @Gitsamshi made their first contribution in #349
Full Changelog: v0.2.1...v0.2.1.post1
rLLM: v0.2.1
rLLM v0.2.1: Tinker backend, VLM training, Eval Protocol, and SDK (preview)
We are excited to release rLLM v0.2.1. This new version comes with the following exciting features:
- rLLM SDK (preview): The rLLM SDK enables you to transform agents written in frameworks such as LangGraph, SmolAgent, or Strands into trainable workflows. Check out this LangGraph RAG example, which builds a RAG agent and trains it with the rLLM SDK.
- Tinker training backend: In addition to `verl`, rLLM now supports `Tinker` as a training backend. You can use the same abstractions for building agents and easily switch between different backends for training.
- VLM training: rLLM supports Vision-Language Model training with the `verl` backend. See the Geo3K training example for reference.
- LoRA fine-tuning: rLLM supports LoRA training in both the `verl` and `Tinker` backends. See the GSM8K LoRA example for how to enable LoRA training with a single config change.
- Eval Protocol integration: We integrate with the Eval Protocol from Fireworks AI. Users can now train on any environment supported by the Eval Protocol. See this example that uses Eval Protocol in rLLM to train a FrozenLake agent.
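As a rough illustration of the "switch backends / enable LoRA with a single config change" workflow described above, the sketch below shows the intended shape. The key names (`backend`, `lora`, `rank`) and the model id are illustrative assumptions, not rLLM's actual config schema; see the GSM8K LoRA example in the repo for the real options.

```python
# Hypothetical sketch of a single-config-change backend/LoRA toggle.
# Key names ("backend", "lora", "rank") are illustrative assumptions,
# not rLLM's confirmed schema; they only show the workflow shape.

base_config = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",  # example model id
    "backend": "verl",                      # default training backend
    "lora": {"enabled": False},
}

def with_overrides(config: dict, overrides: dict) -> dict:
    """Shallow-merge overrides into a copy of the base config."""
    merged = dict(config)
    merged.update(overrides)
    return merged

# Switch to the Tinker backend and turn on LoRA without touching agent code.
run_config = with_overrides(base_config, {
    "backend": "tinker",
    "lora": {"enabled": True, "rank": 16},
})
print(run_config["backend"], run_config["lora"]["enabled"])
```

The agent-building abstractions stay the same; only the training config changes between runs.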
A big shoutout to @thwu1 @kylemontgomery1 @listar2000 @xzrderek for their outstanding work on these features.
What's Changed
- make rllm-specific configs applied correctly and robustly by @listar2000 in #256
- Ensure disable_thinking defaults to False when config is None by @Tendo33 in #258
- fix: circular import issues in WORKFLOW_CLASS_MAPPING by @listar2000 in #261
- [nightly] initialize the nightly branch by @listar2000 in #263
- Fix environment variable forwarding to ray runtime env by @listar2000 in #265
- [nightly] update recent changes on workflow engines by @listar2000 in #268
- Fix : Prevent KeyError in _pad_dataproto_to_world_size by @mananroongta in #274
- Fix retokenization by @thwu1 in #272
- fix controlling the n_parallel_agents and the concurrent env operations by @LianShuQuan in #271
- Added is_correct & reward flow through tool env by @mananroongta in #277
- Integrate Eval Protocol as RL environment by @1stprinciple in #276
- SWEEnv.from_dict() by @LianShuQuan in #278
- Fix: Resolve PyArrow nested data conversion error in distributed dataset loading by @erranlli in #281
- Per Episode Logging Feature by @qywu in #282
- [feature] Support Tinker as a backend by @thwu1 in #283
- [feat] Tinker Workflow Trainer by @thwu1 in #288
- Fix fireworks dependency by @listar2000 in #296
- Examples: fix utils import by @Flecart in #295
- [Refactor] Update Tinker Backend Example by @thwu1 in #300
- Revert "fix: Gracefully skip overlong prompts during training to prev… by @1stprinciple in #302
- Fixes #303 Optimize old_log_prob computation in PPO trainer by @BabelTower in #304
- Bug/n parallel agents by @kylemontgomery1 in #307
- [nightly] merge recent updates in main back to nightly by @listar2000 in #308
- Adding generic Eval Protocol environments to rLLM by @xzrderek in #306
- [feat] sdk by @thwu1 in #310
- Multimodal by @kylemontgomery1 in #315
- add rllm docs by @xzrderek in #312
- Fix import problem of megatron ray worker group by @listar2000 in #319
- Fix color print display issue by @listar2000 in #317
- [feat] Integrate OpenTelemetry by @thwu1 in #320
- Remove unnecessary free_cache_engine checks. by @listar2000 in #324
- add vlm docs by @kylemontgomery1 in #326
- [feat] Importance Sampling by @thwu1 in #332
- Fix repetitive application id causing vLLM issue by @listar2000 in #334
- [feat] Add Langgraph Training Example, Fix bugs, Refactor Sdk by @thwu1 in #335
- Add Sdk Doc by @thwu1 in #339
- [feature] simplified deps by @kylemontgomery1 in #327
- Add gsm8k-lora script by @listar2000 in #342
- [v0.2.1] Merge nightly into main for rLLM v0.2.1 by @jeffreysijuntan in #341
New Contributors
- @Tendo33 made their first contribution in #258
- @thwu1 made their first contribution in #272
- @LianShuQuan made their first contribution in #271
- @qywu made their first contribution in #282
- @Flecart made their first contribution in #295
- @BabelTower made their first contribution in #304
- @xzrderek made their first contribution in #306
Full Changelog: v0.2.0...v0.2.1
rLLM: v0.2.0
rLLM v0.2: RL Training over General Agentic Programs (Blog Post)
We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that enable arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.
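To make "arbitrary agentic programs with custom reward functions" concrete, here is a purely illustrative solver-judge sketch. Real training goes through rLLM's AgentWorkflowEngine / AgentWorkflowTrainer; the model call below is a stub so the sketch runs standalone, and none of the function names are rLLM APIs.

```python
# Illustrative shape of an agentic program that v0.2-style training can
# optimize: the program is an ordinary Python function that returns a
# scalar reward. The model call is stubbed; in real training it would be
# served by a rollout engine such as vLLM or SGLang.

def call_model(prompt: str) -> str:
    # Stand-in for an LLM completion; not an rLLM API.
    if prompt.startswith("Solve:"):
        return "4" if "2 + 2" in prompt else "?"
    return "PASS"  # judge stub always approves

def solver_judge_workflow(task: str, expected: str) -> float:
    """Solver proposes an answer, a judge reviews it, and the program
    returns the scalar reward that RL training would maximize."""
    answer = call_model(f"Solve: {task}")
    verdict = call_model(f"Review the answer '{answer}'. Reply PASS/FAIL.")
    # Custom reward: combine the judge's verdict with a ground-truth check.
    return 1.0 if verdict == "PASS" and answer == expected else 0.0

print(solver_judge_workflow("2 + 2", "4"))  # → 1.0
```

The point of the v0.2 abstractions is that a program of this shape, including multi-agent or tree-search variants, can be handed to the trainer as-is rather than being rewritten as a step-by-step Gym loop.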
Key Features in v0.2
- Support the official `verl==0.5.0` as the training backend; no custom verl fork anymore! `verl==0.5.0` comes with the following features, which are now supported in rLLM (@kylemontgomery1):
  - Megatron training support (@jeewoo-lee)
  - SGLang as the rollout engine, in addition to vLLM.
- Introduce `AgentWorkflowEngine`, which enables passing in arbitrary agentic programs for training. (@kylemontgomery1)
- Support for more agents and environments:
  - Terminus and TerminalBench (@JasonWei05)
  - Tongyi DeepResearch agent (@yayashuxue)
  - AppWorld and AppWorldReactAgent (@sunan135)
- Integration with other agentic frameworks/SDKs:
  - Strands SDK from AWS
  - SmolAgents
What's Changed
- fix <tool_calls_begin> variable by @wj-Mcat in #142
- Fix not registered license from code by @annyan09023 in #144
- fix r2egym import error; update installation README by @jeffreysijuntan in #146
- update deepscaler max_prompt_length to avoid exception during training by @jeffreysijuntan in #148
- fix(syntax): Resolve invalid escape sequence warnings by @tonyz0x0 in #154
- added Tools for SFT by @mananroongta in #160
- update docs by @jeffreysijuntan in #167
- Add dark mode to docs by @philippnormann in #168
- [FIX] Fix tool calling result parsing problem in trajectory visualizer & MCP tool name fixing by @VincentXWD in #174
- [hotfix][miniwob] Fix gymnasium.error.NameNotFound by @abrohamLee in #172
- Load full DeepCoder dataset, instead of LCB subset by @mananroongta in #178
- [feat][docker] Installation with Docker by @abrohamLee in #177
- Add macOS compatibility: exclude GPU dependencies on darwin by @yayashuxue in #180
- Torch 2.7.0 only compatible with MacOS python=3.11 by @yayashuxue in #184
- Migrate to verl v0.5.0 by @kylemontgomery1 in #193
- Terminal Bench Integration into rLLM (Simplified) by @JasonWei05 in #205
- feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by @yayashuxue in #206
- Add VimGolf agent training example by @James4Ever0 in #209
- fix: update search engine source data path by @noiji in #216
- [feature] Adding Megatron support for v0.2 by @jeewoo-lee in #221
- Use RolloutEngine for single_turn_workflow.py by @1stprinciple in #223
- Standalone inference: remove hard verl dependency by @JasonWei05 in #228
- Update pyproject.toml to v0.2.0 by @NIL-zhuang in #229
- proper handling the case that next_observation is empty dict by @erranlli in #233
- [v0.2] Add lazy import to fix circular import and ray init config support by @listar2000 in #236
- v0.2 verl patch by @kylemontgomery1 in #237
- v0.2 masking/parsing fix by @kylemontgomery1 in #238
- v0.2 rollout upgrade by @kylemontgomery1 in #241
- Feat: deepresearch integration by @yayashuxue in #215
- workflow updates by @kylemontgomery1 in #244
- added colab example of solver judge by @jeewoo-lee in #246
- v0.2 misc changes by @kylemontgomery1 in #245
- Add FireworksEngine for disaggregated rollout by @1stprinciple in #243
- AppWorld Integration for rLLM by @sunan135 in #235
- V0.2 by @jeffreysijuntan in #247
- update solver judge workflow by @kylemontgomery1 in #248
- update install instructions, update solver judge notebook by @kylemontgomery1 in #249
New Contributors
- @wj-Mcat made their first contribution in #142
- @annyan09023 made their first contribution in #144
- @tonyz0x0 made their first contribution in #154
- @mananroongta made their first contribution in #160
- @philippnormann made their first contribution in #168
- @VincentXWD made their first contribution in #174
- @abrohamLee made their first contribution in #172
- @yayashuxue made their first contribution in #180
- @kylemontgomery1 made their first contribution in #193
- @JasonWei05 made their first contribution in #205
- @James4Ever0 made their first contribution in #209
- @noiji made their first contribution in #216
- @jeewoo-lee made their first contribution in #221
- @1stprinciple made their first contribution in #223
- @NIL-zhuang made their first contribution in #229
- @erranlli made their first contribution in #233
- @listar2000 made their first contribution in #236
- @sunan135 made their first contribution in #235
Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0