Releases: rllm-org/rllm
rLLM: v0.3.0-pre
What's Changed
- [feature] add support for on policy distillation by @kylemontgomery1 in #356
- [Bugfix] Make sure the tracking logger is now explicitly finished by trainers by @listar2000 in #358
- Add Geo3K Tinker training example with VLM support by @BrianChen26 in #357
- Support custom metric for SDK by @thwu1 in #362
- fix: typo in bibtex by @tongyx361 in #363
- Fireworks API returns raw token ids to avoid re-tokenization by @1stprinciple in #301
- overlong_filter in stepwise RL by @LianShuQuan in #365
- Verifiers and Prime Intellect Environment Integration & Example by @alt-glitch in #367
- Add test_train.sh for docs build by @alt-glitch in #374
- Resolve Tinker error in advantage computation by @listar2000 in #375
- Merge nightly branch into main by @listar2000 in #377
- [Fix] Add a few patches to unified trainer by @listar2000 in #383
- [Doc improvements] Add the missing multi-GPU configuration options to the evaluation scripts in the quick-start documentation. by @SyncLionPaw in #384
- [Fix] Further enhancements to the Unified Trainer by @listar2000 in #385
- [Experimental] Fixing Tinker rollout prompt ids by @listar2000 in #390
- Reset workflow with task and uid before workflow execution by @JasonWei05 in #389
- Fully Async Trainer by @thwu1 in #394
- Add LLM-in-Sandbox to awesome projects by @cdxeve in #395
- [Fix] All content in examples/search was empty by @Fulin-Gao in #396
- [Feat] Important updates to the experimental unified trainer by @listar2000 in #398
- FinQA Release by @mananroongta in #393
- Refactor unified trainer async flow and add precomputed-advantage + Tinker rollout fixes by @listar2000 in #401
- feat(tracking): integrate changes needed for the rllm-ui by @Chanbinski in #402
- Fixed the qwen parser to handle case when model is non-thinking but user put disable_thinking=false as config by @jeewoo-lee in #405
- Enable TrajectoryGroup's role-specific RL advantage estimator via unified trainer hook by @listar2000 in #408
- Move Tinker backend to rllm.trainer.tinker and deprecate legacy trainer APIs under rllm.trainer.deprecated by @listar2000 in #409
- Refactor RL advantage estimators and add REINFORCE++ baseline/RLOO support by @listar2000 in #410
- Extend On-Policy Distillation support with unified trainer example by @BrianChen26 in #406
- [Doc] Unified trainer docs by @listar2000 in #414
- [Fix] Put doc images into assets by @listar2000 in #415
- [Fix] Hot-fix advantage calculation unpacking issue by @listar2000 in #416
- Add deprecation notices in docs by @listar2000 in #417
- feat(tracking): UILogger with non-blocking background worker by @Chanbinski in #419
- feat(rllm-model-gateway): add standalone proxy server for rLLM by @luyuzhe111 in #412
- added modal deploy script and training code by @jeewoo-lee in #421
- Add Claude Code GitHub Workflow by @jeffreysijuntan in #423
- feat(cli): port non-blocking UILogger, simplify --ui flag, and support eval UI logging by @Chanbinski in #424
- Add PR and issue templates by @listar2000 in #427
- Fix ChatTemplateParser import in SFT trainer by @listar2000 in #428
- feat(cli): add `rllm login` command for UI authentication by @Chanbinski in #425
- Add unified trainer OPD docs page and training curve asset by @BrianChen26 in #430
- fix(openai): respect custom ChatTemplateParser from AgentExecutionEngine by @rajatbeladiya in #420
- fix(rewards): add sympy timeout to prevent Ray stalls by @rajatbeladiya in #433
- fix(ray): auto-attach to existing cluster to avoid missing actors by @rajatbeladiya in #436
- feat(model-gateway): separate worker base URL from API path prefix by @luyuzhe111 in #429
- feat(ui): progressive batched uploads, session URL, and registration nudge by @Chanbinski in #440
- [Feature]: rLLM CLI, AgentFlow Framework, Model Gateway & Plugin System by @jeffreysijuntan in #438
- fix(metrics): include dropped workflow episodes in denominators by @rajatbeladiya in #442
- docs: sync rllm-ui.md with latest README content by @Chanbinski in #444
- [feature]: `@rllm.rollout` and `@rllm.evaluator` decorators + cookbook examples by @jeffreysijuntan in #445
- add steps[i].chat_completions; token_warning_threshold from agent_args by @LianShuQuan in #446
- feat(engine): replace TinkerBackendServer with in-process local handler by @luyuzhe111 in #448
- Dev rllm telemetry by @boredbichon67 in #449
- Fix Pydantic error by @avinashreddydev in #450
- feat(tinker): add tool-use support for renderer path and fix checkpoint auto-resume by @luyuzhe111 in #451
- Upgrade MiniMax provider to M2.7 models by @octo-patch in #452
- feat: add cross-episode Store for sharing state across workflow instances by @listar2000 in #453
- fix: replace bare except with except Exception in taco.py by @harshadkhetpal in #455
- Add Vision-DeepResearch to awesome projects by @Osilly in #456
- Verl 0.7.1 upgrade by @listar2000 in #457
- feat(engine): add RemoteAgentFlowEngine for remote agent runtimes by @luyuzhe111 in #441
- Further fixes to ensure compatibility with Verl 0.7.1 by @listar2000 in #462
- Fix multiprocessing.Manager() server process leak in code reward evaluation by @dubin555 in #411
- feat(verl): add megatron-compatible batch padding and agentcore + verl math example by @luyuzhe111 in #463
- fix: verl transform robustness + NCCL dynamic batch sync patch by @listar2000 in #466
- feat(verl): propagate rollout log probs through transform pipeline by @luyuzhe111 in #467
- chore: add megatron dependency install script by @luyuzhe111 in #472
- fix: support verl 0.7.1 EngineWorker in agent_workflow_trainer by @yifannnwu in #474
- Fix: make sure agent import in init is lazy by @listar2000 in #479
- Fix #447: norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_... by @JiwaniZakir in #471
- fix: update verl import paths for verl 0.7.1+ compatibility by @Lidang-Jiang in #480
- Support multiple MCP servers in MCPEnvironment by @taivu1998 in #476
- fix: migrate VerlBackend to new EngineWorker path (verl 0.7.1) by @listar2000 in #483
- fix: handle signal.signal ValueError in non-main threads by @yifannnwu in #484
- fix(trainer): supplement dfed770 by adding missing update_weights in … by @MarkJoson in #469
- feat: add hf_template tokenize_and_mask method + verl SFTTrainer compat for RLLMSFTDataset by @yifannnwu in #485
- fix: resolve CI failures — E501 lint, tinker test deps, disable Claude actions by @listar2000 in #486
- style: auto-format 21 files to fix ruff-format pre-commit failures by @listar2000 in #487
- Integrate fully async training to UnifiedTrainer by @kylemontgomery1 in #481
- fix(verl): disable vllm compile cache to work around corruption bug by @luyuzhe111 in #490
- Fix: Misc fixes for verl 0.7.1 by @kylemontgomery1 in #496
- fix(verl): move legacy worker override to launcher before worker class selection by @luyuzhe111 in #499
- chore(scripts): simplify and pin megatron dependencies b...
rLLM: v0.2.1.post1
What's Changed
- update docs & add curves by @thwu1 in #343
- Fix import for colorful_print in agent_sdk_engine.py and agent_sdk_trainer.py by @wht0703 in #345
- Unblock sdk installation by overriding dependencies by @wht0703 in #348
- [Doc] Update README and fix a few installation related issues by @listar2000 in #347
- fix: keyerror completion_ids by @kxfan2002 in #353
- Fix: Enable GPU acceleration for dense retrieval in search agent by @Gitsamshi in #349
New Contributors
- @wht0703 made their first contribution in #345
- @kxfan2002 made their first contribution in #353
- @Gitsamshi made their first contribution in #349
Full Changelog: v0.2.1...v0.2.1.post1
rLLM: v0.2.1
rLLM v0.2.1: Tinker backend, VLM training, Eval Protocol, and SDK (preview)
We are excited to release rLLM v0.2.1. This new version comes with the following exciting features:
- rLLM SDK (preview): The rLLM SDK enables you to transform agents written in frameworks such as LangGraph, SmolAgent, or Strands into trainable workflows. Check out this LangGraph RAG example, which builds a RAG agent and trains it with the rLLM SDK.
- Tinker training backend: In addition to `verl`, rLLM now supports `Tinker` as a training backend. You can use the same abstractions for building agents and easily switch between different backends for training.
- VLM training: rLLM supports Vision-Language Model training with the `verl` backend. See the Geo3K training example for reference.
- LoRA fine-tuning: rLLM supports LoRA training in both the `verl` and `Tinker` backends. See the GSM8K LoRA example for how to enable LoRA training with a single config change.
- Eval Protocol integration: We integrate with the Eval Protocol from Fireworks AI. Users can now train on any environment supported by the Eval Protocol. See this example that uses Eval Protocol in rLLM to train a FrozenLake agent.
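As a rough illustration of the "switch backends / enable LoRA with a single config change" workflow described above, the sketch below shows the intended shape. The key names (`backend`, `lora`, `rank`) and the model id are illustrative assumptions, not rLLM's actual config schema; see the GSM8K LoRA example in the repo for the real options.

```python
# Hypothetical sketch of a single-config-change backend/LoRA toggle.
# Key names ("backend", "lora", "rank") are illustrative assumptions,
# not rLLM's confirmed schema; they only show the workflow shape.

base_config = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",  # example model id
    "backend": "verl",                      # default training backend
    "lora": {"enabled": False},
}

def with_overrides(config: dict, overrides: dict) -> dict:
    """Shallow-merge overrides into a copy of the base config."""
    merged = dict(config)
    merged.update(overrides)
    return merged

# Switch to the Tinker backend and turn on LoRA without touching agent code.
run_config = with_overrides(base_config, {
    "backend": "tinker",
    "lora": {"enabled": True, "rank": 16},
})
print(run_config["backend"], run_config["lora"]["enabled"])
```

The agent-building abstractions stay the same; only the training config changes between runs.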
A big shoutout to @thwu1 @kylemontgomery1 @listar2000 @xzrderek for their outstanding work on these features.
What's Changed
- make rllm-specific configs applied correctly and robustly by @listar2000 in #256
- Ensure disable_thinking defaults to False when config is None by @Tendo33 in #258
- fix: circular import issues in WORKFLOW_CLASS_MAPPING by @listar2000 in #261
- [nightly] initialize the nightly branch by @listar2000 in #263
- Fix environment variable forwarding to ray runtime env by @listar2000 in #265
- [nightly] update recent changes on workflow engines by @listar2000 in #268
- Fix : Prevent KeyError in _pad_dataproto_to_world_size by @mananroongta in #274
- Fix retokenization by @thwu1 in #272
- fix controlling the n_parallel_agents and the concurrent env operations by @LianShuQuan in #271
- Added is_correct & reward flow through tool env by @mananroongta in #277
- Integrate Eval Protocol as RL environment by @1stprinciple in #276
- SWEEnv.from_dict() by @LianShuQuan in #278
- Fix: Resolve PyArrow nested data conversion error in distributed dataset loading by @erranlli in #281
- Per Episode Logging Feature by @qywu in #282
- [feature] Support Tinker as a backend by @thwu1 in #283
- [feat] Tinker Workflow Trainer by @thwu1 in #288
- Fix fireworks dependency by @listar2000 in #296
- Examples: fix utils import by @Flecart in #295
- [Refactor] Update Tinker Backend Example by @thwu1 in #300
- Revert "fix: Gracefully skip overlong prompts during training to prev… by @1stprinciple in #302
- Fixes #303 Optimize old_log_prob computation in PPO trainer by @BabelTower in #304
- Bug/n parallel agents by @kylemontgomery1 in #307
- [nightly] merge recent updates in main back to nightly by @listar2000 in #308
- Adding generic Eval Protocol environments to rLLM by @xzrderek in #306
- [feat] sdk by @thwu1 in #310
- Multimodal by @kylemontgomery1 in #315
- add rllm docs by @xzrderek in #312
- Fix import problem of megatron ray worker group by @listar2000 in #319
- Fix color print display issue by @listar2000 in #317
- [feat] Integrate OpenTelemetry by @thwu1 in #320
- Remove unnecessary free_cache_engine checks. by @listar2000 in #324
- add vlm docs by @kylemontgomery1 in #326
- [feat] Importance Sampling by @thwu1 in #332
- Fix repetitive application id causing vLLM issue by @listar2000 in #334
- [feat] Add Langgraph Training Example, Fix bugs, Refactor Sdk by @thwu1 in #335
- Add Sdk Doc by @thwu1 in #339
- [feature] simplified deps by @kylemontgomery1 in #327
- Add gsm8k-lora script by @listar2000 in #342
- [v0.2.1] Merge nightly into main for rLLM v0.2.1 by @jeffreysijuntan in #341
New Contributors
- @Tendo33 made their first contribution in #258
- @thwu1 made their first contribution in #272
- @LianShuQuan made their first contribution in #271
- @qywu made their first contribution in #282
- @Flecart made their first contribution in #295
- @BabelTower made their first contribution in #304
- @xzrderek made their first contribution in #306
Full Changelog: v0.2.0...v0.2.1
rLLM: v0.2.0
rLLM v0.2: RL Training over General Agentic Programs (Blog Post)
We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that enable arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.
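To make "arbitrary agentic programs with custom reward functions" concrete, here is a purely illustrative solver-judge sketch. Real training goes through rLLM's AgentWorkflowEngine / AgentWorkflowTrainer; the model call below is a stub so the sketch runs standalone, and none of the function names are rLLM APIs.

```python
# Illustrative shape of an agentic program that v0.2-style training can
# optimize: the program is an ordinary Python function that returns a
# scalar reward. The model call is stubbed; in real training it would be
# served by a rollout engine such as vLLM or SGLang.

def call_model(prompt: str) -> str:
    # Stand-in for an LLM completion; not an rLLM API.
    if prompt.startswith("Solve:"):
        return "4" if "2 + 2" in prompt else "?"
    return "PASS"  # judge stub always approves

def solver_judge_workflow(task: str, expected: str) -> float:
    """Solver proposes an answer, a judge reviews it, and the program
    returns the scalar reward that RL training would maximize."""
    answer = call_model(f"Solve: {task}")
    verdict = call_model(f"Review the answer '{answer}'. Reply PASS/FAIL.")
    # Custom reward: combine the judge's verdict with a ground-truth check.
    return 1.0 if verdict == "PASS" and answer == expected else 0.0

print(solver_judge_workflow("2 + 2", "4"))  # → 1.0
```

The point of the v0.2 abstractions is that a program of this shape, including multi-agent or tree-search variants, can be handed to the trainer as-is rather than being rewritten as a step-by-step Gym loop.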
Key Features in v0.2
- Support the official `verl==0.5.0` as the training backend; no custom verl fork anymore! `verl==0.5.0` comes with the following features, which are now supported in rLLM (@kylemontgomery1):
  - Megatron training support (@jeewoo-lee)
  - SGLang as the rollout engine, in addition to vLLM.
- Introduce `AgentWorkflowEngine`, which enables passing in arbitrary agentic programs for training. (@kylemontgomery1)
- Support for more agents and environments:
  - Terminus and TerminalBench (@JasonWei05)
  - Tongyi DeepResearch agent (@yayashuxue)
  - AppWorld and AppWorldReactAgent (@sunan135)
- Integration with other agentic frameworks/SDKs:
  - Strands SDK from AWS
  - SmolAgents
What's Changed
- fix <tool_calls_begin> variable by @wj-Mcat in #142
- Fix not registered license from code by @annyan09023 in #144
- fix r2egym import error; update installation README by @jeffreysijuntan in #146
- update deepscaler max_prompt_length to avoid exception during training by @jeffreysijuntan in #148
- fix(syntax): Resolve invalid escape sequence warnings by @tonyz0x0 in #154
- added Tools for SFT by @mananroongta in #160
- update docs by @jeffreysijuntan in #167
- Add dark mode to docs by @philippnormann in #168
- [FIX] Fix tool calling result parsing problem in trajectory visualizer & MCP tool name fixing by @VincentXWD in #174
- [hotfix][miniwob] Fix gymnasium.error.NameNotFound by @abrohamLee in #172
- Load full DeepCoder dataset, instead of LCB subset by @mananroongta in #178
- [feat][docker] Installation with Docker by @abrohamLee in #177
- Add macOS compatibility: exclude GPU dependencies on darwin by @yayashuxue in #180
- Torch 2.7.0 only compatible with MacOS python=3.11 by @yayashuxue in #184
- Migrate to verl v0.5.0 by @kylemontgomery1 in #193
- Terminal Bench Integration into rLLM (Simplified) by @JasonWei05 in #205
- feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by @yayashuxue in #206
- Add VimGolf agent training example by @James4Ever0 in #209
- fix: update search engine source data path by @noiji in #216
- [feature] Adding Megatron support for v0.2 by @jeewoo-lee in #221
- Use RolloutEngine for single_turn_workflow.py by @1stprinciple in #223
- Standalone inference: remove hard verl dependency by @JasonWei05 in #228
- Update pyproject.toml to v0.2.0 by @NIL-zhuang in #229
- proper handling the case that next_observation is empty dict by @erranlli in #233
- [v0.2] Add lazy import to fix circular import and ray init config support by @listar2000 in #236
- v0.2 verl patch by @kylemontgomery1 in #237
- v0.2 masking/parsing fix by @kylemontgomery1 in #238
- v0.2 rollout upgrade by @kylemontgomery1 in #241
- Feat: deepresearch integration by @yayashuxue in #215
- workflow updates by @kylemontgomery1 in #244
- added colab example of solver judge by @jeewoo-lee in #246
- v0.2 misc changes by @kylemontgomery1 in #245
- Add FireworksEngine for disaggregated rollout by @1stprinciple in #243
- AppWorld Integration for rLLM by @sunan135 in #235
- V0.2 by @jeffreysijuntan in #247
- update solver judge workflow by @kylemontgomery1 in #248
- update install instructions, update solver judge notebook by @kylemontgomery1 in #249
New Contributors
- @wj-Mcat made their first contribution in #142
- @annyan09023 made their first contribution in #144
- @tonyz0x0 made their first contribution in #154
- @mananroongta made their first contribution in #160
- @philippnormann made their first contribution in #168
- @VincentXWD made their first contribution in #174
- @abrohamLee made their first contribution in #172
- @yayashuxue made their first contribution in #180
- @kylemontgomery1 made their first contribution in #193
- @JasonWei05 made their first contribution in #205
- @James4Ever0 made their first contribution in #209
- @noiji made their first contribution in #216
- @jeewoo-lee made their first contribution in #221
- @1stprinciple made their first contribution in #223
- @NIL-zhuang made their first contribution in #229
- @erranlli made their first contribution in #233
- @listar2000 made their first contribution in #236
- @sunan135 made their first contribution in #235
Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0