Releases: rllm-org/rllm

rLLM: v0.3.0-pre

30 Apr 07:22
0956764

What's Changed

  • [feature] add support for on-policy distillation by @kylemontgomery1 in #356
  • [Bugfix] Make sure the tracking logger is now explicitly finished by trainers by @listar2000 in #358
  • Add Geo3K Tinker training example with VLM support by @BrianChen26 in #357
  • Support custom metric for SDK by @thwu1 in #362
  • fix: typo in bibtex by @tongyx361 in #363
  • Fireworks API returns raw token ids to avoid re-tokenization by @1stprinciple in #301
  • overlong_filter in stepwise RL by @LianShuQuan in #365
  • Verifiers and Prime Intellect Environment Integration & Example by @alt-glitch in #367
  • Add test_train.sh for docs build by @alt-glitch in #374
  • Resolve Tinker error in advantage computation by @listar2000 in #375
  • Merge nightly branch into main by @listar2000 in #377
  • [Fix] Add a few patches to unified trainer by @listar2000 in #383
  • [Doc improvements] Add the missing multi-GPU configuration options to the evaluation scripts in the quick-start documentation. by @SyncLionPaw in #384
  • [Fix] Further enhancements to the Unified Trainer by @listar2000 in #385
  • [Experimental] Fixing Tinker rollout prompt ids by @listar2000 in #390
  • Reset workflow with task and uid before workflow execution by @JasonWei05 in #389
  • Fully Async Trainer by @thwu1 in #394
  • Add LLM-in-Sandbox to awesome projects by @cdxeve in #395
  • [Fix] All content in examples/search was empty by @Fulin-Gao in #396
  • [Feat] Important updates to the experimental unified trainer by @listar2000 in #398
  • FinQA Release by @mananroongta in #393
  • Refactor unified trainer async flow and add precomputed-advantage + Tinker rollout fixes by @listar2000 in #401
  • feat(tracking): integrate changes needed for the rllm-ui by @Chanbinski in #402
  • Fixed the Qwen parser to handle the case where the model is non-thinking but the user sets disable_thinking=false in the config by @jeewoo-lee in #405
  • Enable TrajectoryGroup's role-specific RL advantage estimator via unified trainer hook by @listar2000 in #408
  • Move Tinker backend to rllm.trainer.tinker and deprecate legacy trainer APIs under rllm.trainer.deprecated by @listar2000 in #409
  • Refactor RL advantage estimators and add REINFORCE++ baseline/RLOO support by @listar2000 in #410
  • Extend On-Policy Distillation support with unified trainer example by @BrianChen26 in #406
  • [Doc] Unified trainer docs by @listar2000 in #414
  • [Fix] Put doc images into assets by @listar2000 in #415
  • [Fix] Hot-fix advantage calculation unpacking issue by @listar2000 in #416
  • Add deprecation notices in docs by @listar2000 in #417
  • feat(tracking): UILogger with non-blocking background worker by @Chanbinski in #419
  • feat(rllm-model-gateway): add standalone proxy server for rLLM by @luyuzhe111 in #412
  • added modal deploy script and training code by @jeewoo-lee in #421
  • Add Claude Code GitHub Workflow by @jeffreysijuntan in #423
  • feat(cli): port non-blocking UILogger, simplify --ui flag, and support eval UI logging by @Chanbinski in #424
  • Add PR and issue templates by @listar2000 in #427
  • Fix ChatTemplateParser import in SFT trainer by @listar2000 in #428
  • feat(cli): add rllm login command for UI authentication by @Chanbinski in #425
  • Add unified trainer OPD docs page and training curve asset by @BrianChen26 in #430
  • fix(openai): respect custom ChatTemplateParser from AgentExecutionEngine by @rajatbeladiya in #420
  • fix(rewards): add sympy timeout to prevent Ray stalls by @rajatbeladiya in #433
  • fix(ray): auto-attach to existing cluster to avoid missing actors by @rajatbeladiya in #436
  • feat(model-gateway): separate worker base URL from API path prefix by @luyuzhe111 in #429
  • feat(ui): progressive batched uploads, session URL, and registration nudge by @Chanbinski in #440
  • [Feature]: rLLM CLI, AgentFlow Framework, Model Gateway & Plugin System by @jeffreysijuntan in #438
  • fix(metrics): include dropped workflow episodes in denominators by @rajatbeladiya in #442
  • docs: sync rllm-ui.md with latest README content by @Chanbinski in #444
  • [feature]: @rllm.rollout and @rllm.evaluator decorators + cookbook examples by @jeffreysijuntan in #445
  • add steps[i].chat_completions; token_warning_threshold from agent_args by @LianShuQuan in #446
  • feat(engine): replace TinkerBackendServer with in-process local handler by @luyuzhe111 in #448
  • Dev rllm telemetry by @boredbichon67 in #449
  • Pydantic Error by @avinashreddydev in #450
  • feat(tinker): add tool-use support for renderer path and fix checkpoint auto-resume by @luyuzhe111 in #451
  • Upgrade MiniMax provider to M2.7 models by @octo-patch in #452
  • feat: add cross-episode Store for sharing state across workflow instances by @listar2000 in #453
  • fix: replace bare except with except Exception in taco.py by @harshadkhetpal in #455
  • Add Vision-DeepResearch to awesome projects by @Osilly in #456
  • Verl 0.7.1 upgrade by @listar2000 in #457
  • feat(engine): add RemoteAgentFlowEngine for remote agent runtimes by @luyuzhe111 in #441
  • Further fixes to ensure compatibility with Verl 0.7.1 by @listar2000 in #462
  • Fix multiprocessing.Manager() server process leak in code reward evaluation by @dubin555 in #411
  • feat(verl): add megatron-compatible batch padding and agentcore + verl math example by @luyuzhe111 in #463
  • fix: verl transform robustness + NCCL dynamic batch sync patch by @listar2000 in #466
  • feat(verl): propagate rollout log probs through transform pipeline by @luyuzhe111 in #467
  • chore: add megatron dependency install script by @luyuzhe111 in #472
  • fix: support verl 0.7.1 EngineWorker in agent_workflow_trainer by @yifannnwu in #474
  • Fix: make sure agent import in init is lazy by @listar2000 in #479
  • Fix #447: norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_... by @JiwaniZakir in #471
  • fix: update verl import paths for verl 0.7.1+ compatibility by @Lidang-Jiang in #480
  • Support multiple MCP servers in MCPEnvironment by @taivu1998 in #476
  • fix: migrate VerlBackend to new EngineWorker path (verl 0.7.1) by @listar2000 in #483
  • fix: handle signal.signal ValueError in non-main threads by @yifannnwu in #484
  • fix(trainer): supplement dfed770 by adding missing update_weights in … by @MarkJoson in #469
  • feat: add hf_template tokenize_and_mask method + verl SFTTrainer compat for RLLMSFTDataset by @yifannnwu in #485
  • fix: resolve CI failures — E501 lint, tinker test deps, disable Claude actions by @listar2000 in #486
  • style: auto-format 21 files to fix ruff-format pre-commit failures by @listar2000 in #487
  • Integrate fully async training to UnifiedTrainer by @kylemontgomery1 in #481
  • fix(verl): disable vllm compile cache to work around corruption bug by @luyuzhe111 in #490
  • Fix: Misc fixes for verl 0.7.1 by @kylemontgomery1 in #496
  • fix(verl): move legacy worker override to launcher before worker class selection by @luyuzhe111 in #499
  • chore(scripts): simplify and pin megatron dependencies b...

rLLM: v0.2.1.post1

18 Dec 23:51
618fa7d

What's Changed

  • update docs & add curves by @thwu1 in #343
  • Fix import for colorful_print in agent_sdk_engine.py and agent_sdk_trainer.py by @wht0703 in #345
  • Unblock SDK installation by overriding dependencies by @wht0703 in #348
  • [Doc] Update README and fix a few installation related issues by @listar2000 in #347
  • fix: keyerror completion_ids by @kxfan2002 in #353
  • Fix: Enable GPU acceleration for dense retrieval in search agent by @Gitsamshi in #349

Full Changelog: v0.2.1...v0.2.1.post1

rLLM: v0.2.1

11 Dec 22:58
960d573

rLLM v0.2.1: Tinker backend, VLM training, Eval Protocol, and SDK (preview)

We are excited to release rLLM v0.2.1. This new version comes with the following exciting features:

  • rLLM SDK (preview): The rLLM SDK enables you to transform agents written in frameworks such as LangGraph, SmolAgent, or Strands into trainable workflows. Check out this LangGraph RAG example, which builds a RAG agent and trains it with the rLLM SDK.

  • Tinker training backend: In addition to verl, rLLM now supports Tinker as a training backend. You can use the same abstractions for building agents and easily switch between different backends for training.

  • VLM training: rLLM supports Vision-Language Model training with the verl backend. See the Geo3K training example for reference.

  • LoRA fine-tuning: rLLM supports LoRA training in both the verl and Tinker backends. See the GSM8K LoRA example for how to enable LoRA training with a single config change.

  • Eval Protocol integration: We integrate with the Eval Protocol from Fireworks AI. Users can now train on any environment supported by the Eval Protocol. See this example, which uses the Eval Protocol in rLLM to train a FrozenLake agent.
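The SDK idea above (turning an existing agent into a trainable workflow) can be sketched in plain Python. Note this is an illustrative pattern only: the decorator, `Trajectory` class, and reward hook below are hypothetical stand-ins, not the actual rLLM SDK interface. The core idea is that a trainable workflow is an agent function whose rollouts are captured as (messages, reward) trajectories.

```python
from dataclasses import dataclass, field

# All names here are illustrative stand-ins, NOT the rLLM SDK API.
@dataclass
class Trajectory:
    """Captured rollout: the message history plus a scalar reward."""
    messages: list = field(default_factory=list)
    reward: float = 0.0

def trainable(reward_fn):
    """Toy decorator: wraps an agent function so that each call also
    records a trajectory a trainer could later consume."""
    def wrap(agent_fn):
        def run(task):
            traj = Trajectory()
            answer = agent_fn(task, traj.messages)
            traj.reward = reward_fn(task, answer)
            return answer, traj
        return run
    return wrap

# A minimal "RAG" agent: look up a fact, then answer with it.
KB = {"capital of France": "Paris"}

def exact_match(task, answer):
    """Reward 1.0 for an exact match with the target, else 0.0."""
    return 1.0 if answer == task["target"] else 0.0

@trainable(reward_fn=exact_match)
def rag_agent(task, messages):
    messages.append({"role": "user", "content": task["question"]})
    retrieved = KB.get(task["question"], "unknown")
    messages.append({"role": "assistant", "content": retrieved})
    return retrieved

answer, traj = rag_agent({"question": "capital of France", "target": "Paris"})
```

The trainer then optimizes the wrapped agent against the collected trajectories; the agent code itself stays framework-agnostic, which is what lets LangGraph, SmolAgent, or Strands agents be reused.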

A big shoutout to @thwu1 @kylemontgomery1 @listar2000 @xzrderek for their outstanding work on these features.

Full Changelog: v0.2.0...v0.2.1

rLLM: v0.2.0

16 Oct 21:24
52efedc

rLLM v0.2: RL Training over General Agentic Programs (Blog Post)

We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that enable arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.

Key Features in v0.2

  1. Support for the official verl==0.5.0 as the training backend, with no custom verl fork anymore! verl==0.5.0 brings the following features, which are now supported in rLLM (@kylemontgomery1):
    • Megatron training support (@jeewoo-lee)
    • SGLang as the rollout engine, in addition to vLLM.
  2. Introduce AgentWorkflowEngine, which enables passing in arbitrary agentic programs for training. (@kylemontgomery1)
  3. Support more agents and environments
  4. Integrations with other agentic frameworks and SDKs
    • Strands SDK from AWS
    • SmolAgents
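As a concrete illustration of the "arbitrary agentic program" idea, here is a minimal solver-judge workflow in plain Python. This is a pattern sketch, not the AgentWorkflowEngine API; every name below is hypothetical, and the solver and judge are toy functions standing in for LLM calls. The point is that the whole program, candidate generation, judging, and reward assignment, is just ordinary code the engine can run and train.

```python
def solver(x, y):
    """Toy solver: propose candidate answers for x + y.
    Two candidates are deliberately wrong, to give the judge work.
    A real workflow would sample these from an LLM."""
    return [x + y, x + y + 1, x * y]

def judge(x, y, candidate):
    """Toy judge: score a candidate, 1.0 for the true sum and
    decaying with distance otherwise. A real judge could be
    another LLM or a custom reward function."""
    return 1.0 / (1.0 + abs(candidate - (x + y)))

def solver_judge_workflow(x, y):
    """One episode: the solver proposes, the judge scores each
    candidate, and the best-scoring candidate is selected.
    The judge score of the selection becomes the episode reward."""
    candidates = solver(x, y)
    scored = [(judge(x, y, c), c) for c in candidates]
    reward, best = max(scored)
    return best, reward

best, reward = solver_judge_workflow(2, 3)
```

Because the workflow is a plain function with a scalar reward at the end, the same shape extends to planner-executor pipelines or MCTS: the engine only needs to observe the rollouts and the final reward.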

Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0