Fully Async Trainer #394
Pull request overview
Adds a new experimental “fully async” PPO training pipeline that decouples rollout generation from training via a Ray-based message queue, plus parameter sync/validation plumbing and a DeepResearch example stack.
Changes:
- Introduce `rllm.experimental.fully_async` components: rollouter, trainer, message queue, param sync, HTTP client, metrics + batching utilities.
- Add docs/config/scripts for installing dependencies and applying the required `verl` patches.
- Add a runnable DeepResearch example (tooling, refine service client, RAG server, and launch scripts).
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 25 comments.
| File | Description |
|---|---|
| rllm/experimental/fully_async/README.md | Documents architecture/installation/patch application for fully-async mode. |
| rllm/experimental/fully_async/client.py | Async HTTP rollout client with abort/continue support and chat-completions wrapper. |
| rllm/experimental/fully_async/config/__init__.py | Marks the config package for Hydra discovery. |
| rllm/experimental/fully_async/config/fully_async_ppo_trainer.yaml | Example Hydra config for fully-async PPO trainer + rollout settings. |
| rllm/experimental/fully_async/fully_async_trainer.py | Trainer consuming samples from MQ, running PPO updates, triggering param sync, logging/ckpt. |
| rllm/experimental/fully_async/inference_manager.py | Manages SGLang server workers/router + cache clearing for async rollouts. |
| rllm/experimental/fully_async/install_vllm_sglang_mcore_updated_sglang.sh | Convenience install script for inference/training dependencies. |
| rllm/experimental/fully_async/message_queue.py | Ray-actor queue between rollouter and trainer (+ client wrapper). |
| rllm/experimental/fully_async/message_utils.py | Converts model token outputs into OpenAI message/tool-call structures. |
| rllm/experimental/fully_async/metric_utils.py | Step-wise metrics aggregation + validation metrics container. |
| rllm/experimental/fully_async/param_sync.py | Unified parameter sync actor coordinating pause/clear-cache/sync/resume/validation. |
| rllm/experimental/fully_async/protocol.py | Dataclasses for streamed outputs, sequences, trajectories, and trajectory groups. |
| rllm/experimental/fully_async/rollout_executor.py | Async rollouter that generates trajectories concurrently and drains to MQ; runs validation. |
| rllm/experimental/fully_async/runner.py | Entry wiring: starts inference manager, rollouter, trainer, MQ, and synchronizer. |
| rllm/experimental/fully_async/utils.py | Batch assembly into DataProto, rejection sampling, checkpoint helpers, metric reduction, HTTP helpers. |
| rllm/experimental/fully_async/verl_dp_actor.patch | Patch file for upstream verl actor behavior required by fully-async training. |
| rllm/experimental/fully_async/verl_patch.md | Describes the required upstream verl patch intent and how to apply. |
| examples/fully_async/deepresearch/config/8b_stale05_rs.sh | Example launch configuration for DeepResearch training. |
| examples/fully_async/deepresearch/data/prepare_browsecomp_plus.py | Dataset prep/decrypt script and DatasetRegistry registration. |
| examples/fully_async/deepresearch/data/prepare_cut_the_bill.py | DatasetRegistry registration helper for a custom dataset. |
| examples/fully_async/deepresearch/rag/launch_rag.sh | Launch helper for the RAG server with batching/sharding knobs. |
| examples/fully_async/deepresearch/rag/rag_server.py | FastAPI retrieval server with GPU sharding + request auto-batching. |
| examples/fully_async/deepresearch/refine_agent.py | Refine-service client with multi-endpoint load balancing + stats. |
| examples/fully_async/deepresearch/scripts/launch_refine.sh | Launch helper for multi-GPU vLLM refine servers. |
| examples/fully_async/deepresearch/search_agent.py | Search agent performing tool calls, refinement, and reward computation. |
| examples/fully_async/deepresearch/tool.py | Async local retrieval tool with client-side failover/load balancing. |
| examples/fully_async/deepresearch/train.py | Hydra entry point wiring DeepResearch rollout functions into AsyncAgentTrainer. |
| examples/fully_async/deepresearch/util.py | Helpers to normalize messages into simple dict format. |
```python
try:
    iteration = 0
    while self.global_steps < self.total_rollout_steps:
```
Off-by-one in rollout termination: with `global_steps` initialized to 1, `while self.global_steps < self.total_rollout_steps` runs only `total_rollout_steps - 1` iterations (since `global_steps` is incremented once per datum). If `total_rollout_steps` is meant to be an exact count, adjust the initialization/condition, e.g. start at 0 or use `<=`:

```suggestion
while self.global_steps <= self.total_rollout_steps:
```
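To make the off-by-one concrete, here is a minimal counting sketch (the function name and values are illustrative, not taken from the PR):

```python
def count_iterations(start: int, total_rollout_steps: int, inclusive: bool) -> int:
    """Count loop bodies executed when the counter increments once per datum."""
    global_steps = start
    iterations = 0
    while (global_steps <= total_rollout_steps) if inclusive else (global_steps < total_rollout_steps):
        iterations += 1
        global_steps += 1
    return iterations

# Starting at 1 with `<` runs one iteration short of the requested count:
print(count_iterations(start=1, total_rollout_steps=10, inclusive=False))  # 9
print(count_iterations(start=1, total_rollout_steps=10, inclusive=True))   # 10
```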
```python
results = await asyncio.gather(*[run_one(batch[0]) for batch in self.val_dataloader])
```
`asyncio.gather(*[run_one(...) for batch in self.val_dataloader])` creates one coroutine per validation sample up front. Even with the semaphore, this can cause a large memory/overhead spike for big validation sets. Prefer a bounded-concurrency/streaming pattern (e.g., `asyncio.as_completed` over a limited task set):

```suggestion
# Stream tasks with bounded concurrency instead of creating all coroutines up front.
results = []
concurrency_limit = 1024
pending = set()
for batch in self.val_dataloader:
    task = asyncio.create_task(run_one(batch[0]))
    pending.add(task)
    if len(pending) >= concurrency_limit:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for d in done:
            results.append(d.result())
if pending:
    done, _ = await asyncio.wait(pending)
    for d in done:
        results.append(d.result())
```
## dp_actor_functional_changes.patch

**File:** `verl/workers/actor/dp_actor.py`

**Purpose:** Modifications to `DataParallelPPOActor.update_actor()` for token-mean loss scaling and single mini-batch enforcement.
This doc section header and the apply instructions refer to `dp_actor_functional_changes.patch`, but the repository includes `verl_dp_actor.patch` (and the README instructs applying `verl_dp_actor.patch`). Align the filename references here so users don't apply a wrong or missing patch.
```python
from rllm.experimental.fully_async.rollout_executor import RolloutExecutor
from rllm.experimental.fully_async.utils import calculate_max_concurrency
```
Unused imports: `ResourcePoolManager` and `need_reference_policy` are imported but never referenced in this module. Removing them avoids lint failures and keeps the dependency graph clearer.
```python
group = TrajectoryGroup(trajectories=[res for res in self.result_dict[idx] if res is not None])
serialized = ray.cloudpickle.dumps(group)
await self.trajectory_group_queue.put(serialized)
del self.result_dict[idx]
self.active_sample -= 1
self.enqueued_sample += 1
```
If all `n` rollouts for an index fail (all results are `None`), this still enqueues a `TrajectoryGroup` with `trajectories=[]`. Downstream batch assembly/training is likely to fail or produce empty batches. Consider dropping empty groups (and tracking them as dropped) rather than enqueueing them:

```suggestion
trajectories = [res for res in self.result_dict[idx] if res is not None]
if trajectories:
    group = TrajectoryGroup(trajectories=trajectories)
    serialized = ray.cloudpickle.dumps(group)
    await self.trajectory_group_queue.put(serialized)
    self.enqueued_sample += 1
else:
    # All rollouts for this index failed; drop this sample instead of enqueuing an empty group.
    self.dropped_samples += 1
del self.result_dict[idx]
self.active_sample -= 1
```
```python
# Wait for either timeout or batch_event (triggered when queue is full)
try:
    await asyncio.wait_for(self.batch_event.wait(), timeout=self.batch_timeout)
except asyncio.TimeoutError:
```
The `except asyncio.TimeoutError:` clause does nothing but `pass`, with no explanatory comment. Add a short note explaining why the timeout is expected and safe to ignore.
```python
except asyncio.CancelledError:
    pass
drain_task.cancel()
try:
    await drain_task
except asyncio.CancelledError:
```
The `except asyncio.CancelledError:` clauses do nothing but `pass` and carry no explanatory comment:

```suggestion
except asyncio.CancelledError:
    # Task cancellation is expected during cleanup; safely ignore.
    pass
drain_task.cancel()
try:
    await drain_task
except asyncio.CancelledError:
    # Task cancellation is expected during cleanup; safely ignore.
```
```python
except asyncio.CancelledError:
    pass
drain_task.cancel()
try:
    await drain_task
except asyncio.CancelledError:
```
The `except asyncio.CancelledError:` clauses here likewise do nothing but `pass` and carry no explanatory comment:

```suggestion
except asyncio.CancelledError:
    # Task was cancelled as part of normal shutdown; ignore.
    pass
drain_task.cancel()
try:
    await drain_task
except asyncio.CancelledError:
    # Task was cancelled as part of normal shutdown; ignore.
```
```python
duplicate_search_detected = True
break
executed_search_calls.add(call_key)
except (KeyError, TypeError):
```
The `except (KeyError, TypeError):` clause does nothing but `pass` and there is no explanatory comment. Note why malformed tool-call entries are safe to skip here, or at least count/log them.
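A hedged sketch of documenting and counting the swallowed exception; `dedupe_search_calls`, the `(name, arguments)` key shape, and the malformed counter are assumptions for illustration, not names from the PR:

```python
def dedupe_search_calls(tool_calls: list[dict]) -> tuple[list[dict], int]:
    """Keep only the first occurrence of each (name, arguments) search call."""
    executed, kept, malformed = set(), [], 0
    for call in tool_calls:
        try:
            call_key = (call["name"], call["arguments"])
        except (KeyError, TypeError):
            # Malformed tool-call entry (missing keys or a non-dict payload);
            # skip it, but count it so silent data issues stay visible in stats.
            malformed += 1
            continue
        if call_key in executed:
            continue
        executed.add(call_key)
        kept.append(call)
    return kept, malformed
```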
```python
# base_url = "http://localhost:30001"
api_key = ""
else:
    base_url = "http://localhost:30001"
```
This statement is unreachable: remove the dead `base_url` assignment in the `else` branch, or fix the preceding branching so the fallback can actually run.