feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by yayashuxue · Pull Request #206 · rllm-org/rllm

yayashuxue · 2025-09-02T00:28:50Z

Overview

Integrates Strands SDK with RLLM for tool-enabled agent training. Uses hybrid architecture that maintains Strands compliance while enabling complete RL trajectory tracking.

Key Features

Policy Alignment: Same model used for sampling and training (eliminates off-policy risk)
Zero State Contamination: Fresh model instances for parallel processing
Complete Trajectories: Full RL-ready data including tool calls
GAIA Evaluation: Complete benchmark evaluation framework with multi-tool support

Architecture

Model Layer: RLLMModel handles RolloutEngine interaction and StreamEvent generation
Agent Layer: StrandsAgent handles tool execution, trajectory tracking, and event loop
Clear Separation: Model doesn't execute tools, Agent doesn't call RolloutEngine directly

Files Changed

rllm/integrations/strands.py - Core integration implementation
examples/strands/ - Example usage and GAIA evaluation framework

Testing

Tool execution works correctly
GAIA benchmark evaluation with comprehensive metrics
Zero state contamination in parallel processing

Breaking Changes

None - pure addition that doesn't affect existing functionality.

Dependencies

strands-agents/sdk-python (already in requirements)
Existing RLLM dependencies

For detailed documentation, examples, and usage instructions, see examples/strands/README.md

…ower level integration.

… Strands SDK <-> model/tool with (RLLModel replacement), achieving a generalizable solution

…m strands, irrelevant to our code)

yayashuxue · 2025-09-02T22:43:01Z

Plan after the merge:

gaia eval
abstracting to see if can apply to other SDK
documentation

Remove .cursor/rules/always-use-conda-activate-rllm.mdc from tracking and add to gitignore file clean up before pr merge

…l execution This commit implements a hybrid architecture that maintains Strands SDK compliance while enabling complete RL trajectory tracking for tool calls. ## Key Changes ### Architecture Refactor - **RLLMModel**: Simplified from 672 to 394 lines (~41% reduction) - Removed complex tool execution loop (now handled by Strands event loop) - Added tool call info passing to Agent for trajectory tracking - Generates standard StreamEvents for Strands compliance - Only interacts with RolloutEngine (clean separation) - **StrandsAgent**: Enhanced trajectory tracking - Records tool call information passed from Model layer - Maintains standard Agent interface and behavior - Improved chat_completions property to filter empty messages - Added hybrid tool call recording format ### Tool Call Flow 1. Model generates toolUse StreamEvents + notifies Agent 2. Strands event loop handles actual tool execution 3. Agent receives tool info and embeds in trajectory 4. Complete RL-ready trajectory with tool details preserved ### Trajectory Format - New `tool_calls` format for multiple tool calls per step - Backward compatibility with legacy `tool_call` format - Enhanced statistics tracking for both formats - Improved JSON output with detailed tool call information ### Documentation - Created comprehensive README.md with architecture overview - Removed outdated Chinese and verbose documentation files - Added practical usage examples and configuration guide - Documented hybrid design principles and benefits ## Benefits - ✅ Policy alignment: Same model for sampling and training - ✅ Architecture compliance: Follows Strands SDK standards - ✅ Complete trajectories: Full RL data including tool calls - ✅ Code simplicity: Clear separation of concerns - ✅ Maintainability: Reduced complexity, improved structure ## Testing - Verified tool execution works correctly - Confirmed trajectory recording includes tool call details - Validated backward compatibility with existing formats - All tool formats supported (calculator, http_request, file_read, etc.) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

…ro state contamination Key achievements: - Add workflow pooling support via AgentWorkflowEngine (n_parallel_tasks=8) - Fix state contamination by implementing proper reset() methods - Clean up codebase: 25% reduction across GAIA evaluator and workflow files - Maintain complete functionality while enabling scalable batch processing Technical changes: - Add StrandsAgent.reset() for pooling compatibility - Override StrandsWorkflow.reset() to create fresh agents per task - Simplify GAIA evaluator and remove unused methods - Enable seamless scaling from single task to 1000+ parallel tasks 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

- Remove unused variables: results, trajectory_file, tokenizer, has_tool_use - Fix import ordering and code formatting per ruff standards - Clean up type annotations to use modern Python 3.10+ syntax 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

- Remove gaia.json dataset (users should download locally via scripts) - Remove custom calculator_tool.py (use strands_tools.calculator instead) - Remove backup files and output files (following clean PR practices) - Standardize calculator usage across all examples Aligns with PR rllm-org#205 clean practices - only essential code in repo. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

yayashuxue · 2025-09-06T00:33:53Z

@jeffreysijuntan should be good to merge :)

yayashuxue added 7 commits August 27, 2025 13:01

[init] strands integration on agent framework level, different from l…

d537c33

…ower level integration.

[WIP] aligning the strands agent framework to rllm

c89230e

Align the episode with the strands workflow

0565c34

Working Strands SDK + AgentWorkflow Integration without breaking into…

a5b1b5c

… Strands SDK <-> model/tool with (RLLModel replacement), achieving a generalizable solution

[wip]

395bb97

working strands workflow with tool call

5e652f8

remove debug logging, disable strands' opentelemetry error (error fro…

48d9915

…m strands, irrelevant to our code)

yayashuxue changed the base branch from main to v0.2 September 2, 2025 21:49

yayashuxue force-pushed the feature/strands-toolcall branch 2 times, most recently from 4c64fad to 25c46dc Compare September 2, 2025 22:32

clean up before PR merge

6871ca0

Remove .cursor/rules/always-use-conda-activate-rllm.mdc from tracking and add to gitignore file clean up before pr merge

yayashuxue force-pushed the feature/strands-toolcall branch from 25c46dc to 6871ca0 Compare September 2, 2025 22:43

yayashuxue and others added 9 commits September 3, 2025 18:07

[wip] gaia

61ae821

[wip] change to workflow

dbace12

[wip] tool_call bug fixed but seems too complicated. need to refactor

ffd6c05

gaia works, but the context/state seems to be contaminate between tasks.

abb761b

gaia working without context comtamination.

b235a87

gaia + agentworkflow complete

d24626a

yayashuxue force-pushed the feature/strands-toolcall branch from dd7961c to 775d0cd Compare September 6, 2025 00:04

yayashuxue force-pushed the feature/strands-toolcall branch from 775d0cd to d7c92c6 Compare September 6, 2025 00:15

jeffreysijuntan merged commit c4f50a6 into rllm-org:v0.2 Sep 11, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training#206

feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training#206
jeffreysijuntan merged 18 commits into
rllm-org:v0.2from
yayashuxue:feature/strands-toolcall

yayashuxue commented Sep 2, 2025 •

edited

Loading

Uh oh!

yayashuxue commented Sep 2, 2025 •

edited

Loading

Uh oh!

yayashuxue commented Sep 6, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yayashuxue commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Key Features

Architecture

Files Changed

Testing

Breaking Changes

Dependencies

Uh oh!

yayashuxue commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yayashuxue commented Sep 6, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yayashuxue commented Sep 2, 2025 •

edited

Loading

yayashuxue commented Sep 2, 2025 •

edited

Loading