feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training #206
Merged
jeffreysijuntan merged 18 commits on Sep 11, 2025
…ower level integration.
… Strands SDK <-> model/tool with (RLLModel replacement), achieving a generalizable solution
…m strands, irrelevant to our code)
4c64fad to 25c46dc
Plan after the merge:
Remove `.cursor/rules/always-use-conda-activate-rllm.mdc` from tracking and add it to `.gitignore` (file cleanup before PR merge).
25c46dc to 6871ca0
…l execution

This commit implements a hybrid architecture that maintains Strands SDK compliance while enabling complete RL trajectory tracking for tool calls.

## Key Changes

### Architecture Refactor
- **RLLMModel**: Simplified from 672 to 394 lines (~41% reduction)
  - Removed complex tool execution loop (now handled by Strands event loop)
  - Added tool call info passing to Agent for trajectory tracking
  - Generates standard StreamEvents for Strands compliance
  - Only interacts with RolloutEngine (clean separation)
- **StrandsAgent**: Enhanced trajectory tracking
  - Records tool call information passed from Model layer
  - Maintains standard Agent interface and behavior
  - Improved chat_completions property to filter empty messages
  - Added hybrid tool call recording format

### Tool Call Flow
1. Model generates toolUse StreamEvents + notifies Agent
2. Strands event loop handles actual tool execution
3. Agent receives tool info and embeds in trajectory
4. Complete RL-ready trajectory with tool details preserved

### Trajectory Format
- New `tool_calls` format for multiple tool calls per step
- Backward compatibility with legacy `tool_call` format
- Enhanced statistics tracking for both formats
- Improved JSON output with detailed tool call information

### Documentation
- Created comprehensive README.md with architecture overview
- Removed outdated Chinese and verbose documentation files
- Added practical usage examples and configuration guide
- Documented hybrid design principles and benefits

## Benefits
- ✅ Policy alignment: Same model for sampling and training
- ✅ Architecture compliance: Follows Strands SDK standards
- ✅ Complete trajectories: Full RL data including tool calls
- ✅ Code simplicity: Clear separation of concerns
- ✅ Maintainability: Reduced complexity, improved structure

## Testing
- Verified tool execution works correctly
- Confirmed trajectory recording includes tool call details
- Validated backward compatibility with existing formats
- All tool formats supported (calculator, http_request, file_read, etc.)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
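The trajectory format change in this commit (a `tool_calls` list per step, with the legacy single `tool_call` still accepted) can be illustrated with a small reader. This is only a sketch under stated assumptions: the step layout and the inner keys (`name`, `input`) are hypothetical and are not taken from `rllm/integrations/strands.py`; only the two field names `tool_calls` and `tool_call` come from the commit message.

```python
from typing import Any


def normalize_tool_calls(step: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one step's tool calls as a list, whichever format it uses.

    `tool_calls` (list) is the newer format; `tool_call` (single dict) is
    the legacy one. The inner keys used below are hypothetical.
    """
    if step.get("tool_calls"):
        return list(step["tool_calls"])
    if step.get("tool_call"):
        return [step["tool_call"]]
    return []


def count_tool_calls(trajectory: list[dict[str, Any]]) -> dict[str, int]:
    """Count calls per tool name across both formats."""
    counts: dict[str, int] = {}
    for step in trajectory:
        for call in normalize_tool_calls(step):
            name = call.get("name", "unknown")
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical trajectory mixing the new and legacy formats.
trajectory = [
    {"tool_calls": [
        {"name": "calculator", "input": {"expression": "2+2"}},
        {"name": "http_request", "input": {"url": "https://example.com"}},
    ]},
    {"tool_call": {"name": "calculator", "input": {"expression": "3*3"}}},
    {"observation": "no tools used"},
]

print(count_tool_calls(trajectory))  # {'calculator': 2, 'http_request': 1}
```

Normalizing at read time keeps downstream statistics code unaware of which format a given step was recorded in.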
…ro state contamination

Key achievements:
- Add workflow pooling support via AgentWorkflowEngine (n_parallel_tasks=8)
- Fix state contamination by implementing proper reset() methods
- Clean up codebase: 25% reduction across GAIA evaluator and workflow files
- Maintain complete functionality while enabling scalable batch processing

Technical changes:
- Add StrandsAgent.reset() for pooling compatibility
- Override StrandsWorkflow.reset() to create fresh agents per task
- Simplify GAIA evaluator and remove unused methods
- Enable seamless scaling from single task to 1000+ parallel tasks

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
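To show why a pooled workflow needs `reset()` to create a fresh agent per task, here is a minimal sketch. `PooledAgent`, `PooledWorkflow`, and `run_pooled` are hypothetical stand-ins, not the real `StrandsAgent`, `StrandsWorkflow`, or `AgentWorkflowEngine`; the point is only the reset-before-reuse pattern the commit describes.

```python
from dataclasses import dataclass, field


@dataclass
class PooledAgent:
    """Hypothetical agent that accumulates per-task message history."""
    messages: list[str] = field(default_factory=list)


class PooledWorkflow:
    """Hypothetical workflow object that is reused across tasks by a pool."""

    def __init__(self) -> None:
        self.agent = PooledAgent()

    def reset(self) -> None:
        # Create a fresh agent so no history leaks from the previous task.
        self.agent = PooledAgent()

    def run(self, task: str) -> list[str]:
        self.agent.messages.append(task)
        return list(self.agent.messages)


def run_pooled(tasks: list[str], pool_size: int = 2) -> list[list[str]]:
    """Run tasks sequentially over a small pool, resetting before each use."""
    pool = [PooledWorkflow() for _ in range(pool_size)]
    results = []
    for i, task in enumerate(tasks):
        workflow = pool[i % pool_size]
        workflow.reset()
        results.append(workflow.run(task))
    return results


print(run_pooled(["a", "b", "c"]))  # [['a'], ['b'], ['c']]
```

Without the `reset()` call, the third task would reuse the first workflow and see `['a', 'c']`, which is the kind of state contamination the commit addresses.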
- Remove unused variables: results, trajectory_file, tokenizer, has_tool_use
- Fix import ordering and code formatting per ruff standards
- Clean up type annotations to use modern Python 3.10+ syntax

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
dd7961c to 775d0cd
- Remove gaia.json dataset (users should download locally via scripts)
- Remove custom calculator_tool.py (use strands_tools.calculator instead)
- Remove backup files and output files (following clean PR practices)
- Standardize calculator usage across all examples

Aligns with PR rllm-org#205 clean practices - only essential code in repo.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
775d0cd to d7c92c6
@jeffreysijuntan should be good to merge :)
Overview
Integrates Strands SDK with RLLM for tool-enabled agent training. Uses a hybrid architecture that maintains Strands compliance while enabling complete RL trajectory tracking.
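A toy simulation can make the hybrid idea concrete: the model layer emits tool-use events and notifies the agent, a separate event loop executes the tools, and the agent keeps the tool calls in its trajectory. Every class and function below (`FlowAgent`, `FlowModel`, `event_loop`, the event dict shape) is a hypothetical stand-in written for illustration; none of it is the Strands SDK or rllm API.

```python
from typing import Any, Callable


class FlowAgent:
    """Hypothetical agent that records tool calls for RL trajectories."""

    def __init__(self) -> None:
        self.trajectory: list[dict[str, Any]] = []

    def record_tool_calls(self, calls: list[dict[str, Any]]) -> None:
        self.trajectory.append({"tool_calls": calls})


class FlowModel:
    """Hypothetical model layer: emits events, never runs tools itself."""

    def __init__(self, agent: FlowAgent) -> None:
        self.agent = agent

    def generate(self, prompt: str) -> list[dict[str, Any]]:
        # A real model would decide this from the prompt; hard-coded here.
        calls = [{"name": "calculator", "input": {"a": 2, "b": 3}}]
        self.agent.record_tool_calls(calls)      # notify the agent
        return [{"toolUse": c} for c in calls]   # events for the loop


def event_loop(events: list[dict[str, Any]],
               tools: dict[str, Callable[..., Any]]) -> list[Any]:
    """Stand-in for the framework loop that actually executes tools."""
    results = []
    for event in events:
        call = event["toolUse"]
        results.append(tools[call["name"]](**call["input"]))
    return results


tools = {"calculator": lambda a, b: a + b}
agent = FlowAgent()
model = FlowModel(agent)
results = event_loop(model.generate("add 2 and 3"), tools)

print(results)           # [5]
print(agent.trajectory)  # one step holding the calculator call
```

The design choice illustrated here is the separation of duties: tool execution stays with the framework loop, while the agent still ends up with a complete record of what was called.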
Key Features
Architecture
Files Changed
- `rllm/integrations/strands.py` - Core integration implementation
- `examples/strands/` - Example usage and GAIA evaluation framework
Testing
Breaking Changes
None - pure addition that doesn't affect existing functionality.
Dependencies
- `strands-agents/sdk-python` (already in requirements)

For detailed documentation, examples, and usage instructions, see `examples/strands/README.md`.