feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training #206

Merged
jeffreysijuntan merged 18 commits into rllm-org:v0.2 from yayashuxue:feature/strands-toolcall
Sep 11, 2025

yayashuxue commented Sep 2, 2025

Overview

Integrates the Strands SDK with RLLM for tool-enabled agent training. It uses a hybrid architecture that maintains Strands compliance while enabling complete RL trajectory tracking.

Key Features

  • Policy Alignment: Same model used for sampling and training (eliminates off-policy risk)
  • Zero State Contamination: Fresh model instances for parallel processing
  • Complete Trajectories: Full RL-ready data including tool calls
  • GAIA Evaluation: Complete benchmark evaluation framework with multi-tool support

Architecture

  • Model Layer: RLLMModel handles RolloutEngine interaction and StreamEvent generation
  • Agent Layer: StrandsAgent handles tool execution, trajectory tracking, and event loop
  • Clear Separation: Model doesn't execute tools, Agent doesn't call RolloutEngine directly

Files Changed

  • rllm/integrations/strands.py - Core integration implementation
  • examples/strands/ - Example usage and GAIA evaluation framework

Testing

  • Tool execution works correctly
  • GAIA benchmark evaluation with comprehensive metrics
  • Zero state contamination in parallel processing

Breaking Changes

None - pure addition that doesn't affect existing functionality.

Dependencies

  • strands-agents/sdk-python (already in requirements)
  • Existing RLLM dependencies

For detailed documentation, examples, and usage instructions, see examples/strands/README.md

yayashuxue changed the base branch from main to v0.2 on September 2, 2025 21:49
yayashuxue force-pushed the feature/strands-toolcall branch 2 times, most recently from 4c64fad to 25c46dc on September 2, 2025 22:32

yayashuxue commented Sep 2, 2025

Plan after the merge:

  • GAIA evaluation
  • Abstract the integration to see whether it can apply to other SDKs
  • Documentation

Remove .cursor/rules/always-use-conda-activate-rllm.mdc from tracking and add to gitignore

file clean up before pr merge
yayashuxue force-pushed the feature/strands-toolcall branch from 25c46dc to 6871ca0 on September 2, 2025 22:43
yayashuxue and others added 9 commits September 3, 2025 18:07
…l execution

This commit implements a hybrid architecture that maintains Strands SDK compliance
while enabling complete RL trajectory tracking for tool calls.

## Key Changes

### Architecture Refactor
- **RLLMModel**: Simplified from 672 to 394 lines (~41% reduction)
  - Removed complex tool execution loop (now handled by Strands event loop)
  - Added tool call info passing to Agent for trajectory tracking
  - Generates standard StreamEvents for Strands compliance
  - Only interacts with RolloutEngine (clean separation)

- **StrandsAgent**: Enhanced trajectory tracking
  - Records tool call information passed from Model layer
  - Maintains standard Agent interface and behavior
  - Improved chat_completions property to filter empty messages
  - Added hybrid tool call recording format

### Tool Call Flow
1. Model generates toolUse StreamEvents + notifies Agent
2. Strands event loop handles actual tool execution
3. Agent receives tool info and embeds in trajectory
4. Complete RL-ready trajectory with tool details preserved
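
The four steps above can be sketched in plain Python. This is an illustration only: `ToyModel`, `ToyAgent`, `run_event_loop`, `add_expression`, and the event dictionary shape are hypothetical stand-ins, not the actual `RLLMModel`, `StrandsAgent`, or Strands SDK APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyAgent:
    """Hypothetical agent layer: records steps, never calls the engine."""
    trajectory: list[dict[str, Any]] = field(default_factory=list)

    def record_tool_calls(self, tool_calls: list[dict[str, Any]]) -> None:
        # Step 3: embed the tool info passed from the model layer.
        self.trajectory.append({"tool_calls": tool_calls})


class ToyModel:
    """Hypothetical model layer: emits events, never executes tools."""

    def __init__(self, agent: ToyAgent) -> None:
        self.agent = agent

    def generate(self, prompt: str) -> list[dict[str, Any]]:
        # A real model would query a rollout engine; this call is hard-coded.
        call = {"name": "calculator", "input": {"expression": "2 + 3"}}
        self.agent.record_tool_calls([call])  # Step 1: notify the agent
        return [{"toolUse": call}]            # Step 1: emit a toolUse event


def run_event_loop(
    events: list[dict[str, Any]], tools: dict[str, Callable[..., Any]]
) -> list[Any]:
    # Step 2: the event loop, not the model, executes the tools.
    return [
        tools[e["toolUse"]["name"]](**e["toolUse"]["input"])
        for e in events
        if "toolUse" in e
    ]


def add_expression(expression: str) -> int:
    # Toy calculator that only understands "a + b".
    left, right = expression.split("+")
    return int(left) + int(right)


agent = ToyAgent()
events = ToyModel(agent).generate("What is 2 + 3?")
results = run_event_loop(events, {"calculator": add_expression})
# Step 4: agent.trajectory now holds the tool call details for this step.
```

The point of the split is that the model object only produces events and the agent object only records them, so either side can be swapped without touching the other.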

### Trajectory Format
- New `tool_calls` format for multiple tool calls per step
- Backward compatibility with legacy `tool_call` format
- Enhanced statistics tracking for both formats
- Improved JSON output with detailed tool call information
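
One way a consumer might read both formats is to normalize each step to a list before computing statistics. This is a minimal sketch: only the key names `tool_calls` and `tool_call` come from this commit; the per-call fields (`name`, `input`) and the helper names are assumptions made for illustration.

```python
from typing import Any


def get_tool_calls(step: dict[str, Any]) -> list[dict[str, Any]]:
    """Return a step's tool calls as a list, whichever key the step uses."""
    if "tool_calls" in step:   # new format: a list of calls
        return list(step["tool_calls"])
    if "tool_call" in step:    # legacy format: a single call
        return [step["tool_call"]]
    return []                  # step without any tool use


def count_tool_calls(trajectory: list[dict[str, Any]]) -> dict[str, int]:
    """Count calls per tool name across both formats."""
    counts: dict[str, int] = {}
    for step in trajectory:
        for call in get_tool_calls(step):
            counts[call["name"]] = counts.get(call["name"], 0) + 1
    return counts


# Hypothetical trajectory mixing legacy and new steps.
trajectory = [
    {"tool_call": {"name": "calculator", "input": {"expression": "1 + 1"}}},
    {"tool_calls": [
        {"name": "calculator", "input": {"expression": "2 * 3"}},
        {"name": "http_request", "input": {"url": "https://example.com"}},
    ]},
    {"content": "final answer"},
]
stats = count_tool_calls(trajectory)
```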

### Documentation
- Created comprehensive README.md with architecture overview
- Removed outdated Chinese and verbose documentation files
- Added practical usage examples and configuration guide
- Documented hybrid design principles and benefits

## Benefits
- ✅ Policy alignment: Same model for sampling and training
- ✅ Architecture compliance: Follows Strands SDK standards
- ✅ Complete trajectories: Full RL data including tool calls
- ✅ Code simplicity: Clear separation of concerns
- ✅ Maintainability: Reduced complexity, improved structure

## Testing
- Verified tool execution works correctly
- Confirmed trajectory recording includes tool call details
- Validated backward compatibility with existing formats
- All tool formats supported (calculator, http_request, file_read, etc.)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ro state contamination

Key achievements:
- Add workflow pooling support via AgentWorkflowEngine (n_parallel_tasks=8)
- Fix state contamination by implementing proper reset() methods
- Clean up codebase: 25% reduction across GAIA evaluator and workflow files
- Maintain complete functionality while enabling scalable batch processing

Technical changes:
- Add StrandsAgent.reset() for pooling compatibility
- Override StrandsWorkflow.reset() to create fresh agents per task
- Simplify GAIA evaluator and remove unused methods
- Enable seamless scaling from single task to 1000+ parallel tasks
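
The reset-per-task idea can be illustrated with a small pool of reusable workflows. This is a sketch under stated assumptions: `ToyAgent`, `ToyWorkflow`, and `run_task` are hypothetical names, and the pool here is a plain `queue.Queue`, not `AgentWorkflowEngine`.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent that accumulates per-task state."""
    history: list[str] = field(default_factory=list)


class ToyWorkflow:
    """Hypothetical pooled workflow that is reused across tasks."""

    def __init__(self) -> None:
        self.agent = ToyAgent()

    def reset(self) -> None:
        # Create a fresh agent so nothing leaks from the previous task.
        self.agent = ToyAgent()

    def run(self, task: str) -> list[str]:
        self.reset()
        self.agent.history.append(task)
        return list(self.agent.history)


# Two workflows serve four tasks, so each workflow is reused.
pool: "queue.Queue[ToyWorkflow]" = queue.Queue()
for _ in range(2):
    pool.put(ToyWorkflow())


def run_task(task: str) -> list[str]:
    workflow = pool.get()      # borrow a workflow from the pool
    try:
        return workflow.run(task)
    finally:
        pool.put(workflow)     # return it for the next task


with ThreadPoolExecutor(max_workers=2) as executor:
    outputs = list(executor.map(run_task, ["a", "b", "c", "d"]))
```

Each output contains only its own task, even though the workflows are shared, because `reset()` replaces the agent instead of reusing it.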

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove unused variables: results, trajectory_file, tokenizer, has_tool_use
- Fix import ordering and code formatting per ruff standards
- Clean up type annotations to use modern Python 3.10+ syntax

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
yayashuxue force-pushed the feature/strands-toolcall branch from dd7961c to 775d0cd on September 6, 2025 00:04
- Remove gaia.json dataset (users should download locally via scripts)
- Remove custom calculator_tool.py (use strands_tools.calculator instead)
- Remove backup files and output files (following clean PR practices)
- Standardize calculator usage across all examples

Aligns with PR rllm-org#205 clean practices - only essential code in repo.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
yayashuxue force-pushed the feature/strands-toolcall branch from 775d0cd to d7c92c6 on September 6, 2025 00:15
yayashuxue commented

@jeffreysijuntan should be good to merge :)

jeffreysijuntan merged commit c4f50a6 into rllm-org:v0.2 on Sep 11, 2025
1 check passed