fix(backend/copilot): raise baseline tool-round limit to 100 + graceful finish hint (#12892)
Conversation
…ul finish hint

The old `_MAX_TOOL_ROUNDS = 30` cap cuts off legitimate long-running agent turns (multi-bug fixes, feature implementations via `bash_exec` sequences) mid-tool with no summary, which surfaces as "Exceeded 30 tool-call rounds without a final response" and looks to the user like the assistant "lost" its work.

- Bump the limit to 100 — long agent turns routinely exceed 30 rounds.
- Pass `last_iteration_message` so round N gets a "stop calling tools, wrap up" hint and the user always gets a final summary instead of a cold cutoff.
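A minimal sketch of the described mechanism, assuming a hypothetical `llm_call` and message shape (the real implementation lives in `tool_call_loop`, not here):

```python
from typing import Any, Awaitable, Callable

_MAX_TOOL_ROUNDS = 100  # was 30
# Hint wording is illustrative; the real constant is _LAST_ITERATION_HINT.
_LAST_ITERATION_HINT = (
    "This is your final round: stop calling tools and summarize what you did."
)


async def run_tool_rounds(
    llm_call: Callable[..., Awaitable[dict[str, Any]]],
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
) -> str:
    """Run up to _MAX_TOOL_ROUNDS rounds, hinting the model to wrap up on the last."""
    for round_no in range(1, _MAX_TOOL_ROUNDS + 1):
        call_messages = messages
        if round_no == _MAX_TOOL_ROUNDS:
            # Append the wrap-up hint on a copy so it never pollutes stored history.
            call_messages = [*messages, {"role": "user", "content": _LAST_ITERATION_HINT}]
        response = await llm_call(call_messages, tools=tools)
        if not response.get("tool_calls"):
            return response.get("content", "")  # natural finish: plain text
        messages = [*messages, response]  # the real loop also appends tool results
    return ""  # only reached if the model ignored the hint
```

On every round before the last, behavior is unchanged; only round N gets the hint, so a compliant model always produces a final summary.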
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
Walkthrough

The tool-call loop and Claude SDK now use the shared `config.claude_agent_max_turns` limit.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant BaselineService
    participant Model
    participant Tool
    Client->>BaselineService: Submit request (uses ChatConfig)
    BaselineService->>Model: Initial prompt (tools allowed)
    loop up to config.claude_agent_max_turns
        Model->>BaselineService: Proposes tool call
        BaselineService->>Tool: Execute tool
        Tool-->>BaselineService: Tool result
        BaselineService->>Model: Provide tool result (next iteration)
    end
    alt final iteration
        BaselineService->>Model: Append last_iteration_message and call with no tools (force text)
    end
    Model-->>Client: Final text response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

✅ Passed checks (4 passed)
🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.
🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections.
Summary: 2 conflict(s), 0 medium risk, 6 low risk (out of 8 PRs with file overlap). Auto-generated on push.
🧹 Nitpick comments (1)
autogpt_platform/backend/backend/copilot/baseline/service.py (1)
125-133: LGTM — sensible baseline bump with graceful wrap-up hint.

Raising `_MAX_TOOL_ROUNDS` to 100 aligns with the observed production traffic (multi-step turns legitimately exceeding 30), and the `_LAST_ITERATION_HINT` wired through `last_iteration_message` neatly converts the hard cut-off into a user-visible summary on the final round. The existing error path at lines 1906–1914 interpolates `_MAX_TOOL_ROUNDS`, so the user-facing message auto-updates.

One minor consideration: with the ceiling 3.3× higher, a runaway tool-loop now burns up to ~3.3× more tokens/cost before the guard fires. If not already covered, consider a per-turn cost/token budget (or dashboards/alerts on turns crossing, say, 50 rounds) so the frontend retry cascade follow-up has signal to act on (see the sketch below). No change required for this PR.
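If that monitoring were added, it could be as small as a threshold check in the loop body. A sketch only: the threshold value, function name, and `request_id`/token plumbing are all assumptions, not existing code:

```python
import logging

logger = logging.getLogger(__name__)

_ROUND_ALERT_THRESHOLD = 50  # hypothetical: half of the 100-round ceiling


def warn_if_long_turn(round_no: int, tokens_used: int, request_id: str) -> None:
    """Log an ops-visible warning once a turn crosses the alert threshold."""
    if round_no < _ROUND_ALERT_THRESHOLD:
        return
    # %s keeps interpolation deferred, per the repo's logging guideline.
    logger.warning(
        "tool-call turn crossed %s rounds (round=%s, tokens=%s, request_id=%s)",
        _ROUND_ALERT_THRESHOLD,
        round_no,
        tokens_used,
        request_id,
    )
```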
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/baseline/service.py` around lines 125 - 133, Although this PR raises _MAX_TOOL_ROUNDS to 100 and adds _LAST_ITERATION_HINT appropriately, add lightweight monitoring and an optional per-turn budget enforcement: emit an event/metric when a turn's tool round count exceeds a threshold (e.g., 50) and log the current round count, token usage and request id so dashboards/alerts can act; optionally enforce a soft cap by checking _MAX_TOOL_ROUNDS and current round in the same code path that uses last_iteration_message (e.g., where last_iteration_message is constructed/used) to abort/trim execution if cumulative token usage exceeds a configured per-turn token_budget.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e13166e7-45d9-406e-a05b-bc8d6c8ba4f9
📒 Files selected for processing (1)
autogpt_platform/backend/backend/copilot/baseline/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: type-check (3.11)
- GitHub Check: test (3.13)
- GitHub Check: type-check (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: type-check (3.12)
- GitHub Check: end-to-end tests
- GitHub Check: Analyze (typescript)
- GitHub Check: Analyze (python)
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Use `poetry run ...` command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like `openpyxl`
Use absolute imports with `from backend.module import ...` for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use `%s` for deferred interpolation in `debug` log statements for efficiency; use f-strings elsewhere for readability (e.g., `logger.debug("Processing %s items", count)` vs `logger.info(f"Processing {count} items")`)
Sanitize error paths by using `os.path.basename()` in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use `transaction=True` for Redis pipelines to ensure atomicity on multi-step operations
Use `max(0, value)` guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/copilot/baseline/service.py
🧠 Learnings (15)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/frontend/src/app/api/openapi.json:14576-14577
Timestamp: 2026-04-22T05:58:28.595Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Process convention: When adding new CoPilot tool response models and updating ToolResponseUnion in backend/api/features/chat/routes.py, regenerate the frontend OpenAPI schema via `poetry run export-api-schema` (do not hand-edit autogpt_platform/frontend/src/app/api/openapi.json).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T05:58:28.595Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/frontend/src/app/api/openapi.json:14576-14577
Timestamp: 2026-04-22T05:58:28.595Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Process convention: When adding new CoPilot tool response models and updating ToolResponseUnion in backend/api/features/chat/routes.py, regenerate the frontend OpenAPI schema via `poetry run export-api-schema` (do not hand-edit autogpt_platform/frontend/src/app/api/openapi.json).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (1)
autogpt_platform/backend/backend/copilot/baseline/service.py (1)
1761-1762: `tool_call_loop` signature confirmed to support `last_iteration_message`.

The parameter is present in the function signature at line 165 of `tool_call_loop.py`, documented in the docstring, and actively used in the implementation (lines 210–218). The test `test_last_iteration_message_appended` validates the behavior.
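For readers without the file open, the verified wiring looks roughly like this; a sketch only, with everything other than the `last_iteration_message` and `max_iterations` parameters assumed rather than copied from the source:

```python
from collections.abc import AsyncIterator, Sequence
from typing import Any


async def tool_call_loop(
    messages: list[dict[str, Any]],
    tools: Sequence[dict[str, Any]],
    *,
    max_iterations: int,
    last_iteration_message: str | None = None,
) -> AsyncIterator[Any]:
    """Drive tool rounds; on the final round, append last_iteration_message as a hint."""
    # Body elided: see backend/util/tool_call_loop.py (the hint is used at lines 210-218).
    yield
```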
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@            Coverage Diff             @@
##              dev   #12892      +/-   ##
==========================================
- Coverage   67.48%   67.44%   -0.04%
==========================================
  Files        1906     1906
  Lines      146916   146973      +57
  Branches    15396    15400       +4
==========================================
- Hits        99152    99133      -19
- Misses      44801    44871      +70
- Partials     2963     2969       +6
```
Flags with carried forward coverage won't be shown.
🧪 E2E Test Report

Date: 2026-04-23

PR under test: two-line change in …
Scenarios
1. Config wiring — loop now runs up to 100 rounds (was 30) — PASS
2. Graceful finish on round 100 — PASS
3. No regression in short-turn copilot — PASS
4. Unit tests — PASS
Summary
Verdict: APPROVE. No bugs found. No fixes pushed.
…eline

Mirrors the baseline bump in the SDK path so long agent runs aren't cut short on that path either. The SDK already wraps up more gracefully than baseline's hard cliff (the Claude Agent SDK honours `max_turns` with a summary attempt), and a per-query $10 budget guard still prevents runaway cost. The env var `CHAT_CLAUDE_AGENT_MAX_TURNS` remains the operator escape hatch.
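A hedged sketch of the shared knob: the field name and env var come from the commit text, while the `pydantic-settings` base class and default value are assumptions:

```python
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class ChatConfig(BaseSettings):
    """Chat/CoPilot settings; CHAT_-prefixed env vars override the defaults."""

    model_config = SettingsConfigDict(env_prefix="CHAT_")

    # One ceiling shared by the baseline tool loop and the Claude Agent SDK path.
    # Operators override via CHAT_CLAUDE_AGENT_MAX_TURNS.
    claude_agent_max_turns: int = Field(default=100)
```

With this shape, `ChatConfig().claude_agent_max_turns` picks up `CHAT_CLAUDE_AGENT_MAX_TURNS` when set and falls back to the default otherwise.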
…aceful end

Before: baseline had a hard-coded `_MAX_TOOL_ROUNDS = 100`; on hit it emitted `StreamError(baseline_tool_round_limit)` — which surfaced in the UI as a red "Exceeded 100 tool-call rounds" error telling the user to retry, even when the model had already produced a useful streamed response (see Discord bug in session 661ba0cc-a905-4c66-bf11-61eb5423d775).

- Route baseline through the existing `config.claude_agent_max_turns` so a single `CHAT_CLAUDE_AGENT_MAX_TURNS` env var tunes both baseline and SDK.
- `tool_call_loop`: on the last iteration, drop `tools` (keep the hint). With no tools available the model is physically forced to return text, so the loop always reaches `finished_naturally=True` and no truncation (see the sketch below).
- Remove the hit-limit `StreamError` emit entirely — the model's final summary is the response, not an error. The generic exception path still surfaces real failures via `baseline_error`.
- Extend `test_last_iteration_message_appended` to also assert tools are dropped on the last round.
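The drop-tools step from the second bullet, as a hedged sketch; `llm_call` and the helper shape are illustrative stand-ins, while the hint-append and empty-tools behavior follow the commit text:

```python
from typing import Any, Awaitable, Callable, Sequence


async def call_model_for_round(
    llm_call: Callable[..., Awaitable[dict[str, Any]]],
    messages: list[dict[str, Any]],
    tools: Sequence[dict[str, Any]],
    iteration: int,
    max_iterations: int,
    last_iteration_message: str | None,
) -> dict[str, Any]:
    """On the last round, append the hint and offer no tools; the model can only emit text."""
    is_last = last_iteration_message is not None and iteration == max_iterations
    if is_last:
        # Hint goes on a copy so it is not persisted into session history.
        messages = [*messages, {"role": "user", "content": last_iteration_message}]
    return await llm_call(messages, tools=[] if is_last else list(tools))
```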
🧹 Nitpick comments (3)
autogpt_platform/backend/backend/util/tool_call_loop.py (1)
206-226: Final-iteration behavior is sound; minor note on tool execution.

The `is_last` gate correctly forces a text-only turn by passing `iteration_tools=[]` to the LLM while appending the finishing hint on a defensive copy of `messages`. Two small observations:

- `is_last` evaluates to `str | None | bool` (due to short-circuit on `last_iteration_message`), not strictly `bool`. Used as a truthy gate so it works, but a `bool(...)` wrap or an explicit `is not None` would be clearer for readers/type tooling (see the sketch below).
- If the model ignores the hint and still emits `tool_calls` on the last iteration, lines 247–260 will still dispatch them via the original `tools` sequence (the LLM was told it had none, yet their handlers run). Given the loop then terminates at `max_iterations` with `finished_naturally=False`, the tool side-effects happen but the model never gets to see the results. Probably acceptable for this PR's graceful-finish goal, but worth confirming you want those trailing tool calls to execute rather than be dropped.
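A sketch of the explicit-boolean variant from the first observation; the surrounding names are assumed from the review, not copied from the file:

```python
def compute_is_last(
    last_iteration_message: str | None,
    iteration: int,
    max_iterations: int,
) -> bool:
    """Explicitly bool, avoiding the str | None | bool union from `and` short-circuiting."""
    return last_iteration_message is not None and iteration == max_iterations
```

Gating tool dispatch on the same value (rather than the original `tools`) would also address the second observation.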
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/util/tool_call_loop.py` around lines 206-226: the `is_last` gate currently can be `str | None | bool`; make it an explicit boolean (e.g., `is_last = bool(last_iteration_message) and iteration == max_iterations`, or use `last_iteration_message is not None`) to satisfy readers/type checkers, and ensure downstream tool dispatch uses `iteration_tools` (the empty sequence set when `is_last`) rather than the original `tools`, so any `tool_calls` produced by `llm_call` on the final turn are not executed; update the logic around `llm_call` and the later dispatch (the code that processes `tool_calls`) to check `is_last` or use `iteration_tools` when deciding whether to run handlers, so final-iteration tool side-effects are dropped.

autogpt_platform/backend/backend/copilot/baseline/service.py (2)
1754-1784: Defensive note: `-1` (infinite) would break the mid-loop drain gate.

`max_tool_rounds = config.claude_agent_max_turns` is then used as both `max_iterations=` (where `tool_call_loop` treats `-1` as infinite) and in the `iterations >= max_tool_rounds` check at line 1783. If the config is ever set to `-1`, that check becomes always-true and the pending-message mid-loop drain is effectively disabled (every yield is treated as the "final" one). Today the default is `100`, so this is a latent edge case, but a small guard would future-proof it:

Proposed guard
```diff
- max_tool_rounds = config.claude_agent_max_turns
+ max_tool_rounds = config.claude_agent_max_turns
+ # max_tool_rounds may be -1 (infinite) — in that case there is no
+ # "final yield" to skip, so mid-loop drains should always run.
  async for loop_result in tool_call_loop(
      ...
  ):
      ...
      is_final_yield = (
          loop_result.finished_naturally
-         or loop_result.iterations >= max_tool_rounds
+         or (max_tool_rounds > 0 and loop_result.iterations >= max_tool_rounds)
      )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/baseline/service.py` around lines 1754 - 1784, The check that sets is_final_yield uses max_tool_rounds (from config.claude_agent_max_turns) which may be -1 to mean "infinite", causing iterations >= max_tool_rounds to be always true; fix by normalizing or guarding that value before the comparison: compute a local safe_max (e.g., None or math.inf) or check that max_tool_rounds is non-negative before comparing (for example only evaluate loop_result.iterations >= max_tool_rounds when max_tool_rounds >= 0), so tool_call_loop (referenced here) and the is_final_yield logic do not treat every yield as final when config.claude_agent_max_turns == -1.
1905-1916: Graceful-finish fallback looks good; consider a lightweight UX signal when the model ignores the hint.

Dropping the hard `StreamError` on budget exhaustion is well-motivated — with `iteration_tools=[]` forcing a final text response, the common path now yields a clean summary. However, when the model is non-compliant (still returns tool_calls on the last round), the user sees only whatever partial output was streamed and a normal `StreamFinish`, with no indication the turn was truncated. Logging a warning server-side is useful for ops, but the frontend retry cascade you mentioned as a separate follow-up will still be blind to this case.

Optional: emit a subtle assistant-side hint (e.g. a short trailing text delta like "(turn ended at tool-call budget — ask me to continue)") in this branch so the user knows why their run stopped without getting a red error (a sketch follows). Not blocking for this PR.
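If that optional signal were adopted, it could look like the sketch below; the event dict and helper are stand-ins, since the real stream event classes are only named in this review, not defined:

```python
import logging
from collections.abc import AsyncIterator
from typing import Any

logger = logging.getLogger(__name__)

_TRUNCATION_HINT = "(turn ended at tool-call budget — ask me to continue)"


async def emit_truncation_hint(
    loop_result: Any, request_id: str
) -> AsyncIterator[dict[str, Any]]:
    """Yield one subtle trailing delta when the model ignored the wrap-up hint."""
    if loop_result and not loop_result.finished_naturally:
        logger.warning("turn %s hit the tool-call budget mid-turn", request_id)
        # Stand-in shape for the real text-delta stream event model.
        yield {"type": "text-delta", "delta": _TRUNCATION_HINT}
```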
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/baseline/service.py` around lines 1905 - 1916, In the branch handling "if loop_result and not loop_result.finished_naturally" add a lightweight assistant-facing hint so the frontend/user knows the turn was truncated by the tool-call budget: after the logger.warning (in service.py where loop_result and iteration_tools=[] are referenced) emit a short trailing text delta (for example "_(turn ended at tool-call budget — ask me to continue)_") via the same streaming/finish mechanism used elsewhere (the codepath that currently returns a normal StreamFinish), rather than only logging — use the existing stream/response helper to append this final assistant message so the frontend receives the subtle UX signal while preserving the graceful finish.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cea64bb2-4966-4a14-a886-123868a71d62
📒 Files selected for processing (3)
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (typescript)
- GitHub Check: Check PR Status
- GitHub Check: end-to-end tests
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: type-check (3.12)
🧰 Additional context used
📓 Path-based instructions (3)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Use `poetry run ...` command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like `openpyxl`
Use absolute imports with `from backend.module import ...` for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use `%s` for deferred interpolation in `debug` log statements for efficiency; use f-strings elsewhere for readability (e.g., `logger.debug("Processing %s items", count)` vs `logger.info(f"Processing {count} items")`)
Sanitize error paths by using `os.path.basename()` in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use `transaction=True` for Redis pipelines to ensure atomicity on multi-step operations
Use `max(0, value)` guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with `poetry run format`
Files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/backend/**/*_test.py
📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)
autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using the `*_test.py` naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use `AsyncMock` from `unittest.mock` for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with `@pytest.mark.xfail` before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use `poetry run pytest path/to/test.py --snapshot-update`; always review snapshot changes with `git diff` before committing
Files:
autogpt_platform/backend/backend/util/tool_call_loop_test.py
🧠 Learnings (20)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:43.495Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop.py
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop_test.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.
Applied to files:
autogpt_platform/backend/backend/util/tool_call_loop_test.py
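For illustration, a minimal sketch of that sequential rechain, with hypothetical names and dict-shaped entries standing in for the real JSONL records:

```python
def rechain_tail(tail: list[dict], anchor_uuid: str) -> None:
    """Relink every tail entry so the parent->child graph stays valid.

    Each entry's original parentUuid may point at an entry that was
    absorbed into the compressed prefix, so the whole tail is rechained
    sequentially, not just the first entry.
    """
    prev = anchor_uuid  # uuid of the last entry kept in the prefix
    for entry in tail:
        entry["parentUuid"] = prev
        prev = entry["uuid"]


# One assistant turn spanning two entries (thinking + tool_use):
tail = [
    {"uuid": "t1", "parentUuid": "absorbed-a"},
    {"uuid": "t2", "parentUuid": "absorbed-b"},
]
rechain_tail(tail, anchor_uuid="prefix-tail")
assert tail[1]["parentUuid"] == "t1"  # second entry chains to the first
```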
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
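A skeleton of that exit-path guarantee, assuming simplified state dicts and a `work` coroutine standing in for the real sub-agent call:

```python
from contextvars import ContextVar

TASK_DEPTH: ContextVar[int] = ContextVar("task_depth", default=0)


def absorb_inner_usage(parent: dict, inner: dict) -> None:
    # Placeholder for the usage roll-up; real code merges token counters.
    parent["tokens"] = parent.get("tokens", 0) + inner.get("tokens", 0)


async def run_task_subagent(parent: dict, inner: dict, work) -> dict:
    token = TASK_DEPTH.set(TASK_DEPTH.get() + 1)
    try:
        try:
            return {"ok": await work()}
        except Exception as exc:  # converted into an error payload, not raised
            return {"error": str(exc)}
    finally:
        # Guaranteed on every exit path, including CancelledError/SystemExit:
        TASK_DEPTH.reset(token)
        absorb_inner_usage(parent, inner)
```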
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
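The pattern reads roughly like this sketch (event names and state fields are simplified stand-ins, not the production types):

```python
from dataclasses import dataclass, field


@dataclass
class StreamState:
    reasoning_open: bool = False
    events: list = field(default_factory=list)


def close_reasoning_block_if_open(state: StreamState) -> None:
    # One helper covers all four call sites: text branch, tool_calls
    # branch, stream-end, and the exception path.
    if state.reasoning_open:
        state.events.append("reasoning-end")
        state.reasoning_open = False


def finish_stream(state: StreamState) -> None:
    try:
        ...  # the streaming loop would run here
    finally:
        # Matched end events are guaranteed before the finish-step
        # event on both normal and exception paths.
        close_reasoning_block_if_open(state)
        state.events.append("text-end")
        state.events.append("finish-step")
```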
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
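A toy retry loop under the same counting convention, so the "total attempts" semantics are explicit (names are illustrative):

```python
import asyncio


async def stream_with_retries(attempt, max_transient: int = 3):
    """max_transient counts TOTAL attempts, including the initial one."""
    transient_retries = 0
    while True:
        try:
            return await attempt()
        except ConnectionError:
            transient_retries += 1  # pre-increment before the guard
            if transient_retries >= max_transient:
                raise  # with 3: initial attempt + 2 retries, then give up
            await asyncio.sleep(2 ** (transient_retries - 1))  # 1s, 2s backoff
```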
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-08T17:27:26.657Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/original_autogpt/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:27:26.657Z
Learning: The execution flow traces: `__main__.py` → `cli()` → `cli.py:run()` → `run_auto_gpt()` → `main.py:run_auto_gpt()` (build config, set up storage, load/create agent) → `main.py:run_interaction_loop(agent)` (propose, display, feedback, execute, loop).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (2)
autogpt_platform/backend/backend/util/tool_call_loop_test.py (1)
417-468: LGTM — assertions correctly pin the new contract.
`captured_tools` plus the first/last-iteration assertions tightly couple the test to the new "drop tools + append hint on final iteration" behavior, which is exactly what the loop change needs to prevent regression.
autogpt_platform/backend/backend/copilot/baseline/service.py (1)
124-132: Hint copy reads well.
Clear directive ("do not call any more tools"), asks for summary + remaining work + continuation advice — matches the three things the frontend needs for a graceful wrap-up.
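For reference, a sketch of the wiring this comment describes. The constant text below paraphrases the three required elements rather than quoting the production copy, and the loop signature is assumed from the PR description:

```python
# Hypothetical constant: paraphrases the three elements the review
# calls out (summary, remaining work, continuation advice).
_LAST_ITERATION_HINT = (
    "This is the final tool round: do not call any more tools. "
    "Summarize what you completed, note any remaining work, and tell "
    "the user how to continue in a follow-up message."
)


async def run_turn(tool_call_loop, messages, tools, config):
    # On the final round the loop is expected to drop the tool
    # definitions and append the hint, forcing a plain-text answer.
    return await tool_call_loop(
        messages=messages,
        tools=tools,
        max_iterations=config.agent_max_turns,
        last_iteration_message=_LAST_ITERATION_HINT,
    )
```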
…turns
The field now controls both the baseline and Claude Agent SDK paths, so the Claude-specific name is misleading. Primary env var is now CHAT_AGENT_MAX_TURNS; the old CHAT_CLAUDE_AGENT_MAX_TURNS stays accepted via validation_alias so existing deployments don't break.
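A sketch of how that alias can be expressed with pydantic-settings, assuming `ChatConfig` extends `BaseSettings`:

```python
from pydantic import AliasChoices, Field
from pydantic_settings import BaseSettings


class ChatConfig(BaseSettings):
    # New canonical env var listed first; the legacy name stays
    # accepted so existing deployments keep working after the rename.
    agent_max_turns: int = Field(
        default=100,
        validation_alias=AliasChoices(
            "CHAT_AGENT_MAX_TURNS",
            "CHAT_CLAUDE_AGENT_MAX_TURNS",  # legacy alias
        ),
    )
```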
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/baseline/service.py`:
- Around line 1905-1916: When budget is exhausted (loop_result is truthy and
loop_result.finished_naturally is False) emit and persist a short non-error
terminal note before sending the StreamFinish so the client sees an explanatory
wrap-up; specifically, construct a brief message (e.g., "Tool budget exhausted —
ending turn with final summary.") and send it via the existing streaming/event
API used in this service (same channel that produces streamed output) and also
persist it to the conversation/history store so it appears in the UI, then
proceed to produce the StreamFinish as before; update the branch inside the if
that checks loop_result.finished_naturally (using loop_result.iterations for
context if needed) to perform these two actions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f356fd1b-82a1-4f97-9df1-c8d3ffc92495
📒 Files selected for processing (5)
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: test (3.12)
- GitHub Check: type-check (3.12)
- GitHub Check: type-check (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: type-check (3.13)
- GitHub Check: lint
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: Check PR Status
- GitHub Check: end-to-end tests
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (typescript)
🧰 Additional context used
📓 Path-based instructions (3)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Use `poetry run ...` command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like `openpyxl`
Use absolute imports with `from backend.module import ...` for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use `%s` for deferred interpolation in `debug` log statements for efficiency; use f-strings elsewhere for readability (e.g., `logger.debug("Processing %s items", count)` vs `logger.info(f"Processing {count} items")`)
Sanitize error paths by using `os.path.basename()` in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use `transaction=True` for Redis pipelines to ensure atomicity on multi-step operations
Use `max(0, value)` guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/backend/**/*_test.py
📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)
autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using `*_test.py` naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use `AsyncMock` from `unittest.mock` for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with `@pytest.mark.xfail` before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use `poetry run pytest path/to/test.py --snapshot-update`; always review snapshot changes with `git diff` before committing
Files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
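Reduced to a sketch, the two-ContextVar guard looks like this (helper names and error text are illustrative):

```python
from contextvars import ContextVar

_depth: ContextVar[int] = ContextVar("copilot_recursion_depth", default=0)
_limit: ContextVar[int | None] = ContextVar("copilot_recursion_limit", default=None)


def enter_copilot_call(max_recursion_depth: int) -> None:
    inherited = _limit.get()
    # First call pins the chain-wide ceiling; nested calls can only
    # lower it via min(), never raise it.
    ceiling = (
        max_recursion_depth
        if inherited is None
        else min(inherited, max_recursion_depth)
    )
    _limit.set(ceiling)
    depth = _depth.get() + 1
    if depth > ceiling:
        raise RuntimeError("copilot recursion limit exceeded")
    _depth.set(depth)


def exit_copilot_call() -> None:
    _depth.set(max(0, _depth.get() - 1))
```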
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
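The fallback order reads roughly as follows; `normalize_model_name` below is a toy stand-in for the real validator:

```python
from dataclasses import dataclass


def normalize_model_name(name: str) -> str:
    # Toy validator: accept only Claude slugs, reject anything else.
    if name.startswith(("claude-", "anthropic/claude-")):
        return name
    raise ValueError(f"unsupported model: {name}")


@dataclass
class Config:
    thinking_advanced_model: str = "claude-opus-4-6"
    thinking_standard_model: str = "claude-sonnet-4-6"


def resolve_model(ld_value: str, tier: str, config: Config) -> str:
    try:
        return normalize_model_name(ld_value)
    except ValueError as ld_error:
        fallback = (
            config.thinking_advanced_model
            if tier == "advanced"
            else config.thinking_standard_model
        )
        try:
            return normalize_model_name(fallback)
        except ValueError:
            # Config default is also invalid: a deployment-level
            # misconfiguration, so surface the original LD error.
            raise ld_error
```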
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12604
File: autogpt_platform/backend/backend/copilot/sdk/security_hooks.py:165-171
Timestamp: 2026-03-30T11:49:37.770Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/security_hooks.py`, the `web_search_count` and `total_tool_call_count` circuit-breaker counters in `create_security_hooks` are intentionally per-turn (closure-local), not per-session. Hooks are recreated per stream invocation in `service.py`, so counters reset each turn. This is an accepted v1 design: it caps a single runaway turn (incident d2f7cba3: 179 WebSearch calls, $20.66). True per-session persistence via Redis is deferred to a later iteration. Do not flag these as a per-session vs. per-turn mismatch bug.
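A sketch of the closure-local counter design (limits and error type are illustrative):

```python
def create_security_hooks(max_web_searches: int = 20, max_tool_calls: int = 200):
    # Closure-local counters: hooks are recreated per stream invocation,
    # so these reset each turn. That caps a single runaway turn, not the
    # whole session (per-session persistence is deferred).
    web_search_count = 0
    total_tool_call_count = 0

    def pre_tool_use(tool_name: str) -> None:
        nonlocal web_search_count, total_tool_call_count
        total_tool_call_count += 1
        if tool_name == "WebSearch":
            web_search_count += 1
        if (
            web_search_count > max_web_searches
            or total_tool_call_count > max_tool_calls
        ):
            raise PermissionError(f"circuit breaker tripped on {tool_name}")

    return pre_tool_use
```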
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-27T08:39:45.696Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12592
File: autogpt_platform/frontend/AGENTS.md:1-3
Timestamp: 2026-03-27T08:39:45.696Z
Learning: In Significant-Gravitas/AutoGPT, Claude is the primary coding agent. AGENTS.md files intentionally retain Claude-specific wording (e.g., "CLAUDE.md - Frontend", "This file provides guidance to Claude Code") even though AGENTS.md is the canonical cross-agent instruction source. Do not flag Claude-specific titles or phrasing in AGENTS.md files as issues.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/config.py
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-22T05:58:28.595Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/frontend/src/app/api/openapi.json:14576-14577
Timestamp: 2026-04-22T05:58:28.595Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Process convention: When adding new CoPilot tool response models and updating ToolResponseUnion in backend/api/features/chat/routes.py, regenerate the frontend OpenAPI schema via `poetry run export-api-schema` (do not hand-edit autogpt_platform/frontend/src/app/api/openapi.json).
Applied to files:
autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-13T14:19:19.341Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12740
File: autogpt_platform/frontend/src/app/api/openapi.json:0-0
Timestamp: 2026-04-13T14:19:19.341Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
When adding new CoPilot tool response models (e.g., ScheduleListResponse, ScheduleDeletedResponse), update backend/api/features/chat/routes.py to include them in the ToolResponseUnion so the frontend’s autogenerated openapi.json dummy export (/api/chat/schema/tool-responses) exposes them for codegen. Do not hand-edit frontend/src/app/api/openapi.json.
Applied to files:
autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-08T17:27:26.657Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/original_autogpt/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:27:26.657Z
Learning: The execution flow traces: `__main__.py` → `cli()` → `cli.py:run()` → `run_auto_gpt()` → `main.py:run_auto_gpt()` (build config, set up storage, load/create agent) → `main.py:run_interaction_loop(agent)` (propose, display, feedback, execute, loop).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (6)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
3324-3324: LGTM — SDK now uses the unified turn-limit config.
This correctly moves the SDK `max_turns` budget onto `config.agent_max_turns`, matching the shared baseline/SDK configuration intent.
autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py (1)
1037-1037: LGTM — test fixture matches the renamed config field.
Using `agent_max_turns` here keeps the retry integration patches aligned with `stream_chat_completion_sdk`.
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py (2)
211-213: LGTM — default assertion follows the unified turn budget.
The test now targets `agent_max_turns` and matches the new default of `100`.
507-523: LGTM — validator coverage was updated cleanly.
The zero, negative, upper-bound, and accepted-boundary cases all exercise the renamed `agent_max_turns` field.
autogpt_platform/backend/backend/copilot/baseline/service.py (1)
1754-1763: LGTM — baseline now uses the shared turn budget and final-round hint.
This cleanly removes the hard-coded round cap from the baseline path and reuses `tool_call_loop`'s existing final-iteration hook.
autogpt_platform/backend/backend/copilot/config.py (1)
210-222: The refactoring of `claude_agent_max_turns` to `agent_max_turns` is complete. A repository-wide search found no stale programmatic references to the old field name, confirming that all Python-level usages have been updated. The legacy env var `CHAT_CLAUDE_AGENT_MAX_TURNS` is properly mapped via `validation_alias`, so no further configuration is needed.
Addresses CodeRabbit's silent-finish concern without spamming the happy path. Emit a short fallback note only when ``state.assistant_text`` is empty — i.e. the forced-text last round gave the user zero visible response. If the model obeyed the hint and produced its own summary, we stay quiet.
♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/baseline/service.py (1)
1917-1929: ⚠️ Potential issue | 🟡 Minor
Check terminal-round text, not total turn text, before suppressing the fallback.
`state.assistant_text` includes earlier tool-round chatter. If the model streamed any earlier text but the final budget round produced no summary, Line 1920 suppresses the fallback and the turn still ends without an explanation.
Suggested fix
```diff
     loop_result_holder: list[Any] = [None]
     loop_task: asyncio.Task[None] | None = None
+    last_nonfinal_assistant_len = 0
+    terminal_round_had_visible_text = True

     async def _run_tool_call_loop() -> None:
+        nonlocal last_nonfinal_assistant_len, terminal_round_had_visible_text
         # Read/write the current session via ``_session_holder`` so this
         # closure doesn't need to ``nonlocal session`` — pyright can't narrow
@@
             if is_final_yield:
+                terminal_round_had_visible_text = bool(
+                    state.assistant_text[last_nonfinal_assistant_len:].strip()
+                )
                 continue
+            last_nonfinal_assistant_len = len(state.assistant_text)

         try:
             pending = await drain_pending_messages(session_id)
@@
-        if not state.assistant_text.strip():
+        if not terminal_round_had_visible_text:
             terminal_text = (
                 "Reached the tool-call budget for this turn. "
                 "Send a follow-up message to continue from here."
```
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ef57bcd1-c6d9-42ac-9cca-a10468477de2
📒 Files selected for processing (1)
autogpt_platform/backend/backend/copilot/baseline/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: check API types
- GitHub Check: test (3.12)
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.12)
- GitHub Check: test (3.11)
- GitHub Check: type-check (3.11)
- GitHub Check: test (3.13)
- GitHub Check: Seer Code Review
- GitHub Check: Analyze (python)
- GitHub Check: Check PR Status
- GitHub Check: end-to-end tests
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Use `poetry run ...` command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like `openpyxl`
Use absolute imports with `from backend.module import ...` for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use `%s` for deferred interpolation in `debug` log statements for efficiency; use f-strings elsewhere for readability (e.g., `logger.debug("Processing %s items", count)` vs `logger.info(f"Processing {count} items")`)
Sanitize error paths by using `os.path.basename()` in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use `transaction=True` for Redis pipelines to ensure atomicity on multi-step operations
Use `max(0, value)` guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/baseline/service.py
🧠 Learnings (21)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-15T15:30:09.706Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/tools/helpers.py:149-185
Timestamp: 2026-03-15T15:30:09.706Z
Learning: In autogpt_platform/backend/backend/copilot/tools/helpers.py, within execute_block, when InsufficientBalanceError occurs after post-execution credit charging (concurrent balance drain after pre-check passed), this is treated as a non-fatal billing leak: log at ERROR level with structured JSON fields `{"billing_leak": True, "user_id": ..., "cost": ...}` for monitoring/alerting, then return BlockOutputResponse normally. Discarding the output would worsen UX since the block already executed with potential side effects. Reuse the credit_model obtained during the pre-execution balance check (guarded by `if cost > 0 and credit_model:`) for the post-execution charge; do not perform a second get_user_credit_model call.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-08T17:27:26.657Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/original_autogpt/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:27:26.657Z
Learning: The execution flow traces: `__main__.py` → `cli()` → `cli.py:run()` → `run_auto_gpt()` → `main.py:run_auto_gpt()` (build config, set up storage, load/create agent) → `main.py:run_interaction_loop(agent)` (propose, display, feedback, execute, loop).
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (3)
autogpt_platform/backend/backend/copilot/baseline/service.py (3)
124-132: LGTM — the final-iteration hint is clear and scoped. This gives the model an explicit wrap-up instruction without changing normal tool-round behavior.
1754-1763: LGTM — dynamic budget and final-round hint are wired correctly. Using `config.agent_max_turns` keeps the baseline path aligned with the shared copilot turn budget.
1781-1784: LGTM — final-yield detection follows the configured cap. This keeps pending-message drain behavior correct when `agent_max_turns` changes.
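To make the reviewed behavior concrete, here is a toy sketch of the loop mechanics those comments describe: run up to the configured round budget, append the hint only on the final round, and withhold tools so the model must answer in text. This is not the real `tool_call_loop`; `tool_call_loop_sketch`, `call_model`, and the message shapes are illustrative assumptions.

```python
# Toy model of the final-round mechanics reviewed above; not the real
# tool_call_loop. call_model(messages, tools_allowed) returns a reply dict.
from typing import Callable

def tool_call_loop_sketch(
    call_model: Callable[[list[dict], bool], dict],
    max_rounds: int,
    last_iteration_message: str,
) -> str:
    messages: list[dict] = [{"role": "user", "content": "do the task"}]
    for round_no in range(1, max_rounds + 1):
        final = round_no == max_rounds
        if final:
            # Final budget round: append the wrap-up hint and withhold
            # tools, forcing a plain-text summary instead of a cold cutoff.
            messages.append(
                {"role": "system", "content": last_iteration_message}
            )
        reply = call_model(messages, not final)  # tools allowed unless final
        if reply.get("tool_calls") and not final:
            messages.append({"role": "tool", "content": "<tool result>"})
            continue
        return reply.get("text", "")
    return ""

# e.g.: tool_call_loop_sketch(lambda m, t: {"text": "done"}, 100, "wrap up")
```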
Extract the "should we emit a terminal note?" decision into ``_budget_exhausted_notice_text`` and cover it with unit tests so the new fallback branch isn't stranded at 0% coverage. Lifts patch coverage above codecov's 80% threshold.
…nd text
CodeRabbit flagged: `state.assistant_text` accumulates across rounds, so the previous gate would suppress the fallback whenever any earlier round produced chatter — even if the final budget round fell silent. Track `text_len_before_final_round` at every non-final yield and check only the suffix added by the terminal round.
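A small illustration of the suffix check, assuming the scaffolding around the two names the commit actually mentions (`state.assistant_text`, `text_len_before_final_round`):

```python
# Sketch of the suffix check from this commit; the surrounding code is assumed.
assistant_text = "round 1 chatter. round 2 chatter."

# Snapshot taken at every non-final yield:
text_len_before_final_round = len(assistant_text)

# ...the terminal budget round runs and appends nothing...
terminal_round_text = assistant_text[text_len_before_final_round:]

# Old gate: `if not assistant_text` let earlier chatter suppress the notice.
# New gate inspects only what the terminal round added:
should_emit_notice = not terminal_round_text.strip()
assert should_emit_notice  # a silent final round still gets the fallback
```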
Extract the Stream event construction into `_build_budget_exhausted_fallback_events` so it's unit-testable away from the async streaming machinery. Lifts codecov/patch above the 80% threshold by covering the 7 previously-untested lines that produced the fallback StreamTextStart/Delta/End triple.
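A hedged sketch of such a builder; the `Stream*` dataclasses below are stand-ins for the real event types, and the function body is assumed:

```python
# Pure builder in the spirit of _build_budget_exhausted_fallback_events.
import uuid
from dataclasses import dataclass

@dataclass
class StreamTextStart:
    id: str

@dataclass
class StreamTextDelta:
    id: str
    text: str

@dataclass
class StreamTextEnd:
    id: str

def build_budget_exhausted_fallback_events(notice: str) -> list:
    """Return the start/delta/end triple so tests can assert on it directly."""
    block_id = str(uuid.uuid4())
    return [
        StreamTextStart(id=block_id),
        StreamTextDelta(id=block_id, text=notice),
        StreamTextEnd(id=block_id),
    ]

events = build_budget_exhausted_fallback_events("Reached the tool budget.")
assert [type(e).__name__ for e in events] == [
    "StreamTextStart", "StreamTextDelta", "StreamTextEnd",
]
```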
…y=True
Sentry flagged: dropping tools (passing `tools=[]`) on the last iteration means `tool_call_loop` always takes the `finished_naturally=True` path, even when the model returns empty text. The fallback was gated on `not finished_naturally`, so an empty terminal round bypassed it entirely. Switch the gate to "iterations reached budget cap" — covers both exit paths (natural finish with empty text, and non-compliant tool-calling finish). The terminal-round-text check still scopes the notice to silent finishes.
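A sketch of the reworked gate under those constraints; only the `finished_naturally` flag and the budget-cap framing come from the commit, while the helper name and signature are invented for illustration:

```python
# The earlier gate keyed on `not finished_naturally`, which the forced
# tools=[] final round always bypassed; this keys on the round count instead.
def should_emit_fallback(
    iterations: int, max_iterations: int, terminal_round_text: str
) -> bool:
    hit_budget_cap = iterations >= max_iterations
    terminal_round_silent = not terminal_round_text.strip()
    # Covers both exit paths: a "natural" no-tools finish that returned
    # empty text, and a non-compliant finish that still tried to call tools.
    return hit_budget_cap and terminal_round_silent

assert should_emit_fallback(100, 100, "")          # silent cap hit: notice
assert not should_emit_fallback(100, 100, "done")  # cap hit but spoke: quiet
assert not should_emit_fallback(40, 100, "")       # under budget: quiet
```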

Why
On prod, longer copilot runs (complex feature implementations, multi-bug fix chains) error out with `Exceeded 30 tool-call rounds without a final response`, lose mid-stream assistant output, and the UI appears to re-dispatch an older prompt. Reported by @itsababseh in #breakage for session `661ba0cc-a905-4c66-bf11-61eb5423d775`.

Langfuse trace of that session shows 52 turns / 344 LLM calls; two turns hit exactly 30 rounds (Turn 38: implementing kill-cam/headshot juice pass; Turn 42: fixing multi-bug list). Both were legitimate, non-looping work that simply needed more rounds to complete. Round 30 fired `bash_exec`, the loop cut off cold, no summary was ever produced, and the stream surfaced `baseline_tool_round_limit`. Frontend subsequently re-dispatched the same user message several times (turns 39–41 × 3, turns 43–47 × 5 with identical prompt), which is what the user perceives as "falling back into acting on an older command."

Root cause: `_MAX_TOOL_ROUNDS = 30` has been unchanged since the baseline path was introduced (#12276). Modern agent turns with Claude Code / Kimi / Sonnet routinely need more.

What

- Bump `_MAX_TOOL_ROUNDS` from 30 → 100.
- Pass `last_iteration_message` to `tool_call_loop` so the final round receives a "stop calling tools, wrap up" system hint. The model now produces a graceful summary on the last round instead of being cut off mid-tool.

How
Two-line change in `backend/copilot/baseline/service.py`: define `_LAST_ITERATION_HINT` and wire it via the existing `last_iteration_message` kwarg on `tool_call_loop`. The shared loop already handles appending it only on the final iteration (see `tool_call_loop_test.py::test_last_iteration_message_appended`).
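A hedged reconstruction of that wiring; only `_LAST_ITERATION_HINT`, `last_iteration_message`, `tool_call_loop`, and `agent_max_turns` come from this PR, while the hint wording and the `run_turn` wrapper are illustrative:

```python
# Hedged reconstruction of the two-line change; hint wording is assumed.
_LAST_ITERATION_HINT = (
    "This is your final round: stop calling tools and summarize what you "
    "accomplished and anything left unfinished."
)

def run_turn(tool_call_loop, config, **kwargs):
    # The shared loop appends the hint only on its final iteration, per
    # tool_call_loop_test.py::test_last_iteration_message_appended.
    return tool_call_loop(
        max_rounds=config.agent_max_turns,  # replaces the hard-coded 30
        last_iteration_message=_LAST_ITERATION_HINT,
        **kwargs,
    )
```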
baseline_tool_round_limitis a separate UX issue — logging it as a follow-up.Checklist
tool_call_loop_test.pycoverslast_iteration_messagebehavior (10/10 passing)