
fix(backend/copilot): raise baseline tool-round limit to 100 + graceful finish hint #12892

Merged
majdyz merged 9 commits into dev from fix/copilot-baseline-tool-round-limit on Apr 23, 2026

Conversation

@majdyz
Contributor

@majdyz majdyz commented Apr 23, 2026

Why

On prod, longer copilot runs (complex feature implementations, multi-bug-fix chains) error out with `Exceeded 30 tool-call rounds without a final response`, lose mid-stream assistant output, and the UI appears to re-dispatch an older prompt. Reported by @itsababseh in #breakage for session 661ba0cc-a905-4c66-bf11-61eb5423d775.

Langfuse trace of that session shows 52 turns / 344 LLM calls; two turns hit exactly 30 rounds (Turn 38: implementing kill-cam/headshot juice pass; Turn 42: fixing multi-bug list). Both were legitimate, non-looping work that simply needed more rounds to complete. Round 30 fired bash_exec, the loop cut off cold, no summary was ever produced, and the stream surfaced baseline_tool_round_limit. Frontend subsequently re-dispatched the same user message several times (turns 39–41 × 3, turns 43–47 × 5 with identical prompt), which is what the user perceives as "falling back into acting on an older command."

Root cause: _MAX_TOOL_ROUNDS = 30 has been unchanged since the baseline path was introduced (#12276). Modern agent turns with Claude Code / Kimi / Sonnet routinely need more.

What

  • Raise _MAX_TOOL_ROUNDS from 30 → 100.
  • Pass last_iteration_message to tool_call_loop so the final round receives a "stop calling tools, wrap up" system hint. The model now produces a graceful summary on the last round instead of being cut off mid-tool.

How

Two-line change in backend/copilot/baseline/service.py:

  • Bump the module-level constant.
  • Define _LAST_ITERATION_HINT and wire it via the existing last_iteration_message kwarg on tool_call_loop. The shared loop already handles appending it only on the final iteration (see tool_call_loop_test.py::test_last_iteration_message_appended).
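The mechanism described above can be sketched as a toy loop (the names `_LAST_ITERATION_HINT` and the `last_iteration_message` kwarg come from the PR; this loop, the fake LLM, and the message shapes are illustrative, not the real `tool_call_loop` implementation):

```python
# Toy model of the shared tool_call_loop behavior: on the final allowed round,
# append the wrap-up hint and offer no tools, forcing a plain-text summary.
def tool_call_loop(llm_call, tools, max_rounds, last_iteration_message=None):
    messages = []
    for round_no in range(1, max_rounds + 1):
        is_last = round_no == max_rounds
        if is_last and last_iteration_message:
            messages.append({"role": "system", "content": last_iteration_message})
        # Final round gets no tools, so the model cannot keep calling them.
        reply = llm_call(messages, tools=[] if is_last else tools)
        messages.append(reply)
        if reply.get("tool_calls") is None:
            break  # model produced a final text answer
    return messages

# Fake LLM that keeps calling tools whenever any are offered.
def fake_llm(messages, tools):
    if tools:
        return {"role": "assistant", "tool_calls": ["bash_exec"]}
    return {"role": "assistant", "content": "Summary: work wrapped up."}

history = tool_call_loop(fake_llm, tools=["bash_exec"], max_rounds=3,
                         last_iteration_message="Stop calling tools; wrap up.")
```

With a budget of 3, the fake model burns two tool rounds, then the third round receives the hint with no tools and returns a summary instead of being cut off mid-tool.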

Frontend retry cascade on baseline_tool_round_limit is a separate UX issue — logging it as a follow-up.

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review
  • Existing tool_call_loop_test.py covers last_iteration_message behavior (10/10 passing)
  • No new migrations
  • No breaking changes (constant/kwarg only)

…ul finish hint

The old `_MAX_TOOL_ROUNDS = 30` cap cuts off legitimate long-running agent
turns (multi-bug fixes, feature implementations via bash_exec sequences) mid-
tool with no summary, which surfaces as "Exceeded 30 tool-call rounds without
a final response" and looks to the user like the assistant "lost" its work.

- Bump the limit to 100 — long agent turns routinely exceed 30 rounds.
- Pass `last_iteration_message` so round N gets a "stop calling tools, wrap
  up" hint and the user always gets a final summary instead of a cold cutoff.
@majdyz majdyz requested a review from a team as a code owner April 23, 2026 08:36
@majdyz majdyz requested review from 0ubbe and removed request for a team April 23, 2026 08:36
@majdyz majdyz requested a review from Bentlybro April 23, 2026 08:36
@github-project-automation github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Apr 23, 2026
@github-actions github-actions Bot added the platform/backend AutoGPT Platform - Back end label Apr 23, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

The tool-call loop and Claude SDK now use ChatConfig.agent_max_turns (default 100) as the per-turn tool-call budget. On the final allowed iteration a last_iteration_message hint is injected and no tools are offered; hitting the budget logs a warning and ends the turn gracefully instead of yielding a StreamError.

Changes

• Baseline service (autogpt_platform/backend/backend/copilot/baseline/service.py):
  Use config.agent_max_turns for the tool-loop budget; pass last_iteration_message into the tool loop; stop yielding a StreamError on budget exhaustion (log a warning and end the turn).
• Configuration (autogpt_platform/backend/backend/copilot/config.py):
  Rename claude_agent_max_turns → agent_max_turns; default increased to 100; accept the CHAT_AGENT_MAX_TURNS env var while still supporting the legacy alias.
• Claude SDK service (autogpt_platform/backend/backend/copilot/sdk/service.py):
  Switch the SDK max_turns source to config.agent_max_turns.
• Tool-call loop (autogpt_platform/backend/backend/util/tool_call_loop.py):
  On the last iteration, append last_iteration_message and call the LLM with no tools; otherwise pass the normal tools.
• Tests (autogpt_platform/backend/backend/util/tool_call_loop_test.py, autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py, autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py):
  Update tests to use agent_max_turns, assert default/value boundaries (default 100), and verify the final iteration receives empty tools and contains last_iteration_message.
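The config rename with a legacy alias can be modeled with a plain env lookup (a stdlib sketch only; the real ChatConfig is a settings model, and the legacy env-var name shown here is an assumption, since the summary only says a legacy alias is kept):

```python
import os

_ENV = "CHAT_AGENT_MAX_TURNS"          # new variable, per the summary
_LEGACY_ENV = "CHAT_CLAUDE_AGENT_MAX_TURNS"  # hypothetical legacy alias name
_DEFAULT = 100                          # new default, per the summary

def agent_max_turns() -> int:
    """Prefer the new env var, fall back to the legacy alias, else default."""
    for name in (_ENV, _LEGACY_ENV):
        value = os.environ.get(name)
        if value is not None:
            return int(value)
    return _DEFAULT
```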

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BaselineService
    participant Model
    participant Tool
    Client->>BaselineService: Submit request (uses ChatConfig)
    BaselineService->>Model: Initial prompt (tools allowed)
    loop up to config.agent_max_turns
        Model->>BaselineService: Proposes tool call
        BaselineService->>Tool: Execute tool
        Tool-->>BaselineService: Tool result
        BaselineService->>Model: Provide tool result (next iteration)
    end
    alt final iteration
        BaselineService->>Model: Append last_iteration_message and call with no tools (force text)
    end
    Model-->>Client: Final text response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/l

Suggested reviewers

  • Bentlybro
  • Pwuts
  • kcze

Poem

🐰 Hopping through prompts with a curious twitch,
I count the turns and nudge the last switch,
No tools on the final, just tidy good night,
I whisper a hint and tuck responses tight,
A soft trail of crumbs to the final light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'fix(backend/copilot): raise baseline tool-round limit to 100 + graceful finish hint' directly matches the main changes: increasing _MAX_TOOL_ROUNDS from 30 to 100 and adding a last_iteration_message hint for graceful finishing.
  • Description check: ✅ Passed. The description is directly related to the changeset, providing clear context about production issues (tool-call cutoffs at 30 rounds), the root cause, and detailed explanations of the implemented fixes across multiple files.
  • Linked Issues check: ✅ Passed. Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions
Contributor

github-actions Bot commented Apr 23, 2026

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

  • fix(copilot): prevent 524 timeout on chat deletion by deferring cleanup #12668 (Otto-AGPT · updated 6d ago)

    • autogpt_platform/backend/backend/api/features/library/db.py (5 conflicts, ~67 lines)
    • autogpt_platform/backend/backend/api/features/library/model.py (1 conflict, ~4 lines)
    • autogpt_platform/backend/backend/api/features/subscription_routes_test.py (5 conflicts, ~376 lines)
    • autogpt_platform/backend/backend/api/features/v1.py (9 conflicts, ~88 lines)
    • autogpt_platform/backend/backend/copilot/baseline/service.py (2 conflicts, ~15 lines)
    • autogpt_platform/backend/backend/copilot/model_test.py (1 conflict, ~5 lines)
    • autogpt_platform/backend/backend/copilot/prompting.py (1 conflict, ~5 lines)
    • autogpt_platform/backend/backend/copilot/sdk/service.py (3 conflicts, ~51 lines)
    • autogpt_platform/backend/backend/copilot/transcript.py (1 conflict, ~11 lines)
    • autogpt_platform/backend/backend/data/credit.py (5 conflicts, ~727 lines)
    • autogpt_platform/backend/backend/data/credit_subscription_test.py (14 conflicts, ~1333 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/PulseChips/usePulseChips.ts (1 conflict, ~13 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/usageHelpers.ts (1 conflict, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/BriefingTabContent.tsx (9 conflicts, ~147 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx (2 conflicts, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/ContextualActionButton/ContextualActionButton.tsx (2 conflicts, ~12 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx (2 conflicts, ~15 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/useSitrepItems.ts (4 conflicts, ~97 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useAgentStatus.ts (2 conflicts, ~10 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts (7 conflicts, ~57 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/types.ts (1 conflict, ~4 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/SubscriptionTierSection.tsx (9 conflicts, ~159 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/__tests__/SubscriptionTierSection.test.tsx (7 conflicts, ~256 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/useSubscriptionTierSection.ts (2 conflicts, ~48 lines)
    • autogpt_platform/frontend/src/app/api/openapi.json (1 conflict, ~23 lines)
    • docs/integrations/block-integrations/misc.md (1 conflict, ~5 lines)
  • feat(platform/copilot): Reduce time to first output #12828 (Pwuts · updated 6d ago)

    • 📁 autogpt_platform/backend/backend/
      • api/features/chat/routes.py (3 conflicts, ~75 lines)
      • copilot/config.py (1 conflict, ~17 lines)
      • copilot/sdk/security_hooks.py (1 conflict, ~12 lines)
      • copilot/sdk/service.py (2 conflicts, ~77 lines)

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections.

Summary: 2 conflict(s), 0 medium risk, 6 low risk (out of 8 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
autogpt_platform/backend/backend/copilot/baseline/service.py (1)

125-133: LGTM — sensible baseline bump with graceful wrap-up hint.

Raising _MAX_TOOL_ROUNDS to 100 aligns with the observed production traffic (multi-step turns legitimately exceeding 30), and the _LAST_ITERATION_HINT wired through last_iteration_message neatly converts the hard cut-off into a user-visible summary on the final round. The existing error path at lines 1906–1914 interpolates _MAX_TOOL_ROUNDS so the user-facing message auto-updates.

One minor consideration: with the ceiling 3.3× higher, a runaway tool-loop now burns up to ~3.3× more tokens/cost before the guard fires. If not already covered, consider a per-turn cost/token budget (or dashboards/alerts on turns crossing, say, 50 rounds) so the frontend retry cascade follow-up has signal to act on. No change required for this PR.
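The reviewer's monitoring suggestion could look roughly like this (a sketch only; the threshold value, metric name, and emit hook are all hypothetical):

```python
_ROUND_ALERT_THRESHOLD = 50  # hypothetical; the review suggests "say, 50 rounds"

def check_round_budget(round_no, tokens_used, request_id, emit_metric):
    """Emit a signal once a turn crosses the alert threshold, so dashboards
    can spot runaway loops well before the hard cap at 100 fires."""
    if round_no == _ROUND_ALERT_THRESHOLD:
        emit_metric("copilot.tool_rounds.high", {
            "round": round_no,
            "tokens": tokens_used,
            "request_id": request_id,
        })

# Simulate a 60-round turn: the metric should fire exactly once, at round 50.
events = []
for r in range(1, 61):
    check_round_budget(r, tokens_used=r * 1000, request_id="req-1",
                       emit_metric=lambda name, data: events.append((name, data)))
```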

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/baseline/service.py` around lines
125 - 133, Although this PR raises _MAX_TOOL_ROUNDS to 100 and adds
_LAST_ITERATION_HINT appropriately, add lightweight monitoring and an optional
per-turn budget enforcement: emit an event/metric when a turn's tool round count
exceeds a threshold (e.g., 50) and log the current round count, token usage and
request id so dashboards/alerts can act; optionally enforce a soft cap by
checking _MAX_TOOL_ROUNDS and current round in the same code path that uses
last_iteration_message (e.g., where last_iteration_message is constructed/used)
to abort/trim execution if cumulative token usage exceeds a configured per-turn
token_budget.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e13166e7-45d9-406e-a05b-bc8d6c8ba4f9

📥 Commits

Reviewing files that changed from the base of the PR and between cf6d703 and 416aeab.

📒 Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: type-check (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: type-check (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (typescript)
  • GitHub Check: Analyze (python)
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
🧠 Learnings (15)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/frontend/src/app/api/openapi.json:14576-14577
Timestamp: 2026-04-22T05:58:28.595Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Process convention: When adding new CoPilot tool response models and updating ToolResponseUnion in backend/api/features/chat/routes.py, regenerate the frontend OpenAPI schema via `poetry run export-api-schema` (do not hand-edit autogpt_platform/frontend/src/app/api/openapi.json).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T05:58:28.595Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/frontend/src/app/api/openapi.json:14576-14577
Timestamp: 2026-04-22T05:58:28.595Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Process convention: When adding new CoPilot tool response models and updating ToolResponseUnion in backend/api/features/chat/routes.py, regenerate the frontend OpenAPI schema via `poetry run export-api-schema` (do not hand-edit autogpt_platform/frontend/src/app/api/openapi.json).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (1)
autogpt_platform/backend/backend/copilot/baseline/service.py (1)

1761-1762: tool_call_loop signature confirmed to support last_iteration_message.

The parameter is present in the function signature at line 165 of tool_call_loop.py, documented in the docstring, and actively used in the implementation (lines 210–218). The test test_last_iteration_message_appended validates the behavior.

@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 83.78378% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.44%. Comparing base (cf6d703) to head (fc96844).
⚠️ Report is 1 commit behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev   #12892      +/-   ##
==========================================
- Coverage   67.48%   67.44%   -0.04%     
==========================================
  Files        1906     1906              
  Lines      146916   146973      +57     
  Branches    15396    15400       +4     
==========================================
- Hits        99152    99133      -19     
- Misses      44801    44871      +70     
- Partials     2963     2969       +6     
| Flag | Coverage | Δ |
| --- | --- | --- |
| platform-backend | 77.40% <83.78%> | (+<0.01%) ⬆️ |
| platform-frontend-e2e | 30.00% <ø> | (-0.96%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Component | Coverage | Δ |
| --- | --- | --- |
| Platform Backend | 77.40% <83.78%> | (+<0.01%) ⬆️ |
| Platform Frontend | 30.85% <ø> | (-0.26%) ⬇️ |
| AutoGPT Libs | ∅ <ø> | (∅) |
| Classic AutoGPT | 28.43% <ø> | (ø) |

@majdyz
Contributor Author

majdyz commented Apr 23, 2026

🧪 E2E Test Report

Date: 2026-04-23
Branch: fix/copilot-baseline-tool-round-limit
Base: dev
Worktree: /Users/majdyz/Code/AutoGPT5
Stack: native dev (poetry run app + pnpm dev), deps via docker compose
LLM route: OpenRouter API key mode (CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false)

PR under test

Two-line change in autogpt_platform/backend/backend/copilot/baseline/service.py:

  • _MAX_TOOL_ROUNDS raised from 30 → 100
  • New _LAST_ITERATION_HINT constant passed via last_iteration_message= to tool_call_loop(...)

Scenarios

1. Config wiring — loop now runs up to 100 rounds (was 30) — PASS

  • Loaded backend.copilot.baseline.service at runtime against the PR branch: _MAX_TOOL_ROUNDS == 100, _LAST_ITERATION_HINT present (208 chars).
  • inspect.getsource(stream_chat_completion_baseline) confirms max_iterations=_MAX_TOOL_ROUNDS and last_iteration_message=_LAST_ITERATION_HINT are both passed to tool_call_loop.
  • Drove tool_call_loop with the exact PR constants and a mock LLM that always requests tool calls: loop made 100 LLM calls (strictly > old 30 cap), yielded 100 iterations, final.iterations == 100.

2. Graceful finish on round 100 — PASS

  • With the same harness, inspected messages on every LLM call: on round 100 (and only round 100) the messages list contains one {role:"system", content: "You have reached the tool-call budget..."} entry — the _LAST_ITERATION_HINT.
  • When the mock LLM obeys the hint on round 100 and returns text:
    • final.finished_naturally == True
    • final.response_text == "Summary: did X, Y, Z..." (clean summary)
    • final.last_tool_calls == []
    • Service branch at service.py:1905 (if loop_result and not loop_result.finished_naturally) is not taken, so NO StreamError("Exceeded ... tool-call rounds ...") is emitted. The user sees a final summary instead of a cold cutoff.
  • Negative sub-case (model ignores hint on round 100 and still calls tools): loop still terminates cleanly at 100 iterations, yields a synthetic terminal result with response_text == "Completed after 100 iterations (limit reached)" and finished_naturally == False. In that case the service layer will still surface a StreamError — that path is unchanged by this PR and is the correct hard fallback when the model misbehaves.

3. No regression in short-turn copilot — PASS

  • Logged into dev UI (native stack at localhost:3000), opened /copilot, sent "In one short sentence, what is 2+2? No tools." via the real UI input + submit button.
  • Got a clean reply: "2+2 equals 4." with chat title auto-generated as "Simple Math Question". No errors in backend logs. Screenshot attached (02-short-copilot-reply.png).

4. Unit tests — PASS

  • backend/util/tool_call_loop_test.py: 10/10 pass (including test_last_iteration_message_appended, test_max_iterations_reached, test_max_iterations_zero_no_loop).
  • backend/copilot/baseline/service_unit_test.py: 80/80 pass — no regression in the baseline service.

Summary

| Scenario | Result |
| --- | --- |
| 1. 100-round cap wired correctly | PASS |
| 2. Graceful finish hint reaches final round | PASS |
| 3. Short-turn copilot still works | PASS |
| 4. Unit tests unchanged/green | PASS (90/90) |

Verdict: APPROVE. No bugs found. No fixes pushed.

Screenshots

02-short-copilot-reply.png

majdyz added 2 commits April 23, 2026 16:04
…eline

Mirrors the baseline bump in the SDK path so long agent runs aren't cut short
on that path either. The SDK already wraps up more gracefully than baseline's
hard cliff (Claude Agent SDK honours max_turns with a summary attempt), and a
per-query $10 budget guard still prevents runaway cost. The env var
CHAT_CLAUDE_AGENT_MAX_TURNS remains the operator escape hatch.
…aceful end

Before: baseline had a hard-coded `_MAX_TOOL_ROUNDS = 100`; on hit it emitted
`StreamError(baseline_tool_round_limit)` — which surfaced in the UI as a red
"Exceeded 100 tool-call rounds" error telling the user to retry, even when
the model had already produced a useful streamed response (see Discord bug in
session 661ba0cc-a905-4c66-bf11-61eb5423d775).

- Route baseline through the existing `config.claude_agent_max_turns` so a
  single `CHAT_CLAUDE_AGENT_MAX_TURNS` env var tunes both baseline and SDK.
- `tool_call_loop`: on the last iteration, drop `tools` (keep the hint).
  With no tools available the model is physically forced to return text,
  so the loop always reaches `finished_naturally=True` and no truncation.
- Remove the hit-limit `StreamError` emit entirely — the model's final
  summary is the response, not an error. The generic exception path still
  surfaces real failures via `baseline_error`.
- Extend `test_last_iteration_message_appended` to also assert tools are
  dropped on the last round.
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
autogpt_platform/backend/backend/util/tool_call_loop.py (1)

206-226: Final-iteration behavior is sound; minor note on tool execution.

The is_last gate correctly forces a text-only turn by passing iteration_tools=[] to the LLM while appending the finishing hint on a defensive copy of messages. Two small observations:

  • is_last evaluates to str | None | bool (due to short-circuit on last_iteration_message), not strictly bool. Used as a truthy gate so it works, but a bool(...) wrap or explicit is not None would be clearer for readers/type tooling.
  • If the model ignores the hint and still emits tool_calls on the last iteration, lines 247–260 will still dispatch them via the original tools sequence (the LLM was told it had none, yet their handlers run). Given the loop then terminates at max_iterations with finished_naturally=False, the tool side-effects happen but the model never gets to see the results. Probably acceptable for this PR’s graceful-finish goal, but worth confirming you want those trailing tool calls to execute rather than be dropped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/util/tool_call_loop.py` around lines 206 -
226, The is_last gate currently can be str|None|bool; make it an explicit
boolean (e.g., is_last = bool(last_iteration_message) and max_iterations > 0 and
iteration == max_iterations or use last_iteration_message is not None) to
satisfy readers/type checkers, and ensure downstream tool dispatch uses
iteration_tools (the empty sequence set when is_last) rather than the original
tools so any tool_calls produced by llm_call on the final turn are not executed;
update the logic around llm_call and the later dispatch (the code that processes
tool_calls) to check is_last or use iteration_tools when deciding whether to run
handlers so final-iteration tool side-effects are dropped.
autogpt_platform/backend/backend/copilot/baseline/service.py (2)

1754-1784: Defensive note: -1 (infinite) would break the mid-loop drain gate.

max_tool_rounds = config.claude_agent_max_turns is then used as both max_iterations= (where tool_call_loop treats -1 as infinite) and in the iterations >= max_tool_rounds check at line 1783. If the config is ever set to -1, that check becomes always-true and the pending-message mid-loop drain is effectively disabled (every yield is treated as the "final" one). Today the default is 100 so this is a latent edge case, but a small guard would future-proof it:

Proposed guard
-            max_tool_rounds = config.claude_agent_max_turns
+            max_tool_rounds = config.claude_agent_max_turns
+            # max_tool_rounds may be -1 (infinite) — in that case there is no
+            # "final yield" to skip, so mid-loop drains should always run.
             async for loop_result in tool_call_loop(
                 ...
             ):
                 ...
                 is_final_yield = (
                     loop_result.finished_naturally
-                    or loop_result.iterations >= max_tool_rounds
+                    or (max_tool_rounds > 0 and loop_result.iterations >= max_tool_rounds)
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/baseline/service.py` around lines
1754 - 1784, The check that sets is_final_yield uses max_tool_rounds (from
config.claude_agent_max_turns) which may be -1 to mean "infinite", causing
iterations >= max_tool_rounds to be always true; fix by normalizing or guarding
that value before the comparison: compute a local safe_max (e.g., None or
math.inf) or check that max_tool_rounds is non-negative before comparing (for
example only evaluate loop_result.iterations >= max_tool_rounds when
max_tool_rounds >= 0), so tool_call_loop (referenced here) and the
is_final_yield logic do not treat every yield as final when
config.claude_agent_max_turns == -1.

1905-1916: Graceful-finish fallback looks good; consider a lightweight UX signal when the model ignores the hint.

Dropping the hard StreamError on budget exhaustion is well-motivated — with iteration_tools=[] forcing a final text response, the common path now yields a clean summary. However, when the model is non-compliant (still returns tool_calls on the last round), the user sees only whatever partial output was streamed and a normal StreamFinish, with no indication the turn was truncated. Logging a warning server-side is useful for ops, but the frontend retry cascade you mentioned as a separate follow-up will still be blind to this case.

Optional: emit a subtle assistant-side hint (e.g. a short trailing text delta like "(turn ended at tool-call budget — ask me to continue)") in this branch so the user knows why their run stopped without getting a red error. Not blocking for this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/baseline/service.py` around lines
1905 - 1916, In the branch handling "if loop_result and not
loop_result.finished_naturally" add a lightweight assistant-facing hint so the
frontend/user knows the turn was truncated by the tool-call budget: after the
logger.warning (in service.py where loop_result and iteration_tools=[] are
referenced) emit a short trailing text delta (for example "_(turn ended at
tool-call budget — ask me to continue)_") via the same streaming/finish
mechanism used elsewhere (the codepath that currently returns a normal
StreamFinish), rather than only logging — use the existing stream/response
helper to append this final assistant message so the frontend receives the
subtle UX signal while preserving the graceful finish.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/baseline/service.py`:
- Around line 1754-1784: The check that sets is_final_yield uses max_tool_rounds
(from config.claude_agent_max_turns) which may be -1 to mean "infinite", causing
iterations >= max_tool_rounds to be always true; fix by normalizing or guarding
that value before the comparison: compute a local safe_max (e.g., None or
math.inf) or check that max_tool_rounds is non-negative before comparing (for
example only evaluate loop_result.iterations >= max_tool_rounds when
max_tool_rounds >= 0), so tool_call_loop (referenced here) and the
is_final_yield logic do not treat every yield as final when
config.claude_agent_max_turns == -1.
- Around line 1905-1916: In the branch handling "if loop_result and not
loop_result.finished_naturally" add a lightweight assistant-facing hint so the
frontend/user knows the turn was truncated by the tool-call budget: after the
logger.warning (in service.py where loop_result and iteration_tools=[] are
referenced) emit a short trailing text delta (for example "_(turn ended at
tool-call budget — ask me to continue)_") via the same streaming/finish
mechanism used elsewhere (the codepath that currently returns a normal
StreamFinish), rather than only logging — use the existing stream/response
helper to append this final assistant message so the frontend receives the
subtle UX signal while preserving the graceful finish.

In `@autogpt_platform/backend/backend/util/tool_call_loop.py`:
- Around line 206-226: The is_last gate currently can be str|None|bool; make it
an explicit boolean (e.g., is_last = bool(last_iteration_message) and
max_iterations > 0 and iteration == max_iterations or use last_iteration_message
is not None) to satisfy readers/type checkers, and ensure downstream tool
dispatch uses iteration_tools (the empty sequence set when is_last) rather than
the original tools so any tool_calls produced by llm_call on the final turn are
not executed; update the logic around llm_call and the later dispatch (the code
that processes tool_calls) to check is_last or use iteration_tools when deciding
whether to run handlers so final-iteration tool side-effects are dropped.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cea64bb2-4966-4a14-a886-123868a71d62

📥 Commits

Reviewing files that changed from the base of the PR and between cc01acb and bd8bc79.

📒 Files selected for processing (3)
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (typescript)
  • GitHub Check: Check PR Status
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.12)
🧰 Additional context used
📓 Path-based instructions (3)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/backend/**/*_test.py

📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)

autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using *_test.py naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use AsyncMock from unittest.mock for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with @pytest.mark.xfail before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use poetry run pytest path/to/test.py --snapshot-update; always review snapshot changes with git diff before committing

Files:

  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
🧠 Learnings (20)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
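The "total attempts, not extra retries" counting above can be sketched as follows; `TransientError` and the helper name are illustrative, and the real code also sleeps with exponential backoff between attempts:

```python
class TransientError(Exception):
    """Illustrative stand-in for a retryable streaming error."""

def run_with_transient_retries(attempt_fn, max_transient=3):
    """Run attempt_fn; max_transient counts TOTAL attempts, not extra retries."""
    transient_retries = 0
    while True:
        try:
            return attempt_fn()
        except TransientError:
            transient_retries += 1
            # Pre-incremented ">=" guard: max_transient=3 allows the initial
            # attempt plus 2 retries (3 total attempts), then re-raises.
            if transient_retries >= max_transient:
                raise
            # (Real code sleeps here: 1s, 2s exponential backoff.)
```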
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:43.495Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
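The reachability argument above reduces to a small example; `VirusScanError` here is an illustrative subclass mirroring the learning, not the real class hierarchy:

```python
class VirusScanError(ValueError):
    """Illustrative subclass: a VirusScanError IS a ValueError."""

def handle(raiser):
    try:
        raiser()
    except ValueError as e:
        # Reachable, not dead code: subclass instances land in this handler.
        if isinstance(e, VirusScanError):
            return "virus"
        return "value"
```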

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop.py
  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.
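The sequential rechaining can be sketched as below; the dict-based entries and function shape are simplified stand-ins for the real JSONL transcript entries:

```python
def rechain_tail(prefix_uuid, tail):
    """Rewrite parentUuid for ALL tail entries so the chain is prefix -> tail[0] -> tail[1] -> ..."""
    parent = prefix_uuid
    for entry in tail:
        # Original parents may reference entries absorbed into the compressed
        # prefix, so every entry is rechained, not just the first.
        entry["parentUuid"] = parent
        parent = entry["uuid"]
    return tail
```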

Applied to files:

  • autogpt_platform/backend/backend/util/tool_call_loop_test.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-08T17:27:26.657Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/original_autogpt/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:27:26.657Z
Learning: The execution flow traces: `__main__.py` → `cli()` → `cli.py:run()` → `run_auto_gpt()` → `main.py:run_auto_gpt()` (build config, set up storage, load/create agent) → `main.py:run_interaction_loop(agent)` (propose, display, feedback, execute, loop).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (2)
autogpt_platform/backend/backend/util/tool_call_loop_test.py (1)

417-468: LGTM — assertions correctly pin the new contract.

captured_tools plus the first/last-iteration assertions tightly couple the test to the new "drop tools + append hint on final iteration" behavior, which is exactly what the loop change needs to prevent regression.

autogpt_platform/backend/backend/copilot/baseline/service.py (1)

124-132: Hint copy reads well.

Clear directive ("do not call any more tools"), asks for summary + remaining work + continuation advice — matches the three things the frontend needs for a graceful wrap-up.

…turns

The field now controls both the baseline and Claude Agent SDK paths, so
the Claude-specific name is misleading. Primary env var is now
CHAT_AGENT_MAX_TURNS; the old CHAT_CLAUDE_AGENT_MAX_TURNS stays accepted
via validation_alias so existing deployments don't break.
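The rename-with-alias behavior described in this commit can be sketched without the pydantic machinery; this plain-`os.environ` stand-in only illustrates the precedence (new name wins, old name still honored), not the actual `validation_alias` config:

```python
import os

NEW_ENV = "CHAT_AGENT_MAX_TURNS"
OLD_ENV = "CHAT_CLAUDE_AGENT_MAX_TURNS"  # legacy alias, still accepted

def resolve_max_turns(default=100):
    """New env var takes precedence; the old name keeps existing deployments working."""
    for name in (NEW_ENV, OLD_ENV):
        raw = os.environ.get(name)
        if raw is not None:
            return int(raw)
    return default
```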

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/baseline/service.py`:
- Around line 1905-1916: When budget is exhausted (loop_result is truthy and
loop_result.finished_naturally is False) emit and persist a short non-error
terminal note before sending the StreamFinish so the client sees an explanatory
wrap-up; specifically, construct a brief message (e.g., "Tool budget exhausted —
ending turn with final summary.") and send it via the existing streaming/event
API used in this service (same channel that produces streamed output) and also
persist it to the conversation/history store so it appears in the UI, then
proceed to produce the StreamFinish as before; update the branch inside the if
that checks loop_result.finished_naturally (using loop_result.iterations for
context if needed) to perform these two actions.
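The suggested fix can be sketched as a small event-assembly step; `LoopResult` and the event tuples are illustrative stand-ins for the service's real streaming/persistence API:

```python
from dataclasses import dataclass

@dataclass
class LoopResult:
    finished_naturally: bool
    iterations: int

def finish_events(loop_result):
    """Emit an explanatory non-error note before the finish event on budget exhaustion."""
    events = []
    if loop_result and not loop_result.finished_naturally:
        # Streamed AND persisted in the real fix, so the UI shows the wrap-up.
        events.append(("StreamText",
                       f"Tool budget exhausted after {loop_result.iterations} "
                       "rounds — ending turn with final summary."))
    events.append(("StreamFinish", None))
    return events
```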

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f356fd1b-82a1-4f97-9df1-c8d3ffc92495

📥 Commits

Reviewing files that changed from the base of the PR and between bd8bc79 and 714f0e3.

📒 Files selected for processing (5)
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test (3.12)
  • GitHub Check: type-check (3.12)
  • GitHub Check: type-check (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.13)
  • GitHub Check: lint
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: Check PR Status
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (typescript)
🧰 Additional context used
📓 Path-based instructions (3)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/backend/**/*_test.py

📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)

autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using *_test.py naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use AsyncMock from unittest.mock for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with @pytest.mark.xfail before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use poetry run pytest path/to/test.py --snapshot-update; always review snapshot changes with git diff before committing

Files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12604
File: autogpt_platform/backend/backend/copilot/sdk/security_hooks.py:165-171
Timestamp: 2026-03-30T11:49:37.770Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/security_hooks.py`, the `web_search_count` and `total_tool_call_count` circuit-breaker counters in `create_security_hooks` are intentionally per-turn (closure-local), not per-session. Hooks are recreated per stream invocation in `service.py`, so counters reset each turn. This is an accepted v1 design: it caps a single runaway turn (incident d2f7cba3: 179 WebSearch calls, $20.66). True per-session persistence via Redis is deferred to a later iteration. Do not flag these as a per-session vs. per-turn mismatch bug.
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
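The attempt-counting semantics described in that learning can be sketched as follows. This is a hypothetical helper, not the real `service.py` code; `TransientError`, `run_with_retries`, and `base_delay` are illustrative names. The point is the pre-incremented `>=` guard: `max_transient` caps *total* attempts, not extra retries.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable stream failure (illustrative)."""


def run_with_retries(attempt_fn, max_transient: int = 3, base_delay: float = 1.0):
    """`max_transient` counts TOTAL attempts (initial + retries)."""
    transient_retries = 0
    while True:
        transient_retries += 1  # pre-incremented: counts the attempt being made now
        try:
            return attempt_fn()
        except TransientError:
            if transient_retries >= max_transient:
                raise  # total-attempt budget exhausted; NOT an off-by-one
            # exponential backoff: 1s after attempt 1, 2s after attempt 2, ...
            time.sleep(base_delay * 2 ** (transient_retries - 1))
```

With the default of 3 this performs exactly three stream attempts (initial + two retries with 1s and 2s backoff), matching the semantics the learning warns not to flag.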
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-27T08:39:45.696Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12592
File: autogpt_platform/frontend/AGENTS.md:1-3
Timestamp: 2026-03-27T08:39:45.696Z
Learning: In Significant-Gravitas/AutoGPT, Claude is the primary coding agent. AGENTS.md files intentionally retain Claude-specific wording (e.g., "CLAUDE.md - Frontend", "This file provides guidance to Claude Code") even though AGENTS.md is the canonical cross-agent instruction source. Do not flag Claude-specific titles or phrasing in AGENTS.md files as issues.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
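The hyphen-vs-dot convention above can be captured in a throwaway check. The regex and the helper name `is_convention_model_id` are assumptions for illustration, not a real codebase function:

```python
import re

# Hyphen-separated OpenRouter-routed Anthropic IDs, e.g. anthropic/claude-sonnet-4-6.
# Illustrative only; the helper name is an assumption, not a codebase function.
_HYPHEN_FORM = re.compile(r"^anthropic/claude-[a-z]+-\d+-\d+$")


def is_convention_model_id(model_id: str) -> bool:
    """True for the project's hyphen-separated convention, False for the dot variant."""
    return bool(_HYPHEN_FORM.match(model_id))
```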
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-22T05:58:28.595Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/frontend/src/app/api/openapi.json:14576-14577
Timestamp: 2026-04-22T05:58:28.595Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Process convention: When adding new CoPilot tool response models and updating ToolResponseUnion in backend/api/features/chat/routes.py, regenerate the frontend OpenAPI schema via `poetry run export-api-schema` (do not hand-edit autogpt_platform/frontend/src/app/api/openapi.json).

Applied to files:

  • autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-13T14:19:19.341Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12740
File: autogpt_platform/frontend/src/app/api/openapi.json:0-0
Timestamp: 2026-04-13T14:19:19.341Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
When adding new CoPilot tool response models (e.g., ScheduleListResponse, ScheduleDeletedResponse), update backend/api/features/chat/routes.py to include them in the ToolResponseUnion so the frontend’s autogenerated openapi.json dummy export (/api/chat/schema/tool-responses) exposes them for codegen. Do not hand-edit frontend/src/app/api/openapi.json.

Applied to files:

  • autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-08T17:27:26.657Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/original_autogpt/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:27:26.657Z
Learning: The execution flow traces: `__main__.py` → `cli()` → `cli.py:run()` → `run_auto_gpt()` → `main.py:run_auto_gpt()` (build config, set up storage, load/create agent) → `main.py:run_interaction_loop(agent)` (propose, display, feedback, execute, loop).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (6)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

3324-3324: LGTM — SDK now uses the unified turn-limit config.

This correctly moves the SDK max_turns budget onto config.agent_max_turns, matching the shared baseline/SDK configuration intent.

autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py (1)

1037-1037: LGTM — test fixture matches the renamed config field.

Using agent_max_turns here keeps the retry integration patches aligned with stream_chat_completion_sdk.

autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py (2)

211-213: LGTM — default assertion follows the unified turn budget.

The test now targets agent_max_turns and matches the new default of 100.


507-523: LGTM — validator coverage was updated cleanly.

The zero, negative, upper-bound, and accepted-boundary cases all exercise the renamed agent_max_turns field.

autogpt_platform/backend/backend/copilot/baseline/service.py (1)

1754-1763: LGTM — baseline now uses the shared turn budget and final-round hint.

This cleanly removes the hard-coded round cap from the baseline path and reuses tool_call_loop’s existing final-iteration hook.

autogpt_platform/backend/backend/copilot/config.py (1)

210-222: The refactoring of claude_agent_max_turns to agent_max_turns is complete. A repository-wide search found no stale programmatic references to the old field name, confirming that all Python-level usages have been updated. The legacy env var CHAT_CLAUDE_AGENT_MAX_TURNS is properly mapped via validation_alias, so no further configuration is needed.

Comment thread autogpt_platform/backend/backend/copilot/baseline/service.py Outdated
Addresses CodeRabbit's silent-finish concern without spamming the happy path.
Emit a short fallback note only when ``state.assistant_text`` is empty — i.e.
the forced-text last round gave the user zero visible response. If the model
obeyed the hint and produced its own summary, we stay quiet.
Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/baseline/service.py (1)

1917-1929: ⚠️ Potential issue | 🟡 Minor

Check terminal-round text, not total turn text, before suppressing the fallback.

state.assistant_text includes earlier tool-round chatter. If the model streamed any earlier text but the final budget round produced no summary, Line 1920 suppresses the fallback and the turn still ends without an explanation.

Suggested fix
     loop_result_holder: list[Any] = [None]
     loop_task: asyncio.Task[None] | None = None
+    last_nonfinal_assistant_len = 0
+    terminal_round_had_visible_text = True
 
     async def _run_tool_call_loop() -> None:
+        nonlocal last_nonfinal_assistant_len, terminal_round_had_visible_text
         # Read/write the current session via ``_session_holder`` so this
         # closure doesn't need to ``nonlocal session`` — pyright can't narrow
@@
                 if is_final_yield:
+                    terminal_round_had_visible_text = bool(
+                        state.assistant_text[last_nonfinal_assistant_len:].strip()
+                    )
                     continue
+                last_nonfinal_assistant_len = len(state.assistant_text)
                 try:
                     pending = await drain_pending_messages(session_id)
@@
-            if not state.assistant_text.strip():
+            if not terminal_round_had_visible_text:
                 terminal_text = (
                     "Reached the tool-call budget for this turn. "
                     "Send a follow-up message to continue from here."

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef57bcd1-c6d9-42ac-9cca-a10468477de2

📥 Commits

Reviewing files that changed from the base of the PR and between 714f0e3 and 5f00d39.

📒 Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/baseline/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: check API types
  • GitHub Check: test (3.12)
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: Seer Code Review
  • GitHub Check: Analyze (python)
  • GitHub Check: Check PR Status
  • GitHub Check: end-to-end tests
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
🧠 Learnings (21)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
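The fallback order above can be sketched as a small resolver. The function signature, config keys, and `normalize` callable here are simplified stand-ins for the real service code, and the final `ValueError` stands in for re-raising the original LD error:

```python
# Illustrative sketch of the tier-specific fallback order; names are
# simplified stand-ins, not the real _resolve_sdk_model_for_request.
def resolve_sdk_model_for_request(ld_model, tier, config, normalize):
    normalized = normalize(ld_model)
    if normalized:
        return normalized
    # Tier-specific fallback, NOT the generic standard-only resolver.
    fallback = (
        config["thinking_advanced_model"] if tier == "advanced"
        else config["thinking_standard_model"]
    )
    normalized = normalize(fallback)
    if normalized:
        return normalized
    # Config default is also bad: deployment-level misconfiguration.
    raise ValueError(
        f"LD model {ld_model!r} and tier default {fallback!r} both failed to normalize"
    )
```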
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
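The attempt-counting semantics above can be demonstrated with a minimal loop. This is a sketch of the counting behavior only, not the real streaming code; the key point is the pre-incremented `>=` guard, which makes `max_transient=3` mean three total attempts:

```python
import time

# Illustrative sketch: max_transient counts TOTAL attempts including the first.
def run_with_transient_retries(attempt_fn, max_transient: int = 3):
    transient_retries = 0
    while True:
        transient_retries += 1  # pre-increment: this counts the current attempt
        try:
            return attempt_fn()
        except ConnectionError:
            if transient_retries >= max_transient:
                raise  # third total attempt failed: give up
            time.sleep(0)  # real code backs off 2 ** (transient_retries - 1) seconds
```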
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py

📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-15T15:30:09.706Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/tools/helpers.py:149-185
Timestamp: 2026-03-15T15:30:09.706Z
Learning: In autogpt_platform/backend/backend/copilot/tools/helpers.py, within execute_block, when InsufficientBalanceError occurs after post-execution credit charging (concurrent balance drain after pre-check passed), this is treated as a non-fatal billing leak: log at ERROR level with structured JSON fields `{"billing_leak": True, "user_id": ..., "cost": ...}` for monitoring/alerting, then return BlockOutputResponse normally. Discarding the output would worsen UX since the block already executed with potential side effects. Reuse the credit_model obtained during the pre-execution balance check (guarded by `if cost > 0 and credit_model:`) for the post-execution charge; do not perform a second get_user_credit_model call.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
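The non-fatal billing-leak handling above can be sketched as follows. The class and function names are hypothetical stand-ins (the real `execute_block` is larger), but the decision shape is the same: log structured fields for alerting, keep the output:

```python
import json
import logging

logger = logging.getLogger("copilot.tools.helpers")

class InsufficientBalanceError(Exception):
    """Stand-in for the real billing exception."""

def charge_after_execution(credit_model, user_id: str, cost: int, output):
    # Reuse the credit_model from the pre-execution check; guard as described.
    if cost > 0 and credit_model:
        try:
            credit_model.charge(user_id, cost)
        except InsufficientBalanceError:
            # Concurrent drain after the pre-check passed: log structured
            # fields for monitoring, but keep the output (block already ran).
            logger.error(json.dumps({"billing_leak": True, "user_id": user_id, "cost": cost}))
    return output
```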
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-08T17:27:26.657Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/original_autogpt/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:27:26.657Z
Learning: The execution flow traces: `__main__.py` → `cli()` → `cli.py:run()` → `run_auto_gpt()` → `main.py:run_auto_gpt()` (build config, set up storage, load/create agent) → `main.py:run_interaction_loop(agent)` (propose, display, feedback, execute, loop).

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
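The reachability point above is easy to demonstrate: a subclass of `ValueError` raised inside an `except ValueError:` block still matches an `isinstance` check in the handler. The `VirusScanError` name follows the learning's example; the rest is illustrative:

```python
# Demonstration that the isinstance check in the handler is NOT dead code.
class VirusScanError(ValueError):
    pass

def classify(raiser) -> str:
    try:
        raiser()
    except ValueError as e:
        if isinstance(e, VirusScanError):  # reachable: subclass of ValueError
            return "virus"
        return "value"
    return "ok"
```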
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
🔇 Additional comments (3)
autogpt_platform/backend/backend/copilot/baseline/service.py (3)

124-132: LGTM — the final-iteration hint is clear and scoped.

This gives the model an explicit wrap-up instruction without changing normal tool-round behavior.


1754-1763: LGTM — dynamic budget and final-round hint are wired correctly.

Using config.agent_max_turns keeps the baseline path aligned with the shared copilot turn budget.


1781-1784: LGTM — final-yield detection follows the configured cap.

This keeps pending-message drain behavior correct when agent_max_turns changes.

majdyz added 3 commits April 23, 2026 17:22
Extract the "should we emit a terminal note?" decision into
``_budget_exhausted_notice_text`` and cover it with unit tests so the new
fallback branch isn't stranded at 0% coverage. Lifts patch coverage above
codecov's 80% threshold.
…nd text

CodeRabbit flagged: ``state.assistant_text`` accumulates across rounds, so the
previous gate would suppress the fallback whenever any earlier round produced
chatter, even if the final budget round fell silent. Track
``text_len_before_final_round`` at every non-final yield and check only the
suffix added by the terminal round.
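The suffix-only gate described in this commit can be sketched as a tiny predicate. The function name is illustrative, not the real helper:

```python
# Sketch: assistant_text accumulates across rounds, so only the text added
# by the terminal round decides whether the budget-exhausted notice fires.
def terminal_round_was_silent(assistant_text: str, text_len_before_final_round: int) -> bool:
    terminal_suffix = assistant_text[text_len_before_final_round:]
    return not terminal_suffix.strip()
```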
Extract the Stream event construction into
``_build_budget_exhausted_fallback_events`` so it's unit-testable away from
the async streaming machinery. Lifts codecov/patch above the 80% threshold
by covering the 7 previously-untested lines that produced the fallback
StreamTextStart/Delta/End triple.
…y=True

Sentry flagged: dropping ``tools=[]`` on the last iteration means
``tool_call_loop`` always takes the ``finished_naturally=True`` path, even
when the model returns empty text. The fallback was gated on
``not finished_naturally``, so an empty terminal round bypassed it entirely.

Switch the gate to "iterations reached budget cap" — covers both exit paths
(natural finish with empty text, and non-compliant tool-calling finish).
The terminal-round-text check still scopes the notice to silent finishes.
@majdyz majdyz merged commit 4242da7 into dev Apr 23, 2026
40 checks passed
@majdyz majdyz deleted the fix/copilot-baseline-tool-round-limit branch April 23, 2026 11:38
@github-project-automation github-project-automation Bot moved this from 🆕 Needs initial review to ✅ Done in AutoGPT development kanban Apr 23, 2026

Labels

platform/backend (AutoGPT Platform - Back end), size/l

Projects

Status: ✅ Done
