
feat(platform/copilot): live baseline streaming + render flag + Sonar web_search + simulator cost tracking + reconnect fixes #12873

Merged
majdyz merged 39 commits into dev from feat/copilot-reasoning-render-flag-and-reconnect-fixes on Apr 22, 2026
Conversation

@majdyz
Contributor

@majdyz majdyz commented Apr 21, 2026

Why / What / How

Why. Four problems on the baseline copilot path that compound:

  • Extended-thinking turns froze the UI for minutes because Kimi K2.6 events were buffered in state.pending_events: list until the full tool_call_loop iteration finished — reasoning arrived in one lump at the end.
  • The SSE stream replayed 1000 events on every reconnect, and the frontend opened multiple SSE streams in quick succession on tab-focus thrash (reconnect storm → UI flickers, tab freezes).
  • The web_search tool hit Anthropic's server-side beta directly via a dispatch-model round-trip that fed entire page contents back through the model for a second inference pass (observed $0.072 on a 74K-token call).
  • The simulator dry-run path ran on Gemini Flash without any cost tracking at all, so every dry-run was free on the platform's microdollar ledger.

What. Grouped deltas, all targeting reliability, cost, and UX of the copilot live-answer pipeline:

  • Live per-token baseline streaming — state.pending_events is now an asyncio.Queue drained concurrently by the outer async generator. The tool-call loop runs as a background task; reasoning / text / tool events reach the SSE wire during the upstream OpenRouter stream, not after it. None is the close sentinel; inner-task exceptions are re-raised via await loop_task once the sentinel arrives. An emitted_events: list mirror preserves post-hoc test inspection. Coalescing widened 32/40 → 64/50 ms to halve the React re-render rate on extended-thinking turns while staying under the ~100 ms perceptual threshold.
  • Reasoning render flag — ChatConfig.render_reasoning_in_ui: bool = True wired through both BaselineReasoningEmitter and SDKResponseAdapter. When False the wire StreamReasoning* events are suppressed while the persisted ChatMessage(role='reasoning') rows always survive (decoupled from the render flag so audit/replay is unaffected); the service-layer yield filter does the gating. Tokens are still billed upstream; this is an operator kill-switch for UI-level flicker investigations.
  • Reconnect storm mitigations — ChatConfig.stream_replay_count: int = 200 (was hard-coded 1000) caps the stream_registry.subscribe_to_session XREAD size. Frontend useCopilotStream::handleReconnect adds a 1500 ms debounce via lastReconnectResumeAtRef, so tab-focus thrash doesn't fan out into 5–6 parallel replays in the same second.
  • web_search rewritten to Perplexity Sonar via OpenRouter — single unified credential, real usage.cost flows through persist_and_record_usage(provider='open_router'). Two tiers via a deep param: perplexity/sonar ($0.005/call quick) and perplexity/sonar-deep-research ($0.50–$1.30/call multi-step research). Replaces the Anthropic-native + server-tool dispatches; drops the hardcoded pricing constants entirely.
  • Synthesised answer surfaced end-to-end — Sonar already writes a web-grounded answer on the same call we pay for; the new WebSearchResponse.answer field passes it through and the accordion UI renders it above citations so the agent doesn't re-fetch URLs that are usually bot-protected anyway.
  • Deep-tier cost warning + UI affordances — the deep param description is explicit that it's ~100× pricier; UI labels read "Researching / Researched / N research sources" when deep=true so users know what's running.
  • Simulator cost tracking + cheaper default — google/gemini-2.5-flash → google/gemini-2.5-flash-lite (3× cheaper tokens), and every dry-run now hits persist_and_record_usage(provider='open_router') with real usage.cost. Previously each sim was free against the user's microdollar budget.
  • Typed access everywhere — cost extractors now use openai.types.CompletionUsage.model_extra["cost"] and openai.types.chat.ChatCompletion / Annotation / AnnotationURLCitation with no getattr / duck typing. Mirrors the baseline service's _extract_usage_cost pattern; keep in sync.
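The queue-drain pattern in the first bullet can be sketched in isolation. This is a minimal illustration, assuming a producer coroutine standing in for the tool-call loop; stream_events, run_loop, and demo are hypothetical names, not the repository's code:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

async def stream_events(
    run_loop: Callable[[asyncio.Queue], Awaitable[None]],
) -> AsyncIterator[str]:
    """Drain a queue concurrently with the producer running as a task.

    None is the close sentinel; awaiting the finished task re-raises any
    exception the producer hit, mirroring the pattern described above.
    """
    pending_events: asyncio.Queue = asyncio.Queue()
    loop_task = asyncio.create_task(run_loop(pending_events))
    while True:
        event = await pending_events.get()
        if event is None:  # close sentinel from the producer
            await loop_task  # surfaces inner-task exceptions
            return
        yield event  # reaches the consumer while the producer still runs

async def demo() -> list[str]:
    async def producer(q: asyncio.Queue) -> None:
        for token in ("rea", "son", "ing"):
            await q.put(token)  # emitted mid-loop, not buffered to the end
        await q.put(None)  # sentinel

    return [event async for event in stream_events(producer)]
```

Consumers see each token as soon as the producer puts it, which is the property that unfreezes extended-thinking turns.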

How. Key file touches:

  1. copilot/config.py — render_reasoning_in_ui, stream_replay_count, simulation_model default.
  2. copilot/baseline/service.py — _BaselineStreamState.pending_events: asyncio.Queue, _emit / _emit_all helpers, outer generator runs tool_call_loop as a background task + yields from queue concurrently.
  3. copilot/baseline/reasoning.py — BaselineReasoningEmitter(render_in_ui=...), coalescing bumped to 64 chars / 50 ms.
  4. copilot/sdk/service.py — state.adapter.render_reasoning_in_ui threaded through every adapter construction.
  5. copilot/sdk/response_adapter.py — render_reasoning_in_ui wiring + service-layer yield filter gating for wire suppression while persistence stays intact.
  6. copilot/stream_registry.py — count=config.stream_replay_count.
  7. frontend/.../useCopilotStream.ts::handleReconnect — 1500 ms debounce.
  8. copilot/tools/web_search.py + models.py — Sonar quick/deep paths, WebSearchResponse.answer + typed extractors.
  9. frontend/.../GenericTool/* — answer render + deep-aware labels / accordion titles.
  10. executor/simulator.py + executor/manager.py + copilot/config.py — cost tracking + model swap + user_id threading.

Changes

  • copilot/config.py — new render_reasoning_in_ui, stream_replay_count; simulation_model default flipped to Flash-Lite.
  • copilot/baseline/service.py — pending_events: asyncio.Queue refactor; outer gen runs loop as task, yields from queue live.
  • copilot/baseline/reasoning.py — BaselineReasoningEmitter(render_in_ui=...) + 64/50 coalesce.
  • copilot/sdk/service.py + response_adapter.py — render_reasoning_in_ui wire suppression (persistence preserved).
  • copilot/stream_registry.py — replay cap from config.
  • copilot/tools/web_search.py + models.py — Sonar quick/deep + answer field + typed extractors.
  • copilot/tools/helpers.py — tool description tightens deep=true cost warning.
  • frontend/.../useCopilotStream.ts — reconnect debounce.
  • frontend/.../GenericTool/GenericTool.tsx + helpers.ts + tests — render answer, deep-aware verbs / titles.
  • executor/simulator.py + simulator_test.py + executor/manager.py — cost tracking + model swap + user_id plumbing.

Follow-up (deferred to a separate PR)

SDK per-token streaming via include_partial_messages=True was attempted (commits 599e83543 + 530fa8f95) and reverted here. The two-signal model (StreamEvent partial deltas + AssistantMessage summary) needs proper per-block diff tracking — when the partial stream delivers a subset of the final block content, emit only summary.text[len(already_emitted):] from the summary rather than gating on a binary flag. Binary gating truncated replies in the field when the partial stream delivered less than the summary (observed: "The analysis template you" cut off mid-sentence because partial had streamed that much and the rest only lived in the summary). SDK reasoning still renders end-of-phase (as today); this PR's baseline per-token streaming is unaffected.
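The suffix-emission idea above (emit only summary.text[len(already_emitted):]) reduces to a small pure function. This is a hypothetical helper, assuming the two-signal model described; the divergent branch is a placeholder for the reconciliation policy the follow-up still has to define:

```python
def remaining_summary(summary_text: str, already_emitted: str) -> str:
    """Return the suffix of the summary the partial stream never delivered."""
    if summary_text.startswith(already_emitted):
        # Happy path: partials are a strict prefix, so only the tail is new.
        return summary_text[len(already_emitted):]
    # Partials and summary diverged; a real implementation needs a
    # reconciliation policy here rather than a binary emitted/not-emitted flag.
    return ""
```

Applied to the field failure quoted above, the tail after "The analysis template you" would be emitted instead of dropped.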

Checklist

For code changes:

  • Changes listed above
  • Test plan below
  • Tested according to the test plan:
    • poetry run pytest backend/copilot/baseline/ backend/copilot/sdk/ backend/copilot/tools/web_search_test.py backend/executor/simulator_test.py — all pass (155 baseline + 927 SDK + web_search + simulator)
    • pnpm types && pnpm vitest run src/app/(platform)/copilot/tools/GenericTool/ — pass
    • Manual: baseline live-streaming — Kimi K2.6 reasoning arrives token-by-token, coalesced (no end-of-stream burst).
    • Manual: quick web_search via copilot UI — ~$0.005/call, answer + citations rendered, cost logged as provider=open_router.
    • Manual: deep web_search — dispatched only on explicit research phrasing; sonar-deep-research billed, UI labels say "Researched" / "N research sources".
    • Manual: simulator dry-run — Gemini Flash-Lite, [simulator] Turn usage log entry, PlatformCostLog row visible.
    • Manual: reconnect debounce — tab-focus thrash no longer produces parallel XREADs in backend log.
    • Manual: CHAT_RENDER_REASONING_IN_UI=false smoke-check — reasoning collapse absent, no persisted reasoning row on reload.

For configuration changes:

  • .env.default — new config knobs fall back to pydantic defaults; existing CHAT_MODEL/CHAT_FAST_MODEL/CHAT_ADVANCED_MODEL legacy envs still honored upstream (unchanged by this PR).

Companion PR

PR #12876 closes the run_block-via-copilot cost-leak gap (registers PerplexityBlock / FactCheckerBlock in BLOCK_COSTS; documents the credit/microdollar wallet boundary). Separate because the credit-wallet side is orthogonal to the copilot microdollar / rate-limit surface this PR ships.

…m mitigations

Follow-up to #12871 addressing streaming UX issues observed during Kimi K2.6
rollout testing. Three independent changes:

1. New ChatConfig.render_reasoning_in_ui (default True) gates the
   StreamReasoning* wire events on BOTH baseline and SDK paths. When False
   the frontend sees a text-only stream; the model still reasons, tokens
   are still billed, and role="reasoning" rows are still persisted to
   session.messages so a future per-session toggle can surface them on
   reload.

2. SSE reconnect replay cap: ChatConfig.stream_replay_count defaults to 200
   (was hard-coded count=1000 in subscribe_to_session.xread). Bounds the
   replay storm when a tab-switch / browser-throttle fires multiple
   reconnects. 200 still covers a full Kimi turn after coalescing
   (~150 events).

3. Frontend reconnect debounce: handleReconnect rejects requests that
   arrive within 1500ms of the last reconnect's resume, so visibility-
   throttle bursts collapse to one GET /stream instead of 2-3.
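The gating in change 1 can be illustrated as a yield-time filter that drops reasoning events from the wire while leaving persistence (which happens upstream) untouched. The event classes here are minimal stand-ins for the real StreamReasoning* / text events, not the repository's types:

```python
from dataclasses import dataclass

@dataclass
class StreamReasoningDelta:
    text: str

@dataclass
class StreamTextDelta:
    text: str

def filter_wire_events(events, render_reasoning_in_ui: bool):
    """Drop reasoning events from the wire when rendering is disabled."""
    for event in events:
        if not render_reasoning_in_ui and isinstance(event, StreamReasoningDelta):
            continue  # suppressed on the wire; persistence already happened upstream
        yield event
```

Because the filter sits at the yield boundary, billing and persisted rows are unaffected by the flag.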

Scope 3 from the brief (Last-Event-ID SSE resume) is deferred to a second
follow-up — threading the redis stream id through every to_sse() and
swapping the fetch-based DefaultChatTransport for a Last-Event-ID-aware
client is a deeper architectural change than scopes 1+2 combined.

Tests: 1140 copilot backend tests pass. New coverage: BaselineReasoningEmitter
render_in_ui flag (wire suppression + persistence preservation), SDKResponseAdapter
render_reasoning_in_ui flag, ChatConfig regression tests for both new fields.
@majdyz majdyz requested a review from a team as a code owner April 21, 2026 16:50
@majdyz majdyz requested review from 0ubbe and Bentlybro and removed request for a team April 21, 2026 16:50
@github-project-automation github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Apr 21, 2026
@github-actions github-actions Bot added platform/frontend AutoGPT Platform - Front end platform/backend AutoGPT Platform - Back end labels Apr 21, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a config-driven toggle to suppress reasoning wire events and persisted reasoning messages, caps Redis stream replay count via config, and debounces frontend reconnects to coalesce rapid resume attempts.

Changes

Cohort / File(s) Summary
Baseline reasoning emitter
autogpt_platform/backend/backend/copilot/baseline/reasoning.py, autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
Add keyword-only render_in_ui to BaselineReasoningEmitter; when False suppresses StreamReasoningStart/Delta/End emissions and skips persisting ChatMessage(role="reasoning") while still advancing internal emitter state. Tests for disabled and default paths added.
SDK response adapter
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py, autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
Add render_reasoning_in_ui ctor flag; conditionalize reasoning start/delta/end emission and ensure start/end become no-ops when rendering is disabled. Tests added for suppressed and default behavior.
Service wiring & guardrails
autogpt_platform/backend/backend/copilot/baseline/service.py, autogpt_platform/backend/backend/copilot/sdk/service.py, autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
Thread config.render_reasoning_in_ui into BaselineReasoningEmitter and SDKResponseAdapter construction sites; update tests to expect the new kwarg.
Configuration and tests
autogpt_platform/backend/backend/copilot/config.py, autogpt_platform/backend/backend/copilot/config_test.py
Add ChatConfig.render_reasoning_in_ui: bool = True and ChatConfig.stream_replay_count: int = 200 (validated 1..10000); tests for defaults, env overrides, and validation added.
Stream replay limit
autogpt_platform/backend/backend/copilot/stream_registry.py
Replace hard-coded Redis xread count=1000 with config.stream_replay_count.
Frontend reconnect debounce
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts, autogpt_platform/frontend/src/app/(platform)/copilot/helpers.ts, autogpt_platform/frontend/src/app/(platform)/copilot/helpers.test.ts, autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts
Add shouldDebounceReconnect, RECONNECT_DEBOUNCE_MS, lastReconnectResumeAtRef, and debounce/coalescing logic in reconnect handling; unit and integration tests validating debounce/coalescing behavior added.

Sequence Diagram(s)

sequenceDiagram
  participant SDK as SDKResponseAdapter
  participant Session as SessionStore
  participant Stream as Redis/SSE Stream
  participant UI as Client/UI

  rect rgba(0,128,0,0.5)
    Note over SDK,Stream: Render ON (config.render_reasoning_in_ui = true)
  end

  SDK->>Session: persist ChatMessage(role="reasoning")
  SDK->>Stream: emit StreamReasoningStart
  SDK->>Stream: emit StreamReasoningDelta...
  SDK->>Stream: emit StreamReasoningEnd
  Stream->>UI: deliver reasoning events
sequenceDiagram
  participant SDK as SDKResponseAdapter
  participant Session as SessionStore
  participant Stream as Redis/SSE Stream
  participant UI as Client/UI

  rect rgba(128,0,0,0.5)
    Note over SDK,Session: Render OFF (config.render_reasoning_in_ui = false)
  end

  SDK->>Session: optionally skip persisting reasoning rows
  SDK-->>Stream: (no StreamReasoningStart/Delta/End emitted)
  Stream-->>UI: (no reasoning events)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • Pwuts
  • 0ubbe
  • Bentlybro

Poem

🐰 I twitched my whiskers, quieted the stream,
Hid my thinking in a cozy dream,
Replayed less, and waited to resume,
Coalesced the hops inside the room,
A gentle nibble—code sleeps, all calm.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title references multiple key changes (render flag, reconnect fixes) but omits several significant components described in the PR (live streaming, web_search rewrite, simulator cost tracking, Sonar integration). It partially captures the scope.
Description check ✅ Passed The description is comprehensive, well-structured with clear sections (Why/What/How), file-by-file changes, and detailed test plans. It directly relates to the changeset and provides thorough context.


@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

  • fix(copilot): prevent 524 timeout on chat deletion by deferring cleanup #12668 (Otto-AGPT · updated 5d ago)

    • autogpt_platform/backend/backend/api/features/library/db.py (5 conflicts, ~67 lines)
    • autogpt_platform/backend/backend/api/features/library/model.py (1 conflict, ~4 lines)
    • autogpt_platform/backend/backend/api/features/subscription_routes_test.py (5 conflicts, ~376 lines)
    • autogpt_platform/backend/backend/api/features/v1.py (9 conflicts, ~88 lines)
    • autogpt_platform/backend/backend/copilot/baseline/service.py (2 conflicts, ~15 lines)
    • autogpt_platform/backend/backend/copilot/model_test.py (1 conflict, ~5 lines)
    • autogpt_platform/backend/backend/copilot/sdk/service.py (3 conflicts, ~51 lines)
    • autogpt_platform/backend/backend/copilot/transcript.py (1 conflict, ~11 lines)
    • autogpt_platform/backend/backend/data/credit.py (5 conflicts, ~727 lines)
    • autogpt_platform/backend/backend/data/credit_subscription_test.py (14 conflicts, ~1333 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/PulseChips/usePulseChips.ts (1 conflict, ~13 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/usageHelpers.ts (1 conflict, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/BriefingTabContent.tsx (9 conflicts, ~147 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx (2 conflicts, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/ContextualActionButton/ContextualActionButton.tsx (2 conflicts, ~12 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx (2 conflicts, ~15 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/useSitrepItems.ts (4 conflicts, ~97 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useAgentStatus.ts (2 conflicts, ~10 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts (7 conflicts, ~57 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/types.ts (1 conflict, ~4 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/SubscriptionTierSection.tsx (9 conflicts, ~159 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/__tests__/SubscriptionTierSection.test.tsx (7 conflicts, ~256 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/useSubscriptionTierSection.ts (2 conflicts, ~48 lines)
    • autogpt_platform/frontend/src/app/api/openapi.json (1 conflict, ~23 lines)
    • docs/integrations/block-integrations/misc.md (1 conflict, ~5 lines)
  • fix(frontend/copilot): fix streaming reconnect races, hydration ordering, and reasoning split #12813 (0ubbe · updated 22h ago)

    • 📁 autogpt_platform/frontend/src/app/(platform)/copilot/
      • CopilotPage.tsx (3 conflicts, ~63 lines)
      • __tests__/useCopilotStream.test.ts (2 conflicts, ~648 lines)
      • useCopilotStream.ts (4 conflicts, ~67 lines)
  • feat(platform/copilot): Reduce time to first output #12828 (Pwuts · updated 5d ago)

    • 📁 autogpt_platform/backend/backend/
      • api/features/chat/routes.py (3 conflicts, ~75 lines)
      • copilot/sdk/security_hooks.py (1 conflict, ~12 lines)
      • copilot/sdk/service.py (2 conflicts, ~58 lines)
  • feat(backend/copilot): SDK fast tier defaults to Kimi K2.6 via OpenRouter + vendor-aware cost + cross-model fix #12878 (majdyz · updated 1m ago)

    • 📁 autogpt_platform/backend/backend/copilot/
      • config_test.py (2 conflicts, ~139 lines)
  • feat(backend/copilot): TodoWrite for baseline copilot #12879 (majdyz · updated 26m ago)

    • 📁 autogpt_platform/backend/backend/copilot/
      • baseline/service.py (2 conflicts, ~39 lines)
      • tools/tool_schema_test.py (1 conflict, ~17 lines)
  • feat(platform): estimate CoPilot turn cost and require approval for high-cost requests #12877 (Rushi-Balapure · updated 2h ago)

    • 📁 autogpt_platform/backend/backend/
      • api/features/chat/routes.py (2 conflicts, ~32 lines)
      • util/feature_flag.py (1 conflict, ~8 lines)
  • Persist stable copilot message IDs through hydration #12676 (rotempasharel1 · updated 14h ago)

    • 📁 autogpt_platform/backend/backend/copilot/sdk/
      • service.py (2 conflicts, ~33 lines)
  • feat(copilot): add goal decomposition step before agent building #12731 (anvyle · updated 1h ago)

    • 📁 autogpt_platform/backend/backend/copilot/tools/
      • tool_schema_test.py (1 conflict, ~16 lines)

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections.

Summary: 8 conflict(s), 0 medium risk, 6 low risk (out of 14 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 83.26693% with 84 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.14%. Comparing base (e3f6d36) to head (2151051).
⚠️ Report is 1 commit behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev   #12873      +/-   ##
==========================================
+ Coverage   67.09%   67.14%   +0.04%     
==========================================
  Files        1897     1897              
  Lines      145046   145364     +318     
  Branches    15264    15292      +28     
==========================================
+ Hits        97321    97601     +280     
- Misses      44803    44824      +21     
- Partials     2922     2939      +17     
Flag Coverage Δ
platform-backend 77.14% <83.15%> (+0.02%) ⬆️
platform-frontend 23.84% <84.84%> (+0.38%) ⬆️
platform-frontend-e2e 30.14% <0.00%> (-0.66%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Platform Backend 77.14% <83.15%> (+0.02%) ⬆️
Platform Frontend 30.88% <84.84%> (+0.09%) ⬆️
AutoGPT Libs ∅ <ø> (∅)
Classic AutoGPT 28.43% <ø> (ø)

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py (1)

727-740: ⚠️ Potential issue | 🟡 Minor

Patch config so this test is not env-dependent.

This assertion assumes the module-level SDK config resolves render_reasoning_in_ui=True. If CHAT_RENDER_REASONING_IN_UI=false is present, production should pass False and this test will fail for the wrong reason. Patch backend.copilot.sdk.service.config and assert the configured value.

Proposed fix
+        cfg = _make_config(render_reasoning_in_ui=False)
         with (
             patch("asyncio.sleep", new=AsyncMock()),
+            patch(f"{_SVC}.config", cfg),
             patch("backend.copilot.sdk.service.SDKResponseAdapter") as mock_cls,
         ):
             new_adapter = MagicMock()
             mock_cls.return_value = new_adapter
             async for _ in _do_transient_backoff(3, state, "msg-1", "sess-1"):
@@
         mock_cls.assert_called_once_with(
             message_id="msg-1",
             session_id="sess-1",
-            render_reasoning_in_ui=True,
+            render_reasoning_in_ui=False,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py` around
lines 727 - 740, The test currently assumes module-level SDK config yields
render_reasoning_in_ui=True; patch backend.copilot.sdk.service.config inside the
test to a controlled object (or set CHAT_RENDER_REASONING_IN_UI) so the value is
deterministic, then call _do_transient_backoff and assert SDKResponseAdapter
(mock_cls) was called with render_reasoning_in_ui equal to that patched config
value rather than hardcoding True; target symbols: _do_transient_backoff,
SDKResponseAdapter (mock_cls), and backend.copilot.sdk.service.config.
autogpt_platform/backend/backend/copilot/stream_registry.py (1)

496-548: ⚠️ Potential issue | 🟠 Major

The replay cap bounds only the initial XREAD, not the total backlog for running sessions; completed sessions truncate without explicit recovery.

count=config.stream_replay_count (default 200) caps only the first XREAD call. For running sessions, _stream_listener continues from replay_last_id in a while True loop, draining all remaining messages in batches of 100 — the replay backlog is not prevented, just split across multiple XREAD calls. For completed sessions, no listener is started; only StreamFinish() is sent, truncating the turn without a visible mechanism to hydrate missing messages from the database.

The deduplication rationale ("frontend deduplicates on block ids") applies to old entries being dropped, not to truncating messages after the cap point.

Consider capping the total replay count (reject messages after the Nth oldest entry when subscribing) or starting the live listener from the stream's latest ID after the capped replay to prevent the batched backlog drain for running sessions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/stream_registry.py` around lines 496
- 548, Summary: The current replay only limits the first XREAD batch, allowing
unlimited backlog draining for running sessions and silent truncation for
completed sessions; enforce a total replay cap and change listener start
behavior when cap is reached. Fix: In the replay loop in stream_registry.py
where messages are processed (variables replayed_count, replay_last_id,
config.stream_replay_count, subscriber_queue), stop processing further messages
as soon as replayed_count >= config.stream_replay_count and break out of both
loops; record a flag (e.g. replay_truncated=True). When starting the live
listener via _stream_listener for session_status == "running", if
replay_truncated is True start the listener from the stream latest-new ID (use
Redis semantics by passing "$" or equivalent to indicate only new messages)
instead of replay_last_id to avoid draining backlog; if session is completed and
replay_truncated is True, trigger a recovery path (e.g. call the DB hydration
routine or emit a StreamFinish with a recovery_needed flag) so truncated
messages are not silently lost. Ensure log entries note when truncation occurs
and reference the symbols _stream_listener, replayed_count, replay_last_id,
config.stream_replay_count, and StreamFinish.
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (1)

144-173: ⚠️ Potential issue | 🟠 Major

Preserve SDK reasoning persistence when suppressing the UI stream.

When render_reasoning_in_ui=False, the response adapter suppresses StreamReasoningStart, StreamReasoningDelta, and StreamReasoningEnd events (lines 379–396, 167). However, in service.py::_dispatch_response (lines 1705–1716), ChatMessage(role="reasoning") rows are appended to session.messages only when those StreamReasoning* responses are received. This means render_reasoning_in_ui=False currently loses session.messages persistence of reasoning content, contradicting the config docstring (line 216) which promises "Reasoning rows are still persisted to session.messages."

Thinking content is persisted to the transcript via _format_sdk_content_blocks, but that does not populate session.messages, so reloaded sessions will not display the reasoning.

Move the StreamReasoning* suppression downstream—after _dispatch_response persists the events to session.messages—or add a separate ThinkingBlock persistence path in service.py that appends ChatMessage(role="reasoning") directly when the adapter suppresses events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/response_adapter.py` around
lines 144 - 173, The adapter currently stops emitting
StreamReasoningStart/Delta/End when render_reasoning_in_ui is False which
prevents service.py::_dispatch_response from persisting
ChatMessage(role="reasoning") into session.messages; update response_adapter.py
in the ThinkingBlock branch (where _end_text_if_open and
_ensure_reasoning_started are called and StreamReasoningDelta is appended) so
that even when self._render_reasoning_in_ui is False you still emit a
persistence-only signal (or call the same persistence helper) that will result
in a ChatMessage(role="reasoning") being recorded (mirror what
_format_sdk_content_blocks or the StreamReasoning* events normally cause) —
alternatively, if you prefer the other approach, change
service.py::_dispatch_response to detect suppressed reasoning streams and append
ChatMessage(role="reasoning") directly when a ThinkingBlock payload is present;
reference ThinkingBlock, render_reasoning_in_ui, StreamReasoningDelta,
_format_sdk_content_blocks, and ChatMessage(role="reasoning") to locate the
relevant code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/baseline/reasoning.py`:
- Around lines 176-183: When render_in_ui=False we still persist role="reasoning"
rows and later rehydrate them into UI parts (convertChatSessionToUiMessages.ts).
Update the logic to prevent that: either stop persisting reasoning rows from
StreamReasoning* when render_in_ui is False by branching in the code that
appends/mutates the persisted reasoning row in reasoning.py (check the
render_in_ui parameter before calling the persist/append path), or add a guard
in the hydration/conversion step (convertChatSessionToUiMessages.ts) to filter
out persisted rows with role="reasoning" when the session's
ChatConfig.render_reasoning_in_ui is false. Pick one approach and ensure
StreamReasoning* still reads payloads for audit if needed but does not create
persisted UI reasoning rows when render_in_ui is False.
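The first option above amounts to a guard clause in front of the persist path. A hedged sketch with hypothetical names (persist_reasoning_row is not the real helper in reasoning.py):

```python
def persist_reasoning_row(messages: list, payload: str, render_in_ui: bool) -> None:
    """Append a reasoning row to the persisted message list, gated on the flag.

    The payload can still be read upstream for audit/logging; this guard only
    prevents creating a persisted UI reasoning row when render_in_ui is False.
    """
    if not render_in_ui:
        return
    messages.append({"role": "reasoning", "content": payload})
```

The alternative (filtering at hydration time) keeps the rows in storage and drops them in convertChatSessionToUiMessages.ts instead; which to choose depends on whether persisted reasoning is wanted for audit.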

In `@autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`:
- Around lines 147-158: The debounce currently stamps
lastReconnectResumeAtRef.current before a successful reattach, so quick failed
resumes are dropped by the early return and retries stall. Change the logic so
that you either (a) only update lastReconnectResumeAtRef.current after a
confirmed successful resume/reattach, or (b) instead of returning immediately
when sinceLastResume < RECONNECT_DEBOUNCE_MS, schedule a delayed call
(setTimeout) to attempt reconnect after the remaining debounce window so a
failed resume -> onError -> handleReconnect chain is not ignored. Update the
code that sets lastReconnectResumeAtRef.current and the debounce branch in the
reconnect routine (references: lastReconnectResumeAtRef.current,
RECONNECT_DEBOUNCE_MS, onError, handleReconnect) accordingly.
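Option (b) above, defer rather than drop, is language-agnostic timing logic; here is a sketch of it in Python with an injected clock and scheduler (Reconnector, now_ms, and schedule are illustrative, not the real hook's API):

```python
RECONNECT_DEBOUNCE_MS = 1000


class Reconnector:
    """Debounced reconnect that defers in-window attempts instead of dropping them.

    A resume arriving inside the debounce window is rescheduled for the
    remainder of the window, so a failed resume -> onError -> reconnect chain
    is delayed, never silently ignored.
    """

    def __init__(self, now_ms, schedule):
        self._now_ms = now_ms        # clock, injected for testability
        self._schedule = schedule    # schedule(delay_ms, fn), like setTimeout
        self._last_resume_at = None
        self.attempts = 0

    def handle_reconnect(self):
        now = self._now_ms()
        if self._last_resume_at is not None:
            since_last = now - self._last_resume_at
            if since_last < RECONNECT_DEBOUNCE_MS:
                # Defer instead of dropping: retry once the window closes.
                remaining = RECONNECT_DEBOUNCE_MS - since_last
                self._schedule(remaining, self.handle_reconnect)
                return
        self._last_resume_at = now  # stamped only when an attempt proceeds
        self.attempts += 1
```

The same shape ports directly to the TypeScript hook by replacing the injected scheduler with setTimeout.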

---

Outside diff comments:
In `@autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py`:
- Around lines 727-740: The test currently assumes the module-level SDK config
yields render_reasoning_in_ui=True. Patch backend.copilot.sdk.service.config
inside the test to a controlled object (or set CHAT_RENDER_REASONING_IN_UI) so
the value is deterministic, then call _do_transient_backoff and assert
SDKResponseAdapter (mock_cls) was called with render_reasoning_in_ui equal to
that patched config value rather than hardcoding True. Target symbols:
_do_transient_backoff, SDKResponseAdapter (mock_cls), and
backend.copilot.sdk.service.config.
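A minimal sketch of that patching pattern, using stand-ins for the service module's internals (config, do_transient_backoff, and the keyword-only call shape are assumptions, not the real signatures):

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, patch

# Stand-in for backend.copilot.sdk.service.config (module-level config object).
config = SimpleNamespace(render_reasoning_in_ui=True)


def do_transient_backoff(adapter_cls):
    # Models only the call under test: the real _do_transient_backoff
    # constructs an SDKResponseAdapter from the module-level config.
    return adapter_cls(render_reasoning_in_ui=config.render_reasoning_in_ui)


def test_adapter_uses_patched_config():
    mock_cls = MagicMock()
    # Patch the config attribute so the expected value is deterministic,
    # then assert the adapter saw exactly that value (no hardcoded True).
    with patch.object(config, "render_reasoning_in_ui", False):
        do_transient_backoff(mock_cls)
    mock_cls.assert_called_once_with(render_reasoning_in_ui=False)
```

patch.object reverts the attribute on exit, so the test leaves the shared config untouched for other tests.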

In `@autogpt_platform/backend/backend/copilot/stream_registry.py`:
- Around lines 496-548: Summary: the current replay only limits the first XREAD
batch, allowing unlimited backlog draining for running sessions and silent
truncation for completed sessions; enforce a total replay cap and change
listener start behavior when the cap is reached. Fix: in the replay loop in
stream_registry.py where messages are processed (variables replayed_count,
replay_last_id, config.stream_replay_count, subscriber_queue), stop processing
further messages as soon as replayed_count >= config.stream_replay_count,
break out of both loops, and record a flag (e.g. replay_truncated=True). When
starting the live listener via _stream_listener for session_status == "running",
if replay_truncated is True start the listener from the latest-new stream ID
(use Redis semantics by passing "$" or equivalent to indicate only new messages)
instead of replay_last_id, to avoid draining backlog. If the session is
completed and replay_truncated is True, trigger a recovery path (e.g. call the
DB hydration routine or emit a StreamFinish with a recovery_needed flag) so
truncated messages are not silently lost. Ensure log entries note when
truncation occurs, and reference the symbols _stream_listener, replayed_count,
replay_last_id, config.stream_replay_count, and StreamFinish.
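The capped double-loop can be sketched as follows. The function and its inputs are simplified stand-ins (batches models successive XREAD results as lists of (message_id, payload) tuples; the real code drains into subscriber_queue and starts _stream_listener):

```python
def replay_with_cap(batches, cap):
    """Drain replay batches, honouring a total cap across all batches.

    Returns (delivered, start_from, truncated): start_from is "$" (only new
    messages, per Redis stream semantics) when the cap was hit, so the live
    listener does not drain the remaining backlog; otherwise the last
    replayed stream ID.
    """
    delivered = []
    last_id = "0-0"
    truncated = False
    for batch in batches:
        for message_id, payload in batch:
            if len(delivered) >= cap:
                truncated = True
                break
            delivered.append(payload)
            last_id = message_id
        if truncated:
            break  # break out of both loops once the cap is reached
    start_from = "$" if truncated else last_id
    return delivered, start_from, truncated
```

For completed sessions the truncated flag would instead route into the recovery path (DB hydration or a StreamFinish carrying a recovery marker), since there is no live listener to pick up the remainder.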
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5b2e1c3d-1ea2-4e96-91dc-8568fd67f0bc

📥 Commits

Reviewing files that changed from the base of the PR and between e4f291e and 35e92e0.

📒 Files selected for processing (11)
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: integration_test
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: type-check (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (typescript)
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (10)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
autogpt_platform/backend/**/*_test.py

📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)

autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using *_test.py naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use AsyncMock from unittest.mock for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with @pytest.mark.xfail before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use poetry run pytest path/to/test.py --snapshot-update; always review snapshot changes with git diff before committing

Files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}: Use Node.js 21+ with pnpm package manager for frontend development
Always run 'pnpm format' for formatting and linting code in frontend development

Format frontend code using pnpm format

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/**/*.{tsx,ts}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/frontend/**/*.{tsx,ts}: Use function declarations for components and handlers (not arrow functions) in React components
Only use arrow functions for small inline lambdas (map, filter, etc.) in React components
Use PascalCase for component names and camelCase with 'use' prefix for hook names in React
Use Tailwind CSS utilities only for styling in frontend components
Use design system components from 'src/components/' (atoms, molecules, organisms) in frontend development
Never use 'src/components/legacy/' in frontend code
Only use Phosphor Icons (@phosphor-icons/react) for icons in frontend components
Use generated API hooks from '@/app/api/generated/endpoints/' instead of deprecated 'BackendAPI' or 'src/lib/autogpt-server-api/'
Use React Query for server state (via generated hooks) in frontend development
Default to client components ('use client') in Next.js; only use server components for SEO or extreme TTFB needs
Use 'ErrorCard' component for rendering errors in frontend UI; use toast notifications for mutation errors; use 'Sentry.captureException()' for manual exceptions
Separate render logic from data/behavior in React components; keep comments minimal (code should be self-documenting)

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/frontend/**/*.{ts,tsx}: No barrel files or 'index.ts' re-exports in frontend code
Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development

autogpt_platform/frontend/**/*.{ts,tsx}: Fully capitalize acronyms in symbols, e.g. graphID, useBackendAPI
Use function declarations (not arrow functions) for components and handlers
No dark: Tailwind classes — the design system handles dark mode
Use Next.js <Link> for internal navigation — never raw <a> tags
No any types unless the value genuinely can be anything
No linter suppressors (// @ts-ignore, // eslint-disable) — fix the actual issue
Keep files under ~200 lines; extract sub-components or hooks into their own files when a file grows beyond this
Keep render functions and hooks under ~50 lines; extract named helpers or sub-components when they grow longer
Use generated API hooks from `@/app/api/generated/endpoints/` with pattern `use{Method}{Version}{OperationName}` and regenerate with `pnpm generate:api`
Do not use `useCallback` or `useMemo` unless asked to optimise a given function
Separate render logic (`.tsx`) from business logic (`use*.ts` hooks)
Use ErrorCard for render errors, toast for mutations, and Sentry for exceptions in the frontend

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

autogpt_platform/frontend/src/**/*.{ts,tsx}: Use generated API hooks from @/app/api/__generated__/endpoints/ following the pattern use{Method}{Version}{OperationName}, and regenerate with pnpm generate:api
Separate render logic from business logic using component.tsx + useComponent.ts + helpers.ts pattern, colocate state when possible and avoid creating large components, use sub-components in local /components folder
Use function declarations for components and handlers, use arrow functions only for callbacks
Do not use useCallback or useMemo unless asked to optimise a given function

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

No barrel files or index.ts re-exports in the frontend

Do not type hook returns, let Typescript infer as much as possible

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

Do not type hook returns, let Typescript infer as much as possible

Extract component logic into custom hooks grouped by concern, not by component. Each hook should represent a cohesive domain of functionality (e.g., useSearch, useFilters, usePagination) rather than bundling all state into one useComponentState hook. Put each hook in its own .ts file.

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Never type with any, if no types available use unknown

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
🧠 Learnings (27)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/frontend/src/app/api/openapi.json:12803-12806
Timestamp: 2026-04-14T06:39:52.592Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Intentional message length caps:
- StreamChatRequest.message maxLength = 64000.
- QueuePendingMessageRequest.message maxLength = 32000 (matches PendingMessage.content).
Rationale: both feed the same LLM context window; pending must not exceed stream, and larger ceilings replace legacy 4000/16000.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
📚 Learning: 2026-04-14T14:45:42.706Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12766
File: autogpt_platform/backend/backend/copilot/stream_registry.py:1175-1193
Timestamp: 2026-04-14T14:45:42.706Z
Learning: In `autogpt_platform/backend/backend/copilot/stream_registry.py`, `disconnect_all_listeners(session_id)` is intentionally pod-local (inspects in-memory `_listener_sessions`) and session-scoped (not subscriber-scoped). It cancels all listener tasks for the session on the current pod only. If the DELETE request hits a different pod, nothing is cancelled on that pod — the XREAD timeout (5 s block + status poll) bounds the worst-case release time. In the rare two-tabs-same-session case both listeners on the same pod would be torn down. A subscriber-scoped cross-pod fan-out (per-listener tokens + Redis pub/sub) is deferred as a follow-up. Do NOT re-flag this as a blocking issue; the limitation is explicitly documented in the function's docstring (PR `#12766`, commit 1f3ebafd5).

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
📚 Learning: 2026-03-13T15:49:44.961Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/rate_limit.py:0-0
Timestamp: 2026-03-13T15:49:44.961Z
Learning: In `autogpt_platform/backend/backend/copilot/rate_limit.py`, the original per-session token window (with a TTL-based reset) was replaced with fixed daily and weekly windows. `resets_at` is now derived from `_daily_reset_time()` (midnight UTC) and `_weekly_reset_time()` (next Monday 00:00 UTC) — deterministic fixed-boundary calculations that require no Redis TTL introspection.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
📚 Learning: 2026-04-16T13:28:28.641Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
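The separator convention in this learning can be illustrated with a small sketch. The helper names (`to_anthropic_id`, `is_valid_anthropic_id`) are hypothetical, not repo code — only the ID forms themselves come from the learning above:

```python
def to_anthropic_id(openrouter_id: str) -> str:
    """Map an OpenRouter routing ID to its Anthropic-native form.

    OpenRouter variants keep a dot in the version ("anthropic/claude-opus-4.6"),
    while Anthropic contexts expect the fully hyphen-separated form
    ("claude-opus-4-6").
    """
    # Strip the "anthropic/" vendor prefix, then hyphenate the version dot.
    name = openrouter_id.removeprefix("anthropic/")
    return name.replace(".", "-")


def is_valid_anthropic_id(model_id: str) -> bool:
    """Hyphen-separated form is correct for Anthropic; a dot or slash is not."""
    return "." not in model_id and "/" not in model_id
```

Per the learning, a reviewer should only flag `claude-opus-4-6` when it appears on the OpenRouter path, where the dot form is expected.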
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/stream_registry.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/config_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
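A minimal sketch of why such an `isinstance` check is reachable (the `handle` helper is hypothetical; `VirusScanError` stands in for any `ValueError` subclass, as in the learning above):

```python
class VirusScanError(ValueError):
    """Example subclass: raising this also raises a ValueError."""


def handle(exc: ValueError) -> str:
    try:
        raise exc
    except ValueError as e:
        # Reachable, not dead code: a VirusScanError IS a ValueError,
        # so this branch fires whenever the subclass is raised.
        if isinstance(e, VirusScanError):
            return "virus-scan failure"
        return "generic value error"
```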
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/service.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
📚 Learning: 2026-04-08T17:26:41.549Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: classic/CLAUDE.md:0-0
Timestamp: 2026-04-08T17:26:41.549Z
Learning: Use environment variables for global configuration: OPENAI_API_KEY, SMART_LLM, FAST_LLM, EMBEDDING_MODEL, TAVILY_API_KEY, SERPER_API_KEY, GOOGLE_API_KEY, LOG_LEVEL, DATABASE_STRING, PORT, FILE_STORAGE_BACKEND

Applied to files:

  • autogpt_platform/backend/backend/copilot/config_test.py
📚 Learning: 2026-04-13T14:19:19.341Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12740
File: autogpt_platform/frontend/src/app/api/openapi.json:0-0
Timestamp: 2026-04-13T14:19:19.341Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
When adding new CoPilot tool response models (e.g., ScheduleListResponse, ScheduleDeletedResponse), update backend/api/features/chat/routes.py to include them in the ToolResponseUnion so the frontend’s autogenerated openapi.json dummy export (/api/chat/schema/tool-responses) exposes them for codegen. Do not hand-edit frontend/src/app/api/openapi.json.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/backend/blocks/**/*.py : Write tests alongside block implementation when adding new blocks in backend

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/config.py
📚 Learning: 2026-03-19T11:25:27.842Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12471
File: autogpt_platform/frontend/src/app/(platform)/admin/users/useAdminUsersPage.ts:48-55
Timestamp: 2026-03-19T11:25:27.842Z
Learning: In `autogpt_platform/frontend/src/app/(platform)/admin/users/useAdminUsersPage.ts`, the debounced search pattern uses `useRef(debounce(...))` (lodash) rather than `useEffect`+`setTimeout`. The debounced callback atomically applies `setDebouncedSearch(value.trim())` and `setCurrentPage(1)`, so the page reset is deferred along with the filter change and never races ahead. The query is driven by `debouncedSearch` (not the raw `searchQuery`), so no stale-filter fetch occurs on the first keystroke. Do not flag this pattern as incorrect in future reviews.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-03-24T02:23:31.305Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12526
File: autogpt_platform/frontend/src/app/(platform)/copilot/components/RateLimitResetDialog/RateLimitResetDialog.tsx:0-0
Timestamp: 2026-03-24T02:23:31.305Z
Learning: In the Copilot platform UI code, follow the established Orval hook `onError` error-handling convention: first explicitly detect/handle `ApiError`, then read `error.response?.detail` (if present) as the primary message; if not available, fall back to `error.message`; and finally fall back to a generic string message. This convention should be used for generated Orval hooks even if the custom Orval mutator already maps details into `ApiError.message`, to keep consistency across hooks/components (e.g., `useCronSchedulerDialog.ts`, `useRunGraph.ts`, and rate-limit/reset flows).

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-01T18:54:16.035Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 12633
File: autogpt_platform/frontend/src/app/(platform)/library/components/AgentFilterMenu/AgentFilterMenu.tsx:3-10
Timestamp: 2026-04-01T18:54:16.035Z
Learning: In the frontend, the legacy Select component at `@/components/__legacy__/ui/select` is an intentional, codebase-wide visual-consistency pattern. During code reviews, do not flag or block PRs merely for continuing to use this legacy Select. If a migration to the newer design-system Select is desired, bundle it into a single dedicated cleanup/migration PR that updates all Select usages together (e.g., avoid piecemeal replacements).

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-07T09:24:16.582Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12686
File: autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/PainPointsStep.test.tsx:1-19
Timestamp: 2026-04-07T09:24:16.582Z
Learning: In Significant-Gravitas/AutoGPT’s `autogpt_platform/frontend` (Vite + `vitejs/plugin-react` with the automatic JSX transform), do not flag usages of React types/components (e.g., `React.ReactNode`) in `.ts`/`.tsx` files as missing `React` imports. Since the React namespace is made available by the project’s TS/Vite setup, an explicit `import React from 'react'` or `import type { ReactNode } ...` is not required; only treat it as missing if typechecking (e.g., `pnpm types`) would actually fail.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-02T05:43:49.128Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12640
File: autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx:13-13
Timestamp: 2026-04-02T05:43:49.128Z
Learning: Do not flag `import { Question } from "phosphor-icons/react"` as an invalid import. `Question` is a valid named export from `phosphor-icons/react` (as reflected in the package’s generated `.d.ts` files and re-exports via `dist/index.d.ts`), so it should be treated as a supported named export during code reviews.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-14T06:39:52.592Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/frontend/src/app/api/openapi.json:12803-12806
Timestamp: 2026-04-14T06:39:52.592Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Intentional message length caps:
- StreamChatRequest.message maxLength = 64000.
- QueuePendingMessageRequest.message maxLength = 32000 (matches PendingMessage.content).
Rationale: both feed the same LLM context window; pending must not exceed stream, and larger ceilings replace legacy 4000/16000.

Applied to files:

  • autogpt_platform/backend/backend/copilot/config.py

Comment thread autogpt_platform/backend/backend/copilot/baseline/reasoning.py Outdated
@majdyz
Contributor Author

majdyz commented Apr 21, 2026

Independent review

Scope: render_reasoning_in_ui flag, stream_replay_count cap, frontend reconnect debounce. Scope 3 (Last-Event-ID resume) explicitly deferred.

Findings

1. (Major) Reconnect debounce drops valid retries on fast-failing resumes.
useCopilotStream.ts::handleReconnect stamps lastReconnectResumeAtRef unconditionally before resumeStreamRef.current() actually reattaches. If the resume fetch fails quickly (e.g. 500ms on a 502 from Redis), onError calls handleReconnect again within the 1500ms window and the debounce silently drops it. Retry loop stalls; isReconnecting=true sticks until a manual refresh. Both sentry-bot and coderabbit flagged this. The fix is to coalesce by delaying rather than dropping: schedule a timer for the remaining debounce window, don't early-return. Will address.

2. (Major) render_reasoning_in_ui=False persistence asymmetry.

  • Baseline path: BaselineReasoningEmitter.on_delta still appends a ChatMessage(role="reasoning") to session_messages when render is off.
  • SDK path: _dispatch_response only creates the role="reasoning" row on StreamReasoningStart, which the adapter suppresses — so nothing is persisted.
  • Frontend: convertChatSessionToUiMessages.ts unconditionally converts any persisted role="reasoning" row into a type: "reasoning" UI part on reload.

Result: when the flag is off, the baseline path hides reasoning on the live wire but resurrects it on reload (flag has no effect after refresh); the SDK path hides it everywhere. That is both inconsistent and not what the config docstring promises. Simplest fix: when render_in_ui=False, skip persisting the role="reasoning" row in the baseline emitter too. Audit trail is still captured in the SDK transcript (_format_sdk_content_blocks) / OpenRouter logs — we don't rely on session.messages reasoning rows for audit today. I will update the docstring accordingly and drop the "future per-session toggle" promise, since with hydration unconditionally re-rendering those rows, persisting them without a matching frontend gate is a footgun. Will address.

3. (Minor) p0_guardrails_test::test_replaces_adapter_with_new_instance asserts render_reasoning_in_ui=True literal.
Test is env-dependent — CHAT_RENDER_REASONING_IN_UI=false in the shell would flip the production call but not the assertion. Patch backend.copilot.sdk.service.config so the assertion tracks the configured value, not a literal. Will address.

4. (No change needed) Total replay backlog cap for running sessions.
CodeRabbit flagged _stream_listener as "draining all remaining messages in batches of 100" past the replay cap. That is a misread of the mechanics: _stream_listener uses block=5000, count=100 per call, so it is receiving real-time events as they arrive on the Redis stream — not draining a backlog synchronously. The "replay storm" this PR addresses is specifically the initial batch on reconnect; live-stream events after that point are real-time by construction. Expanding scope to also bound live delivery would change unrelated behavior and is out of scope. Will reply and resolve.

5. (No change needed) "Flag still ships reasoning tokens / still bills."
Intentional per PR body — the model still reasons, the flag only gates UI surfacing. Not a bug.

Plan: address findings 1, 2, 3. Reply-and-resolve finding 4.

…t debounce

Reconnect debounce (useCopilotStream.ts):
- Coalesce-by-delay instead of drop-on-early-return. A fast-failing resume
  (e.g. 502 on GET /stream at 500ms) would call handleReconnect inside the
  1500ms window and the debounce silently returned, stalling the retry loop
  until a manual refresh. Now we schedule a timer for the remaining window
  so the retry still fires. Flagged by sentry-bot (HIGH) and coderabbit.

render_reasoning_in_ui=False persistence asymmetry:
- BaselineReasoningEmitter now also skips ChatMessage(role="reasoning")
  persistence when render is off. Previously only wire events were silenced
  while the persisted row was still appended; convertChatSessionToUiMessages
  unconditionally re-renders reasoning rows as {type: "reasoning"} UI parts
  on reload, so the flag was a no-op post-refresh. The SDK path was already
  consistent (_dispatch_response only creates the row on StreamReasoningStart,
  which the adapter suppresses). Docstrings on the emitter, the adapter, and
  the config field updated to describe the combined wire+persistence gating
  and point at the provider transcript as the audit source.
- Dropped the "future per-session toggle" promise from the emitter docstring
  - with hydration unconditionally resurfacing persisted rows, keeping them
  while silencing the live wire is a footgun, not a feature.
- Flagged by coderabbit (major, inline + outside-diff on response_adapter.py).
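The combined wire+persistence gate can be sketched as follows (hypothetical names; the real code is `BaselineReasoningEmitter.on_delta`):

```python
def on_reasoning_delta(
    render_in_ui: bool,
    delta: str,
    wire_events: list,
    session_messages: list,
) -> None:
    """Gate BOTH surfaces together when the render flag is off.

    Persisting the role="reasoning" row while silencing the live wire
    would resurrect the reasoning block on reload, because hydration
    unconditionally re-renders every persisted reasoning row.
    """
    if not render_in_ui:
        return  # audit trail lives in the provider transcript instead
    wire_events.append({"type": "reasoning-delta", "delta": delta})
    session_messages.append({"role": "reasoning", "content": delta})
```

Flag off therefore means: no live wire event, no persisted row, no hydrated collapse — on both the baseline and SDK paths.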

p0_guardrails test env-dependence:
- test_replaces_adapter_with_new_instance now asserts against
  config.render_reasoning_in_ui rather than the True literal, so
  CHAT_RENDER_REASONING_IN_UI=false in the shell no longer causes the test
  to fail for the wrong reason. Flagged by coderabbit (outside-diff, minor).
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py (1)

723-743: Patch the config to False so this catches hardcoded True regressions.

As written, this assertion follows the service config, but in the default environment it can still pass if _do_transient_backoff accidentally hardcodes render_reasoning_in_ui=True. Setting the patched service config to False makes the test prove the non-default runtime path.

🧪 Suggested test tightening
-        from backend.copilot.sdk.service import _do_transient_backoff, config
+        from backend.copilot.sdk.service import _do_transient_backoff
+
+        cfg = _make_config(render_reasoning_in_ui=False)

         original_adapter = MagicMock()
         state = MagicMock()
         state.adapter = original_adapter
         state.usage = MagicMock()

         with (
             patch("asyncio.sleep", new=AsyncMock()),
+            patch(f"{_SVC}.config", cfg),
             patch("backend.copilot.sdk.service.SDKResponseAdapter") as mock_cls,
         ):
@@
         mock_cls.assert_called_once_with(
             message_id="msg-1",
             session_id="sess-1",
-            render_reasoning_in_ui=config.render_reasoning_in_ui,
+            render_reasoning_in_ui=False,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py` around
lines 723 - 743, The test currently relies on the real config value and can miss
regressions that hardcode render_reasoning_in_ui=True; update the test around
the _do_transient_backoff invocation to force the service config to False (e.g.,
set or patch config.render_reasoning_in_ui = False or use monkeypatch/patch on
backend.copilot.sdk.service.config.render_reasoning_in_ui) before calling
_do_transient_backoff so the mock SDKResponseAdapter created in the with-block
(mock_cls) must be constructed with render_reasoning_in_ui=False; keep
references to _do_transient_backoff, config, and SDKResponseAdapter when making
the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f0a750b5-aa9b-4fc3-b67c-0f9196a7858d

📥 Commits

Reviewing files that changed from the base of the PR and between 35e92e0 and 7ef10b2.

📒 Files selected for processing (6)
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
✅ Files skipped from review due to trivial changes (1)
  • autogpt_platform/backend/backend/copilot/baseline/reasoning_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • autogpt_platform/backend/backend/copilot/config.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: lint
  • GitHub Check: integration_test
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: type-check (3.11)
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.13)
  • GitHub Check: Analyze (typescript)
  • GitHub Check: Analyze (python)
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (10)
autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/frontend/**/*.{ts,tsx,js,jsx}: Use Node.js 21+ with pnpm package manager for frontend development
Always run 'pnpm format' for formatting and linting code in frontend development

Format frontend code using pnpm format

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/**/*.{tsx,ts}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/frontend/**/*.{tsx,ts}: Use function declarations for components and handlers (not arrow functions) in React components
Only use arrow functions for small inline lambdas (map, filter, etc.) in React components
Use PascalCase for component names and camelCase with 'use' prefix for hook names in React
Use Tailwind CSS utilities only for styling in frontend components
Use design system components from 'src/components/' (atoms, molecules, organisms) in frontend development
Never use 'src/components/legacy/' in frontend code
Only use Phosphor Icons (@phosphor-icons/react) for icons in frontend components
Use generated API hooks from '@/app/api/generated/endpoints/' instead of deprecated 'BackendAPI' or 'src/lib/autogpt-server-api/'
Use React Query for server state (via generated hooks) in frontend development
Default to client components ('use client') in Next.js; only use server components for SEO or extreme TTFB needs
Use 'ErrorCard' component for rendering errors in frontend UI; use toast notifications for mutation errors; use 'Sentry.captureException()' for manual exceptions
Separate render logic from data/behavior in React components; keep comments minimal (code should be self-documenting)

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/frontend/**/*.{ts,tsx}: No barrel files or 'index.ts' re-exports in frontend code
Regenerate API hooks with 'pnpm generate:api' after backend OpenAPI spec changes in frontend development

autogpt_platform/frontend/**/*.{ts,tsx}: Fully capitalize acronyms in symbols, e.g. graphID, useBackendAPI
Use function declarations (not arrow functions) for components and handlers
No dark: Tailwind classes — the design system handles dark mode
Use Next.js <Link> for internal navigation — never raw <a> tags
No any types unless the value genuinely can be anything
No linter suppressors (`// @ts-ignore`, `// eslint-disable`) — fix the actual issue
Keep files under ~200 lines; extract sub-components or hooks into their own files when a file grows beyond this
Keep render functions and hooks under ~50 lines; extract named helpers or sub-components when they grow longer
Use generated API hooks from `@/app/api/generated/endpoints/` with pattern `use{Method}{Version}{OperationName}` and regenerate with `pnpm generate:api`
Do not use `useCallback` or `useMemo` unless asked to optimise a given function
Separate render logic (`.tsx`) from business logic (`use*.ts` hooks)
Use ErrorCard for render errors, toast for mutations, and Sentry for exceptions in the frontend

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

autogpt_platform/frontend/src/**/*.{ts,tsx}: Use generated API hooks from @/app/api/__generated__/endpoints/ following the pattern use{Method}{Version}{OperationName}, and regenerate with pnpm generate:api
Separate render logic from business logic using component.tsx + useComponent.ts + helpers.ts pattern, colocate state when possible and avoid creating large components, use sub-components in local /components folder
Use function declarations for components and handlers, use arrow functions only for callbacks
Do not use useCallback or useMemo unless asked to optimise a given function

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

No barrel files or index.ts re-exports in the frontend

Do not type hook returns, let Typescript infer as much as possible

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/frontend/src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

Do not type hook returns, let Typescript infer as much as possible

Extract component logic into custom hooks grouped by concern, not by component. Each hook should represent a cohesive domain of functionality (e.g., useSearch, useFilters, usePagination) rather than bundling all state into one useComponentState hook. Put each hook in its own .ts file.

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Never type with any, if no types available use unknown

Files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
autogpt_platform/backend/**/*_test.py

📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)

autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using `*_test.py` naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use `AsyncMock` from `unittest.mock` for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with `@pytest.mark.xfail` before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use `poetry run pytest path/to/test.py --snapshot-update`; always review snapshot changes with `git diff` before committing

Files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
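The AsyncMock convention above can be shown in a minimal stdlib-only sketch (hypothetical names): `AsyncMock` returns awaitables, so async call sites can be stubbed directly, and awaited calls are recorded like normal `Mock` calls.

```python
import asyncio
from unittest.mock import AsyncMock

# Stub for an async dependency; a plain Mock would return a non-awaitable
# and fail at the `await` below.
fetch_user = AsyncMock(return_value={"name": "ada"})


async def get_display_name(user_id: str) -> str:
    user = await fetch_user(user_id)
    return user["name"].title()


result = asyncio.run(get_display_name("u-1"))
# Awaited calls are tracked, so assertions work as with a normal Mock.
fetch_user.assert_awaited_once_with("u-1")
```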
🧠 Learnings (35)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/frontend/src/app/api/openapi.json:12803-12806
Timestamp: 2026-04-14T06:39:52.592Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Intentional message length caps:
- StreamChatRequest.message maxLength = 64000.
- QueuePendingMessageRequest.message maxLength = 32000 (matches PendingMessage.content).
Rationale: both feed the same LLM context window; pending must not exceed stream, and larger ceilings replace legacy 4000/16000.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-03-19T11:25:27.842Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12471
File: autogpt_platform/frontend/src/app/(platform)/admin/users/useAdminUsersPage.ts:48-55
Timestamp: 2026-03-19T11:25:27.842Z
Learning: In `autogpt_platform/frontend/src/app/(platform)/admin/users/useAdminUsersPage.ts`, the debounced search pattern uses `useRef(debounce(...))` (lodash) rather than `useEffect`+`setTimeout`. The debounced callback atomically applies `setDebouncedSearch(value.trim())` and `setCurrentPage(1)`, so the page reset is deferred along with the filter change and never races ahead. The query is driven by `debouncedSearch` (not the raw `searchQuery`), so no stale-filter fetch occurs on the first keystroke. Do not flag this pattern as incorrect in future reviews.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-03-17T06:18:51.570Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatContainer/ChatContainer.tsx:55-67
Timestamp: 2026-03-17T06:18:51.570Z
Learning: In `autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatContainer/ChatContainer.tsx`, an explicit `isBusy` guard on the retry handler (`handleRetry`) is not needed. Once `onSend` is invoked, the chat status immediately transitions to "submitted", which causes the `ErrorCard` (containing the retry button) to unmount before a second click can register, making double-send impossible by design.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-03-24T02:05:08.144Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12526
File: autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx:0-0
Timestamp: 2026-03-24T02:05:08.144Z
Learning: In `Significant-Gravitas/AutoGPT` (autogpt_platform frontend), when gating logic on a React Query result being available (e.g., `useGetV2GetCopilotUsage`), prefer destructuring `isSuccess` (e.g., `const { data, isSuccess: hasUsage } = useQuery(...)`) over checking `!isLoading`. `isLoading` can be `false` in error/idle states where `data` is still `undefined`, while `isSuccess` guarantees the query completed successfully and `data` is populated. This pattern was established in `CopilotPage.tsx` (PR `#12526`, commit e9dfd1f76).

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
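The attempt-counting semantics in this learning can be sketched with a hypothetical harness (not the repository's actual loop): the counter is pre-incremented, so with `max_transient=3` the `>=` guard yields exactly 3 total stream attempts (initial + 2 retries), not 3 retries on top of the initial attempt.

```python
def run_with_retries(attempt, max_transient: int = 3):
    transient_retries = 0
    while True:
        transient_retries += 1  # pre-increment: counts this attempt, including the first
        if attempt():
            return True, transient_retries
        if transient_retries >= max_transient:
            # Total attempts (not extra retries) has reached the cap.
            return False, transient_retries
        # The real loop sleeps with exponential backoff here (1s, 2s, ...).
```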
📚 Learning: 2026-04-13T13:10:33.180Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12764
File: autogpt_platform/frontend/src/app/(platform)/library/components/LibraryAgentList/useLibraryAgentList.ts:264-294
Timestamp: 2026-04-13T13:10:33.180Z
Learning: In `autogpt_platform/frontend/src/app/(platform)/library/components/LibraryAgentList/useLibraryAgentList.ts`, the `consecutiveEmptyPagesRef` and `prevFilteredLengthRef` refs used to track filtered-pagination exhaustion are intentional. The one-render lag in `filteredExhausted` (which reads `consecutiveEmptyPagesRef.current` synchronously) is by design — refs are preferred here to avoid triggering extra re-renders for internal fetch-state bookkeeping. Do not flag this as a stale-ref bug in future reviews.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-13T13:11:00.401Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12764
File: autogpt_platform/frontend/src/app/(platform)/copilot/components/EmptySession/EmptySession.tsx:41-42
Timestamp: 2026-04-13T13:11:00.401Z
Learning: In Significant-Gravitas/AutoGPT `autogpt_platform/frontend`, unconditional React Query hook calls (e.g. `usePulseChips()` in `EmptySession.tsx`) are intentional when the underlying data is expected to be cached from prior page visits. The team considers the fetch cost acceptable in these cases and does not require `enabled` gating purely for feature-flag-disabled paths. Do not flag unconditional query hooks as wasteful when caching makes the cost negligible.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-04-08T17:28:40.841Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/frontend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:40.841Z
Learning: Applies to autogpt_platform/frontend/**/*.{ts,tsx} : No linter suppressors (`// ts-ignore`, `// eslint-disable`) — fix the actual issue

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-04-14T06:34:02.835Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-03-24T02:23:31.305Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12526
File: autogpt_platform/frontend/src/app/(platform)/copilot/components/RateLimitResetDialog/RateLimitResetDialog.tsx:0-0
Timestamp: 2026-03-24T02:23:31.305Z
Learning: In the Copilot platform UI code, follow the established Orval hook `onError` error-handling convention: first explicitly detect/handle `ApiError`, then read `error.response?.detail` (if present) as the primary message; if not available, fall back to `error.message`; and finally fall back to a generic string message. This convention should be used for generated Orval hooks even if the custom Orval mutator already maps details into `ApiError.message`, to keep consistency across hooks/components (e.g., `useCronSchedulerDialog.ts`, `useRunGraph.ts`, and rate-limit/reset flows).

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-15T14:10:18.177Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/backend/copilot/graphiti/CLAUDE.md:0-0
Timestamp: 2026-04-15T14:10:18.177Z
Learning: Applies to autogpt_platform/backend/backend/copilot/graphiti/**/*agent*.{ts,tsx} : Agent error handling must distinguish between recoverable and non-recoverable errors

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-14T06:39:52.592Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/frontend/src/app/api/openapi.json:12803-12806
Timestamp: 2026-04-14T06:39:52.592Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Intentional message length caps:
- StreamChatRequest.message maxLength = 64000.
- QueuePendingMessageRequest.message maxLength = 32000 (matches PendingMessage.content).
Rationale: both feed the same LLM context window; pending must not exceed stream, and larger ceilings replace legacy 4000/16000.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
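The double-check pattern this learning protects can be sketched abstractly (hypothetical names, injected callables standing in for the registry and DB): the active-session registry is consulted again after the DB fetch, and the pagination decision is redone from sequence 0 if the stream completed in the window between the two checks.

```python
def get_session_messages(get_active, fetch_from_start, fetch_live_tail):
    active = get_active()          # pre-check: is a stream currently running?
    if active is None:
        return fetch_from_start()  # no live stream: forward from sequence 0
    tail = fetch_live_tail()       # backward page anchored to the live tail
    post_active = get_active()     # post-check: TOCTOU re-verification
    if post_active is None:
        # Stream completed during the DB fetch; the backward page would skip
        # earlier messages, so reset to a forward fetch from sequence 0.
        return fetch_from_start()
    return tail


# Simulate the race: the stream completes while the tail page is fetched.
state = {"active": "sess-1"}

def _fetch_tail():
    state["active"] = None  # stream finishes mid-fetch
    return ["msg-2", "msg-3"]

result_msgs = get_session_messages(
    lambda: state["active"],
    lambda: ["msg-0", "msg-1", "msg-2", "msg-3"],
    _fetch_tail,
)
```

Without the post-check, the caller would receive only the tail page even though the stream it was anchored to no longer exists.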
📚 Learning: 2026-04-01T18:54:16.035Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 12633
File: autogpt_platform/frontend/src/app/(platform)/library/components/AgentFilterMenu/AgentFilterMenu.tsx:3-10
Timestamp: 2026-04-01T18:54:16.035Z
Learning: In the frontend, the legacy Select component at `@/components/__legacy__/ui/select` is an intentional, codebase-wide visual-consistency pattern. During code reviews, do not flag or block PRs merely for continuing to use this legacy Select. If a migration to the newer design-system Select is desired, bundle it into a single dedicated cleanup/migration PR that updates all Select usages together (e.g., avoid piecemeal replacements).

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-07T09:24:16.582Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12686
File: autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/PainPointsStep.test.tsx:1-19
Timestamp: 2026-04-07T09:24:16.582Z
Learning: In Significant-Gravitas/AutoGPT’s `autogpt_platform/frontend` (Vite + `vitejs/plugin-react` with the automatic JSX transform), do not flag usages of React types/components (e.g., `React.ReactNode`) in `.ts`/`.tsx` files as missing `React` imports. Since the React namespace is made available by the project’s TS/Vite setup, an explicit `import React from 'react'` or `import type { ReactNode } ...` is not required; only treat it as missing if typechecking (e.g., `pnpm types`) would actually fail.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-02T05:43:49.128Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12640
File: autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx:13-13
Timestamp: 2026-04-02T05:43:49.128Z
Learning: Do not flag `import { Question } from "phosphor-icons/react"` as an invalid import. `Question` is a valid named export from `phosphor-icons/react` (as reflected in the package’s generated `.d.ts` files and re-exports via `dist/index.d.ts`), so it should be treated as a supported named export during code reviews.

Applied to files:

  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
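The paired suppression this learning mandates can be sketched with hypothetical names (not the emitter's actual code): when `render_in_ui` is off, both the live wire event and the persisted `role="reasoning"` row are skipped together, so nothing re-hydrates a reasoning collapse on page reload.

```python
def emit_reasoning_delta(delta, render_in_ui, wire_events, persisted_rows):
    if not render_in_ui:
        return  # flag off: no live wire event AND no persisted row
    # Flag on: the pair is always emitted together.
    wire_events.append({"type": "reasoning-delta", "delta": delta})
    persisted_rows.append({"role": "reasoning", "content": delta})


wire, rows = [], []
emit_reasoning_delta("step 1", False, wire, rows)  # suppressed as a pair
emit_reasoning_delta("step 2", True, wire, rows)   # emitted as a pair
```

Suppressing only the wire event while still persisting the row would be the bug the learning warns about: the persisted row would resurrect the collapse on refresh.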
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-03-10T08:39:22.025Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for an `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-04-09T09:07:11.551Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12720
File: autogpt_platform/backend/backend/copilot/tools/graphiti_delete.py:63-69
Timestamp: 2026-04-09T09:07:11.551Z
Learning: In Significant-Gravitas/AutoGPT, gating `graphiti_delete_user_data` (and similar Graphiti memory tools) on the `is_enabled_for_user` / `graphiti-memory` LaunchDarkly flag in the delete path is intentional and acceptable. The scenario where a user has existing Graphiti data but the flag is later disabled (preventing deletion) is not a concern for the team. Do not flag this pattern as an issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-03-11T08:40:59.673Z
Learnt from: kcze
Repo: Significant-Gravitas/AutoGPT PR: 12328
File: autogpt_platform/frontend/src/app/(platform)/copilot/useLoadMoreMessages.ts:49-61
Timestamp: 2026-03-11T08:40:59.673Z
Learning: In `autogpt_platform/frontend/src/app/(platform)/copilot/useLoadMoreMessages.ts`, clearing `olderMessages` (and resetting `oldestSequence`/`hasMore`) when `initialOldestSequence` shifts on the same session is intentional. Pages already fetched were based on a now-stale cursor; retaining them risks sequence gaps or duplicates. `ScrollPreserver` keeps the currently visible viewport intact, so only unvisited older pages are dropped. This is a deliberate safe-refetch design tradeoff.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-03-11T16:50:45.654Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12367
File: autogpt_platform/backend/generate_views.py:146-150
Timestamp: 2026-03-11T16:50:45.654Z
Learning: In Significant-Gravitas/AutoGPT PR `#12367` (`autogpt_platform/backend/generate_views.py`), the `security_invoker = false` setting on analytics views is intentional. `analytics_readonly` is granted access only to the `analytics` schema. The views execute as their owner (postgres) to read `auth.*` and `platform.*` tables, but expose only curated, scrubbed columns via their SELECT lists. This is the correct PostgreSQL "view as security boundary" pattern — equivalent to a stored-procedure security definer. Do not flag this as a privilege escalation; switching to `security_invoker = true` would require granting `analytics_readonly` direct SELECT on `auth.sessions`, `auth.audit_log_entries`, `platform.AgentGraphExecution`, etc., which is a much wider blast radius than the current design.

Applied to files:

  • autogpt_platform/backend/backend/copilot/baseline/reasoning.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Mock at boundaries — mock where the symbol is **used**, not where it's **defined**; after refactoring, update mock targets to match new module paths

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
🔇 Additional comments (4)
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts (2)

145-213: Coalesce-by-delay implementation looks correct.

The debounce branch now schedules a retry on the window boundary rather than dropping it, which addresses the previously-flagged stall-on-fast-failing-resume case. Timer is tracked via reconnectTimerRef so the session-switch cleanup (line 509) also cancels a pending coalesced retry, and the recursive call path doesn't increment reconnectAttemptsRef until the actual attempt fires — so the retry budget is preserved across coalesced waits.


126-130: lastReconnectResumeAtRef lifecycle is consistent.

Stamped immediately before resumeStreamRef.current() (line 210), reset to 0 on session switch (line 517), and the sentinel > 0 check at line 160 correctly bypasses the debounce on the first reconnect of a session. Matches the documented intent in the comments.

Also applies to: 210-210, 517-517

autogpt_platform/backend/backend/copilot/sdk/response_adapter.py (1)

56-83: Render flag gating looks consistent.

render_reasoning_in_ui=False now suppresses reasoning start/delta/end events together, while leaving SDK transcript continuity unaffected. This keeps the SDK path aligned with the intended “no live wire, no persisted UI reasoning row” behavior. Based on learnings, when reasoning rendering is disabled, both StreamReasoning* events and ChatMessage(role="reasoning") persistence must be suppressed together.

Also applies to: 164-184, 377-407

autogpt_platform/backend/backend/copilot/baseline/reasoning.py (1)

176-188: Baseline reasoning suppression now matches the flag semantics.

The emitter now suppresses both StreamReasoning* wire events and ChatMessage(role="reasoning") persistence when render_in_ui=False, while still keeping the block lifecycle symmetric. This resolves the reload/hydration asymmetry cleanly. Based on learnings, when render_in_ui=False, both the wire events and reasoning-row persistence must be suppressed together.

Also applies to: 191-201, 216-241, 252-267

majdyz added 2 commits April 22, 2026 07:12
Extract the inline debounce logic in `useCopilotStream.handleReconnect`
into a pure `shouldDebounceReconnect(lastResumeAt, now, windowMs)` helper
and cover it with 10 vitest cases (first-reconnect pass-through, inside
window coalesce, boundary, beyond window, custom window, burst
simulation). The hook wiring shrinks to two lines and the decision
surface is 100% covered by unit tests — useful for codecov/patch on the
frontend diff.
Mocks @ai-sdk/react so renderHook(useCopilotStream) can capture the
onFinish callback directly and drive handleReconnect without real SSE.
Two cases, both on vi.useFakeTimers():

- a burst of onFinish({isDisconnect: true}) inside the 1500ms window
  coalesces onto the boundary — resumeStream is called once for the
  first cycle, then a second time only after the window + attempt-#2
  backoff elapse.
- a disconnect arriving after the window closes takes the normal
  backoff path (not the debounce branch).

Covers the wiring lines shouldDebounceReconnect can't reach on its own
(useRef(0), the remainingDelay !== null branch's timer setup, and the
Date.now() stamp on resume). Together with the helper unit tests this
brings the codecov/patch diff for platform-frontend from 0% to full
coverage on the debounce lines.
@github-actions github-actions Bot added size/xl and removed size/l labels Apr 22, 2026
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@github-actions github-actions Bot added the conflicts Automatically applied to PRs with merge conflicts label Apr 22, 2026
majdyz added 2 commits April 22, 2026 08:56
…ect-fixes

Resolve conflict in baseline/reasoning.py by combining the coalescing
state + Start/first-Delta atomic flush from #12871 with the
``render_in_ui`` wire-event / persistence gating from this branch.
…accounting

- Switch dispatch model from ``claude-haiku-4-5`` ($1/$5) to
  ``claude-haiku-3-5`` ($0.25/$1.25) — 4x cheaper tokens while keeping
  the same tool-use contract.  Observed cost on a 19K-input call drops
  from ~$0.029 to ~$0.015.
- Add ``cache_read_input_tokens`` / ``cache_creation_input_tokens`` to
  the cost estimator.  Anthropic reports these buckets separately from
  ``input_tokens`` (cache reads are ~10% of input rate; writes are
  ~125% for 5m TTL), so omitting them would under- or over-bill against
  the user's daily/weekly microdollar rate limit.
- New regression test ``test_cache_tokens_billed_distinctly`` pins the
  three distinct line items so a future pricing-constant edit can't
  quietly flatten them back into one.
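The three distinct line items pinned by that regression test can be sketched as follows. The rates come from the commit message above (claude-haiku-3-5 at $0.25/$1.25 per Mtok; cache reads ~10% of the input rate, cache writes ~125% for the 5m TTL); the function and constant names are assumptions for illustration, not the repo's actual cost-estimator API:

```python
# Microdollar rates per 1M tokens, per the pricing in the commit message.
INPUT_PER_MTOK = 250_000          # $0.25 / Mtok input
OUTPUT_PER_MTOK = 1_250_000       # $1.25 / Mtok output
CACHE_READ_PER_MTOK = int(INPUT_PER_MTOK * 0.10)   # ~10% of input rate
CACHE_WRITE_PER_MTOK = int(INPUT_PER_MTOK * 1.25)  # ~125% (5m TTL writes)


def estimate_cost_microdollars(
    input_tokens: int,
    output_tokens: int,
    cache_read_input_tokens: int = 0,
    cache_creation_input_tokens: int = 0,
) -> int:
    """Bill the three input buckets distinctly instead of flattening
    cache reads/writes into input_tokens at the full input rate."""
    return (
        input_tokens * INPUT_PER_MTOK
        + output_tokens * OUTPUT_PER_MTOK
        + cache_read_input_tokens * CACHE_READ_PER_MTOK
        + cache_creation_input_tokens * CACHE_WRITE_PER_MTOK
    ) // 1_000_000
```

Flattening all three buckets into `input_tokens` would over-bill cache reads 10x and under-bill cache writes by ~20%, which is exactly what the regression test guards against.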
majdyz added 2 commits April 22, 2026 11:08
…s wire only

Decouple the persistence side of reasoning from the render-in-UI flag
so the session transcript (role='reasoning' rows in session.messages)
is always preserved — both baseline and SDK paths now persist every
ThinkingBlock / OpenRouter reasoning delta.  The render flag is
narrowed to the wire surface: baseline no longer gates the
session_messages append; SDK emits StreamReasoning events
unconditionally, and the SSE yield loop filters reasoning events out
of the wire when flag=False.

Matches the user-facing contract: turning render off hides the
collapse in the live UI, but audit/replay still has the reasoning
text.  Frontend follow-up will filter persisted rows on reload so the
collapse doesn't resurrect post-refresh.
…er_in_ui=False

Flip the sense of test_render_off_skips_persistence to assert the new
behaviour: render flag gates wire only, persistence stays on. Renamed
to test_render_off_still_persists for clarity.
Comment thread autogpt_platform/backend/backend/copilot/baseline/reasoning.py
majdyz added 3 commits April 22, 2026 11:14
Log flush-count, deferred-chunk-count, total chars, and turn duration
at close() so we can see whether 'reasoning appears all at end'
reports are coalesce over-buffering (many deferred, 0-1 flushes) vs
bursty provider output (flushes fire but in tight bursts) vs a pure
transport problem downstream.  Temporary; will remove after the UX
question is settled.
…ding

Persistence of role='reasoning' rows is unconditional now (see
0773e35 + c4a26ca); render_reasoning_in_ui only gates the live
StreamReasoning* wire events so the hydration path keeps working
with the flag off. Fix the BaselineReasoningEmitter docstring and the
ChatConfig.render_reasoning_in_ui field description so future readers
aren't misled by the old 'suppresses both' phrasing.
…ration burst

Baseline reasoning events (StreamReasoning*, StreamText*, StreamTool*) were
buffered in state.pending_events: list and only drained AFTER each
_baseline_llm_caller awaited its full upstream stream to completion — via
the per-iteration drain inside async for loop_result in tool_call_loop(...).
On Kimi K2.6 a single reasoning round can emit ~1,400 deltas over 3+ minutes
before the OpenRouter stream closes, so the UI stayed frozen on StreamStart
for the whole window and then flushed the backlog in one burst.

Switch pending_events to an asyncio.Queue drained concurrently by the outer
async generator. The tool-call loop runs as a background task; events put on
the queue reach the SSE wire during the upstream stream, not after. None is
the close sentinel; inner-task exceptions are re-raised via await loop_task
after the sentinel.

Mirror every emission into a new emitted_events: list so the existing unit
tests keep their post-hoc inspection view. Update the two SDK adapter tests
that still pinned the pre-decoupling contract (adapter now always emits;
service-layer filter handles render_reasoning_in_ui=False).

Multipod-safe: the queue is per-stream/per-process; cross-pod delivery is
still via Redis streams, which now receive chunks live.
…ration burst

Baseline reasoning events (StreamReasoning*, StreamText*, StreamTool*) were
buffered in state.pending_events: list and only drained AFTER each
_baseline_llm_caller awaited its full upstream stream to completion — via
the per-iteration drain inside async for loop_result in tool_call_loop(...).
Any reasoning route streaming for several minutes per round (extended
thinking on Anthropic, Moonshot, or future providers routed through
OpenRouter) froze the UI on StreamStart for the whole window and then
flushed the backlog in one burst.

Switch pending_events to an asyncio.Queue drained concurrently by the outer
async generator. The tool-call loop runs as a background task; events put on
the queue reach the SSE wire during the upstream stream, not after. None is
the close sentinel; inner-task exceptions are re-raised via await loop_task
after the sentinel.

Mirror every emission into a new emitted_events: list so the existing unit
tests keep their post-hoc inspection view. Update the two SDK adapter tests
that still pinned the pre-decoupling contract (adapter now always emits;
service-layer filter handles render_reasoning_in_ui=False).

SDK path is unaffected: it yields events per SDK message inline with no
equivalent list buffer. Multipod-safe: the queue is per-stream/per-process;
cross-pod delivery is still via Redis streams, which now receive chunks live.
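The queue-drain pattern described in the commit above can be sketched as a minimal async generator. This assumes a producer of the shape `run_tool_call_loop(emit)` that calls `emit(event)` as events arrive; the names are illustrative, not the repo's actual API:

```python
import asyncio
from typing import AsyncIterator

_SENTINEL = None  # close sentinel put by the producer when the loop finishes


async def stream_events(run_tool_call_loop) -> AsyncIterator[object]:
    """Drain a per-stream queue concurrently with the producer task, so
    events reach the consumer during the upstream stream, not after it."""
    queue: asyncio.Queue = asyncio.Queue()
    emitted_events: list = []  # post-hoc mirror for test inspection

    async def producer():
        try:
            await run_tool_call_loop(queue.put_nowait)
        finally:
            queue.put_nowait(_SENTINEL)  # always unblock the consumer

    loop_task = asyncio.create_task(producer())
    while True:
        event = await queue.get()
        if event is _SENTINEL:
            break
        emitted_events.append(event)
        yield event  # would hit the SSE wire while the stream is live
    await loop_task  # re-raise any exception from the inner task
```

The `finally` put guarantees the consumer never hangs on `queue.get()`, and awaiting `loop_task` only after the sentinel means inner-task exceptions surface on the generator instead of being swallowed.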
@majdyz majdyz force-pushed the feat/copilot-reasoning-render-flag-and-reconnect-fixes branch from 35d5ad0 to 12399f4 Compare April 22, 2026 04:58
majdyz added a commit that referenced this pull request Apr 22, 2026
…dollar wallet boundary (#12876)

### Why / What / How

**Why.** Audit of `BLOCK_COSTS` against `credentials_store.py` system
credentials revealed **13 paid blocks** running for free from the credit
wallet's perspective — `BLOCK_COSTS.get(type(block))` returned `None`,
`cost = 0`, no `spend_credits` deduction. Users without their own API
key consumed system credentials with zero credit drain. Separately, the
credit wallet (user-facing prepaid balance) and the copilot microdollar
counter (operator-side meter that gates `daily_cost_limit_microdollars`)
were never documented as separate systems, so future readers kept
tripping on the "why isn't this block charging my limit?" question.

**What.** Three deltas, all credit-wallet-side:

- **Register the 13 paid blocks in `BLOCK_COSTS`** with reasonable
per-call credit prices (1 credit = $0.01). Pricing researched against
the providers' published rates with ~2-3x markup.
- **Document the credit/microdollar boundary** in
`copilot/rate_limit.py`: credits = user-facing prepaid wallet with
marketplace-creator charging; microdollars = operator-side meter that
only ticks on copilot LLM turns (baseline / SDK / web_search /
simulator). Block execution bills credits, not microdollars — explicit
contract.
- **Populate `provider_cost`** on PerplexityBlock so PlatformCostLog
rows carry the real OpenRouter `x-total-cost` value via the existing
`executor/cost_tracking.log_system_credential_cost` path (separate flow
from credit deduction).

### Block costs registered

| Provider | Block | Credits | Raw cost / markup |
|---|---|---|---|
| Perplexity (OpenRouter) | PerplexityBlock — Sonar | 1 | $0.001-0.005 / call |
| | PerplexityBlock — Sonar Pro | 5 | $0.025 / call |
| | PerplexityBlock — Sonar Deep Research | 10 | up to $0.05 / call |
| Jina | FactCheckerBlock | 1 | $0.005 / call |
| Mem0 | AddMemoryBlock | 1 | $0.0004 / call (1c floor) |
| | SearchMemoryBlock | 1 | $0.004 / call |
| | GetAllMemoriesBlock | 1 | $0.004 / call |
| | GetLatestMemoryBlock | 1 | $0.004 / call |
| ScreenshotOne | ScreenshotWebPageBlock | 2 | $0.0085 / call (2.4x) |
| Nvidia | NvidiaDeepfakeDetectBlock | 2 | est $0.005 (no public SKU) |
| Smartlead | CreateCampaignBlock | 2 | $0.0065 send-equivalent (3x) |
| | AddLeadToCampaignBlock | 1 | $0.0065 (1.5x) |
| | SaveCampaignSequencesBlock | 1 | config-only |
| ZeroBounce | ValidateEmailsBlock | 2 | $0.008 / email (2.5x) |
| E2B + Anthropic | ClaudeCodeBlock | **100** | $0.50-$2 / typical session (E2B sandbox + in-sandbox Claude) |

**Not in scope** — already covered via the SDK
`ProviderBuilder.with_base_cost()` pattern in their respective
`_config.py`: Exa, Linear, Airtable, Bannerbear, Wolfram, Firecrawl,
Wordpress, Baas, Stagehand, Dataforseo.
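The "runs free" failure mode the audit found can be shown with a small sketch. The real entries live in `backend/data/block_cost_config.py`; the `BlockCost` shape, string keys, and `charge` helper below are assumptions for illustration only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCost:
    """Illustrative stand-in for the platform's BlockCost entry."""
    cost_amount: int  # credits; 1 credit = $0.01


# A few of the registrations from the table above, keyed by name here
# (the real registry is keyed by block class).
BLOCK_COSTS = {
    "PerplexityBlock.sonar": BlockCost(cost_amount=1),
    "PerplexityBlock.sonar-pro": BlockCost(cost_amount=5),
    "FactCheckerBlock": BlockCost(cost_amount=1),
    "ClaudeCodeBlock": BlockCost(cost_amount=100),
}


def charge(block_key: str) -> int:
    """The audited failure mode: .get() returning None silently meant
    cost = 0, so unregistered paid blocks drained no credits."""
    entry = BLOCK_COSTS.get(block_key)
    return entry.cost_amount if entry else 0
```

This is why the `TestUnregisteredBlockRunsFree` regression exists: the fallback-to-zero behaviour is by design for genuinely free blocks, so paid blocks are only safe once they are explicitly registered.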

### How

1. `backend/data/block_cost_config.py` — 13 new `BlockCost` entries (3
Perplexity models + Fact Checker + 11 from this round).
2. `backend/copilot/rate_limit.py` — boundary docstring.
3. `backend/blocks/perplexity.py` — populate
`NodeExecutionStats.provider_cost` so PlatformCostLog rows carry the
real OpenRouter `x-total-cost` value.
4. Tests — `TestUnregisteredBlockRunsFree` regression +
`TestNewlyRegisteredBlockCosts` pinning every new entry by `cost_amount`
so a future refactor can't quietly drop one.

The companion Notion "Platform System Credentials" database has been
updated with a new `Platform Credit Cost` column populated across all 30
provider rows.

### Scope trim

An earlier revision piped block execution cost into the **copilot
microdollar counter** via `_record_block_microdollar_cost` in
`copilot/tools/helpers.py::execute_block`. That was reverted in
`16ae0f7b5` — the microdollar counter stays scoped to copilot LLM turns
only, credit wallet handles block execution. The pipe-through crossed a
boundary we explicitly want to keep.

### Changes

- `backend/data/block_cost_config.py` — 13 × `BlockCost` entries across
7 providers.
- `backend/blocks/perplexity.py` — populate `provider_cost` on the
execution stats (feeds PlatformCostLog).
- `backend/copilot/rate_limit.py` — boundary docstring only (no
behaviour change).
- `backend/copilot/tools/helpers_test.py` —
`TestUnregisteredBlockRunsFree` + `TestNewlyRegisteredBlockCosts` (8 new
regression tests).
- `backend/blocks/block_cost_tracking_test.py` — provider-cost
extraction pins.

### Checklist

For code changes:
- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
- [x] `poetry run pytest backend/copilot/tools/helpers_test.py
backend/copilot/tools/run_block_test.py
backend/copilot/tools/continue_run_block_test.py
backend/blocks/block_cost_tracking_test.py
backend/blocks/test/test_perplexity.py` — passes
- [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py
backend/copilot/rate_limit_test.py
backend/copilot/token_tracking_test.py` — passes (confirms docstring
edits didn't regress the LLM-turn microdollar path)
  - [x] Pyright clean on all touched files
- [ ] Manual: run PerplexityBlock via copilot `run_block` — credits
deduct, PlatformCostLog row visible with `provider_cost`, no
microdollar-counter tick.
- [ ] Manual: run an unregistered block via copilot — no error, no
credit drain, no silent billing.
- [ ] Manual: run ClaudeCodeBlock via builder — 100 credits deducted
from wallet.

### Companion PR

PR #12873 ships the copilot microdollar / rate-limit work (web_search
cost, simulator cost, reasoning / reconnect fixes). This PR is
credit-wallet only.
majdyz added 4 commits April 22, 2026 12:44
…at/copilot-reasoning-render-flag-and-reconnect-fixes
… / 50 ms

Widens the wire-emission window from 32/40 → 64/50 to halve the React
re-render rate on Kimi K2.6 turns (~4,700 deltas/turn). Still well under
the ~100 ms perceptual threshold so the collapse stays responsive.
Per-delta persistence to session.messages is unchanged.
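The widened 64-char / 50 ms wire-emission window can be sketched as a small coalescer. The class name, injectable clock, and `flush_fn` callback are assumptions for illustration, not the repo's actual emitter API:

```python
import time


class DeltaCoalescer:
    """Buffer streaming deltas and flush once either threshold is hit,
    capping downstream re-renders at roughly 20/s instead of per-delta."""

    def __init__(self, flush_fn, max_chars: int = 64, max_ms: float = 50.0,
                 clock=time.monotonic):
        self._flush_fn = flush_fn
        self._max_chars = max_chars
        self._max_s = max_ms / 1000.0
        self._clock = clock
        self._buf: list[str] = []
        self._chars = 0
        self._window_start: float | None = None

    def add(self, delta: str) -> None:
        if self._window_start is None:
            self._window_start = self._clock()  # window opens on first delta
        self._buf.append(delta)
        self._chars += len(delta)
        if (self._chars >= self._max_chars
                or self._clock() - self._window_start >= self._max_s):
            self.flush()

    def flush(self) -> None:
        """Emit the joined buffer and reset; also used to drain the tail."""
        if self._buf:
            self._flush_fn("".join(self._buf))
        self._buf, self._chars, self._window_start = [], 0, None
```

At ~4,700 deltas per turn, the char threshold dominates on dense reasoning bursts while the time threshold keeps sparse output from stalling past the ~100 ms perceptual limit.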
Enables include_partial_messages=True on ClaudeAgentOptions so the Claude
CLI emits raw Anthropic content_block_delta events as StreamEvent messages
ahead of the summary AssistantMessage. The adapter routes text_delta to
StreamTextDelta and thinking_delta to StreamReasoningDelta with a 64-char
/ 50 ms coalesce window (matching baseline) to tame paint-storm on the
non-virtualised chat list.

Before: CLI batched a whole TextBlock or ThinkingBlock per AssistantMessage,
so long responses popped in as a lump after content_block_stop. After: the
frontend sees each token within the perceptual threshold.

The summary AssistantMessage still drives tool_use emission
(StreamToolInputStart / Available) — partial input_json_delta is not
consumed because the frontend tool widgets need the final payload. A
_partial_emitted flag suppresses the summary's text/thinking repeat so the
wire doesn't double-emit.

9 new tests in response_adapter_test.py cover: text_delta pass-through,
thinking_delta coalesce under / over threshold, tail drain on
content_block_stop, summary suppression when partial already streamed,
tool_use still emitted from summary, and malformed-payload resilience.
Full SDK suite: 936 passed.
@majdyz majdyz changed the title feat(platform/copilot): reasoning render flag + reconnect fixes + Sonar web_search + simulator cost tracking feat(platform/copilot): live per-token streaming (baseline + SDK) + render flag + Sonar web_search + simulator cost tracking + reconnect fixes Apr 22, 2026
majdyz added 3 commits April 22, 2026 13:29
…cating replies

A single _partial_emitted flag was flipping True on any StreamEvent delta,
which then suppressed BOTH text and thinking on the later AssistantMessage
summary. If a turn streamed thinking via partial but shipped its text only
in the summary (OpenRouter proxy quirks, CLI summary-only short blocks,
signature-only encrypted thinking cases), the text never reached the wire
and the UI showed an empty response after 'Thought for Xs'.

Split into _streamed_text_from_partial / _streamed_thinking_from_partial
so each kind is gated independently. A block emits from the summary iff
that kind did NOT stream via StreamEvent.

Added 2 tests covering the per-kind gap cases (thinking-only partial +
text-only summary, and the converse).
@majdyz majdyz changed the title feat(platform/copilot): live per-token streaming (baseline + SDK) + render flag + Sonar web_search + simulator cost tracking + reconnect fixes feat(platform/copilot): live baseline streaming + render flag + Sonar web_search + simulator cost tracking + reconnect fixes Apr 22, 2026
@majdyz majdyz merged commit 33a608e into dev Apr 22, 2026
44 checks passed
@majdyz majdyz deleted the feat/copilot-reasoning-render-flag-and-reconnect-fixes branch April 22, 2026 06:52
@github-project-automation github-project-automation Bot moved this from 🆕 Needs initial review to ✅ Done in AutoGPT development kanban Apr 22, 2026
@github-project-automation github-project-automation Bot moved this to Done in Frontend Apr 22, 2026
majdyz added a commit that referenced this pull request Apr 22, 2026
…lag-gated)

Matches the baseline path's progressive-reveal UX (shipped in #12873)
on the SDK path too.  When ``CHAT_SDK_INCLUDE_PARTIAL_MESSAGES=true``
the Claude Agent SDK CLI emits raw Anthropic ``content_block_*``
events as ``StreamEvent`` messages ahead of each summary
``AssistantMessage``, and ``SDKResponseAdapter`` forwards them
token-by-token to the wire instead of waiting for the lumped summary.

Fixes (for Kimi K2.6 via OpenRouter specifically):

- **Reasoning-before-text order** — Moonshot places ``reasoning`` AFTER
  visible text in the response.  The partial stream delivers blocks in
  their authoritative CLI order so the UI sees reasoning first, as the
  natural reading order Anthropic models produce.
- **"Thought for Ns" followed by a long pause** — extended_thinking
  reasoning streams live instead of landing all-at-once at
  ``content_block_stop``.

Per-kind binary suppression (what was reverted from
``feat/copilot-reasoning-render-flag-and-reconnect-fixes`` in 599e835
/ 530fa8f) would have truncated replies when partial delivered only
a prefix and the summary carried the full text.  The fix this commit
ships is **diff-based reconcile**: per-index ``_emitted_text_by_index``
and ``_emitted_thinking_by_index`` maps track cumulative emitted
content, and the summary walk emits
``summary_block.content[len(already_emitted):]`` — the tail the partial
stream didn't cover.  If the two views diverge (partial content isn't a
prefix of the summary — rare, logged as a warning) the summary wins so
no authoritative content is lost.

Coalescing matches the baseline window: thinking_delta events buffer
to a 64-char / 50 ms threshold before flushing to keep the non-
virtualised chat list out of paint-storm territory on Kimi's ~4,700-
events-per-turn reasoning channel.

Scope guard: flag defaults to **False**.  When off, the CLI never
emits ``StreamEvent`` so the adapter code path is dormant — existing
summary-only behaviour is unchanged and CI's summary-only tests still
pass.  Rollout plan per ``docs/sdk-per-token-streaming-followup.md``:
enable per deployment, watch for divergence warnings + truncation
reports for a week, then flip the default.

Test coverage: 11 new cases in
``TestPartialMessageStreaming`` covering the 10 scenarios from the
follow-up doc (partial+summary agreement, short-partial/long-summary
tail emission, summary-only fallback, long-partial no-duplicate,
partial/summary divergence, thinking-only coalesce, text-only,
mixed-order, multi-message reset, empty thinking, tail drain on
block_stop, char-threshold mid-block flush) plus a no-double-emit
reconcile guard.
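The diff-based reconcile described above reduces to a small prefix-tail computation. A minimal sketch, with names assumed for illustration (the real adapter keys `_emitted_text_by_index` / `_emitted_thinking_by_index` by content-block index):

```python
def summary_tail(emitted_by_index: dict[int, str], index: int,
                 summary_content: str) -> str:
    """Return only the tail of the summary block the partial stream
    didn't already cover. On divergence (partial isn't a prefix of the
    summary) the summary wins, so no authoritative content is lost."""
    already = emitted_by_index.get(index, "")
    if summary_content.startswith(already):
        return summary_content[len(already):]
    # Divergence is rare; the real adapter logs a warning here.
    return summary_content
```

Emitting `summary_content[len(already):]` covers all four cases in one expression: summary-only fallback (nothing emitted, full content), short-partial/long-summary (tail only), full agreement (empty string, no duplicate), and divergence (summary re-emitted wholesale).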
majdyz added a commit that referenced this pull request Apr 22, 2026
…al streaming with #12873

Incoming changes from dev:
- #12876: paid-blocks registration (no conflict)
- #12873: baseline live streaming + render_reasoning_in_ui flag +
  stream_replay_count — lands on response_adapter.py and config_test.py

Conflicts and resolution:
- config_test.py _ENV_VARS_TO_CLEAR: took the union — our model-alias
  additions (CHAT_FAST_* / CHAT_THINKING_* / CHAT_CLAUDE_AGENT_FALLBACK_MODEL)
  and dev's render-flag additions (CHAT_RENDER_REASONING_IN_UI /
  CHAT_STREAM_REPLAY_COUNT) are independent.
- config_test.py test classes: both sides appended; kept our
  TestSdkModelVendorCompatibility next to dev's TestRenderReasoningInUi /
  TestStreamReplayCount.
- response_adapter.py ThinkingBlock summary branch: our diff-based tail
  emission (_thinking_tail_for_block) and dev's explanatory comment about
  render_reasoning_in_ui-gated persistence both apply — kept the
  comment since it accurately describes the persistence behaviour, and
  adopted the tail-emit logic since it's what makes partial+summary
  reconcile work.  render_reasoning_in_ui is enforced at the service
  yield layer (service.py:2463) not in the adapter, so there's no
  behaviour conflict between the two changes.
majdyz added a commit that referenced this pull request Apr 22, 2026
Resolved two conflicts:
- baseline/service.py: dev's queue-based _emit() refactor of
  pending_events + dev's added Iterable import kept; my Task
  short-circuit now funnels through _emit(state, result) too.
  Inner-state drain switched from list.clear() to a queue drain
  + emitted_events.clear() to match the new shape.
- tools/tool_schema_test.py: budget resolution chained dev's
  32800→33200 bump (web_search deep mode, PR #12873) with my
  33200→34600 bump for baseline TodoWrite/Task.  Actual size is 34492.
Also regenerated openapi.json against the merged backend to pick up
dev's concurrent schema changes.
majdyz added a commit that referenced this pull request Apr 22, 2026
Flip ``CHAT_SDK_INCLUDE_PARTIAL_MESSAGES`` default from False → True so
extended-thinking turns on the SDK path stream token-by-token instead
of popping in as a lump at ``content_block_stop``.  Matches the UX
the baseline path has had since #12873.

The partial/summary diff-based reconcile in the adapter has been stable
through internal soak — partial deltas + summary tail emit without
double-writing or truncation across text, thinking, and tool_use
block types.

Kill-switch: set ``CHAT_SDK_INCLUDE_PARTIAL_MESSAGES=false`` to fall
back to summary-only emission if an adapter regression surfaces.
Labels

platform/backend AutoGPT Platform - Back end platform/frontend AutoGPT Platform - Front end size/xl
