
feat(platform): estimate CoPilot turn cost and require approval for high-cost requests#12877

Open
Rushi-Balapure wants to merge 3 commits into Significant-Gravitas:dev from Rushi-Balapure:feature/cost_estimation

Conversation

@Rushi-Balapure

Why / What / How

Why
CoPilot requests can unexpectedly become expensive (large prompts, high-token contexts, expensive model paths), and users currently have no guardrail before execution. This creates risk of accidental high-cost turns.
What
This PR introduces a cost-safety flow:

  1. Estimate turn cost before execution.
  2. Compare estimate against a configurable threshold.
  3. If threshold is exceeded, block execution until explicit user approval is provided.
  4. Require a short-lived, server-validated approval token on the follow-up execute request.
How
  • Added a pre-execution cost estimation step to CoPilot turn handling.
  • Added threshold evaluation logic to determine whether approval is required.
  • Added approval-token issuance/validation flow with expiry and server-side verification.
  • Bound approval token validity to the relevant request context to prevent replay/misuse.
  • Updated execution flow to reject high-cost requests without a valid approval token.
  • Added UI/API handling to surface estimated cost and request explicit confirmation when required.
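
The token flow in steps 1–4 above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function names and payload fields are assumptions, and a real deployment must load the signing key from shared configuration rather than hard-coding it.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"example-signing-key"  # illustrative only; use a configured, shared secret in practice

def _fingerprint(user_id: str, session_id: str, message: str) -> str:
    # Bind the token to the request context so it cannot be replayed elsewhere.
    return hashlib.sha256(f"{user_id}|{session_id}|{message}".encode()).hexdigest()

def generate_token(user_id: str, session_id: str, message: str, ttl_s: int = 600) -> str:
    payload = {
        "h": _fingerprint(user_id, session_id, message),
        "e": int(time.time()) + ttl_s,  # expiry as a unix timestamp
    }
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def validate_token(token: str, user_id: str, session_id: str, message: str) -> bool:
    try:
        body, sig = token.split(".", 1)
    except ValueError:
        return False  # malformed token, no "." separator
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # signature tampered
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["e"] < time.time():
        return False  # expired
    return payload["h"] == _fingerprint(user_id, session_id, message)
```

Any change to the user, session, or message after approval invalidates the fingerprint, which is what prevents replay/misuse.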

Changes 🏗️

  • Added request-cost estimation before CoPilot turn execution.
  • Added configurable cost threshold guardrail for expensive turns.
  • Added short-lived server-validated approval token mechanism for over-threshold requests.
  • Enforced token validation in execution path when approval is required.
  • Added response states/errors for approval_required, missing token, and expired/invalid token.
  • Added/updated tests for estimation, threshold gating, token validation, and end-to-end approval flow.

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • [x] Unit: cost estimation logic returns expected estimates for representative request sizes/models
    • Unit: threshold gating correctly allows below-threshold and blocks above-threshold requests
    • Unit: approval token issuance/validation (valid, expired, invalid, replayed token scenarios)
    • Integration: over-threshold request returns approval_required and estimated cost
    • Integration: re-submitting with valid approval token allows execution
    • Integration: missing/expired/invalid token is rejected with correct error response
    • UI/API flow: user sees cost warning and can explicitly approve/retry successfully

For configuration changes:

  • .env.default is updated or already compatible with my changes
  • docker-compose.yml is updated or already compatible with my changes
  • I have included a list of my configuration changes in the PR description (under Changes)
Configuration changes (if applicable):
  • Added/updated configurable cost threshold for CoPilot turn approval gating.
  • Added/updated approval token TTL configuration.

Torantulino and others added 3 commits April 17, 2026 15:22
…tion (Significant-Gravitas#12832)

## Problem

The CoPilot system prompt contains a `gh auth status` instruction in the
E2B-specific `GitHub CLI` section, but models pattern-match to
`connect_integration` from the **Tool Discovery Priority** section —
which is where the actual decision to call an external service is made.

Because the GitHub auth check lives in a separate, later section, it's
not salient at the point of decision-making. This causes the model to
call `connect_integration(provider='github')` even when `gh` is already
authenticated via `GH_TOKEN`, unnecessarily prompting the user.

## Fix

Add a 3-line callout directly inside the **Tool Discovery Priority**
section:

```
> 🔑 **GitHub exception:** Before calling `connect_integration` for GitHub,
> always run `gh auth status` first. If it shows `Logged in`, proceed
> directly with `gh`/`git` — no integration connection needed.
```

This places the rule at the exact location where the model decides which
tool path to take, preventing the miss.

## Why this works

- **Placement over repetition**: The existing instruction isn't wrong —
it's just in the wrong spot relative to where the decision is made
- **Negative framing**: Explicitly says "before calling
`connect_integration`" which directly intercepts the incorrect reflex
- **Minimal change**: 4 lines added, zero removed

Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
Prevent accidental expensive CoPilot turns by estimating request cost before execution and requiring a short-lived server-validated approval token when thresholds are exceeded.
@Rushi-Balapure Rushi-Balapure requested a review from a team as a code owner April 22, 2026 03:57
@Rushi-Balapure Rushi-Balapure requested review from 0ubbe and kcze and removed request for a team April 22, 2026 03:57
@github-project-automation github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Apr 22, 2026
@CLAassistant

CLAassistant commented Apr 22, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ Torantulino
❌ Rushi-Balapure
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions
Contributor

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions Bot changed the base branch from master to dev April 22, 2026 03:58
@github-actions github-actions Bot added platform/frontend AutoGPT Platform - Front end platform/backend AutoGPT Platform - Back end labels Apr 22, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 22, 2026

Walkthrough

This PR implements cost estimation and user approval workflows for CoPilot chat turns. It adds backend cost-calculation logic, signed time-limited approval tokens, configuration settings, and a frontend confirmation dialog to gate execution of requests exceeding a user-configurable cost threshold.

Changes

Cohort / File(s) Summary
Backend Chat Routes
autogpt_platform/backend/backend/api/features/chat/routes.py, autogpt_platform/backend/backend/api/features/chat/routes_test.py
Added /sessions/{session_id}/estimate endpoint to calculate per-turn costs; extended stream_chat_post to validate approval tokens before execution when confirmation is required; updated tests to cover missing/invalid token scenarios and successful cost estimation flows.
Cost Estimation & Approval
autogpt_platform/backend/backend/copilot/cost_estimation.py, autogpt_platform/backend/backend/copilot/cost_approval.py, autogpt_platform/backend/backend/copilot/cost_estimation_test.py
Introduced CoPilotTurnCostEstimate model capturing resolved path, model, token/cost estimates, and confirmation requirements; implemented HMAC-SHA256 signed approval token generation and validation with configurable TTL; added round-trip and security tests (tampering, expiry, malformed payloads).
Configuration & Feature Flags
autogpt_platform/backend/backend/copilot/config.py, autogpt_platform/backend/backend/util/feature_flag.py
Added copilot_cost_confirmation_threshold_usd (default 10.0) and copilot_cost_approval_ttl_seconds (default 600) settings to ChatConfig; added LaunchDarkly feature flag COPILOT_COST_CONFIRMATION_THRESHOLD_USD for per-user threshold overrides.
Frontend Cost Confirmation UI
autogpt_platform/frontend/src/app/(platform)/copilot/components/CostConfirmationDialog/CostConfirmationDialog.tsx, autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx, autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/CopilotPage.test.tsx
Added CostConfirmationDialog component to render estimated cost, threshold, and model details; wired into CopilotPage with cancel/confirm handlers; updated test mocks for state shape consistency.
Frontend State Management
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts, autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
Extended useCopilotStream with cost estimate fetching, promise-based approval flow, and special error handling for cancellation; propagated pendingCostEstimate, confirmCostEstimate, and dismissCostEstimate through hook exports; added cleanup on unmount and session switching.
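
The configuration and gating summarized in the table above can be sketched as follows. The default values come from the PR summary; the dataclass stand-in and function name are illustrative — the real ChatConfig lives in backend/copilot/config.py and is not a plain dataclass.

```python
from dataclasses import dataclass

@dataclass
class ChatConfig:
    # Defaults per the PR summary; LaunchDarkly may override the threshold per user.
    copilot_cost_confirmation_threshold_usd: float = 10.0
    copilot_cost_approval_ttl_seconds: int = 600

def requires_confirmation(estimated_cost_usd: float, config: ChatConfig) -> bool:
    # Gate execution: a turn whose estimate exceeds the threshold needs explicit approval.
    return estimated_cost_usd > config.copilot_cost_confirmation_threshold_usd
```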

Sequence Diagram

sequenceDiagram
    participant User
    participant Frontend
    participant Backend as Backend API
    participant CostSvc as Cost Service
    participant DB as Approval Token

    User->>Frontend: Submit message + request
    Frontend->>Backend: POST /sessions/{id}/estimate
    Backend->>CostSvc: estimate_copilot_turn_cost(...)
    CostSvc->>CostSvc: Resolve path/model, calculate tokens & cost
    CostSvc-->>Backend: CoPilotTurnCostEstimate
    
    alt Confirmation Required
        Backend->>DB: generate_cost_approval_token(...)
        DB-->>Backend: signed token
        Backend-->>Frontend: {requires_confirmation: true, approval_token, ...}
        Frontend->>User: Show CostConfirmationDialog
        
        alt User Confirms
            User->>Frontend: Click "Continue"
            Frontend->>Backend: POST /sessions/{id}/stream + approval_token
            Backend->>DB: validate_cost_approval_token(...)
            DB-->>Backend: valid
            Backend->>Backend: Process message + enqueue task
            Backend-->>Frontend: Stream response (200)
        else User Cancels
            User->>Frontend: Click "Cancel"
            Frontend-->>User: Dismiss dialog
        end
    else No Confirmation Needed
        Backend-->>Frontend: {requires_confirmation: false, ...}
        Frontend->>Backend: POST /sessions/{id}/stream
        Backend->>Backend: Process message + enqueue task
        Backend-->>Frontend: Stream response (200)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The changes span multiple architectural layers with heterogeneous edits: cryptographic token logic requiring careful security review, complex cost-estimation heuristics with LaunchDarkly integration, async approval-based state management across frontend and backend, and tight integration points across multiple modules. The variety of concerns (security, business logic, state coordination) and density of new logic in several files elevates the cognitive load.

Possibly related PRs

Suggested labels

size/xl, platform/frontend, platform/backend

Suggested reviewers

  • 0ubbe
  • kcze
  • Swiftyos

Poem

🐰 A hop, a skip, costs we now estimate!
Tokens signed with HMAC—no need to hesitate.
"Is this too dear?" the dialog softly pleads,
Users click "Continue," fulfilling CoPilot's needs.
Approval flows like waterfalls, secure and light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning — docstring coverage is 32.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
Title check ✅ Passed — the title clearly and accurately summarizes the main change: implementing cost estimation and approval requirements for high-cost CoPilot requests.
Description check ✅ Passed — the description is comprehensive, well-structured, and directly related to the changeset, explaining the why/what/how, listing all major changes, and including test coverage details.
Linked Issues check ✅ Passed — skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed — skipped because no linked issues were found for this pull request.


@github-actions github-actions Bot added size/xl conflicts Automatically applied to PRs with merge conflicts labels Apr 22, 2026
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 53.53846% with 151 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.05%. Comparing base (0d4b31e) to head (1ce3009).
⚠️ Report is 39 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev   #12877      +/-   ##
==========================================
- Coverage   65.14%   65.05%   -0.09%     
==========================================
  Files        1831     1835       +4     
  Lines      135945   136266     +321     
  Branches    14534    14581      +47     
==========================================
+ Hits        88555    88650      +95     
- Misses      44670    44896     +226     
  Partials     2720     2720              
Flag Coverage Δ
platform-backend 75.76% <64.34%> (-0.10%) ⬇️
platform-frontend 19.30% <0.00%> (-0.05%) ⬇️
platform-frontend-e2e 29.75% <20.51%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
Platform Backend 75.76% <64.34%> (-0.10%) ⬇️
Platform Frontend 27.15% <11.94%> (-0.08%) ⬇️
AutoGPT Libs ∅ <ø> (∅)
Classic AutoGPT 28.43% <ø> (ø)

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (6)
autogpt_platform/frontend/src/app/(platform)/copilot/components/CostConfirmationDialog/CostConfirmationDialog.tsx (1)

7-14: Reuse the generated estimate type instead of duplicating it.

CostEstimate mirrors a subset of the backend CoPilotTurnCostEstimate response. Import the generated model and Pick<> the displayed fields so this stays in sync after OpenAPI regeneration.

♻️ Suggested type alignment
+import type { CoPilotTurnCostEstimate } from "@/app/api/__generated__/models/coPilotTurnCostEstimate";
 import { Button } from "@/components/atoms/Button/Button";
 import { Text } from "@/components/atoms/Text/Text";
 import { Dialog } from "@/components/molecules/Dialog/Dialog";
 
-type CostEstimate = {
-  resolved_path: "baseline" | "sdk";
-  resolved_model: string;
-  estimated_llm_calls: number;
-  estimated_cost_usd: number;
-  confirmation_threshold_usd: number;
-  rationale: string;
-};
+type CostEstimate = Pick<
+  CoPilotTurnCostEstimate,
+  | "resolved_path"
+  | "resolved_model"
+  | "estimated_llm_calls"
+  | "estimated_cost_usd"
+  | "confirmation_threshold_usd"
+  | "rationale"
+>;

Based on learnings, prefer using generated OpenAPI types from @/app/api/__generated__/ for payloads defined in openapi.json; use inline interfaces only for SSE-stream-only payloads not exposed via OpenAPI.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`autogpt_platform/frontend/src/app/(platform)/copilot/components/CostConfirmationDialog/CostConfirmationDialog.tsx`
around lines 7 - 14, Replace the duplicated inline CostEstimate type with a Pick
from the generated OpenAPI type: import CoPilotTurnCostEstimate from the
generated models (from "@/app/api/__generated__/...") and define CostEstimate as
Pick<CoPilotTurnCostEstimate, 'resolved_path' | 'resolved_model' |
'estimated_llm_calls' | 'estimated_cost_usd' | 'confirmation_threshold_usd' |
'rationale'> so the component uses the canonical backend shape and stays in sync
after OpenAPI regeneration; update any usages of CostEstimate in
CostConfirmationDialog.tsx accordingly.
autogpt_platform/backend/backend/copilot/cost_approval.py (2)

138-140: Minor: tighten _urlsafe_encode_json payload type.

The annotation dict[str, str | int] is narrower than what is actually written — payload["h"] is a hex string, payload["e"] is int, but if build_cost_approval_fingerprint's return type ever changes (e.g. to return bytes) the type checker will not catch it. Use dict[str, object] or, better, introduce a small Pydantic / TypedDict for the token payload (u, s, h, e) so generate and validate share the exact shape and the isinstance(payload, dict) + payload.get(...) checks in validate_cost_approval_token can be replaced by a single model-parse step with typed access.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_approval.py` around lines 138 -
140, The _urlsafe_encode_json payload annotation is too narrow; update it to
accept the actual token payload shape (either change to dict[str, object] or
better, define a TypedDict/Pydantic model for the token with keys u,s,h,e) and
use that type for _urlsafe_encode_json and the generator/validator functions;
implement the model/TypedDict and replace the ad-hoc isinstance/dict.get checks
in validate_cost_approval_token with a single model parse so
build_cost_approval_fingerprint, the generate function, _urlsafe_encode_json,
and validate_cost_approval_token share the exact typed payload shape.

21-49: async on pure CPU functions is misleading.

generate_cost_approval_token and validate_cost_approval_token do no await — they are synchronous HMAC/JSON code. Marking them async forces callers to pay the await overhead and wrongly suggests I/O. Either drop async (callers will need a trivial update) or keep async but plan for a real async key-fetch path (e.g., pulling the signing key from Redis / Secrets Manager in the recommendation above — which would genuinely be async).

Also applies to: 52-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_approval.py` around lines 21 -
49, Both generate_cost_approval_token and validate_cost_approval_token are
marked async but perform only synchronous CPU-bound work (JSON/HMAC) and do not
await anything; remove the async keyword from both function definitions (and
their corresponding async annotations/returns if present) so they become regular
synchronous functions, then update any call sites that currently "await" them to
call directly; alternatively, if you intend to fetch signing keys asynchronously
later, replace the direct key access in _sign_payload/_verify_payload with an
async key-fetch hook and keep the functions async—choose one approach and apply
it consistently across generate_cost_approval_token,
validate_cost_approval_token, _sign_payload, and _verify_payload and their
callers.
autogpt_platform/backend/backend/copilot/cost_estimation.py (1)

226-232: Redundant final branch in _model_pricing_per_million.

The "sonnet" branch and the fallback both return (3.0, 15.0), so the default case silently prices every unknown model at Sonnet rates. If that's intentional (unknown → assume cheapest supported tier), add a one-line comment; otherwise this will understate cost for future Opus/Haiku-3.5/etc. model IDs that don't contain "opus" or "sonnet". Consider pulling rates from a single source of truth (the same price table used by the billing/credit system) so this doesn't drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_estimation.py` around lines 226
- 232, The _model_pricing_per_million function currently returns (3.0, 15.0) for
both the "sonnet" branch and the final fallback, which silently classifies
unknown models as Sonnet-tier; either make this explicit or change it to a safer
behavior: update _model_pricing_per_million to (a) reference the canonical
pricing table used by billing/credits instead of hardcoded literals, and (b)
either add a one-line comment stating "unknown models default to Sonnet rates"
if that is intended, or return a conservative default / raise/log an error for
unknown model_name values so future Opus/Haiku/etc. IDs aren't underpriced;
locate and modify the function _model_pricing_per_million to implement the
chosen fix.
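
A sketch of the disambiguated fallback the comment asks for. The Sonnet rate pair (3.0, 15.0) is taken from the comment; the Opus numbers and the function shape are illustrative assumptions:

```python
import logging

logger = logging.getLogger(__name__)

def model_pricing_per_million(model_name: str) -> tuple[float, float]:
    # Returns (input_usd, output_usd) per million tokens.
    name = model_name.lower()
    if "opus" in name:
        return (15.0, 75.0)  # illustrative Opus-tier rates
    if "sonnet" in name:
        return (3.0, 15.0)
    # Unknown models intentionally fall back to Sonnet rates; log so the
    # assumption is visible instead of silently underpricing new model IDs.
    logger.warning("Unknown model %r; defaulting to Sonnet pricing", model_name)
    return (3.0, 15.0)
```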
autogpt_platform/backend/backend/copilot/cost_estimation_test.py (1)

37-109: Add coverage for tampering vectors not exercised by these four tests.

Current tests cover valid round-trip, message tampering, expiry, and malformed input. To fully exercise validate_cost_approval_token's branches, consider adding:

  • user_id / session_id mismatch between generation and validation.
  • Signature tampering (flip a char in the signature half).
  • is_user_message / context / file_ids / mode / model divergence (each is fingerprinted; regression if any field is accidentally dropped from build_cost_approval_fingerprint).
  • Empty/whitespace token and token missing the . separator (exercises the split(".", 1) ValueError path distinctly from the current malformed-but-dotted case).

These would lock in the security guarantees of the helper against accidental refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_estimation_test.py` around
lines 37 - 109, Add tests exercising more tampering vectors for
validate_cost_approval_token: create tokens via generate_cost_approval_token
then assert validation fails when (1) user_id or session_id are changed at
validation (user/session mismatch), (2) the signature is tampered by flipping a
character in the signature half of the token (signature tampering), (3) any
fingerprinted field differs at validation (is_user_message, context, file_ids,
mode, model) — write one test per field to ensure
build_cost_approval_fingerprint dependencies are covered, and (4)
malformed-token edge cases: an empty/whitespace token and a token missing the
'.' separator (to hit the split ValueError branch). Use the existing test
patterns and the functions generate_cost_approval_token and
validate_cost_approval_token to implement these assertions.
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts (1)

41-54: Import the estimate response type from generated models instead of redeclaring it.

CoPilotCostEstimate mirrors the backend CoPilotTurnCostEstimate Pydantic model (note: backend type is CoPilotTurnCostEstimate, this is CoPilotCostEstimate — the names already drift). Once the estimate endpoint is consumed via the generated client, import the generated model type so field additions/renames on the backend surface as a compile-time break here instead of silent undefined. Also pick whichever casing the generated TS model uses for "launchdarkly" / "default" and "baseline" | "sdk" rather than maintaining a hand-written union.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`
around lines 41 - 54, Replace the local CoPilotCostEstimate interface in
useCopilotStream.ts with the generated TypeScript model from the API client (the
backend model is CoPilotTurnCostEstimate) by importing that generated type and
updating all usages of CoPilotCostEstimate to the imported model; remove the
hand-written union/casing and adopt whatever casing and enum values the
generated model exposes so future backend changes surface as compile-time type
errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/api/features/chat/routes.py`:
- Around line 905-929: The approval token is validated but not consumed,
allowing reuse within the approval window; modify the flow around
estimate.requires_confirmation/validate_cost_approval_token to atomically mark
the token as consumed before saving/enqueueing the request: after
validate_cost_approval_token returns true (for request.approval_token, user_id,
session_id, message, etc.), perform an atomic consume operation (e.g., Redis SET
NX with a key derived from the token and a short TTL or store a nonce/hash) and
if the consume fails treat the token as already-used and raise
HTTPException(403) with an appropriate message; ensure the consume+check is done
before any persistence/enqueue steps so concurrent requests cannot reuse the
same token.
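
The atomic consume described above can be sketched as follows, using an in-memory stand-in for the Redis client so the snippet is self-contained; the key prefix and helper names are assumptions:

```python
import hashlib

class FakeRedis:
    # In-memory stand-in mirroring redis-py's SET NX semantics, for illustration only.
    def __init__(self):
        self._store = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._store:
            return None  # redis-py returns None when NX prevents the write
        self._store[key] = value
        return True

def consume_approval_token(redis_client, token: str, ttl_seconds: int) -> bool:
    # Mark the token as used exactly once; False means it was already consumed.
    key = "copilot:cost_approval:used:" + hashlib.sha256(token.encode()).hexdigest()
    return bool(redis_client.set(key, "1", nx=True, ex=ttl_seconds))
```

Because SET NX is atomic on the server, two concurrent requests carrying the same token cannot both pass; the loser sees the key already present and is rejected with a 403.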

In `@autogpt_platform/backend/backend/copilot/cost_approval.py`:
- Line 18: _current implementation creates a per-process _ephemeral_secret at
import which causes cross-worker token validation failures and silently reuses
unrelated secrets; change this by (1) introducing a dedicated configuration key
(e.g., COST_APPROVAL_SIGNING_KEY / cost_approval_signing_key) and use it for
signing/validation in cost_approval.py instead of multiplexing ENCRYPTION_KEY,
UNSUBSCRIBE_SECRET_KEY, or SUPABASE_SERVICE_ROLE_KEY, (2) if none of those env
vars are set, log a clear [SECURITY] warning (matching the cache.py warning
pattern) when generating or validating tokens, and (3) avoid per-process
fallbacks — either require the dedicated key or derive a stable deployment-wide
fallback from a shared store/bootstrap secret (e.g., Redis or other config
provider) rather than secrets.token_bytes so tokens remain valid across workers;
update token generation/validation paths that reference _ephemeral_secret to use
the new key logic.

In `@autogpt_platform/backend/backend/copilot/cost_estimation_test.py`:
- Around line 1-109: The test file is named incorrectly—it's exercising
backend.copilot.cost_approval (generate_cost_approval_token,
validate_cost_approval_token) but is called cost_estimation_test.py; rename the
test file to cost_approval_test.py so it colocates and clearly maps to
cost_approval.py, update any test imports if they reference the old filename,
and ensure no other tests use cost_estimation_test.py so the reserved name
remains available for tests that target cost_estimation.py in the future.

In `@autogpt_platform/backend/backend/copilot/cost_estimation.py`:
- Around line 107-125: The current resolve_cost_confirmation_threshold function
misattributes the source when LaunchDarkly returns a value equal to the config
fallback; change get_feature_flag_value to return an explicit signal about
whether the returned value came from LaunchDarkly or was the fallback (e.g.,
return (raw_value, from_flag: bool) or (raw_value, source:
"launchdarkly"|"default")), then update resolve_cost_confirmation_threshold to
use that signal to set the returned source (keep using
Flag.COPILOT_COST_CONFIRMATION_THRESHOLD_USD.value and
config.copilot_cost_confirmation_threshold_usd to compute numeric fallback and
threshold but rely on the new second return value from get_feature_flag_value to
decide between "launchdarkly" and "default").
- Around line 150-158: The estimate vs approval can diverge if the session
accrues messages after estimating, so update the fingerprint to include session
identity + history version: modify build_cost_approval_fingerprint to accept and
incorporate session.session_id (or equivalent) and a
history_version/message_count (e.g., len(session.messages) or a monotonic
version) so any change in conversation invalidates the prior approval; keep
_estimate_history_tokens' slicing logic (last 30) in sync with any bounding used
when building the fingerprint so both use the same window/length semantics and
ensure the same symbols (_estimate_history_tokens and
build_cost_approval_fingerprint) are updated together.
- Around line 7-14: cost_estimation.py is importing module-private helpers
(_resolve_baseline_model and _resolve_model_and_multiplier); change the
dependency to public APIs by either (A) renaming and exporting the existing
helpers in their modules (backend.copilot.baseline.service: expose
resolve_baseline_model or get_effective_baseline_model;
backend.copilot.sdk.service: expose resolve_model_and_multiplier or
get_effective_sdk_model) and update the imports in cost_estimation.py to those
public names, or (B) add new thin public wrapper helpers in those modules (e.g.,
get_effective_baseline_model(mode) and get_effective_sdk_model(model,
session_id)) and import those from cost_estimation.py instead; ensure you update
any calls in cost_estimation.py that reference _resolve_baseline_model and
_resolve_model_and_multiplier to the new public function names.

In `autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`:
- Around line 106-144: fetchCostEstimate is missing an AbortSignal so in-flight
POSTs can complete after a session switch/unmount and mutate stale state; modify
the logic to create an AbortController tied to the current sessionId, pass
controller.signal into fetch inside fetchCostEstimate, and abort that controller
both in the session-switch effect and the unmount cleanup (where
costApprovalResolverRef and pendingCostEstimateRef are cleared). Ensure
pendingCostEstimateRef and costApprovalResolverRef are set/cleared only for the
matching sessionId (use sessionIdRef.current === sid guard like
handleWakeResync) so resolved promises/state updates do not apply to a different
session if aborting is not possible.

---

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/cost_approval.py`:
- Around line 138-140: The _urlsafe_encode_json payload annotation is too
narrow; update it to accept the actual token payload shape (either change to
dict[str, object] or better, define a TypedDict/Pydantic model for the token
with keys u,s,h,e) and use that type for _urlsafe_encode_json and the
generator/validator functions; implement the model/TypedDict and replace the
ad-hoc isinstance/dict.get checks in validate_cost_approval_token with a single
model parse so build_cost_approval_fingerprint, the generate function,
_urlsafe_encode_json, and validate_cost_approval_token share the exact typed
payload shape.
- Around line 21-49: Both generate_cost_approval_token and
validate_cost_approval_token are marked async but perform only synchronous
CPU-bound work (JSON/HMAC) and do not await anything; remove the async keyword
from both function definitions (and their corresponding async
annotations/returns if present) so they become regular synchronous functions,
then update any call sites that currently "await" them to call directly;
alternatively, if you intend to fetch signing keys asynchronously later, replace
the direct key access in _sign_payload/_verify_payload with an async key-fetch
hook and keep the functions async—choose one approach and apply it consistently
across generate_cost_approval_token, validate_cost_approval_token,
_sign_payload, and _verify_payload and their callers.

In `@autogpt_platform/backend/backend/copilot/cost_estimation_test.py`:
- Around line 37-109: Add tests exercising more tampering vectors for
validate_cost_approval_token: create tokens via generate_cost_approval_token
then assert validation fails when (1) user_id or session_id are changed at
validation (user/session mismatch), (2) the signature is tampered by flipping a
character in the signature half of the token (signature tampering), (3) any
fingerprinted field differs at validation (is_user_message, context, file_ids,
mode, model) — write one test per field to ensure
build_cost_approval_fingerprint dependencies are covered, and (4)
malformed-token edge cases: an empty/whitespace token and a token missing the
'.' separator (to hit the split ValueError branch). Use the existing test
patterns and the functions generate_cost_approval_token and
validate_cost_approval_token to implement these assertions.
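For the signature-tampering vector specifically, a small helper keeps the tests independent of token internals; a sketch assuming the "payload.signature" shape the module uses:

```python
def flip_signature_char(token: str) -> str:
    """Return a copy of the token with the first signature character flipped."""
    payload, sig = token.rsplit(".", 1)
    flipped = ("A" if sig[0] != "A" else "B") + sig[1:]
    return f"{payload}.{flipped}"
```

A test would generate a token, pass `flip_signature_char(token)` to validate_cost_approval_token, and assert the result is False.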

In `@autogpt_platform/backend/backend/copilot/cost_estimation.py`:
- Around line 226-232: The _model_pricing_per_million function currently returns
(3.0, 15.0) for both the "sonnet" branch and the final fallback, which silently
classifies unknown models as Sonnet-tier; either make this explicit or change it
to a safer behavior: update _model_pricing_per_million to (a) reference the
canonical pricing table used by billing/credits instead of hardcoded literals,
and (b) either add a one-line comment stating "unknown models default to Sonnet
rates" if that is intended, or return a conservative default / raise/log an
error for unknown model_name values so future Opus/Haiku/etc. IDs aren't
underpriced; locate and modify the function _model_pricing_per_million to
implement the chosen fix.

In
`@autogpt_platform/frontend/src/app/(platform)/copilot/components/CostConfirmationDialog/CostConfirmationDialog.tsx`:
- Around line 7-14: Replace the duplicated inline CostEstimate type with a Pick
from the generated OpenAPI type: import CoPilotTurnCostEstimate from the
generated models (from "@/app/api/__generated__/...") and define CostEstimate as
Pick<CoPilotTurnCostEstimate, 'resolved_path' | 'resolved_model' |
'estimated_llm_calls' | 'estimated_cost_usd' | 'confirmation_threshold_usd' |
'rationale'> so the component uses the canonical backend shape and stays in sync
after OpenAPI regeneration; update any usages of CostEstimate in
CostConfirmationDialog.tsx accordingly.

In `@autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`:
- Around line 41-54: Replace the local CoPilotCostEstimate interface in
useCopilotStream.ts with the generated TypeScript model from the API client (the
backend model is CoPilotTurnCostEstimate) by importing that generated type and
updating all usages of CoPilotCostEstimate to the imported model; remove the
hand-written union/casing and adopt whatever casing and enum values the
generated model exposes so future backend changes surface as compile-time type
errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 378f69d2-b70f-4b9e-b495-0394a0bd23cb

📥 Commits

Reviewing files that changed from the base of the PR and between 1c0c7a6 and 1ce3009.

📒 Files selected for processing (12)
  • autogpt_platform/backend/backend/api/features/chat/routes.py
  • autogpt_platform/backend/backend/api/features/chat/routes_test.py
  • autogpt_platform/backend/backend/copilot/config.py
  • autogpt_platform/backend/backend/copilot/cost_approval.py
  • autogpt_platform/backend/backend/copilot/cost_estimation.py
  • autogpt_platform/backend/backend/copilot/cost_estimation_test.py
  • autogpt_platform/backend/backend/util/feature_flag.py
  • autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx
  • autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/CopilotPage.test.tsx
  • autogpt_platform/frontend/src/app/(platform)/copilot/components/CostConfirmationDialog/CostConfirmationDialog.tsx
  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
  • autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts

Comment on lines +905 to +929
if estimate.requires_confirmation:
    if not request.approval_token:
        raise HTTPException(
            status_code=403,
            detail=(
                "This task requires cost confirmation before execution. "
                "Request an estimate and confirm to continue."
            ),
        )
    approved = await validate_cost_approval_token(
        token=request.approval_token,
        user_id=user_id,
        session_id=session_id,
        message=request.message,
        is_user_message=request.is_user_message,
        context=request.context,
        file_ids=request.file_ids,
        mode=request.mode,
        model=request.model,
    )
    if not approved:
        raise HTTPException(
            status_code=403,
            detail="Cost confirmation token is invalid or expired.",
        )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Inspect the cost approval and route code for one-time token consumption.
# Expected: a stateful consume/nonce check (for example Redis SET NX / stored token ID)
# should exist on the stream approval path before append/enqueue.

rg -nP 'validate_cost_approval_token|generate_cost_approval_token|consume|nonce|jti|used|replay|setnx|set\(|delete\(' \
  autogpt_platform/backend/backend/copilot/cost_approval.py \
  autogpt_platform/backend/backend/api/features/chat/routes.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 2056


🏁 Script executed:

#!/bin/bash
# Get the implementation of both approval token functions
echo "=== COST APPROVAL FUNCTIONS ===" 
cat -n autogpt_platform/backend/backend/copilot/cost_approval.py

echo ""
echo "=== TOKEN VALIDATION IN ROUTES (lines 905-930) ===" 
sed -n '905,930p' autogpt_platform/backend/backend/api/features/chat/routes.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 6279


🏁 Script executed:

#!/bin/bash
# Check for any Redis operations on approval tokens
rg -A 5 -B 5 'approval.*redis|redis.*approval|approval.*token.*delete|approval.*token.*nonce' \
  autogpt_platform/backend/backend/copilot/ \
  autogpt_platform/backend/backend/api/features/chat/

Repository: Significant-Gravitas/AutoGPT

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Search for token consumption or one-time use patterns in backend
fd -e py autogpt_platform/backend/backend/copilot/ autogpt_platform/backend/backend/api/features/chat/ \
  | xargs rg -l 'jti|nonce|one.time|one_time|consumed|consume_token' 2>/dev/null

Repository: Significant-Gravitas/AutoGPT

Length of output: 3347


🏁 Script executed:

#!/bin/bash
# Search for dedup/duplicate-post lock duration and approval token TTL
rg -n 'dedup|duplicate|idempotent|ttl.*approval|approval.*ttl|600' \
  autogpt_platform/backend/backend/api/features/chat/routes.py \
  autogpt_platform/backend/backend/copilot/cost_approval.py \
  autogpt_platform/backend/backend/api/features/chat/

Repository: Significant-Gravitas/AutoGPT

Length of output: 7638


🏁 Script executed:

#!/bin/bash
# Find where approval tokens are generated/called - check default TTL
rg -B 3 -A 3 'generate_cost_approval_token' autogpt_platform/backend/backend/api/features/chat/routes.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 587


🏁 Script executed:

#!/bin/bash
# Search for Redis dedup or request deduplication logic
rg -n 'duplicate|dedup|idempotent|seen.*request|request.*seen|reques.*hash' \
  autogpt_platform/backend/backend/api/features/chat/routes.py | head -20

Repository: Significant-Gravitas/AutoGPT

Length of output: 1135


🏁 Script executed:

#!/bin/bash
# Find the config class definition with copilot_cost_approval_ttl_seconds
rg -n 'copilot_cost_approval_ttl_seconds|class.*Config|ChatConfig' \
  autogpt_platform/backend/backend/copilot/ \
  | grep -E '(copilot_cost_approval|ChatConfig|class)' | head -20

Repository: Significant-Gravitas/AutoGPT

Length of output: 2250


🏁 Script executed:

#!/bin/bash
# Search for ChatConfig definition
fd -e py autogpt_platform/backend/backend/copilot/ | xargs grep -l 'class ChatConfig'

Repository: Significant-Gravitas/AutoGPT

Length of output: 543


🏁 Script executed:

#!/bin/bash
# Find config.py files in copilot
find autogpt_platform/backend/backend/copilot -name '*config*' -type f

Repository: Significant-Gravitas/AutoGPT

Length of output: 303


🏁 Script executed:

#!/bin/bash
# Read the config.py file to find copilot_cost_approval_ttl_seconds
cat -n autogpt_platform/backend/backend/copilot/config.py | head -100

Repository: Significant-Gravitas/AutoGPT

Length of output: 4711


🏁 Script executed:

#!/bin/bash
# Search for the approval TTL value
rg -n 'copilot_cost_approval_ttl_seconds|approval.*ttl' autogpt_platform/backend/backend/copilot/config.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 127


🏁 Script executed:

#!/bin/bash
# Get the full definition of copilot_cost_approval_ttl_seconds
sed -n '129,135p' autogpt_platform/backend/backend/copilot/config.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 267


🏁 Script executed:

#!/bin/bash
# Read message_dedup.py to understand the dedup lock mechanism
cat -n autogpt_platform/backend/backend/copilot/message_dedup.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 2973


Add one-time consumption of approval tokens before enqueueing.

The token validation checks signature, expiry, and request fingerprint, but does not mark the token as consumed. Within the 600-second approval window, the same token can be reused to execute the identical high-cost request after the dedup lock expires. Add an atomic consume step (e.g., Redis SET NX with a nonce/hash) before saving and enqueueing the request, and reject already-consumed tokens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/api/features/chat/routes.py` around lines
905 - 929, The approval token is validated but not consumed, allowing reuse
within the approval window; modify the flow around
estimate.requires_confirmation/validate_cost_approval_token to atomically mark
the token as consumed before saving/enqueueing the request: after
validate_cost_approval_token returns true (for request.approval_token, user_id,
session_id, message, etc.), perform an atomic consume operation (e.g., Redis SET
NX with a key derived from the token and a short TTL or store a nonce/hash) and
if the consume fails treat the token as already-used and raise
HTTPException(403) with an appropriate message; ensure the consume+check is done
before any persistence/enqueue steps so concurrent requests cannot reuse the
same token.
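The atomic consume step can be a single SET NX keyed on a hash of the token; a sketch (written synchronously for clarity, with an assumed key prefix and any redis-py-style client):

```python
import hashlib


def consume_approval_token(client, token: str, ttl_seconds: int = 600) -> bool:
    """Mark the token consumed; returns False if it was already used.

    `client` is anything with a redis-py style set(key, value, nx=..., ex=...).
    """
    digest = hashlib.sha256(token.encode()).hexdigest()
    key = f"copilot:cost-approval:consumed:{digest}"
    # SET NX succeeds for exactly one caller; concurrent requests racing on
    # the same token see the existing key and are rejected.
    return bool(client.set(key, "1", nx=True, ex=ttl_seconds))
```

Calling this after validate_cost_approval_token returns true, and raising HTTPException(403) when it returns False, closes the replay window without extending token state beyond a short TTL.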

CopilotLlmModel = Literal["standard", "advanced"]

settings = Settings()
_ephemeral_secret = secrets.token_bytes(32)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm these secrets are actually populated in every deployed env (.env.default, compose, docs).
rg -nP 'encryption_key|unsubscribe_secret_key|supabase_service_role_key|cost_approval' autogpt_platform/backend -g '!**/cost_approval.py' -C1
rg -nP 'ENCRYPTION_KEY|UNSUBSCRIBE_SECRET_KEY|SUPABASE_SERVICE_ROLE_KEY' autogpt_platform -g '*.env*' -g 'docker-compose*.yml' -C1
# Check worker count in production config to confirm multi-worker risk.
rg -nP 'workers\s*[=:]' autogpt_platform -g '*.py' -g '*.toml' -g '*.yml' -C1

Repository: Significant-Gravitas/AutoGPT

Length of output: 12569


🏁 Script executed:

cat -n autogpt_platform/backend/backend/copilot/cost_approval.py

Repository: Significant-Gravitas/AutoGPT

Length of output: 5268


🏁 Script executed:

# Check for tests that might exercise the ephemeral fallback
rg -nA5 'ephemeral|all three|unset.*secret' autogpt_platform/backend/backend/copilot --type py

# Check if startup validates secrets
rg -nA5 'encryption_key|unsubscribe_secret_key|supabase_service_role_key' autogpt_platform/backend/backend/util/settings.py -B2 -A2

# Check deployment docs
find autogpt_platform -name '*.md' -o -name 'README*' | head -20 | xargs grep -l 'ENCRYPTION_KEY\|secret' 2>/dev/null | head -5

Repository: Significant-Gravitas/AutoGPT

Length of output: 33655


🏁 Script executed:

# Check if there's startup validation that would fail if secrets are missing
rg -nB3 -A3 'Settings()' autogpt_platform/backend/backend/util/settings.py | head -40

# Check if there's any environment validation at app startup
rg -nA5 'def.*init|startup|validate' autogpt_platform/backend/backend/api/main.py -C2 | head -50

# Check if tests actually trigger the ephemeral fallback path
rg -nA10 'test.*approval.*token' autogpt_platform/backend/backend/copilot/cost_estimation_test.py | head -60

Repository: Significant-Gravitas/AutoGPT

Length of output: 2977


🏁 Script executed:

# Look for where the FastAPI app is initialized and if there's startup validation
fd -t f -name 'app.py' -o -name 'server.py' autogpt_platform/backend | head -5

# Check what the default/common deployment scenarios are
rg -nA5 'if.*encryption_key|if.*unsubscribe_secret_key|if not.*secret' autogpt_platform/backend/backend/util -C1

# Verify that .env.default is actually used and consulted in normal deployments
find autogpt_platform/backend -name 'README*' -o -name 'docker-compose*' -o -name 'deploy*' | xargs cat 2>/dev/null | grep -i 'env\|secret' | head -20

Repository: Significant-Gravitas/AutoGPT

Length of output: 1600


🏁 Script executed:

# Find the main app initialization
find autogpt_platform/backend -name 'main.py' -type f

# Check if there's any secret validation in __init__ or setup
rg -n 'def __init__|def startup|logger.warning.*encryption_key' autogpt_platform/backend/backend/util --type py -A3

# Look for the pattern of how cache.py handles missing encryption_key as a model
cat -n autogpt_platform/backend/backend/util/cache.py | sed -n '449,460p'

Repository: Significant-Gravitas/AutoGPT

Length of output: 13004


Silent fallback to a per-process ephemeral secret causes multi-worker validation failures and skips the [SECURITY] warning precedent set in cache.py.

When none of encryption_key, unsubscribe_secret_key, or supabase_service_role_key are configured, _ephemeral_secret (line 18) provides a per-process signing key generated at module import time. This breaks token validation across workers: a cost-approval token signed by worker A using its ephemeral secret cannot be validated by worker B using a different ephemeral secret, producing intermittent 403 approval_required / invalid token errors.

While .env.default does populate ENCRYPTION_KEY and UNSUBSCRIBE_SECRET_KEY, making this fallback unlikely in standard deployments, the design lacks the safety precedent established in cache.py (lines 449–456), which logs [SECURITY] warning when encryption_key is missing. Additionally, reusing three unrelated secrets (encryption_key for data, unsubscribe_secret_key for email HMAC, supabase_service_role_key for external bearer) as HMAC signing material violates cryptographic key separation, even though collision risk is low for distinct message domains.

Recommendations:

  1. Add explicit warning at token generation/validation if all three secrets are unset, matching cache.py pattern.
  2. Prefer a dedicated cost_approval_signing_key over multiplexing unrelated secrets.
  3. If fallback is necessary, derive a stable per-deployment key (e.g., from Redis or a shared bootstrap secret) rather than per-process.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_approval.py` at line 18,
_current implementation creates a per-process _ephemeral_secret at import which
causes cross-worker token validation failures and silently reuses unrelated
secrets; change this by (1) introducing a dedicated configuration key (e.g.,
COST_APPROVAL_SIGNING_KEY / cost_approval_signing_key) and use it for
signing/validation in cost_approval.py instead of multiplexing ENCRYPTION_KEY,
UNSUBSCRIBE_SECRET_KEY, or SUPABASE_SERVICE_ROLE_KEY, (2) if none of those env
vars are set, log a clear [SECURITY] warning (matching the cache.py warning
pattern) when generating or validating tokens, and (3) avoid per-process
fallbacks — either require the dedicated key or derive a stable deployment-wide
fallback from a shared store/bootstrap secret (e.g., Redis or other config
provider) rather than secrets.token_bytes so tokens remain valid across workers;
update token generation/validation paths that reference _ephemeral_secret to use
the new key logic.

Comment on lines +1 to +109
"""Tests for CoPilot cost-estimation helpers."""

import pytest

from backend.copilot import cost_approval


@pytest.mark.asyncio
async def test_approval_token_round_trip_valid():
token = await cost_approval.generate_cost_approval_token(
user_id="user-1",
session_id="sess-1",
message="Analyze this task",
is_user_message=True,
context={"url": "https://example.com", "content": "ctx"},
file_ids=["11111111-1111-1111-1111-111111111111"],
mode="extended_thinking",
model="advanced",
ttl_seconds=600,
)

is_valid = await cost_approval.validate_cost_approval_token(
token=token,
user_id="user-1",
session_id="sess-1",
message="Analyze this task",
is_user_message=True,
context={"url": "https://example.com", "content": "ctx"},
file_ids=["11111111-1111-1111-1111-111111111111"],
mode="extended_thinking",
model="advanced",
)

assert is_valid is True


@pytest.mark.asyncio
async def test_approval_token_invalid_when_message_changes():
token = await cost_approval.generate_cost_approval_token(
user_id="user-1",
session_id="sess-1",
message="Original message",
is_user_message=True,
context=None,
file_ids=None,
mode=None,
model=None,
ttl_seconds=600,
)

is_valid = await cost_approval.validate_cost_approval_token(
token=token,
user_id="user-1",
session_id="sess-1",
message="Tampered message",
is_user_message=True,
context=None,
file_ids=None,
mode=None,
model=None,
)

assert is_valid is False


@pytest.mark.asyncio
async def test_approval_token_invalid_when_expired():
token = await cost_approval.generate_cost_approval_token(
user_id="user-1",
session_id="sess-1",
message="Original message",
is_user_message=True,
context=None,
file_ids=None,
mode=None,
model=None,
ttl_seconds=-1,
)

is_valid = await cost_approval.validate_cost_approval_token(
token=token,
user_id="user-1",
session_id="sess-1",
message="Original message",
is_user_message=True,
context=None,
file_ids=None,
mode=None,
model=None,
)

assert is_valid is False


@pytest.mark.asyncio
async def test_approval_token_invalid_for_malformed_payload():
is_valid = await cost_approval.validate_cost_approval_token(
token="not-a-valid-token.payload",
user_id="user-1",
session_id="sess-1",
message="Original message",
is_user_message=True,
context=None,
file_ids=None,
mode=None,
model=None,
)

assert is_valid is False

🛠️ Refactor suggestion | 🟠 Major

Test file name does not match the module under test.

This file exclusively tests backend.copilot.cost_approval (generate_cost_approval_token / validate_cost_approval_token), but the filename cost_estimation_test.py pairs it with cost_estimation.py. Per the colocation convention, rename this file to cost_approval_test.py and keep cost_estimation_test.py reserved for tests of cost_estimation.py. This will also prevent future confusion when a real test module for cost_estimation.py is added.

As per coding guidelines: "Colocate test files with source files using *_test.py naming convention".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_estimation_test.py` around
lines 1 - 109, The test file is named incorrectly—it's exercising
backend.copilot.cost_approval (generate_cost_approval_token,
validate_cost_approval_token) but is called cost_estimation_test.py; rename the
test file to cost_approval_test.py so it colocates and clearly maps to
cost_approval.py, update any test imports if they reference the old filename,
and ensure no other tests use cost_estimation_test.py so the reserved name
remains available for tests that target cost_estimation.py in the future.

Comment on lines +7 to +14
from backend.copilot.baseline.service import _resolve_baseline_model
from backend.copilot.config import ChatConfig, CopilotLlmModel, CopilotMode
from backend.copilot.executor.processor import (
resolve_effective_mode,
resolve_use_sdk_for_mode,
)
from backend.copilot.model import ChatSession
from backend.copilot.sdk.service import _resolve_model_and_multiplier

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm these are truly private in their source modules (no other non-test import sites).
rg -nP '\b_resolve_baseline_model\b' autogpt_platform/backend -C2
rg -nP '\b_resolve_model_and_multiplier\b' autogpt_platform/backend -C2

Repository: Significant-Gravitas/AutoGPT

Length of output: 6285


Importing private helpers from sibling modules (_resolve_baseline_model, _resolve_model_and_multiplier).

Both symbols are _-prefixed, signalling module-private internals. Importing them into cost_estimation.py from backend.copilot.baseline.service and backend.copilot.sdk.service couples this module to implementation details that the owning modules are free to refactor without notice. Either promote these to public names (resolve_baseline_model, resolve_model_and_multiplier) in their source modules, or expose dedicated public helpers (get_effective_baseline_model(mode), get_effective_sdk_model(model, session_id)) that cost_estimation.py can depend on.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_estimation.py` around lines 7 -
14, cost_estimation.py is importing module-private helpers
(_resolve_baseline_model and _resolve_model_and_multiplier); change the
dependency to public APIs by either (A) renaming and exporting the existing
helpers in their modules (backend.copilot.baseline.service: expose
resolve_baseline_model or get_effective_baseline_model;
backend.copilot.sdk.service: expose resolve_model_and_multiplier or
get_effective_sdk_model) and update the imports in cost_estimation.py to those
public names, or (B) add new thin public wrapper helpers in those modules (e.g.,
get_effective_baseline_model(mode) and get_effective_sdk_model(model,
session_id)) and import those from cost_estimation.py instead; ensure you update
any calls in cost_estimation.py that reference _resolve_baseline_model and
_resolve_model_and_multiplier to the new public function names.

Comment on lines +107 to +125
async def resolve_cost_confirmation_threshold(
user_id: str | None,
) -> tuple[float, Literal["launchdarkly", "default"]]:
"""Resolve threshold from LaunchDarkly with config fallback."""
fallback = max(0.0, float(config.copilot_cost_confirmation_threshold_usd))
if not user_id:
return fallback, "default"
raw_value = await get_feature_flag_value(
Flag.COPILOT_COST_CONFIRMATION_THRESHOLD_USD.value,
user_id,
fallback,
)
try:
threshold = max(0.0, float(raw_value))
except (TypeError, ValueError):
return fallback, "default"
if threshold == fallback:
return threshold, "default"
return threshold, "launchdarkly"

⚠️ Potential issue | 🟡 Minor

threshold_source misattribution when LaunchDarkly returns a value equal to the config fallback.

The equality check if threshold == fallback: return threshold, "default" conflates two different states:

  1. LaunchDarkly had no rule matching this user, get_feature_flag_value returned the fallback default → correctly "default".
  2. LaunchDarkly does have a rule, and the targeted value happens to equal config.copilot_cost_confirmation_threshold_usd (e.g. both are 10.0) → should be "launchdarkly" but is reported as "default".

This affects the response field exposed to the frontend and anyone logging it for analytics ("which users are on LD-targeted thresholds?"). If get_feature_flag_value can distinguish "no variation" vs. "variation returned", prefer that signal. Otherwise, document the known ambiguity in a comment so it isn't treated as a bug later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_estimation.py` around lines 107
- 125, The current resolve_cost_confirmation_threshold function misattributes
the source when LaunchDarkly returns a value equal to the config fallback;
change get_feature_flag_value to return an explicit signal about whether the
returned value came from LaunchDarkly or was the fallback (e.g., return
(raw_value, from_flag: bool) or (raw_value, source: "launchdarkly"|"default")),
then update resolve_cost_confirmation_threshold to use that signal to set the
returned source (keep using Flag.COPILOT_COST_CONFIRMATION_THRESHOLD_USD.value
and config.copilot_cost_confirmation_threshold_usd to compute numeric fallback
and threshold but rely on the new second return value from
get_feature_flag_value to decide between "launchdarkly" and "default").
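If get_feature_flag_value is extended to report whether LaunchDarkly actually served a variation, the resolver's attribution logic reduces to a pure function; a sketch (the served_by_flag signal is the assumed new return value):

```python
from typing import Literal

ThresholdSource = Literal["launchdarkly", "default"]


def resolve_threshold(
    raw_value: object, fallback: float, served_by_flag: bool
) -> tuple[float, ThresholdSource]:
    """Attribute the source from the flag signal, not from value equality."""
    try:
        threshold = max(0.0, float(raw_value))  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return fallback, "default"
    return threshold, ("launchdarkly" if served_by_flag else "default")
```

With this shape, an LD-targeted value that happens to equal the config fallback is still reported as "launchdarkly".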

Comment on lines +150 to +158
def _estimate_history_tokens(session: ChatSession) -> int:
messages = [
{"role": m.role, "content": m.content or ""}
for m in session.messages[-30:]
if m.role in {"system", "user", "assistant", "tool"}
]
if not messages:
return 0
return estimate_token_count(messages)

⚠️ Potential issue | 🟡 Minor

Fingerprint inputs and estimator history should stay in sync.

_estimate_history_tokens counts the last 30 messages of session.messages, but the cost-approval fingerprint (build_cost_approval_fingerprint in cost_approval.py) binds only message, is_user_message, context, file_ids, mode, and model; it does not bind the session's prior history. That is intentional (you can't easily re-bind a conversation's state), but be aware: between estimate and confirm, the session can accrue new messages (e.g., from a second tab), so the history at execution time is larger and the real cost may exceed the approved estimate. Consider either:

  • documenting this gap in the module docstring / API docs, or
  • binding session_id + a message-count/version into the fingerprint so a noticeably-different conversation forces re-estimation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/cost_estimation.py` around lines 150
- 158, The estimate vs approval can diverge if the session accrues messages
after estimating, so update the fingerprint to include session identity +
history version: modify build_cost_approval_fingerprint to accept and
incorporate session.session_id (or equivalent) and a
history_version/message_count (e.g., len(session.messages) or a monotonic
version) so any change in conversation invalidates the prior approval; keep
_estimate_history_tokens' slicing logic (last 30) in sync with any bounding used
when building the fingerprint so both use the same window/length semantics and
ensure the same symbols (_estimate_history_tokens and
build_cost_approval_fingerprint) are updated together.
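The second option, binding session identity and a history version into the fingerprint, can be sketched as follows (field names are illustrative, not the module's real signature):

```python
import hashlib
import json


def build_fingerprint(
    message: str, session_id: str, message_count: int, **fields: object
) -> str:
    """Any new message bumps message_count and so invalidates prior approvals."""
    payload = json.dumps(
        {"m": message, "sid": session_id, "n": message_count, **fields},
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

The estimate endpoint and the execute path must compute message_count over the same window (e.g., the same last-30 slice) or approvals will spuriously fail.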

Comment on lines +106 to +144
async function fetchCostEstimate(
  sid: string,
  requestBody: {
    message: string;
    is_user_message: boolean;
    context: null;
    file_ids: string[] | null;
    mode: CopilotMode | null;
    model: CopilotLlmModel | null;
  },
  headers: Record<string, string>,
): Promise<CoPilotCostEstimate> {
  const response = await fetch(
    `${environment.getAGPTServerBaseUrl()}/api/chat/sessions/${sid}/estimate`,
    {
      method: "POST",
      headers: {
        ...headers,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(requestBody),
    },
  );

  if (!response.ok) {
    const detail = await response.text();
    throw new Error(detail || "Failed to estimate request cost");
  }
  return (await response.json()) as CoPilotCostEstimate;
}

useEffect(() => {
  return () => {
    const resolver = costApprovalResolverRef.current;
    costApprovalResolverRef.current = null;
    pendingCostEstimateRef.current = null;
    resolver?.(null);
  };
}, []);

⚠️ Potential issue | 🟡 Minor

In-flight fetchCostEstimate is not cancelled on session switch / unmount.

fetchCostEstimate has no AbortSignal. If the user switches sessions (or the component unmounts) while the estimate POST is in flight, the promise still resolves against the old session and the subsequent setPendingCostEstimate / waitForCostApproval mutates state for a session the UI no longer owns. The unmount cleanup only resolves the approval promise (costApprovalResolverRef); it does not abort the HTTP request. Pass an AbortController.signal into fetch and abort it from the session-switch effect and the unmount cleanup, or guard the post-fetch state writes with a session-id check (similar to the sessionIdRef.current !== sid pattern already used in handleWakeResync).

🛠️ Sketch
-  async function fetchCostEstimate(
-    sid: string,
-    requestBody: { ... },
-    headers: Record<string, string>,
-  ): Promise<CoPilotCostEstimate> {
+  async function fetchCostEstimate(
+    sid: string,
+    requestBody: { ... },
+    headers: Record<string, string>,
+    signal?: AbortSignal,
+  ): Promise<CoPilotCostEstimate> {
     const response = await fetch(
       `${environment.getAGPTServerBaseUrl()}/api/chat/sessions/${sid}/estimate`,
-      { method: "POST", headers: { ...headers, "Content-Type": "application/json" }, body: JSON.stringify(requestBody) },
+      { method: "POST", headers: { ...headers, "Content-Type": "application/json" }, body: JSON.stringify(requestBody), signal },
     );
     ...
   }

Thread an AbortController tied to the current sessionId and abort from the session-switch effect and the unmount cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts`
around lines 106 - 144, fetchCostEstimate is missing an AbortSignal so in-flight
POSTs can complete after a session switch/unmount and mutate stale state; modify
the logic to create an AbortController tied to the current sessionId, pass
controller.signal into fetch inside fetchCostEstimate, and abort that controller
both in the session-switch effect and the unmount cleanup (where
costApprovalResolverRef and pendingCostEstimateRef are cleared). Ensure
pendingCostEstimateRef and costApprovalResolverRef are set/cleared only for the
matching sessionId (use sessionIdRef.current === sid guard like
handleWakeResync) so resolved promises/state updates do not apply to a different
session if aborting is not possible.

majdyz added a commit that referenced this pull request Apr 30, 2026
`self._active_tasks_lock = threading.Lock()` in `__init__` (added in #12877
to make `_cleanup_completed_tasks` thread-safe) holds a `_thread.lock`
that the forkserver/spawn start method cannot serialize. With it set
eagerly, `Process(target=self.execute_run_command).start()` from
`AppProcess.start()` raises `TypeError: cannot pickle '_thread.lock'
object` and `poetry run app` aborts at startup before the REST server
binds.

Move the lock to a lazy `@property _active_tasks_lock` so the parent
process never holds a real `threading.Lock` instance — the lock is
materialized inside the forked child the first time
`_cleanup_completed_tasks` runs, where pickling is no longer in play.
This mirrors the existing lazy-init pattern already used for the
ThreadPoolExecutor, RabbitMQ clients, and consumer threads in this
class.
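The lazy-lock pattern described above can be sketched as (class name is illustrative):

```python
from __future__ import annotations

import threading


class AppProcessLike:
    def __init__(self) -> None:
        # No real threading.Lock is held here, so the parent process can
        # still pickle this instance for forkserver/spawn process starts.
        self._active_tasks_lock_instance: threading.Lock | None = None

    @property
    def _active_tasks_lock(self) -> threading.Lock:
        # Materialized lazily in the child on first use, before competing
        # threads exist, mirroring the executor/RabbitMQ-client pattern.
        if self._active_tasks_lock_instance is None:
            self._active_tasks_lock_instance = threading.Lock()
        return self._active_tasks_lock_instance
```

Pickling succeeds while the backing attribute is still None; only the forked child ever holds a real `_thread.lock`.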

Labels

  • conflicts: Automatically applied to PRs with merge conflicts
  • platform/backend: AutoGPT Platform - Back end
  • platform/frontend: AutoGPT Platform - Front end
  • size/xl

Projects

Status: 🆕 Needs initial review
Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants