feat(supervisor): add opt-in dequeue backpressure #3836
🦋 Changeset detected

Latest commit: e67644b

The changes in this PR will be included in the next version bump. This PR includes changesets to release 25 packages
Walkthrough

This pull request implements a Redis-backed dequeue backpressure system for the supervisor. When enabled, the supervisor periodically reads a backpressure verdict from Redis and uses it to pause consumer scale-up and probabilistically skip dequeue attempts. The system includes verdict timestamps and staleness handling, a post-release ramp that gradually resumes dequeueing, a dry-run mode, Prometheus metrics, environment configuration, supervisor wiring to start/stop the monitor and Redis client, and a new consumer-pool …

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Cached, fail-open monitor that decides whether to skip dequeues based on a pluggable signal source. Disabled is a total no-op (no refresh, no reads). The hot-path read is synchronous and never performs I/O; every failure mode (source throws, returns null, or verdict goes stale) fails open.
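The cached, fail-open behaviour described in this commit could be sketched roughly as below. This is a minimal illustration under assumptions, not the PR's code: `SketchMonitor`, `SignalSource`, `record`, the `{ engaged }` verdict shape, and `maxAgeMs` are all hypothetical names.

```typescript
// Hypothetical sketch of a cached, fail-open monitor. All names are assumptions.
type Verdict = { engaged: boolean };

interface SignalSource {
  read(): Promise<Verdict | null>;
}

class SketchMonitor {
  private cached: Verdict | null = null;
  private cachedAtMs = 0;
  private readonly source: SignalSource;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(source: SignalSource, maxAgeMs: number, now: () => number = Date.now) {
    this.source = source;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Runs on a timer, off the hot path. A throwing source clears the cache.
  async refresh(): Promise<void> {
    try {
      this.record(await this.source.read());
    } catch {
      this.record(null);
    }
  }

  // Split out so the cache update can be exercised without a real source.
  record(verdict: Verdict | null): void {
    this.cached = verdict;
    this.cachedAtMs = this.now();
  }

  // Hot path: synchronous, reads only the cache, never performs I/O.
  shouldSkipDequeue(): boolean {
    if (this.cached === null) return false; // unknown verdict: fail open
    if (this.now() - this.cachedAtMs > this.maxAgeMs) return false; // stale: fail open
    return this.cached.engaged;
  }
}
```

The point of the split is that `shouldSkipDequeue()` can be called on every dequeue attempt without waiting on the source, while every uncertain state resolves to "do not skip".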
Reads the backpressure verdict from a Redis key (written by the cluster-side aggregator). Malformed or wrong-shaped values are treated as unknown so the monitor fails open. Adds @internal/redis + @internal/testcontainers deps.
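As an illustration of treating malformed or wrong-shaped values as unknown, a parser along these lines would return `null` for anything it cannot trust. The verdict shape (`engaged`, optional `ts`) and the JSON encoding are assumptions made for this sketch; the PR text only says that a verdict is stored under a Redis key.

```typescript
// Hypothetical verdict shape and encoding; not taken from the PR's source.
type ParsedVerdict = { engaged: boolean; ts?: number };

// Returns null ("unknown") for missing, unparseable, or wrong-shaped values,
// so the caller can fail open.
function parseVerdict(raw: string | null): ParsedVerdict | null {
  if (raw === null) return null;

  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }

  if (typeof value !== "object" || value === null) return null;
  const candidate = value as Record<string, unknown>;

  if (typeof candidate["engaged"] !== "boolean") return null;
  const ts = candidate["ts"];
  if (ts !== undefined && typeof ts !== "number") return null;

  return ts === undefined
    ? { engaged: candidate["engaged"] }
    : { engaged: candidate["engaged"], ts };
}
```

With ioredis, the raw input would typically be the result of a `get` call on the verdict key, which resolves to `string | null`.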
Gate dequeues on the backpressure verdict via the existing preDequeue hook, on all paths including k8s (where the resource monitor is a no-op). Construct the Redis-backed monitor only when TRIGGER_DEQUEUE_BACKPRESSURE_ENABLED is set; require a Redis host when enabled. Off by default - no Redis client, no effect.
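The "off by default, require a Redis host when enabled" rule might look like the sketch below. Only `TRIGGER_DEQUEUE_BACKPRESSURE_ENABLED` appears in the PR text; the host variable name `TRIGGER_DEQUEUE_BACKPRESSURE_REDIS_HOST`, the accepted truthy values, and the function and type names are assumptions for illustration.

```typescript
// Hypothetical config resolution; the host variable name and truthy values are assumed.
type Env = Record<string, string | undefined>;

type BackpressureConfig =
  | { enabled: false }
  | { enabled: true; redisHost: string };

function resolveBackpressureConfig(env: Env): BackpressureConfig {
  const flag = env["TRIGGER_DEQUEUE_BACKPRESSURE_ENABLED"];
  const enabled = flag === "true" || flag === "1";

  // Disabled: no Redis client is built and nothing else is read.
  if (!enabled) return { enabled: false };

  const redisHost = env["TRIGGER_DEQUEUE_BACKPRESSURE_REDIS_HOST"];
  if (!redisHost) {
    throw new Error("Backpressure is enabled but no Redis host is configured");
  }
  return { enabled: true, redisHost };
}
```

Failing at startup when the host is missing keeps a misconfigured deployment from silently running without the feature it asked for.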
isEngaged() exposes the hard backpressure state (drives scale-up freeze), while shouldSkipDequeue() additionally ramps after release - skipping a linearly-decaying fraction of attempts over rampMs so the aggregate dequeue rate climbs back to full instead of snapping and re-flooding the cluster.
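The post-release ramp can be pictured as a skip probability that decays linearly to zero over `rampMs`. The sketch below assumes the fraction starts at 1 at the moment of release and that each attempt is skipped independently at random; the function names are hypothetical and the PR may compute this differently.

```typescript
// Hypothetical ramp helpers. Assumes the skip fraction starts at 1 on release.
function rampSkipProbability(msSinceRelease: number, rampMs: number): number {
  if (rampMs <= 0 || msSinceRelease >= rampMs) return 0; // ramp disabled or finished
  if (msSinceRelease <= 0) return 1;
  return 1 - msSinceRelease / rampMs; // linear decay from 1 to 0
}

// `random` is injectable so the decision can be made deterministic in tests.
function shouldSkipDuringRamp(
  msSinceRelease: number,
  rampMs: number,
  random: () => number = Math.random
): boolean {
  return random() < rampSkipProbability(msSinceRelease, rampMs);
}
```

Halfway through a 10 second ramp, for example, `rampSkipProbability(5000, 10000)` is `0.5`, so roughly half of the attempts go through.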
Add optional shouldPauseScaling to ScalingOptions; when it returns true the pool stops scaling up (scale-down still allowed), so it won't add consumers to drain a queue backpressure is deliberately holding.
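A scaling decision that honours `shouldPauseScaling` might look like the following sketch. Only the `shouldPauseScaling` option comes from the commit message; `SketchScalingOptions`, `minConsumers`, `maxConsumers`, and `nextConsumerCount` are hypothetical and do not describe the real `ScalingOptions` type.

```typescript
// Hypothetical options type; only shouldPauseScaling is named in the PR.
type SketchScalingOptions = {
  minConsumers: number;
  maxConsumers: number;
  shouldPauseScaling?: () => boolean;
};

function nextConsumerCount(
  current: number,
  desired: number,
  options: SketchScalingOptions
): number {
  const clamped = Math.min(options.maxConsumers, Math.max(options.minConsumers, desired));

  // Freeze scale-up only; scale-down always passes through.
  if (clamped > current && options.shouldPauseScaling?.() === true) {
    return current;
  }
  return clamped;
}
```

Because the hook is optional, a pool configured without it behaves as it did before.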
Pass shouldPauseScaling (monitor.isEngaged) into the consumer pool so scale-up freezes while hard-engaged, and feed TRIGGER_DEQUEUE_BACKPRESSURE_RAMP_MS into the monitor's post-release ramp. Off by default.
Dry-run (default on via env) keeps the gates inert while computeEngaged still reflects the real signal and verdict transitions are logged. Adds BackpressureMetrics (engaged/dry_run gauges, skipped-dequeues counter).
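One way to read the dry-run behaviour: the computed state stays truthful for logs and gauges, while the gates only act when dry-run is off. The sketch below is an assumption about that split; `GateOutcome` and `applyDryRun` are hypothetical names.

```typescript
// Hypothetical split between "what the signal says" and "what the gates do".
type GateOutcome = {
  skipDequeue: boolean; // what the dequeue gate actually does
  pauseScaling: boolean; // what the scaling gate actually does
  wouldAct: boolean; // the real computed state, for logs and gauges
};

function applyDryRun(computedEngaged: boolean, dryRun: boolean): GateOutcome {
  const act = computedEngaged && !dryRun;
  return { skipDequeue: act, pauseScaling: act, wouldAct: computedEngaged };
}
```

In dry-run, `applyDryRun(true, true)` keeps both gates inert while `wouldAct` still reports `true`, which is what makes it possible to observe the signal before letting it act.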
Guard against overlapping refresh ticks when a read hangs; use the staleness-aware computeEngaged() for transition/ramp/gauge bookkeeping; close the backpressure Redis client on supervisor shutdown.
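The overlap guard can be illustrated with an in-flight flag: if the previous read has not settled when the next tick fires, the tick is dropped instead of stacking another read. `RefreshTicker` and `skippedTicks` are hypothetical names for this sketch.

```typescript
// Hypothetical ticker that never runs two reads at once.
class RefreshTicker {
  skippedTicks = 0;
  private inFlight = false;
  private readonly read: () => Promise<void>;

  constructor(read: () => Promise<void>) {
    this.read = read;
  }

  async tick(): Promise<void> {
    if (this.inFlight) {
      this.skippedTicks++; // previous read is still pending, so drop this tick
      return;
    }
    this.inFlight = true;
    try {
      await this.read();
    } catch {
      // A failed read is not fatal; the next tick simply tries again.
    } finally {
      this.inFlight = false;
    }
  }
}
```

A caller would typically drive this from a timer, for example `setInterval(() => void ticker.tick(), refreshMs)` with `refreshMs` chosen by the caller, and clear the interval on shutdown.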
Add TRIGGER_DEQUEUE_BACKPRESSURE_REDIS_PASSWORD to the secret strip-list so it never lands in the DEBUG startup log, with a comment to keep new secrets out.
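A strip-list of this kind could be sketched as below. The helper name `stripSecrets` and the constant `SECRET_ENV_KEYS` are hypothetical; only the password variable name comes from the commit message.

```typescript
// Keep every secret-bearing variable in this list so it never reaches a log line.
const SECRET_ENV_KEYS: readonly string[] = [
  "TRIGGER_DEQUEUE_BACKPRESSURE_REDIS_PASSWORD",
];

// Returns a copy of env without the listed keys; the input is not mutated.
function stripSecrets(
  env: Record<string, string | undefined>,
  secretKeys: readonly string[] = SECRET_ENV_KEYS
): Record<string, string | undefined> {
  const hidden = new Set(secretKeys);
  const safe: Record<string, string | undefined> = {};
  for (const key of Object.keys(env)) {
    if (!hidden.has(key)) safe[key] = env[key];
  }
  return safe;
}
```

A deny-list only protects the names it contains, which is why the commit pairs it with a comment reminding contributors to add new secrets.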
@internal/redis ships TS source (no dist), so 'node dist/index.js' crashed with ERR_UNKNOWN_FILE_EXTENSION on the .ts file at startup. Use ioredis directly (same ^5.3.2 as @internal/redis, no lockfile drift), mirroring smooth-operator.
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/plugins
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
🧹 Nitpick comments (1)
apps/supervisor/package.json (1)
21-21: Confirm ioredis security status for `ioredis@^5.3.2`

There are no published CVEs or GitHub security advisories specifically for `ioredis` v5.3.2; dependency scanners may still flag indirect/transitive issues. Consider running your package manager's audit and upgrading to the latest `ioredis` release to pick up dependency fixes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 90871fd0-a183-489a-9c7d-01e39644bc6c
⛔ Files ignored due to path filters (1)
- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (4)
- apps/supervisor/package.json
- apps/supervisor/src/backpressure/redisBackpressureSignalSource.test.ts
- apps/supervisor/src/backpressure/redisBackpressureSignalSource.ts
- apps/supervisor/src/index.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- apps/supervisor/src/backpressure/redisBackpressureSignalSource.test.ts
- apps/supervisor/src/index.ts
- apps/supervisor/src/backpressure/redisBackpressureSignalSource.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
- GitHub Check: packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Build and publish previews
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,jsx,css,json,md}
📄 CodeRabbit inference engine (AGENTS.md)
Use Prettier for code formatting and run `pnpm run format` before committing
Files:
apps/supervisor/package.json
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/redis-worker/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:43.173Z
Learning: Applies to packages/redis-worker/**/*@(job|queue|worker|background).{ts,tsx} : Use trigger.dev/redis-worker for all new background job implementations, replacing graphile-worker and zodworker
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3625
File: apps/webapp/app/services/taskMetadataCache.server.ts:270-291
Timestamp: 2026-05-15T08:05:57.683Z
Learning: In the triggerdotdev/trigger.dev codebase, `populateByCurrentWorker()` in `apps/webapp/app/services/taskMetadataCache.server.ts` intentionally logs and swallows Redis errors rather than rethrowing. The design rationale: rethrowing would propagate into `ChangeCurrentDeploymentService.call` and break deploy promotion when Redis is briefly unavailable; the 24h `TASK_META_CACHE_CURRENT_ENV_TTL_SECONDS` TTL acts as the self-healing window for cache drift, and next-promotion overwrites the env key sooner in practice. A compensating DEL on failure is also not a win because if Redis is unreachable the DEL fails identically, and Lua scripts are atomic so a partial write is impossible. Do not flag this log+swallow pattern as a bug in future reviews.
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/redis-worker/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:43.173Z
Learning: Applies to packages/redis-worker/**/redis-worker/src/queue.ts : Job queue abstraction should be Redis-backed in src/queue.ts
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3114
File: apps/supervisor/src/index.ts:83-98
Timestamp: 2026-03-27T18:11:57.032Z
Learning: In `apps/supervisor/src/index.ts`, `RESOURCE_MONITOR_ENABLED` (env var in `apps/supervisor/src/env.ts`) defaults to `false`. As a result, the local `ResourceMonitor`-based `maxResources`/`skipDequeue` gating in `preDequeue` is inactive in compute mode deployments. Do not flag local resource monitor usage in compute mode as a live bug; it has no practical impact unless `RESOURCE_MONITOR_ENABLED` is explicitly set to `true`.
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3754
File: apps/webapp/app/v3/mollifierStaleSweepWorker.server.ts:30-32
Timestamp: 2026-06-01T12:05:44.112Z
Learning: In the triggerdotdev/trigger.dev codebase, the mollifier stale-entry sweep (`initMollifierStaleSweepWorker` in `apps/webapp/app/v3/mollifierStaleSweepWorker.server.ts`) intentionally runs per-webapp instance without a distributed lease in its initial implementation. All Redis ops (cursor, counts hash, reconcile) are individually atomic and produce correct shared state even with multiple concurrent sweepers. The known limitation is that OpenTelemetry metric output (`recordStaleEntry`, `reportStaleEntrySnapshot`) multiplies by N webapp instances, mis-calibrating alert thresholds by a factor of N. A SETNX-based per-tick lease (SET NX PX on the sweep's existing Redis) is the planned follow-up fix. Until then, alert thresholds should be scaled accordingly. Do not re-raise this as a blocking correctness bug — it is a documented metric-scaling limitation with a tracked follow-up.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3558
File: internal-packages/run-engine/src/run-queue/index.ts:420-424
Timestamp: 2026-05-12T06:43:12.346Z
Learning: In the triggerdotdev/trigger.dev codebase (`internal-packages/run-engine/src/run-queue/index.ts`), the established convention in `RunQueue` read-path methods (e.g., `lengthOfQueue`, `lengthOfQueues`, `currentConcurrencyOfQueues`) is to **fail open** on transient Redis pipeline errors: pipeline result errors (`baseErr`, `ctrErr`, etc.) are coerced to `0` rather than surfaced or re-thrown. This is intentional — the project treats Redis command errors the same as missing keys for these counter reads. Do not flag this pattern as a bug or suggest throwing/propagating these errors in future reviews.
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3754
File: apps/webapp/app/env.server.ts:1104-1129
Timestamp: 2026-06-01T11:37:12.623Z
Learning: In triggerdotdev/trigger.dev (apps/webapp/app/env.server.ts), new background/periodic worker feature flags should hard-default to "0" (explicitly opt-in) rather than inheriting a parent feature flag (e.g., TRIGGER_MOLLIFIER_ENABLED). Inheriting a parent flag causes the new worker to auto-start on upgrade for any deployment that already has the parent flag enabled, turning on unexpected background load without an explicit rollout step. Each new worker component should require its own explicit opt-in via its own env var (e.g., TRIGGER_MOLLIFIER_STALE_SWEEP_ENABLED defaults to "0", not to process.env.TRIGGER_MOLLIFIER_ENABLED ?? "0").
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3625
File: apps/webapp/app/services/taskMetadataCache.server.ts:72-84
Timestamp: 2026-05-15T08:05:54.659Z
Learning: In `triggerdotdev/trigger.dev`, the `task-meta:*` Redis keyspace in `apps/webapp/app/services/taskMetadataCache.server.ts` is fully self-owned: `RedisTaskMetadataCache` is the sole writer and sole reader of this keyspace. Do not flag the `decode()` function (which casts parsed JSON to `EncodedEntry` and wraps in try/catch) for missing Zod schema validation. The existing `JSON.parse` + `try/catch` → `null` fallback is intentional; a `null` return triggers a safe PG fallback and cache back-fill. Adding Zod validation on every `HGET` was explicitly rejected as unnecessary CPU overhead on the hot path with no real safety benefit given the single-writer contract.
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/supervisor/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:47.652Z
Learning: SupervisorSession should manage the dequeue loop with EWMA-based dynamic scaling
📚 Learning: 2026-03-02T12:43:43.173Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/redis-worker/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:43.173Z
Learning: Applies to packages/redis-worker/**/redis-worker/**/*.{test,spec}.{ts,tsx} : Use testcontainers for Redis in test files for redis-worker
Applied to files:
apps/supervisor/package.json
📚 Learning: 2026-04-27T16:40:37.692Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3456
File: apps/webapp/package.json:230-230
Timestamp: 2026-04-27T16:40:37.692Z
Learning: In `apps/webapp/remix.config.js` (Remix 2.x, CJS server build via `serverModuleFormat: "cjs"`), ESM-only npm packages must be added to the `serverDependenciesToBundle` array so esbuild inlines them rather than emitting a `require()` call. The `engines` field allows Node >=18.19.0 || >=20.6.0, so `require(esm)` (Node 20.19+) cannot be relied upon. Packages already listed include p-limit, p-map, axios, and (as of PR `#3456`) uuid. When upgrading a dependency that drops CJS support, always check the post-build artifact for `require("<package>")` and add it to `serverDependenciesToBundle` if present.
Applied to files:
apps/supervisor/package.json
📚 Learning: 2026-06-02T21:20:56.997Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-06-02T21:20:56.997Z
Learning: Install dependencies with `pnpm i` (requires pnpm `10.33.2` and Node.js `20.20.2`)
Applied to files:
apps/supervisor/package.json
🔇 Additional comments (1)
apps/supervisor/package.json (1)
21-21: Confirm ioredis version range is consistent across the monorepo

The monorepo's other `package.json` files already use the same `ioredis` version range (`"ioredis": "^5.3.2"`), including:
- apps/webapp/package.json
- internal-packages/redis/package.json
- internal-packages/testcontainers/package.json
When maxVerdictAgeMs is set, an engaged verdict must carry a fresh ts; a missing or stale ts can't be trusted (a dead producer could otherwise pin the brake), so treat it as not-engaged.
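The timestamp rule in this commit could be expressed as a small pure function. This sketch assumes `ts` is an epoch timestamp in milliseconds and that the verdict carries an `engaged` boolean; `computeEngagedSketch` and `TimestampedVerdict` are hypothetical names and not the PR's `computeEngaged()`.

```typescript
// Hypothetical verdict shape; ts is assumed to be epoch milliseconds.
type TimestampedVerdict = { engaged: boolean; ts?: number };

function computeEngagedSketch(
  verdict: TimestampedVerdict | null,
  nowMs: number,
  maxVerdictAgeMs?: number
): boolean {
  if (verdict === null || !verdict.engaged) return false;

  // No age limit configured: trust the verdict as-is.
  if (maxVerdictAgeMs === undefined) return true;

  // With a limit, an engaged verdict needs a fresh timestamp. A missing or old
  // ts may mean the producer has died, so the brake is released.
  if (typeof verdict.ts !== "number") return false;
  return nowMs - verdict.ts <= maxVerdictAgeMs;
}
```

Note the asymmetry: staleness can only turn an engaged verdict into not-engaged, never the reverse, which keeps the failure mode on the "keep dequeuing" side.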
b73683c to c3bc59b
Pre-existing secret that wasn't excluded from envWithoutSecrets; add it to the strip-list alongside the backpressure redis password.
## Summary

1 new feature, 8 improvements, 1 bug fix.

## Highlights

- Add optional `shouldPauseScaling` to the supervisor consumer pool scaling options to freeze scale-up while it returns true (scale-down stays allowed). ([#3836](#3836))

## Improvements

- The MCP server no longer tells the AI agent to wait for a run to complete after every `trigger_task` call. Waiting is now opt-in: the agent only waits when you ask it to (for example "trigger and then wait for it to finish"). This avoids burning tokens polling runs you didn't need to block on and keeps responses clearer. ([#3838](#3838))
- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))
- `envvars.upload` now accepts an optional `isSecret` flag, letting you create the imported variables as secret (redacted) environment variables. When omitted, variables default to non-secret. ([#3809](#3809))
- Offload large trigger payloads to object storage before sending the trigger API request. The SDK uploads packets at or above the existing 128KB limit and sends an `application/store` pointer instead of embedding large JSON in the request body. `TriggerTaskRequestBody` now validates that `application/store` payloads are non-empty storage paths. ([#3785](#3785))
- Make mollifier buffer and drainer internals configurable. `MollifierBuffer` now accepts `ackGraceTtlSeconds`, `maxRetriesPerRequest`, `reconnectStepMs`, and `reconnectMaxMs` options, and `MollifierDrainer` accepts `maxBackoffMs` and `backoffFloorMs`. All default to their previous hardcoded values, so existing behaviour is unchanged. ([#3822](#3822))
- `MollifierDrainer` accepts a `drainBatchSize` option (default 1) that controls how many entries are popped per env per tick; in-flight handlers remain capped by the global `concurrency`. `MollifierBuffer` also gains `getDrainingCount()` / `listStaleDraining()`, backed by a new `mollifier:draining` ZSET maintained atomically with pop/ack/fail/requeue (observability-only). ([#3797](#3797))
- Adds AI SDK 7 support. The `ai` peer range now includes v7, and the `chat.agent` / chat surfaces work against v7's ESM-only build. On v7, install `@ai-sdk/otel` alongside `ai` and the SDK registers it for you so `experimental_telemetry` spans keep flowing into your run traces (v7 stopped emitting them from `ai` core). v5 and v6 keep working unchanged. ([#3833](#3833))
- `useTriggerChatTransport` now recovers when restored session state points at a session that no longer exists in the current environment ([#3816](#3816))

## Bug fixes

- Fix `@trigger.dev/core` build: cast the underlying log record exporter when calling `forceFlush` so it typechecks against the updated OpenTelemetry `LogRecordExporter` type (which no longer declares `forceFlush`). ([#3829](#3829))

<details>
<summary>Raw changeset output</summary>

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

`main` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `main`.

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

# Releases

## @trigger.dev/build@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## trigger.dev@4.5.0-rc.5

### Patch Changes

- The MCP server no longer tells the AI agent to wait for a run to complete after every `trigger_task` call. Waiting is now opt-in: the agent only waits when you ask it to (for example "trigger and then wait for it to finish"). This avoids burning tokens polling runs you didn't need to block on and keeps responses clearer. ([#3838](#3838))
- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`
  - `@trigger.dev/build@4.5.0-rc.5`
  - `@trigger.dev/schema-to-json@4.5.0-rc.5`

## @trigger.dev/core@4.5.0-rc.5

### Patch Changes

- Add optional `shouldPauseScaling` to the supervisor consumer pool scaling options to freeze scale-up while it returns true (scale-down stays allowed). ([#3836](#3836))
- Fix `@trigger.dev/core` build: cast the underlying log record exporter when calling `forceFlush` so it typechecks against the updated OpenTelemetry `LogRecordExporter` type (which no longer declares `forceFlush`). ([#3829](#3829))
- `envvars.upload` now accepts an optional `isSecret` flag, letting you create the imported variables as secret (redacted) environment variables. When omitted, variables default to non-secret. ([#3809](#3809))

  ```ts
  await envvars.upload("proj_1234", "prod", {
    variables: { STRIPE_SECRET_KEY: "sk_live_..." },
    isSecret: true,
  });
  ```

- Offload large trigger payloads to object storage before sending the trigger API request. The SDK uploads packets at or above the existing 128KB limit and sends an `application/store` pointer instead of embedding large JSON in the request body. `TriggerTaskRequestBody` now validates that `application/store` payloads are non-empty storage paths. ([#3785](#3785))

  Payload uploads use the same resolved `ApiClient` as the trigger call (including `requestOptions.clientConfig`), not only the global `apiClientManager.client`, so custom `baseURL`, access token, and preview branch apply to both presign and trigger.

- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))

## @trigger.dev/plugins@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/python@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/sdk@4.5.0-rc.5`
  - `@trigger.dev/core@4.5.0-rc.5`
  - `@trigger.dev/build@4.5.0-rc.5`

## @trigger.dev/react-hooks@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/redis-worker@4.5.0-rc.5

### Patch Changes

- Make mollifier buffer and drainer internals configurable. `MollifierBuffer` now accepts `ackGraceTtlSeconds`, `maxRetriesPerRequest`, `reconnectStepMs`, and `reconnectMaxMs` options, and `MollifierDrainer` accepts `maxBackoffMs` and `backoffFloorMs`. All default to their previous hardcoded values, so existing behaviour is unchanged. ([#3822](#3822))
- `MollifierDrainer` accepts a `drainBatchSize` option (default 1) that controls how many entries are popped per env per tick; in-flight handlers remain capped by the global `concurrency`. `MollifierBuffer` also gains `getDrainingCount()` / `listStaleDraining()`, backed by a new `mollifier:draining` ZSET maintained atomically with pop/ack/fail/requeue (observability-only). ([#3797](#3797))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/rsc@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/schema-to-json@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/sdk@4.5.0-rc.5

### Patch Changes

- Adds AI SDK 7 support. The `ai` peer range now includes v7, and the `chat.agent` / chat surfaces work against v7's ESM-only build. On v7, install `@ai-sdk/otel` alongside `ai` and the SDK registers it for you so `experimental_telemetry` spans keep flowing into your run traces (v7 stopped emitting them from `ai` core). v5 and v6 keep working unchanged. ([#3833](#3833))
- `useTriggerChatTransport` now recovers when restored session state points at a session that no longer exists in the current environment ([#3816](#3816))
- Offload large trigger payloads to object storage before sending the trigger API request. The SDK uploads packets at or above the existing 128KB limit and sends an `application/store` pointer instead of embedding large JSON in the request body. `TriggerTaskRequestBody` now validates that `application/store` payloads are non-empty storage paths. ([#3785](#3785))

  Payload uploads use the same resolved `ApiClient` as the trigger call (including `requestOptions.clientConfig`), not only the global `apiClientManager.client`, so custom `baseURL`, access token, and preview branch apply to both presign and trigger.

- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
The supervisor can now pause dequeuing - and freeze consumer-pool scale-up - when a backpressure signal says the cluster can't place more work, then ramp dequeuing back up gradually once it clears. The signal is a verdict published to a Redis key by a cluster-side component; the supervisor reads it on a short refresh and gates `preDequeue` on it.

Off by default (`TRIGGER_DEQUEUE_BACKPRESSURE_ENABLED`). Everything fails open: a missing, stale, or unreadable verdict never pins the brake, and the hot-path read is a synchronous cached lookup with no I/O. The scale-up freeze leaves scale-down untouched, and on release the resume is ramped so a deep queue isn't hammered all at once.

Dry-run is on by default (`TRIGGER_DEQUEUE_BACKPRESSURE_DRY_RUN`): even once enabled it only logs what it would have done, and surfaces the computed state through metrics, until explicitly set to act. Prometheus: `supervisor_backpressure_engaged`, `_dry_run`, `_skipped_dequeues_total`.

Refs TRI-5354