chore(webapp,redis-worker): make mollifier constants configurable #3822
🦋 Changeset detected. Latest commit: 7458de2. The changes in this PR will be included in the next version bump. This PR includes changesets to release 25 packages.
Walkthrough
This PR makes mollifier buffer and drainer internals configurable via environment variables and wiring. Twenty new TRIGGER_MOLLIFIER_* environment variables control stale-sweep bounds, ACK TTL, Redis retry/reconnect backoff, drainer poll/backoff/shutdown/gauge timing, idempotency claim timing, mutate-with-fallback polling/backoff, and metadata CAS retry backoff. The redis-worker library types and implementation are extended to accept these options; webapp initialization and worker code pass env values into MollifierBuffer, MollifierDrainer, and related components; and API routes thread the tunables into apply/mutate/claim flows.
Pre-merge checks: 4 passed, 1 failed (1 warning).
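To make the mutate-with-fallback polling/backoff knobs more concrete, here is a minimal sketch of how four such tunables could interact. It is an assumption inferred from the variable names (a total wait budget, a first poll delay, a per-poll cap, and a growth factor); the type `PollTunables` and the function `pollDelays` are hypothetical and are not taken from the PR, and the real wait loop may be structured differently.

```typescript
// Hypothetical tunables, named after the TRIGGER_MOLLIFIER_MUTATE_* variables.
// This is an illustration of a capped exponential poll schedule, not the PR's code.
type PollTunables = {
  safetyNetMs: number; // total time budget for the wait loop
  pollStepMs: number; // delay before the first re-check
  maxPollStepMs: number; // upper bound for any single delay
  backoffFactor: number; // multiplier applied after each poll
};

// Returns the sequence of sleep durations the loop would use until the
// safety-net budget is exhausted.
function pollDelays(t: PollTunables): number[] {
  // Clamp inputs so the loop always makes progress and terminates.
  const cap = Math.max(1, t.maxPollStepMs);
  const factor = Math.max(1, t.backoffFactor);
  let step = Math.max(1, t.pollStepMs);

  const delays: number[] = [];
  let elapsed = 0;
  while (elapsed < t.safetyNetMs) {
    // Never sleep past the cap or past the remaining budget.
    const delay = Math.min(step, cap, t.safetyNetMs - elapsed);
    delays.push(delay);
    elapsed += delay;
    step = Math.min(step * factor, cap);
  }
  return delays;
}

const schedule = pollDelays({
  safetyNetMs: 2000,
  pollStepMs: 100,
  maxPollStepMs: 500,
  backoffFactor: 2,
});
console.log(schedule);
```

With these illustrative inputs the schedule is `[100, 200, 400, 500, 500, 300]`, which sums to the 2000 ms budget. Only the 2000 ms figure appears in this PR's discussion; the other three values are made up for the example.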
789c916 to 84f98c3
Expose every previously-hardcoded mollifier tunable (buffer ack TTL and Redis retry/reconnect, drainer poll interval and backoff, idempotency claim TTL/wait/poll, mutate-fallback wait loop, metadata CAS retries, stale-sweep scan bounds, draining-gauge interval) via TRIGGER_MOLLIFIER_* env vars, each defaulting to its prior hardcoded value so behaviour is unchanged unless overridden. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two box-diagram docs in the mollifier dir for onboarding and tuning: TRIP.md covers ingress (gate → rate counter → buffer) and DRAINER.md covers egress (Redis → drainer fan-out → Postgres), each annotating the env-var lever on every edge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
84f98c3 to 1947020
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Devin Review found 1 potential issue.
⚠️ 1 issue in files not directly in the diff
⚠️ Incomplete wiring: reschedule and tags routes don't pass configurable mutateWithFallback parameters (apps/webapp/app/routes/api.v1.runs.$runParam.reschedule.ts:101-149)
The PR makes mutateWithFallback poll/backoff knobs configurable via env vars (TRIGGER_MOLLIFIER_MUTATE_SAFETY_NET_MS, TRIGGER_MOLLIFIER_MUTATE_POLL_STEP_MS, TRIGGER_MOLLIFIER_MUTATE_MAX_POLL_STEP_MS, TRIGGER_MOLLIFIER_MUTATE_BACKOFF_FACTOR). The cancel route (api.v2.runs.$runParam.cancel.ts:66-69) was updated to pass these from appEnv, but two other production callers of mutateWithFallback were not: api.v1.runs.$runParam.reschedule.ts:101 and api.v1.runs.$runId.tags.ts:64. Those routes still use the hardcoded defaults in mutateWithFallback.server.ts. If an operator tunes, say, TRIGGER_MOLLIFIER_MUTATE_SAFETY_NET_MS to 5000ms, the cancel route respects the override while reschedule and tags routes continue using 2000ms — an inconsistent operator experience that contradicts the PR's stated intent of making these internals configurable.
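One way to avoid this kind of drift, sketched under assumptions: build the tunables in a single shared helper and have every route that calls mutateWithFallback pass the same object. The option shape `MutateTunables`, the helper names, and the raw env map are hypothetical; the real routes read from a validated `appEnv`, and the actual option names in mutateWithFallback.server.ts may differ. Only the four environment variable names come from the PR.

```typescript
// Hypothetical option shape; the real mutateWithFallback options may be named differently.
type MutateTunables = {
  safetyNetMs?: number;
  pollStepMs?: number;
  maxPollStepMs?: number;
  backoffFactor?: number;
};

// Minimal stand-in for an env object (the webapp uses a validated env module instead).
type EnvLike = Record<string, string | undefined>;

// Unset or invalid values become undefined, so the callee's own
// `?? DEFAULT` fallback still decides the effective value.
function optionalPositiveNumber(env: EnvLike, name: string): number | undefined {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return undefined;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : undefined;
}

// Single source of truth that cancel, reschedule, and tags routes could all call.
function mutateTunablesFromEnv(env: EnvLike): MutateTunables {
  return {
    safetyNetMs: optionalPositiveNumber(env, "TRIGGER_MOLLIFIER_MUTATE_SAFETY_NET_MS"),
    pollStepMs: optionalPositiveNumber(env, "TRIGGER_MOLLIFIER_MUTATE_POLL_STEP_MS"),
    maxPollStepMs: optionalPositiveNumber(env, "TRIGGER_MOLLIFIER_MUTATE_MAX_POLL_STEP_MS"),
    backoffFactor: optionalPositiveNumber(env, "TRIGGER_MOLLIFIER_MUTATE_BACKOFF_FACTOR"),
  };
}

const tunables = mutateTunablesFromEnv({
  TRIGGER_MOLLIFIER_MUTATE_SAFETY_NET_MS: "5000",
});
console.log(tunables.safetyNetMs, tunables.pollStepMs);
```

Because unset variables map to `undefined`, a route that spreads this object into its call keeps the helper's built-in defaults until an operator overrides them, and all three routes pick up an override at the same time.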
## Summary

1 new feature, 8 improvements, 1 bug fix.

## Highlights

- Add optional `shouldPauseScaling` to the supervisor consumer pool scaling options to freeze scale-up while it returns true (scale-down stays allowed). ([#3836](#3836))

## Improvements

- The MCP server no longer tells the AI agent to wait for a run to complete after every `trigger_task` call. Waiting is now opt-in: the agent only waits when you ask it to (for example "trigger and then wait for it to finish"). This avoids burning tokens polling runs you didn't need to block on and keeps responses clearer. ([#3838](#3838))
- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))
- `envvars.upload` now accepts an optional `isSecret` flag, letting you create the imported variables as secret (redacted) environment variables. When omitted, variables default to non-secret. ([#3809](#3809))
- Offload large trigger payloads to object storage before sending the trigger API request. The SDK uploads packets at or above the existing 128KB limit and sends an `application/store` pointer instead of embedding large JSON in the request body. `TriggerTaskRequestBody` now validates that `application/store` payloads are non-empty storage paths. ([#3785](#3785))
- Make mollifier buffer and drainer internals configurable. `MollifierBuffer` now accepts `ackGraceTtlSeconds`, `maxRetriesPerRequest`, `reconnectStepMs`, and `reconnectMaxMs` options, and `MollifierDrainer` accepts `maxBackoffMs` and `backoffFloorMs`. All default to their previous hardcoded values, so existing behaviour is unchanged. ([#3822](#3822))
- `MollifierDrainer` accepts a `drainBatchSize` option (default 1) that controls how many entries are popped per env per tick; in-flight handlers remain capped by the global `concurrency`. `MollifierBuffer` also gains `getDrainingCount()` / `listStaleDraining()`, backed by a new `mollifier:draining` ZSET maintained atomically with pop/ack/fail/requeue (observability-only). ([#3797](#3797))
- Adds AI SDK 7 support. The `ai` peer range now includes v7, and the `chat.agent` / chat surfaces work against v7's ESM-only build. On v7, install `@ai-sdk/otel` alongside `ai` and the SDK registers it for you so `experimental_telemetry` spans keep flowing into your run traces (v7 stopped emitting them from `ai` core). v5 and v6 keep working unchanged. ([#3833](#3833))
- `useTriggerChatTransport` now recovers when restored session state points at a session that no longer exists in the current environment ([#3816](#3816))

## Bug fixes

- Fix `@trigger.dev/core` build: cast the underlying log record exporter when calling `forceFlush` so it typechecks against the updated OpenTelemetry `LogRecordExporter` type (which no longer declares `forceFlush`). ([#3829](#3829))

<details>
<summary>Raw changeset output</summary>

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

`main` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `main`.

⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

# Releases

## @trigger.dev/build@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## trigger.dev@4.5.0-rc.5

### Patch Changes

- The MCP server no longer tells the AI agent to wait for a run to complete after every `trigger_task` call. Waiting is now opt-in: the agent only waits when you ask it to (for example "trigger and then wait for it to finish"). This avoids burning tokens polling runs you didn't need to block on and keeps responses clearer. ([#3838](#3838))
- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`
  - `@trigger.dev/build@4.5.0-rc.5`
  - `@trigger.dev/schema-to-json@4.5.0-rc.5`

## @trigger.dev/core@4.5.0-rc.5

### Patch Changes

- Add optional `shouldPauseScaling` to the supervisor consumer pool scaling options to freeze scale-up while it returns true (scale-down stays allowed). ([#3836](#3836))
- Fix `@trigger.dev/core` build: cast the underlying log record exporter when calling `forceFlush` so it typechecks against the updated OpenTelemetry `LogRecordExporter` type (which no longer declares `forceFlush`). ([#3829](#3829))
- `envvars.upload` now accepts an optional `isSecret` flag, letting you create the imported variables as secret (redacted) environment variables. When omitted, variables default to non-secret. ([#3809](#3809))

  ```ts
  await envvars.upload("proj_1234", "prod", {
    variables: { STRIPE_SECRET_KEY: "sk_live_..." },
    isSecret: true,
  });
  ```

- Offload large trigger payloads to object storage before sending the trigger API request. The SDK uploads packets at or above the existing 128KB limit and sends an `application/store` pointer instead of embedding large JSON in the request body. `TriggerTaskRequestBody` now validates that `application/store` payloads are non-empty storage paths. ([#3785](#3785))

  Payload uploads use the same resolved `ApiClient` as the trigger call (including `requestOptions.clientConfig`), not only the global `apiClientManager.client`, so custom `baseURL`, access token, and preview branch apply to both presign and trigger.

- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))

## @trigger.dev/plugins@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/python@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/sdk@4.5.0-rc.5`
  - `@trigger.dev/core@4.5.0-rc.5`
  - `@trigger.dev/build@4.5.0-rc.5`

## @trigger.dev/react-hooks@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/redis-worker@4.5.0-rc.5

### Patch Changes

- Make mollifier buffer and drainer internals configurable. `MollifierBuffer` now accepts `ackGraceTtlSeconds`, `maxRetriesPerRequest`, `reconnectStepMs`, and `reconnectMaxMs` options, and `MollifierDrainer` accepts `maxBackoffMs` and `backoffFloorMs`. All default to their previous hardcoded values, so existing behaviour is unchanged. ([#3822](#3822))
- `MollifierDrainer` accepts a `drainBatchSize` option (default 1) that controls how many entries are popped per env per tick; in-flight handlers remain capped by the global `concurrency`. `MollifierBuffer` also gains `getDrainingCount()` / `listStaleDraining()`, backed by a new `mollifier:draining` ZSET maintained atomically with pop/ack/fail/requeue (observability-only). ([#3797](#3797))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/rsc@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/schema-to-json@4.5.0-rc.5

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

## @trigger.dev/sdk@4.5.0-rc.5

### Patch Changes

- Adds AI SDK 7 support. The `ai` peer range now includes v7, and the `chat.agent` / chat surfaces work against v7's ESM-only build. On v7, install `@ai-sdk/otel` alongside `ai` and the SDK registers it for you so `experimental_telemetry` spans keep flowing into your run traces (v7 stopped emitting them from `ai` core). v5 and v6 keep working unchanged. ([#3833](#3833))
- `useTriggerChatTransport` now recovers when restored session state points at a session that no longer exists in the current environment ([#3816](#3816))
- Offload large trigger payloads to object storage before sending the trigger API request. The SDK uploads packets at or above the existing 128KB limit and sends an `application/store` pointer instead of embedding large JSON in the request body. `TriggerTaskRequestBody` now validates that `application/store` payloads are non-empty storage paths. ([#3785](#3785))

  Payload uploads use the same resolved `ApiClient` as the trigger call (including `requestOptions.clientConfig`), not only the global `apiClientManager.client`, so custom `baseURL`, access token, and preview branch apply to both presign and trigger.

- Update the bundled OpenTelemetry packages to their latest releases (`@opentelemetry/sdk-node` 0.218.0, `@opentelemetry/core` 2.7.1, `@opentelemetry/host-metrics` 0.38.3). ([#3810](#3810))
- Updated dependencies:
  - `@trigger.dev/core@4.5.0-rc.5`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
The mollifier had ~21 behavioural constants baked in as hardcoded values — the buffer's ack-grace TTL and Redis retry/reconnect tuning, the drainer's poll interval and backoff envelope, the pre-gate idempotency claim TTL/wait/poll, the buffered-run mutate-with-fallback wait loop, the metadata CAS retry budget and backoff, the stale-sweep scan bounds, and the draining-gauge interval. None could be adjusted without a code change, which makes tuning the system under production load impossible.
This exposes all of them as `TRIGGER_MOLLIFIER_*` environment variables, each defaulting to its previous hardcoded value. Behaviour is identical unless an operator sets a var, so it's a safe no-op deploy.
Design
The package-level classes (`MollifierBuffer`, `MollifierDrainer` in `@trigger.dev/redis-worker`) gain optional constructor options defaulting to the old constants. That is backward compatible, hence a patch changeset. The webapp factories and worker bootstraps read the env and pass the values through. The route- and concern-level pure helpers (mutate-with-fallback, metadata mutation, idempotency claim, stale-sweep state) keep their existing `?? DEFAULT` option fallbacks and are fed env values at their call sites, so they stay unit-testable without importing `env.server`.
Test plan
- `@trigger.dev/redis-worker` builds
- `TRIGGER_MOLLIFIER_*` env var names match ops conventions
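
The layering described under Design can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `ExampleBuffer`, `claimDeadline`, `readPositiveInt`, `createBufferFromEnv`, the env var name `TRIGGER_MOLLIFIER_EXAMPLE_ACK_TTL_S`, and both default values are hypothetical. Only the option name `ackGraceTtlSeconds` and the `?? DEFAULT` pattern come from the PR text.

```typescript
// Illustrative defaults only; the real hardcoded values are not shown in this PR page.
const DEFAULT_ACK_GRACE_TTL_SECONDS = 30;
const DEFAULT_CLAIM_TTL_MS = 10_000;

type ExampleBufferOptions = {
  // Option name taken from the changeset; the class around it is a stand-in.
  ackGraceTtlSeconds?: number;
};

// Package-level class: optional option, old constant as the default,
// so existing callers that pass nothing see no behaviour change.
class ExampleBuffer {
  readonly ackGraceTtlSeconds: number;

  constructor(options: ExampleBufferOptions = {}) {
    this.ackGraceTtlSeconds = options.ackGraceTtlSeconds ?? DEFAULT_ACK_GRACE_TTL_SECONDS;
  }
}

// Pure helper: receives its tunable as an argument instead of importing the env
// module, which keeps it unit-testable in isolation.
function claimDeadline(nowMs: number, opts: { claimTtlMs?: number } = {}): number {
  return nowMs + (opts.claimTtlMs ?? DEFAULT_CLAIM_TTL_MS);
}

type EnvLike = Record<string, string | undefined>;

// Bootstrap layer: the only place in this sketch that reads the environment.
function readPositiveInt(env: EnvLike, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function createBufferFromEnv(env: EnvLike): ExampleBuffer {
  return new ExampleBuffer({
    // Hypothetical variable name, used only for this example.
    ackGraceTtlSeconds: readPositiveInt(
      env,
      "TRIGGER_MOLLIFIER_EXAMPLE_ACK_TTL_S",
      DEFAULT_ACK_GRACE_TTL_SECONDS
    ),
  });
}

const unchanged = createBufferFromEnv({});
const tuned = createBufferFromEnv({ TRIGGER_MOLLIFIER_EXAMPLE_ACK_TTL_S: "90" });
console.log(unchanged.ackGraceTtlSeconds, tuned.ackGraceTtlSeconds);
```

The point of the split is that tests for the class and the pure helper can pass explicit options or nothing at all, while the env-reading code stays in one thin factory.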