(WIP) proof of concept: feat(server): heap-safe instance cache + opt-in blueprint pooling (one instance per schema-shape) #1325

Open
yyyyaaa wants to merge 6 commits into main from feat/scale-phase0

yyyyaaa commented Jul 2, 2026

Contributor

Summary

Makes the multi-tenant PostGraphile v5 server survive and scale on small heaps. Two layers:

  1. Availability hardening (always on): heap-aware instance cache with safe eviction — the server degrades gracefully under tenant churn instead of OOMing.
  2. Blueprint pooling (opt-in, GRAPHILE_BLUEPRINT_POOLING=1): one shared PostGraphile instance per schema-shape, routed per request via search_path — collapsing N same-shape tenants from N×~0.5GB to one instance.

Also fixes three latent multi-tenant bugs found during the audit (one live RLS bypass).

Problem

Each query-serving PostGraphile instance retains ~0.5GB of heap (measured; ~51% strings, ~30% plan closures). The cache was keyed per svc_key (per tenant×API) with max: 50, TTL 1 year — a steady state of ~24GB for a fleet that runs in 2GB containers. Memory grew linearly with tenants and the process OOM'd long before "thousands of tenants."

What's included

1. Cache + eviction hardening (graphile-cache, graphql/server)

  • Heap-aware default cap: clamp(⌊heap×0.5 / 512MB⌋, 3, 50) instead of a fixed 50 (GRAPHILE_CACHE_MAX, GRAPHILE_CACHE_INSTANCE_HEAP_BYTES override/tune); resolved cap logged at startup.
  • Entry-identity disposal guard (fixes a same-key disposal race), pool-coupling by entry.dbname (the old substring match never fired).
  • Eviction drain: in-flight requests are refcounted (invokeEntryHandler); disposeEntry waits for them (bounded by GRAPHILE_CACHE_DRAIN_TIMEOUT_MS, default 30s) before pgl.release() — eviction can no longer tear a schema down mid-request.
  • Build admission control: cross-key builds serialize through a semaphore (GRAPHILE_BUILD_CONCURRENCY, default 1) and evict the LRU instance before building, so the build's transient peak lands on freed headroom.
  • Prod idle TTL 1 year → 6h; GraphiQL (ruru) off outside development (GRAPHILE_GRAPHIQL=true to force).
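
The heap-aware cap above can be sketched as follows. This is a minimal illustration of the stated formula only: `heapAwareCacheMax` is a hypothetical name (the PR's `getCacheConfig` is not shown here), the environment overrides are not modeled, and the heap limit is assumed to come from Node's `v8.getHeapStatistics().heap_size_limit`.

```typescript
import { getHeapStatistics } from "node:v8";

const MB = 1024 * 1024;

// Hypothetical helper mirroring clamp(floor(heap * 0.5 / perInstance), 3, 50).
// The real getCacheConfig may differ; env overrides are intentionally left out.
export function heapAwareCacheMax(
  heapLimitBytes: number = getHeapStatistics().heap_size_limit,
  perInstanceBytes: number = 512 * MB,
): number {
  const budget = heapLimitBytes * 0.5; // spend at most half the heap on instances
  const fit = Math.floor(budget / perInstanceBytes); // whole instances that fit
  return Math.min(50, Math.max(3, fit)); // clamp to [3, 50]
}
```

Under these assumptions a 2GB heap fits only two 512MB instances in its budget, so the lower clamp of 3 applies; the cap only starts to grow once the heap limit reaches about 4GB.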

2. Multi-tenant bug fixes

  • graphile-i18n: live RLS bypass — the localeStrings query ran withPgClient(null, …) (no role, no claims). Now threads the request's pgSettings. Also resets the type cache per build (module-singleton leak).
  • meta-schema: cachedTablesMeta was a module global — concurrent builds could serve each other's _meta. Now keyed per build via WeakMap (build objects are frozen; the flat global remains only for single-build codegen consumers, documented).
  • graphile-llm agent-discovery: config cache was keyed by dbname with a LIMIT 1, no tenant filter — cross-tenant bleed in shared-DB topologies. Now filtered and keyed by database_id.
  • graphile-presigned-url: storage-module resolution matched build-time physical schema names; now matches logical names (hash prefix stripped) so it works under pooling and across re-hashed schemas.

3. Blueprint pooling (opt-in)

  • Key: bp:sha256({sorted logical schemas, shape fingerprint, database settings flags, api name, mode, dbname}). The shape fingerprint hashes the catalog's [logical schema, relname] pairs, so tenants that drifted (e.g. a half-provisioned tenant) automatically get their own instance. dbname is included so same-shape tenants in different physical databases never share a pool.
  • Mechanism: shared instances build with the stock gather: { pgIdentifiers: 'unqualified' } (search_path-relative SQL — GraphQL SDL is byte-identical to qualified builds, sha256-verified) plus schema: { constructiveUnqualified: true } so Constructive plugins (search chunk refs + BM25 index name, llm RAG chunk query, i18n localeStrings) emit search_path-relative SQL for tenant data. Control-plane (metaschema_*, services_public) references stay fully qualified by design.
  • Routing: grafast.context reads roles from req.api (de-closured in all modes) and, on pooled instances only, sets pgSettings.search_path = requesting tenant's physical schemas + public last (shared domains/extensions — SECURITY DEFINER functions like sign_in depend on it).
  • Safety fallbacks to today's per-tenant instances: realtime-enabled APIs; empty schema lists; unqualified relation-name collisions within the schema set (e.g. the intentional identity_providers table/view shadow — detected by catalog probe, logged, per-tenant instance used); failed probes (not memoized — re-probed next request).
  • Invalidation (v1): any tenant schema:update flushes all pooled instances + cached decisions (rebuilds are ~1–2s for tenant APIs); the manual /flush route does the same.
  • Flag off is verified behavior-identical (existing middleware suites unchanged; plugin emissions byte-identical).
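
As a rough illustration of the key described above, the sketch below hashes a canonical form of the listed inputs. The field names, the JSON canonicalisation, and the function name `blueprintKey` are assumptions made for the example; only the set of inputs and the `bp:` prefix come from the description.

```typescript
import { createHash } from "node:crypto";

// Field names are illustrative; only the list of inputs comes from the PR text.
interface BlueprintInputs {
  logicalSchemas: string[];
  shapeFingerprint: string;
  settingsFlags: Record<string, boolean>;
  apiName: string;
  mode: string;
  dbname: string;
}

export function blueprintKey(input: BlueprintInputs): string {
  // Canonicalise: sort schemas and flag names so equivalent inputs hash equally.
  const flags = Object.entries(input.settingsFlags).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );
  const canonical = JSON.stringify([
    [...input.logicalSchemas].sort(),
    input.shapeFingerprint,
    flags,
    input.apiName,
    input.mode,
    input.dbname, // same shape in a different physical database must not share
  ]);
  return "bp:" + createHash("sha256").update(canonical).digest("hex");
}
```

Two tenants then land on the same pooled instance only when every input matches, which is what lets a drifted fingerprint or a different `dbname` split a tenant off without extra routing logic.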

Verification (isolated rig: full constructive-db schema + 8 seeded marketplace tenants as hashed schemas, server at 2GB heap)

  • Gate: SDL qualified vs unqualified — byte-identical (same sha256). Zero-bleed spike at the engine level — one unqualified instance served other tenants purely via search_path, canaries clean.
  • Flag OFF regression: 40/40 requests, one instance per host, no pooling lines; separately 360/360 requests across two eviction soaks with 251+ pre-build evictions, 0 drain timeouts, 0 5xx.
  • Flag ON: 7 same-shape tenants → one shared instance (6 builds → 1 for the same hosts; 56 requests served by one build); the half-provisioned tenant automatically split to its own instance by fingerprint. Zero-bleed via HTTP-authenticated canary queries through the shared instance (10 interleaved rounds; cross-token control correctly unauthenticated). RSS for the same traffic: 1475MB (per-tenant) → 711MB (pooled). Collision fallback observed live (warn + per-tenant instance). Pooled login verified end-to-end (signIn → bearer → tenant data).
  • Adversarial review: 3 dimensions (correctness / tenant isolation / perf-ops), 19 findings adjudicated by refute-by-default verifiers → 9 confirmed → 6 fixed in this PR (search_path public drop, dbname in key, transient-probe memoization, /flush bp: gap, double catalog scan, _meta physical-name leak), 3 documented below.

Env vars

| Var | Default | Purpose |
|---|---|---|
| GRAPHILE_BLUEPRINT_POOLING | off | Opt-in instance sharing per schema-shape |
| GRAPHILE_CACHE_MAX | heap-aware | Max cached instances |
| GRAPHILE_CACHE_INSTANCE_HEAP_BYTES | 512MB | Per-instance estimate for the heap-aware cap |
| GRAPHILE_CACHE_TTL_MS | 6h prod / 5m dev | Idle TTL |
| GRAPHILE_CACHE_DRAIN_TIMEOUT_MS | 30s | Max wait for in-flight requests before release |
| GRAPHILE_BUILD_CONCURRENCY | 1 | Concurrent schema builds |
| GRAPHILE_GRAPHIQL | off in prod | Force-enable GraphiQL |

Known limitations / follow-ups

  • Fingerprint granularity: relation names only (not columns/functions/enum labels). Same-relname column drift between pooled tenants is not detected by the key; exposure is bounded by flush-all-on-schema:update. Follow-up: extend the fingerprint to attributes/procs.
  • v1 flush semantics are coarse (any tenant migration flushes all pooled instances). Cheap rebuilds make this acceptable; follow-up: blueprint-membership tracking for targeted eviction.
  • Build semaphore has no wall-clock timeout — a pathologically slow build delays queued builds (requests on cached instances are unaffected). Follow-up: per-build timeout + 503.
  • Headroom accounting counts cached instances only, not still-draining ones (bounded by the 30s drain timeout).
  • Realtime APIs are excluded from pooling (per-instance LISTEN topology); pooling them needs the channel scheme rework.
  • The /flush route's missing auth is pre-existing (TODO in code).
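
To make the build-semaphore limitation concrete, here is a minimal counting semaphore of the kind described. It is a sketch written for this discussion, not the PR's `BuildSemaphore`; like the limitation above, it has no wall-clock timeout, so a slow build holds its slot until it settles.

```typescript
// Illustrative only: a FIFO counting semaphore with no timeout.
export class BuildSemaphore {
  private active = 0;
  private readonly limit: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit = 1) {
    this.limit = limit;
  }

  async run<T>(build: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Queue until a finishing build hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await build();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.active--; // nobody waiting: free the slot
    }
  }
}
```

A per-build timeout, as proposed in the follow-up, would wrap `build()` in a race against a timer and still release the slot in `finally`.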

Rollout

  1. Ship with flag off (behavior-identical; hardening + bug fixes active).
  2. Staging: GRAPHILE_BLUEPRINT_POOLING=1, watch [pooling] logs (attach vs build), instance counts, RSS.
  3. Production canary on public API pods; auth/admin endpoints pool automatically where collision-free.

🤖 Generated with Claude Code

https://claude.ai/code/session_0122xqM2VkNbuAmZshK1YNSb

yyyyaaa and others added 6 commits July 2, 2026 12:43
…of 50

Root cause of the schema-builder (public cnc server) heap OOM: each
PostGraphile v5 instance that has served a GraphQL request retains ~0.5 GB
of heap (fully-materialised schema + grafast plan machinery; a build-only
instance is far smaller). graphileCache capped entries at a fixed 50, so the
steady-state resident set was ~50 x 0.5 GB ~= 24 GB -- far beyond the heap --
and the process OOM'd as distinct app hosts filled the cache over days.
Eviction was empirically confirmed to free instances correctly; the count cap
was simply far too large for the per-instance footprint.

- getCacheConfig: heap-aware default for GRAPHILE_CACHE_MAX -- budget ~50% of
  the V8 heap limit at ~0.5 GB/instance, clamped to [3, 50], instead of a fixed
  50. Override with GRAPHILE_CACHE_MAX; tune the per-instance estimate with
  GRAPHILE_CACHE_INSTANCE_HEAP_BYTES. The resolved cap is logged at startup.
- disposeEntry: guard double-disposal by ENTRY IDENTITY (WeakSet) instead of by
  cache key. The key-scoped guard skipped pgl.release() for a rebuilt entry that
  shared a key with an entry still mid-release (same-key disposal race), proven
  via a repro harness (1/12 -> 12/12 disposals run). Also close the http.Server
  unconditionally -- it is never .listen()ed, so the old `.listening` guard was
  dead -- and drop the now-needless key bookkeeping.
- pgCache cleanup callback: match entries by entry.dbname === poolKey (pools are
  keyed by database name) instead of `cacheKey.includes(poolKey)`; cacheKey is
  the request host and never contained the db name, so that safety valve was
  dead. dbname is threaded onto the entry via createGraphileInstance.
- Add regression tests for the disposal guard and the heap-aware cap.
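
The entry-identity guard can be illustrated with a small sketch. The `CacheEntry` shape and `disposeOnce` are hypothetical stand-ins (the PR's `disposeEntry` is not reproduced here); the point is only that a `WeakSet` keyed on the entry object, rather than a set of cache keys, lets a rebuilt entry with the same key still be released.

```typescript
// Hypothetical entry shape; only `release` matters for this sketch.
interface CacheEntry {
  key: string;
  release(): Promise<void>;
}

// Tracks the exact objects already disposed; entries stay garbage-collectable.
const disposed = new WeakSet<CacheEntry>();

export async function disposeOnce(entry: CacheEntry): Promise<boolean> {
  if (disposed.has(entry)) return false; // this exact object was already handled
  disposed.add(entry);
  await entry.release();
  return true;
}
```

With a key-scoped guard, the second of two entries sharing a key would be skipped while the first is still mid-release; with the identity guard both are released exactly once.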

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01H3mDDgX8z6dE7kyaMERhin
… GraphiQL gating

- disposeEntry drains in-flight requests (refcounted via invokeEntryHandler,
  bounded by GRAPHILE_CACHE_DRAIN_TIMEOUT_MS, default 30s) before pgl.release(),
  so eviction can no longer tear down a schema mid-request.
- All handler invocations in the graphile middleware go through
  invokeEntryHandler; disposing entries are treated as cache misses.
- Global BuildSemaphore (GRAPHILE_BUILD_CONCURRENCY, default 1) serializes
  cross-key schema builds; ensureCacheHeadroom evicts the LRU instance BEFORE
  each build so the build's transient peak lands on freed headroom.
- Prod idle TTL drops from 1 year to 6h (GRAPHILE_CACHE_TTL_MS still overrides).
- GraphiQL (ruru) only in development or with GRAPHILE_GRAPHIQL=true.
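
A minimal sketch of the refcounted drain described in this commit, under assumed names: `DrainableEntry`, `invoke`, and `dispose` are illustrative and stand in for the PR's `invokeEntryHandler` and `disposeEntry`, whose actual code is not shown. Handlers hold a reference while they run, disposal waits for the count to reach zero or for the timeout, and a disposing entry refuses new work.

```typescript
// Illustrative wrapper; the real cache entry and handler types will differ.
export class DrainableEntry {
  private inFlight = 0;
  private disposing = false;
  private idleWaiters: Array<() => void> = [];

  /** Runs a handler while holding a reference; refuses once disposal started. */
  async invoke<T>(handler: () => Promise<T>): Promise<T | undefined> {
    if (this.disposing) return undefined; // caller treats this as a cache miss
    this.inFlight++;
    try {
      return await handler();
    } finally {
      this.inFlight--;
      if (this.inFlight === 0) this.idleWaiters.splice(0).forEach((wake) => wake());
    }
  }

  /** Waits (bounded) for in-flight handlers, then releases. True if fully drained. */
  async dispose(release: () => Promise<void>, timeoutMs = 30_000): Promise<boolean> {
    this.disposing = true;
    let drained = true;
    if (this.inFlight > 0) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const idle = new Promise<boolean>((resolve) =>
        this.idleWaiters.push(() => resolve(true)),
      );
      const timeout = new Promise<boolean>((resolve) => {
        timer = setTimeout(() => resolve(false), timeoutMs);
      });
      drained = await Promise.race([idle, timeout]);
      clearTimeout(timer);
    }
    await release(); // release runs either way; the timeout only bounds the wait
    return drained;
  }
}
```

Returning `undefined` from `invoke` is one way to model "disposing entries are treated as cache misses": the caller falls through to its normal build-or-fetch path.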

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0122xqM2VkNbuAmZshK1YNSb
…LS), reset type cache per build

withPgClient(null, ...) executed the localeStrings query with no role and no
jwt.claims — bypassing RLS on every translation read. Thread pgSettings from
the grafast context into the runtime query (same pattern as graphile-llm's
rag-plugin). Also reset localeTypeCache in init alongside i18nRegistry so the
module-singleton I18nPlugin export cannot leak GraphQLObjectTypes across
schema rebuilds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0122xqM2VkNbuAmZshK1YNSb
…not module global

Concurrent PostGraphile builds in one process interleave init/fields hooks, so
the module-global cachedTablesMeta could bake build B's tables into build A's
_meta resolver. Store per-build via WeakMap (build objects are frozen by
graphile-build, so no own-property). Flat global retained solely for
single-build codegen consumers (graphile-schema buildIntrospectionJSON,
codegen DatabaseSchemaSource), documented as such.
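
The per-build storage can be sketched as below. The types and function names are hypothetical; the sketch only shows why a `WeakMap` suits frozen build objects: it associates data with the object from the outside, so no own-property needs to be written.

```typescript
// Hypothetical types; the real build object comes from graphile-build.
type Build = object;
interface TableMeta {
  name: string;
}

// Keyed by build identity, so concurrent builds cannot see each other's data.
const tablesMetaByBuild = new WeakMap<Build, TableMeta[]>();

export function setTablesMeta(build: Build, meta: TableMeta[]): void {
  tablesMetaByBuild.set(build, meta); // works even when `build` is frozen
}

export function getTablesMeta(build: Build): TableMeta[] {
  return tablesMetaByBuild.get(build) ?? [];
}
```

Entries disappear with their build object, so schema rebuilds do not accumulate stale metadata.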

Also export invokeEntryHandler/ensureCacheHeadroom from graphile-cache barrel.

Verified: graphile-settings 158/158 tests, 3 snapshots identical, against live
PG on :5433.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0122xqM2VkNbuAmZshK1YNSb
… per schema-shape

GRAPHILE_BLUEPRINT_POOLING=1 keys the instance cache by a blueprint hash
(sorted logical schema names + shape fingerprint over the catalog's
[schema,table] pairs + database settings flags) instead of per-tenant svc_key,
builds shared instances with stock gather.pgIdentifiers='unqualified', and
routes each request via pgSettings search_path (requesting tenant's physical
schemas, double-quoted). Safety fallbacks to today's per-tenant instances:
realtime-enabled APIs, empty schema lists, unqualified relation-name
collisions within the schema set (e.g. identity_providers table/view shadow),
or failed catalog probes. Decisions memoized per svc_key; schema:update
flushes all pooled instances + decisions (v1 semantics).
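
A sketch of how one set of catalog rows could feed both the shape fingerprint and the collision check. The row shape, the function name `analyseShape`, and the hashing of sorted JSON pairs are assumptions for illustration; the catalog query itself is not shown, and `logicalSchema` is assumed to already have the tenant hash prefix stripped.

```typescript
import { createHash } from "node:crypto";

// One row per relation, as a catalog scan over the tenant's schemas might return.
interface RelationRow {
  logicalSchema: string;
  relname: string;
}

export function analyseShape(rows: RelationRow[]): {
  fingerprint: string;
  collisions: string[];
} {
  // Fingerprint: hash of the sorted [logical schema, relname] pairs.
  const pairs = rows.map((r) => JSON.stringify([r.logicalSchema, r.relname])).sort();
  const fingerprint = createHash("sha256").update(pairs.join("\n")).digest("hex");

  // A relname present in more than one schema is ambiguous once SQL is
  // unqualified, because search_path order would silently pick a winner.
  const counts = new Map<string, number>();
  for (const r of rows) counts.set(r.relname, (counts.get(r.relname) ?? 0) + 1);
  const collisions = [...counts]
    .filter(([, n]) => n > 1)
    .map(([name]) => name)
    .sort();

  return { fingerprint, collisions };
}
```

A non-empty `collisions` list would correspond to the per-tenant fallback, and a differing `fingerprint` to a separate pooled instance.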

Plugins honor schema.constructiveUnqualified for tenant-data SQL (search
chunk refs + BM25 index name, llm RAG chunk query, i18n localeStrings) while
control-plane metaschema references stay fully qualified. presigned-url
resolves storage modules by logical schema name; llm agent-discovery is
tenant-filtered and keyed by database_id (fixes a LIMIT 1 cross-tenant bleed).

grafast.context now reads role/anonRole from req.api in all modes (de-closured).
Flag off = behavior-identical (verified: 61 pre-existing middleware tests
unchanged; plugin suites byte-identical emissions).

Gate evidence: SDL qualified-vs-unqualified byte-identical (sha256 match);
zero-bleed proven on live hashed-schema tenants via per-request search_path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0122xqM2VkNbuAmZshK1YNSb
…ixes for pooling

- tenantSearchPath: keep shared 'public' LAST on the pooled search_path.
  Replacing the path with only tenant schemas broke SECURITY DEFINER auth
  functions without their own SET search_path (sign_in's ::email cast →
  'type email does not exist', HTTP 500 on every pooled login). Verified
  fixed live: pooled signIn returns a token; authenticated data reads flow
  through the shared instance.
- computeBlueprintKey now includes dbname: same-shape tenants in DIFFERENT
  physical databases must never share an instance (its pool targets one DB).
- Transient catalog-probe failures are no longer memoized as permanent
  per-tenant fallbacks — next request re-probes.
- Manual /flush route now also clears bp: entries + pooling decisions.
- Single catalog scan feeds both shape fingerprint and collision check
  (was two identical pg_class scans per decision).
- _meta reports LOGICAL schema names on pooled instances (stops leaking the
  representative tenant's hashed schema identifier to other tenants).
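
The search_path rule from the first bullet can be sketched like this. The signature is hypothetical (the PR's `tenantSearchPath` is not shown), and the schema names in the usage note are made up; the sketch double-quotes each tenant schema, doubles embedded quotes, and always appends `public` last.

```typescript
// Double-quote an identifier, doubling any embedded quotes (standard SQL escaping).
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Hypothetical signature: tenant schemas first, shared "public" always last.
export function tenantSearchPath(physicalSchemas: string[]): string {
  const tenant = physicalSchemas.filter((s) => s !== "public").map(quoteIdent);
  return [...tenant, quoteIdent("public")].join(", ");
}
```

For example, `tenantSearchPath(["t_ab12_app", "t_ab12_auth"])` yields `"t_ab12_app", "t_ab12_auth", "public"`, a value that could be assigned to `pgSettings.search_path` for the request.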

W3 rig evidence: 7 same-shape tenants → 1 shared instance (6 builds → 1),
tenant2 auto-split by shape fingerprint, zero-bleed via HTTP-authenticated
canaries (10 interleaved rounds + cross-token control), RSS 1475MB → 711MB,
collision fallback fires with warn + per-tenant instance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0122xqM2VkNbuAmZshK1YNSb
yyyyaaa changed the title from "feat(server): heap-safe instance cache + opt-in blueprint pooling (one instance per schema-shape)" to "(WIP) proof of concept: feat(server): heap-safe instance cache + opt-in blueprint pooling (one instance per schema-shape)" on Jul 2, 2026
: `"${schemaName}"."${baseTable}"`;
const translationTableRef = constructiveUnqualified
? `"${translationTable}"`
: `"${schemaName}"."${translationTable}"`;

Contributor


let's aim to use this where we can instead of just quoting stuff w/o checking:

https://www.npmjs.com/package/@pgsql/quotes

QuoteUtils.quoteQualifiedIdentifier('public', 'my_table');

acm.task_table_name
FROM metaschema_modules_public.agent_chat_module acm
JOIN metaschema_public.schema s ON s.id = acm.schema_id
WHERE s.database_id = $1

Contributor


this is great. Not sure whether we can split it into a few smaller PRs, but that would be great :)

Contributor


if not, I also get it, we can take a look together then.

// (schema.constructiveUnqualified), emit search_path-relative references for
// tenant-data tables/indexes so the per-request search_path resolves the
// tenant schema. Default (flag absent): fully schema-qualified, byte-identical.
const constructiveUnqualified = !!((build as any)?.options?.constructiveUnqualified);

Contributor


wait, this means using unqualified schemas?

2 participants