SynaptExtraction is the intermediate language (IL) for synapt's product stack. It is the universal exchange format between text extraction and intelligence operations.
Any text + Any LLM -> SynaptExtraction (IL) -> @synapt/memory (intelligence)
This repo contains the v1 schema, types, validators, finalization pipeline, and composable prompt system in both TypeScript and Python.
| Package | Registry | Install |
|---|---|---|
@synapt-dev/extract |
npm | npm install @synapt-dev/extract@0.5.0 |
synapt-extract |
PyPI | pip install synapt-extract==0.5.0 |
Deno:
import { buildExtractionPrompt } from "npm:@synapt-dev/extract@0.5.0";Version pinning: Always pin to an exact version (@0.5.0, ==0.5.0). Do not use ranges (^0.5.0, ~0.5.0, >=0.5.0). The IL schema evolves across minor versions (v1.1 added produced_by object form, v1.2 added 8 new fields). Pinning prevents unexpected schema changes from affecting your extraction pipeline.
import {
buildExtractionPrompt,
finalizeExtraction,
validateExtraction,
} from "@synapt-dev/extract";
// 1. Build a prompt for your LLM
const prompt = buildExtractionPrompt(text, {
profile: "standard",
categories: ["Health", "Family"],
});
// 2. Send to any LLM, parse JSON response
const llmOutput = JSON.parse(await llm.complete(prompt));
// 3. Finalize: inject client context, normalize, validate
const result = finalizeExtraction(llmOutput, {
produced_by: "openai://gpt-4o-mini",
user_id: userId,
kind: "conversa/prayer",
});
console.log(result.extraction); // Complete SynaptExtraction
console.log(result.validation); // { valid: true, errors: [] }from synapt_extract import (
build_extraction_prompt,
finalize_extraction,
FinalizeContext,
)
# 1. Build a prompt
prompt = build_extraction_prompt(text, profile="standard")
# 2. Send to any LLM, parse JSON response
llm_output = json.loads(llm.complete(prompt))
# 3. Finalize
result = finalize_extraction(llm_output, FinalizeContext(
produced_by="openai://gpt-4o-mini",
user_id=user_id,
kind="conversa/prayer",
))
assert result.validation.validSynaptExtraction documents are assembled in three stages:
- Stage 1 (LLM): The LLM extracts content fields (entities, goals, themes, etc.) from text
- Stage 2 (Client): Your application injects context the LLM can't know (produced_by, user_id, embeddings, extensions)
- Stage 3 (Library):
finalizeExtraction()normalizes the document (version injection, capability detection, sub-schema versioning, validation)
| Profile | Model class | Capabilities (25 total) |
|---|---|---|
minimal |
3B-7B local | entities, entity_state, goals, themes, summary |
standard |
GPT-4o-mini, Haiku | + entity_context, goal_timing, facts, temporal_refs, sentiment, evidence_anchoring |
full |
GPT-4o, Sonnet, Opus | + entity_ids, goal_entity_refs, keywords, structured_sentiment, questions, actions, decisions, relations, relation_origin, assertion_signals, temporal_classes, language, source_metadata, confidence |
Use the builder when the model API supports structured output. It resolves capabilities once, then builds the matching prompt, Stage 1 JSON schema, OpenAI response format, finalized packet schema, and optional finalization context.
import { createExtractionBuilder } from "@synapt-dev/extract";
const builder = createExtractionBuilder(text, { profile: "standard" })
.addCapabilities(["entity_ids", "goal_entity_refs"])
.withExtractedAt("2026-05-11T18:00:00Z")
.withProducedBy({
model: "openai://gpt-5.5",
model_version: "gpt-5.5-2026-04-23",
configuration: { reasoning_effort: "medium" },
operator: "synapt-dev",
})
.withSource({ source_id: "note-1", source_type: "note" });
const built = builder.build({ name: "synapt_extract_stage1" });
// Send built.prompt and built.responseFormat to the model.
// Then call builder.finalize(stage1Json) or finalizeExtraction(stage1Json, built.finalizeContext).from synapt_extract import create_extraction_builder
builder = (
create_extraction_builder(text, profile="standard")
.add_capabilities(["entity_ids", "goal_entity_refs"])
.with_extracted_at("2026-05-11T18:00:00Z")
.with_produced_by({
"model": "openai://gpt-5.5",
"model_version": "gpt-5.5-2026-04-23",
"configuration": {"reasoning_effort": "medium"},
"operator": "synapt-dev",
})
.with_source(source_id="note-1", source_type="note")
)
built = builder.build(name="synapt_extract_stage1")
# Send built["prompt"] and built["response_format"] to the model.
# Then call builder.finalize(stage1_json) or finalize_extraction(stage1_json, builder.finalize_context()).buildExtractionSchema() / build_extraction_schema() return the semantic Stage 1 schema. buildExtractionResponseFormat() / build_extraction_response_format() return an OpenAI-compatible json_schema response format; strict mode requires every object property as OpenAI expects and represents semantic optional fields as nullable. buildFinalizedExtractionSchema() / build_finalized_extraction_schema() return the finalized packet shape, including produced_by, source context, capabilities, embeddings, and extensions.
Use extract() when you want the library to execute the full pipeline while your application owns the model and embedding API calls. The callbacks receive typed JSON requests, so callers can route to OpenAI, another provider, a local model, or a test fixture. Both packages export callback contracts such as LlmCallback, LlmRequest, LlmResponse, EmbeddingCallback, EmbeddingRequest, and EmbeddingResponse.
For OpenAI-compatible clients, use the thin adapter instead of writing callbacks by hand. The library still does not own credentials or provider setup; pass a caller-owned client and optional artifact directory.
import OpenAI from "openai";
import { createExtractionBuilder } from "@synapt-dev/extract";
import { extractOpenAI } from "@synapt-dev/extract/openai";
const builder = createExtractionBuilder(text)
.full({ embed: true })
.withSource({ source_id: "note-1", source_type: "note" });
const result = await extractOpenAI(text, new OpenAI(), {
...builder.extractOptions(),
model: "gpt-5.5",
reasoningEffort: "medium",
embeddingModel: "text-embedding-3-small",
artifactDirectory: "./artifacts",
});from openai import OpenAI
from synapt_extract import create_extraction_builder, extract_openai
builder = (
create_extraction_builder(text)
.full(embed=True)
.with_source(source_id="note-1", source_type="note")
)
result = await extract_openai(
text,
OpenAI(),
**builder.extract_options(),
model="gpt-5.5",
reasoning_effort="medium",
embedding_model="text-embedding-3-small",
artifact_dir="./artifacts",
)The returned result includes artifactBundle / artifact_bundle. TypeScript exports the Node artifact writer at @synapt-dev/extract/artifacts; Python exports write_artifact_bundle() from synapt_extract.
import { createExtractionBuilder, extract } from "@synapt-dev/extract";
const builder = createExtractionBuilder(text)
.full({ embed: true })
.withSource({ source_id: "note-1", source_type: "note" });
const result = await extract(text, {
callLlm: async (request) => {
// Send request.messages and request.responseFormat to your model provider.
return {
output: stage1Json,
produced_by: { model: "openai://gpt-5.5" },
};
},
getEmbedding: async (request) => {
// Embed request.text and record the provider URI for the embedding model.
return { vector, model: "openai://text-embedding-3-small" };
},
}, {
...builder.extractOptions(),
extend: ({ response, stage1, embeddings }) => ({
"synapt/response_binding": {
response_id: response.id,
response_model: response.model,
stage1_fields: Object.keys(stage1).length,
embedding_count: embeddings.length,
},
}),
});from synapt_extract import create_extraction_builder, extract
builder = (
create_extraction_builder(text)
.full(embed=True)
.with_source(source_id="note-1", source_type="note")
)
result = await extract(
text,
{
"call_llm": call_llm,
"get_embedding": get_embedding,
},
**builder.extract_options(),
extend=lambda ctx: {
"synapt/response_binding": {
"response_id": ctx["response"].get("id"),
"response_model": ctx["response"].get("model"),
"stage1_fields": len(ctx["stage1"]),
"embedding_count": len(ctx["embeddings"]),
}
},
)Profile helpers keep the public UX short: .full({ embed: true }) resolves the full capability set and standard embedding inputs, including the source text. Capability entries can still be plain strings or { name, embed: true } specs for lower-level control. embeddingInputs: "all" / embedding_inputs="all" remains available for exhaustive tests and computes embeddings for source, summary, entities, goals, themes, keywords, facts, questions, actions, decisions, temporal refs, and sentiment when those fields exist. Embeddings are opt-in; if no embedding inputs are requested and no capability has embed: true, no embedding API call is made.
The extend resolver runs after the LLM response is parsed and embeddings are computed, but before finalization. It receives a normalized response envelope (response.provider, response.id, response.status, response.model, response.stop_reason, response.usage, and response.raw), so extensions can depend on provider output without knowing the provider's raw response shape. Raw OpenAI Responses and Anthropic Messages objects are translated automatically when returned as raw; callers can pass responseTranslator for custom providers. If produced_by is omitted, the runner can derive openai://... or anthropic://... producer metadata from normalized provider/model fields. Returned extension objects are merged over static extensions and receive version: "1" during finalization.
For preflight UX, builders expose profile helpers and a plan:
const plan = createExtractionBuilder(text)
.full({ embed: true })
.minus("questions")
.embed("summary", false)
.plan();plan() reports resolved capabilities, excluded capabilities, embedding inputs, requested-but-not-embedded capabilities, required callbacks, and prompt character count. extractOptions() / extract_options() returns the same resolved run contract for extract() and the OpenAI adapters.
The canonical schema is hosted at:
https://synapt.dev/schemas/extract/v1.json
Sub-schemas: source-ref/v1.json, embedding/v1.json, assertion-signals/v1.json, temporal-ref/v1.json, producer/v1.json, entity/v1.json, goal/v1.json, question/v1.json, action/v1.json, decision/v1.json, sentiment/v1.json, source-metadata/v1.json.
v1.x schema updates are additive only. Breaking changes require v2.
- v1.0 documents remain valid under v1.2 validators
produced_byaccepts string (v1.0) or SynaptProducer object (v1.1+)sentimentaccepts string (v1.0) or SynaptSentiment object (v1.2+)- Readers MUST branch on string vs object for both fields
Releases are published with Sigstore provenance via npm OIDC trusted publishing and PyPI trusted publishing. Each GitHub Release includes a CycloneDX SBOM (sbom.cdx.json).
extract/
packages/
ts/ # @synapt-dev/extract (TypeScript, npm)
python/ # synapt-extract (Python, PyPI)
schemas/ # JSON Schema files (language-agnostic)
prompts/
capabilities.json # Capability registry: order, dependencies, profiles, embedding inputs
v1/ # 25 capability prompt fragments + preamble/postamble
profiles/ # Profile definitions (minimal, standard, full)
tests/
python/ # Python test suite
conformance/ # Cross-language conformance fixtures
docs/ # Design documents
MIT