User Tracing #223
Open
mar-cf wants to merge 7 commits into
mar-cf
commented
Jun 24, 2026
Comment on lines +126 to +128

```toml
# TEMPORARY
[patch.crates-io]
cf-rustracing = { git = "https://github.com/mar-cf/rustracing.git", branch = "user-tracing" }
```
Replace the blanket `impl From<Span> for SharedSpan` — which always registered the span in the harness's `active_roots` — with an explicit `shared_span()` constructor, and migrate the internal call sites. This makes tracked construction a deliberate choice and sets up a later untracked variant for user spans.
Adds the user-span pipeline core: an `Untracked` `user_shared_span` constructor, the `start_user_trace`/`user_span` entry points and `UserSpanScope` guard on a separate `USER_HARNESS` (with a `USER_NOOP_HARNESS` fallback), plus the `add_user_span_tags!`/`add_user_span_log_fields!`/`set_user_span_finish_callback!` macros. Includes the test harness (`user_traces()` second sink) so user spans are observed independently of the internal pipeline. `start_user_trace` is name-only here; routing and inbound W3C continuation are layered on in later changes.
Adds a `user_span` slot to `TelemetryContext` (captured by `current()`, re-established by `scope()`, cloned across forks) plus `UserSpanScope::into_context()` and a parallel carry on `SpanScope::into_context()`. This lets a user span survive `.await`/`tokio::spawn` and ride along even when propagation goes through an internal span's context — no explicit threading. Verified by `propagates_across_await` and `user_span_carried_by_internal_context`.
Adds `SpanScope::with_user_span()` to open a parallel user span off an internal span (named after it), and a `user = true` option on `#[span_fn]` that does the same for whole functions (sync and async). Both are no-ops when no user trace is active. Covered by macro snapshot tests plus parallel and no-op runtime tests.
Points `cf-rustracing` at the fork branch that adds `RoutingMetadata` as a span property, needed by the user-tracing exporter and `start_user_trace` routing. Placeholder to be replaced by a normal version bump once the rustracing change is released.
Adds the per-process user pipeline — `UserTracingSettings`, `init_user`/`USER_HARNESS`, and the OTLP-over-UDS exporter that encodes `RoutingMetadata` into the `cf-trace-config` header — wired into `telemetry::init`. `start_user_trace` now takes a required `RoutingMetadata` attached at span construction and inherited by descendants (the exporter drops routing-less spans). Verified end-to-end by producer tests that decode the exported OTLP body.
Adds the `TraceparentContext` W3C parser and wires it through: `start_user_trace` gains an optional `inbound` traceparent that stitches the user root onto the upstream trace (shared trace id, inbound parent), and `user_tracing::w3c_traceparent()` derives the header for the current user span for outbound propagation. Covered by parser unit tests plus continuation tests through the test harness and the OTLP/UDS producer path.
Overview
This stack adds a user-facing span pipeline to `foundations`, parallel to the existing internal tracing pipeline. Application code can emit spans into a separate `USER_HARNESS` that exports OTLP HTTP over a Unix domain socket (gRPC unsupported) to an OTLP endpoint, with per-trace routing metadata carried on the wire and W3C `traceparent` continuation in/out.

The design is deliberately a mirror of internal tracing onto a second harness: every user API (`user_span`, `start_user_trace`, `add_user_span_tags!`, `TelemetryContext.user_span`, …) is the `get_user()` twin of an existing `get()` API. Two hard rules shape the surface:

There are two layers to keep separate:

- `USER_HARNESS` + the exporter, pointed at the OTLP endpoint's socket.
- `start_user_trace(...)` opens a root. Without a root, every `user_span` / `with_user_span()` / `#[span_fn(user = true)]` / `add_user_span_tags!` is a no-op.

User guide
Settings (init-time)
User tracing is configured via one optional block, gated by the `user-tracing` feature and fed to the existing `foundations::telemetry::init`:

The only setting a consumer must choose is `socket_path` (the OTLP endpoint's UDS). `enabled` and `max_queue_size` mirror their internal-tracing equivalents; `num_tasks` and `max_batch_size` tune the exporter. All have defaults and can be configured as needed.

There is deliberately no sampling configuration here. Unlike internal tracing, the user pipeline is not sampled inside foundations; the inbound `user_tracing` control header drives the activation (and therefore sampling) decision upstream.

Instrumentation APIs
Toy example covering the whole surface:
Key points for users:
- `into_context()` / `TelemetryContext::current()` / `#[span_fn]` all propagate the user span across `.await`, even through an internal span's context, with no manual threading.
- `with_user_span()` and `#[span_fn(user = true)]` open a user span in parallel with an internal span. This matches the guideline to create an internal span for every user span (the `#[span_fn]` decorator does the same), so a single call feeds both pipelines.

Notes to reviewers
The stack is four stacked PRs / seven commits. PR2–4 introduce the core user-span APIs; the surrounding PRs prepare for and power them — PR1 makes them safe to build, PR5–6 export them off-box, PR7 propagates them over W3C.
PR1 (`user-tracing-1`): Make `SharedSpan` construction explicit

Internal-only pre-move: replaces the blanket `impl From<Span>` (which always produced a tracked span) with an explicit `shared_span()` constructor, so user spans can later be built untracked and never enter the internal live registry.

PR2–4 (`user-tracing-2-4`): the in-process user API

Introduces the core and the ways to drive it:

- `user_shared_span` (untracked), `start_user_trace` / `user_span`, the `UserSpanScope` guard, the `add_user_span_tags!` / `add_user_span_log_fields!` / `set_user_span_finish_callback!` macros, `get_user()` + a dedicated `USER_NOOP_HARNESS` fallback, and the `user_traces()` test sink that observes user spans independently of the internal pipeline.
- `TelemetryContext.user_span` + `UserSpanScope::into_context()`, so a user span survives `.await` / `spawn` and rides along on an internal span's context.
- `SpanScope::with_user_span()` and `#[span_fn(user = true)]`.

PR5–6 (`user-tracing-5-6`): ship spans off-box

- `cf-rustracing` `[patch.crates-io]` for the construction-time `RoutingMetadata` span field (placeholder for a later version bump).
- The per-process user pipeline (`UserTracingSettings`, `init_user` / `USER_HARNESS`, the OTLP/UDS exporter that encodes `RoutingMetadata` into the `cf-trace-config` header) wired into `telemetry::init`, plus `start_user_trace`'s required `routing` arg. Verified by producer tests that decode the exported OTLP body.

PR7 (`user-tracing-7`): W3C trace propagation

The `TraceparentContext` W3C parser, the optional `inbound` stitch on `start_user_trace` (continues the upstream trace: shared trace id, inbound parent), and `user_tracing::w3c_traceparent()` for outbound.

Alternatives considered
If constrained not to include user tracing specifics in this codebase, we can open up some seams to plug in similar functionality.
Opaque routing metadata. Rather than a typed `RoutingMetadata`, cf-rustracing/foundations could carry routing as a type-erased value they never interpret, leaving the only `downcast` to the exporter:

This keeps Cloudflare routing concepts (zone / account / workspace / destinations) out of cf-rustracing. We chose the typed `Option<RoutingMetadata>` field: those types become visible in cf-rustracing, in exchange for direct typed access and no `Any` downcast.

Pluggable exporter seam (`BatchHandler`). Rather than a concrete exporter inside foundations, foundations could accept an object-safe `Arc<dyn BatchHandler>` and run a shared drain loop, with the OTLP/UDS handler implemented in oxy:

This would keep the OTLP/UDS (`hyper` / `prost`) deps and wire format out of foundations. We chose a concrete output module (`output_otlp_uds`, mirroring `output_jaeger_thrift_udp` / `output_otlp_grpc`): no trait object or plugin point; future destinations are new `UserTracesOutput` enum variants.

Additional notes
- `SpanContext` / `SpanContextState` conversion is internal-only; `TraceparentContext` is the only stitch type users see.
- `cf-rustracing` `[patch.crates-io]` in PR5 is intentionally temporary; the matching change is a separate `cf-rustracing` PR and will become a normal version bump once released.
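
For readers unfamiliar with the W3C `traceparent` header that the stitch type wraps, here is a minimal, std-only sketch of parsing a version-00 header and deriving an outbound one. It is illustrative only: the `Traceparent` struct, its fields, and the `parse` / `to_header` / `is_sampled` methods are hypothetical names chosen for this example and are not the PR's `TraceparentContext` API, which may validate or represent the header differently.

```rust
/// Hypothetical stand-in for illustration; NOT the PR's `TraceparentContext`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Traceparent {
    pub trace_id: u128,
    pub parent_id: u64,
    pub flags: u8,
}

impl Traceparent {
    /// Parses a version-00 header: `00-<32 hex>-<16 hex>-<2 hex>`.
    /// Returns `None` for anything malformed.
    pub fn parse(header: &str) -> Option<Self> {
        let mut parts = header.trim().split('-');
        let version = parts.next()?;
        let trace_id = parts.next()?;
        let parent_id = parts.next()?;
        let flags = parts.next()?;

        // Version 00 defines exactly four fields; this sketch rejects other versions.
        if parts.next().is_some() || version != "00" {
            return None;
        }

        // The W3C format requires fixed-width, lowercase hex fields.
        let is_lower_hex = |s: &str, len: usize| {
            s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
        };
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }

        let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
        let parent_id = u64::from_str_radix(parent_id, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;

        // All-zero trace and parent ids are invalid.
        if trace_id == 0 || parent_id == 0 {
            return None;
        }
        Some(Self { trace_id, parent_id, flags })
    }

    /// Builds an outbound header that keeps the trace id and flags but
    /// names `span_id` (the current span) as the parent.
    pub fn to_header(&self, span_id: u64) -> String {
        format!("00-{:032x}-{:016x}-{:02x}", self.trace_id, span_id, self.flags)
    }

    /// Bit 0 of the flags field is the "sampled" flag.
    pub fn is_sampled(&self) -> bool {
        self.flags & 0x01 != 0
    }
}
```

In this sketch, continuing an upstream trace amounts to reusing the parsed `trace_id` and recording `parent_id` as the parent of the new root. Outbound propagation calls `to_header` with the current span's id, so the next hop sees the same trace id with a new parent.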