docs(telemetry): add instrumentation conventions, align telemetry to them #996

Merged
EhabY merged 3 commits into main from chore/telemetry-conventions
Jun 11, 2026

Conversation

@EhabY EhabY commented Jun 8, 2026

Collaborator

What

Establishes a shared style for local telemetry instrumentation and brings every existing event into line with it.

  • New src/instrumentation/CONVENTIONS.md (linked from CONTRIBUTING.md) covering the framework vs per-domain-instrumentation split, threading (spans passed explicitly, attributes set imperatively via setProperty, never return a value purely to log it), naming per the OTel convention, properties vs measurements, outcome/error/abort semantics, and secret safety.
  • OTel naming alignment: caller-supplied keys go snake_case (cache_source, workspace_name, command_id, …); enumerated values are lowercase snake_case unions (file_path/not_found, idle/connecting/… connection states); counts are named <entity>.count (agent.count, workspace.count); unit suffixes (_ms, _seconds, _mbits) move into the OTLP unit field so Prometheus can append its own unit suffix cleanly. Framework-managed result/durationMs are left as-is.
  • Outcome semantics tightened: result stays framework-managed — the caller-set key on connection.reconnect_resolved becomes a domain outcome; thrown aborts mark spans aborted instead of recording error.type="aborted" (workspace open, dev containers, credential store/clear); errors carry error.type, aborts carry reason/abort_stage, never both.
  • Layering: auth.session_lookup moves behind AuthTelemetry.traceSessionLookup, and the diagnostic export format property is typed as the closed ExportFormat union.
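
As a rough illustration of the threading and outcome rules above, here is a minimal sketch. It assumes a hypothetical `Span` class, `trace` wrapper, and `makeAbort` helper; none of these are the repository's actual API, and only `setProperty` and the attribute keys `error.type`, `reason`, and `workspace_name` come from this description. `abort_stage` is left out for brevity.

```typescript
// Hypothetical minimal span; the real framework interface is not shown in this PR.
type Outcome = "success" | "error" | "aborted";

class Span {
  readonly properties: Record<string, string> = {};
  outcome: Outcome = "success";

  // Attributes are set imperatively on the span that was passed in.
  setProperty(key: string, value: string): void {
    this.properties[key] = value;
  }
}

// Hypothetical helper: builds an abort that is recognised by its name.
function makeAbort(reason: string): Error {
  const err = new Error(reason);
  err.name = "AbortError";
  return err;
}

function isAbort(err: unknown): err is Error {
  return err instanceof Error && err.name === "AbortError";
}

// Runs `fn` with an explicitly passed span and classifies what it throws:
// errors carry error.type, aborts carry reason, never both.
function trace<T>(span: Span, fn: (span: Span) => T): T {
  try {
    return fn(span);
  } catch (err) {
    if (isAbort(err)) {
      span.outcome = "aborted";
      span.setProperty("reason", err.message);
    } else {
      span.outcome = "error";
      span.setProperty("error.type", err instanceof Error ? err.name : "unknown");
    }
    throw err;
  }
}
```

In this sketch the callee receives the span as an argument and calls `setProperty` on it directly, so nothing has to be returned purely so that the caller can log it.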

Why

Telemetry is local-only and unreleased, so unifying the naming now is free. Merging this first gives the other telemetry branches a conformant base to rebase onto and a doc to follow.

@EhabY EhabY self-assigned this Jun 8, 2026
@EhabY EhabY force-pushed the chore/telemetry-conventions branch 3 times, most recently from 13ac79c to df89efa Compare June 8, 2026 15:06
@EhabY EhabY requested review from andrewdennis117 and hugodutka and removed request for hugodutka June 8, 2026 16:05
@EhabY EhabY force-pushed the chore/telemetry-conventions branch from df89efa to ca8c7ec Compare June 10, 2026 15:22
EhabY added 2 commits June 11, 2026 13:53
…e keys

Add src/instrumentation/CONVENTIONS.md documenting how telemetry is structured
(explicit span threading, imperative setProperty, event/attribute naming,
namespace grouping, properties vs measurements) and link it from CONTRIBUTING.

Align existing instrumentation with the OTel naming convention:
- rename caller-supplied property/measurement keys from camelCase to snake_case
  (auth, ssh, workspace, activation, websocket, cli.download);
- strip unit suffixes from exported metric names into the OTLP unit field
  (latency_ms -> metric latency, unit ms) so Prometheus suffixes cleanly;
- group the token-refresh span under its namespace (auth.token_refreshed ->
  auth.token_refresh.completed) next to auth.token_refresh.deduped.

Framework-managed result/durationMs are left unchanged.
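
The unit-suffix rule (latency_ms -> metric latency, unit ms) could be sketched roughly as follows. The function name `splitUnitSuffix` and the table are hypothetical, and the unit strings chosen for `_seconds` and `_mbits` are assumptions; only the `_ms` mapping is stated in the commit message.

```typescript
// Suffixes named in this PR. Only the "_ms" -> "ms" mapping is given in the
// commit message; the other two unit strings are illustrative assumptions.
const UNIT_SUFFIXES: ReadonlyArray<{ suffix: string; unit: string }> = [
  { suffix: "_ms", unit: "ms" },
  { suffix: "_seconds", unit: "s" },
  { suffix: "_mbits", unit: "Mbit" },
];

interface MetricName {
  name: string;
  unit?: string;
}

// Splits a measurement key into a bare metric name plus a unit, so the unit
// can travel in a separate field instead of being baked into the name.
function splitUnitSuffix(key: string): MetricName {
  for (const { suffix, unit } of UNIT_SUFFIXES) {
    const hasSuffix =
      key.length > suffix.length && key.slice(-suffix.length) === suffix;
    if (hasSuffix) {
      return { name: key.slice(0, -suffix.length), unit };
    }
  }
  // No known suffix: keep the key as-is and leave the unit unset.
  return { name: key };
}

console.log(splitUnitSuffix("latency_ms")); // name "latency", unit "ms"
console.log(splitUnitSuffix("agent.count")); // name "agent.count", no unit
```
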
- rename the trace outcome hook fail() -> error() and the category
  fetch_failed -> fetch_error to match the recordError vocabulary
- name count measurements <entity>.count (agent.count, workspace.count,
  event.count, interval.count) per OTel, replacing flat _count keys
- set error.type on auth error spans (login/logout exceptions, auth_failed)
  and split auth outcomes so errors carry error.type, aborts carry reason
- fold the general telemetry review findings into CONVENTIONS.md and trim
  for conciseness
@EhabY EhabY force-pushed the chore/telemetry-conventions branch from 32dddf7 to e1eb693 Compare June 11, 2026 11:01
…udit

- rename command.invoked commandId -> command_id and CliCacheSource
  values to snake_case (file_path, not_found)
- lowercase ConnectionState values on connection.state_transitioned;
  rename the caller-set result key on connection.reconnect_resolved to
  outcome so result stays framework-managed
- thrown aborts mark spans aborted instead of error.type="aborted"
  (workspace.open, dev container open, credential store/clear)
- type the diagnostic export format as ExportFormat, wrap
  auth.session_lookup in AuthTelemetry, resolve _seconds measurement
  suffixes to an OTLP unit
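
For readers unfamiliar with the closed-union pattern used for enumerated values, a minimal sketch follows. It lists only the two `CliCacheSource` values named in this PR (the real union may contain more), and the helpers `isCliCacheSource` and `cacheSourceProperty` are hypothetical names introduced for illustration.

```typescript
// Only the two values mentioned in this PR; the real union may have more.
const CLI_CACHE_SOURCES = ["file_path", "not_found"] as const;

// Closed union derived from the list: "file_path" | "not_found".
type CliCacheSource = (typeof CLI_CACHE_SOURCES)[number];

// Hypothetical runtime guard for values arriving as plain strings.
function isCliCacheSource(value: string): value is CliCacheSource {
  return (CLI_CACHE_SOURCES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical property builder: the parameter type rejects any value
// outside the union at compile time, e.g. the old camelCase "filePath".
function cacheSourceProperty(source: CliCacheSource): Record<string, string> {
  return { cache_source: source };
}
```

Typing the property as a union rather than `string` means a stray camelCase value fails to compile instead of silently producing a new attribute value.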
@EhabY EhabY changed the title docs(telemetry): instrumentation conventions + snake_case attribute keys docs(telemetry): add instrumentation conventions, align telemetry to them Jun 11, 2026
@EhabY EhabY merged commit a402fb6 into main Jun 11, 2026
13 checks passed
@EhabY EhabY deleted the chore/telemetry-conventions branch June 11, 2026 12:37
2 participants