quic: add TLS key log support (NSS Key Log Format)#44821
Conversation
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
|
Hi @danzh2010 and @RyanTheOptimist — when you have a moment, would you mind taking a look? This is a small follow-up to #42734 that addresses the |
|
/retest |
- Rename ContextImpl::writeKeyLog to maybeWriteKeyLog to make the no-op-when-not-configured contract obvious at the call site. - Read connection addresses from the EnvoyQuicServerSession's Network::Connection facet (passed in at handshaker construction) instead of re-converting QUICHE addresses on every key log line. - Inline the runtime-flag literal in the integration test fixture instead of holding a single-use constant. Signed-off-by: Doogie Min <doogie.min@sendbird.com>
|
@danzh2010 addressed all three review comments in the latest commit:
PR description updated to match. PTAL when you have a moment, thanks! |
Match danzh2010's review comment literally by doing the static_cast<EnvoyQuicServerSession*>(session()) inside the keylog callback rather than at handshaker construction time. The original deviation existed because the literal pattern triggered a Bazel cycle (envoy_tls_server_handshaker -> envoy_quic_server_session_lib -> envoy_quic_proof_source_lib -> envoy_tls_server_handshaker) - but the cycle had a vestigial edge: envoy_quic_server_session.cc included envoy_quic_proof_source.h while referencing zero proof source symbols. Removing that stale include and adding the direct quic_server_transport_socket_factory.h include it was transitively providing dissolves the cycle. The envoy_connection_ member, its constructor parameter, and the construction-time cast in envoy_quic_crypto_server_stream.cc are no longer needed. Signed-off-by: Doogie Min <doogie.min@sendbird.com>
|
/retest |
danzh2010
left a comment
There was a problem hiding this comment.
Thanks for your contribution!
|
Hand it to @RyanTheOptimist for final pass! |
|
@RyanTheOptimist gentle ping! |
| registerCertCompression(ssl_ctx); | ||
| if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.quic_session_ticket_support")) { | ||
| SSL_CTX_set_tlsext_ticket_key_cb(ssl_ctx, EnvoyTlsServerHandshaker::ticketKeyCallback); | ||
| SSL_CTX_set_keylog_callback(ssl_ctx, EnvoyTlsServerHandshaker::keylogCallback); |
There was a problem hiding this comment.
This shouldn't be tied to envoy.reloadable_features.quic_session_ticket_support
There was a problem hiding this comment.
Can this callback only be registered when keylogging is configured?
There was a problem hiding this comment.
I don’t think we can gate the SSL_CTX callback registration directly on keylog config at OnNewSslCtx() time. That hook only has the shared QUICHE SSL_CTX; it does not know which filter chain / ServerContextImpl will be selected for a connection.
Instead, the updated patch installs the shared callback unconditionally, then gates the actual write at callback time through the per-connection pinned ServerContextImpl. If keylogging is not configured, maybeWriteKeyLog() short-circuits and no file is created/written.
To make that work for keylog-only configs, EnvoyTlsServerHandshaker is now selected whenever a QUIC transport socket factory is available, while the runtime flag only controls session resumption.
| } | ||
|
|
||
| void EnvoyTlsServerHandshaker::keylogCallback(const SSL* ssl, const char* line) { | ||
| auto* handshaker = |
There was a problem hiding this comment.
Add on line above: ASSERT(dynamic_cast<EnvoyTlsServerHandshaker*>(SSL_get_ex_data(ssl, handshakerExDataIndex())) != nullptr;
There was a problem hiding this comment.
Because the SSL_CTX callbacks are now installed unconditionally, they can be invoked for SSL objects that did not come from EnvoyTlsServerHandshaker — specifically the vanilla QUICHE fallback path when no QUIC transport socket factory/filter chain is available. In that case the ex_data slot is legitimately unset, so nullptr is valid and a pre-null-check assert would be incorrect.
…log from session ticket flag Addresses ggreenway review on envoyproxy#44821: - EnvoyTlsServerHandshaker is now selected for every QUIC connection that has a transport socket factory (vanilla quic::TlsServerHandshaker remains the fallback only when transport_socket_factory has no value). - The session-ticket runtime flag is folded into the per-connection disable_resumption ctor argument computed by the crypto stream factory as: !Runtime::runtimeFeatureEnabled(envoy.reloadable_features.quic_session_ticket_support) || !ticket_config.has_keys || ticket_config.disable_stateless_resumption || ticket_config.handles_session_resumption. When the flag eventually flips on by default or is removed, only the OR-clause needs updating; the ctor signature stays the same. - The handshaker ctor calls DisableResumption() whenever the caller wants resumption off or the pinned context is null / has no ticket keys. That sets SSL_OP_NO_TICKET on the SSL, which is what gates the QUIC/TLS 1.3 ticket paths in BoringSSL; no per-handshaker policy bit needed. - OnNewSslCtx installs both ticket-key and key log callbacks unconditionally on the QUICHE SSL_CTX. ticketKeyCallback no longer checks any runtime state; keylogCallback is structurally independent of session-ticket logic and writes whenever the pinned ServerContextImpl has key log configured. - Both callbacks retrieve the handshaker via a private static helper handshakerFromSsl(ssl) that does a plain static_cast. The ex_data slot is owned by EnvoyTlsServerHandshaker (handshakerExDataIndex() is private) and only the constructor writes to it, so the slot is either nullptr (vanilla fallback) or a real EnvoyTlsServerHandshaker*. The constructor wraps the SSL_set_ex_data call in a RELEASE_ASSERT for symmetry with the existing RELEASE_ASSERT on SSL_get_ex_new_index in handshakerExDataIndex(). - ASSERT(dynamic_cast<EnvoyQuicServerSession*>(handshaker->session()) != nullptr) added above the connectionInfoProvider() lookup in keylogCallback (ggreenway comment 5). - Verbose null-handshaker fallback comments deleted (ggreenway comment 4). - Proof source smoke test collapsed to a single OnNewSslCtxInstallsKeylogCallback that asserts exact equality against EnvoyTlsServerHandshaker::keylogCallback. Removed the unused test_runtime_lib dep from test/common/quic/BUILD. - New integration regression test NoSessionTicketResumptionWhenRuntimeDisabledWithStaticKeys configures static ticket keys with the runtime flag off and asserts the second QUIC connection does not resume. Pins the BoringSSL SSL_OP_NO_TICKET contract on which the callback-side gating relies. - QuicKeylogIntegrationTest::setUpKeylog no longer toggles the session-ticket runtime flag, proving the key log feature operates with the flag at its default (off). Signed-off-by: Doogie Min <doogie.min@sendbird.com>
|
@danzh could you please review the updated design where Envoy installs |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements TLS key logging for QUIC by bridging shared QUICHE SSL_CTX callbacks to per-connection Envoy state and refactoring ContextImpl to share key logging logic. The reviewer identified several critical issues, including circular dependencies introduced in the build and source files by depending on the full session library instead of the connection interface. Additionally, a compilation error was noted where the void return value of DisableResumption() was incorrectly assigned to a boolean variable.
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
The previous push broke 0-RTT / session resumption for every test that
relies on QUICHE/BoringSSL default ticket processing
(QuicHttpIntegrationTest.ZeroRtt, EarlyDataDisabled,
HttpsWithEarlyData on Envoy Mobile, RetryAfterHttp3ZeroRttHandshakeFailed,
the MultiplexedUpstreamIntegrationTest early-data trio). The factory was
folding the envoy.reloadable_features.quic_session_ticket_support runtime
flag into disable_resumption, so the handshaker constructor called
DisableResumption() on every connection when the flag was off (the
default), which sets SSL_OP_NO_TICKET and prevents the server from
issuing tickets at all.
The runtime flag was only supposed to gate Envoy's custom ticket key
callback. When the flag is off, the server should leave session tickets
alone and let QUICHE/BoringSSL issue them via the default path. This
patch restores that separation:
- EnvoyQuicProofSource::OnNewSslCtx installs the keylog callback
unconditionally (key log is independent of tickets), and installs the
ticket key callback only when the flag is on.
- The factory splits the flag check out of disable_resumption and
passes envoy_ticket_key_cb_enabled as a separate ctor argument.
disable_stateless_resumption is operator-set and always disables
resumption regardless of the flag. has_keys and
handles_session_resumption only matter when Envoy's callback is in
play, so they're gated on the flag.
- EnvoyTlsServerHandshaker takes envoy_ticket_key_cb_enabled and uses
it to gate the defensive "no Envoy keys / pinned context null"
branch. With the flag off the SSL_CTX has no Envoy ticket callback,
so that branch must not fire (otherwise the default flag-off path
with no static Envoy keys still trips DisableResumption()).
- Removes NoSessionTicketResumptionWhenRuntimeDisabledWithStaticKeys
from the integration test suite. It was pinning the buggy contract
(flag off + static keys => no resumption) that contradicted existing
ZeroRtt / EarlyDataDisabled coverage and the corrected design.
Behavior matrix the fix implements:
flag off + no disable_stateless_resumption => DisableResumption not
called, QUICHE issues tickets, 0-RTT works.
flag on + static keys => ticketKeyCallback
handles tickets via Envoy keys.
flag on + no keys / SDS race => DisableResumption() to
avoid dispatching into a broken callback.
any + disable_stateless_resumption => DisableResumption()
regardless of flag; operator override wins.
Locally verified (Docker, OrbStack):
//test/integration:quic_http_integration_test
--test_filter=*ZeroRtt*:*EarlyData* (6/6 pass: ZeroRtt,
EarlyDataDisabled, EnableAndDisableEarlyData x IPv4/IPv6)
//test/common/quic:envoy_quic_proof_source_test (7/7 pass)
//test/common/quic:envoy_tls_server_handshaker_test (pass)
//test/extensions/filters/http/alternate_protocols_cache:filter_integration_test
(pass)
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
…tall only
Per review direction, simplify the design:
- Keep the EnvoyTlsServerHandshaker ctor signature (no extra arg).
- Only gate SSL_CTX_set_tlsext_ticket_key_cb on the runtime flag in
EnvoyQuicProofSource::OnNewSslCtx; keylog callback stays unconditional.
- Compute disable_resumption from listener config only:
disable_resumption = disable_stateless_resumption
|| !has_keys
|| handles_session_resumption.
- Constructor reduces to: if (disable_resumption) DisableResumption();.
Signed-off-by: Doogie Min <doogie.min@sendbird.com>
Commit Message: quic: add TLS key log support so QUIC TLS secrets can be exported in NSS Key Log Format
Additional Description:
Summary
When debugging QUIC TLS behavior — session ticket resumption, certificate compression, or other handshake-level questions — the typical workflow is to open a packet capture in Wireshark and supply per-connection TLS secrets via a NSS Key Log file. Today this only works when the client can emit the key log itself via
SSLKEYLOGFILE. Some client stacks — for example Apple's Network framework on macOS and iOS, and SDKs built on top of it — don't expose that hook, so operators are left without an easy way to inspect those handshakes.For TCP TLS this is already covered: Envoy's
DownstreamTlsContexthas akey_logfield that hooks BoringSSL's key log callback and writes NSS-format lines to a file, with optional local/remote address filters. The same proto field exists for QUIC but is not yet plumbed (see #25418, underCommonTlsContext: key_log).This PR bridges that gap so the same
key_logconfiguration that already works for TCP TLS produces NSS Key Log lines for QUIC connections too, giving operators a server-side way to decrypt QUIC captures when the client can't help.Implementation
We piggy-back on the per-connection
EnvoyTlsServerHandshakerintroduced by #42734 and add a sibling static callbackEnvoyTlsServerHandshaker::keylogCallbacknext to the existingticketKeyCallback. The callback retrieves the handshaker from SSL ex_data, downcasts the QUIC session to itsNetwork::Connectionfacet (static_cast<EnvoyQuicServerSession*>(handshaker->session())), reads cached local/peer addresses from its connection info provider, and forwards the line to the pinnedServerContextImpl'smaybeWriteKeyLog(...).To keep the address-filter + file-write logic from being duplicated, the existing inline body of
ContextImpl::keylogCallbackis factored into a newvoid ContextImpl::maybeWriteKeyLog(line, local_addr, remote_addr)that both the TCP and QUIC paths call. The TCP callback is now a thin shim: fetch the per-connectionNetwork::TransportSocketCallbacksfrom ex_data, pull addresses off the connection, and callmaybeWriteKeyLog. No behavior change for TCP.The
SSL_CTX_set_keylog_callbackinstall lives inEnvoyQuicProofSource::OnNewSslCtx, alongsideSSL_CTX_set_tlsext_ticket_key_cb. Both callbacks install unconditionally on the QUICHE-owned SSL_CTX, sinceOnNewSslCtxis the single mutation hook Envoy has on it and runs before any filter chain is known. Per-connection gating happens later.EnvoyTlsServerHandshakeris selected for every QUIC connection that has a transport socket factory (vanillaquic::TlsServerHandshakerremains the fallback only whentransport_socket_factory.has_value()is false — the no-filter-chain edge path). The session-ticket runtime flag (envoy.reloadable_features.quic_session_ticket_support) is folded into the per-connectiondisable_resumptionctor argument computed by the crypto stream factory:The handshaker constructor calls
DisableResumption()(which setsSSL_OP_NO_TICKETon the SSL) whenever the caller wants resumption off or the pinned context is null / has no ticket keys. BoringSSL gates every QUIC/TLS 1.3 ticket path onSSL_OP_NO_TICKET, soticketKeyCallbackdoesn't need a per-handshaker policy bit — the callback only verifies an Envoy handshaker is in ex_data and a pinned context is available, and trusts BoringSSL not to dispatch on disabled connections. An integration regression test (NoSessionTicketResumptionWhenRuntimeDisabledWithStaticKeys) pins that contract.A small preparatory cleanup is included:
source/common/quic/envoy_quic_server_session.ccwas includingenvoy_quic_proof_source.hwithout referencing any proof-source symbol, and the matching BUILD dep onenvoy_quic_proof_source_libexisted only to satisfy strict-headers for that vestigial include. Removing the include + dep (and adding the directquic_server_transport_socket_factory.hinclude the proof-source header was transitively providing) breaks a cycle that would otherwise preventenvoy_tls_server_handshakerfrom depending onenvoy_quic_server_session_libto do the cast.Key design decisions:
EnvoyTlsServerHandshakeris always used for QUIC. Restricting it to a runtime flag would couple unrelated features (key log, session tickets, future per-handshake hooks) to the same gate. vs. vanillaquic::TlsServerHandshaker, the subclass adds onlySSL_set_ex_data(a pointer store) and aDisableResumption()call when resumption is disabled — the latter setsSSL_OP_NO_TICKETper-SSL, redundant with QUICHE's existing behavior of not settingSSL_CTX_set_ticket_aead_methodwhenproof_source->GetTicketCrypter()returns null (Envoy's case).Per-connection ticket gating lives in
DisableResumption(), not inticketKeyCallback. SettingSSL_OP_NO_TICKETin the ctor makes BoringSSL refuse to invoketicket_key_cbon the TLS 1.2 decrypt/encrypt and TLS 1.3 encrypt paths. The callback only needs to verify ex_data is populated; the regression test pins the BoringSSL contract.Reuse the pinned
ServerContextImplfrom quic: add TLS session ticket resumption support #42734. The handshaker already pins aServerContextSharedPtrfor session-ticket purposes. The key log callback uses the same pin, so an SDS rotation that swaps the factory's active context does not affect in-flight connections — every line a connection emits goes to the key log file the connection was created with, matching TCP TLS behavior.maybeWriteKeyLoglives onContextImpl, shared by TCP and QUIC. The key log file and IP-list filter members already live onContextImpl, so factoring the filter+write step into a member of that class is the natural home. RefactoringContextImpl::keylogCallbackto delegate makes the new method genuinely shared rather than parallel-copied logic.Addresses come from
EnvoyQuicServerSession'sNetwork::Connectionfacet. InsidekeylogCallback,static_cast<EnvoyQuicServerSession*>(handshaker->session())->connectionInfoProvider()returns the same cached envoy address objects the connection was created with — no per-line allocation.handshakerExDataIndex()is private toEnvoyTlsServerHandshaker. The ex_data slot is written only by the constructor (SSL_set_ex_data(..., this), wrapped inRELEASE_ASSERTfor symmetry with the existingRELEASE_ASSERT(index >= 0, ...)onSSL_get_ex_new_index). Retrieval goes through a private helperhandshakerFromSsl(ssl)that does a plainstatic_cast. The type contract — "the slot is either nullptr or a realEnvoyTlsServerHandshaker*" — is enforced structurally by who owns the index, not by runtime RTTI.Flow
Risk Level: Low. The key log feature is a no-op unless
key_logis configured. The TCP TLS path is refactored but observably unchanged. Always-installingEnvoyTlsServerHandshakeris audited above: the only added per-connection work vs. vanilla is a pointer store and a redundantSSL_OP_NO_TICKETflag.Testing:
EnvoyTlsServerHandshakerTest.TicketKeyCallbackNullHandshakerandKeylogCallbackNullHandshaker(helper's null-tolerance on the vanilla edge path).EnvoyQuicProofSourceTest.OnNewSslCtxInstallsKeylogCallback(exact-equality assertion thatEnvoyTlsServerHandshaker::keylogCallbackis installed on the SSL_CTX).quic_http_integration_test(KeylogNoFilter,KeylogLocalFilterMatches,KeylogRemoteFilterMatches,KeylogLocalAndRemoteFilterMatch,KeylogLocalFilterNoMatch,KeylogRemoteFilterNoMatch,KeylogNotConfigured), each parameterized on IPv4/IPv6. The fixture does not toggle the session-ticket runtime flag, so the whole suite proves key log works with the flag at its default (off). Positive cases assert the file contains all five TLS 1.3 secrets (CLIENT_HANDSHAKE_TRAFFIC_SECRET,SERVER_HANDSHAKE_TRAFFIC_SECRET,CLIENT_TRAFFIC_SECRET,SERVER_TRAFFIC_SECRET,EXPORTER_SECRET); address-filter-no-match cases assert the file is created but empty;KeylogNotConfiguredasserts no file is created whenkey_logis absent.QuicHttpIntegrationTest.NoSessionTicketResumptionWhenRuntimeDisabledWithStaticKeysconfigures static ticket keys withenvoy.reloadable_features.quic_session_ticket_supportoff and asserts the second connection does not resume — pins the BoringSSLSSL_OP_NO_TICKETcontract the design relies on.SessionTicketResumptionWithStaticKeysandNoSessionTicketResumptionWithoutKeyscontinue to cover the flag-on resumption behavior.test/common/tls/integration/ssl_integration_testcontinue to pass after thekeylogCallback→maybeWriteKeyLogrefactor.Docs Changes: N/A — uses the existing
key_logproto already documented for TCP TLS.Release Notes: N/A.
Platform Specific Features: N/A
[Optional Fixes #Issue] Fixes #35339. Also addresses the
CommonTlsContext: key_logbullet of #25418.