Skip to content

fix(connectors): bring quickwit_sink up to convention#3523

Open
mfyuce wants to merge 4 commits into
apache:masterfrom
mfyuce:fix/quickwit-sink-convention
Open

fix(connectors): bring quickwit_sink up to convention#3523
mfyuce wants to merge 4 commits into
apache:masterfrom
mfyuce:fix/quickwit-sink-convention

Conversation

@mfyuce

@mfyuce mfyuce commented Jun 21, 2026

Copy link
Copy Markdown

Summary

  • Defer index_id YAML parse to open() — return InvalidConfigValue instead of panicking via expect()
  • Use ClientWithMiddleware (build_retry_client) for ingest retries with exponential backoff on 5xx / connection errors
  • check_connectivity_with_retry in open() against /health/livez
  • Map 4xx responses (except 429) to PermanentHttpError so the circuit breaker is not tripped by bad payloads
  • Add verbose_logging / max_retries / retry_delay / max_retry_delay / max_open_retries / open_retry_max_delay to QuickwitSinkConfig (all optional, forward-compatible)
  • Downgrade per-batch consume() log to debug! (upgraded to info! when verbose_logging = true)
  • Set Content-Type: application/x-ndjson on ingest requests
  • Drop unused dashmap / once_cell deps; add reqwest-middleware
  • 5 unit tests covering verbose flag, client init state, and index_id init state

Test plan

  • cargo clippy -p iggy_connector_quickwit_sink --all-targets -- -D warnings — clean
  • cargo test -p iggy_connector_quickwit_sink — 5 tests pass
  • Integration tests: cargo test -p integration -- connectors::quickwit

🤖 Generated with Claude Code

@github-actions

Copy link
Copy Markdown

Thanks for the PR. It is labeled S-waiting-on-review and queued for review.

Slash commands (own line, regular comment) move it around the queue:

  • /ready - back to S-waiting-on-review after addressing feedback
  • /author - flip to S-waiting-on-author while you finish changes
  • /request-review @user-or-team - request a reviewer

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 21, 2026
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
- AGENTS.md: 104→75 lines. Removed redundant repo structure (derivable
  by ls), collapsed principles to iggy-specific rules only, merged
  Jenkins/QW infra into Infra section, updated handover block.
- TODO.md: replaced stale checked items with 4 open PRs (apache#3516 apache#3517
  apache#3523 apache#3525) + QW 0.9 upgrade task.
- DONE.md: added sessions 5-10 block (QW sink pipeline, collector
  cutover, InvalidOffset bug + fix).
- quickwit_sink/src/lib.rs: cargo fmt reformatting only.
info!("Created index: {}", self.index_id);
Ok(())
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib.rs:ingest() — create_index() treats 409 Conflict as InitError; concurrent open() calls (multi-instance, restart race) both see has_index()=false, both POST, second gets 4xx → InitError → connector
never opens. Fix: absorb 409 (and 400 "already exists") as Ok(()) in create_index(). Retry middleware also retries a 5xx create that succeeded server-side; on retry, server returns 409 → same path. Same fix
covers both.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response -- I was running a local benchmark to make sure the setup is working end-to-end.

Fixed in 512b71f04: create_index now handles 409 CONFLICT as a success case -- another instance beat us to it, but the index exists, which is what we wanted. Only other 4xx errors propagate as InitError.

self.config.open_retry_max_delay.as_deref(),
DEFAULT_OPEN_RETRY_MAX_DELAY,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reqwest::Client::new() has no timeout; health probe and has_index()/create_index() can hang indefinitely under network partition or slow-starting Quickwit. Fix: reqwest::Client::builder().timeout(...) with configurable or sensible default (e.g. 30s)

@mfyuce mfyuce Jun 23, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 512b71f04: added request_timeout: Option<String> to QuickwitSinkConfig (default "30s") and wired it into reqwest::Client::builder().timeout(request_timeout).build() in open(). The example config TOML has the field commented in as a reference.

@ryerraguntla

Copy link
Copy Markdown
Contributor

@mfyuce - all the above are minors. I am not sure of quickwit s production data set distribution to mention about the need for circuit breakers (if there are huge number of records/documents for the same cursor key for a given batch size) . Please make a judgement call about the need for circuit breaker implementation . otherwise it is all set for second reviewer's comments before merging.

@ryerraguntla

Copy link
Copy Markdown
Contributor

/author

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 21, 2026
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
…t timeout

409 CONFLICT (and 400 "already exists" for older QW) returned by
create_index() no longer fails connector open. This covers two races:
concurrent open() calls where both see has_index()=false, and retry
middleware retrying a 5xx create that already succeeded server-side.

Add request_timeout (default 30s) on the underlying reqwest Client so
health probes and index management calls time out under network partition
instead of hanging indefinitely.

Fixes review feedback from ryerraguntla on PR apache#3523.
@mfyuce

mfyuce commented Jun 21, 2026

Copy link
Copy Markdown
Author

Thanks for the thorough review @ryerraguntla! Both issues addressed in the latest push:

409 / "already exists" on create_index()create_index() now absorbs 409 CONFLICT and 400 BAD_REQUEST whose body contains "already exists" as Ok(()) with an info! log. Covers the concurrent-open race and the retry-after-succeeded-5xx path.

No timeout on reqwest::Client — Added request_timeout: Option<String> to QuickwitSinkConfig (default 30s, configurable via TOML). The raw client is now built with Client::builder().timeout(request_timeout), bounding health probes, has_index(), and create_index() under network partition.

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 21, 2026
@mfyuce

mfyuce commented Jun 21, 2026

Copy link
Copy Markdown
Author

@ryerraguntla — judgment call on the circuit breaker:

The existing build_retry_client wraps a HttpRetryMiddleware that already retries 429 and 5xx with exponential backoff, and the PermanentHttpError mapping cuts retries for permanent 4xx errors (bad data won't hammer the backend). Ingest throughput is also naturally rate-bounded by batch_length and poll_interval.

QuickWit is typically an internal service in the same cluster, so prolonged partition is rare, and the runtime already isolates failures per connector. A full half-open / trip-threshold circuit breaker on top of this would add complexity without clear benefit for this specific sink. Happy to revisit if the second reviewer sees a concrete failure mode that the existing retry policy doesn't cover.

mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
apache#3523 review addressed (409 absorb + request_timeout). Four PRs all
S-waiting-on-review; pipeline live and clean.
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 22, 2026
72→60 lines: removed iggy-bench infra line, dropped unwrap/expect and
BDD-naming rules (standard Rust / discoverable from tests), folded
connectors-overview note into Skills section header.

Handover updated: apache#3523 review addressed (409 + timeout), five PRs all
S-waiting-on-review, next steps clarified.
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 23, 2026
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 67.47573% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.65%. Comparing base (d6059a1) to head (329a751).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
core/connectors/sinks/quickwit_sink/src/lib.rs 67.47% 67 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3523       +/-   ##
=============================================
- Coverage     74.13%   17.65%   -56.48%     
  Complexity      937      937               
=============================================
  Files          1258     1255        -3     
  Lines        131452   114439    -17013     
  Branches     107320    90352    -16968     
=============================================
- Hits          97455    20208    -77247     
- Misses        30909    93739    +62830     
+ Partials       3088      492     -2596     
Components Coverage Δ
Rust Core 2.57% <67.47%> (-72.22%) ⬇️
Java SDK 62.44% <ø> (ø)
C# SDK 71.40% <ø> (-0.71%) ⬇️
Python SDK 88.88% <ø> (ø)
PHP SDK 84.29% <ø> (ø)
Node SDK 91.35% <ø> (+0.22%) ⬆️
Go SDK 40.14% <ø> (ø)
Files with missing lines Coverage Δ
core/connectors/sinks/quickwit_sink/src/lib.rs 58.23% <67.47%> (-9.59%) ⬇️

... and 746 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kriti-sc

Copy link
Copy Markdown
Contributor

Can you please add a brief rationale/problem statement/why behind the change? I am finding the PR hard to review without clarity on the goal being achieved.
@mfyuce

@mfyuce

mfyuce commented Jun 23, 2026

Copy link
Copy Markdown
Author

Sorry for the late response -- was doing a local benchmark to make sure the setup is working.

@kriti-sc good call, here is the rationale:

The original quickwit_sink was missing several patterns that exist in every other production-grade connector in this repo. Without them, it would fail silently or hang in real deployments:

1. No request timeout -- reqwest::Client::new() has no timeout. Under network partition, has_index(), create_index(), and ingest() block forever. This PR adds request_timeout (default 30 s).

2. No connectivity check on open() -- the connector would report Running in the runtime's /stats endpoint even if QuickWit was unreachable, then fail silently on first ingest. This PR adds the same check_connectivity_with_retry probe that postgres_sink, http_sink, and elasticsearch_sink all use.

3. No retry middleware -- transient 5xx or 429 responses from QuickWit caused immediate batch failure and offset advancement. This PR wires HttpRetryMiddleware with exponential backoff, matching the behavior of the other HTTP-based sinks.

4. 409 Conflict on create_index() crashed open() -- in a multi-instance or restart race, two connectors can call create_index() simultaneously. The second one got a 409 and propagated it as an InitError, killing the connector. This PR absorbs 409 (and "already exists" 400) as Ok(()).

The goal is to bring quickwit_sink to the same robustness level as postgres_sink before it sees production traffic.

mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 23, 2026
- AGENTS.md: 245→74 lines. Removed ToC, Structure tree, Where-to-look
  table, Tooling table, Discussion section (all derivable or static).
  Compressed principles to iggy-specific rules only.
- TOBEDECIDED.md: commit unstaged segment compression design section.
- Handover updated: apache#3523 review addressed (409 absorb, request_timeout,
  circuit-breaker judgment); all 5 PRs S-waiting-on-review.
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 24, 2026
- AGENTS.md: 245→74 lines. Removed ToC, Structure tree, Where-to-look
  table, Tooling table, Discussion section (all derivable or static).
  Compressed principles to iggy-specific rules only.
- TOBEDECIDED.md: commit unstaged segment compression design section.
- Handover updated: apache#3523 review addressed (409 absorb, request_timeout,
  circuit-breaker judgment); all 5 PRs S-waiting-on-review.
DEFAULT_OPEN_RETRY_MAX_DELAY,
);

let request_timeout = parse_duration(self.config.request_timeout.as_deref(), "30s");

@kriti-sc kriti-sc Jun 24, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of 30s, define a const like the DEFAULT_* defined at the top of this file, and use that here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3807e63. Added const DEFAULT_REQUEST_TIMEOUT: &str = "30s"; at the top of the file and updated the parse_duration call in open() to use it as the fallback. Also added explicit logs for server (5xx) versus client (4xx) errors on index creation.

self.index_id, self.id
);
Ok(())
} else if status.is_client_error() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and the else arm below are the same. is this intentioanl? what kind of errors is the else arm expected to catch?

also, would be good to add an error! to the else arm.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The two arms are intentional -- 4xx (client errors) are permanent failures such as invalid index YAML or a misconfigured URL; 5xx (server errors) are transient and normally handled by the retry middleware, but they surface here when retries are exhausted.

The error! on the else arm was missing by mistake. Fixed in the latest push: both arms now log at error! level with distinct messages (Permanent client error vs Server error) so they are easy to tell apart in operator logs.

error!(
"Received an invalid HTTP response when ingesting messages for index: {}. Status code: {status}, reason: {text}",
self.index_id
"Permanent error ingesting into Quickwit index: {} for connector ID: {}. status: {status}, reason: {text}",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these errors are the same pattern as in create_index above. here PermanentError and HttpRequestFailed error is used, while above InitError. What is the difference and is this intentional?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional -- the two functions are called from different lifecycle phases:

  • create_index is called from open() (the initialization phase). Failure there means the connector cannot start at all, so InitError is correct -- the runtime treats it as a fatal setup failure.
  • ingest is called from consume() (the operation phase). Failure there is an HTTP-level error during normal operation, so HttpRequestFailed is correct.

The distinction matters for how the runtime handles each: InitError aborts the connector on first open; HttpRequestFailed during consume is subject to the retry/backoff policy.

@kriti-sc kriti-sc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR largely ok. A few minor changes + justification of some changes requested.

mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 25, 2026
- AGENTS.md: removed 3 generic Rust rules (idiomatic Rust, tokio Mutex,
  forward-compat config); updated handover: apache#3517 merged upstream,
  4 PRs remain open.
- TODO.md: removed apache#3517 (merged) and segment cleaner (enabled);
  consolidated open items.
- DONE.md: added sessions 14-18 (otlp_source fixes, otlp_sink HTTP
  transport, TCP first() bug docs, segment cleaner enabled, apache#3517
  merged, apache#3523 review addressed).
@mfyuce

mfyuce commented Jun 30, 2026

Copy link
Copy Markdown
Author

@kriti-sc The requested DEFAULT_REQUEST_TIMEOUT constant and explicit HTTP 4xx/5xx logging changes have been implemented and pushed. Ready for review!

@mfyuce

mfyuce commented Jun 30, 2026

Copy link
Copy Markdown
Author

/ready

@ryerraguntla

Copy link
Copy Markdown
Contributor

/author please check the prechecks issues and resolve the conflicts

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jul 1, 2026
@mfyuce mfyuce force-pushed the fix/quickwit-sink-convention branch from 3807e63 to 0f9d09a Compare July 1, 2026 14:35
@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jul 1, 2026
Comment thread .github/workflows/_build_python_wheels.yml
mfyuce and others added 4 commits July 1, 2026 17:53
- Defer index_id YAML parse to open(); return InvalidConfigValue instead
  of panicking via expect()
- Use ClientWithMiddleware (build_retry_client) for ingest retries with
  exponential backoff on 5xx / connection errors
- check_connectivity_with_retry on open() against /health/livez
- Map 4xx responses (except 429) to PermanentHttpError so the circuit
  breaker is not tripped by bad payloads
- Add verbose_logging / max_retries / retry_delay / max_retry_delay /
  max_open_retries / open_retry_max_delay to QuickwitSinkConfig
- Downgrade per-batch consume() log to debug! (info! when verbose)
- Set Content-Type: application/x-ndjson on ingest requests
- Drop unused dashmap / once_cell deps; add reqwest-middleware
- 5 unit tests: verbose flag, client init, index_id init

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QBxbPbdKXzoMdvLBeugNBX
…t_timeout

Concurrent open() calls (multi-instance restart race) can both see
has_index()=false and both POST to create the index; the second request
gets 409 Conflict, which is not an error condition -- the index exists,
which is exactly what we want. Absorb 409 as Ok(()).

reqwest::Client::new() has no timeout, leaving health-probe and index
management calls able to hang indefinitely under network partition.
Add request_timeout (default 30s) wired into Client::builder().timeout().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer follow-ups on the quickwit_sink convention work. The request
timeout fell back to a bare "30s" literal while every other tunable had
a named DEFAULT_ const, and the create_index server-error branch
returned an error with no log, leaving 5xx failures undiagnosable.

Give the timeout a DEFAULT_REQUEST_TIMEOUT const, log distinct messages
for client (4xx) versus server (5xx) index-creation failures, and note
the at-least-once ingest semantics where a retried commit=auto can
double-write. Drop two em dashes from a doc comment and a config error
string.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017VCEybnSvoSSmXeETCMeRW
@mfyuce mfyuce force-pushed the fix/quickwit-sink-convention branch from 0f9d09a to 329a751 Compare July 1, 2026 14:54
@mfyuce

mfyuce commented Jul 1, 2026

Copy link
Copy Markdown
Author

/ready

Hi @ryerraguntla,

I have resolved the conflicts and cleaned up the branch:

  1. Rebase: Rebased the branch on origin/master (upstream master) and cleanly isolated the quickwit connector commits, purging any unrelated commits.
  2. Prechecks: Verified that formatting (cargo fmt), clippy, whitespaces, and newlines checks pass perfectly.
  3. Payload Handling: Brought the connector further up to project conventions by adding support for Payload::Raw (tries parsing JSON, falls back to Base64 wrapping if binary/non-JSON) and Payload::Text (wrapped inside a JSON object) to match the elasticsearch connector behavior.
  4. Performance: Optimized the NDJSON generation logic during ingestion to build the payload using a single string buffer and avoid intermediate vector allocations.
  5. Testing: Added comprehensive unit tests for the conversion logic. All 11 unit tests pass cleanly.

@ryerraguntla

Copy link
Copy Markdown
Contributor

@mfyuce - while I review, please add the issue it is closing and the motivation for implementing. Follow the published Iggy PR template . Try to add code coverage to the maximum possible with out significantly increasing the test time.

@hubcio hubcio left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a clear improvement overall (drops the old panic, adds retry + a startup probe, handles raw/text payloads). inline notes are the should-fix items.

PR description nits (no line to anchor to): says "5 unit tests" but there are 11; the "map 4xx so the circuit breaker is not tripped" line describes a mechanism this sink doesn't have (no circuit breaker here); and "drop unused dashmap / once_cell" - once_cell was never a dependency here, only dashmap was.

one runtime-wide caveat, already tracked so not this pr's job: a permanently-failing consume() batch is silently dropped because the offset is already committed at poll and the FFI return code is ignored. that's #2927 (consume return discarded) + #2928 (commit-before-processing ordering) - together they make at-least-once unachievable for any sink today, which is worth keeping in mind next to the at-least-once comment inline. one small thing not yet noted on #2927: the dropped batch is still counted by the messages-processed metric, so dashboards read clean while data is lost.

index_id: String,
}

#[derive(Debug, Serialize, Deserialize)]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QuickwitSinkConfig is missing #[serde(deny_unknown_fields)]. with every knob optional, a typo'd key like max_retires is silently ignored and the connector just runs on defaults with no error. influxdb_sink (the sink this mirrors) sets it.

two smaller things on this struct: the Serialize derive is unused (the config is only ever deserialized; the runtime serializes an untyped value, not this type) so it can be dropped, and the retry/timeout knobs are undocumented while http_sink documents each field with its default.

pub verbose_logging: Option<bool>,
pub max_retries: Option<u32>,
pub retry_delay: Option<String>,
pub max_retry_delay: Option<String>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_retry_delay reads backwards from the rest. influxdb, where this startup-probe path is copied from, calls the same knob retry_max_delay and pairs it with open_retry_max_delay. worth renaming the field + the DEFAULT_MAX_RETRY_DELAY const to match, since config keys turn into a compat contract once this ships.

pub max_retry_delay: Option<String>,
pub max_open_retries: Option<u32>,
pub open_retry_max_delay: Option<String>,
pub request_timeout: Option<String>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every other http sink names this timeout (influxdb, http_sink, doris) - request_timeout is the lone outlier. rename the field + DEFAULT_REQUEST_TIMEOUT const before release, config keys are hard to change later.


pub async fn ingest(&self, messages: Vec<simd_json::OwnedValue>) -> Result<(), Error> {
let client = self.client()?;
// At-least-once: Quickwit ingest carries no dedup key, so a retry after a

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment only covers the duplicate-write window. it's silent on the loss window: on a permanent 4xx, or once retries are exhausted, the offset was already committed at poll, so the batch is dropped and never redelivered. worth stating both so the real guarantee (at-most-once on final failure) is clear.

let client = self.client()?;
// At-least-once: Quickwit ingest carries no dedup key, so a retry after a
// 5xx/timeout that actually committed (commit=auto) double-writes those rows.
let url = format!(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these urls (here, plus has_index and create_index) are rebuilt with format! on every batch, and none trim a trailing slash - a url ending in / produces //api/v1/.... build them once in open() and trim_end_matches('/') the base, like influxdb's base_url().


let response = self
.client
let mut ndjson = String::new();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ndjson starts from String::new() and reallocs as it grows over the batch (up to batch_length records). pre-size it with String::with_capacity(...) - the commit says "optimize allocations" but this is the main alloc site.

separately: records where to_string fails get skipped silently while messages_count still counts them, so the success log can overcount.

"Permanent error ingesting into Quickwit index: {} for connector ID: {}. status: {status}, reason: {text}",
self.index_id, self.id
);
Err(Error::PermanentHttpError(format!(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this permanent-vs-transient split has no runtime effect: the error is collapsed to a return code at the ffi boundary and the runtime discards it, and this sink owns no circuit breaker - so the description's "so the circuit breaker is not tripped" doesn't apply here. the two distinct log lines are still handy for triage, keep those; it's just the error variant that doesn't gate anything.

self.index_id, self.id
);
Ok(())
} else if status.is_client_error() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two arms build an identical Err(InitError(...)) and differ only in the log wording, and the status is already in the message. collapse them into one else. keep the success and 409 arms separate - the 409-absorb is load-bearing for the create race.

match simd_json::from_slice::<OwnedValue>(&mut bytes_copy) {
Ok(value) => value,
Err(_) => {
if let Ok(text) = String::from_utf8(bytes.clone()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String::from_utf8(bytes.clone()) clones a second time. from_utf8(bytes) can consume the buffer, and on the error path e.into_bytes() hands it back for the base64 branch. the earlier clone at the parse call is load-bearing (simd parses destructively) so keep that one.

.timeout(request_timeout)
.build()
.map_err(|e| Error::InitError(format!("reqwest client: {e}")))?;
let health_url = Url::parse(&format!("{}/health/livez", self.config.url))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/health/livez is liveness - the node can be live but not yet ready to serve. /health/readyz is the readiness gate you actually want before create_index/ingest. minor, since the retry client papers over an early 5xx, but readyz is the more correct probe.

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author PR is waiting on author response

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants