LCORE-2311: Llama Stack Provider Fix by jrobertboos · Pull Request #1936 · lightspeed-core/lightspeed-stack

jrobertboos · 2026-06-16T15:16:24Z

Description

Fixed llama stack provider by buffering and replaying streaming events in expected order.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Llama Stack models now provide more reliable streaming by buffering out-of-order tool-call argument events and replaying them once the corresponding call is announced.
- Streaming responses are now better structured for downstream consumers, ensuring tool-call output is emitted in a consistent, correctly sequenced way.
Bug Fixes
- Improved server streaming handling: when a route returns a Starlette StreamingResponse, chunks are streamed correctly as bytes (and otherwise preserved as prior SSE-formatted output).

coderabbitai · 2026-06-16T15:16:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6b02b28f-a103-4251-97ec-3a3bb02eebf0

📥 Commits

Reviewing files that changed from the base of the PR and between 98a3c0e and 62375b8.

📒 Files selected for processing (1)

src/pydantic_ai_lightspeed/llamastack/_model.py

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: radon
GitHub Check: integration_tests (3.12)
GitHub Check: integration_tests (3.13)
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/pydantic_ai_lightspeed/llamastack/_model.py

🔇 Additional comments (8)

src/pydantic_ai_lightspeed/llamastack/_model.py (8)
61-79: Add missing return type annotations.

The coding guidelines require complete type annotations for all functions. The __init__ and close methods are missing explicit -> None return type annotations.
-    def __init__(self, source: AsyncStream[responses.ResponseStreamEvent]) -> None:
+    def __init__(self, source: AsyncStream[responses.ResponseStreamEvent]) -> None:
         ...

-    async def close(self) -> None:
+    async def close(self) -> None:
Wait - I see these already have -> None. Let me re-check...

Actually, looking more carefully at line 61 and 73, both already have -> None. Approving instead.

98-104: Missing None guard for McpCall.id may cause malformed events.

This issue was flagged in a previous review and appears to still be present. The ResponseFunctionToolCall branch (line 89) guards against falsy event.item.id, but the McpCall branch does not.

1-15: LGTM!

17-45: LGTM!

48-79: LGTM!

117-173: LGTM!

228-231: LGTM!

184-245: LGTM!

Walkthrough

Adds LlamaStackResponsesModel (subclassing OpenAIResponsesModel) with _FilteredResponseStream, which buffers out-of-order ResponseFunctionCallArgumentsDeltaEvent events and replays them after the corresponding tool-call announcement. Also updates the transport layer to forward Starlette StreamingResponse body bytes directly instead of re-wrapping them as SSE.

Changes

Llama Stack Streaming Correctness

Layer / File(s)	Summary
Transport `StreamingResponse` passthrough `src/pydantic_ai_lightspeed/llamastack/_transport.py`	Imports `StreamingResponse` from `starlette.responses` and adds a branch in the streaming byte-generation loop to iterate `body_iterator`, encoding string chunks to UTF-8 and yielding raw byte chunks, bypassing the existing SSE-formatting path.
`_FilteredResponseStream` buffer and replay `src/pydantic_ai_lightspeed/llamastack/_model.py`	Adds module documentation and imports, then implements `_FilteredResponseStream`: buffers early `ResponseFunctionCallArgumentsDeltaEvent` events by `item_id`, and on tool-call announcement replays them unchanged for `ResponseFunctionToolCall` or synthesizes a single combined delta with a `-call`-suffixed `item_id` and incremented `sequence_number` for `McpCall`.
`LlamaStackResponsesModel.request_stream` override `src/pydantic_ai_lightspeed/llamastack/_model.py`	Overrides `request_stream` to wrap the upstream Responses stream with `_FilteredResponseStream`, uses `PeekableAsyncStream` to inspect the first event, raises `UnexpectedModelBehavior` on an empty stream, and yields `OpenAIResponsesStreamedResponse` initialized from `ResponseCreatedEvent` metadata.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

tisnik

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title "LCORE-2311: Llama Stack Provider Fix" is vague and does not clearly summarize the main change. While it references a ticket, it uses generic language like "Fix" without describing what specifically is being fixed.	Replace with a more specific title that describes the actual change, such as "Add buffering for Llama Stack streaming response events" or "Fix Llama Stack streaming event ordering with response buffering".

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

asimurka

LGTM

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pydantic_ai_lightspeed/llamastack/_model.py`:
- Around line 98-104: The McpCall branch starting at line 98 is missing a None
guard for event.item.id, unlike the ResponseFunctionToolCall branch which
properly guards against falsy id values. Add a conditional check to ensure
event.item.id is not None before assigning it to item_id and proceeding with the
yield and _replay_mcp_buffered_deltas call. This prevents malformed delta events
with item_id set to "None-call" from being synthesized when the id is None.
- Line 228: The assert statement used to validate that first_chunk is an
instance of responses.ResponseCreatedEvent will be stripped away when Python is
run with the -O optimization flag, causing the code to proceed with an invalid
value and raise an unclear AttributeError later. Replace the assert isinstance
check with an explicit type validation that raises an appropriate exception
(such as TypeError or ValueError) immediately if first_chunk is not a
ResponseCreatedEvent, ensuring the error is clear and cannot be optimized away.
- Around line 27-39: The module depends on private/internal pydantic-ai APIs
(RunContext, PeekableAsyncStream, Unset, number_to_datetime, and
_map_api_errors) which are implementation details subject to change. The
pydantic-ai version constraint currently has no upper bound, making it
vulnerable to silent breakage from future version updates. Locate the project's
dependency declaration file (typically pyproject.toml or setup.py) and update
the pydantic-ai constraint from "pydantic-ai>=1.99.0" to include an upper bound
such as "pydantic-ai>=1.99.0,<2.0.0" (adjusting the version number as
appropriate for stability), or alternatively add a comment documenting this
private API dependency as a known technical debt if keeping the unconstrained
version is intentional.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 366c0e5a-d8a1-4108-8bc4-3782ccd9ae4e

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7b589 and 98a3c0e.

📒 Files selected for processing (2)

src/pydantic_ai_lightspeed/llamastack/_model.py
src/pydantic_ai_lightspeed/llamastack/_transport.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 3

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/pydantic_ai_lightspeed/llamastack/_transport.py
src/pydantic_ai_lightspeed/llamastack/_model.py

🪛 ast-grep (0.43.0)

src/pydantic_ai_lightspeed/llamastack/_transport.py

[info] 194-194: use jsonify instead of json.dumps for JSON output
Context: json.dumps(convert_pydantic_to_json_value(chunk))
Note: Security best practice.

(use-jsonify)

🔇 Additional comments (5)

src/pydantic_ai_lightspeed/llamastack/_transport.py (2)

20-20: LGTM!

186-196: LGTM!

src/pydantic_ai_lightspeed/llamastack/_model.py (3)

117-173: LGTM!

213-218: LGTM!

230-242: LGTM!

asimurka · 2026-06-17T06:31:22Z

+                    "Streamed response ended without content or tool calls"
+                )
+
+            assert isinstance(first_chunk, responses.ResponseCreatedEvent)


@jrobertboos please remove assert from runtime validation

jrobertboos · 2026-06-17T14:56:41Z

/retest

tisnik

LGTM

fix

98a3c0e

jrobertboos force-pushed the lcore-2311-fix1 branch from 5e2de0e to 98a3c0e Compare June 16, 2026 16:20

jrobertboos marked this pull request as ready for review June 16, 2026 16:28

asimurka approved these changes Jun 16, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/pydantic_ai_lightspeed/llamastack/_model.py

Comment thread src/pydantic_ai_lightspeed/llamastack/_model.py

Comment thread src/pydantic_ai_lightspeed/llamastack/_model.py Outdated

asimurka requested changes Jun 17, 2026

View reviewed changes

address comments

62375b8

jrobertboos requested a review from asimurka June 17, 2026 15:00

asimurka approved these changes Jun 17, 2026

View reviewed changes

tisnik approved these changes Jun 17, 2026

View reviewed changes

tisnik merged commit 570a66e into lightspeed-core:main Jun 17, 2026
38 of 45 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-2311: Llama Stack Provider Fix#1936

LCORE-2311: Llama Stack Provider Fix#1936
tisnik merged 2 commits into
lightspeed-core:mainfrom
jrobertboos:lcore-2311-fix1

jrobertboos commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

asimurka left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

asimurka Jun 17, 2026

Uh oh!

jrobertboos Jun 17, 2026

Uh oh!

jrobertboos commented Jun 17, 2026

Uh oh!

tisnik left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jrobertboos commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

asimurka left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

asimurka Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

jrobertboos Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

jrobertboos commented Jun 17, 2026

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jrobertboos commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading