Gambiarra to accept tool calls.#559
Conversation
|
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a local patching mechanism to modify the bundled llama.cpp server at build time (without moving the submodule pointer) so it can accept OpenAI-style tools / tool_choice requests and translate strict JSON tool-call outputs into OpenAI-compatible message.tool_calls.
Changes:
- Introduces a Python patch applier that updates
llama.cppsources (including a CORS preflight tweak) prior to builds. - Adds a unified diff patch that adapts the llama.cpp server utilities for tool calls / tool results conversion.
- Updates the environment setup script to apply patches automatically before compiling.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| utils/apply_local_patches.py | Implements local unified-diff patch application and a direct CORS block replacement. |
| setup_env.py | Runs the local patch step automatically before build steps. |
| patches/llama-server-tools.patch | Patch content that modifies llama.cpp server utils to support tool calls and tool result formatting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
patches/llama-server-tools.patch:1
nlohmann::json::contains()throwstype_errorwhen the JSON value is not an object. If the model outputs valid JSON that’s an array/number/string (e.g.,[],"ok"), this will crash the server instead of returning “no tool calls”. Add an early guard likeif (!parsed.is_object()) return json::array();(or equivalent) before callingcontains().
diff --git a/examples/server/utils.hpp b/examples/server/utils.hpp
patches/llama-server-tools.patch:1
- This fallback treats any JSON object containing
"name"(or"function") as a tool call, which can misclassify normal “JSON-only” answers (common for structured output) as tool invocations and incorrectly setfinish_reason = "tool_calls". Tighten the detection criteria (e.g., requiretool_calls/tool_call, or requireargumentsalongsidename, and/or only attempt this upgrade when the request actually providedtools).
diff --git a/examples/server/utils.hpp b/examples/server/utils.hpp
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
patches/llama-server-tools.patch:1
- In non-streaming mode, the server will now reinterpret any response that happens to parse as certain JSON shapes as a tool call and emit
"content": null+message.tool_calls, even when the client did not sendtools/tool_choice. That’s a potentially breaking response-shape change for clients expecting plain text JSON responses. A safer approach is to gate this upgrade behind an explicit request signal (e.g., only whentoolswere provided, or when__oaicompat_toolsis present, or when a dedicated compatibility flag is set).
diff --git a/examples/server/server.cpp b/examples/server/server.cpp
| while i < len(patch_lines) and not patch_lines[i].startswith("@@ ") and not patch_lines[i].startswith("diff --git "): | ||
| line = patch_lines[i] | ||
| if line.startswith("\\ No newline"): | ||
| i += 1 | ||
| continue | ||
| if line == "": | ||
| old_lines.append("") | ||
| new_lines.append("") | ||
| i += 1 | ||
| continue | ||
|
|
||
| marker = line[:1] | ||
| value = line[1:] | ||
| if marker == " ": | ||
| old_lines.append(value) | ||
| new_lines.append(value) | ||
| elif marker == "-": | ||
| old_lines.append(value) | ||
| elif marker == "+": | ||
| new_lines.append(value) | ||
| else: | ||
| return None | ||
| i += 1 |
| def ensure_server_cors_patch() -> None: | ||
| if not SERVER_CPP.exists(): | ||
| print(f"Skipping llama.cpp CORS patch: file not found at {SERVER_CPP}") | ||
| return | ||
|
|
||
| content = SERVER_CPP.read_text(encoding="utf-8") | ||
| cors_comment = " // CORS preflight" | ||
| start = content.find(cors_comment) | ||
| if start == -1: | ||
| print("Failed to locate CORS preflight block in server.cpp", file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
||
| end_marker = " });" | ||
| end = content.find(end_marker, start) | ||
| if end == -1: | ||
| print("Failed to locate end of CORS preflight block in server.cpp", file=sys.stderr) | ||
| sys.exit(1) | ||
| end += len(end_marker) | ||
|
|
||
| current_block = content[start:end] | ||
| if "Access-Control-Request-Headers" in current_block: | ||
| print("llama.cpp CORS patch already applied") | ||
| return | ||
|
|
||
| required_markers = ( | ||
| "svr->Options", | ||
| "httplib::Request &", | ||
| "httplib::Response & res", | ||
| 'res.set_header("Access-Control-Allow-Methods"', | ||
| 'res.set_header("Access-Control-Allow-Headers"', | ||
| ) | ||
| if not all(marker in current_block for marker in required_markers): | ||
| print("Failed to locate expected CORS preflight lines in server.cpp", file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
| if status == "already": | ||
| already_count += 1 | ||
| continue | ||
| pending_writes.append((file_patch.target, new_content or "")) |
| +static std::string trim(const std::string & str) { | ||
| + const auto first = str.find_first_not_of(" \t\n\r"); | ||
| + if (first == std::string::npos) { | ||
| + return ""; | ||
| + } | ||
| + | ||
| + const auto last = str.find_last_not_of(" \t\n\r"); | ||
| + return str.substr(first, last - first + 1); | ||
| +} | ||
| + | ||
| +static json normalize_tool_call_arguments(const json & args) { | ||
| + if (args.is_string()) { | ||
| + try { | ||
| + return json::parse(args.get<std::string>()); | ||
| + } catch (const json::parse_error &) { | ||
| + return json::object({{"input", args.get<std::string>()}}); | ||
| + } | ||
| + } |
Adds a patch mechanism to support OpenAI-style tool calls in the bundled llama.cpp server without changing the submodule pointer.
Changes
Adds patches/llama-server-tools.patch.
Adds utils/apply_local_patches.py.
Updates setup_env.py to apply local patches before building.
Enables the server to accept tools / tool_choice requests instead of rejecting them.
Converts strict JSON tool-call responses into OpenAI-compatible message.tool_calls.
Handles both tool_calls and tool_call response shapes.
Converts tool result messages into user-visible observations so the model can produce a final answer.