Gambiarra to accept tool calls. by rivaldodev · Pull Request #559 · microsoft/BitNet

rivaldodev · 2026-05-19T14:04:31Z

Adds a patch mechanism to support OpenAI-style tool calls in the bundled llama.cpp server without changing the submodule pointer.

Changes

Adds patches/llama-server-tools.patch.
Adds utils/apply_local_patches.py.
Updates setup_env.py to apply local patches before building.
Enables the server to accept tools / tool_choice requests instead of rejecting them.
Converts strict JSON tool-call responses into OpenAI-compatible message.tool_calls.
Handles both tool_calls and tool_call response shapes.
Converts tool result messages into user-visible observations so the model can produce a final answer.

rivaldodev · 2026-05-19T14:07:56Z

@microsoft-github-policy-service agree

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a local patching mechanism to modify the bundled llama.cpp server at build time (without moving the submodule pointer) so it can accept OpenAI-style tools / tool_choice requests and translate strict JSON tool-call outputs into OpenAI-compatible message.tool_calls.

Changes:

Introduces a Python patch applier that updates llama.cpp sources (including a CORS preflight tweak) prior to builds.
Adds a unified diff patch that adapts the llama.cpp server utilities for tool calls / tool results conversion.
Updates the environment setup script to apply patches automatically before compiling.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
utils/apply_local_patches.py	Implements local unified-diff patch application and a direct CORS block replacement.
setup_env.py	Runs the local patch step automatically before build steps.
patches/llama-server-tools.patch	Patch content that modifies llama.cpp server utils to support tool calls and tool result formatting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (2)

patches/llama-server-tools.patch:1

nlohmann::json::contains() throws type_error when the JSON value is not an object. If the model outputs valid JSON that’s an array/number/string (e.g., [], "ok"), this will crash the server instead of returning “no tool calls”. Add an early guard like if (!parsed.is_object()) return json::array(); (or equivalent) before calling contains().

diff --git a/examples/server/utils.hpp b/examples/server/utils.hpp

patches/llama-server-tools.patch:1

This fallback treats any JSON object containing "name" (or "function") as a tool call, which can misclassify normal “JSON-only” answers (common for structured output) as tool invocations and incorrectly set finish_reason = "tool_calls". Tighten the detection criteria (e.g., require tool_calls/tool_call, or require arguments alongside name, and/or only attempt this upgrade when the request actually provided tools).

diff --git a/examples/server/utils.hpp b/examples/server/utils.hpp

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

patches/llama-server-tools.patch:1

In non-streaming mode, the server will now reinterpret any response that happens to parse as certain JSON shapes as a tool call and emit "content": null + message.tool_calls, even when the client did not send tools / tool_choice. That’s a potentially breaking response-shape change for clients expecting plain text JSON responses. A safer approach is to gate this upgrade behind an explicit request signal (e.g., only when tools were provided, or when __oaicompat_tools is present, or when a dedicated compatibility flag is set).

diff --git a/examples/server/server.cpp b/examples/server/server.cpp

+            while i < len(patch_lines) and not patch_lines[i].startswith("@@ ") and not patch_lines[i].startswith("diff --git "):
+                line = patch_lines[i]
+                if line.startswith("\\ No newline"):
+                    i += 1
+                    continue
+                if line == "":
+                    old_lines.append("")
+                    new_lines.append("")
+                    i += 1
+                    continue
+
+                marker = line[:1]
+                value = line[1:]
+                if marker == " ":
+                    old_lines.append(value)
+                    new_lines.append(value)
+                elif marker == "-":
+                    old_lines.append(value)
+                elif marker == "+":
+                    new_lines.append(value)
+                else:
+                    return None
+                i += 1


+def ensure_server_cors_patch() -> None:
+    if not SERVER_CPP.exists():
+        print(f"Skipping llama.cpp CORS patch: file not found at {SERVER_CPP}")
+        return
+
+    content = SERVER_CPP.read_text(encoding="utf-8")
+    cors_comment = "    // CORS preflight"
+    start = content.find(cors_comment)
+    if start == -1:
+        print("Failed to locate CORS preflight block in server.cpp", file=sys.stderr)
+        sys.exit(1)
+
+    end_marker = "    });"
+    end = content.find(end_marker, start)
+    if end == -1:
+        print("Failed to locate end of CORS preflight block in server.cpp", file=sys.stderr)
+        sys.exit(1)
+    end += len(end_marker)
+
+    current_block = content[start:end]
+    if "Access-Control-Request-Headers" in current_block:
+        print("llama.cpp CORS patch already applied")
+        return
+
+    required_markers = (
+        "svr->Options",
+        "httplib::Request &",
+        "httplib::Response & res",
+        'res.set_header("Access-Control-Allow-Methods"',
+        'res.set_header("Access-Control-Allow-Headers"',
+    )
+    if not all(marker in current_block for marker in required_markers):
+        print("Failed to locate expected CORS preflight lines in server.cpp", file=sys.stderr)
+        sys.exit(1)
+


+        if status == "already":
+            already_count += 1
+            continue
+        pending_writes.append((file_patch.target, new_content or ""))


+static std::string trim(const std::string & str) {
+    const auto first = str.find_first_not_of(" \t\n\r");
+    if (first == std::string::npos) {
+        return "";
+    }
+
+    const auto last = str.find_last_not_of(" \t\n\r");
+    return str.substr(first, last - first + 1);
+}
+
+static json normalize_tool_call_arguments(const json & args) {
+    if (args.is_string()) {
+        try {
+            return json::parse(args.get<std::string>());
+        } catch (const json::parse_error &) {
+            return json::object({{"input", args.get<std::string>()}});
+        }
+    }


Gambiarra to accept tool calls.

9b76190

Copilot AI review requested due to automatic review settings May 19, 2026 14:04

Copilot AI reviewed May 19, 2026

View reviewed changes

Copilot started reviewing on behalf of rivaldodev May 19, 2026 14:17 View session

feat: enhance CORS patching and refactor patch application logic

7a0f8af

rivaldodev requested a review from Copilot May 19, 2026 14:34

Copilot AI reviewed May 19, 2026

View reviewed changes

Comment thread utils/apply_local_patches.py Outdated

Comment thread utils/apply_local_patches.py

Comment thread utils/apply_local_patches.py

Comment thread setup_env.py

Copilot started reviewing on behalf of rivaldodev May 19, 2026 14:55 View session

Address Copilot patch review feedback

dca304a

rivaldodev requested a review from Copilot May 19, 2026 15:00

Copilot started reviewing on behalf of rivaldodev May 19, 2026 15:09 View session

Handle escaped tool call JSON

acd69a4

Copilot AI reviewed May 19, 2026

View reviewed changes

rivaldodev requested a review from Copilot May 19, 2026 15:26

Copilot started reviewing on behalf of rivaldodev May 19, 2026 15:28 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

Support streaming tool call conversion

709ec29

rivaldodev requested a review from Copilot May 22, 2026 02:54

Copilot AI reviewed May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gambiarra to accept tool calls.#559

Gambiarra to accept tool calls.#559
rivaldodev wants to merge 5 commits into
microsoft:mainfrom
rivaldodev:gambiarra

rivaldodev commented May 19, 2026

Uh oh!

rivaldodev commented May 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rivaldodev commented May 19, 2026

Uh oh!

rivaldodev commented May 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants