Optimize splitting by semicolons #1314

Merged: josevalim merged 1 commit into elixir-plug:main from preciz:optimization30 on Jun 15, 2026
preciz commented Jun 15, 2026

Contributor

Assisted by: Antigravity CLI : Gemini 3.5 Flash

Benchmarks show a 2.1x to 2.6x speedup and 2.8x to 4.8x lower memory usage, depending on header length.

Bench:

Mix.install([
  {:benchee, "~> 1.0"}
])

defmodule Original do
  def split_semicolon(t) do
    split_semicolon(t, "", [], false)
  end

  defp split_semicolon(<<>>, <<>>, acc, _), do: acc
  defp split_semicolon(<<>>, buffer, acc, _), do: [buffer | acc]

  defp split_semicolon(<<?", rest::binary>>, buffer, acc, quoted?),
    do: split_semicolon(rest, <<buffer::binary, ?">>, acc, not quoted?)

  defp split_semicolon(<<?;, rest::binary>>, buffer, acc, false),
    do: split_semicolon(rest, <<>>, [buffer | acc], false)

  defp split_semicolon(<<char, rest::binary>>, buffer, acc, quoted?),
    do: split_semicolon(rest, <<buffer::binary, char>>, acc, quoted?)
end

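defmodule Optimized do
  # Walks the input once, tracking where the current part starts (`start`)
  # and how many bytes it spans (`len`), instead of appending to a buffer.
  def split_semicolon(t) do
    split_semicolon(t, t, 0, [], false, 0)
  end

  # A double quote toggles the quoted state and counts toward the current part.
  defp split_semicolon(<<?", rest::binary>>, original, start, acc, quoted?, len) do
    split_semicolon(rest, original, start, acc, not quoted?, len + 1)
  end

  # An unquoted semicolon ends the current part; the next part starts after it.
  defp split_semicolon(<<?;, rest::binary>>, original, start, acc, false, len) do
    part = binary_part(original, start, len)
    split_semicolon(rest, original, start + len + 1, [part | acc], false, 0)
  end

  # Any other byte (including a semicolon inside quotes) extends the current part.
  defp split_semicolon(<<_, rest::binary>>, original, start, acc, quoted?, len) do
    split_semicolon(rest, original, start, acc, quoted?, len + 1)
  end

  # End of input with no pending bytes: nothing more to emit.
  defp split_semicolon(<<>>, _original, _start, acc, _quoted?, 0) do
    acc
  end

  # End of input with pending bytes: emit the final part.
  defp split_semicolon(<<>>, original, start, acc, _quoted?, len) do
    [binary_part(original, start, len) | acc]
  end
end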

# Sanity check: both implementations must agree on a header with a quoted semicolon
sample = "foo=\"bar; baz\"; qux=123"
original_result = Original.split_semicolon(sample)
optimized_result = Optimized.split_semicolon(sample)

unless original_result == optimized_result do
  raise "Sanity check failed! Original: #{inspect(original_result)}, Optimized: #{inspect(optimized_result)}"
end

Benchee.run(
  %{
    "original split_semicolon" => fn {input} -> Original.split_semicolon(input) end,
    "optimized split_semicolon" => fn {input} -> Optimized.split_semicolon(input) end
  },
  inputs: %{
    "Short Header" => {"foo=bar; baz=qux"},
    "Long Header" => {String.duplicate("foo=bar; baz=qux; foo=\"bar; baz\"; ", 20)}
  },
  time: 2,
  memory_time: 2
)

Results on a noisy system:

Operating System: Linux
CPU Information: AMD Ryzen 7 8845HS w
Number of Available Cores: 16
Available memory: 54.72 GB
Elixir 1.20.0
Erlang 29.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: Long Header, Short Header
Estimated total run time: 24 s
Excluding outliers: false

Benchmarking optimized split_semicolon with input Long Header ...
Benchmarking optimized split_semicolon with input Short Header ...
Benchmarking original split_semicolon with input Long Header ...
Benchmarking original split_semicolon with input Short Header ...
Calculating statistics...
Formatting results...

##### With input Long Header #####
Name                                ips        average  deviation         median         99th %
optimized split_semicolon      425.50 K        2.35 μs   ±252.09%        2.24 μs        4.19 μs
original split_semicolon       164.29 K        6.09 μs    ±62.94%        5.62 μs       19.89 μs

Comparison:
optimized split_semicolon      425.50 K
original split_semicolon       164.29 K - 2.59x slower +3.74 μs

Memory usage statistics:

Name                         Memory usage
optimized split_semicolon         0.99 KB
original split_semicolon          4.80 KB - 4.84x memory usage +3.81 KB

**All measurements for memory usage were the same**

##### With input Short Header #####
Name                                ips        average  deviation         median         99th %
optimized split_semicolon       11.54 M       86.66 ns  ±3032.88%          80 ns         161 ns
original split_semicolon         5.43 M      184.07 ns   ±422.95%         171 ns         321 ns

Comparison:
optimized split_semicolon       11.54 M
original split_semicolon         5.43 M - 2.12x slower +97.41 ns

Memory usage statistics:

Name                         Memory usage
optimized split_semicolon            72 B
original split_semicolon            200 B - 2.78x memory usage +128 B

**All measurements for memory usage were the same**
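
The sanity check in the script covers a single input. As a minimal sketch (not part of the PR), the two approaches can also be compared on edge cases such as empty parts, a trailing semicolon, and an unterminated quote. `SplitSketch`, `by_buffer`, and `by_offset` are hypothetical names used only for illustration; the clauses mirror `Original` and `Optimized` above.

```elixir
defmodule SplitSketch do
  # Hypothetical module for illustration only; not part of Plug.

  # Buffer-based approach: appends every byte to a growing binary.
  def by_buffer(input), do: by_buffer(input, "", [], false)

  defp by_buffer(<<>>, <<>>, acc, _), do: acc
  defp by_buffer(<<>>, buffer, acc, _), do: [buffer | acc]

  defp by_buffer(<<?", rest::binary>>, buffer, acc, quoted?),
    do: by_buffer(rest, <<buffer::binary, ?">>, acc, not quoted?)

  defp by_buffer(<<?;, rest::binary>>, buffer, acc, false),
    do: by_buffer(rest, <<>>, [buffer | acc], false)

  defp by_buffer(<<char, rest::binary>>, buffer, acc, quoted?),
    do: by_buffer(rest, <<buffer::binary, char>>, acc, quoted?)

  # Offset-based approach: tracks start and length, slices once per part.
  def by_offset(input), do: by_offset(input, input, 0, [], false, 0)

  defp by_offset(<<?", rest::binary>>, original, start, acc, quoted?, len),
    do: by_offset(rest, original, start, acc, not quoted?, len + 1)

  defp by_offset(<<?;, rest::binary>>, original, start, acc, false, len) do
    part = binary_part(original, start, len)
    by_offset(rest, original, start + len + 1, [part | acc], false, 0)
  end

  defp by_offset(<<_, rest::binary>>, original, start, acc, quoted?, len),
    do: by_offset(rest, original, start, acc, quoted?, len + 1)

  defp by_offset(<<>>, _original, _start, acc, _quoted?, 0), do: acc

  defp by_offset(<<>>, original, start, acc, _quoted?, len),
    do: [binary_part(original, start, len) | acc]
end

# Edge cases: empty input, lone and trailing/leading semicolons, an empty
# middle part, a quoted semicolon, and a quote that is never closed.
edge_cases = ["", ";", "a;", ";a", "a;;b", "a=\"b;c\";d", "a=\"unterminated;b"]

for input <- edge_cases do
  unless SplitSketch.by_buffer(input) == SplitSketch.by_offset(input) do
    raise "mismatch for #{inspect(input)}"
  end
end

# Parts come back in reverse order, and empty parts between semicolons are kept.
["b", "", "a"] = SplitSketch.by_offset("a;;b")
```

Both functions return the parts in reverse order, as the benchmarked versions do. The offset variant calls `binary_part/3` once per part instead of rebuilding a buffer byte by byte, which is a plausible source of the lower memory figures, though that explanation is an inference rather than something the PR states.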

@josevalim josevalim merged commit 74b5cda into elixir-plug:main Jun 15, 2026
2 checks passed
@josevalim

Member

💚 💙 💜 💛 ❤️
