
Performance regression in Unicode.String.Break.next/4 starting from v1.3.0 #6

Description

@mntns

After updating from v1.2.1 to v1.3.1, we noticed a performance regression in Unicode.String.Break.next/4, which is now slower by a factor of ~50. The regression is still present in v1.4.0.

I did some preliminary profiling and found that a lot of time seems to be spent in regex compilation.
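To illustrate why regex compilation in a hot path could matter, here is a minimal sketch that times compiling a regex on every call against compiling it once and reusing it. The pattern, input, and iteration count are assumptions chosen only for illustration. This is not taken from the unicode_string source, and it does not show that the library recompiles per call.

```elixir
# Hypothetical pattern and input; NOT taken from the unicode_string source.
source = "\\p{L}+"
input = "test123 "
iterations = 1_000

# Variant A: compile the regex inside the loop, on every call.
{recompile_usecs, _} =
  :timer.tc(fn ->
    Enum.each(1..iterations, fn _ ->
      regex = Regex.compile!(source, [:unicode])
      Regex.run(regex, input)
    end)
  end)

# Variant B: compile once up front and reuse the compiled regex.
compiled = Regex.compile!(source, [:unicode])

{reuse_usecs, _} =
  :timer.tc(fn ->
    Enum.each(1..iterations, fn _ -> Regex.run(compiled, input) end)
  end)

IO.inspect(recompile_usecs, label: "recompile usecs")
IO.inspect(reuse_usecs, label: "reuse usecs")
```

If per-call compilation is what the profile is showing, I would expect the gap between the two variants to grow with the complexity of the pattern, though I have not measured that for the break rules themselves.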

Reproduction

# Usage: elixir repro.exs "<version requirement>"
[unicode_string_ver] = System.argv()

Mix.install([
  {:unicode_string, unicode_string_ver}
])

# Time 100 calls to Break.next/4 and print the total in microseconds.
:timer.tc(fn ->
  Enum.each(1..100, fn _ ->
    {_, _} = Unicode.String.Break.next("test123 ", "root", :word, [])
  end)
end)
|> elem(0)
|> IO.inspect(label: "usecs")

> elixir repro.exs "~> 1.2.1"
usecs: 28131

> elixir repro.exs "~> 1.3.1"
usecs: 1456437

> elixir repro.exs "~> 1.4.0"
usecs: 1431304

Erlang/OTP 25 [erts-13.2.2.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]

Elixir 1.15.7 (compiled with Erlang/OTP 25)
