Skip to content

Ascii.Equals and Ascii.EqualsIgnoreCase#84886

Merged
jeffhandley merged 6 commits into
dotnet:mainfrom
adamsitnik:asciiEquals
Apr 22, 2023
Merged

Ascii.Equals and Ascii.EqualsIgnoreCase#84886
jeffhandley merged 6 commits into
dotnet:mainfrom
adamsitnik:asciiEquals

Conversation

@adamsitnik

@adamsitnik adamsitnik commented Apr 15, 2023

Copy link
Copy Markdown
Member

fixes #84887

@ghost ghost assigned adamsitnik Apr 15, 2023
@ghost ghost added needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners new-api-needs-documentation labels Apr 15, 2023
@ghost

ghost commented Apr 15, 2023

Copy link
Copy Markdown

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@adamsitnik adamsitnik added area-System.Text.Encoding and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 15, 2023
@ghost

ghost commented Apr 15, 2023

Copy link
Copy Markdown

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: adamsitnik
Assignees: adamsitnik
Labels:

area-System.Text.Encoding, new-api-needs-documentation

Milestone: -

Comment thread src/libraries/System.Net.Http/src/System/Net/Http/ByteArrayHelpers.cs Outdated
Comment thread src/libraries/System.Text.Encoding/tests/Ascii/EqualsTests.cs Outdated
Comment thread src/libraries/System.Text.Encoding/tests/Ascii/EqualsTests.cs Outdated
Comment thread src/libraries/System.Text.Encoding/tests/Ascii/EqualsTests.cs Outdated

namespace System.Text.Tests
{
public static class EqualsTests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a could tests that show real-world use cases like what we see in networking or aspnetcore, comparing against a constant. Such tests could use a non-ascii utf8 sequence that could match something provided by client input in one of those scenarios.

for (int i = 0; i < right.Length; i++)
{
char c = right[i];
byte b = left[i];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about the bound check here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've already confirmed left.Length == right.Length above, I think we're OK without another bounds check here.

Comment thread src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Equality.cs Outdated
Comment thread src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Equality.cs Outdated
{
Debug.Assert(left.Length == right.Length);

for (int i = 0; i < left.Length; i++)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is no vectorization needed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps vectorization here would be a further perf opportunity that was missed. I recommend we proceed with mering this without it and potentially follow up with vectorizing it in a separate PR.

}
else
{
(Vector64<ushort> lower, Vector64<ushort> upper) = Vector64.Widen(bytes);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Vector64 uses the software fallback on x64.
crossgen2 with main-branch shows that too.

I don't know if there are plans or WIP to hw-accelerate Vector64 in the .NET 8 timeframe.
Otherwise maybe read the value as long into a Vector128 and do the widening then. Something like

static Vector128<ushort> LoadAndWiden(ref byte ptr)
{
    if (AdvSimd.IsSupported)
    {
        Vector64<byte> vec64 = Vector64.LoadUnsafe(ref ptr);
        return AdvSimd.ZeroExtendWideningLower(vec64);
    }
    else if (Sse2.IsSupported)
    {
        ulong value = Unsafe.ReadUnaligned<ulong>(ref ptr);
        Vector128<byte> vec = Vector128.CreateScalarUnsafe(value).AsByte();
        return Sse2.UnpackLow(vec, Vector128<byte>.Zero).AsUInt16();
    }
    else
    {
        Vector64<byte> vec64 = Vector64.LoadUnsafe(ref ptr);
        (Vector64<ushort> lower, Vector64<ushort> upper) = Vector64.Widen(vec64);
        return Vector128.Create(lower, upper);
    }
}

}

private static bool VectorContainsNonAsciiChar(Vector64<byte> bytes)
=> !Utf8Utility.AllBytesInUInt64AreAscii(bytes.AsUInt64().ToScalar());

@gfoidl gfoidl Apr 17, 2023

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern here for Vector64. Pass in ulong as Utf8Utility.AllBytesInUInt64AreAscii expects that.
So get rid of Vector64 where possible and use ulong instead?

Edit: the vector is stored to stack, then the ulong read from there. Not ideal, but not too bad.

Co-authored-by: Günther Foidl <gue@korporal.at>
@adamsitnik adamsitnik marked this pull request as ready for review April 21, 2023 18:23
@adamsitnik

Copy link
Copy Markdown
Member Author

@gfoidl thank you for your suggestions and ideas! I won't have the time to implement all perf improvements now (I started paternity leave very recently and now I just want to get this issue off my TODO list). Would you be interested in sending a PR with improvements once this PR is merged?

@gfoidl

gfoidl commented Apr 21, 2023

Copy link
Copy Markdown
Member

Would you be interested in sending a PR with improvements once this PR is merged?

Yep, I can do the follow up.

paternity leave

Congratulations!

@jeffhandley jeffhandley left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @adamsitnik requested we merge this without the vectorization improvements recommended by @gfoidl so that we can get in ahead of the Preview 4 snap (and while Adam is unavailable), and then potentially follow up with an additional PR to improve the vectorization.

@gfoidl -- would you like to submit a follow-up PR with those recommendations you made?

for (int i = 0; i < right.Length; i++)
{
char c = right[i];
byte b = left[i];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've already confirmed left.Length == right.Length above, I think we're OK without another bounds check here.

{
Debug.Assert(left.Length == right.Length);

for (int i = 0; i < left.Length; i++)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps vectorization here would be a further perf opportunity that was missed. I recommend we proceed with mering this without it and potentially follow up with vectorizing it in a separate PR.

@gfoidl

gfoidl commented Apr 21, 2023

Copy link
Copy Markdown
Member

@jeffhandley yes I'll do a follow up PR to address the perf-points (maybe soon, otherwise in about 2 weeks as rough timeframe).
Bring this PR in...

@jeffhandley

Copy link
Copy Markdown
Member

Awesome; thank you! I'll merge when green.

@jeffhandley jeffhandley merged commit 880d44c into dotnet:main Apr 22, 2023
gfoidl added a commit to gfoidl/dotnet-runtime that referenced this pull request May 8, 2023
@gfoidl

gfoidl commented May 8, 2023

Copy link
Copy Markdown
Member

@jeffhandley follow up PR is here: #85926

@ghost ghost locked as resolved and limited conversation to collaborators Jun 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ascii.Equals

5 participants