Skip to content

Fix LoaderHeap's free list growing more than expected#129203

Merged
eduardo-vp merged 9 commits into
dotnet:mainfrom
eduardo-vp:xunit-reg-linux
Jun 18, 2026
Merged

Fix LoaderHeap's free list growing more than expected#129203
eduardo-vp merged 9 commits into
dotnet:mainfrom
eduardo-vp:xunit-reg-linux

Conversation

@eduardo-vp

@eduardo-vp eduardo-vp commented Jun 10, 2026

Copy link
Copy Markdown
Member

I was working with the test in https://github.com/korchak-aleksandr/net10-regression-repro and found out that the free list in UnlockedLoaderHeap grows to thousands of elements, which makes allocations very slow since we do a linear scan of this free list for each one of them.

In these scenarios multiple threads might need the same generic instantiation simultaneously and they all race to create/publish it. Multiple threads can lose the race and quickly add blocks to the free list since they don't need that memory. Subsequent calls that need generic instantiations do a linear scan of the free list to find a memory block to reuse. This ends up taking a lot of time due to its size (can be up to ten of thousands).

This PR stops making thread race to create/publish such that we don't insert several blocks in the free list.

.NET 10 .NET 10 + this PR
Time 91.2 s 3.9 s
Max RSS 114.1 MB 117.5 MB

Copilot AI review requested due to automatic review settings June 10, 2026 01:22

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures UnlockedLoaderHeap’s free list to reduce allocation-time overhead in backout-heavy scenarios by replacing a single linear-scanned free list with size-segregated buckets for common small block sizes plus an overflow list for larger blocks.

Changes:

  • Replaces m_pFirstFreeBlock with 32 size buckets (pointer-size increments) and a separate “large/overflow” free list.
  • Updates free-block insertion/allocation logic to use bucketed O(1) reuse for small sizes, and retains linear scanning only for the overflow list (including a stress-log warning on long scans).
  • Adjusts debug-only free-list dumping and validation to iterate across all buckets and the overflow list.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/coreclr/utilcode/loaderheap.cpp Implements bucket initialization, bucket-aware allocation/insertion, overflow scan warning, and updates debug dump/validation to traverse buckets.
src/coreclr/utilcode/loaderheap_shared.h Updates LoaderHeapFreeBlock API to no longer take an explicit head pointer (heap chooses bucket internally).
src/coreclr/inc/loaderheap.h Adds bucket/overflow free list fields and related constants to UnlockedLoaderHeap.

Comment thread src/coreclr/utilcode/loaderheap.cpp Outdated
Comment thread src/coreclr/utilcode/loaderheap.cpp Outdated
Comment thread src/coreclr/utilcode/loaderheap_shared.h Outdated
@jkotas

jkotas commented Jun 10, 2026

Copy link
Copy Markdown
Member

In these scenarios multiple threads might need the same generic instantiation simultaneously and they all race to create/publish it

Where is this race condition exactly? Can we add a lock there instead?

The freelist in LoaderHeap is meant to be only used in error conditions to backout types that failed to load, or to deal with rare race condition. If you see the freelist growing this much, it means that the loader heap is not used correctly. We should fix that instead.

@eduardo-vp

Copy link
Copy Markdown
Member Author

I'll take a look, this the part where many threads lose

InstantiatedMethodDesc *pOldMD = FindLoadedInstantiatedMethodDesc(pExactMT,
pGenericMDescInRepMT->GetMemberDef(),
methodInst,
getWrappedCode,
pGenericMDescInRepMT->IsAsyncVariantMethod());
if (pOldMD == NULL)
{
// No one else got there first, our MethodDesc wins.
amt.SuppressRelease();
#ifdef _DEBUG
SString name;
TypeString::AppendMethodDebug(name, pNewMD);
const char* pDebugNameUTF8 = name.GetUTF8();
const char* verb = "Created";
if (pWrappedMD)
LOG((LF_CLASSLOADER, LL_INFO1000,
"GENERICS: %s instantiating-stub method desc %s with dictionary size %d\n",
verb, pDebugNameUTF8, infoSize));
else
LOG((LF_CLASSLOADER, LL_INFO1000,
"GENERICS: %s instantiated method desc %s\n",
verb, pDebugNameUTF8));
S_SIZE_T safeLen = S_SIZE_T(strlen(pDebugNameUTF8))+S_SIZE_T(1);
if(safeLen.IsOverflow()) COMPlusThrowHR(COR_E_OVERFLOW);
size_t len = safeLen.Value();
pNewMD->m_pszDebugMethodName = (char*) (void*)pAllocator->GetLowFrequencyHeap()->AllocMem(safeLen);
_ASSERTE(pNewMD->m_pszDebugMethodName);
strcpy_s((char *) pNewMD->m_pszDebugMethodName, len, pDebugNameUTF8);
pNewMD->m_pszDebugClassName = pExactMT->GetDebugClassName();
pNewMD->m_pszDebugMethodSignature = (LPUTF8)pNewMD->m_pszDebugMethodName;
#endif // _DEBUG
// Generic methods can't be varargs. code:MethodTableBuilder::ValidateMethods should have checked it.
_ASSERTE(!pNewMD->IsVarArg());
// Verify that we are not creating redundant MethodDescs
_ASSERTE(!pNewMD->IsTightlyBoundToMethodTable());
// The method desc is fully set up; now add to the table
InstMethodHashTable* pTable = pExactMDLoaderModule->GetInstMethodHashTable();
pTable->InsertMethodDesc(pNewMD);
}
else
pNewMD = pOldMD;
// CrstHolder goes out of scope here
}

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/genmeth.cpp
Comment thread src/coreclr/vm/genmeth.cpp
@eduardo-vp eduardo-vp changed the title Add size-segregated buckets in UnlockedLoaderHeap Fix LoaderHeap's free list growing more than expected Jun 11, 2026
@eduardo-vp eduardo-vp requested a review from Copilot June 11, 2026 00:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 02:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread src/coreclr/vm/genmeth.cpp
Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp
Comment thread src/coreclr/vm/genmeth.cpp
Copilot AI review requested due to automatic review settings June 15, 2026 21:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@jkotas

jkotas commented Jun 15, 2026

Copy link
Copy Markdown
Member

Could you please run a perf test that tries to create many different method instantiations on multiple threads, and see whether taking a lock around MethodDesc creation makes it measurably slower?

If we find that it is getting measurably slower, we may want to do something else about it - e.g. move more of the work outside the lock.

@eduardo-vp

Copy link
Copy Markdown
Member Author

I used a script to create a Gen.cs file and test the creation of 150k different method instantiations.

public static class W
{
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M1<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M2<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M3<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M4<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M5<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
}

public struct S0 { }
public struct S1 { }
public struct S2 { }
public struct S3 { }
public struct S4 { }
// ...
public struct S29998 { }
public struct S29999 { }

public static class Gen
{
    public static readonly System.Action[] Actions = new System.Action[]
    {
        static () => { W.M1<S0>(); W.M2<S0>(); W.M3<S0>(); W.M4<S0>(); W.M5<S0>(); },
        static () => { W.M1<S1>(); W.M2<S1>(); W.M3<S1>(); W.M4<S1>(); W.M5<S1>(); },
        static () => { W.M1<S2>(); W.M2<S2>(); W.M3<S2>(); W.M4<S2>(); W.M5<S2>(); },
        static () => { W.M1<S3>(); W.M2<S3>(); W.M3<S3>(); W.M4<S3>(); W.M5<S3>(); },
        static () => { W.M1<S4>(); W.M2<S4>(); W.M3<S4>(); W.M4<S4>(); W.M5<S4>(); },
        // ...
        static () => { W.M1<S29998>(); W.M2<S29998>(); W.M3<S29998>(); W.M4<S29998>(); W.M5<S29998>(); },
        static () => { W.M1<S29999>(); W.M2<S29999>(); W.M3<S29999>(); W.M4<S29999>(); W.M5<S29999>(); },
    }
}

The main programs just runs a loop

Parallel.For(0, N, new ParallelOptions { MaxDegreeOfParallelism = Workers }, i => actions[i]());

net10 takes ~690 ms while .net10 + this change takes ~705 ms (around 2% slower in this extreme case). I think this should be fine.

@eduardo-vp

Copy link
Copy Markdown
Member Author

Double checked the perf improvement for the original benchmark too. The CI errors are timeouts only.

@jkotas jkotas added the tenet-performance Performance related issue label Jun 18, 2026

@jkotas jkotas left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@eduardo-vp

Copy link
Copy Markdown
Member Author

/ba-g timeouts

@eduardo-vp eduardo-vp merged commit 33dc710 into dotnet:main Jun 18, 2026
117 of 121 checks passed
@eduardo-vp

Copy link
Copy Markdown
Member Author

/backport to release/10.0

@github-actions

Copy link
Copy Markdown
Contributor

Started backporting to release/10.0 (link to workflow run)

@github-actions

Copy link
Copy Markdown
Contributor

@eduardo-vp backporting to release/10.0 failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Add size-segregated buckets
Applying: Remove race in NewInstantiatedMethodDesc
Applying: Revert "Add size-segregated buckets"
Applying: Place constraints check outside of the lock
Applying: Do the constraint check upfront
error: sha1 information is lacking or useless (src/coreclr/vm/genmeth.cpp).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0005 Do the constraint check upfront
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-VM-coreclr tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants