Skip to content

Use patchpointinfo for OSR runtime helper#123645

Closed
am11 wants to merge 18 commits into
dotnet:mainfrom
am11:feature/deterministic-unwinding
Closed

Use patchpointinfo for OSR runtime helper#123645
am11 wants to merge 18 commits into
dotnet:mainfrom
am11:feature/deterministic-unwinding

Conversation

@am11

@am11 am11 commented Jan 26, 2026

Copy link
Copy Markdown
Member

On ARM64/LoongArch64/RISC-V64, OSR transitions were reading callee-saved register values from the TransitionBlock, which contains the register values at the moment JIT_Patchpoint was called. Under register stress (or normal optimization), Tier0 may reuse callee-saved registers as temporaries, so these values can be garbage rather than the preserved values the caller expects. This fix reads callee-saves from Tier0's stack save area using offset information recorded in PatchpointInfo during Tier0 compilation; the same location where Tier0 saved the caller's original values in its prolog. x64 doesn't need this because its JIT epilog explicitly pops Tier0's callee-saves from the stack, but bringing it to the same plan for consistency.

Closes #123608
Closes #123605

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 26, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Jan 26, 2026
@AndyAyersMS

Copy link
Copy Markdown
Member

Can you point out an example where this happens? OSR isn't supported in optimized code, and in unoptimized code all the live state at a patchpoint should be on the stack.

Sounds like register stress is doing something wrong.

@am11

am11 commented Jan 26, 2026

Copy link
Copy Markdown
Member Author

The issue isn't about OSR's live state, but about caller's callee-saved registers that Tier0 must preserve.

Scenario: Test() calls Problem() with x19/x20 holding values. Tier0 saves these in prolog, then under JitStressRegs reuses x19/x20 as scratch. When patchpoint hits, TransitionBlock captures the current register values (scratch), not the saved originals. OSR returns → Test() sees corrupted registers.

Repro: JIT/opt/OSR/Runtime_89666 with DOTNET_JitStressRegs=3 on ARM64

x64 doesn't hit this because its OSR epilog explicitly pops Tier0's callee-saves from stack. Register stress is behaving correctly; the fix reads callee-saves from Tier0's stack save area instead of TransitionBlock.

@AndyAyersMS

Copy link
Copy Markdown
Member

Ah, ok.

We should move these other archs to the same plan as x64 (see #33658), but that is likely easier said than done. But perhaps some of the things that made it hard before have since been fixed.

I went looking for non-stressregs failures in this weekend's runs. There is an x64 failure here, though given the above it is likely something different.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=1266373&view=ms.vss-test-web.build-test-results-tab&runId=35337758&paneView=debug

net11.0-windows-Release-x64-jitosr_stress_random

DOTNET_JitRandomOnStackReplacement=15
DOTNET_OSR_HitLimit=2
DOTNET_TC_OnStackReplacement=1
DOTNET_TC_OnStackReplacement_InitialCounter=1
DOTNET_TC_QuickJitForLoops=1
DOTNET_TieredCompilation=1

  Discovering: System.Net.Http.WinHttpHandler.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Http.WinHttpHandler.Functional.Tests (found 368 of 555 test cases)
  Starting:    System.Net.Http.WinHttpHandler.Functional.Tests (parallel test collections = on [4 threads], stop on fail = of
....
  System.Net.Http.Functional.Tests.PlatformHandler_HttpClientHandler_ClientCertificates_Http2_Test.Manual_CertificateOnlySentWhenValid_Success(certIndex: 3, serverExpectsClientCertificate: False) [SKIP]
      https://github.com/dotnet/runtime/issues/69238
Fatal error.
System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Http.Functional.Tests.HttpClientHandler_Proxy_Test.ProxyTunnelRequest_MaxConnectionsSetButDoesNotApplyToProxyConnect_Success()
Repeated 2 times:
--------------------------------
   at System.RuntimeMethodHandle.InvokeMethod(System.Runtime.CompilerServices.ObjectHandleOnStack, Void**, System.Runtime.CompilerServices.ObjectHandleOnStack, BOOL, System.Runtime.CompilerServices.ObjectHandleOnStack)
--------------------------------
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(System.Object, IntPtr*)
   at System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)

@am11

am11 commented Jan 27, 2026

Copy link
Copy Markdown
Member Author

NativeAOT size regression failure is #123667 (unrelated; failing since 19+ hours in main).

@AndyAyersMS, @jakobbotsch, could you please run /azp run runtime-coreclr jitstressregs and /azp run runtime-coreclr jitstress2-jitstressregs pipelines against this PR?


@AndyAyersMS, I tried running a few libs tests with

env:DOTNET_TC_OnStackReplacement_InitialCounter=1;
$env:DOTNET_TC_OnStackReplacement=1;
$env:DOTNET_OSR_HitLimit=2;
$env:DOTNET_JitRandomOnStackReplacement=15;
$env:DOTNET_TC_OnStackReplacement=1;
$env:DOTNET_TC_QuickJitForLoops=1;
$env:DOTNET_TieredCompilation=1;

on win-x64, but couldn't make it fail locally. We can open a separate issue to investigate PGO pipeline issue, I will take a look after this stress test thing is done. Having a corerun'able repro would help, I will run a few src/tests with these flags on before/after once we get there.

We should move these other archs to the same plan as x64 (see #33658), but that is likely easier said than done. But perhaps some of the things that made it hard before have since been fixed.

This PR is bringing others partially to x64 plan (JIT side is now same as it populates the patchpointinfo, helper deviates at where to reads info from TransitionBlock on x64 for both modified and unmodified registers and ppInfo on arm64 etc. for modified registers), the remaining partial difference could be resolved by putting x64 to the new plan, but either way, as you said, it's a bit tricky to consolidate. :)

@jakobbotsch

Copy link
Copy Markdown
Member

/azp run runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

@AndyAyersMS

Copy link
Copy Markdown
Member

Do we have a theory on the x64 failures? @jakobbotsch and I were chatting and thought perhaps x64 was insulated... but it appears not.

@am11

am11 commented Jan 27, 2026

Copy link
Copy Markdown
Member Author

I just pushed a commit bringing to the same plan, lets see. I wasn't able to repro one reported issue #123671 (comment) with and without CET, but @janvorli has a private repro with diagnostics tool.

@am11 am11 force-pushed the feature/deterministic-unwinding branch from d197992 to 3ed158b Compare January 28, 2026 08:01
@am11 am11 changed the title Use patchpointinfo for OSR on non-xarch ISAs Use patchpointinfo for OSR runtime helper Jan 28, 2026
@jakobbotsch

Copy link
Copy Markdown
Member

This PR should simplify my job for #120865 as well.

Comment thread src/coreclr/vm/jithelpers.cpp Outdated
@janvorli

janvorli commented Feb 2, 2026

Copy link
Copy Markdown
Member
  • On x64, make it the OSR function's responsibility to push a fake return address on entry

Such a thing would not work with CET enabled (unless the fake return address is just a temporary thing and would not be taken by the CPU).

@AndyAyersMS

AndyAyersMS commented Feb 2, 2026

Copy link
Copy Markdown
Member
  • On x64, make it the OSR function's responsibility to push a fake return address on entry

Such a thing would not work with CET enabled (unless the fake return address is just a temporary thing and would not be taken by the CPU).

The value being pushed doesn't matter, it's not really a return address -- that value is held in the Tier0 frame. The push ensures the OSR method has proper SP alignment on entry, since we are jumping to it instead of calling it.

@janvorli

janvorli commented Feb 2, 2026

Copy link
Copy Markdown
Member

Ok, then it is fine.

@janvorli

janvorli commented Feb 3, 2026

Copy link
Copy Markdown
Member

@am11 I've verified that the internal diagnostic tests don't show any regression with this change.

@am11

am11 commented Feb 3, 2026

Copy link
Copy Markdown
Member Author

Thank you @janvorli. We are trying out Jakob's alternative approach; make JIT jump into OSR and simplify the helper without needing to unwind and adjust the context: #123894.

@am11

am11 commented Feb 4, 2026

Copy link
Copy Markdown
Member Author

@jakobbotsch, while the x64 legs are passing, the arm64 library legs are failing in #123894. The arm64 Pri0 leg passed, but xUnit execution failed. This is likely due to missing frame 1–2 handling compared to frames 3–5, similar to what we addressed here, but it’s taking a while to fully root-cause.

I think we’ll need unwinding support, or some variant of this PR, on that side as well.

Given that, would it make sense to move forward with this PR now and upgrade to the newer plan later? My original goal was to eliminate (or at least reduce) VirtualUnwind* calls and ultimately remove the libunwind dependency on Unix. :)

@am11 am11 marked this pull request as ready for review February 4, 2026 18:25
@am11 am11 requested review from jakobbotsch and janvorli February 5, 2026 10:11
@jakobbotsch

Copy link
Copy Markdown
Member

I think we’ll need unwinding support, or some variant of this PR, on that side as well.

Why do you think that? I would not expect we would need to call any runtime code to transition when the scheme is switched to work like x64.

@am11

am11 commented Feb 5, 2026

Copy link
Copy Markdown
Member Author

Why do you think that? I would not expect we would need to call any runtime code to transition when the scheme is switched to work like x64.

Interestingly, only a small subset of lib tests are failing in the other PR currently, while the runtime tests (including src/tests/JIT/opt/OSR) are all passing. AFAICT, the failure pattern suggests it corrupts x19 reg and we need to restore it in epilogue; consequently, while the OSR transition succeeds but the exception handling after the transition sees x19 reg corrupted and throws access violation. Rerunning the same test shows different error. If I keep VirtualUnwindToFirstManagedFrame heavy handed call in jithelper, it fixes the failure. It's probably something wrong in the epilogue part or the missing frame-type based handling (like we've done in this PR: 16fac4b), but I was having a hard time figuring it out. -.-

Comment thread src/coreclr/vm/excep.cpp
Comment on lines +11300 to +11319
// Registers are pushed from highest to lowest register number (after RBP)
// RBP is not in this loop - it's handled separately
// Register encoding: RAX=0, RCX=1, RDX=2, RBX=3, RSP=4, RBP=5, RSI=6, RDI=7, R8-R15=8-15
if (calleeSaveMask & (1ULL << 15)) { pContext->R15 = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->R15 = pTransitionBlock->m_calleeSavedRegisters.R15; }
if (calleeSaveMask & (1ULL << 14)) { pContext->R14 = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->R14 = pTransitionBlock->m_calleeSavedRegisters.R14; }
if (calleeSaveMask & (1ULL << 13)) { pContext->R13 = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->R13 = pTransitionBlock->m_calleeSavedRegisters.R13; }
if (calleeSaveMask & (1ULL << 12)) { pContext->R12 = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->R12 = pTransitionBlock->m_calleeSavedRegisters.R12; }
if (calleeSaveMask & (1ULL << 3)) { pContext->Rbx = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->Rbx = pTransitionBlock->m_calleeSavedRegisters.Rbx; }
#if defined(TARGET_WINDOWS)
// Windows has Rsi and Rdi as callee-saved (pushed after Rbx)
if (calleeSaveMask & (1ULL << 6)) { pContext->Rsi = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->Rsi = pTransitionBlock->m_calleeSavedRegisters.Rsi; }
if (calleeSaveMask & (1ULL << 7)) { pContext->Rdi = *((UINT64*)(calleeSaveBase - offset)); offset += 8; }
else { pContext->Rdi = pTransitionBlock->m_calleeSavedRegisters.Rdi; }
#endif

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way x64 works the OSR method is responsible for restoring the registers from where the tier0 version saved them, so this seems unnecessary. But seems ok to do it for consistency.

@jakobbotsch jakobbotsch left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is sort of a hybrid approach where the OSR transition helper gets the registers from where tier0 saved them instead of changing the OSR method to restore them from the tier0 frame (what x64 is doing).

It seems ok to me to do it this way for now, but I will definitely need the alternative scheme for #120865.

@am11

am11 commented Feb 6, 2026

Copy link
Copy Markdown
Member Author

It seems ok to me to do it this way for now, but I will definitely need the alternative scheme for #120865.

If you could help with figuring out arm64 failure in #123894, we can go ahead with that. It just gets a bit too involved when only a small subset of tests are failing and we can't repro with corerun.

@jakobbotsch

Copy link
Copy Markdown
Member

It seems ok to me to do it this way for now, but I will definitely need the alternative scheme for #120865.

If you could help with figuring out arm64 failure in #123894, we can go ahead with that. It just gets a bit too involved when only a small subset of tests are failing and we can't repro with corerun.

I can take a look.
From a cursory glance, one missing bit is the epilog change that restores callee saves directly from tier0.

@am11

am11 commented Feb 10, 2026

Copy link
Copy Markdown
Member Author

Oh, good catch! Pushed a commit there. The failing reflection emit test passed locally. So lets wait for the CI. :)

@JulieLeeMSFT

Copy link
Copy Markdown
Member

@jakobbotsch, PTAL.

@jakobbotsch

Copy link
Copy Markdown
Member

Going to close this. Once #126880 is merged I think picking out the GT_PATCHPOINT related changes from #123894 would be the simpler path. @am11 are you interested in submitting that follow up?

@am11

am11 commented Apr 20, 2026

Copy link
Copy Markdown
Member Author

Sure, actually I was thinking about which one to select after your changes are merged. Thanks for the pointers. 😉

jakobbotsch added a commit that referenced this pull request Apr 28, 2026
Follow-up on
#123645 (comment).

---------

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@github-actions github-actions Bot locked and limited conversation to collaborators May 21, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test failure: JIT/opt/OSR/Runtime_89666/Runtime_89666.cmd Test failure: JIT/opt/OSR/Runtime_69032/Runtime_69032.cmd

5 participants