fix: skip wait_for_status when Vercel sandbox is in a terminal state #3410

Merged
seratch merged 2 commits into openai:main from cty-ut:fix/vercel-resume-terminal-state on May 15, 2026

Conversation

@cty-ut (Contributor) commented May 14, 2026

Summary

VercelSandboxClient.resume() called wait_for_status(RUNNING) unconditionally
after fetching an existing sandbox, even when the sandbox was already in a terminal
state (stopped or failed). Since a terminal sandbox can never transition to
RUNNING, this caused a ~45-second timeout before falling back to creating a new
sandbox — wasting time on every resume after a sandbox had stopped.

This was noted by a maintainer with an XXX comment at the call site.

Fix: check the current sandbox status before waiting:

  • Already running → reconnect immediately, skip the wait
  • Transient state (pending / stopping) → wait as before
  • Terminal state (stopped / failed) → close and recreate immediately
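
The three-way decision above can be sketched as a small classification function. This is a minimal illustration, not the code merged in this PR: `ResumeAction`, `WAITABLE_STATUSES`, and `decide_resume_action` are hypothetical names. The sketch also follows the self-review note later in this thread, which narrows the wait path to `pending` only, and it assumes any status other than `running` or `pending` should go straight to recreation.

```python
from enum import Enum


class ResumeAction(Enum):
    """Hypothetical labels for the three outcomes listed above."""

    RECONNECT = "reconnect"
    WAIT = "wait"
    RECREATE = "recreate"


# Status strings as reported in this thread. The grouping follows the
# follow-up comment, which keeps only "pending" on the wait path.
WAITABLE_STATUSES = frozenset({"pending"})


def decide_resume_action(status: str) -> ResumeAction:
    """Pick a resume strategy from the sandbox's current status (illustrative)."""
    if status == "running":
        return ResumeAction.RECONNECT
    if status in WAITABLE_STATUSES:
        return ResumeAction.WAIT
    # Assumption of this sketch: every other status cannot reach "running",
    # so waiting would only burn the timeout before recreating anyway.
    return ResumeAction.RECREATE
```

Keeping the waitable set as a single named constant makes the policy easy to audit: widening or narrowing the wait path is a one-line change.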

Test plan

  • Updated test_vercel_resume_reconnects_existing_running_sandbox: sandbox already
    running no longer calls wait_for_status.
  • Updated test_vercel_resume_recreates_sandbox_after_wait_timeout: uses pending
    status so the wait path is actually exercised.
  • Added test_vercel_resume_waits_when_sandbox_in_transient_state (parametrized:
    pending, stopping): verifies wait_for_status is still called for transient states.
  • Added test_vercel_resume_recreates_sandbox_when_in_terminal_state (parametrized:
    stopped, failed): verifies immediate fallback with no wait.
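
The behaviour these tests pin down can be approximated with a fake sandbox that records calls to `wait_for_status`. This is a sketch under stated assumptions rather than the repository's test code: `FakeSandbox`, `resume`, and `check` are hypothetical, the fake's wait always times out, and only `pending` is treated as waitable (per the follow-up comment in this thread).

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for a sandbox handle; records wait calls."""

    status: str
    wait_calls: list[str] = field(default_factory=list)
    closed: bool = False

    async def wait_for_status(self, target: str) -> None:
        self.wait_calls.append(target)
        # The fake never changes state, so every wait ends in a timeout.
        raise TimeoutError(f"never reached {target}")

    async def close(self) -> None:
        self.closed = True


async def resume(existing: FakeSandbox) -> FakeSandbox:
    """Illustrative resume flow: returns the sandbox the caller should use."""
    if existing.status == "running":
        return existing  # reconnect immediately, no wait
    if existing.status == "pending":
        try:
            await existing.wait_for_status("running")
            return existing
        except TimeoutError:
            pass  # fall through to the recreate path
    await existing.close()
    return FakeSandbox(status="running")  # stands in for creating a new sandbox


def check(status: str) -> tuple[bool, int]:
    """Return (reused_existing, number_of_wait_calls) for a starting status."""
    sandbox = FakeSandbox(status=status)
    result = asyncio.run(resume(sandbox))
    return result is sandbox, len(sandbox.wait_calls)
```

With this fake, `check("running")` reuses the sandbox with zero waits, `check("pending")` waits once and then recreates, and `check("stopped")` recreates with zero waits.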

All 4587 tests pass (make tests).

Issue number

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass

@cty-ut (Contributor, Author) commented May 14, 2026

Self-review note: in the original commit I included stopping in the
transient-status set, but a STOPPING sandbox transitions to STOPPED, not
back to RUNNING — including it would have preserved the very bug this PR
aims to fix. I've narrowed the set to just pending, and expanded the
recreate-path tests to cover stopping, stopped, failed, aborted,
and snapshotting (all states from which RUNNING is unreachable).
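
The reachability argument can be made concrete with a small graph search. The transition table below is assumed purely for illustration: the only edge stated in this thread is `stopping` to `stopped`, and the remaining edges are guesses, not Vercel's documented lifecycle. `ASSUMED_TRANSITIONS` and `can_reach` are hypothetical names.

```python
from collections import deque

# ASSUMED transitions, for illustration only. The thread confirms
# stopping -> stopped; every other edge here is a guess.
ASSUMED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"running", "failed", "aborted"},
    "running": {"stopping", "snapshotting", "failed"},
    "stopping": {"stopped"},
    "snapshotting": {"stopped"},
    "stopped": set(),
    "failed": set(),
    "aborted": set(),
}


def can_reach(start: str, target: str, graph: dict[str, set[str]]) -> bool:
    """Breadth-first search: is `target` reachable from `start` in one or more steps?"""
    seen: set[str] = set()
    queue = deque(graph.get(start, ()))
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        if state not in seen:
            seen.add(state)
            queue.extend(graph.get(state, ()))
    return False


# Non-running states from which waiting for "running" could ever succeed.
waitable = sorted(
    s
    for s in ASSUMED_TRANSITIONS
    if s != "running" and can_reach(s, "running", ASSUMED_TRANSITIONS)
)
```

Under these assumed edges, `waitable` contains only `pending`, which matches the narrowed set described in this comment.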

@seratch (Member) left a comment

Approved. I validated the code path against the real Vercel Sandbox runtime as well as the focused unit coverage.

On PR head, resuming an already-running sandbox reconnected to the same sandbox immediately, and resuming a stopped sandbox recreated a new sandbox without waiting for the wait_for_status(RUNNING) timeout. As a control, the same stopped-resume scenario on origin/main still waited for the shortened timeout before recreating, which confirms this PR removes the intended delay.

I also confirmed the live SDK status values are pending, running, stopping, stopped, failed, aborted, and snapshotting, so limiting the wait path to pending matches the observed state model.

@seratch seratch added this to the 0.17.x milestone May 15, 2026
@seratch seratch merged commit 43a389d into openai:main May 15, 2026
9 checks passed