Remove mailbox for now #268
This reverts commit 6da67d7.
Created a monitoring plan for this PR.
What this PR does: Removes the guest-initiated resume-network mailbox, a fast path where hypeman pre-patched VM snapshot memory with the new network identity before restore. Network reconfiguration after standby→running transitions now uses only the shell-exec path (already deployed as the fallback).
Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 58ce89f.
The CI server's firewall was in a bad state and was dropping guest -> host communication, which is why CI was failing:
The Linux integration tests create per-bridge iptables FORWARD/NAT rules. On the shared self-hosted runner these leak: the test-harness cleanup called iptables without the -w lock-wait flag and ignored all errors, so under xtables-lock contention from concurrent CI jobs cleanup failed silently. Rules whose bridge interface no longer exists then accumulated indefinitely (cleanupStaleLinkDownRoutes only handles still-present linkdown routes), inflating every iptables operation over time and contributing to flaky "instance did not reach Running within 45s" timeouts.

- Add -w 5 to all harness iptables calls (matches lib/network convention)
- Surface cleanup errors to stderr instead of swallowing them; retry deletes on transient lock contention; treat already-gone routes/links as benign
- Sweep orphaned test iptables rules (interface gone) once per test binary under the existing subnet lock, scoped to the "hm" test prefix so a non-test hypeman process's "ha" rules are never touched
- Remove dead HYPEMAN_TEST_NETWORK_TMPDIR plumbing from test.yml (its Go reader was reverted in #268; re-wiring it would reintroduce per-run lock/lease isolation and break cross-run subnet coordination)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
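
For illustration only, the cleanup behavior described in the commit message above might look roughly like the following Go sketch. It is not the harness code from this repo: the `runner` type, `deleteWithRetry`, `isOrphanedTestRule`, `flakyRunner`, the interface name `hm-br0`, and the string match used to detect lock contention are all assumptions made for the example. Only the `-w 5` flag, the retry-on-contention idea, stderr reporting, and the "hm" prefix scoping come from the text.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// runner executes iptables with args and returns its combined output.
// Hypothetical seam so the retry logic can be exercised without root.
type runner func(args []string) (string, error)

// execRunner is what a real harness might plug in (unused in main below).
func execRunner(args []string) (string, error) {
	out, err := exec.Command("iptables", args...).CombinedOutput()
	return string(out), err
}

// withLockWait prepends "-w <seconds>" so iptables waits for the xtables lock.
func withLockWait(seconds int, args ...string) []string {
	return append([]string{"-w", fmt.Sprint(seconds)}, args...)
}

// isLockContention is a heuristic (assumption): match iptables' lock message.
func isLockContention(output string) bool {
	return strings.Contains(output, "xtables lock")
}

// deleteWithRetry deletes one rule, retrying only on lock contention,
// and reports the final failure on stderr instead of swallowing it.
func deleteWithRetry(run runner, attempts int, backoff time.Duration, rule ...string) error {
	args := withLockWait(5, append([]string{"-D"}, rule...)...)
	var lastErr error
	for i := 0; i < attempts; i++ {
		out, err := run(args)
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("iptables %s: %w (%s)",
			strings.Join(args, " "), err, strings.TrimSpace(out))
		if !isLockContention(out) {
			break // not transient, do not retry
		}
		time.Sleep(backoff)
	}
	if lastErr != nil {
		fmt.Fprintln(os.Stderr, "cleanup failed:", lastErr)
	}
	return lastErr
}

// isOrphanedTestRule reports whether a rule's interface carries the "hm"
// test prefix and is no longer present, so "ha" rules are never selected.
func isOrphanedTestRule(iface string, present map[string]bool) bool {
	return strings.HasPrefix(iface, "hm") && !present[iface]
}

// flakyRunner fakes lock contention for the first `failures` calls.
func flakyRunner(failures int) (runner, *int) {
	calls := 0
	return func(args []string) (string, error) {
		calls++
		if calls <= failures {
			return "Another app is currently holding the xtables lock.",
				errors.New("exit status 4")
		}
		return "", nil
	}, &calls
}

func main() {
	run, calls := flakyRunner(2)
	err := deleteWithRetry(run, 5, 10*time.Millisecond,
		"FORWARD", "-i", "hm-br0", "-j", "ACCEPT")
	fmt.Println("err:", err, "calls:", *calls)
	fmt.Println("orphaned:", isOrphanedTestRule("hm-br0", map[string]bool{"ha-br1": true}))
}
```

Passing the command runner in as a function keeps the retry and scoping logic testable on a machine without iptables; a real harness would presumably pass something like `execRunner` instead of the fake.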

Reverts #260
Note
Medium Risk
Changes networked fork/restore behavior on Firecracker by dropping the fast mailbox/ack path; restores depend on vsock guest-agent reconfigure, which affects timing and failure modes for forked standby restores.
Overview
Reverts the guest-initiated resume network mailbox flow (PR #260): the host no longer patches Firecracker snapshot memory, waits for guest UDP acks, or arms mailbox env on create/start.
Restore and fork networking after standby/running resume now rely only on the existing host-initiated path, reconfigureGuestNetwork via guest-agent gRPC (with shell ip exec fallback), when a fresh allocation is applied for forked standby restores.
Removed code spans lib/mailbox, instance handoff helpers (prepareResumeNetworkHandoff, mailbox patching/UDP waiter), guest-agent VMGenID/mailbox watcher, related tests/docs, and a ForkSnapshot API unit test. Linux test env no longer passes HYPEMAN_TEST_NETWORK_TMPDIR; parallel test network locks/leases use the system temp dir only.
Reviewed by Cursor Bugbot for commit 71309cb. Bugbot is set up for automated code reviews on this repo.