
fix: prevent sticky node crash #987

Merged
ovitrif merged 1 commit into master from fix/fg-service-crash-980
Jun 3, 2026


ovitrif commented Jun 3, 2026 · Collaborator

Fixes #980
Fixes #981
Refs #986

Description

This PR:

  1. Guards LightningNodeService startup so the service is not promoted to the foreground when Android restarts it as a sticky service with a null intent.
  2. Promotes the service with the explicit dataSync foreground-service type.
  3. Stops the foreground service immediately, before running async node cleanup, both when Android signals a timeout and when the user taps Stop in the notification.
  4. Adds regression coverage for foreground-service startup, restart, promotion-failure, stop, and timeout behavior.
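Items 1 and 2 boil down to a decision made in onStartCommand(). The following plain-Kotlin sketch models that decision so it can be reasoned about without Android; it is not the actual LightningNodeService code. The action strings, `StartDecision`, and `decideOnStart` are hypothetical names, and the duplicate-start behavior (no second promotion) is an assumption. On a device, the `Promote` branch would typically map to a typed promotion such as androidx.core's `ServiceCompat.startForeground(service, id, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_DATA_SYNC)`.

```kotlin
// Hypothetical placeholder actions; the real LightningNodeService constants may differ.
const val ACTION_START_NODE = "example.ACTION_START_NODE"
const val ACTION_STOP_NODE = "example.ACTION_STOP_NODE"

sealed interface StartDecision {
    /** Explicit start while not yet in the foreground: promote with the dataSync type. */
    object Promote : StartDecision

    /** Explicit start while already in the foreground: assumed no-op, no second promotion. */
    object AlreadyRunning : StartDecision

    /** Explicit stop from the notification: stop the FGS first, then clean up async. */
    object Stop : StartDecision

    /** Null intent (sticky restart) or unknown action: never promote, just stop the service. */
    object IgnoreAndStop : StartDecision
}

/** Pure decision for onStartCommand(); [action] is intent?.action, so null means a sticky restart. */
fun decideOnStart(action: String?, isForeground: Boolean): StartDecision = when (action) {
    ACTION_START_NODE -> if (isForeground) StartDecision.AlreadyRunning else StartDecision.Promote
    ACTION_STOP_NODE -> StartDecision.Stop
    else -> StartDecision.IgnoreAndStop // covers null and unsupported actions
}
```

Keeping the decision pure means the null/unsupported restart cases can be unit-tested without an Android runtime; only the branch that calls startForeground needs instrumented or Robolectric-style coverage.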

Preview

N/A - service lifecycle crash fix, no UI changes.

QA Notes

Repro / Fix Evidence

  • Emulator/device setup: API 35+ with dev app to.bitkit.dev, target SDK 36, and shortened dataSync timeout:
    adb shell device_config put activity_manager data_sync_fgs_timeout_duration 30000
    adb shell device_config get activity_manager data_sync_fgs_timeout_duration
  • Pre-fix repro: check out v2.2.0, apply a local-only stress patch that delays stopSelf() in LightningNodeService.onTimeout() after lightningRepo.stop(), install the dev build, start the app, press Home, and wait about 35s.
  • Pre-fix crash evidence: .ai/issue_980_prefix/logcat.txt line 1187 shows ForegroundServiceDidNotStopInTimeException after FGS (dataSync) timed out.
  • Post-fix verification: check out this branch, install the dev build, use the same 30s timeout, background the app with the node service running, and wait about 35s.
  • Post-fix clean timeout evidence: .ai/issue_980_postfix/logcat.txt line 2044 shows FGS (dataSync) timed out, followed by service timeout handling and onDestroy, with no FGS Crashed, no RemoteServiceException, and no process death.
  • Optional symmetric stress on the fix: delay async lightningRepo.stop() after stopForegroundService(startId) in onTimeout; the FGS is already stopped before the delay, so the process still does not crash.
  • Reset timeout override after QA:
    adb shell device_config delete activity_manager data_sync_fgs_timeout_duration
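The pre-fix and post-fix runs differ only in when the service leaves the foreground relative to the slow teardown. The sketch below makes that timing explicit as a toy model. It assumes that the system fails the service if it is still in the foreground some grace period after onTimeout(); the 10 s value used in the checks is read off the "~10s after timeout" note in the observed v2.2.0 log, not a documented Android constant. `TimeoutPath`, `fgsStoppedAtMs`, and `crashes` are hypothetical names.

```kotlin
/** One way of handling onTimeout(), reduced to what matters for the crash. */
data class TimeoutPath(
    val label: String,
    val teardownMs: Long,       // lightningRepo.stop() duration, including any stress delay
    val stopsFgsFirst: Boolean, // true if the FGS is stopped synchronously inside onTimeout()
)

/** Milliseconds after onTimeout() at which the service leaves the foreground. */
fun fgsStoppedAtMs(path: TimeoutPath): Long =
    if (path.stopsFgsFirst) 0L else path.teardownMs

/** Assumed rule: the system fails the service if it is still in the foreground after [graceMs]. */
fun crashes(path: TimeoutPath, graceMs: Long): Boolean = fgsStoppedAtMs(path) > graceMs
```

With the 15 s stress delay, the v2.2.0 path is still in the foreground when the grace period ends, while the fixed path has already stopped it at time zero, so the same delay is harmless.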

Manual Tests

  • 1. regression: Android 15+ (API 35+) device → enable shortened dataSync FGS timeout → background Bitkit with node service running: service stops cleanly without ForegroundServiceDidNotStopInTimeException.
  • 2. regression: Bitkit with notifications enabled → start background node service → force a sticky restart with a null intent: service does not promote from onCreate() and does not crash with ForegroundServiceStartNotAllowedException.
  • 3. regression: Node notification → tap Stop: foreground service notification is removed and app/service stops cleanly.

Automated Checks

  • Unit tests added/updated: app/src/test/java/to/bitkit/androidServices/LightningNodeServiceTest.kt covers explicit start action, typed dataSync foreground promotion, null/unsupported restart intents, promotion failure, duplicate starts, stop action ordering, and API-35 timeout ordering.
  • Local checks passed:
    • ./gradlew testDevDebugUnitTest --tests to.bitkit.androidServices.LightningNodeServiceTest
    • just compile
    • just test
    • just lint
  • just lint exited successfully; it printed existing unrelated detekt findings in AppViewModel.kt and SupportScreen.kt.
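LightningNodeServiceTest itself is not shown in this thread. As a dependency-free sketch of how "timeout ordering" can be asserted, the model below records lifecycle calls and uses a hand-rolled scope whose queued work only runs when the test drains it. `EventLog`, `ManualScope`, and `NodeServiceModel` are hypothetical; a real coroutine-based test would more likely use kotlinx-coroutines-test's `StandardTestDispatcher` for the same effect.

```kotlin
/** Records lifecycle calls so a test can assert their order. */
class EventLog {
    val events = mutableListOf<String>()
    fun record(event: String) {
        events += event
    }
}

/** Stand-in for a coroutine scope: launched work is queued until the test calls drain(). */
class ManualScope {
    private val pending = ArrayDeque<() -> Unit>()
    fun launch(block: () -> Unit) {
        pending.addLast(block)
    }
    fun drain() {
        while (pending.isNotEmpty()) pending.removeFirst().invoke()
    }
}

/** Hypothetical model of the fixed onTimeout(): stop the FGS synchronously, clean up async. */
class NodeServiceModel(private val log: EventLog, private val scope: ManualScope) {
    fun onTimeout(startId: Int) {
        log.record("stopForegroundService($startId)")
        scope.launch { log.record("lightningRepo.stop()") }
    }
}
```

The key assertion is made before draining: at that point only the foreground stop may have happened, which is exactly the guarantee that keeps a slow lightningRepo.stop() from tripping the FGS timeout.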

ovitrif requested a review from piotr-iohk June 3, 2026 14:57
ovitrif added this to the 2.3.0 milestone Jun 3, 2026
ovitrif requested a review from jvsena42 June 3, 2026 16:11

piotr-iohk · Collaborator

Emulator setup (once per session)

  • API 35+ (e.g. Pixel 8 API 37).
  • App: dev build → to.bitkit.dev.
  • Target SDK 36 → no need for am compat enable (the timeout limits are already on).

Shorten the dataSync FGS timeout (emulator only). On API 35+, LightningNodeService runs as a dataSync foreground service. In production the system allows about 6 hours of dataSync runtime per 24 hours, then calls onTimeout. Waiting that long is impractical for QA, so Android exposes a device config override that sets how long the service may run in the background before it times out; here it is set to 30 seconds (30000 ms). That makes the repro repeatable: start node → Home → wait ~35s → FGS (dataSync) timed out. The override persists on the emulator until it is changed or the emulator is wiped; it does not ship with the app. See FGS timeout testing.

adb shell device_config put activity_manager data_sync_fgs_timeout_duration 30000
adb shell device_config get activity_manager data_sync_fgs_timeout_duration
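The commands above, plus the reset command from the QA Notes, differ only in the verb, and the wait time is simply the override plus a margin. The helpers below are an illustrative sketch for scripting this setup; the namespace and key are copied from the commands, while the function names and the 5 s default margin (matching the "~35s" wait for a 30s override) are hypothetical choices.

```kotlin
// Namespace and key come from the adb commands above; the helpers themselves are illustrative.
const val FGS_CONFIG_NAMESPACE = "activity_manager"
const val FGS_TIMEOUT_KEY = "data_sync_fgs_timeout_duration"

/** Approximate production budget for dataSync FGS runtime: about 6 hours per 24 hours. */
const val PRODUCTION_BUDGET_MS = 6L * 60 * 60 * 1000

fun putTimeoutCommand(timeoutMs: Long) =
    "adb shell device_config put $FGS_CONFIG_NAMESPACE $FGS_TIMEOUT_KEY $timeoutMs"

fun getTimeoutCommand() =
    "adb shell device_config get $FGS_CONFIG_NAMESPACE $FGS_TIMEOUT_KEY"

fun resetTimeoutCommand() =
    "adb shell device_config delete $FGS_CONFIG_NAMESPACE $FGS_TIMEOUT_KEY"

/** How long to wait after pressing Home: the override plus a safety margin. */
fun waitAfterHomeMs(timeoutMs: Long, marginMs: Long = 5_000) = timeoutMs + marginMs
```

The arithmetic also shows why the override matters: the 30 000 ms setting is 720 times shorter than the roughly 21 600 000 ms production budget.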

Step 1 — Repro on v2.2.0 (expect crash) ✅

Checkout

git checkout v2.2.0

Local-only patch (LightningNodeService.kt)

Do not commit. Simulates slow teardown on the broken code path.

Add import:

import kotlinx.coroutines.delay

Replace onTimeout body (v2.2.0 — stopSelf() after async stop):

     override fun onTimeout(startId: Int, fgsType: Int) {
         Logger.warn("Foreground service timeout reached", context = TAG)
         serviceScope.launch {
             lightningRepo.stop()
+            delay(15_000)
             stopSelf()
         }
         super.onTimeout(startId, fgsType)
     }

Build & run

./gradlew installDevDebug
adb shell am start -n to.bitkit.dev/to.bitkit.ui.MainActivity

Test

  1. Wallet onboarded, notifications allowed → node notification visible.
  2. Press Home (app in background).
  3. Wait ~35 s (30s FGS limit + margin).

Expected logcat

adb logcat -v time ActivityManager:D AndroidRuntime:E APP:W | grep -E "FGS|DidNotStop|timeout|FATAL|died"

Observed on v2.2.0 + patch (2026-06-03):

E ActivityManager: FGS (dataSync) timed out: ... LightningNodeService ...
E ActivityManager: FGS Crashed: ... LightningNodeService ...        # ~10s after timeout
I ActivityManager: Process to.bitkit.dev (pid ...) has died: prcp FGS

Optional full exception:

adb logcat -d | grep -E "DidNotStop|did not stop within its timeout|RemoteServiceException"
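Reading these logs by eye works, but the pass/fail rule used in both steps can be stated as a small classifier. The sketch below is hypothetical; its crash markers are copied from the log lines and exception names quoted in this thread, and other Android versions may word them differently. A run only counts as clean if the capture continues for at least ~10s after the timeout line, since the crash appears later.

```kotlin
/** Outcome of one timeout QA run, derived from filtered logcat lines. */
enum class FgsRunOutcome { CRASHED, CLEAN_STOP, NO_TIMEOUT_SEEN }

// Markers taken from the logs and exceptions quoted in this thread.
private val crashMarkers = listOf(
    "FGS Crashed",
    "DidNotStop",
    "did not stop within its timeout",
    "RemoteServiceException",
    "has died: prcp FGS",
)

fun classify(lines: List<String>): FgsRunOutcome {
    val timedOut = lines.any { "FGS (dataSync) timed out" in it }
    val crashed = lines.any { line -> crashMarkers.any { it in line } }
    return when {
        crashed -> FgsRunOutcome.CRASHED
        timedOut -> FgsRunOutcome.CLEAN_STOP
        else -> FgsRunOutcome.NO_TIMEOUT_SEEN // timeout not reached yet, or override not applied
    }
}
```

A crash marker wins over the timeout line, which mirrors the v2.2.0 output: the timeout is logged first and the failure follows.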

Discard patch

git checkout -- app/src/main/java/to/bitkit/androidServices/LightningNodeService.kt

Step 2 — Verify fix branch (expect no crash) ✅

git switch fix/fg-service-crash-980

Symmetric stress patch applied locally (see below) — FGS tears down first; slow LDK stop must not crash.

./gradlew installDevDebug
adb shell am start -n to.bitkit.dev/to.bitkit.ui.MainActivity

Same test as Step 1.

Expected: FGS (dataSync) timed out, then Stop FGS timeout within a few seconds; no FGS Crashed and no Process … has died: prcp FGS. The node FGS stops by design (notification gone); the app process may stay in the background, which is not a crash.

Observed on fix branch + symmetric stress (2026-06-03):

E ActivityManager: FGS (dataSync) timed out: ... LightningNodeService ...
D ActivityManager: Stop FGS timeout: ... LightningNodeService ...

(no FGS Crashed line)

Optional — symmetric stress on fix (still no crash)

Add the same delay import, then change onTimeout in the fix branch:

         stopForegroundService(startId)
-        serviceScope.launch { lightningRepo.stop() }
+        serviceScope.launch {
+            delay(15_000)
+            lightningRepo.stop()
+        }

FGS is already down before the delay → should not crash.

piotr-iohk left a comment · Collaborator

tACK. See #987 (comment)

ovitrif self-assigned this Jun 3, 2026

ovitrif commented Jun 3, 2026 · Collaborator Author

Step 1 — Repro on v2.2.0 (expect crash) ✅
Step 2 — Verify fix branch (expect no crash) ✅

Incorporated into the PR QA Notes. No code changes were needed.

ovitrif enabled auto-merge June 3, 2026 17:22
ovitrif merged commit 4290a7c into master Jun 3, 2026
40 of 45 checks passed
ovitrif deleted the fix/fg-service-crash-980 branch June 3, 2026 17:38