Description
The inference engine's SIMD operators (relu, add, matmul, softmax) have AArch64 NEON implementations but fall back to scalar on x86-64. SSE2 intrinsic versions were implemented but had to be removed because they trigger an LLVM code generation crash:
rustc-LLVM ERROR: Do not know how to split the result of this operator!
This occurs on the `x86_64-unknown-none` target, which defaults to soft-float (no SSE). Even with `-C target-feature=+sse,+sse2` set in `.cargo/config.toml`, LLVM crashes during LTO when the `libm` crate's arch-specific code interacts with SSE-enabled floating-point operations.
Root Cause
The `x86_64-unknown-none` Rust target spec disables SSE by default for bare-metal. Enabling SSE via target features causes incompatibilities between:
- The soft-float ABI assumed by the target spec
- SSE register usage in optimized FP code
- `libm` crate's arch-specific optimizations (`cfg(arch_enabled)`)
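When diagnosing this, it can help to confirm which target features a given build actually sees at compile time. The following is a minimal sketch using the standard `cfg!(target_feature = ...)` check; the function name `compiled_simd_backend` is hypothetical and not part of the runtime.

```rust
/// Hypothetical helper: reports which SIMD backend this build could use,
/// based only on the target features enabled at compile time.
/// On `x86_64-unknown-none` without extra flags, SSE2 is off, so this
/// would be expected to return "scalar".
pub const fn compiled_simd_backend() -> &'static str {
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "neon"
    } else if cfg!(all(target_arch = "x86_64", target_feature = "sse2")) {
        "sse2"
    } else {
        "scalar"
    }
}
```

Note that this only reflects globally enabled features. Code compiled under a per-function `#[target_feature]` attribute is not visible to a `cfg!` check placed elsewhere.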
Impact
- x86-64 inference uses scalar loops (~2-3x slower than SSE for matmul)
- MNIST inference still works correctly, just slower
- ARM64 (primary target) is unaffected — full NEON SIMD
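For reference, the scalar fallback for matmul has roughly the following shape. This is an illustrative sketch only: the name `matmul_scalar`, the row-major layout, and the signature are assumptions, not the code in `runtime/src/inference/ops.rs`.

```rust
/// Hypothetical scalar matmul: C = A * B with row-major storage.
/// A is m x k, B is k x n, C is m x n.
pub fn matmul_scalar(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    for i in 0..m {
        for j in 0..n {
            // One multiply-add per iteration; no vector lanes are used.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A loop like this produces correct results on any target, which is consistent with MNIST inference still working on x86-64.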
Possible Fixes
- Custom target spec with SSE enabled (requires nightly `-Z build-std`)
- `#[target_feature(enable = "sse2")]` on individual functions (may work on stable)
- Disable `libm` arch feature for x86-64 only (`cfg`-gated dependency)
- Wait for Rust x86-64 bare-metal improvements (target spec evolution)
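The per-function `#[target_feature(enable = "sse2")]` option could look roughly like the sketch below, shown for relu. All function names here are hypothetical, and whether this approach actually avoids the LLVM crash on `x86_64-unknown-none` is untested. It also assumes SSE state is usable in the execution context where the function runs.

```rust
/// Portable scalar ReLU, used as the fallback and for the tail elements.
pub fn relu_scalar(data: &mut [f32]) {
    for x in data.iter_mut() {
        if *x < 0.0 {
            *x = 0.0;
        }
    }
}

/// SSE2 ReLU, four lanes at a time. SSE2 is enabled for this function only,
/// not for the whole crate.
///
/// # Safety
/// The CPU must support SSE2 (part of the x86-64 baseline) and SSE
/// registers must be usable in the current context.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
pub unsafe fn relu_sse2(data: &mut [f32]) {
    use core::arch::x86_64::{_mm_loadu_ps, _mm_max_ps, _mm_setzero_ps, _mm_storeu_ps};

    let mut chunks = data.chunks_exact_mut(4);
    for chunk in &mut chunks {
        // SAFETY: each chunk holds exactly 4 f32 values, and the unaligned
        // load/store intrinsics have no alignment requirement.
        unsafe {
            let v = _mm_loadu_ps(chunk.as_ptr());
            // Note: a NaN lane becomes 0.0 here, whereas relu_scalar keeps NaN.
            _mm_storeu_ps(chunk.as_mut_ptr(), _mm_max_ps(v, _mm_setzero_ps()));
        }
    }
    // Handle the 0..=3 leftover elements with the scalar path.
    relu_scalar(chunks.into_remainder());
}

/// Dispatch on x86-64: always take the SSE2 path.
#[cfg(target_arch = "x86_64")]
pub fn relu(data: &mut [f32]) {
    // SAFETY: SSE2 is part of the x86-64 baseline instruction set.
    unsafe { relu_sse2(data) }
}

/// Dispatch elsewhere: scalar only in this sketch.
#[cfg(not(target_arch = "x86_64"))]
pub fn relu(data: &mut [f32]) {
    relu_scalar(data)
}
```

The sketch uses only `core`, so it does not depend on `libm`. It does not address the soft-float ABI mismatch listed under Root Cause, so it should be treated as an experiment rather than a confirmed fix.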
Documented In
- `runtime/src/inference/ops.rs` lines 188-189 (code comment)
- `docs/api/runtime.md` SIMD Optimization section
Priority
Medium — affects x86-64 performance but not correctness. ARM64 is the primary deployment target.