x86-64 SIMD: SSE intrinsics disabled due to LLVM crash on soft-float target #72

Description

@johnjezl

The inference engine's SIMD operators (relu, add, matmul, softmax) have AArch64 NEON implementations but fall back to scalar on x86-64. SSE2 intrinsic versions were implemented but had to be removed because they trigger an LLVM code generation crash:

rustc-LLVM ERROR: Do not know how to split the result of this operator!

This occurs on the `x86_64-unknown-none` target, which defaults to soft-float (SSE disabled). Even with `-C target-feature=+sse,+sse2` set in `.cargo/config.toml`, LLVM crashes during LTO when the `libm` crate's arch-specific code interacts with SSE-enabled FP operations.

Root Cause

The `x86_64-unknown-none` Rust target spec disables SSE by default for bare-metal. Enabling SSE via target features causes incompatibilities between:

  1. The soft-float ABI assumed by the target spec
  2. SSE register usage in optimized FP code
  3. `libm` crate's arch-specific optimizations (`cfg(arch_enabled)`)
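To see which code path a given build actually selects, a compile-time probe along these lines may help. This is a minimal sketch: `simd_backend` is a hypothetical helper, not a function in the inference engine. It relies only on the standard `cfg!` macro and the `target_feature` configuration option, and it should report `"scalar"` on `x86_64-unknown-none` unless SSE2 has been explicitly enabled for the whole build.

```rust
/// Hypothetical helper (not part of the engine): reports which SIMD path
/// the current compilation would select, based on compile-time features.
pub const fn simd_backend() -> &'static str {
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        // NEON is enabled for this build.
        "neon"
    } else if cfg!(all(target_arch = "x86_64", target_feature = "sse2")) {
        // SSE2 is enabled globally (the default on hosted x86-64 targets,
        // but not on the soft-float `x86_64-unknown-none` target).
        "sse2"
    } else {
        // No SIMD feature visible at compile time: scalar loops only.
        "scalar"
    }
}
```

Because the check is evaluated at compile time, it reflects the target spec and `-C target-feature` flags, not what the CPU supports at run time.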

Impact

  • x86-64 inference uses scalar loops (~2-3x slower than SSE for matmul)
  • MNIST inference still works correctly, just slower
  • ARM64 (primary target) is unaffected — full NEON SIMD

Possible Fixes

  1. Custom target spec with SSE enabled (requires nightly `-Z build-std`)
  2. `#[target_feature(enable = "sse2")]` on individual functions (may work on stable)
  3. Disable `libm` arch feature for x86-64 only (`cfg`-gated dependency)
  4. Wait for Rust x86-64 bare-metal improvements (target spec evolution)
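Option 2 could look roughly like the sketch below, shown for relu. The names `relu_scalar`, `relu_sse2`, and `relu` are illustrative assumptions, not the engine's actual API, and this has not been verified to avoid the LLVM crash on `x86_64-unknown-none`. The idea is to enable SSE2 only inside one function and to pass data through slices, so no float values cross the function boundary in registers.

```rust
/// Portable scalar fallback (hypothetical name).
pub fn relu_scalar(data: &mut [f32]) {
    for x in data.iter_mut() {
        if *x < 0.0 {
            *x = 0.0;
        }
    }
}

/// SSE2-enabled variant (hypothetical name). The feature is enabled for
/// this function only, not for the whole crate.
///
/// Safety: the caller must ensure the CPU and execution environment
/// support SSE (on bare metal, SSE state must be enabled by the kernel).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
pub unsafe fn relu_sse2(data: &mut [f32]) {
    use core::arch::x86_64::{_mm_loadu_ps, _mm_max_ps, _mm_setzero_ps, _mm_storeu_ps};

    let zero = _mm_setzero_ps();
    let mut chunks = data.chunks_exact_mut(4);
    for chunk in &mut chunks {
        // Each chunk holds exactly 4 f32 values; unaligned load/store is used
        // because the slice carries no 16-byte alignment guarantee.
        unsafe {
            let v = _mm_loadu_ps(chunk.as_ptr());
            // Operand order (zero, v) keeps NaN inputs as NaN, like the scalar loop.
            _mm_storeu_ps(chunk.as_mut_ptr(), _mm_max_ps(zero, v));
        }
    }
    // Handle the trailing 0..=3 elements with the scalar path.
    relu_scalar(chunks.into_remainder());
}

/// Dispatcher (hypothetical name): SSE2 on x86-64, scalar elsewhere.
pub fn relu(data: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        // SSE2 is part of the x86-64 baseline ISA; see the safety note above.
        unsafe { relu_sse2(data) }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        relu_scalar(data)
    }
}
```

Whether the per-function attribute sidesteps the soft-float ABI conflict under LTO would still need to be confirmed on the real target; the scalar fallback keeps the other architectures building either way.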

Documented In

  • `runtime/src/inference/ops.rs` lines 188-189 (code comment)
  • `docs/api/runtime.md` SIMD Optimization section

Priority

Medium — affects x86-64 performance but not correctness. ARM64 is the primary deployment target.

Labels

  • blocked: Waiting on external dependency or information
  • enhancement: New feature or improvement
  • platform:x86-64: x86-64 (RTX 3050 dev PC)
  • sub:ai-runtime: AI/ML runtime, inference, model loading, ONNX
  • sub:build: Build system, CMake, Makefile, toolchain