mm256_srli,slli_si256; mm256_bsrli,bslli_epi128 to const generics #1067
|
r? @Amanieu (rust-highfive has picked a reviewer for you, use r? to override) |
|
At f16c.rs |
|
The instruction definition here says that bits 3 to 7 are ignored by the CPU. I think to be safe we should only allow imm3, we can always relax it later if necessary. |
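One way to restrict a const-generic immediate to 3 bits at compile time is sketched below. This is only an illustration of the idea: `AssertImm3` and `takes_imm3` are hypothetical names, not the macros or intrinsics used in stdarch, and the stand-in function just returns its immediate.

```rust
/// Hypothetical compile-time check that an immediate fits in 3 bits.
struct AssertImm3<const IMM: i32>;

impl<const IMM: i32> AssertImm3<IMM> {
    // Evaluated at compile time when referenced from a monomorphized function.
    const OK: () = assert!(IMM >= 0 && IMM < 8, "immediate must fit in 3 bits");
}

/// Hypothetical stand-in for an intrinsic that takes a 3-bit immediate.
fn takes_imm3<const IMM_ROUNDING: i32>() -> i32 {
    // Referencing the constant forces the range check for this IMM_ROUNDING.
    let () = AssertImm3::<IMM_ROUNDING>::OK;
    IMM_ROUNDING
}

fn main() {
    println!("{}", takes_imm3::<4>());
    // takes_imm3::<8>() would be rejected when the crate is built.
}
```

Starting with the narrow range keeps the door open: widening the accepted range later is backwards compatible, while narrowing it would not be.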
|
|
I'm still wondering about the fact that we multiply the immediates by 8 ("* 8") when they are supposed to be byte counts, <= 16, for the shifts
|
It might be better to switch the implementation to use a shuffle like clang does and like we already do for |
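For reference, the per-lane byte shift under discussion can be modeled in plain Rust without any intrinsics. This is a minimal scalar sketch, assuming the documented behavior of `_mm256_bslli_epi128` (each 128-bit lane is shifted left by `IMM8` bytes, and counts above 15 clear the lane). `bslli_epi128_model` is a hypothetical helper name, not stdarch code. It selects source bytes by index, which is the same idea as a shuffle-based implementation, so the byte count is never multiplied by 8.

```rust
/// Scalar model of a per-lane byte shift left on a 256-bit vector,
/// stored as 32 little-endian bytes (two 16-byte lanes).
/// Hypothetical helper for illustration only.
fn bslli_epi128_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    // Only the low 8 bits of the immediate matter; counts above 15 clear the lane.
    let shift = ((IMM8 & 0xff) as usize).min(16);
    let mut out = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        for i in shift..16 {
            // Byte i of the result comes from byte i - shift of the same lane.
            out[base + i] = a[base + i - shift];
        }
    }
    out
}

fn main() {
    // Fill the input with 1..=32 so the moved bytes are easy to spot.
    let mut a = [0u8; 32];
    for (i, b) in a.iter_mut().enumerate() {
        *b = i as u8 + 1;
    }
    let r = bslli_epi128_model::<3>(a);
    println!("low lane:  {:?}", &r[..16]);
    println!("high lane: {:?}", &r[16..]);
}
```

A model like this could also serve as an oracle when testing the real intrinsics over the full range of immediates.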
|
It seems mm256_slli_si256 = mm256_bslli_epi128? |
|
Yes, see #1012. |
| use crate::{ | ||
| core_arch::{simd::*, x86::*}, | ||
| hint::unreachable_unchecked, | ||
| // hint::unreachable_unchecked, |
| } | ||
| transmute(constify_imm8!(imm8 * 8, call)) | ||
| let r = vpslldq(a, IMM8 * 8); | ||
| transmute(r) |
You can just call _mm256_bslli_epi128 here.
| } | ||
| transmute(constify_imm8!(imm8 * 8, call)) | ||
| let r = vpsrldq(a, IMM8 * 8); | ||
| transmute(r) |
You can just call _mm256_bsrli_epi128 here.
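The delegation suggested here can be illustrated with a scalar model of the right shift. This is a sketch under the assumption that `_mm256_srli_si256` and `_mm256_bsrli_epi128` share the same per-lane semantics (each 128-bit lane shifted right by `IMM8` bytes, counts above 15 clearing the lane). Both function names below are hypothetical models, not the stdarch definitions.

```rust
/// Scalar model of a per-lane byte shift right on 32 little-endian bytes.
/// Hypothetical helper for illustration only.
fn bsrli_epi128_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    // Only the low 8 bits of the immediate matter; counts above 15 clear the lane.
    let shift = ((IMM8 & 0xff) as usize).min(16);
    let mut out = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        for i in 0..16 - shift {
            // Byte i of the result comes from byte i + shift of the same lane.
            out[base + i] = a[base + i + shift];
        }
    }
    out
}

/// The si256 variant simply forwards its const parameter to the epi128 model.
fn srli_si256_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    bsrli_epi128_model::<IMM8>(a)
}

fn main() {
    let mut a = [0u8; 32];
    for (i, b) in a.iter_mut().enumerate() {
        *b = i as u8 + 1;
    }
    let r = srli_si256_model::<3>(a);
    println!("low lane:  {:?}", &r[..16]);
    println!("high lane: {:?}", &r[16..]);
}
```

Forwarding the const parameter keeps a single implementation to maintain, so any fix to the epi128 version applies to both names.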
Thanks. I think my bsrli_epi128 and bslli_epi128 are having problems. I need to check them first. |
f16c: _mm256_cvtps_ph; mm_cvtps_ph