mm256_srli,slli_si256; mm256_bsrli,bslli_epi128 to const generics #1067

Merged: Amanieu merged 4 commits into rust-lang:master from minybot:avx2 on Mar 10, 2021

Conversation

@minybot

minybot commented Mar 9, 2021

Contributor

f16c: _mm256_cvtps_ph; mm_cvtps_ph

@rust-highfive


r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

@minybot

minybot commented Mar 9, 2021

Contributor Author

In f16c.rs, _mm256_cvtps_ph(a: __m256, imm_rounding: i32) currently restricts imm_rounding to 0-7.
I checked Clang, and it accepts 0-255.
Any suggestions?

@Amanieu

Amanieu commented Mar 9, 2021

Member

The instruction definition here says that bits 3 to 7 are ignored by the CPU. I think to be safe we should only allow imm3, we can always relax it later if necessary.
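One way to picture the "only allow imm3" restriction is a const generic parameter with a compile-time range check. The sketch below is illustrative only: `AssertImm3` and `rounding_bits` are hypothetical names, not stdarch items, and the function is a stand-in that returns the immediate rather than performing any conversion.

```rust
// Hypothetical helper: rejects any IMM outside 0..=7 at compile time.
struct AssertImm3<const IMM: i32>;

impl<const IMM: i32> AssertImm3<IMM> {
    const VALID: () = assert!(IMM >= 0 && IMM < 8, "immediate must fit in 3 bits");
}

// Hypothetical stand-in for an intrinsic that takes a 3-bit rounding immediate.
fn rounding_bits<const IMM_ROUNDING: i32>() -> i32 {
    // Referencing the constant forces the range check for this instantiation.
    let () = AssertImm3::<IMM_ROUNDING>::VALID;
    IMM_ROUNDING
}
```

With this shape, `rounding_bits::<7>()` compiles, while `rounding_bits::<8>()` is rejected at build time. Relaxing the bound later would only widen the set of accepted programs.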

@minybot

minybot commented Mar 9, 2021

Contributor Author

> The instruction definition here says that bits 3 to 7 are ignored by the CPU. I think to be safe we should only allow imm3, we can always relax it later if necessary.

OK. I will finish f16c.

@lqd

lqd commented Mar 9, 2021

Member

I'm still wondering about the fact that we "* 8" the immediates, which are supposed to be in bytes and <= 16 for the shifts.

@Amanieu

Amanieu commented Mar 9, 2021

Member

It might be better to switch the implementation to use a shuffle like clang does and like we already do for _mm_slli_si128.
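To make the shuffle idea concrete, here is a scalar model of a per-lane byte shift, written as a byte permutation so that the immediate stays a byte count and no "* 8" conversion is needed. This is a sketch under assumptions: `bslli_epi128_model` is a hypothetical name, it works on a plain `[u8; 32]` rather than `__m256i`, and it is not the stdarch or clang implementation.

```rust
// Hypothetical scalar model: shift each 128-bit lane left by IMM8 bytes,
// filling the vacated low bytes with zeros.
fn bslli_epi128_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    // Only the low 8 bits of the immediate are meaningful.
    let shift = (IMM8 as usize) & 0xff;
    let mut out = [0u8; 32];
    // A shift of 16 bytes or more clears both lanes.
    if shift > 15 {
        return out;
    }
    for lane in 0..2 {
        let base = lane * 16;
        // Byte i of the result comes from byte i - shift of the same lane.
        for i in shift..16 {
            out[base + i] = a[base + i - shift];
        }
    }
    out
}
```

The two lanes are handled independently, so bytes never cross the 128-bit boundary.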

@minybot

minybot commented Mar 9, 2021

Contributor Author

> It might be better to switch the implementation to use a shuffle like clang does and like we already do for _mm_slli_si128.

OK. I will modify it to be similar to _mm_slli_si128.

@minybot

minybot commented Mar 9, 2021

Contributor Author

> It might be better to switch the implementation to use a shuffle like clang does and like we already do for _mm_slli_si128.

It seems _mm256_slli_si256 is the same as _mm256_bslli_epi128?

@Amanieu

Amanieu commented Mar 9, 2021

Member

Yes, see #1012.
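The equivalence can be spot-checked with the const-generic forms of the two intrinsics as exposed in `std::arch::x86_64`. The following is a rough sketch, assuming an x86_64 target; `shifts_agree` and `slli_matches_bslli` are hypothetical helper names, and the check returns `None` when AVX2 is not available at runtime.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Hypothetical helper: compares both intrinsics on one input vector.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn shifts_agree() -> bool {
    let a = _mm256_setr_epi8(
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
    );
    let x = _mm256_slli_si256::<4>(a);
    let y = _mm256_bslli_epi128::<4>(a);
    // All 32 bytes equal means every bit of the movemask is set.
    _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, y)) == -1
}

// Returns None when the check cannot run on this machine.
fn slli_matches_bslli() -> Option<bool> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was just detected.
            return Some(unsafe { shifts_agree() });
        }
    }
    None
}
```

A single input and a single shift amount do not prove the two are identical in general; this only illustrates the claim made in the thread.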

 use crate::{
     core_arch::{simd::*, x86::*},
-    hint::unreachable_unchecked,
+    // hint::unreachable_unchecked,

Member

Deleted commented code.

Comment thread crates/core_arch/src/x86/avx2.rs Outdated
 }
-transmute(constify_imm8!(imm8 * 8, call))
+let r = vpslldq(a, IMM8 * 8);
+transmute(r)

Member

You can just call _mm256_bslli_epi128 here.

Comment thread crates/core_arch/src/x86/avx2.rs Outdated
 }
-transmute(constify_imm8!(imm8 * 8, call))
+let r = vpsrldq(a, IMM8 * 8);
+transmute(r)

Member

You can just call _mm256_bsrli_epi128 here.

@minybot

minybot commented Mar 9, 2021

Contributor Author

> Yes, see #1012.

Thanks. I think my bsrli_epi128 and bslli_epi128 have problems. I need to check them first.


4 participants