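I'm implementing a fast x888 -> 565 pixel conversion function in pixman, following the algorithm described by Intel [pdf]. Their code converts x888 -> 555, whereas I want to convert to 565. Unfortunately, in 565 the red channel occupies bits 11 to 15, so the high bit of a 16-bit result can be set, which means I can't use the signed saturating pack instructions. The unsigned pack instruction, packusdw, wasn't added until SSE4.1. I'd like to implement its functionality with SSE2, or find another way to do this.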
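To make the "high bit is set" problem concrete, here is a minimal scalar sketch of the per-pixel conversion plus a model of one lane of the signed pack. The names `x888_to_565` and `packs_lane` are hypothetical, and the layout (x ignored in bits 24-31, red in 16-23, green in 8-15, blue in 0-7) and simple truncation of each channel are assumptions, not taken from pixman.

```c
#include <stdint.h>

/* Hypothetical scalar reference: x888 -> r5g6b5 by truncating each channel.
 * Assumed layout: x in bits 24-31 (ignored), red 16-23, green 8-15, blue 0-7. */
static uint16_t x888_to_565(uint32_t px)
{
    uint32_t r = (px >> 16) & 0xffu;
    uint32_t g = (px >> 8) & 0xffu;
    uint32_t b = px & 0xffu;
    /* Red lands in bits 11-15, so any red >= 0x80 sets bit 15. */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical scalar model of one lane of _mm_packs_epi32:
 * a signed 32-bit value saturated to the int16 range. */
static uint16_t packs_lane(int32_t v)
{
    if (v > 32767)
        v = 32767;
    if (v < -32768)
        v = -32768;
    return (uint16_t)v;
}
```

For example, 0x00808080 should become 0x8410, but because that value is above 0x7FFF as a non-negative 32-bit lane, the signed pack clamps it to 0x7FFF.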
The function takes two XMM registers, each containing four 32-bit pixels, and returns a single XMM register containing the eight converted RGB565 pixels.
static force_inline __m128i
pack_565_2packedx128_128 (__m128i lo, __m128i hi)
{
    __m128i rb0 = _mm_and_si128 (lo, mask_565_rb);
    __m128i rb1 = _mm_and_si128 (hi, mask_565_rb);
    __m128i t0 = _mm_madd_epi16 (rb0, mask_565_pack_multiplier);
    __m128i t1 = _mm_madd_epi16 (rb1, mask_565_pack_multiplier);
    __m128i g0 = _mm_and_si128 (lo, mask_green);
    __m128i g1 = _mm_and_si128 (hi, mask_green);
    t0 = _mm_or_si128 (t0, g0);
    t1 = _mm_or_si128 (t1, g1);
    t0 = _mm_srli_epi32 (t0, 5);
    t1 = _mm_srli_epi32 (t1, 5);
    /* XXX: maybe there's a way to do this relatively efficiently with SSE2? */
    return _mm_packus_epi32 (t0, t1);
}
Ideas I've had:
Subtracting 0x8000, then using _mm_packs_epi32 and re-adding 0x8000 to each 565 pixel. I've tried this, but I can't make it work:
t0 = _mm_sub_epi16 (t0, mask_8000);
t1 = _mm_sub_epi16 (t1, mask_8000);
t0 = _mm_packs_epi32 (t0, t1);
return _mm_add_epi16 (t0, mask_8000);
Is there another (hopefully more efficient) way to do this?
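A likely reason the bias attempt fails is that _mm_sub_epi16 works on 16-bit halves, so it never computes v - 0x8000 as a 32-bit integer. The question does not show how mask_8000 is defined, but with a per-16-bit 0x8000 constant the zero upper half of each 32-bit lane becomes 0x8000 and every lane saturates to -32768, while with a per-32-bit constant the low half merely wraps, the lane stays non-negative, and values above 0x7FFF still saturate. The add back, on the other hand, must use a per-16-bit constant, because after the pack each result is 16 bits wide. A minimal sketch of the bias trick with that split, assuming every input lane already holds a value in [0, 0xFFFF] (as the lanes do after the `>> 5` in the question's function); `packus_epi32_via_bias` is a hypothetical name:

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: emulates SSE4.1 _mm_packus_epi32 using SSE2 only.
 * Valid only when every 32-bit input lane is already in [0, 0xFFFF]. */
static inline __m128i packus_epi32_via_bias(__m128i t0, __m128i t1)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);  /* 0x8000 per 32-bit lane */
    const __m128i bias16 = _mm_set1_epi16(-0x8000); /* 0x8000 per 16-bit lane */

    /* 32-bit subtraction maps [0, 0xFFFF] onto [-0x8000, 0x7FFF] ... */
    t0 = _mm_sub_epi32(t0, bias32);
    t1 = _mm_sub_epi32(t1, bias32);
    /* ... which the signed pack narrows without saturating. */
    __m128i packed = _mm_packs_epi32(t0, t1);
    /* Adding 0x8000 per 16-bit lane (mod 2^16) removes the bias again. */
    return _mm_add_epi16(packed, bias16);
}
```

This costs two extra subtractions, one addition and two constants on top of the signed pack.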
Posted on 2012-06-14 07:06:15
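You could sign-extend the values first and then use _mm_packs_epi32. Once each value is sign-extended it lies in [-32768, 32767], so the signed pack never saturates and keeps the original 16-bit pattern: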
t0 = _mm_slli_epi32 (t0, 16);
t0 = _mm_srai_epi32 (t0, 16);
t1 = _mm_slli_epi32 (t1, 16);
t1 = _mm_srai_epi32 (t1, 16);
t0 = _mm_packs_epi32 (t0, t1);
In fact, you can fold this into the preceding right shift by 5 (which it then replaces) to save two instructions:
t0 = _mm_slli_epi32 (t0, 16 - 5);
t0 = _mm_srai_epi32 (t0, 16);
t1 = _mm_slli_epi32 (t1, 16 - 5);
t1 = _mm_srai_epi32 (t1, 16);
t0 = _mm_packs_epi32 (t0, t1);
https://stackoverflow.com/questions/11024652
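Putting the combined shift back into the question's function gives the following SSE2-only sketch. `pack_565_sse2` is a hypothetical name, and the three mask values are assumptions chosen for an x888 layout with red in bits 16-23, green in 8-15 and blue in 0-7; the question does not show their definitions.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical SSE2-only variant of pack_565_2packedx128_128.
 * The mask values below are assumed, not copied from pixman. */
static inline __m128i pack_565_sse2(__m128i lo, __m128i hi)
{
    const __m128i mask_565_rb = _mm_set1_epi32(0x00f800f8);
    const __m128i mask_565_pack_multiplier = _mm_set1_epi32(0x20000004);
    const __m128i mask_green = _mm_set1_epi32(0x0000fc00);

    /* Keep the top 5 bits of red and blue. */
    __m128i rb0 = _mm_and_si128(lo, mask_565_rb);
    __m128i rb1 = _mm_and_si128(hi, mask_565_rb);
    /* madd per 32-bit lane: blue * 4 + red * 0x2000, which puts blue
     * in bits 5-9 and red in bits 16-20. */
    __m128i t0 = _mm_madd_epi16(rb0, mask_565_pack_multiplier);
    __m128i t1 = _mm_madd_epi16(rb1, mask_565_pack_multiplier);
    /* The top 6 bits of green are already in bits 10-15. */
    t0 = _mm_or_si128(t0, _mm_and_si128(lo, mask_green));
    t1 = _mm_or_si128(t1, _mm_and_si128(hi, mask_green));
    /* The 565 value sits in bits 5-20. Shifting left by 16 - 5 moves it to
     * bits 16-31; the arithmetic right shift by 16 brings it back down
     * sign-extended, so each lane is in [-32768, 32767]. */
    t0 = _mm_srai_epi32(_mm_slli_epi32(t0, 16 - 5), 16);
    t1 = _mm_srai_epi32(_mm_slli_epi32(t1, 16 - 5), 16);
    /* No lane saturates, so the signed pack keeps each low 16-bit pattern. */
    return _mm_packs_epi32(t0, t1);
}
```

Compared with the SSE4.1 version, the two _mm_srli_epi32 shifts become two left shifts plus two arithmetic right shifts, so this costs two more instructions than using packusdw.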