目前,我正在研究在纯C中实现高效BitStream的各种可能策略。我需要这种策略来实现各种基于位的压缩算法。然而,我找不到很多关于这个话题的文献,而且我似乎找不到很多好的例子。
下面是我要找的东西:
我想知道以下几点:
这看起来可能有点过火了,但是当压缩过程中涉及到的其他代码经过了极大的优化后,BitStream部分看起来就像是在破坏整个过程。例如,在图像压缩代码中使用SIMD指令来优化部分编码过程并不少见,但最后一步是写入BitStream。
想法,推荐信,有人吗?谢谢!
发布于 2014-06-18 19:11:49
为了记录在案,下面是我最后得到的BitStream实现。它主要是基于宏的,使用32位累加器.比特存储在累加器中,从最重要位到最小有效位。例如,要测试是否设置了下一个位,可以这样做(累加器& 0x80000000)。这使得在不需要操作BitStream的情况下进行多个测试变得非常实用。为了编写,我在累加器中积累比特,并在其满时自动将它们刷新到输出缓冲区。当所有位元都已写入时,也可以手动触发刷新。你的里程可能不一样,但我很满意。我使用它来实现Microsoft点对点压缩(MPPC),它使用huffman编码,性能非常好。
struct _wBitStream
{
BYTE* buffer;
BYTE* pointer;
DWORD position;
DWORD length;
DWORD capacity;
UINT32 mask;
UINT32 offset;
UINT32 prefetch;
UINT32 accumulator;
};
typedef struct _wBitStream wBitStream;
#define BitStream_Prefetch(_bs) do { \
(_bs->prefetch) = 0; \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 4)) \
(_bs->prefetch) |= (*(_bs->pointer + 4) << 24); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 5)) \
(_bs->prefetch) |= (*(_bs->pointer + 5) << 16); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 6)) \
(_bs->prefetch) |= (*(_bs->pointer + 6) << 8); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 7)) \
(_bs->prefetch) |= (*(_bs->pointer + 7) << 0); \
} while(0)
#define BitStream_Fetch(_bs) do { \
(_bs->accumulator) = 0; \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 0)) \
(_bs->accumulator) |= (*(_bs->pointer + 0) << 24); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 1)) \
(_bs->accumulator) |= (*(_bs->pointer + 1) << 16); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 2)) \
(_bs->accumulator) |= (*(_bs->pointer + 2) << 8); \
if (((UINT32) (_bs->pointer - _bs->buffer)) <(_bs->capacity + 3)) \
(_bs->accumulator) |= (*(_bs->pointer + 3) << 0); \
BitStream_Prefetch(_bs); \
} while(0)
#define BitStream_Flush(_bs) do { \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 0)) \
*(_bs->pointer + 0) = (_bs->accumulator >> 24); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 1)) \
*(_bs->pointer + 1) = (_bs->accumulator >> 16); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 2)) \
*(_bs->pointer + 2) = (_bs->accumulator >> 8); \
if (((UINT32) (_bs->pointer - _bs->buffer)) < (_bs->capacity + 3)) \
*(_bs->pointer + 3) = (_bs->accumulator >> 0); \
} while(0)
#define BitStream_Shift(_bs, _nbits) do { \
_bs->accumulator <<= _nbits; \
_bs->position += _nbits; \
_bs->offset += _nbits; \
if (_bs->offset < 32) { \
_bs->mask = ((1 << _nbits) - 1); \
_bs->accumulator |= ((_bs->prefetch >> (32 - _nbits)) & _bs->mask); \
_bs->prefetch <<= _nbits; \
} else { \
_bs->mask = ((1 << _nbits) - 1); \
_bs->accumulator |= ((_bs->prefetch >> (32 - _nbits)) & _bs->mask); \
_bs->prefetch <<= _nbits; \
_bs->offset -= 32; \
_bs->pointer += 4; \
BitStream_Prefetch(_bs); \
if (_bs->offset) { \
_bs->mask = ((1 << _bs->offset) - 1); \
_bs->accumulator |= ((_bs->prefetch >> (32 - _bs->offset)) & _bs->mask); \
_bs->prefetch <<= _bs->offset; \
} \
} \
} while(0)
#define BitStream_Write_Bits(_bs, _bits, _nbits) do { \
_bs->position += _nbits; \
_bs->offset += _nbits; \
if (_bs->offset < 32) { \
_bs->accumulator |= (_bits << (32 - _bs->offset)); \
} else { \
_bs->offset -= 32; \
_bs->mask = ((1 << (_nbits - _bs->offset)) - 1); \
_bs->accumulator |= ((_bits >> _bs->offset) & _bs->mask); \
BitStream_Flush(bs); \
_bs->accumulator = 0; \
_bs->pointer += 4; \
if (_bs->offset) { \
_bs->mask = ((1 << _bs->offset) - 1); \
_bs->accumulator |= ((_bits & _bs->mask) << (32 - _bs->offset)); \
} \
} \
} while(0)
void BitStream_Attach(wBitStream* bs, BYTE* buffer, UINT32 capacity)
{
bs->position = 0;
bs->buffer = buffer;
bs->offset = 0;
bs->accumulator = 0;
bs->pointer = bs->buffer;
bs->capacity = capacity;
bs->length = bs->capacity * 8;
}
wBitStream* BitStream_New()
{
wBitStream* bs = NULL;
bs = (wBitStream*) calloc(1, sizeof(wBitStream));
return bs;
}
void BitStream_Free(wBitStream* bs)
{
free(bs);
}发布于 2014-06-09 20:48:44
嗯。不幸的是,我要指出的并不一定是一个有效的实现,我希望有人发布一个更好的答案(我还没有像多媒体Mike建议的那样检查ffmpeg )。但我能看到你的理解中有足够的漏洞,以至于我认为把这作为一个答案是值得的。
首先,宏不是好的性能所必需的!这是一种非常过时的方法,除非您正在为非常老的或不成熟的编译器编写代码。内联函数与宏一样有效(附带警告,有时会使其更有效)。在合理使用静态函数的情况下,编译器可以决定什么应该内联,什么不该内联。虽然gcc可能不是一个优秀的编译器,但它擅长于判断一个值何时是常数,甚至是指针!这也消除了将常量放入宏中的需要。也就是说,这个:
#define UINT_BIT_SIZE (sizeof(uint) * 8)就像
static const size_t UINT_BIT_SIZE = sizeof(uint) * 8;只是后者有一种类型。对于gcc的最新版本(过去4年左右),您甚至不需要const来将其作为编译时常量来处理,只要它标记为静态(或本地),并且编译单元中的任何代码都不修改该值,它就会将其视为编译时常量。
CPU缓存的出现从根本上改变了使一段代码快速或(相对)慢的原因。如果代码的热部分不适合上缓存(L1 / L2),那么内联和/或宏最终会减慢您的速度,特别是当您必须访问主存时。同样,一次在多个地方触摸数据会导致大量缓存丢失。
话虽如此,我还是写了"Bit Creek",这是Linux设备驱动程序中一个非性能关键部分的一个次要实现("creek“,如"not”中的“creek”)。struct bit_creek表示作为位流处理的内存段,如果超出缓冲区,creek_get_bits和creek_put_bits将其读写为返回-EOVERFLOW的流。
如果我将skip_bits存储在结构中,甚至像您建议的那样,在我有一个完整的字节之前,甚至不会将字节写入主内存,那么我可能会挤出更多的性能,但是性能对此并不重要。
祝好运!
https://stackoverflow.com/questions/22202879
复制相似问题