文章/答案/技术大牛

发布

社区首页 >问答首页 >Cython中的水稻编码

问Cython中的水稻编码
EN

Stack Overflow用户

提问于 2014-03-29 22:41:14

回答 1查看 1K关注 0票数 2

下面是著名的Rice编码(= Golomb代码和M = 2^k coding)的实现，它在压缩算法中得到了广泛的应用。

不幸的是，这是相当缓慢的。造成这种低速的原因是什么？(StringIO？)数据是一个字节又一个字节写入的事实？)

为了加快编码速度，您会重新推荐使用什么？，您会用什么技巧来加速Cython ?

import struct
import StringIO

def put_bit(f, b):
    global buff, filled
    buff = buff | (b << (7-filled))
    if (filled == 7):
        f.write(struct.pack('B',buff))
        buff = 0
        filled = 0
    else:
        filled += 1

def rice_code(f, x, k):
    q = x / (1 << k)                       
    for i in range(q): 
        put_bit(f, 1)
    put_bit(f, 0)
    for i in range(k-1, -1, -1):
        put_bit(f, (x >> i) & 1)

def compress(L, k):
    f = StringIO.StringIO()
    global buff, filled
    buff = 0
    filled = 0
    for x in L:                # encode all numbers
        rice_code(f, x, k)
    for i in range(8-filled):  # write the last byte (if necessary pad with 1111...)  
        put_bit(f, 1)
    return f.getvalue()

if __name__ == '__main__':
    print struct.pack('BBB', 0b00010010, 0b00111001, 0b01111111)      #see http://fr.wikipedia.org/wiki/Codage_de_Rice#Exemples
    print compress([1,2,3,10],k = 3)

PS :这个问题是否应该转移到https://codereview.stackexchange.com/？

python

compression

cython

回答 1

Stack Overflow用户

发布于 2014-03-31 01:59:28

在构建压缩结果时，我将使用C风格的缓冲区而不是StringIO，并且我将尝试在编码循环中只使用C风格的临时程序。我还注意到，您可以预先初始化缓冲区以填充set位('1‘位)，这将使编码值具有较大的商值更快，因为您可以简单地跳过输出缓冲区中的那些位。我用这些东西重写了压缩函数，并测量了结果的速度，我的版本似乎比你的编码器快了十多倍，但是结果代码的可读性较差。

这是我的版本：

cimport cpython.string
cimport libc.stdlib
cimport libc.string
import struct

cdef int BUFFER_SIZE = 4096

def compress(L, k):
    result = ''

    cdef unsigned cvalue
    cdef char *position
    cdef int bit, nbit
    cdef unsigned q, r
    cdef unsigned ck = k
    cdef unsigned mask = (1 << ck) - 1

    cdef char *buff = <char *>libc.stdlib.malloc(BUFFER_SIZE)
    if buff is NULL:
        raise MemoryError

    try:
        #  Initialize the buffer space is assumed to contain all set bits
        libc.string.memset(buff, 0xFF, BUFFER_SIZE)

        position = buff
        bit = 7

        for value in L:
            cvalue = value
            q = cvalue >> ck
            r = cvalue & mask

            #  Skip ahead some number of pre-set one bits for the quotient
            position += q / 8
            bit -= q % 8
            if bit < 0:
                bit += 8
                position += 1

                #  If we have gone off the end of the buffer, extract 
                #  the result and reset buffer pointers
                while position - buff >= BUFFER_SIZE:
                    block = cpython.string.PyString_FromStringAndSize(
                        buff, BUFFER_SIZE)
                    result = result + block

                    libc.string.memset(buff, 0xFF, BUFFER_SIZE)
                    position = position - BUFFER_SIZE

            #  Clear the final bit to indicate the end of the quotient
            position[0] = position[0] ^ (1 << bit)
            if bit > 0:
                bit = bit - 1
            else:
                position += 1
                bit = 7

                #  Check for buffer overflow
                if position - buff >= BUFFER_SIZE:
                    block = cpython.string.PyString_FromStringAndSize(
                        buff, BUFFER_SIZE)
                    result = result + block

                    libc.string.memset(buff, 0xFF, BUFFER_SIZE)
                    position = buff

            #  Encode the remainder bits one by one
            for nbit in xrange(k - 1, -1, -1):
                position[0] = (position[0] & ~(1 << bit)) | \
                              (((r >> nbit) & 1) << bit)

                if bit > 0:
                    bit = bit - 1
                else:
                    position += 1
                    bit = 7

                    #  Check for buffer overflow
                    if position - buff >= BUFFER_SIZE:
                        block = cpython.string.PyString_FromStringAndSize(
                            buff, BUFFER_SIZE)
                        result = result + block

                        libc.string.memset(buff, 0xFF, BUFFER_SIZE)
                        position = buff

        #  Advance if we have partially used the last byte
        if bit < 7:
            position = position + 1

        #  Extract the used portion of the buffer
        block = cpython.string.PyString_FromStringAndSize(
            buff, position - buff)
        result = result + block

        return result

    finally:
        libc.stdlib.free(buff)


def test():
    a = struct.pack('BBB', 0b00010010, 0b00111001, 0b01111111)      #see http://fr.wikipedia.org/wiki/Codage_de_Rice#Exemples
    b = compress([1,2,3,10],k = 3)

    assert a == b

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/22737856

复制

相似问题

问Cython中的水稻编码
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Cython中的水稻编码EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Cython中的水稻编码
EN