首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Cython中的水稻编码

Cython中的水稻编码
EN

Stack Overflow用户
提问于 2014-03-29 22:41:14
回答 1查看 1K关注 0票数 2

下面是著名的Rice编码(= Golomb代码和M = 2^k coding)的实现,它在压缩算法中得到了广泛的应用。

不幸的是,这是相当缓慢的。造成这种低速的原因是什么?(StringIO?)数据是一个字节又一个字节写入的事实?)

为了加快编码速度,您会重新推荐使用什么?,您会用什么技巧来加速Cython ?

代码语言:javascript
复制
import struct
import StringIO

def put_bit(f, b):
    global buff, filled
    buff = buff | (b << (7-filled))
    if (filled == 7):
        f.write(struct.pack('B',buff))
        buff = 0
        filled = 0
    else:
        filled += 1

def rice_code(f, x, k):
    q = x / (1 << k)                       
    for i in range(q): 
        put_bit(f, 1)
    put_bit(f, 0)
    for i in range(k-1, -1, -1):
        put_bit(f, (x >> i) & 1)

def compress(L, k):
    f = StringIO.StringIO()
    global buff, filled
    buff = 0
    filled = 0
    for x in L:                # encode all numbers
        rice_code(f, x, k)
    for i in range(8-filled):  # write the last byte (if necessary pad with 1111...)  
        put_bit(f, 1)
    return f.getvalue()

if __name__ == '__main__':
    print struct.pack('BBB', 0b00010010, 0b00111001, 0b01111111)      #see http://fr.wikipedia.org/wiki/Codage_de_Rice#Exemples
    print compress([1,2,3,10],k = 3)    

PS :这个问题是否应该转移到https://codereview.stackexchange.com/

EN

回答 1

Stack Overflow用户

发布于 2014-03-31 01:59:28

在构建压缩结果时,我将使用C风格的缓冲区而不是StringIO,并且我将尝试在编码循环中只使用C风格的临时程序。我还注意到,您可以预先初始化缓冲区以填充set位('1‘位),这将使编码值具有较大的商值更快,因为您可以简单地跳过输出缓冲区中的那些位。我用这些东西重写了压缩函数,并测量了结果的速度,我的版本似乎比你的编码器快了十多倍,但是结果代码的可读性较差。

这是我的版本:

代码语言:javascript
复制
cimport cpython.string
cimport libc.stdlib
cimport libc.string
import struct

cdef int BUFFER_SIZE = 4096

def compress(L, k):
    result = ''

    cdef unsigned cvalue
    cdef char *position
    cdef int bit, nbit
    cdef unsigned q, r
    cdef unsigned ck = k
    cdef unsigned mask = (1 << ck) - 1

    cdef char *buff = <char *>libc.stdlib.malloc(BUFFER_SIZE)
    if buff is NULL:
        raise MemoryError

    try:
        #  Initialize the buffer space is assumed to contain all set bits
        libc.string.memset(buff, 0xFF, BUFFER_SIZE)

        position = buff
        bit = 7

        for value in L:
            cvalue = value
            q = cvalue >> ck
            r = cvalue & mask

            #  Skip ahead some number of pre-set one bits for the quotient
            position += q / 8
            bit -= q % 8
            if bit < 0:
                bit += 8
                position += 1

                #  If we have gone off the end of the buffer, extract 
                #  the result and reset buffer pointers
                while position - buff >= BUFFER_SIZE:
                    block = cpython.string.PyString_FromStringAndSize(
                        buff, BUFFER_SIZE)
                    result = result + block

                    libc.string.memset(buff, 0xFF, BUFFER_SIZE)
                    position = position - BUFFER_SIZE

            #  Clear the final bit to indicate the end of the quotient
            position[0] = position[0] ^ (1 << bit)
            if bit > 0:
                bit = bit - 1
            else:
                position += 1
                bit = 7

                #  Check for buffer overflow
                if position - buff >= BUFFER_SIZE:
                    block = cpython.string.PyString_FromStringAndSize(
                        buff, BUFFER_SIZE)
                    result = result + block

                    libc.string.memset(buff, 0xFF, BUFFER_SIZE)
                    position = buff

            #  Encode the remainder bits one by one
            for nbit in xrange(k - 1, -1, -1):
                position[0] = (position[0] & ~(1 << bit)) | \
                              (((r >> nbit) & 1) << bit)

                if bit > 0:
                    bit = bit - 1
                else:
                    position += 1
                    bit = 7

                    #  Check for buffer overflow
                    if position - buff >= BUFFER_SIZE:
                        block = cpython.string.PyString_FromStringAndSize(
                            buff, BUFFER_SIZE)
                        result = result + block

                        libc.string.memset(buff, 0xFF, BUFFER_SIZE)
                        position = buff

        #  Advance if we have partially used the last byte
        if bit < 7:
            position = position + 1

        #  Extract the used portion of the buffer
        block = cpython.string.PyString_FromStringAndSize(
            buff, position - buff)
        result = result + block

        return result

    finally:
        libc.stdlib.free(buff)


def test():
    a = struct.pack('BBB', 0b00010010, 0b00111001, 0b01111111)      #see http://fr.wikipedia.org/wiki/Codage_de_Rice#Exemples
    b = compress([1,2,3,10],k = 3)

    assert a == b
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/22737856

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档