文章/答案/技术大牛

发布

问最小上下文压缩器
EN

Code Review用户

提问于 2020-02-13 14:08:44

回答 1查看 68关注 0票数 4

我正在研究极简压缩法的一个实现。该方法基于上下文建模，然后采用简单熵编码器(白米编码器)。上下文模型使用未压缩符号流中的单个前一个字节来预测下一个字节。目前，我的实现工作非常好。但是，我不知道是否可以使它更快或更简洁。基本上，任何改善的建议都是受欢迎的。

先决条件

该实现使用以下数据结构：

struct context {
    size_t freq[256];          /* char -> frequency */
    unsigned char sorted[256]; /* index -> char */
    unsigned char order[256];  /* char -> index */
} table[256];

bio_open、bio_write_gr、bio_read_gr和bio_close函数实现了熵(Golomb)编码.

此外，函数void update_model(unsigned char delta)根据新编码的字节delta (不是本文讨论的主题)更新熵模型。

码

我使用了几个辅助函数：

static void swap_symbols(struct context *context, unsigned char c, unsigned char d)
{
    assert(context != NULL);

    unsigned char ic = context->order[c];
    unsigned char id = context->order[d];

    assert(context->sorted[ic] == c);
    assert(context->sorted[id] == d);

    context->sorted[ic] = d;
    context->sorted[id] = c;

    context->order[c] = id;
    context->order[d] = ic;
}

static void increment_frequency(struct context *context, unsigned char c)
{
    assert(context != NULL);

    unsigned char ic = context->order[c];
    size_t freq_c = ++(context->freq[c]);

    unsigned char *pd;

    for (pd = context->sorted + ic - 1; pd >= context->sorted; --pd) {
        if (freq_c <= context->freq[*pd]) {
            break;
        }
    }

    unsigned char d = *(pd + 1);

    if (c != d) {
        swap_symbols(context, c, d);
    }
}

主要输入功能定义如下：

void init()
{
    opt_k = 3;

    for (int p = 0; p < 256; ++p) {
        for (int i = 0; i < 256; ++i) {
            table[p].sorted[i] = i;
            table[p].freq[i] = 0;
            table[p].order[i] = i;
        }
    }
}

void *compress(void *iptr, size_t isize, void *optr)
{
    struct bio bio;
    unsigned char *end = (unsigned char *)iptr + isize;
    struct context *context = table + 0;

    bio_open(&bio, optr, BIO_MODE_WRITE);

    for (unsigned char *iptrc = iptr; iptrc < end; ++iptrc) {
        unsigned char c = *iptrc;

        /* get index */
        unsigned char d = context->order[c];

        bio_write_gr(&bio, opt_k, (uint32_t)d);

        assert(c == context->sorted[d]);

        /* update context model */
        increment_frequency(context, c);

        /* update Golomb-Rice model */
        update_model(d);

        context = table + c;
    }

    /* EOF symbol */
    bio_write_gr(&bio, opt_k, 256);

    bio_close(&bio, BIO_MODE_WRITE);

    return bio.ptr;
}

void *decompress(void *iptr, void *optr)
{
    struct bio bio;
    struct context *context = table + 0;

    bio_open(&bio, iptr, BIO_MODE_READ);

    unsigned char *optrc = optr;

    for (;; ++optrc) {
        uint32_t d = bio_read_gr(&bio, opt_k);

        if (d == 256) {
            break;
        }

        assert(d < 256);

        unsigned char c = context->sorted[d];

        *optrc = c;

        increment_frequency(context, c);

        update_model(d);

        context = table + c;
    }

    bio_close(&bio, BIO_MODE_READ);

    return optrc;
}

performance

compression

回答 1

Code Review用户

发布于 2020-02-13 16:02:42

符号常数

此代码可以使用一些符号常量：

struct context {
    size_t freq[256];          /* char -> frequency */
    unsigned char sorted[256]; /* index -> char */
    unsigned char order[256];  /* char -> index */
} table[256];

void init()
{
    opt_k = 3;

    for (int p = 0; p < 256; ++p) {
        for (int i = 0; i < 256; ++i) {
            table[p].sorted[i] = i;
            table[p].freq[i] = 0;
            table[p].order[i] = i;
        }
    }
}

使用符号常量将使这段代码更易于编写和维护。例如，如果256有一个符号常量，那么表和循环控件的大小可以用一行编辑来更改。

全局变量

在上面的函数init()中，显然使用全局变量，table在全局变量上，opt_k是另一个全局变量。

全局变量通常被认为是一种糟糕的编程实践，它们使代码难以编写、调试和维护。它们也使得引入bug变得非常容易。

在C编程语言中，如果程序由多个文件组成，如果程序在多个文件中声明，则全局变量可能导致程序不链接。

在C

中使用assert()

如果在某个时候要编译代码，以便使用编译器的优化特性发布，那么assert()语句将被优化。如果需要使用断言提供的错误检查，最好用If语句替换断言。

票数 1

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/237236

复制

相似问题

问最小上下文压缩器
EN

先决条件

码

回答 1

Code Review用户

符号常数

全局变量

在C

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问最小上下文压缩器EN

先决条件

码

回答 1

Code Review用户

符号常数

全局变量

在C

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问最小上下文压缩器
EN