我在谷歌上搜索了快速DCT的实现。我已经找到了Loeffler算法,我已经在C++和ARM汇编中用霓虹灯实现了它。接下来,我找到了避免浮点计算的binDCT。我的参考论文/模式是这样的:

也就是说,我试着用下面的代码在C++中实现,只是为了测试:
void my_binDCT(int in[8][8], int data[8][8],const int xpos, const int ypos)
{
int i;
int row[8][8];
int x0, x1, x2, x3, x4, x5, x6, x7;
int tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16, tmp17;
// transform rows
for (i = 0; i < 8; i++) {
x0 = in[xpos + 0][ypos + i];
x1 = in[xpos + 1][ypos + i];
x2 = in[xpos + 2][ypos + i];
x3 = in[xpos + 3][ypos + i];
x4 = in[xpos + 4][ypos + i];
x5 = in[xpos + 5][ypos + i];
x6 = in[xpos + 6][ypos + i];
x7 = in[xpos + 7][ypos + i];
//stage 1
tmp0 = x0 + x7;
tmp7 = x0 - x7;
tmp1 = x1 + x6;
tmp6 = x1 - x6;
tmp2 = x2 + x5;
tmp5 = x2 - x5;
tmp3 = x3 + x4;
tmp4 = x3 - x4;
//stage 2
tmp16 = ((tmp5*3)>>3) + tmp6;
tmp15 = ((tmp16*5)>>3) - tmp5;
//stage 3
tmp10 = tmp0 + tmp3;
tmp13 = tmp0 - tmp3;
tmp11 = tmp1 + tmp2;
tmp12 = tmp1 - tmp2;
tmp14 = tmp4 + tmp15;
tmp15 = tmp4 - tmp15;
auto z = tmp16;
tmp16 = tmp7 - tmp16;
tmp17 = z + tmp7;
//stage 4
tmp14 = (tmp17 >> 3) - tmp14;
tmp10 = tmp10 + tmp11;
tmp11 = (tmp10 >> 1) - tmp11;
tmp12 = ((tmp13*3)>>3) - tmp12;
tmp13 = ((tmp12*3)>>3) + tmp13;
tmp15 = ((tmp16*7)>>3) + tmp15;
tmp16 = (tmp15>>1) - tmp16;
//stage 5
row[i][0] = tmp10;
row[i][4] = tmp11;
row[i][6] = tmp12;
row[i][2] = tmp13;
row[i][7] = tmp14;
row[i][5] = tmp15;
row[i][3] = tmp16;
row[i][1] = tmp17;
}
//rotate columns
/* transform columns */
for (i = 0; i < 8; i++) {
x0 = row[0][i];
x1 = row[1][i];
x2 = row[2][i];
x3 = row[3][i];
x4 = row[4][i];
x5 = row[5][i];
x6 = row[6][i];
x7 = row[7][i];
//stage 1
tmp0 = x0 + x7;
tmp7 = x0 - x7;
tmp1 = x1 + x6;
tmp6 = x1 - x6;
tmp2 = x2 + x5;
tmp5 = x2 - x5;
tmp3 = x3 + x4;
tmp4 = x3 - x4;
//stage 2
tmp16 = ((tmp5*3)>>3) + tmp6;
tmp15 = ((tmp16*5)>>3) - tmp5;
//stage 3
tmp10 = tmp0 + tmp3;
tmp13 = tmp0 - tmp3;
tmp11 = tmp1 + tmp2;
tmp12 = tmp1 - tmp2;
tmp14 = tmp4 + tmp15;
tmp15 = tmp4 - tmp15;
auto z = tmp16;
tmp16 = tmp7 - tmp16;
tmp17 = z + tmp7;
//stage 4
tmp14 = (tmp17 >> 3) - tmp14;
tmp10 = tmp10 + tmp11;
tmp11 = (tmp10 >> 1) - tmp11;
tmp12 = ((tmp13*3)>>3) - tmp12;
tmp13 = ((tmp12*3)>>3) + tmp13;
tmp15 = ((tmp16*7)>>3) + tmp15;
tmp16 = (tmp15>>1) - tmp16;
//stage 5
data[0][i] = tmp10 >> 3;
data[4][i] = tmp11 >> 3;
data[6][i] = tmp12 >> 3;
data[2][i] = tmp13 >> 3;
data[7][i] = tmp14 >> 3;
data[5][i] = tmp15 >> 3;
data[3][i] = tmp16 >> 3;
data[1][i] = tmp17 >> 3;
}
}我已经按行编码了第一个DCT,按列编码了第二个DCT,并假定将结果除以8(根据N=8的DCT公式)。
我在一个8x8矩阵上进行了测试:
int matrix_a[8][8] = {
12, 16, 19, 12, 12, 27, 51, 47,
16, 24, 12, 19, 12, 20, 39, 51,
24, 27, 8, 39, 35, 34, 24, 44,
40, 17, 28, 32, 24, 27, 8, 32,
34, 20, 28, 20, 12, 8, 19, 34,
19, 39, 12, 27, 27, 12, 8, 34,
8, 28, -5, 39, 34, 16, 12, 19,
20, 27, 8, 27, 24, 19, 19, 8,
};我得到了这样的结果:
MYBINDCT-2:
186 13 -3 4 -2 4 6 0
-13 -20 -10 1 2 -2 1 -4
1 19 -10 -3 7 -12 -2 -4
5 2 -4 -3 -1 -4 -2 -1
11 -5 -7 1 -3 4 -1 0
-13 8 -3 0 10 -4 -6 3
-11 6 -11 1 6 0 -1 -4
-13 4 -1 -3 5 -5 -1 0 这与(四舍五入的)真实dct相去甚远:
186 20 -11 -9 -4 3 8 -1
-18 -35 -24 -5 9 -3 0 -8
14 26 -2 14 7 -19 -3 -3
-9 -10 5 -15 1 8 3 1
23 -11 -19 -9 -11 8 -2 1
-10 10 3 -3 17 -4 -8 4
-14 13 -21 -4 18 0 -1 -7
-19 7 -1 8 15 -7 -3 0 我已经应用了这个算法,做了很多测试,但我仍然不明白我在哪里犯了错。
有没有比我经验更丰富的人能给我解释一下我所犯的错误?奇怪的是,我已经实现了Loeffler,就像我写的那样,它工作得很好。除了系数和浮点数之外,其过程非常相似(蝶形模式、浮动缩放因子、归一化)。我被困住了。感谢大家能给我建议答案。
编辑:一个简短的调用是:
int main(int argc, char **argv)
{
int MYBINDCT[8][8];
my_binDCT(matrix_a, MYBINDCT, 0, 0);
cout << "\nMYBINDCT: \n";
for (int i = 0; i < 8; i++)
{
cout << '\n;
for (int j = 0; j < 8; j++)
{
cout << MYBINDCT[i][j] << " ";
}
}
return 0;
}发布于 2018-03-28 23:54:08
没有乘法器的计算方案(或者有3或5这样的粗略的乘法器)不可能非常精确;我认为您的结果实际上是可以的。
如果你的论文很好,它应该指定结果的预期精度。另外,对于8x8DCT问题,42是一个非常通用的解决方案,但具有未指明的精度。
在对DCT进行近似时,用更容易实现的定义替换DCT的定义是很常见的。如果使用DCT进行图像压缩,则将DCT的定义更改为任何变换都可以,只要您还相应地更改了IDCT (逆变换)。例如,H.264 (视频编码标准)就是这样做的。
https://stackoverflow.com/questions/49535197
复制相似问题