我正在计算每个硬件周期的大量数据(每周期64B)。为了并行化CRC计算,我想对小数据块计算CRC,然后并行地XOR它们。
方法:
因此,我计算一小块数据的CRC,然后用0x1的CRC乘以'i‘倍,如下所示。
简言之,我正在努力做到以下几点:

例如: CRC-8 在这个网站上
Input Data=(0x05 0x07) CRC=0x54
Step-1: Data=0x5 CRC=0x1B
Step-2: Data=0x7 CRC=0x15
Step-3: Data=(0x1 0x0) CRC=0x15
Step-4: Multiply step-1 CRC and step-3 CRC with primitive polynomial 0x7. So, I calculate (0x1B).(0x15) = (0x1 0xC7) mod 0x7.
Step-5: Calculate CRC Data=(0x1 0xC7) CRC=0x4E (I assume this is same as (0x1 0xC7) mod 0x7)
Step-6: XOR the result to get the final CRC. 0x4E^0x15=0x5B正如我们所看到的,步骤6中的结果不是正确的结果。
有人能帮我计算填充数据的CRC吗?或者在上面的例子中,我做错了什么?
发布于 2022-01-15 21:01:30
与计算和调整多个CRC的数据不同,数据字节可以无带乘形成一组16位的“折叠”产品,然后对xor‘’ed“折叠”产品执行一次模块化操作。优化的模块化操作使用了两个无带倍数,因此在所有折叠产品生成和xor‘组合后才能避免。无基数乘法使用XOR代替ADD,而无借入除法使用XOR代替SUB。Intel使用XMM指令PCLMULQDQ (无载波乘法)编写了一个pdf文件,其中每次读取16个字节,分成两个8字节组,每个组折叠成一个16字节的乘积,两个16字节的产品被xor‘编码成一个16字节的产品。使用8个XMM寄存器保存折叠产品,同时处理128个字节。(在AVX512和ZMM寄存器的情况下,此时有256个字节)。
假设您的硬件可以实现一个无载波乘法,它需要两个8位操作数,并生成一个16位(技术上是15位)产品。
让消息=M= 31 32 33 34 35 36 37 38。在这种情况下,CRC(M) = C7
pre-calculated constants (all values shown in hex):
2^38%107 = DF cycles forwards 0x38 bits
2^30%107 = 29 cycles forwards 0x30 bits
2^28%107 = 62 cycles forwards 0x28 bits
2^20%107 = 16 cycles forwards 0x20 bits
2^18%107 = 6B cycles forwards 0x18 bits
2^10%107 = 15 cycles forwards 0x10 bits
2^08%107 = 07 cycles forwards 0x08 bits
2^00%107 = 01 cycles forwards 0x00 bits
16 bit folded (cycled forward) products (can be calculated in parallel):
31·DF = 16CF
32·29 = 07E2
33·62 = 0AC6
34·16 = 03F8
35·6B = 0A17
36·15 = 038E
37·07 = 0085
38·01 = 0038
----
V = 1137 the xor of the 8 folded products
CRC(V) = 113700 % 107 = C7为了避免在模运算中使用无借方除法,可以使用无载乘来计算CRC(V)。例如
V = FFFE
CRC(V) = FFFE00 % 107 = 23.实现,同样,十六进制中的所有值(十六进制10 =十进制16),⊕是XOR。
input:
V = FFFE
constants:
P = 107 polynomial
I = 2^10 / 107 = 107 "inverse" of polynomial
by coincidence, it's the same value
2^10 % 107 = 15 for folding right 16 bits
fold the upper 8 bits of FFFE00 16 bits to the right:
U = FF·15 ⊕ FE00 = 0CF3 ⊕ FE00 = F2F3 (check: F2F3%107 = 23 = CRC)
Q = ((U>>8)·I)>>8 = (F2·107)>>8 = ...
to avoid a 9 bit operand, split up 107 = 100 ⊕ 7
Q = ((F2·100) ⊕ (F2·07))>>8 = ((F2<<8) ⊕ (F2·07))>>8 = (F200 ⊕ 02DE)>>8 = F0DE>>8 = F0
X = Q·P = F0·107 = F0·100 ⊕ F0·07 = F0<<8 ⊕ F0·07 = F000 ⊕ 02D0 = F2D0
CRC = U ⊕ X = F2F3 ⊕ F2D0 = 23因为CRC是8位,所以在最后两个步骤中不需要上面的8位,但是对整个计算没有多大帮助。
X = (Q·(P&FF))&FF = (F0·07)&FF = D0
CRC = (U&FF) ⊕ X = F3 ⊕ D0 = 23生成2^0x10 / 0x107和功率为2%0x107的示例程序:
#include <stdio.h>
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
#define poly 0x107
uint16_t geninv(void) /* generate 2^16 / 9 bit poly */
{
uint16_t q = 0x0000u; /* quotient */
uint16_t d = 0x0001u; /* initial dividend = 2^0 */
for(int i = 0; i < 16; i++){
d <<= 1;
q <<= 1;
if(d&0x0100){ /* if bit 8 set */
q |= 1; /* q |= 1 */
d ^= poly; /* d ^= poly */
}
}
return q; /* return inverse */
}
uint8_t powmodpoly(int n) /* generate 2^n % 9 bit poly */
{
uint16_t d = 0x0001u; /* initial dividend = 2^0 */
for(int i = 0; i < n; i++){
d <<= 1; /* shift dvnd left */
if(d&0x0100){ /* if bit 8 set */
d ^= poly; /* d ^= poly */
}
}
return (uint8_t)d; /* return remainder */
}
int main()
{
printf("%04x\n", geninv());
printf("%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x\n",
powmodpoly(0x00), powmodpoly(0x08), powmodpoly(0x10), powmodpoly(0x18),
powmodpoly(0x20), powmodpoly(0x28), powmodpoly(0x30), powmodpoly(0x38),
powmodpoly(0x40), powmodpoly(0x48));
printf("%02x\n", powmodpoly(0x77)); /* 0xd9, cycles crc backwards 8 bits */
return 0;
}2^0x10 / 0x107的长手示例。
100000111 quotient
-------------------
divisor 100000111 | 10000000000000000 dividend
100000111
---------
111000000
100000111
---------
110001110
100000111
---------
100010010
100000111
---------
10101 remainder我不知道在硬件设计中可以有多少寄存器,但是假设有五个16位寄存器用来保存折叠值,或者两个或八个8位寄存器(取决于折叠的并行程度)。然后,按照Intel的文件,对所有64个字节、每次8个字节的值进行折叠,并且只需要一个模块操作。寄存器大小,fold# = 16位,reg# =8位。请注意,2模聚的幂是预先计算的常数.
foldv = prior buffer's folding value, equivalent to folded msg[-2 -1]
reg0 = foldv>>8
reg1 = foldv&0xFF
foldv = reg0·((2^0x18)%poly) advance by 3 bytes
foldv ^= reg1·((2^0x10)%poly) advance by 2 bytes
fold0 = msg[0 1] ^ foldv handling 2 bytes at a time
fold1 = msg[2 3]
fold2 = msg[4 5]
fold3 = msg[6 7]
for(i = 8; i < 56; i += 8){
reg0 = fold0>>8
reg1 = fold0&ff
fold0 = reg0·((2^0x48)%poly) advance by 9 bytes
fold0 ^= reg1·((2^0x40)%poly) advance by 8 bytes
fold0 ^= msg[i+0 i+1]
reg2 = fold1>>8 if not parallel, reg0
reg3 = fold1&ff and reg1
fold1 = reg2·((2^0x48)%poly) advance by 9 bytes
fold1 ^= reg3·((2^0x40)%poly) advance by 8 bytes
fold1 ^= msg[i+2 i+3]
...
fold3 ^= msg[i+6 i+7]
}
reg0 = fold0>>8
reg1 = fold0&ff
fold0 = reg0·((2^0x38)%poly) advance by 7 bytes
fold0 ^= reg1·((2^0x30)%poly) advance by 6 bytes
reg2 = fold1>>8 if not parallel, reg0
reg3 = fold1&ff and reg1
fold1 = reg2·((2^0x28)%poly) advance by 5 bytes
fold1 ^= reg3·((2^0x20)%poly) advance by 4 bytes
fold2 ... advance by 3 2 bytes
fold3 ... advance by 1 0 bytes
foldv = fold0^fold1^fold2^fold3假设最后一个缓冲区有5个字节:
foldv = prior folding value, equivalent to folded msg[-2 -1]
reg0 = foldv>>8
reg1 = foldv&0xFF
foldv = reg0·((2^0x30)%poly) advance by 6 bytes
foldv ^= reg1·((2^0x28)%poly) advance by 5 bytes
fold0 = msg[0 1] ^ foldv
reg0 = fold0>>8
reg1 = fold0&ff
fold0 = reg0·((2^0x20)%poly) advance by 4 bytes
fold0 ^= reg1·((2^0x18)%poly) advance by 3 bytes
fold1 = msg[2 3]
reg2 = fold1>>8
reg3 = fold1&ff
fold1 = reg0·((2^0x10)%poly) advance by 2 bytes
fold1 ^= reg1·((2^0x08)%poly) advance by 1 bytes
fold2 = msg[4] just one byte loaded
fold3 = 0
foldv = fold0^fold1^fold2^fold3
now use the method above to calculate CRC(foldv)发布于 2022-01-15 16:22:01
如图所示,您需要计算0x05 0x00的CRC,(A,0)和0x00 0x07的CRC,(0,B),然后计算排他的CRC,或者这些一起计算的CRC。在您链接的站点上计算,您将分别得到0x41和0x15。排他性的--或者把它们放在一起,然后,瞧,你得到了0x54,0x05 0x07的CRC。
(0,B)有一个快捷方式,因为对于这个CRC,零字符串的CRC是零。您可以计算0x07的CRC值,并得到与0x00 0x07相同的结果,即0x15。
一般情况下,请参阅克拉卡尼以了解如何组合CRCs。crcany将生成C代码来计算任何指定的CRC,包括用于组合CRC的代码。它使用了一种技术,在O(log(n))时间内将n个零应用于CRC,而不是O(n)时间。
https://stackoverflow.com/questions/70722842
复制相似问题