文章/答案/技术大牛

发布

社区首页 >问答首页 >6502快速签名16位除以7

问6502快速签名16位除以7
EN

Stack Overflow用户

提问于 2018-07-18 21:30:31

回答 3查看 1.5K关注 0票数 14

我正在为一个6502 cpu的汇编语言程序工作，我发现我需要一个尽可能快的除以七个例程，特别是一个可以得到16位红利的程序。

我熟悉的例程找到了这里，但概括的除以七例程发现那里相当复杂，并粗略检查了一般算法(使用整数除法)。

x/7 ~= (x + x/8 + x/64 . )/8

指示要处理16位范围，可能需要100多个周期才能完成，因为6502的单个累加器寄存器和6502上的单个内存位移动相对缓慢。

我认为查找表可能会有所帮助，但在6502上，我当然只限于查找256字节或更少的表。为此，可以假设存在两个256个字节的查找表，xdiv7和xmod7，当使用一个无符号的单字节值作为表的索引时，可以快速获得一个字节除以7或模块化7的结果。但是，我不知道如何利用这些来找到整个16位范围的值。

同时，我还需要一个模块化7算法，尽管理想的情况下，任何可以用除法解决的解决方案也会产生一个mod7结果。如果需要额外的预计算表，我可以添加这些表，只要所有表的总内存需求不超过3k。

虽然我最终需要一个有符号的除法算法，但是一个无符号的算法就足够了，因为我可以根据需要将其概括为一个有符号的例程。

如能提供任何协助，将不胜感激。

assembly

division

micro-optimization

integer-division

6502

回答 3

Stack Overflow用户

回答已采纳

发布于 2018-07-19 05:36:46

注意:正如不信者在注释中指出的那样，upperHigh和lowerLow表是相同的。所以它们可以组合成一张桌子。然而，这种优化将使代码更难阅读，而解释更难以编写，因此将合并表留给读者作为练习。

下面的代码展示了在将16位无符号值除以7时如何生成商和余数。解释代码(IMO)的最简单方法是使用一个示例，因此让我们考虑将0xa732除以7。预期的结果是：

quotient = 0x17e2
remainder = 4

我们首先考虑输入为两个8位值，upper字节和lower字节.upper字节是0xa7，lower字节是0x32。

我们从upper字节计算商和余数：

0xa700 / 7 = 0x17db
0xa700 % 7 = 3

所以我们需要三张桌子：

upperHigh存储商数的高字节：upperHigh[0xa7] = 0x17
upperLow存储商：upperLow[0xa7] = 0xdb的低字节
upperRem存储剩余部分：upperRem[0xa7] = 3

计算lower字节的商和余数：

0x32 / 7 = 0x07
0x32 % 7 = 1

所以我们需要两张桌子：

lowerLow存储商：lowerLow[0x32] = 0x07的低字节
lowerRem存储剩余部分：lowerRem[0x32] = 1

现在我们需要收集最后的答案。剩下的是两个余数之和。因为每个余数都在0,6，和在0,12。所以我们可以使用两个13字节的查找将和转换为最后的余数和进位。

商的低字节是进位和来自lowerLow和upperLow表的值的总和。注意，和可能会生成一个进位到高字节。

商数的高字节是进位和upperHigh表的值之和。

因此，要完成这个示例：

remainder = 1 + 3 = 4              // simple add (no carry in)
lowResult = 0x07 + 0xdb = 0xe2     // add with carry from remainder
highResult = 0x17                  // add with carry from lowResult

实现这一点的程序集代码包括7个表查找、一个加-不带进位指令和两个加带-进位指令.

#include <stdio.h>
#include <stdint.h>

uint8_t upperHigh[256];  // index:(upper 8 bits of the number)  value:(high 8 bits of the quotient)
uint8_t upperLow[256];   // index:(upper 8 bits of the number)  value:(low  8 bits of the quotient)
uint8_t upperRem[256];   // index:(upper 8 bits of the number)  value:(remainder when dividing the upper bits by 7)
uint8_t lowerLow[256];   // index:(lower 8 bits of the number)  value:(low  8 bits of the quotient)
uint8_t lowerRem[256];   // index:(lower 8 bits of the number)  value:(remainder when dividing the lower bits by 7)
uint8_t carryRem[13]    = { 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 };
uint8_t combinedRem[13] = { 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5 };

void populateLookupTables(void)
{
    for (uint16_t i = 0; i < 256; i++)
    {
        uint16_t upper = i << 8;
        upperHigh[i] = (upper / 7) >> 8;
        upperLow[i] = (upper / 7) & 0xff;
        upperRem[i] = upper % 7;

        uint16_t lower = i;
        lowerLow[i] = lower / 7;
        lowerRem[i] = lower % 7;
    }
}

void divideBy7(uint8_t upperValue, uint8_t lowerValue, uint8_t *highResult, uint8_t *lowResult, uint8_t *remainder)
{
    uint8_t temp = upperRem[upperValue] + lowerRem[lowerValue];
    *remainder = combinedRem[temp];
    *lowResult = upperLow[upperValue] + lowerLow[lowerValue] + carryRem[temp];
    uint8_t carry = (upperLow[upperValue] + lowerLow[lowerValue] + carryRem[temp]) >> 8;  // Note this is just the carry flag from the 'lowResult' calcaluation
    *highResult = upperHigh[upperValue] + carry;
}

int main(void)
{
    populateLookupTables();

    uint16_t n = 0;
    while (1)
    {
        uint8_t upper = n >> 8;
        uint8_t lower = n & 0xff;

        uint16_t quotient1  = n / 7;
        uint16_t remainder1 = n % 7;

        uint8_t high, low, rem;
        divideBy7(upper, lower, &high, &low, &rem);
        uint16_t quotient2 = (high << 8) | low;
        uint16_t remainder2 = rem;

        printf("n=%u q1=%u r1=%u q2=%u r2=%u", n, quotient1, remainder1, quotient2, remainder2);
        if (quotient1 != quotient2 || remainder1 != remainder2)
            printf(" **** failed ****");
        printf("\n");

        n++;
        if (n == 0)
            break;
    }
}

票数 7

Stack Overflow用户

发布于 2018-07-19 22:14:50

在无符号整数除法例程进行8位除以7：

;Divide by 7 (From December '84 Apple Assembly Line)
;15 bytes, 27 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr

估计大约100个周期的位移是相当准确的:104个周期到最后一个错误，106个周期总数不包括rts，112个周期为整个函数。

注意到：在为C64组装并使用副模拟器进行C64之后，我发现算法失败了，例如，65535给出了9343，正确的答案是9362。

   ; for 16 bit division  by 7
   ; input:
  ;   register A is low byte
  ;   register X is high byte
  ; output 
  ;   register A is low byte
  ;   register X is high byte
  ;
  ; memory on page zero
  ; temp     is on page zero, 2 bytes
  ; aHigh    is on page zero, 1 byte
  --
  sta temp
  stx temp+1
  stx aHigh
  --
  lsr aHigh
  ror a
  lsr aHigh
  ror a
  lsr aHigh
  ror a
  ---
  adc temp
  tax
  lda aHigh
  adc temp+1
  sta aHigh
  txa
  --
  ror aHigh
  ror a
  lsr aHigh
  ror a
  lsr aHigh
  ror a
  --
  adc temp
  tax
  lda aHigh
  adc temp+1
  sta aHigh
  txa
  --
  ror aHigh
  ror a
  lsr aHigh
  ror a
  lsr aHigh
  ror a     -- 104 cycles
  ;-------
  ldx aHigh  ; -- 106
  rts        ; -- 112 cycles

票数 4

Stack Overflow用户

发布于 2018-12-28 17:28:45

另一种方法是将除法转化为乘法。

为了求出乘法因子，我们有兴趣取倒数。我们基本上是这样做的：

d = n*(1/7)

为了使事情更准确，我们用2.2^16这个方便的幂乘积，效果很好：

d = floor(n*floor(65536/7)/65536)

乘法因子为:地板(65536/7)，为9362。结果的大小如下：

ceiling(log2(65535*9362)) = 30 bits (4 bytes rounded up)

然后，我们可以将下面的两个字节除以65536，或者仅仅使用上面的2个字节作为最终结果。

为了计算实际的旋转和添加，我们检查了因子9362的二进制表示：

10010010010010

注意位模式的重复。因此，一个有效的方案是计算：

((n*9*256/4 + n*9)*8 + n)*2 = 9362*n

计算n*9只需要上限(log2(65535*9))= 20位(3字节)。

在伪程序集中，这是：

LDA number          ; lo byte
STA multiply_nine
LDA number+1        ; high byte
STA multiply_nine+1
LDA #0              
STA multiply_nine+2 ; 3 byte result

ASL multiply_nine   ; multiply by 2
ROL multiply_nine+1
ROL mulitply_nine+2
ASL multiply_nine   ; multiply by 2 (4)
ROL multiply_nine+1
ROL mulitply_nine+2
ASL multiply_nine   ; multiply by 2 (8)
ROL multiply_nine+1
ROL mulitply_nine+2

CLC                 ; not really needed as result is only 20 bits, carry always zero
LDA multiply_nine
ADC number
STA multiply_nine
LDA multiply_nine+1
ADC number+1
STA multiply_nine+1
LDA multiply_nine+2
ADC #0
STA multiply_nine+2 ; n*9

剩下的练习我留给手术室。注意，它没有必要乘以256，因为这只是一个整体的字节移动。

票数 2

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/51411251

复制

相似问题

问6502快速签名16位除以7
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问6502快速签名16位除以7EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问6502快速签名16位除以7
EN