I tried to apply INT8 quantization before an FP32 matrix multiplication, then requantize the accumulated INT32 output back to INT8. I suspect I got a couple of steps wrong somewhere in this process, but I am stuck on spotting where the problems are.
Data flow of affine quantization:
input(fp32) -> quant(int8) ____\ matmul(int32) -> requant(int8) -> deq(fp32)
input(fp32) -> quant(int8) -/
My Pseudo Code
INPUT(FP32) :
Embedded word tensors A and B (shape: [1, 4, 1024, 256]); B is the same as A.
EXPECTED OUTPUT (FP32):
Embedded word tensor AB (shape: [1, 4, 1024, 1024]), the result of multiplying A by itself.
do while(true):
# convert A and B of FP32 into INT8
A_zero_offset = torch.empty(A.shape)
A_zero_offset = torch.zeros_like(A_zero_offset) # offset to be zero **[Question1]**
scale = 255 / (torch.max(A) - torch.min(B)) # 2^8 - 1 = 255
A_quantized = np.round((A - A_zero_offset) * scale)
# likewise
B_quantized = A_quantized
AB = A_quantized.matmul(B_quantized.transpose(-1, -2))
# now accumulated datatype is INT32
AB_offset = torch.empty(AB.shape)
AB_offset = AB_offset.new_full(AB.shape, torch.min(AB)) # offset to be AB's min element **[Question 1]**
scale_AB = 255 / (torch.max(AB) - torch.min(AB)) **[Question 2]**
AB_requantized = np.round((AB - AB_offset) * scale_AB)
# dequantize AB (INT8 at this point) into FP32 **[Question 3]**
Question 1: Does it make sense to set A's offset to zero and AB's offset to min(AB)?
Question 2: What should I use for the scale calculation, "max(AB) - min(AB)" or some other method?
Question 3: After all, which operations do I have to follow when dequantizing the result back into FP32, especially for the scale and offset calculations?
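For reference, the standard affine scheme these questions are about uses an integer zero-point rather than a floating-point offset: q = round(x / scale) + zero_point, and dequantization inverts it as x ≈ (q - zero_point) * scale. A minimal sketch under that assumption (the helper names `quantize`/`dequantize` are illustrative, not from the question):

```python
import torch

# Affine quantization: q = round(x / scale) + zero_point, clamped to [0, 255];
# dequantization: x_hat = (q - zero_point) * scale.
# Minimal sketch; the helper names here are illustrative.

def quantize(x: torch.Tensor, num_bits: int = 8):
    qmin, qmax = 0, 2 ** num_bits - 1                # uint8: 0 .. 255
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)       # an integer, not min(x) itself
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q.to(torch.uint8), scale, zero_point

def dequantize(q: torch.Tensor, scale: torch.Tensor, zero_point: torch.Tensor):
    return (q.float() - zero_point) * scale

torch.manual_seed(0)
x = torch.randn(4, 4)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
print((x - x_hat).abs().max())   # round-trip error stays within about scale / 2
```

Note that the zero-point is itself quantized (rounded), so quantize followed by dequantize maps min(x) and max(x) back to within one quantization step of themselves.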
Posted on 2021-10-18 07:21:45
I think this methodology is totally wrong, because each embedded word tensor has a different max and min value, so this bug would corrupt the consistency of the data. I assume you are already aware of the information loss: you cannot map fp32 into int8 losslessly while keeping the same tensor shape.
import torch
import numpy as np
# create Pseudo tensor
a = torch.tensor([[0.654654, 1.654687, -0.5645365],
                  [5.687646, -5.662354, 0.6546646]], dtype=torch.float32)
print(a.dtype)
print(a)
# torch.float32
# tensor([[ 0.6547, 1.6547, -0.5645],
# [ 5.6876, -5.6624, 0.6547]])
b = a.clone().int()
print(b)
# tensor([[ 0, 1, 0],
# [ 5, -5, 0]], dtype=torch.int32)
# converting to int8; note the range here is -128 to +127
c = a.clone().to(torch.int8)
print(c)
# tensor([[ 0, 1, 0],
# [ 5, -5, 0]], dtype=torch.int8)
# converting to uint8; note the range here is 0 to 255
d = a.clone().byte()
print(d)
# tensor([[ 0, 1, 0],
# [ 5, 251, 0]], dtype=torch.uint8)
Your approach (wrong):
A, B = a
A_zero_offset = torch.empty(A.shape)
A_zero_offset = torch.zeros_like(A_zero_offset) # offset to be zero **[Question1]**
scale = 255 / (torch.max(A) - torch.min(B)) # 2^8 - 1 = 255
A_quantized = np.round((A - A_zero_offset) * scale)
print(A_quantized.dtype)
print(A_quantized)
# torch.float32
# tensor([ 23., 58., -20.])
https://stackoverflow.com/questions/69610727
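For completeness, here is one way the quant -> integer matmul -> dequant pipeline from the question could be done with per-tensor scales and zero-points. This is a hedged sketch, not the answerer's code: the helper `affine_params` and all names are mine, and the integer matmul below accumulates in int64 (PyTorch's CPU integer matmul), whereas dedicated int8 kernels would keep uint8 operands and accumulate in int32.

```python
import torch

# Sketch of quantize -> integer matmul -> dequantize (illustrative names).
# Each tensor gets its OWN scale/zero-point; the zero-points are subtracted
# before the integer matmul so the accumulator holds exact integer products.

def affine_params(x: torch.Tensor, qmin: int = 0, qmax: int = 255):
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    return scale, zero_point

torch.manual_seed(0)
A = torch.randn(4, 8)
B = torch.randn(8, 4)

sA, zA = affine_params(A)
sB, zB = affine_params(B)
qA = torch.clamp(torch.round(A / sA) + zA, 0, 255).long()
qB = torch.clamp(torch.round(B / sB) + zB, 0, 255).long()

# integer matmul on zero-point-corrected values (int64 accumulation here;
# real int8 kernels accumulate in int32)
acc = (qA - zA.long()) @ (qB - zB.long())
AB_hat = acc.float() * (sA * sB)       # dequantize with the combined scale

print((A @ B - AB_hat).abs().max())    # small compared to the entries of A @ B
```

Because the combined scale of the product is simply sA * sB, dequantizing the accumulator needs no min/max of AB at all; a separate requantization step to int8 would only be needed to feed AB into another integer operation.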