I tried to apply INT8 quantization before an FP32 matrix multiplication, then requantize the accumulated INT32 output back to INT8. I suspect I got a couple of steps wrong somewhere in this process, but I am stuck on spotting where the problems are.
Data flow of affine quantization:
input(fp32) -> quant(int8) ____\ matmul(int32) -> requant(int8) -> deq(fp32)
input(fp32) -> quant(int8) -/
My Pseudo Code
INPUT(FP32) :
Embedded word tensors A and B (shape: [1, 4, 1024, 256]); B is the same as A.
EXPECTED OUTPUT (FP32):
Embedded word tensor AB (shape: [1, 4, 1024, 1024]), the result of multiplying A by itself.
do while(true):
# convert A and B of FP32 into INT8
A_zero_offset = torch.empty(A.shape)
A_zero_offset = torch.zeros_like(A_zero_offset) # offset to be zero **[Question1]**
scale = 255 / (torch.max(A) - torch.min(B)) # 2^8 - 1 = 255
A_quantized = np.round((A - A_zero_offset) * scale)
# likewise
B_quantized = A_quantized
AB = A_quantized.matmul(B_quantized.transpose(-1, -2))
# now accumulated datatype is INT32
AB_offset = torch.empty(AB.shape)
AB_offset = AB_offset.new_full(AB.shape, torch.min(AB)) # offset to be AB's min element **[Question 1]**
scale_AB = 255 / (torch.max(AB) - torch.min(AB)) **[Question 2]**
AB_requantized = np.round((AB - AB_offset) * scale_AB)
# dequantize AB (INT8 at this point) into FP32 **[Question 3]**
Question 1: Does it make sense to set A's offset to zero and AB's offset to min(AB)?
Question 2: What should I use for the scale calculation, "max(AB) - min(AB)" or some other method?
Question 3: After all, which operations do I have to follow when dequantizing the result back into FP32, especially for the scale and offset calculations?
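For reference, the standard affine scheme these questions are about uses an integer zero-point rather than a floating-point offset: q = round(x / scale) + zero_point, and dequantization inverts it as x ≈ (q - zero_point) * scale. A minimal sketch under that assumption (the helper names `quantize`/`dequantize` are illustrative, not from the question):

```python
import torch

# Affine quantization: q = round(x / scale) + zero_point, clamped to [0, 255];
# dequantization: x_hat = (q - zero_point) * scale.
# Minimal sketch; the helper names here are illustrative.

def quantize(x: torch.Tensor, num_bits: int = 8):
    qmin, qmax = 0, 2 ** num_bits - 1                # uint8: 0 .. 255
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)       # an integer, not min(x) itself
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q.to(torch.uint8), scale, zero_point

def dequantize(q: torch.Tensor, scale: torch.Tensor, zero_point: torch.Tensor):
    return (q.float() - zero_point) * scale

torch.manual_seed(0)
x = torch.randn(4, 4)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
print((x - x_hat).abs().max())   # round-trip error stays within about scale / 2
```

Note that the zero-point is itself quantized (rounded), so quantize followed by dequantize maps min(x) and max(x) back to within one quantization step of themselves.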
Posted on 2021-10-18 07:21:45
I think this methodology is totally wrong, because each embedded word tensor has a different max and min value, so this bug would corrupt the consistency of the data. I assume you are already aware of the information loss: you cannot map fp32 into int8 losslessly while keeping the same tensor shape.
import torch
import numpy as np
# create Pseudo tensor
a = torch.tensor([[0.654654, 1.654687, -0.5645365],
                  [5.687646, -5.662354, 0.6546646]], dtype=torch.float32)
print(a.dtype)
print(a)
# torch.float32
# tensor([[ 0.6547, 1.6547, -0.5645],
# [ 5.6876, -5.6624, 0.6547]])
b = a.clone().int()
print(b)
# tensor([[ 0, 1, 0],
# [ 5, -5, 0]], dtype=torch.int32)
# converting to int8; note the range here is -128 to +127
c = a.clone().to(torch.int8)
print(c)
# tensor([[ 0, 1, 0],
# [ 5, -5, 0]], dtype=torch.int8)
# converting to uint8; note the range here is 0 to 255
d = a.clone().byte()
print(d)
# tensor([[ 0, 1, 0],
# [ 5, 251, 0]], dtype=torch.uint8)
Your approach (wrong):
A, B = a
A_zero_offset = torch.empty(A.shape)
A_zero_offset = torch.zeros_like(A_zero_offset) # offset to be zero **[Question1]**
scale = 255 / (torch.max(A) - torch.min(B)) # 2^8 - 1 = 255
A_quantized = np.round((A - A_zero_offset) * scale)
print(A_quantized.dtype)
print(A_quantized)
# torch.float32
# tensor([ 23., 58., -20.])
https://stackoverflow.com/questions/69610727
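For completeness, here is one way the quant -> integer matmul -> dequant pipeline from the question could be done with per-tensor scales and zero-points. This is a hedged sketch, not the answerer's code: the helper `affine_params` and all names are mine, and the integer matmul below accumulates in int64 (PyTorch's CPU integer matmul), whereas dedicated int8 kernels would keep uint8 operands and accumulate in int32.

```python
import torch

# Sketch of quantize -> integer matmul -> dequantize (illustrative names).
# Each tensor gets its OWN scale/zero-point; the zero-points are subtracted
# before the integer matmul so the accumulator holds exact integer products.

def affine_params(x: torch.Tensor, qmin: int = 0, qmax: int = 255):
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    return scale, zero_point

torch.manual_seed(0)
A = torch.randn(4, 8)
B = torch.randn(8, 4)

sA, zA = affine_params(A)
sB, zB = affine_params(B)
qA = torch.clamp(torch.round(A / sA) + zA, 0, 255).long()
qB = torch.clamp(torch.round(B / sB) + zB, 0, 255).long()

# integer matmul on zero-point-corrected values (int64 accumulation here;
# real int8 kernels accumulate in int32)
acc = (qA - zA.long()) @ (qB - zB.long())
AB_hat = acc.float() * (sA * sB)       # dequantize with the combined scale

print((A @ B - AB_hat).abs().max())    # small compared to the entries of A @ B
```

Because the combined scale of the product is simply sA * sB, dequantizing the accumulator needs no min/max of AB at all; a separate requantization step to int8 would only be needed to feed AB into another integer operation.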