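I'm having trouble understanding the code below (bimpy.v), which performs an unsigned 2-bit multiply.
Edit: added a comment from a friend of mine: the modification below does the same thing with reduced logic!!
o_r <= (i_a[0] ? i_b : 2'b0) + ((i_a[1] ? i_b : 2'b0) << 1);
What is the purpose of the two signals (w_r and c) in bimpy.v?
assign w_r = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 }
    ^ { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) };
assign c = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1'b0}}) }
    & ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1'b0}});
The code does not seem to match the gate-level diagram of a 2-bit by 2-bit binary multiplier; please correct me if I'm wrong.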

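I have also attached a working waveform from bimpy.v for a simple 2x2 unsigned multiplier.

I have also generated a gate-level representation diagram for bimpy.v.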

////////////////////////////////////////////////////////////////////////////////
//
// Filename: bimpy
//
// Project: A multiply core generator
//
// Purpose: An unsigned 2-bit multiply based upon the fact that LUT's allow
// 6-bits of input, but a 2x2 bit multiply will never carry more
// than one bit. While this multiply is hardware independent, it is
// really motivated by trying to optimize for a specific piece of
// hardware (Xilinx-7 series ...) that has 4-input LUT's with carry
// chains.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015,2017-2019, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. If not, see <http://www.gnu.org/licenses/> for a
// copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
module bimpy(i_clk, i_reset, i_ce, i_a, i_b, o_r);
    parameter BW=2, LUTB=2;
    input i_clk, i_reset, i_ce;
    input [(LUTB-1):0] i_a;
    input [(BW-1):0] i_b;
    output reg [(BW+LUTB-1):0] o_r;
    wire [(BW+LUTB-2):0] w_r;
    wire [(BW+LUTB-3):1] c;
    assign w_r = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 }
        ^ { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) };
    assign c = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1'b0}}) }
        & ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1'b0}});
    initial o_r = 0;
    always @(posedge i_clk)
        if (i_reset)
            o_r <= 0;
        else if (i_ce)
            o_r <= w_r + { c, 2'b0 };
endmodule

Posted on 2019-01-18 12:56:28
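A note on MUXes
Recall that ?: describes a multiplexer (MUX), so the statement:
out = sel ? x : y
is equivalent, in a gate-level implementation, to:
out = (sel & x) | (~sel & y)
(when sel=1, out <= x; when sel=0, out <= y)
If y=0, the MUX reduces to an AND between x and sel: out = (sel & x) | (~sel & 0) = sel & x.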
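The reduction above can be checked exhaustively in simulation. The following is a minimal sketch, not part of bimpy.v: the testbench name mux_and_tb and its signal names are made up for illustration, and it assumes a 2-bit x with the select replicated across both bits.

```verilog
// Hypothetical testbench: checks that a 2-bit MUX whose "else" input is zero
// behaves exactly like an AND of the data with the (replicated) select.
module mux_and_tb;
    reg        sel;
    reg  [1:0] x;

    wire [1:0] mux_out  = sel ? x : 2'b00;                       // MUX form
    wire [1:0] gate_out = ({2{sel}} & x) | ({2{~sel}} & 2'b00);  // gate-level form
    wire [1:0] and_out  = {2{sel}} & x;                          // reduced form

    integer i;

    initial begin
        // Walk through all 8 combinations of {sel, x}
        for (i = 0; i < 8; i = i + 1) begin
            {sel, x} = i[2:0];
            #1;
            if (mux_out !== and_out || gate_out !== and_out)
                $display("MISMATCH sel=%b x=%b", sel, x);
        end
        $display("MUX check finished");
        $finish;
    end
endmodule
```

Any line starting with MISMATCH would indicate that the three forms disagree for that input.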
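Deriving w_r
Assuming BW=2 and LUTB=2, w_r is declared as [(BW+LUTB-2):0], i.e. a 3-bit signal. Let's break it down:
w_r = w_rL ^ w_rR
w_rL = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 }
w_rR = { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) }
Note that both MUXes have zero as their "else" value, so they simplify to ANDs, as explained in the note above:
w_rL = { {BW{i_a[1]}} & i_b, 1'b0 } = { A1 & B1, A1 & B0, 0 }
w_rR = { 1'b0, {BW{i_a[0]}} & i_b } = { 0, A0 & B1, A0 & B0 }
I substituted i_a = {A1, A0} and i_b = {B1, B0} to simplify the notation. Finally, XORing bit by bit:
w_r[0] = 0 ^ (A0 & B0) = A0 & B0
w_r[1] = (A1 & B0) ^ (A0 & B1)
w_r[2] = (A1 & B1) ^ 0 = A1 & B1
w_r has no bit 3; it is implicitly zero-extended to 4 bits when it is added to form o_r, so that bit acts as 0.
Deriving c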
Similarly, for the 1-bit c signal:
c = cL & cR
cL = i_a[1] ? i_b[(BW-2):0]:{(BW-1){1'b0}} = {A1 & B0}
cR = i_a[0] ? i_b[(BW-1):1]:{(BW-1){1'b0}} = {A0 & B1}
Finally:
c = {A1 & B0 & A0 & B1}
Deriving o_r
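If we break o_r = w_r + { c, 2'b0 } down bit by bit (the concatenation places c in bit 2):
o_r[0] = 0 + w_r[0] = A0 & B0
o_r[1] = 0 + w_r[1] = (A1 & B0) ^ (A0 & B1)
o_r[2] = c + w_r[2] = (A1 & B0 & A0 & B1) + (A1 & B1)
Adding two 1-bit values gives a sum equal to their XOR and a carry equal to their AND, that is:
o_r[2] = (A1 & B0 & A0 & B1) ^ (A1 & B1)
o_r[3] = <carry from o_r[2] addition> = A1 & B0 & A0 & B1 & A1 & B1 = A1 & B0 & A0 & B1
(Remember that ANDing a signal with itself gives the same signal, i.e. x & x = x.)
Gate-level diagram output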
Your gate-level diagram describes the following equations:
C0 = A0 & B0 (= o_r[0])
C1 = (A0 & B1) ^ (A1 & B0) (= o_r[1])
C2 = (A0 & B1 & A1 & B0) ^ (A1 & B1) (= o_r[2], sum)
C3 = (A0 & B1 & A1 & B0) & (A1 & B1) = A0 & B1 & A1 & B0 (= o_r[3], carry)
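To confirm that the w_r/c form and the gate-level equations both produce the product, all 16 input pairs can be compared in simulation. This is a sketch under assumptions: the testbench eqn_check_tb and its signal names are hypothetical, and it models only the combinational part of bimpy.v for BW=2 and LUTB=2, leaving out the clock, reset and i_ce.

```verilog
// Hypothetical testbench: compares the derived equations against a * b.
module eqn_check_tb;
    reg  [1:0] a, b;                 // a = {A1, A0}, b = {B1, B0}

    // Terms from the derivation of w_r and c
    wire [2:0] w_r = { a[1] & b[1],
                       (a[1] & b[0]) ^ (a[0] & b[1]),
                       a[0] & b[0] };
    wire       c   = a[1] & b[0] & a[0] & b[1];

    // The sum that bimpy.v registers into o_r (4-bit result keeps the carry)
    wire [3:0] sum = w_r + { c, 2'b00 };

    // C3..C0 from the gate-level equations
    wire [3:0] gates = { c & (a[1] & b[1]),
                         c ^ (a[1] & b[1]),
                         (a[0] & b[1]) ^ (a[1] & b[0]),
                         a[0] & b[0] };

    // Reference product, evaluated at 4 bits
    wire [3:0] expected = a * b;

    integer i, errors;

    initial begin
        errors = 0;
        for (i = 0; i < 16; i = i + 1) begin
            {a, b} = i[3:0];
            #1;
            if (sum !== expected || gates !== expected) begin
                errors = errors + 1;
                $display("MISMATCH a=%d b=%d sum=%d gates=%d expected=%d",
                         a, b, sum, gates, expected);
            end
        end
        $display("errors = %0d", errors);
        $finish;
    end
endmodule
```

For example, with a = b = 3 the terms are w_r = 3'b101 and c = 1, so the sum is 5 + 4 = 9, which matches the gate-level result 4'b1001.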
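Why is the implementation so odd?!
The code comments indicate that the multiplier cell was built for a specific FPGA architecture, and that the original coder's intent was to fit each multiplier cell into one LUT of that architecture. So I would bet the original coder was trying to "guide" an old, dumb tool into building the multiplier in an FPGA-efficient way, which is usually not the same as a gate-level-efficient way. I believe this kind of "manual" RTL-level optimization is useless with today's EDA tools (hopefully!)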
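For comparison, a plain behavioral description might look like the sketch below. The module name bimpy_behavioral is made up for illustration, and it is only meant to be functionally equivalent to bimpy.v when LUTB=2. Whether a given synthesis tool maps it as compactly as the hand-written version is an assumption that would need to be checked against that tool's utilization report.

```verilog
// Hypothetical behavioral equivalent of bimpy.v for LUTB=2.
module bimpy_behavioral(i_clk, i_reset, i_ce, i_a, i_b, o_r);
    parameter BW=2, LUTB=2;
    input                       i_clk, i_reset, i_ce;
    input      [(LUTB-1):0]     i_a;
    input      [(BW-1):0]       i_b;
    output reg [(BW+LUTB-1):0]  o_r;

    initial o_r = 0;

    always @(posedge i_clk)
        if (i_reset)
            o_r <= 0;
        else if (i_ce)
            // The operands are extended to the width of o_r before the
            // multiply, so the full product is kept.
            o_r <= i_a * i_b;
endmodule
```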
https://stackoverflow.com/questions/54247731