High-speed Energy-efficient 5:2 Compressor

Ardalan Najafi, Somayeh Timarchi
Dep. of Electrical and Computer Engineering
Shahid Beheshti University
Tehran, Iran
[email protected] [email protected]

Amir Najafi
Dep. of Electrical, Computer and Biomedical Engineering
Qazvin Branch, Islamic Azad University
Qazvin, Iran
[email protected]
Abstract— Multipliers are important components that dictate
the overall performance of arithmetic circuits, and compressors
are the most critical components of multipliers. In this paper, a
new 5:2 compressor architecture is proposed, obtained by changing
some internal equations of the compressor. In addition, an
efficient full-adder (FA) block is used to obtain a high-speed
compressor. The proposed design uses fewer transistors than the
best existing 5:2 compressor architectures. Three 5:2 compressors
are considered for comparison: the proposed architecture is
compared with two of the best existing designs from the
state-of-the-art literature in terms of power, delay and area.
The architectures are simulated in 90-nm CMOS technology under a
1 V supply voltage. The simulation results show that the proposed
compressor improves power and delay by 24.59% and 18.54%,
respectively, compared to two of the best existing architectures.
In addition, voltage scaling and temperature analyses show that
the proposed architecture outperforms the other designs in terms
of power-delay product (PDP).
Keywords—5:2 compressor; multiplier; high-performance
arithmetic circuit; low-power design
I. INTRODUCTION
The rapidly growing use of multipliers has drawn considerable
research attention to high-performance multipliers that consume
less power and operate faster. Multiplication consists of three
steps: 1) partial product generation; 2) partial product
reduction; 3) final carry-propagate addition. The second step
determines the worst-case delay, consumes most of the power, and
occupies a large fraction of the silicon area. To decrease the
latency of this step, compressors are widely employed. Designing
a low-power, high-speed compressor is therefore important for
achieving fast multiplication and, consequently, fast arithmetic
computation.
5:2 compressors are widely used as building blocks of
multipliers [1, 2]. Many architectures for 5:2 compressors have
been proposed in the literature [3-6]. A survey of these
structures shows that those presented in [4] and [6] perform
better than the others. In [4], the authors proposed a new 5:2
compressor and simulated it using CMOS XOR-XNOR modules and two
different multiplexer implementations. In [6], the authors
proposed a new structure based on a new realization of the 5:2
compressor truth table.
In this paper, we propose a new 5:2 compressor by
applying some changes to the structure proposed in [4] to have
a high-speed and low-power compressor. We use Carry
Generator (CGEN) modules to produce Cout1 and Cout2 signals.
We also make use of a high-performance CMOS FA to
produce main output signals.
The paper is organized as follows: in Section II, we
describe two architectures proposed in [4] and [6]. In Section
III, we explore our proposed compressor. In Section IV, the
simulation results and comparison are detailed. Finally, in
Section V we conclude the paper.
II. BACKGROUND
Every 5:2 compressor has seven inputs and four outputs. Five of
the inputs are primary inputs; the other two are input carries,
which receive their values from the neighboring compressor one
bit position lower in significance. All seven inputs and the Sum
output bit have the same weight. The other three output bits have
a weight one binary order higher.
A 5:2 compressor with five primary inputs x1, x2, x3, x4, x5
and two output bits Sum and Carry, along with carry input bits
Cin1 and Cin2 and carry output bits Cout1 and Cout2, is governed
by the following equation:
$$x_1 + x_2 + x_3 + x_4 + x_5 + C_{in1} + C_{in2} = Sum + 2\,(Carry + C_{out1} + C_{out2}) \tag{1}$$


Various structures for 5:2 compressors are available in the
literature. The simplest implementation of a 5:2 compressor is
obtained by cascading three full-adders in a hierarchical
structure. This structure has a critical path delay of 6∆, where
∆ denotes the delay of an XOR-XNOR, XOR, MUX or CGEN module,
since the differences between their critical delays are negligible.
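The three-full-adder cascade can be checked against equation (1) with a short behavioral model. The following Python sketch is only illustrative: the exact wiring of the three adders is an assumption (one common arrangement), and the function names are ours rather than taken from the cited designs.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three one-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a | b))
    return s, carry


def compressor_5to2_cascade(x1, x2, x3, x4, x5, cin1, cin2):
    """5:2 compressor modeled as three cascaded full adders.

    Assumed wiring: the first adder takes three primary inputs, so
    cout1 does not depend on either input carry.
    """
    s1, cout1 = full_adder(x1, x2, x3)
    s2, cout2 = full_adder(s1, x4, cin1)
    total, carry = full_adder(s2, x5, cin2)
    return total, carry, cout1, cout2


def satisfies_equation_1(compressor):
    """Exhaustively check equation (1) over all 2^7 input patterns."""
    for bits in product((0, 1), repeat=7):
        total, carry, cout1, cout2 = compressor(*bits)
        if sum(bits) != total + 2 * (carry + cout1 + cout2):
            return False
    return True


print(satisfies_equation_1(compressor_5to2_cascade))
```

The model captures only the arithmetic behavior; it says nothing about the 6∆ critical path, which depends on the gate-level implementation.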
A 5:2 compressor structure with a critical path delay of 4∆ was
proposed in [7]. A modified structure with a critical path delay
of 5∆ was proposed in [8]. In [3, 4] and [6], the authors
proposed architectures with a critical path delay of 4∆.
As shown in [4, 6], the designs presented in these papers perform
better than the others. For this reason, we selected these two
structures for our comparison. They are introduced below.
MIPRO 2014/MEET
Fig. 3. CMOS implementation of Carry Generator module
Fig. 1. 5:2 compressor architecture proposed in [4]-Arch-07
A. The compressor proposed in [4]
Since a CMOS implementation of a MUX performs better than an
XOR gate in terms of power and delay [9], the authors of [4]
proposed a 5:2 compressor that uses multiplexers in place of XOR
gates, as shown in Fig. 1. The CMOS implementations of the MUX
and XOR-XNOR blocks, shown in Fig. 2(b) and Fig. 5(a), generate
both the outputs and their complements. However, the complemented
outputs are not utilized in the architectures proposed in
[3, 7, 8, 10]. If the output of a multiplexer is used as the
select bit of another multiplexer, the complemented output can be
used efficiently, and an extra stage that computes the negation
inside the multiplexer can be saved.
The authors used a CMOS-CGEN in their architecture to produce
the Cout1 signal. Fig. 3 shows the CMOS implementation of the
CGEN module. The architecture proposed in [4] is called Arch-07
in the rest of this paper.
B. The compressor proposed in [6]
In [6], the authors proposed an architecture with a critical
path delay of 4∆. Pass-transistor logic and static CMOS logic are
used, based on a new understanding of the 5:2 compressor truth
table. This architecture is depicted in Fig. 4 and is called
Arch-13 in the rest of this paper. In this architecture,
CMOS-CGEN modules are used to produce the Cout1 and Cout2
signals. This block is also a main block of the 7:2 compressor
proposed in [13].
Arch-07 is one of the most efficient existing architectures for
5:2 compressors and has been used in [2, 11, 12]. The authors of
[4] performed their simulations using the CMOS multiplexer and
XOR-XNOR modules depicted in Fig. 2(b) and Fig. 5(a),
respectively. In addition, the multiplexer of Fig. 2(a) is used
in Arch-07 as the MUX block of the intermediate stages.
Fig. 4. 5:2 compressor proposed in [6]-Arch-13
Fig. 2. (a) Transmission Gate implementation of multiplexer (b) CMOS
implementation of multiplexer
Fig. 5. (a) CMOS implementation of XOR-XNOR module (b) XOR-XNOR
module proposed in [3]
In Arch-13, the XOR-XNOR modules of Fig. 5(b) are used to feed
the complementary inputs of the MUXs. In addition, the MUX
modules of Fig. 2(a) are used in this architecture. The authors
claim power and latency reductions obtained by decreasing the
middle-stage capacitances.
III. PROPOSED COMPRESSOR
In this section, the proposed architecture is introduced. Two
improvements are applied to obtain the new 5:2 compressor
architecture. First, a closer look at the dashed box of Fig. 1
shows that it implements the function of a conventional FA, so it
can be replaced by any of the FAs presented in the literature.
This replacement is expected to lead to a considerable speed
improvement, because a CMOS FA operates 34% faster than two
cascaded CMOS XOR gates, as explained in [14]. Therefore, the
CMOS FA presented in [15] is used in our proposed 5:2 compressor
architecture.
As a further improvement, we change some internal equations of
the 5:2 compressor in order to eliminate the final NOT gates of
the CMOS FA. Doing so reduces power dissipation and improves
operating speed. To achieve this, XNOR gates must be used instead
of the XOR gates in the second stage of the architecture. The
resulting architecture is shown in Fig. 6. This design uses 82
transistors (i.e., 6 transistors fewer than Arch-13). The
proposed design is called New Arch in the rest of this paper.
In this architecture, FA-not is a CMOS FA whose final NOT gates
have been eliminated. This module is shown in Fig. 7.
As shown in Fig. 6, CGEN modules are used to produce the Cout1
and Cout2 output signals. The CMOS-CGEN shown in Fig. 3 is used
in the proposed architecture. In addition, the outputs of the XOR
gates are fed to the inputs of the XNOR gates, as can be seen in
Fig. 6. In this way, the outputs of the XNOR gates are the
negation of the corresponding signals in conventional 5:2
compressors, and replacing the FA with an FA-not yields valid Sum
and Carry signals.
Fig. 6. Proposed 5:2 compressor design-New Arch
Fig. 7. CMOS implementation of FA-not cell
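We know the following identities:
$$\overline{x \oplus y} = \overline{x} \oplus y, \qquad \overline{x} \oplus \overline{y} = x \oplus y \tag{2}$$
Therefore, (3) and (4) prove the correctness of the method
described above.
$$\overline{P \oplus Q \oplus \overline{C_{in2}}} = (P \oplus Q) \oplus C_{in2} = \overline{P} \oplus \overline{Q} \oplus C_{in2} \tag{3}$$
$$\overline{PQ + (P + Q)\,\overline{C_{in2}}} = \overline{P}\,\overline{Q} + \overline{P}\,\overline{Q} + \overline{P}\,C_{in2} + \overline{Q}\,C_{in2} = (\overline{P} + \overline{Q})\,C_{in2} + \overline{P}\,\overline{Q} \tag{4}$$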


Note that P and Q denote the XNOR gate outputs of the proposed
architecture, as shown in Fig. 6. They are the logical
complements of the P and Q nodes shown in Fig. 1, respectively.
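The idea behind FA-not can be illustrated with the self-duality of the full adder: complementing all inputs complements both outputs, so an adder without output inverters still produces the true Sum and Carry when it is fed complemented inputs. The Python sketch below is a behavioral check only. It assumes that FA-not outputs the complemented sum and carry and that its third input also arrives in complemented form; that input polarity is an assumption of this sketch, not something read off Fig. 6.

```python
from itertools import product


def fa_not(a, b, c):
    """Full adder without final inverters: returns (NOT sum, NOT carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a | b))
    return 1 - s, 1 - carry


def fa_not_matches_fa():
    """Compare FA-not on complemented inputs with a conventional FA."""
    for p, q, cin2 in product((0, 1), repeat=3):
        # Conventional FA on the uncomplemented nodes (as in Fig. 1).
        ref_sum = p ^ q ^ cin2
        ref_carry = (p & q) | (cin2 & (p | q))
        # XNOR outputs are the complements of p and q; the complemented
        # third input is an assumption made for this sketch.
        if fa_not(1 - p, 1 - q, 1 - cin2) != (ref_sum, ref_carry):
            return False
    return True


print(fa_not_matches_fa())
```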
Fig. 8. (a) CMOS implementation of XNOR gate (b) XOR gate using pass-transistor logic presented in [16]
The logic equations governing the proposed 5:2 compressor
architecture are given in (5)-(8):
$$Sum = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5 \oplus C_{in1} \oplus C_{in2} \tag{5}$$
$$C_{out1} = (x_1 + x_2)\,x_3 + x_1 x_2 \tag{6}$$
$$C_{out2} = (x_4 + x_5)\,C_{in1} + x_4 x_5 \tag{7}$$
$$Carry = [(x_1 \oplus x_2 \oplus x_3) + (x_4 \oplus x_5 \oplus C_{in1})]\,C_{in2} + (x_1 \oplus x_2 \oplus x_3)(x_4 \oplus x_5 \oplus C_{in1}) \tag{8}$$



The XOR and XNOR gates which have been used in the
proposed architecture are shown in Fig. 8.
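As a sanity check, the logic equations (5)-(8) can be evaluated for all input patterns and compared with the arithmetic relation (1). The Python sketch below reads (6) and (7) as three-input majority functions and (8) as the majority of the two second-stage XOR results and Cin2; this reading, and the function name, are assumptions made for illustration.

```python
from itertools import product


def new_arch_equations(x1, x2, x3, x4, x5, cin1, cin2):
    """Evaluate equations (5)-(8) for one input pattern."""
    s1 = x1 ^ x2 ^ x3      # XOR of the first input group
    s2 = x4 ^ x5 ^ cin1    # XOR of the second input group
    total = s1 ^ s2 ^ cin2                     # (5)
    cout1 = ((x1 | x2) & x3) | (x1 & x2)       # (6)
    cout2 = ((x4 | x5) & cin1) | (x4 & x5)     # (7)
    carry = ((s1 | s2) & cin2) | (s1 & s2)     # (8)
    return total, carry, cout1, cout2


# Collect every input pattern that violates equation (1).
mismatches = []
for bits in product((0, 1), repeat=7):
    total, carry, cout1, cout2 = new_arch_equations(*bits)
    if sum(bits) != total + 2 * (carry + cout1 + cout2):
        mismatches.append(bits)

print(len(mismatches))
```

An empty mismatch list means the four outputs always encode the number of ones among the seven inputs.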
IV. SIMULATION RESULTS AND COMPARISON
TABLE I. SIMULATION RESULTS AT 25°C WITH 1 V SUPPLY VOLTAGE

Architecture             Power [µW]   Delay [ps]   Power-delay product [fJ]   Transistor count
Arch-07                  6.18         228.3        1.41                       90
Arch-13                  6.66         210.3        1.4                        88
New Arch                 4.66         168.2        0.78                       82
Improvement percentage   24.59%       18.44%       44.28%                     6.82%
As can be seen in Table I, New Arch reduces power by 24.59% in
comparison to Arch-07. In addition, New Arch reduces delay, PDP
and transistor count by 18.44%, 44.28% and 6.82%, respectively,
in comparison to Arch-13.
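The PDP column of Table I follows directly from the power and delay columns. A minimal Python sketch of the unit conversion, using only the power and delay values listed in the table (the helper name pdp_fj is ours):

```python
# Power [µW] and delay [ps] as listed in Table I.
results = {
    "Arch-07": (6.18, 228.3),
    "Arch-13": (6.66, 210.3),
    "New Arch": (4.66, 168.2),
}


def pdp_fj(power_uw, delay_ps):
    """Power-delay product in fJ (1 µW x 1 ps = 1e-18 J = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


pdp = {name: pdp_fj(p, d) for name, (p, d) in results.items()}
for name, value in pdp.items():
    print(f"{name}: {value:.2f} fJ")
```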
We have also performed voltage scaling and temperature
analyses. For the voltage scaling analysis, the 5:2 compressors
are simulated at a temperature of 25°C under supply voltages
ranging from 0.8 V to 1.2 V. Fig. 10 shows PDP versus supply
voltage for the three 5:2 compressor architectures.
For the temperature analysis, the architectures are simulated
over the temperature range of 20-120°C with a 1 V supply
voltage. Fig. 11 shows PDP versus temperature for the three 5:2
compressor architectures.
A. Simulation environment
The simulations are performed using HSPICE. All simulations
target TSMC 90-nm technology. The circuits under test are
simulated at a 1 V supply voltage, and all inputs are driven at
a frequency of 200 MHz.
The simulation environment for the 5:2 compressor is shown in
Fig. 9. Three cascaded compressors are used because the critical
path for some data patterns may cross neighboring compressors in
the same stage of the carry-save adder (CSA) tree. The dashed
line in Fig. 9 indicates the most probable scenario of such
potential critical paths. The delay is measured from the earliest
input signal reaching 50% of the supply voltage to the latest
output signal reaching 50% of the supply voltage for each input
cycle. The worst case is the largest delay among all input data.
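The delay definition above can be sketched in a few lines of Python. This is not the HSPICE measurement setup used for the reported results; the waveform representation (lists of time and voltage samples), the linear interpolation between samples, and the example values are all hypothetical.

```python
def crossing_time(times, volts, threshold):
    """First time the waveform crosses `threshold` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 != v1 and (v0 - threshold) * (v1 - threshold) <= 0:
            t0, t1 = times[i], times[i + 1]
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None  # the waveform never crosses the threshold


def propagation_delay(inputs, outputs, vdd=1.0):
    """Earliest input 50% crossing to latest output 50% crossing."""
    half = 0.5 * vdd
    t_in = [crossing_time(t, v, half) for t, v in inputs]
    t_out = [crossing_time(t, v, half) for t, v in outputs]
    t_in = [t for t in t_in if t is not None]
    t_out = [t for t in t_out if t is not None]
    return max(t_out) - min(t_in)


# Hypothetical waveforms for one input cycle: (times in ps, volts).
example_inputs = [([0, 20], [0.0, 1.0])]
example_outputs = [
    ([0, 100, 140], [0.0, 0.0, 1.0]),   # rising output
    ([0, 150, 190], [1.0, 1.0, 0.0]),   # falling output
]
print(propagation_delay(example_inputs, example_outputs))
```

Repeating this per input cycle and taking the maximum gives the worst-case delay in the sense described above.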
B. Results and comparisons
The proposed architecture is compared to two of the best
existing 5:2 compressor architectures in terms of power, delay
and area. Therefore, three different 5:2 compressor designs are
simulated: Arch-07, Arch-13 and the proposed architecture.
Fig. 10. Power-delay product Vs. supply voltage
The simulation results for a 1 V supply voltage at an ambient
temperature of 25°C are tabulated in Table I. Bold numbers
represent the best value among the three architectures for each
figure of merit. In addition, the last row of the table shows the
improvement percentage of our proposed architecture in
comparison to the best of the existing ones.
Fig. 11. Power-delay product Vs. temperature
Fig. 9. Simulation environment
Fig. 10 and Fig. 11 show that the proposed architecture
outperforms the other two designs in terms of energy efficiency.
These figures, as well as Table I, show up to 45% improvement in
the PDP of the proposed architecture in comparison to the best
existing ones.
V. CONCLUSION
In this paper, we proposed a new 5:2 compressor architecture
obtained by changing some internal equations. The new compressor
is built around a module named FA-not, which is obtained by
eliminating the final NOT gates of a CMOS FA block. In addition,
CMOS-CGENs are used to produce the Cout1 and Cout2 signals. By
applying these methods, we arrive at a 5:2 compressor that uses
82 transistors.
Simulation and synthesis are performed using HSPICE, with all
inputs driven at a frequency of 200 MHz. Among the three
simulated architectures, the proposed architecture consumes the
lowest power, operates the fastest and is consequently the most
energy-efficient. In terms of PDP, the proposed architecture
shows a 44.28% improvement compared to the best existing
architecture.
REFERENCES
[1] O. Kwon, K. Nowka, and E. E. Swartzlander, "A 16-Bit by 16-Bit MAC Design Using Fast 5:3 Compressor Cells," The Journal of VLSI Signal Processing, vol. 31, pp. 77-89, 2002.
[2] R. Modugu, M. Choi, and N. Park, "A fast low-power modulo 2^n+1 multiplier design," in Instrumentation and Measurement Technology Conference, 2009. I2MTC '09. IEEE, 2009, pp. 951-956.
[3] C. H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Transactions on Circuits and Systems, vol. 51, pp. 1985-1997, 2004.
[4] S. Veeramachaneni, K. M. Krishna, L. Avinash, S. R. Puppala, and M. B. Srinivas, "Novel Architectures for High-Speed and Low-Power 3-2, 4-2 and 5-2 Compressors," in Proc. of 20th Int. Conf. on VLSI Design, 2007, pp. 324-329.
[5] R. Menon and D. Radhakrishnan, "High performance 5:2 compressor architectures," Circuits, Devices and Systems, IEE Proceedings, vol. 153, pp. 447-452, 2006.
[6] M. Tohidi, M. Mousazadeh, S. Akbari, K. Hadidi, and A. Khoei, "CMOS implementation of a new high speed, glitch-free 5-2 compressor for fast arithmetic operations," in Mixed Design of Integrated Circuits and Systems (MIXDES), 2013 Proceedings of the 20th International Conference, 2013, pp. 204-208.
[7] O. Kwon, K. Nowka, and E. E. Swartzlander, "A 16-bit × 16-bit MAC design using fast 5:2 compressor," in IEEE Int. Conf. Application-Specific Systems, Architectures, Processors, 2000, pp. 235-243.
[8] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors," in Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, 2001, pp. 129-133.
[9] R. Zimmermann and W. Fichtner, "Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. 32, pp. 1079-1090, 1997.
[10] J. Gu and C.-H. Chang, "Low voltage, low power (5:2) compressor cell for fast arithmetic circuits," in Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on, 2003, pp. II-661-4 vol.2.
[11] C. Vinoth, V. S. K. Bhaaskaran, B. Brindha, S. Sakthikumaran, V. Kavinilavu, B. Bhaskar, M. Kanagasabapathy, and B. Sharath, "A novel low power and high speed Wallace tree multiplier for RISC processor," in Electronics Computer Technology (ICECT), 2011 3rd International Conference on, 2011, pp. 330-334.
[12] R. Shende, P. Zode, and P. Zode, "Efficient design 2^k-1 binary to residue converter," in Devices, Circuits and Systems (ICDCS), 2012 International Conference on, 2012, pp. 482-485.
[13] M. Rouholamini, O. Kavehei, A. P. Mirbaha, S. J. Jasbi, and K. Navi, "A New Design for 7:2 Compressors," in Proc. IEEE/ACS Int. Conf. on Computer Systems and Applications, 2007, pp. 474-478.
[14] A. Pishvaie, G. Jaberipur, and A. Jahanian, "Improved CMOS (4;2) compressor designs for parallel multipliers," Computers & Electrical Engineering, vol. 38, pp. 1703-1716, 2012.
[15] N. Weste and K. Eshraghian, Principles of CMOS VLSI design, 2nd ed.: Addison Wesley, 1993.
[16] S. Nishizawa, T. Ishihara, and H. Onodera, "Analysis and comparison of XOR cell structures for low voltage circuit design," in Quality Electronic Design (ISQED), 2013 14th International Symposium on, 2013, pp. 703-708.