High-speed Energy-efficient 5:2 Compressor

Ardalan Najafi, Somayeh Timarchi
Dep. of Electrical and Computer Engineering
Shahid Beheshti University
Tehran, Iran
[email protected] [email protected]

Amir Najafi
Dep. of Electrical, Computer and Biomedical Engineering
Qazvin Branch, Islamic Azad University
Qazvin, Iran
[email protected]

Abstract— Multipliers are important components that dictate the overall performance of arithmetic circuits, and compressors are their most critical components. In this paper, a new 5:2 compressor architecture based on changing some internal equations is proposed. In addition, an efficient full-adder (FA) block is used to obtain a high-speed compressor. The proposed design uses fewer transistors than the best existing 5:2 compressor architectures. Three 5:2 compressors are considered for comparison: the proposed architecture is compared with the best existing designs presented in the state-of-the-art literature in terms of power, delay and area. The architectures are simulated in 90-nm CMOS technology under a 1 V supply voltage. The simulation results show that the proposed compressor improves power and delay by 24.59% and 18.54%, respectively, compared to two of the best existing architectures. In addition, voltage scaling and temperature analysis show that the proposed architecture outperforms the other designs in terms of power-delay product (PDP).

Keywords—5:2 compressor; multiplier; high-performance arithmetic circuit; low-power design

I. INTRODUCTION

The rapidly growing use of multipliers has drawn considerable research attention to high-performance multipliers that consume less power and operate faster. The multiplication process includes three steps: 1) partial product generation; 2) partial product reduction; 3) final addition with carry propagation.
The second step has the worst-case delay, consumes the main part of the power, and occupies a large fraction of the silicon area. To decrease the latency of this step, compressors have been widely employed. Therefore, designing a low-power and high-speed compressor is important for fast multiplication and, subsequently, fast arithmetic computation.

5:2 compressors are widely used as building blocks of multipliers [1, 2]. Many architectures for 5:2 compressors have been proposed in the literature [3-6]. A survey of these structures shows that those presented in [4] and [6] perform better than the others. In [4], the authors proposed a new 5:2 compressor and simulated it using CMOS XOR-XNOR modules and two different multiplexer implementations. In [6], the authors proposed a new structure based on a new realization of the 5:2 compressor truth table.

In this paper, we propose a new 5:2 compressor by applying some changes to the structure proposed in [4] to obtain a high-speed and low-power compressor. We use Carry Generator (CGEN) modules to produce the Cout1 and Cout2 signals. We also make use of a high-performance CMOS FA to produce the main output signals.

The paper is organized as follows: in Section II, we describe the two architectures proposed in [4] and [6]. In Section III, we present our proposed compressor. In Section IV, the simulation results and comparison are detailed. Finally, in Section V we conclude the paper.

II. BACKGROUND

Every 5:2 compressor has seven inputs and four outputs. Five inputs are primary inputs and the other two are input carries, which receive their values from the previous stage, one bit lower in significance. All seven inputs, as well as the output Sum bit, have the same weight. The other three output bits have a weight one bit position higher.
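The weighting just described can be checked exhaustively. The following Python sketch assumes the textbook realisation of a 5:2 compressor as three cascaded full-adders; the helper names `full_adder`, `compressor_5_2` and `check_all_inputs`, and the exact wiring, are illustrative assumptions rather than any of the circuits compared in this paper. It tests whether, for all 128 input combinations, the seven equally weighted inputs add up to Sum plus twice the three higher-weight outputs.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_5_2(x1, x2, x3, x4, x5, cin1, cin2):
    """Behavioural 5:2 compressor built from three cascaded full-adders.

    Assumed wiring (illustrative): FA1 adds x1..x3, FA2 adds x4, x5 and
    cin1, FA3 adds the two intermediate sums and cin2.
    """
    s1, cout1 = full_adder(x1, x2, x3)
    s2, cout2 = full_adder(x4, x5, cin1)
    total, carry = full_adder(s1, s2, cin2)
    return total, carry, cout1, cout2


def check_all_inputs():
    """True if inputs == Sum + 2*(Carry + Cout1 + Cout2) for every pattern."""
    for bits in product((0, 1), repeat=7):
        total, carry, cout1, cout2 = compressor_5_2(*bits)
        if sum(bits) != total + 2 * (carry + cout1 + cout2):
            return False
    return True


print(check_all_inputs())
```

A behavioural model of this kind says nothing about delay or power; it only serves as a functional reference against which a transistor-level design can be compared.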
A 5:2 compressor with five primary inputs x1, x2, x3, x4, x5 and two output bits Sum and Carry, along with carry input bits Cin1 and Cin2 and carry output bits Cout1 and Cout2, is governed by the following equation:

$$x_1 + x_2 + x_3 + x_4 + x_5 + C_{in1} + C_{in2} = Sum + 2\,(Carry + C_{out1} + C_{out2}) \tag{1}$$

Various structures for 5:2 compressors are available in the literature. The simplest implementation of a 5:2 compressor is obtained by cascading three full-adders in a hierarchical structure. This structure has a critical path delay of 6∆, where ∆ refers to the delay of an XOR-XNOR, XOR, MUX or CGEN module, as the differences between their critical delays are trivial. A 5:2 compressor structure with a critical path delay of 4∆ has been proposed in [7]. A modified structure with a critical path delay of 5∆ has been proposed in [8]. In [3, 4] and [6], the authors have proposed architectures with a critical path delay of 4∆. As shown in [4, 6], the designs presented in these two papers perform better than the others. These results motivated us to use the two structures as the baselines for our comparison. The two designs are introduced below.

MIPRO 2014/MEET

Fig. 3. CMOS implementation of Carry Generator module

Fig. 1. 5:2 compressor architecture proposed in [4]-Arch-07

A. The compressor proposed in [4]

Since a CMOS implementation of a MUX performs better than an XOR gate in terms of power and delay [9], the authors of [4] have proposed a 5:2 compressor that uses multiplexers in place of XOR gates, as shown in Fig. 1. In the CMOS implementations of the MUX and XOR-XNOR blocks, shown in Fig. 2(b) and Fig. 5(a), both the outputs and their complements are generated. However, the complemented outputs are not utilized in the architectures proposed in [3, 7, 8, 10]. If the output of a multiplexer is used as the select bit of another multiplexer, the complemented output can be used efficiently, and an extra stage to compute the negation in the multiplexer structure can be saved. The authors used a CMOS-CGEN in their architecture to produce the Cout1 signal. Fig.
3 shows the CMOS implementation of the CGEN module. The architecture proposed in [4] will be called Arch-07 in the rest of this paper and is depicted in Fig. 1. This block is also a main block of the 7:2 compressor proposed in [13].

Arch-07 is one of the most efficient existing architectures for 5:2 compressors and has been used in [2, 11, 12]. The authors performed their simulations using the CMOS multiplexer and XOR-XNOR modules depicted in Fig. 2(b) and Fig. 5(a), respectively. In addition, the multiplexer of Fig. 2(a) has been used in Arch-07 as the MUX block of the intermediate stages.

B. The compressor proposed in [6]

In [6], the authors have proposed an architecture with a critical path delay of 4∆. Pass-transistor logic and static CMOS logic have been used, based on a new understanding of the 5:2 compressor truth table. This architecture is depicted in Fig. 4 and will be called Arch-13 in the rest of this paper. CMOS-CGEN modules are used to produce the Cout1 and Cout2 signals in this architecture.

In Arch-13, the XOR-XNOR modules of Fig. 5(b) are used to feed the complementary inputs of the MUXs. In addition, the MUX modules of Fig. 2(a) are used in this architecture. The authors have claimed power and latency reduction by decreasing the middle-stage capacitances.

Fig. 4. 5:2 compressor proposed in [6]-Arch-13

Fig. 2. (a) Transmission Gate implementation of multiplexer (b) CMOS implementation of multiplexer

Fig. 5. (a) CMOS implementation of XOR-XNOR module (b) XOR-XNOR module proposed in [3]

III. PROPOSED COMPRESSOR

In this section the proposed architecture is introduced. Two improvement approaches are used to obtain the new 5:2 compressor architecture. First, a closer look at the dashed box of Fig. 1 shows that it implements the functionality of a conventional FA and can therefore be replaced by any of the variety of FAs presented in the literature.
This replacement is expected to lead to a considerable speed improvement, because a CMOS FA operates 34% faster than two cascaded CMOS XOR gates, as explained in [14]. Therefore, the CMOS FA presented in [15] is used in our proposed 5:2 compressor architecture.

As a further improvement, we make some changes to the internal equations of the 5:2 compressor to eliminate the final NOT gates of the CMOS FA. Doing so reduces power dissipation and improves operational speed. To achieve this goal, we have to use XNOR gates instead of the XORs in the second stage of the architecture. Hence, we propose the architecture shown in Fig. 6. This design uses 82 transistors (i.e., 6 transistors fewer than Arch-13). The proposed design will be called New Arch in the rest of this paper. In this architecture, FA-not is a CMOS FA whose final NOT gates have been eliminated. This module is shown in Fig. 7.

As shown in Fig. 6, CGEN modules are used to produce the Cout1 and Cout2 output signals. The CMOS-CGEN shown in Fig. 3 is used in the proposed architecture. In addition, the outputs of the XOR gates are fed to the inputs of the XNOR gates, as can be seen in Fig. 3. This way, the outputs of the XNOR gates are the negation of the corresponding signals in conventional 5:2 compressors, and by using an FA-not in place of an FA we obtain valid Sum and Carry signals.

Fig. 6. Proposed 5:2 compressor design-New Arch

Fig. 7. CMOS implementation of FA-not cell

We know the following equations:

x y x y x y x y    (2)

Therefore, (3) and (4) prove the correctness of the approach described above.

P Q C in2 P Q Cin2 P Q Cin2 P Q C in2    (3)

PQ PQ PCin2 QC in2 P Q Cin2 PQ    (4)

It is worth mentioning that P and Q are the nodes that represent the XNOR gate outputs of the proposed architecture, as shown in Fig. 6. They are the logical complements of the P and Q nodes, respectively, shown in Fig. 1.

Fig. 8.
(a) CMOS implementation of XNOR gate (b) XOR gate using pass-transistor logic presented in [16]

The logic equations governing the proposed 5:2 compressor architecture are detailed in (5)-(8):

$$Sum = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5 \oplus C_{in1} \oplus C_{in2} \tag{5}$$

$$C_{out1} = (x_1 \oplus x_2)\, x_3 + x_1 x_2 \tag{6}$$

$$C_{out2} = (x_4 \oplus x_5)\, C_{in1} + x_4 x_5 \tag{7}$$

$$Carry = (x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5 \oplus C_{in1})\, C_{in2} + (x_1 \oplus x_2 \oplus x_3)(x_4 \oplus x_5 \oplus C_{in1}) \tag{8}$$

The XOR and XNOR gates used in the proposed architecture are shown in Fig. 8.

IV. SIMULATION RESULTS AND COMPARISON

TABLE I. SIMULATION RESULTS AT 25°C WITH A 1 V SUPPLY

                         Power [µW]   Delay [ps]   Power-delay product [fJ]   Transistor count
Arch-07                  6.18         228.3        1.41                       90
Arch-13                  6.66         210.3        1.4                        88
New Arch                 4.66         168.2        0.78                       82
Improvement percentage   24.59%       18.44%       44.28%                     6.82%

As can be seen in the table, New Arch reduces power by 24.59% in comparison to Arch-07. In addition, New Arch reduces delay, PDP and transistor count by 18.44%, 44.28% and 6.82%, respectively, in comparison to Arch-13.

We have also performed voltage scaling and temperature analysis. For the voltage scaling analysis, the 5:2 compressors have been simulated at a temperature of 25°C under supply voltages ranging from 0.8 V to 1.2 V. Fig. 10 shows PDP versus supply voltage for the three 5:2 compressor architectures. For the temperature analysis, the architectures have been simulated over the wide temperature range of 20-120°C with a 1 V supply voltage. Fig. 11 shows PDP versus temperature for the three 5:2 compressor architectures.

A. Simulation environment

The simulations have been performed with HSPICE. All the simulations target TSMC 90-nm technology. The circuits under test are simulated at a 1 V supply voltage. It should be noted that all inputs are driven at a frequency of 200 MHz. The simulation environment for the 5:2 compressor is shown in Fig. 9. Three cascaded compressors are used because the critical path of some data patterns may cross neighboring compressors in the same stage of the CSA tree. The dashed line in Fig.
9 indicates the most probable scenario of such potential critical paths. The delay is measured from the earliest input signal reaching 50% of the supply voltage to the latest output signal reaching 50% of the supply voltage for each input cycle. The worst case is the largest delay among all input data.

B. Results and comparisons

The proposed architecture is compared to two of the best existing 5:2 compressor architectures in terms of power, delay and area. Therefore, three different 5:2 compressor designs are simulated: Arch-07, Arch-13 and the proposed architecture.

Fig. 10. Power-delay product Vs. supply voltage

The simulation results for a 1 V supply voltage at an ambient temperature of 25°C are tabulated in Table I. Bold numbers represent the best value among the three architectures for each figure of merit. In addition, the last row of the table shows the improvement percentage of our proposed architecture in comparison to the best of the existing ones.

Fig. 11. Power-delay product Vs. temperature

Fig. 9. Simulation environment

Fig. 10 and Fig. 11 show that the proposed architecture outperforms the other two designs in terms of energy efficiency. These figures, as well as Table I, show up to 45% improvement in the PDP of the proposed architecture in comparison to the best existing ones.

V. CONCLUSION

In this paper, we proposed a new 5:2 compressor architecture by changing some internal equations. The new compressor is built around a module named FA-not, which is obtained by eliminating the final NOT gates of a CMOS FA block. In addition, CMOS-CGENs are used to produce the Cout1 and Cout2 signals. By applying these methods, we have obtained a 5:2 compressor that uses 82 transistors. Simulation and synthesis are performed with HSPICE, and all inputs are driven at a frequency of 200 MHz.
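The PDP values quoted in this paper follow from the tabulated power and delay. As a minimal sketch (the names `RESULTS` and `pdp_fj` are illustrative, and the numbers are copied from Table I), multiplying power in µW by delay in ps gives energy in units of 10^-18 J, so scaling by 10^-3 yields femtojoules:

```python
# Power [uW] and delay [ps] copied from Table I (1 V supply, 25 degrees C).
RESULTS = {
    "Arch-07": (6.18, 228.3),
    "Arch-13": (6.66, 210.3),
    "New Arch": (4.66, 168.2),
}


def pdp_fj(power_uw, delay_ps):
    """Power-delay product in fJ: 1 uW * 1 ps = 1e-18 J = 1e-3 fJ."""
    return power_uw * delay_ps * 1e-3


for name, (power_uw, delay_ps) in RESULTS.items():
    print(f"{name}: {pdp_fj(power_uw, delay_ps):.2f} fJ")
```

Rounded to two decimals, the three products are 1.41 fJ, 1.40 fJ and 0.78 fJ, which agree with the PDP column of Table I.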
It is concluded that, among the three simulated architectures, the proposed architecture consumes the lowest power, operates fastest and is consequently the most energy-efficient. In terms of PDP, the proposed architecture shows a 44.28% improvement compared to the best existing architecture.

REFERENCES

[1] O. Kwon, K. Nowka, and E. E. Swartzlander, "A 16-Bit by 16-Bit MAC Design Using Fast 5:3 Compressor Cells," The Journal of VLSI Signal Processing, vol. 31, pp. 77-89, 2002.
[2] R. Modugu, C. Minsu, and N. Park, "A fast low-power modulo 2^n+1 multiplier design," in Instrumentation and Measurement Technology Conference, 2009. I2MTC '09. IEEE, 2009, pp. 951-956.
[3] C. H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Transactions on Circuits and Systems, vol. 51, pp. 1985-1997, 2004.
[4] S. Veeramachaneni, K. M. Krishna, L. Avinash, S. R. Puppala, and M. B. Srinivas, "Novel Architectures for High-Speed and Low-Power 3-2, 4-2 and 5-2 Compressors," in Proc. of 20th Int. Conf. on VLSI Design, 2007, pp. 324-329.
[5] R. Menon and D. Radhakrishnan, "High performance 5 : 2 compressor architectures," Circuits, Devices and Systems, IEE Proceedings -, vol. 153, pp. 447-452, 2006.
[6] M. Tohidi, M. Mousazadeh, S. Akbari, K. Hadidi, and A. Khoei, "CMOS implementation of a new high speed, glitch-free 5-2 compressor for fast arithmetic operations," in Mixed Design of Integrated Circuits and Systems (MIXDES), 2013 Proceedings of the 20th International Conference, 2013, pp. 204-208.
[7] O. Kwon, K. Nowka, and E. E. Swartzlander, "A 16-bit × 16-bit MAC design using fast 5:2 compressor," in IEEE Int. Conf. Application-Specific Systems, Architectures, Processors, 2000, pp. 235-243.
[8] K. Prasad and K. K. Parhi, "Low-power 4-2 and 5-2 compressors," in Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, 2001, pp. 129-133.
[9] R. Zimmermann and W.
Fichtner, "Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. 32, pp. 1079-1090, 1997.
[10] G. Jiangmin and C. Chip-Hong, "Low voltage, low power (5:2) compressor cell for fast arithmetic circuits," in Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on, 2003, pp. II-661-4 vol.2.
[11] C. Vinoth, V. S. K. Bhaaskaran, B. Brindha, S. Sakthikumaran, V. Kavinilavu, B. Bhaskar, M. Kanagasabapathy, and B. Sharath, "A novel low power and high speed Wallace tree multiplier for RISC processor," in Electronics Computer Technology (ICECT), 2011 3rd International Conference on, 2011, pp. 330-334.
[12] R. Shende, P. Zode, and P. Zode, "Efficient design 2^k-1 binary to residue converter," in Devices, Circuits and Systems (ICDCS), 2012 International Conference on, 2012, pp. 482-485.
[13] M. Rouholamini, O. Kavehie, A. P. Mirbaha, S. J. Jasbi, and K. Navi, "A New Design for 7:2 Compressors," in Proc. IEEE/ACS Int. Conf. on Computer Systems and Applications, 2007, pp. 474-478.
[14] A. Pishvaie, G. Jaberipur, and A. Jahanian, "Improved CMOS (4;2) compressor designs for parallel multipliers," Computers & Electrical Engineering, vol. 38, pp. 1703-1716, 2012.
[15] N. Weste and K. Eshraghian, Principles of CMOS VLSI design, 2nd ed.: Addison Wesley, 1993.
[16] S. Nishizawa, T. Ishihara, and H. Onodera, "Analysis and comparison of XOR cell structures for low voltage circuit design," in Quality Electronic Design (ISQED), 2013 14th International Symposium on, 2013, pp. 703-708.