CHAPTER 6
NOVEL DESIGN OF ALU

6.1 INTRODUCTION

The computation unit is the main unit of any computing technology. It performs arithmetic operations such as addition, subtraction and multiplication, and logical operations such as AND, OR, invert and XOR, which dominate digital applications. The arithmetic and logic unit (ALU) performs all the arithmetic operations (addition, subtraction, multiplication and division) and the logic operations. Logic operations test various conditions encountered during processing and allow different actions to be taken based on the results. The data required to perform the arithmetic and logical functions are the operands supplied from the designated CPU registers, and the ALU relies on these inputs to perform its operations. The ALU is called the heart of microprocessors, microcontrollers and CPUs because no computing system can work without the operations it performs.

6.2 ARITHMETIC LOGIC UNIT

An arithmetic logic unit is a digital circuit that performs arithmetic and logical operations. The ALU is a fundamental building block of the central processing unit (CPU) of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers. Modern CPUs and graphics processing units (GPUs) contain very powerful and complex ALUs, and a single component may contain a number of ALUs. Most of a processor's operations are performed by one or more ALUs. An ALU loads data from input registers; an external control unit commands the ALU to perform the required operation on the data, and the ALU stores the result in an output register. The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit that indicates which operation to perform.
Most ALUs can perform the following operations:
- Integer arithmetic operations (addition, subtraction, multiplication and division)
- Bitwise logic operations (AND, NOT, OR, XOR)
- Bit-shifting operations (shifting or rotating a word by a specified number of bits to the left or right, with or without sign extension)

When designing the ALU, the principle of "divide and conquer" is used to obtain a modular design consisting of smaller, more manageable blocks, some of which can be re-used. Instead of designing the 8-bit ALU as one circuit, a one-bit ALU, also called a bit-slice, is designed, and these bit-slices are put together to make the 8-bit ALU. There are different ways to design a bit-slice of the ALU. One method is to form a truth table with six inputs (M, S1, S0, C0, Ai and Bi) and two outputs (Fi and Ci+1), but such a table is complex to form. An alternative is to split the ALU into two modules, a logic module and an arithmetic module. Designing each module separately is easier than designing a bit-slice as one unit. The block diagram of the one-bit ALU, which contains a 2:1 MUX, a logic unit and an arithmetic unit, is shown in Figure 6.1.

Figure 6.1 Block diagram of one bit ALU

In the ALU design, multiplication is the most demanding task for the processor to compute. For this purpose, an 8-bit ALU has been designed for 5 V operation. The full adder has been implemented using a new MUX-based adder and a 4T XOR adder. The designed ALU performs barrel shift, multiplication, subtraction, XOR, NAND, addition and NOR operations. The result of every computation is taken from the output of an 8-to-1 multiplexer: the select signals of the multiplexer decide the operation to be performed, and the corresponding input and output are selected by the multiplexer.

6.3 ALGORITHM FOR THE DESIGN OF NOVEL ALU

Step 1 Select the operations to be performed by the ALU.
Step 2 Design the bit-slice ALU using the principle of divide and conquer, so that the design is modular and consists of smaller, more manageable blocks.
Step 3 Split the bit-slice into an arithmetic module and a logic module instead of designing it as one unit.
Step 4 Form the truth table of the multiplexer select inputs that decide which operation the ALU performs.
Step 5 Design the multiplexer-based control circuit that combines the arithmetic and logic modules into a single bit-slice unit.
Step 6 Optimize power using the sleep transistor gating technique to reduce the overall leakage power of the chip.
Step 7 Design the control circuit, using NMOS and PMOS transistors, that temporarily shuts down unused circuits to reduce leakage power.

6.4 OPTIMIZATION OF POWER USING SLEEP TRANSISTOR GATING TECHNIQUES

Sleep transistor gating is a technique in which circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of the chip. This temporary shutdown is also known as "low power mode" or "inactive mode". When the circuit blocks are required again, they are switched back to "active mode". The two modes must be switched at the appropriate time and in a suitable manner to maximize power savings while minimizing the impact on performance. Thus, the goal of power gating is to minimize leakage power by temporarily cutting off power to selected blocks that are not required in the current mode. Blocks can be shut down by software or by hardware: for instance, driver software can schedule the power-down operations, hardware timers can be used, or a dedicated power management controller can be employed.
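The hardware-timer option mentioned above can be illustrated with a small idle-timeout controller for one gated block. This is a minimal behavioural sketch, not the control circuit of this chapter: the class name `PowerGateController`, the three modes and the parameters `idle_limit` (idle cycles before sleeping) and `wake_latency` (cycles needed to restore the virtual supply) are hypothetical choices made only for illustration.

```python
from enum import Enum

# Hypothetical sketch of a hardware-timer power-gating controller for one block.
# ASSUMPTION: the block sleeps after `idle_limit` consecutive idle cycles and needs
# `wake_latency` cycles after being switched back on before it can do work.

class Mode(Enum):
    ACTIVE = "active"   # sleep transistor on, block usable
    SLEEP = "sleep"     # low power / inactive mode, sleep transistor off
    WAKING = "waking"   # sleep transistor on again, supply still recovering

class PowerGateController:
    def __init__(self, idle_limit=4, wake_latency=2):
        self.idle_limit = idle_limit
        self.wake_latency = wake_latency
        self.mode = Mode.ACTIVE
        self.idle_count = 0
        self.wake_count = 0

    def sleep_transistor_on(self):
        """True whenever the switch connects the block to the supply."""
        return self.mode is not Mode.SLEEP

    def tick(self, request):
        """Advance one clock cycle. `request` is True if the block is needed.
        Returns True only if the block performs work in this cycle."""
        if self.mode is Mode.ACTIVE:
            if request:
                self.idle_count = 0
                return True
            self.idle_count += 1
            if self.idle_count >= self.idle_limit:
                self.mode = Mode.SLEEP        # cut power to the idle block
            return False
        if self.mode is Mode.SLEEP:
            if request:
                self.mode = Mode.WAKING       # switch power back on
                self.wake_count = 0
            return False
        # WAKING: wait out the wake-up latency before the block is usable
        self.wake_count += 1
        if self.wake_count >= self.wake_latency:
            self.mode = Mode.ACTIVE
            self.idle_count = 0
        return False
```

The wake-up cycles in which a pending request cannot be served are the performance cost that the choice of switching time has to balance against the leakage saved.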
The sleep transistor gating parameters are of four types:
- Gate size
- Gate control slew rate
- Simultaneous switching capacitance
- Gate leakage

6.5 TYPES OF SLEEP TRANSISTOR GATING TECHNIQUES

Fine-grain gating encapsulates the switching transistor as part of the standard cell logic. The switching transistors are designed either by the library IP vendor or by the standard cell designer. Usually these cell designs conform to the normal standard cell rules and can easily be handled by EDA tools during implementation.

In coarse-grain power gating, the power-gating transistor is part of the power distribution network rather than of the standard cell. The coarse-grained approach implements grid-style sleep transistors that drive cells locally through shared virtual power networks. This approach is less sensitive to PVT (process, voltage and temperature) variation, introduces less IR-drop variation and imposes a smaller area overhead than cell-based or cluster-based implementations. There are two ways of implementing a coarse-grain structure: ring-based and column-based.

Ring-based methodology: The gates are placed as a ring around the perimeter of the module being switched off. Special corner cells are used to turn the power signals around the corners.

Column-based methodology: The gates are inserted within the module, abutted to each other in the form of columns. The global power is routed in the higher metal layers, while the switched power is in the lower layers.

In the sleep transistor gating technique, the overall leakage power of the circuit is reduced by temporarily switching off circuits when they are not in use, placing them in an inactive or low power mode. When these circuits are needed again, they are turned back on, that is, returned to the active mode. Thus, leakage power is minimized by temporarily cutting off power to selected blocks that are not being used in that mode.
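Switching a block off only pays when it stays idle long enough: turning the sleep transistor off and on again costs some energy, which must be recovered from the leakage saved while asleep. The sketch below computes this break-even idle time. It is a simplified model under stated assumptions: the function names and the energy, leakage and clock values in the usage lines are hypothetical and are not measurements from this chapter.

```python
import math

# Hypothetical break-even model for sleep transistor gating.
# ASSUMPTION: one off/on cycle costs a fixed `switch_energy_j`, and leakage
# drops from `leak_active_w` to `leak_sleep_w` for the whole idle period.

def net_saving_j(idle_cycles, switch_energy_j, leak_active_w, leak_sleep_w, clock_hz):
    """Leakage energy saved by sleeping for `idle_cycles`, minus the switching cost."""
    t_idle = idle_cycles / clock_hz
    return (leak_active_w - leak_sleep_w) * t_idle - switch_energy_j

def break_even_cycles(switch_energy_j, leak_active_w, leak_sleep_w, clock_hz):
    """Smallest whole number of idle cycles for which sleeping saves energy:
    t_be = E_switch / (P_leak,active - P_leak,sleep), rounded up to cycles."""
    saved_per_second = leak_active_w - leak_sleep_w
    if saved_per_second <= 0:
        raise ValueError("sleep mode must leak less than active mode")
    return math.ceil(switch_energy_j / saved_per_second * clock_hz)

# Illustrative (made-up) numbers: 2 pJ switching cost, 10 uW -> 0.5 uW leakage, 100 MHz.
cycles = break_even_cycles(2e-12, 10e-6, 0.5e-6, 100e6)
print(cycles, net_saving_j(cycles, 2e-12, 10e-6, 0.5e-6, 100e6) > 0)
```

An idle-timeout threshold below this break-even time would, in this model, spend more energy on switching than it saves in leakage.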
The fine-grained sleep transistor gating technique is proposed to reduce static power dissipation. In this approach, the inputs to the gates are blocked by NMOS and PMOS transistors when the circuit blocks are not in use. This avoids unnecessary use of the inputs by idle blocks and leads to a significant reduction in power. The sleep transistors are controlled by the same select signals that are applied to the ALU to choose the desired operation, so the overall leakage power is reduced by temporarily cutting off the blocks that are not being used. The block diagram of the designed 8-bit ALU is shown in Figure 6.2, and its modular view is shown in Figure 6.3.

In the fine-grained technique, input is given to a block only when it is required, which minimizes power. NMOS and PMOS transistors block the input going to each individual module: the multiplexer selection lines are connected to the gates of the NMOS and PMOS transistors through which the input reaches the arithmetic and logic units. The input is therefore connected only when the corresponding operation is enabled by the multiplexer, and the NMOS and PMOS transistors act as switches that block the input to each circuit until it is required. The operation of the designed ALU for each select input is shown in Table 6.1.

Figure 6.2 Block diagram of proposed 8 bit ALU (8-bit input, output)

Figure 6.3 Modular view of 8 bit ALU

Table 6.1 ALU operation for various select inputs

Select input S2 S1 S0 | Operation
0 0 0 | NAND
0 0 1 | NOR
0 1 0 | XOR
0 1 1 | SUBTRACTION
1 0 0 | SHIFT
1 0 1 | ADDITION
1 1 0 | MULTIPLICATION (P0 – P7)
1 1 1 | MULTIPLICATION (P8 – P15)

6.6 RESULTS AND DISCUSSION

The designed ALU is simulated with 0.18 µm CMOS technology.
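The select decoding of Table 6.1, together with the fine-grained input blocking described above, can be modelled behaviourally as follows. This is a functional sketch only, not the transistor-level design. Assumptions, all hypothetical: operands are unsigned 8-bit values; subtraction is A - B modulo 256; SHIFT is a one-position logical left shift of A, because the chapter does not state the barrel shifter's shift amount or direction; and a blocked module sees its inputs held at 0. The names `MODULES`, `SELECT_TABLE`, `enabled_module` and `alu` are illustrative.

```python
MASK8 = 0xFF

# Functional modules of the ALU; each returns an unmasked integer result.
MODULES = {
    "NAND":  lambda a, b: ~(a & b),
    "NOR":   lambda a, b: ~(a | b),
    "XOR":   lambda a, b: a ^ b,
    "SUB":   lambda a, b: a - b,
    "SHIFT": lambda a, b: a << 1,   # assumed: 1-bit logical left shift of A
    "ADD":   lambda a, b: a + b,
    "MUL":   lambda a, b: a * b,    # full 16-bit product P15..P0
}

# Table 6.1: select code S2 S1 S0 -> (module, which byte of its result is output)
SELECT_TABLE = {
    0b000: ("NAND", 0), 0b001: ("NOR", 0), 0b010: ("XOR", 0),
    0b011: ("SUB", 0), 0b100: ("SHIFT", 0), 0b101: ("ADD", 0),
    0b110: ("MUL", 0),  # product bits P0-P7
    0b111: ("MUL", 1),  # product bits P8-P15
}

def enabled_module(select):
    """Fine-grained gating: the select lines drive the gates of the input pass
    transistors, so exactly one module has its operands connected."""
    return SELECT_TABLE[select][0]

def alu(select, a, b):
    active = enabled_module(select)
    outputs = {}
    for name, fn in MODULES.items():
        # Blocked modules see constant 0 inputs (assumed hold value),
        # so their internal nodes do not switch.
        x, y = (a, b) if name == active else (0, 0)
        outputs[name] = fn(x, y)
    _, byte = SELECT_TABLE[select]
    # 8-to-1 output multiplexer: pass the chosen byte of the active module.
    return (outputs[active] >> (8 * byte)) & MASK8
```

Because select codes 110 and 111 map to the same multiplier module, the 16-bit product is read out one byte at a time through the 8-bit output multiplexer.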
Based on the selection lines of the multiplexer, the required operation is decided and the input is applied to the corresponding ALU module. The outputs of the various operations are taken from the multiplexers. The simulated output waveforms are depicted in Figure 6.4 (a) and (b). The multipliers are the most power-hungry blocks and require the most time to compute their output, because in a multiplier each stage uses the output of the previous stage. The logical operations have the smallest delay because each is computed in a single step. The ALU is simulated with and without sleep transistor gating to compute the power, the delay and the number of transistors required for implementation. The designed ALU is also simulated with the various multiplier architectures discussed in Sections 4.2 to 4.5 of Chapter 4. The performance comparison is shown in Tables 6.2 to 6.4.

Figure 6.4 (a) Simulated output waveform of 8 bit ALU (voltage in V versus time in ns)

Figure 6.4 (b) Simulated output waveform of 8 bit ALU (voltage in V versus time in ns)

Table 6.2 Performance comparison of 8 bit ALU without sleep transistor gating using various multiplier architectures

8 bit ALU | Power (W) | Delay (ns) | PDP (W·s) | No. of transistors
Using array multiplier (Leonardo et al 2004) | 3.24e-001 | 736 | 238.7e-9 | 1794
Using Wallace Tree multiplier (Naveen et al 2006) | 3.79e-002 | 726 | 27.53e-9 | 1874
Using Multiplexer based multiplier (Kevin Biswas 2005) | 7.35e-002 | 564 | 41.47e-9 | 1894
Using Binary Tree Multiplier (Proposed) | 3.34e-002 | 510 | 17.0e-9 | 1864

Table 6.3 Performance comparison of 8 bit ALU with sleep transistor gating using various multiplier architectures

8 bit ALU | Power (W) | Delay (ns) | PDP (W·s) | No. of transistors
Using array multiplier (Leonardo et al 2004) | 3.05e-001 | 740 | 225.7e-9 | 1822
Using Wallace Tree multiplier (Naveen et al 2006) | 3.45e-002 | 732 | 25.23e-9 | 1902
Using Multiplexer based multiplier (Kevin Biswas 2005) | 6.85e-002 | 568 | 38.9e-9 | 1922
Using Binary Tree Multiplier (Proposed) | 3.05e-002 | 513 | 15.5e-9 | 1892

From Tables 6.2 and 6.3, it is observed that the ALU with the proposed binary tree multiplier has less delay and consumes less power than with any of the other three multiplier architectures. Its transistor count (1864 without and 1892 with sleep transistor gating) is lower than that of the Wallace tree and multiplexer-based versions, though higher than that of the array multiplier version (1794 and 1822), as shown in Table 6.4.

Table 6.4 Comparison of PDP and area requirement of 8 bit ALU with and without sleep transistor gating techniques

8 bit ALU | PDP without SGT (W·s) | PDP with SGT (W·s) | % change in PDP | No. of transistors without SGT | No. of transistors with SGT | % of transistors increased
Using array multiplier | 238.7e-9 | 225.7e-9 | 5.7 | 1794 | 1822 | 1.5
Using Wallace Tree multiplier | 27.53e-9 | 25.23e-9 | 9.1 | 1874 | 1902 | 1.47
Using Multiplexer based multiplier | 41.47e-9 | 38.9e-9 | 6.6 | 1894 | 1922 | 1.45
Using Binary Tree Multiplier | 17.05e-9 | 15.5e-9 | 10 | 1864 | 1892 | 1.47

From Table 6.4, it is observed that sleep transistor gating reduces the PDP of the ALU by up to 10% (with the binary tree multiplier) compared with the ALU without it, while the additional transistors required for the sleep transistor technique amount to less than 2%. The power, delay, PDP and number of transistors of the ALU with the various multiplier architectures are plotted in Figures 6.5 to 6.8 without sleep transistor gating and in Figures 6.9 to 6.12 with it. The performance comparison of the ALU with and without SGT is shown in Figures 6.13 to 6.16.
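The PDP columns follow from PDP = power × delay. The short script below recomputes them from the hand-transcribed values of Tables 6.2 and 6.3 and checks the Table 6.4 percentages. On this arithmetic, the tabulated PDPs agree with power × delay to within about 1%, and the "% change in PDP" column is reproduced when the PDP reduction is divided by the with-SGT PDP; measured against the without-SGT PDP, the same reductions come to roughly 5.4% to 9.1%. The dictionary and function names are illustrative only.

```python
# Hand-transcribed from Tables 6.2 and 6.3: (power in W, delay in ns).
WITHOUT_SGT = {
    "array":        (3.24e-1, 736),
    "Wallace tree": (3.79e-2, 726),
    "MUX based":    (7.35e-2, 564),
    "binary tree":  (3.34e-2, 510),
}
WITH_SGT = {
    "array":        (3.05e-1, 740),
    "Wallace tree": (3.45e-2, 732),
    "MUX based":    (6.85e-2, 568),
    "binary tree":  (3.05e-2, 513),
}
# PDP values as printed in Table 6.4 (W*s): (without SGT, with SGT).
PDP_TABLE = {
    "array":        (238.7e-9, 225.7e-9),
    "Wallace tree": (27.53e-9, 25.23e-9),
    "MUX based":    (41.47e-9, 38.9e-9),
    "binary tree":  (17.05e-9, 15.5e-9),
}

def pdp(power_w, delay_ns):
    """Power-delay product in W*s (joules); the delay is converted from ns."""
    return power_w * delay_ns * 1e-9

def percent_change(without, with_sgt, base):
    """Reduction from `without` to `with_sgt`, as a percentage of `base`."""
    return 100.0 * (without - with_sgt) / base

if __name__ == "__main__":
    for name, (pdp_without, pdp_with) in PDP_TABLE.items():
        print(f"{name:12s} P*D without SGT = {pdp(*WITHOUT_SGT[name]):.4g} W*s, "
              f"with SGT = {pdp(*WITH_SGT[name]):.4g} W*s, "
              f"reduction = {percent_change(pdp_without, pdp_with, pdp_with):.2f}% "
              f"of the with-SGT PDP "
              f"({percent_change(pdp_without, pdp_with, pdp_without):.2f}% of the without-SGT PDP)")
```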
Figure 6.5 Power consumption of ALU without SGT

Figure 6.6 Delay of ALU without SGT

Figure 6.7 PDP of ALU without SGT

Figure 6.8 Number of transistors required without SGT

Figure 6.9 Power consumption of ALU with SGT

Figure 6.10 Delay of ALU with SGT

Figure 6.11 PDP of ALU with SGT

Figure 6.12 Number of transistors required with SGT

Figure 6.13 PDP comparison of ALU with and without SGT (PDP in W·s)

Figure 6.14 Transistor requirement of ALU with and without SGT (number of transistors)

Figure 6.15 Percentage change of PDP using SGT

Figure 6.16 Percentage increase of transistors using SGT

6.7 SUMMARY

Designing an ALU as a single unit is complex and difficult to implement. In this novel method, the ALU is designed as bit-slices that are combined for computation, which reduces complexity and improves performance. The designed ALU is split into a number of smaller modules, and the various operations are selected by a multiplexer. The performance comparison shows that sleep transistor gating reduces the PDP of the ALU by up to 10% at the cost of about 1.5% more transistors. Hence the designed ALU is optimized for high speed applications.