CHAPTER 6 NOVEL DESIGN OF ALU

6.1 INTRODUCTION
The computation unit is the core of any digital system. It performs arithmetic operations such as addition, subtraction and multiplication, and logical operations such as AND, OR, NOT and XOR, which dominate digital applications. The arithmetic and logic unit (ALU) performs all the arithmetic operations (addition, subtraction, multiplication and division) and the logic operations. Logic operations test the various conditions encountered during processing and allow different actions to be taken based on the results. The operands for the arithmetic and logical functions are taken from designated CPU registers. The ALU builds these operations from basic logic elements. It is often called the heart of microprocessors, microcontrollers and CPUs, because none of them can work without the operations it performs.
6.2 ARITHMETIC LOGIC UNIT
An arithmetic logic unit is a digital circuit that performs arithmetic and logical operations. The ALU is a fundamental building block of the central processing unit (CPU) of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers. The processors found inside modern CPUs and graphics processing units (GPUs) contain very powerful and very complex ALUs, and a single component may contain a number of ALUs. Most of a processor's operations are performed by one or more ALUs. The ALU loads data from input registers, an external control unit commands it to perform the required operation on the data, and the result is stored in an output register. The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit that indicates which operation to perform. Most ALUs can perform the following operations:
• Integer arithmetic operations (addition, subtraction, multiplication and division)
• Bitwise logic operations (AND, NOT, OR, XOR)
• Bit-shifting operations (shifting or rotating a word by a specified number of bits to the left or right, with or without sign extension)
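The shift operation in the designed ALU is provided by a barrel shifter, which shifts by any amount in a fixed number of stages. Stage k either passes the word through unchanged or shifts it by 2^k positions, depending on bit k of the shift amount. The Python sketch below models an 8-bit logarithmic barrel shifter as three layers of 2:1 multiplexers. The function names and the restriction to left shifts and left rotates are illustrative assumptions; this is not the shifter circuit used in this work.

```python
WIDTH = 8                      # word width of the ALU
MASK = (1 << WIDTH) - 1        # keeps every result to 8 bits

def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: returns in0 when sel == 0 and in1 when sel == 1."""
    return in1 if sel else in0

def barrel_shift(word: int, amount: int, rotate: bool = False) -> int:
    """Logarithmic barrel shifter (left shift, or left rotate if rotate=True).

    Stage k is a row of 2:1 muxes that shifts by 2**k when bit k of
    `amount` is set, so an 8-bit word needs only log2(8) = 3 stages.
    Only the low 3 bits of `amount` are used (shift range 0..7).
    """
    word &= MASK
    for k in range(3):                       # stages shift by 1, 2 and 4
        step = 1 << k
        shifted = (word << step) & MASK
        if rotate:                           # wrap the bits shifted out back in
            shifted |= word >> (WIDTH - step)
        word = mux2((amount >> k) & 1, word, shifted)
    return word
```

For example, `barrel_shift(0b00000001, 3)` gives `0b00001000`, and `barrel_shift(0b10000001, 1, rotate=True)` gives `0b00000011`. The delay grows with log2 of the word width rather than with the shift amount, which is why this structure is preferred over shifting one position per cycle.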
When designing the ALU, the principle of "divide and conquer" is used to obtain a modular design made of smaller, more manageable blocks, some of which can be reused. Instead of designing the 8-bit ALU as one circuit, a one-bit ALU, also called a bit slice, is designed. Eight of these bit slices are then put together to make the 8-bit ALU. There are different ways to design a bit slice of the ALU. One method is to form a truth table with six inputs (M, S1, S0, C0, Ai and Bi) and two outputs (Fi and Ci+1), but a table with 2^6 = 64 rows is cumbersome to form.
An alternative is to split the ALU into two modules, a logic module and an arithmetic module. Designing each module separately is easier than designing the bit slice as one unit. The block diagram of the one-bit ALU, which contains a 2:1 MUX, a logic unit and an arithmetic unit, is shown in Figure 6.1.
Figure 6.1 Block diagram of one bit ALU
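The bit-slice structure of Figure 6.1 can be sketched at the logic level. The slice takes the six inputs M, S1, S0, C0 (carry in), Ai and Bi and produces Fi and Ci+1. A 2:1 multiplexer driven by M chooses between the logic module and the arithmetic module, and eight slices are chained through their carries. The S1 S0 encodings used here (AND, OR, XOR, NOT A for logic; add, or subtract by inverting B, for arithmetic) are hypothetical and were chosen only for illustration. They are not the encoding of the designed ALU.

```python
def logic_module(a: int, b: int, s1: int, s0: int) -> int:
    """One-bit logic module; the S1 S0 encoding here is hypothetical."""
    return [a & b, a | b, a ^ b, 1 - a][(s1 << 1) | s0]

def arithmetic_module(a: int, b: int, cin: int, s0: int) -> tuple:
    """One-bit full adder; S0 = 1 inverts B, so A + ~B + 1 gives A - B."""
    b ^= s0
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def alu_slice(a: int, b: int, cin: int, m: int, s1: int, s0: int) -> tuple:
    """Bit slice: a 2:1 mux driven by M picks logic (M=0) or arithmetic (M=1).

    The carry out is always computed but is only meaningful when M = 1.
    """
    f_logic = logic_module(a, b, s1, s0)
    f_arith, cout = arithmetic_module(a, b, cin, s0)
    return (f_arith if m else f_logic), cout

def alu8(a: int, b: int, m: int, s1: int, s0: int, c0: int = 0) -> tuple:
    """Eight bit slices chained through their carries (ripple carry)."""
    result, carry = 0, c0
    for i in range(8):
        f, carry = alu_slice((a >> i) & 1, (b >> i) & 1, carry, m, s1, s0)
        result |= f << i
    return result, carry
```

With this hypothetical encoding, `alu8(100, 27, 1, 0, 0)` returns `(127, 0)`, and `alu8(100, 27, 1, 0, 1, c0=1)` returns `(73, 1)`, which is 100 - 27 with no borrow. Each slice is small enough to verify exhaustively, and this is the practical advantage of the divide and conquer approach described above.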
Within the ALU, multiplication is the most demanding operation for the processor to compute. An 8-bit ALU has been designed for 5 V operation, with the full adder implemented using a new MUX-based adder and a 4T XOR adder.
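A common way to build a full adder from multiplexers is to compute the propagate signal P = A XOR B once and use it as the select line of two 2:1 multiplexers. Sum equals Cin when P = 0 and NOT Cin when P = 1. Cout equals A when P = 0, because A and B are then equal, and Cin when P = 1. The sketch below checks this formulation against the full-adder definition. It is a generic MUX-based adder assumed for illustration and not necessarily the new MUX-based adder or the 4T XOR cell used in this design.

```python
from itertools import product

def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: in0 when sel == 0, in1 when sel == 1."""
    return in1 if sel else in0

def mux_full_adder(a: int, b: int, cin: int) -> tuple:
    p = a ^ b                       # propagate signal (the XOR cell)
    s = mux2(p, cin, 1 - cin)       # Sum = Cin if A == B, else NOT Cin
    cout = mux2(p, a, cin)          # Cout = A if A == B, else Cin
    return s, cout

# Exhaustive check against the arithmetic definition a + b + cin = 2*cout + s
for a, b, cin in product((0, 1), repeat=3):
    s, cout = mux_full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s
```

Sharing the single XOR output as the select of both multiplexers is the reason this style needs few transistors. The XOR cell therefore dominates the adder's delay.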
The designed ALU performs addition, subtraction, multiplication, barrel shifting, NAND, NOR and XOR operations. The result of every computation is taken from the output of an 8-to-1 multiplexer. The select signals of the multiplexer decide the operation to be performed, and the multiplexer selects the corresponding input and output.
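At the behavioural level, the 8-to-1 multiplexer routes one of eight results to the output, chosen by the select lines S2 S1 S0. The sketch below uses the select encoding of Table 6.1. The product of two 8-bit operands is 16 bits wide and the output is 8 bits, so the low byte (P0 – P7) and the high byte (P8 – P15) have separate select codes. The text does not specify the shift direction and amount or the overflow handling, so the choices made here are assumptions: A is shifted left by B[2:0], and subtraction and addition wrap modulo 256. The function name `alu_8bit` is hypothetical.

```python
MASK = 0xFF  # 8-bit datapath

def alu_8bit(a: int, b: int, s2: int, s1: int, s0: int) -> int:
    """Behavioural model: each unit's result feeds one input of an 8-to-1 mux.

    In this model every unit computes; in the gated hardware only the
    selected unit receives its operands.
    """
    a, b = a & MASK, b & MASK
    product16 = a * b                         # 16-bit product P15..P0
    results = [
        ~(a & b) & MASK,                      # 000 NAND
        ~(a | b) & MASK,                      # 001 NOR
        a ^ b,                                # 010 XOR
        (a - b) & MASK,                       # 011 SUBTRACTION (mod 256, assumed)
        (a << (b & 0b111)) & MASK,            # 100 SHIFT (assumed: A left by B[2:0])
        (a + b) & MASK,                       # 101 ADDITION (carry out dropped, assumed)
        product16 & MASK,                     # 110 MULTIPLICATION, P0 - P7
        product16 >> 8,                       # 111 MULTIPLICATION, P8 - P15
    ]
    select = (s2 << 2) | (s1 << 1) | s0       # 3-bit select of the 8-to-1 mux
    return results[select]
```

For example, 200 × 100 = 20000 = 0x4E20, so select 110 returns 0x20 (32) and select 111 returns 0x4E (78). Reading the product in two passes lets a single 8-bit output bus carry the full 16-bit result.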
6.3 ALGORITHM FOR THE DESIGN OF NOVEL ALU
Step 1  Select the operations to be performed by the ALU.
Step 2  Design a bit-slice ALU using the principle of divide and conquer, to obtain a modular design made of smaller, more manageable blocks.
Step 3  Split the bit slice into arithmetic and logic modules instead of designing it as one unit.
Step 4  Form the truth table relating the select inputs of the multiplexer to the operations to be performed by the ALU.
Step 5  Design a control circuit using a multiplexer to combine the bit-slice arithmetic and logic modules into a single unit.
Step 6  Optimize power using the sleep transistor gating technique to reduce the overall leakage power of the chip.
Step 7  Design a control circuit using NMOS and PMOS transistors that temporarily shuts down unused circuits to reduce leakage power.
6.4 OPTIMIZATION OF POWER USING SLEEP TRANSISTOR GATING TECHNIQUES
Sleep transistor gating is a technique in which circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of the chip. This temporary shutdown state is also known as "low power mode" or "inactive mode". When the circuit blocks are required again, they are switched back to "active mode". The two modes must be switched at the appropriate time and in a suitable manner to maximize the power savings while minimizing the impact on performance. The goal of power gating is thus to minimize leakage power by temporarily cutting off power to selected blocks that are not required in the current mode. Shutting down the blocks can be accomplished either by software or by hardware. For instance, driver software can schedule the power-down operations. Hardware timers can be used in the same way. A dedicated power management controller is another option.
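The hardware-timer option can be illustrated with a simple controller. It puts a block into sleep mode after the block has been idle for a fixed number of cycles and wakes the block as soon as a request arrives. The class name `PowerGateController`, the idle threshold and the cycle-based model below are hypothetical. A real controller would also have to account for wake-up latency and for the energy spent switching between modes.

```python
from dataclasses import dataclass

@dataclass
class PowerGateController:
    """Hypothetical timer-based sleep controller for one circuit block."""
    idle_limit: int            # idle cycles tolerated before gating the block
    idle_count: int = 0
    asleep: bool = False

    def tick(self, request: bool) -> bool:
        """Advance one clock cycle; returns True while the block is powered."""
        if request:
            self.idle_count = 0
            self.asleep = False        # wake: switch the block to active mode
        else:
            self.idle_count += 1
            if self.idle_count >= self.idle_limit:
                self.asleep = True     # cut power: block enters low power mode
        return not self.asleep

ctrl = PowerGateController(idle_limit=3)
trace = [ctrl.tick(r) for r in [True, False, False, False, False, True]]
# trace == [True, True, True, False, False, True]
```

A larger `idle_limit` avoids gating a block that is about to be used again, at the cost of leaving it powered longer. Choosing this threshold is the software or hardware equivalent of switching the modes "at the appropriate time".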
Four parameters characterize sleep transistor gating:
• Gate size
• Gate control slew rate
• Simultaneous switching capacitance
• Gate leakage
6.5 TYPES OF SLEEP TRANSISTOR GATING TECHNIQUES
Fine-grain gating encapsulates the switching transistor as part of the standard cell logic. The switching transistors are designed either by the library IP vendor or by the standard cell designer. These cell designs usually conform to the normal standard cell rules and can easily be handled by EDA tools during implementation.
The coarse-grain approach implements grid-style sleep transistors that drive cells locally through shared virtual power networks. This approach is less sensitive to process, voltage and temperature (PVT) variation. It also introduces less IR-drop variation and imposes a smaller area overhead than cell-based or cluster-based implementations. In coarse-grain power gating, the power-gating transistor is part of the power distribution network rather than of the standard cell. A coarse-grain structure can be implemented in two ways: ring-based and column-based.
Ring-based methodology: The gates are placed as a ring around the perimeter of the module being switched off. Special corner cells are used to turn the power signals around the corners.
Column-based methodology: The gates are inserted within the module, with the cells abutted to each other in the form of columns. The global power is routed in the higher metal layers, while the switched power is in the lower layers.
In this work, the sleep transistor gating technique reduces the overall leakage power of the circuit by temporarily switching off circuits that are not in use. This places them in inactive (low power) mode. When those circuits are needed again, they are turned on, that is, brought back to active mode.
The fine-grain sleep transistor gating technique is proposed in order to reduce the static power dissipation. In this approach, NMOS and PMOS transistors block the inputs to the gates of circuit blocks that are not in use. This avoids unnecessary activity at the inputs and gives a significant reduction in power. The sleep transistors are controlled by the same select signals that are applied to the ALU to select the desired operation, so power is cut off to the blocks that are not used in the selected mode.
The block diagram of the designed 8-bit ALU is shown in Figure 6.2 and its modular view in Figure 6.3. In the fine-grain technique, the input is given to a block only when it is required, which minimizes power. NMOS and PMOS transistors are used to block the input going to each individual module. The selection lines of the multiplexer are connected to the gates of the NMOS and PMOS transistors, and the input reaches the arithmetic and logic units through these transistors. The input is therefore connected only when the corresponding operation is enabled by the multiplexer select lines. The NMOS and PMOS transistors thus act as switches that block the input to each circuit until it is required. The operations of the designed ALU for the different select inputs are shown in Table 6.1.
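The input gating can be modelled at the logic level. The select lines are decoded into one enable per module, and each module's operands pass through its switch only when that enable is high. The switch is an NMOS/PMOS pair, assumed here to act as a transmission gate with the enable on the NMOS gate and its complement on the PMOS gate. In the sketch below, a blocked input is represented as `None`, meaning no switching activity reaches that module. The module names, the decoder and this representation are hypothetical. The mapping from select code to operation follows Table 6.1.

```python
# Select code (S2 S1 S0) -> module that must receive the operands (Table 6.1)
MODULE_FOR_SELECT = {
    0b000: "nand", 0b001: "nor", 0b010: "xor", 0b011: "subtractor",
    0b100: "shifter", 0b101: "adder",
    0b110: "multiplier", 0b111: "multiplier",   # low and high product bytes
}
MODULES = sorted(set(MODULE_FOR_SELECT.values()))

def decode_enables(select: int) -> dict:
    """One enable signal per module, derived from the mux select lines."""
    active = MODULE_FOR_SELECT[select & 0b111]
    return {name: name == active for name in MODULES}

def gate_inputs(a: int, b: int, select: int) -> dict:
    """Pass (a, b) through the switch of the enabled module only.

    A disabled module sees None, modelling a blocked input that causes
    no switching activity in that module.
    """
    enables = decode_enables(select)
    return {name: (a, b) if en else None for name, en in enables.items()}
```

For select 101 (addition), `gate_inputs(12, 5, 0b101)` delivers `(12, 5)` to the adder and `None` to every other module. Both multiplication codes enable the same multiplier, so it stays active while the two product bytes are read out.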
Figure 6.2 Block diagram of proposed 8 bit ALU
Figure 6.3 Modular view of 8 bit ALU
Table 6.1 ALU operation for various select inputs

S2  S1  S0   Operation
0   0   0    NAND
0   0   1    NOR
0   1   0    XOR
0   1   1    SUBTRACTION
1   0   0    SHIFT
1   0   1    ADDITION
1   1   0    MULTIPLICATION (P0 – P7)
1   1   1    MULTIPLICATION (P8 – P15)

6.6 RESULTS AND DISCUSSION
The designed ALU is simulated in 0.18 µm CMOS technology. The required operation is decided by the selection lines of the multiplexer, the input is applied to the ALU, and the outputs of the various operations are taken from the multiplexers. The simulated output waveforms are shown in Figure 6.4 (a) and (b). The multipliers are the most power-hungry units and take the longest to compute their output, because each stage of the multiplier uses the output of the previous stage. A logical operation, in contrast, is computed in one step and therefore incurs only a single-bit delay. The ALU is simulated with and without sleep transistor gating to determine the power, delay and number of transistors required for implementation. These simulations use the various multiplier architectures discussed in Sections 4.2 to 4.5 of Chapter 4. The performance comparison is shown in Tables 6.2 to 6.4.
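The sequential dependence described above is visible in a shift-and-add formulation of multiplication. Each stage adds one shifted partial product to the running sum from the previous stage, so the product is available only after all stages finish. A bitwise logic operation, by contrast, needs a single step. The sketch below illustrates this generic principle, which is essentially what an array multiplier does in hardware. It is not a model of the binary tree multiplier or of the other architectures compared in Tables 6.2 to 6.4.

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> tuple:
    """Unsigned shift-and-add multiplication returning (product, stage sums).

    Stage i adds the partial product (a AND b_i) shifted left by i to the
    running sum from stage i - 1, so stage i cannot finish before stage i - 1.
    """
    running = 0
    stages = []
    for i in range(width):
        partial = (a << i) if (b >> i) & 1 else 0   # AND row plus wired shift
        running += partial                           # depends on previous stage
        stages.append(running)
    return running, stages
```

For two 8-bit operands, the loop runs eight dependent additions, and the result can need up to 16 bits: `shift_add_multiply(200, 100)[0]` is 20000. This chain of dependent additions is why the multiplier sets the critical path of the ALU.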
Figure 6.4 (a) Simulated output waveform of 8 bit ALU (voltage in V versus time in ns)
Figure 6.4 (b) Simulated output waveform of 8 bit ALU (voltage in V versus time in ns)
Table 6.2 Performance comparison of 8 bit ALU without sleep transistor gating using various multiplier architectures

8 bit ALU                                                  Power (W)   Delay (ns)   PDP (Ws)    No. of Transistors
Using array multiplier (Leonardo et al 2004)               3.24e-001   736          238.7e-9    1794
Using Wallace Tree multiplier (Naveen et al 2006)          3.79e-002   726          27.53e-9    1874
Using Multiplexer based multiplier (Kevin Biswas 2005)     7.35e-002   564          41.47e-9    1894
Using Binary Tree Multiplier (Proposed)                    3.34e-002   510          17.0e-9     1864
Table 6.3 Performance comparison of 8 bit ALU with sleep transistor gating using various multiplier architectures

8 bit ALU                                                  Power (W)   Delay (ns)   PDP (Ws)    No. of Transistors
Using array multiplier (Leonardo et al 2004)               3.05e-001   740          225.7e-9    1822
Using Wallace Tree multiplier (Naveen et al 2006)          3.45e-002   732          25.23e-9    1902
Using Multiplexer based multiplier (Kevin Biswas 2005)     6.85e-002   568          38.9e-9     1922
Using Binary Tree Multiplier (Proposed)                    3.05e-002   513          15.5e-9     1892
From Tables 6.2 and 6.3, the ALU with the proposed binary tree multiplier has a lower delay and consumes less power than with the other three multiplier architectures described earlier. Its transistor count is also lower than with the Wallace tree and multiplexer-based multipliers, though higher than with the array multiplier, as shown in Table 6.4.
Table 6.4 Comparison of PDP and area requirement of 8 bit ALU with and without sleep transistor gating techniques

8 bit ALU                            PDP without   PDP with    % change   No. of Transistors   No. of Transistors   % of transistors
                                     SGT (Ws)      SGT (Ws)    in PDP     without SGT          with SGT             increased
Using array multiplier               238.7e-9      225.7e-9    5.7        1794                 1822                 1.5
Using Wallace Tree multiplier        27.53e-9      25.23e-9    9.1        1874                 1902                 1.47
Using Multiplexer based multiplier   41.47e-9      38.9e-9     6.6        1894                 1922                 1.45
Using Binary Tree Multiplier         17.05e-9      15.5e-9     10         1864                 1892                 1.47
Table 6.4 shows that sleep transistor gating (SGT) reduces the PDP of the ALU by 5.7% to 10% compared with the ALU without SGT. The largest reduction is obtained with the binary tree multiplier. The additional transistors required for the sleep transistor technique amount to less than 2%. Figures 6.5 to 6.8 plot the power, delay, PDP and number of transistors of the ALU with the various multiplier architectures without sleep transistor gating, and Figures 6.9 to 6.12 plot the same quantities with it. The performance comparison of the ALU with and without SGT is shown in Figures 6.13 to 6.16.
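The derived columns of Tables 6.2 to 6.4 can be checked with a few lines of arithmetic. PDP is power multiplied by delay. For the array multiplier without SGT, 0.324 W × 736 ns ≈ 238.5e-9 Ws, which is consistent with Table 6.2. The sketch below recomputes the percentage columns of Table 6.4 from the tabulated values. Dividing the PDP difference by the with-SGT value reproduces the tabulated % change to within rounding, so that appears to be the convention used. This is an inference from the numbers; the text does not state it. Measured against the without-SGT value, the reductions come out somewhat smaller. The dictionary keys and helper names are illustrative.

```python
# PDP values (Ws) and transistor counts from Table 6.4: (without SGT, with SGT)
PDP = {
    "array":       (238.7e-9, 225.7e-9),
    "wallace":     (27.53e-9, 25.23e-9),
    "mux_based":   (41.47e-9, 38.9e-9),
    "binary_tree": (17.05e-9, 15.5e-9),
}
TRANSISTORS = {
    "array": (1794, 1822), "wallace": (1874, 1902),
    "mux_based": (1894, 1922), "binary_tree": (1864, 1892),
}

def pdp(power_w: float, delay_s: float) -> float:
    """Power-delay product in watt-seconds (joules)."""
    return power_w * delay_s

def pct_change(without: float, with_sgt: float, base: float) -> float:
    """Percentage difference between the two values, relative to `base`."""
    return 100.0 * (without - with_sgt) / base

for name, (p0, p1) in PDP.items():
    t0, t1 = TRANSISTORS[name]
    print(f"{name:12s} PDP change {pct_change(p0, p1, p1):5.2f}% "
          f"(vs. without-SGT base {pct_change(p0, p1, p0):5.2f}%), "
          f"transistors +{100.0 * (t1 - t0) / t0:4.2f}%")
```

Whichever base is used, the transistor overhead stays below 2% for every architecture, and this agrees with the claim above.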
Figure 6.5 Power consumption of ALU without SGT
Figure 6.6 Delay of ALU without SGT
Figure 6.7 PDP of ALU without SGT
Figure 6.8 Number of transistors required without SGT
Figure 6.9 Power consumption of ALU with SGT
Figure 6.10 Delay of ALU with SGT
Figure 6.11 PDP of ALU with SGT
Figure 6.12 Number of transistors required with SGT
Figure 6.13 PDP comparison of ALU with and without SGT
Figure 6.14 Transistor requirement of ALU with and without SGT
Figure 6.15 Percentage change of PDP using SGT
Figure 6.16 Percentage increase of transistors using SGT
6.7 SUMMARY
Designing an ALU as a single unit is complex and difficult to implement. In this novel method, the ALU is designed as bit slices that are put together for computation, which reduces complexity and improves performance. The performance comparison shows that sleep transistor gating reduces the PDP of the ALU by up to 10%, at the cost of an increase of about 1.5% in transistor count. The designed ALU is split into a number of smaller modules whose operations are selected by a multiplexer, and this makes it well suited to high speed applications.