Low Power VLSI Architectures for Variable-Length Encoding and Decoding Stephen Molloy and Rajeev Jain UCLA Department of Electrical Engineering Engineering IV, 53-138N15 Los Angeles, CA 90024 Abstract - New low power VLSI architectures are presented for variable-length encoding and decoding. Look-up table partitioning by symbol probability is shown to reduce the total power consumption of the variable-length decoder and encoder are reduced by as much as 66% and 75%, respectively. Design examples for a subband image CODEC are presented with measurements from extracted VLSI layouts. I. INTRODUCTION Variable-length coding is a lossless form of coding that achieves compression by assigning shorter length codewords to commonly occurring symbols in a source alphabet, while assigning longer codewords to statistically rare symbols [ 13. Variable-length coding has been effectively applied as one step in the MPEG and P E G image compression standards, as well as in alternative non-standard coding schemes, such as those based on wavelets. These coding schemes typically consist of a prediction and transformation step, followed by a lossy quantization step, and finally one or more lossless coding steps such as run-length coding and/or huffman coding. Previous work in the design of circuits for variable-length encoding and decoding have concentrated almost exclusively on achieving high throughput decoding for video and high-definition television (HDTV) applications, and are described in [2-51. With the growing popularity of portable computers with high-quality audio and video playback capability, low power architectures and techniques are required to provide maximum battery life, without sacrificing real-time throughput. Previous work in low power video signal processing have concentrated on other steps in the video compression and decompression process such as the discrete cosine transform. To minimize the total power consumption of multimedia systems, low power, high throughput, designs are required for all steps in the compression and decompression processes, including variable-length encoding and decoding. II. LOOK-UP TABLE PARTITIONING A significant source of power consumption in a variable-length encoder (VLE) or variable-length decoder (VLD) is the look-up table containing the mapping from source symbols to variable-length codewords. If we assume that the table power consumption increases linearly with size, we can represent the table power consumption as: 0-7803-3694-1 /97/$10.00 1997 IEEE @ ,P = N a (1) where a is the power consumption per table entry, P is the total power, and the tablle has N entries. Next, consider partitioning the table into k isolatee iub-tables, so that only one sub-table is selected when encoding or decoding a source symbol. The power consumption then becomes: i E S, i e S, i e S, where each summation is performed over a set of source alphabet symbols S,, where a is once again the power consumption per table entry, andp, is the probability that the i-th symbol of the source alphabet occurs. The union of all the subsets 8, is the entire source alphabet and the intersection of the subsets is the null set - simply meaning that each subtable contains some unique partition of the entire table. The sum can of course be minimized by equalizing the contribution of each of the right-hand-side terms of the equation. Note that visualizing a plot of source symbol probability versus symbol number, this partitioning is equivalent to dividing the source pdf into k segments with equal area under the curve. If the source alphabet had a uniform probability distribution, the terms in the right-hand-side of equation 2 would be equalized by choosing k tables of size N k , each having a probability Ilk of being selected. The resulting power consumption would become: k i=l Provided that a decision could be made which of the k sub-tables is to be selected and the remaining subtables consume no power unless selected, the active power consumption is reduced by a factor of k, minus the power consumed in the process of deciding which sub-table to select. In applications such as image coding, transform coefficients typically have highly skewed probability distri- 997 Authorized licensed use limited to: Kalpataru Institute of Technology. Downloaded on February 1, 2010 at 00:50 from IEEE Xplore. Restrictions apply. butions, many times modeled as Laplacian or Generalized Gaussian sources. In this case, we expect that a smaller number of symbols in the source alphabet will be very common, while the large remainder of the alphabet occur less often. The probability distribution of d l e v e l pairs after scalar quantization of subband coefficients in a wavelet-transformed image has a maximum for small runs and small levels, and falls off along the run and level axes. We therefore also expect that as a consequence of this, the right-hand-side terms of equation 2 for highly skewed sources will consist of small sets of highly probably source symbols and larger sets of less probable source symbols. This partitioning has a simple but compelling effect on power consumption - it is statistically likely we will access small, and therefore low power look-up tables, and it is statistically unlikely we will access larger, higher power look-up tables. The goal is to reduce power consumption in the average sense - much in the sanie way that a cache benefits the throughput of a microprocessor in the average sense by making most accesses to a small, fast memory and rare accesses to slower main memory. pipelining could be performed by adding registers to the input or output of the look-up table and using retiming transformations to balance the levels of logic between registers. Finally, an output multiplexer selects the output of the appropriate look-up table. Input Data from Fifo m Shift 4 Codeword Length 111. LOW POWER VLD ARCHITECTURE A conventional architecture for implementing variable-length decoding is shown in figure 1. This architecture consists of a pair of input registers with a combined wordlength equal to twice the maximum codeword length, a barrel shifter to select a “sliding” segment of bits, a look-up table for converting the input codeword to an output run/ level pair and a codeword length L, and an accumulator for shifting out the L bits corresponding to the current codeword. This is a constant output rate architecture, capable of decoding one variable-length codeword per cycle An architecture implementing the low-power table partitioning for variable-length decoding is illustrated in figure 2. The VLD in figure 2 is also a constant output-rate architecture capable of decoding a sustained one symbol per cycle, and is based on the designs of [2] and [3]. The VLD consists of two parts: (1) a variable-length shifting datapath and (2) the partitioned codeword tables. The shifting datapath uses a smaller look-up table containing the codeword lengths together with an accumulator to shift the input bitstream past each variable-length codeword. The LUT Select module counts the number of leading ones or zeros in the prefix of the received codeword, using this count to select one of the four individual look-up tables. At the input to each look-up table is a register with an enable signal generated fiom the decoded prefix length. The registers serve to isolate the unselected tables so that only one of the four look-up tables consumes power on any particular cycle. Note that the codeword look-up tables used to determine the corresponding d e v e l pair are not in the feedback path of the shifting datapath, and can therefore be pipelined arbitrarily. This 1 RunLevel Pair Figure 1: Conventional VLD Architecture The table-partitioning has been applied to the design of a VLD for a second-generation of the wavelet image CODEC described in [6], and is used to convert variable-length codewords into d e v e l pairs. The VLD table contains a total of 529 entries, has a maximum codeword length of 16-bitsYand produces a 9-bit run and 7-bit level. VLD circuits with and without table partitioning were implemented in synthesizable VHDL, synthesized to a standard cell netlist, and mapped to a layout in 0.5-micron CMOS. Power consumption measurements were estimated from the extracted layout at a clock rate of 12.27 MHz using a commercial power analysis tool. The 12.27 MHz clock frequency corresponds to the square-pixel NTSC clock rate used in the image CODEC. Using the common architecture of [2] without table partitioning, the power consumption of the VLD is 4.2 mW, 74% of which is consumed in the single look-up table. Of the remaining 1.1 mW, 1.03 mW is consumed in the shifting datapath and the remaining 60 UW in the non-partitioned length look-up table. Applying look-up table partitioning reduces the active power consumption of the look-up tables by 98%, fkom 3.1 mW to 63 uW. The 63 UW average power consumption corresponds to the power of each of the four subtables weighted by the probability that they are accessed. The gatecounts for the synthesized tables (given in terms of 998 Authorized licensed use limited to: Kalpataru Institute of Technology. Downloaded on February 1, 2010 at 00:50 from IEEE Xplore. Restrictions apply. two-input NAND gates), power consumption, and probability of access are given in table I. Note that the sum of the probabilities does not add up to 100% because additional variable-length codewords are allocated to special symbols such as escape codes and end-of-subband markers that assist in the decoding process. feed-forward, the look-up table producing the codeword length is merged with the look-up table producing the variable-length codeword itself, rather than splitting the two tables as shown for the VLI). R d eve1 Pair i Input Data from FIFO FIFO Pop Signal .L I LA Look-up Table I I f Length I I Length I LUT 3.1 Mux :I I -+ Fifo Data RudLevel Pair Figure 3: Conventional VLE Architecture Figure 2: Low Power VLD Architecture TABLE I: VLD Table Partitioning Results I/ I 1 Power (mW) Probability of Access I I Table Entries Weighted IV. LOW POWER VLE ARCHITECTURE The VLE architecture is the dual of the VLD, converting d e v e l pairs into variable-length symbols, and packing these symbols into a fixed-length output bitstream at a sustained one symbol per cycle. A conventional architecture is shown in figure 3, consisting of a look-up table to convert d e v e l pairs to variable-length symbols, and a shift-network to concatenate the variable-length symbols into a fixed-length output stream. The low-power VLE is shown in figure 4. Because this architecture is completely The LUT Select niodule computes a d e v e l distance metric, and uses the result to select one of the four individual look-up tables. The specifics of the d e v e l metric will be described in the: full paper. At the input to each look-up table is a register with an enable signal generated fiom the decoded prefix length. As with the VLD, the registers serve to isolate the unselected tables so that only one of the four look-up tables coinsumes power on any particular cycle. As previously mentioned, because the codeword symbol and length look-up tables are completely feedforward they can be pipelined arbitrarily using retiming transformations. The outputs of the fcur look-up tables are then multiplexed, with the resulting variable-length codeword and its length used in the shifting network to concatenate the current codeword to the end of the outgoing stream. This architecture has been implemented in the same wavelet image CODEC used in the implementation example of the previous section. Lilke the VLD, the VLE table contains a total of 529 entries, takes a 9-bit run and 7-bit level as inputs, and produces a a codeword symbol of 16-bits maximum, and four bits representing the length of the codeword. Table partitioning has been applied to the VLE, and were implemented in synthesizable VHDL, synthesized to a standard cell netlist, and mapped to a layout in 0.5-micron CMOS. As with the decoder, power consumption measurements were estimated fiom the extracted layout at the square 999 Authorized licensed use limited to: Kalpataru Institute of Technology. Downloaded on February 1, 2010 at 00:50 from IEEE Xplore. Restrictions apply. pixel NTSC clock rate of 12.27 MHz using a commercial power analysis tool. Using a straightforward single look-up table architecture without partitioning, the power consumption of the complete VLE is 4.722 mW, 85% of which, or 4.01 mW, is consumed in the look-up table. The remaining 7 10 mW is consumed in the shifting datapath. R d e v e l Pair ---I Code * Fifo Data Fifo Pop Figure 4: Low Power VLE Architecture Applying a four-way look-up table partitioning reduces the active power consumption of the look-up tables by 95%, from 4.01 mW to 192 uW. This 192 uW average power consumption corresponds to the power of each of the four sub-tables weighted by the probability that they are accessed. The gatecounts for the synthesized tables, power consumption, and probability of access are given in tableII. Gatecount Power (mW) Probability of Access Table Entries Weighted Power (mw) 0 38 0.022 0.7182 10 0.0155 1 11 293 I 0.271 I 0.1718 I 56 I 0.0466 2 11 794 I 0.955 I 0.0779 I 154 I 0.0744 V. CONCLUSIONS In this paper we have shown that the dominant source of power consumption in variable-length encoding and decoding circuits is the power consumed in the look-up tables. A new table-partitioning technique based on symbol probabilities was presented, and applied to both the variablelength encoding and variable-length decoding problems. VLSI architectures utilizing this technique as part of a subband image CODEC have been presented and shown to reduce the look-up table power consumption by as much as 98% for decoding and 95% for encoding, while reducing the total power consumption up to 68% and 75% for variablelength decoding and encoding, respectively. One important benefit of this work is the ability to implement large symbol tables to maximize the coding eficiency, without paying the usual linear increase in power consumption with increased table size. This will become increasingly important in handheld devices communicating over low-bitrate channels, where both low power and high coding efficiency are important. A second benefit of this work is that the power reduction is achieved without sacrificing throughput, as the architectures can sustain one symbol per cycle throughput and are amenable to additional pipelining of the tables by retiming. A final important benefit of this work is that these relatively large reductions in active power consumption were achieved entirely using algorithmic optimizations. The growing body of work on low-level circuit design techniques, gate-level power optimization, and low supply-voltage operation can be used together with the techniques presented in this paper to greatly reduce the power consumed in variable-length encoding and decoding circuits. REFERENCES [I] M.I. Lu and C.F. Chen, “An Encoding Procedure and a Decoding Procedure for a New Modified Huffman Code,” IEEE Transactions on Acoustics, Speech andsignal Processing, Vol. 28, No. 1, January 1990. TABLE 11: VLE Table Partitioning Results Table design. The new dominant component of the VLE power consumption is the shifting datapath. [2] M.T. Sun and S.M. Lei, “A Parallel Variable-Length-Code Decoder for Advanced Television Applications,” Proceedings of the Third Intemational Workshop on HDTV, Torino, Italy, August 1989. I I I [3] P. Ruetz, P. Tong, D. Luthi, and P. Ang, “A Video-Rate P E G Chip Set,” Journal of VLSI Signal Processing, Vol. 5, No. 3, April 1993. [4] ~ 3 1457 1.795 0.0307 309 0.0551 Total 2582 3.042 0.9930 529 0.1916 S.F. Chang and D. Messerschmitt, “Designing High-Throughput VLC Decoder Part I - Concurrent Architectures,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 2, No. 2, pp. 187-196, June 1992. [5] H.D. Lin and D. Messerschnutt, “Designing High-Throughput VLC Decoder Part I1 Parallel Decoding Methods,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 2, No. 2, pp. 197-206, The total power consumption of the VLE after table partitioning is 1.14 mW, a 75% reduction in power consumption when compared with the original, non-partitioned June 1992. [6] S. Molloy, K. Nishibori, and R. Jain, “A Video CODEC Chipset for 1000 Wireless Multimedia Networking,” IEEE Workshop on VLSI Signal Processing, pp. 381-390, Osaka, Japan, October 1995.
© Copyright 2025 ExpyDoc