A History-Based I-Cache for Low Energy Multimedia Applications K. Inoue, V. Moshnyaga, and K. Murakami Introduction Increase in cache size Power consumed in on-chip caches DEC 21164 CPU* StrongARM SA-110 CPU* Bipolar ECL CPU** 25% 43% 50% * Kamble et. al., “Analytical energy Dissipation Models for Low Power Caches”, ISLPED’97 ** Joouppi et. al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor” ,IEEE Journal of Solid-State Circuits’93 Cache Access Energy Cache Subbanking Tag Memory Ecache = Etag + Edata 17% 90% Data Memory (Cache Lines) Subbank Breakdown of Esram per access “HBTC cache” Detect and Eliminate Unnecessary Tag-Checks! for Direct-mapped I-cache N=1 N=2 Word: 64 bits Cache Size: 64 KB Line Size: 64 B Others Data (bit-lines) Tag N=3 N=4 N: # of Subbanks (bit-lines) Conventional Tag-Check Scheme Cache is updated only when replacement occurs! A loaded data stays at the same location at least until the next cache-miss takes place Programs include a lot of loops! A number of instructions are executed repetitively Inst. A is referenced N times Miss! Ref. A Ref. A Ref. A Ref. A Tag Tag Tag Tag Check! Check! Check! Check! Ref. A Miss! Tag Check! N-1 of tag-checks are unnecessary! Cache-miss interval Cache is stable! time History-Based Tag-Comparison (HBTC) Scheme Attempts to detect and eliminate unnecessary tag-checks at run time! The target instruction has been referenced before, and No cache miss has occurred since the previous reference of the target instruction. Miss! Ref. A Ref. A Ref. A Ref. A Tag Tag Tag Tag Check! Check! Check! Check! Ref. A Tag Check! Detect and Eliminate! Cache-miss interval Miss! time But, How? Exploit Execution Footprints Provided by an Extended BTB! 1. Execute an instruction block A at time T A? Leave the execution footprint in $ the corresponding BTB-entry. (2. If a cache miss occurs, then erase all the footprints.) 3. Execute the instruction block A at time T+X A $ If the footprint is detected in the BTB, then omit the tag comparisons for all the instructions in the block A! HBTC I-$ Architecture PBAreg Branch Inst. Addr. Pred. Address for EF writing Result T F (Execution Footprint for Target) (EF for Fall-through) Branch Inst. Addr. Target Address Target Instruction Block I-$ BTB Not-taken Branch Inst. Addr. Target Address PC Branch Prediction Result Mode Controller Tag-check control HBTC I-$ Operation Normal Mode (NM): w/ Tag checks Omitting Mode (OM): w/o Tag checks Tracing Mode (TM): w/ Tag checks (on a BTB hit, the EF addressed by the PBAreg is set to 1) Mode Transition GOtoNM T/F==1 I-Cache miss or OM BTB Hit BTB replacement or RAS address or GOtoNM T/F==0 Branch misprediction NM All EFs are invalidated! GOtoNM TM PC and Pred.-result PBAreg Operation Example Iteration Count EFT EFF MODE (NM,OM,TM) 1-C 12 3 45 A Adr:T/N Top Branch Target Buffer B Branch to F Time (Iteration Count – Address of Branch) C Branch to A D Branch to A F PBAreg State of BTB Top Execution Flow Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A Performing! Top F EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A F Top 1-C EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-- Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A Omitting! Top F 1-C EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-- Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A Performing! Top F 1-C 2-C EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A F Top 1-C 2-C 2-D EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Branch-C Branch-D D:T BTB hit on TM Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A F Top 1-C 2-C 2-D EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Tracing-Mode Branch-C Branch-D D:T Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A Performing! Top F 1-C 2-C 2-D EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Tracing-Mode Branch-C Branch-D D:T Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A F Top 1-C 2-C 2-D 3-C EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Tracing-Mode Branch-C Branch-D D:T Branch-C Branch-D --:-- BTB hit on TM Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A F Top 1-C 2-C 2-D 3-C EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Tracing-Mode Branch-C Branch-D D:T Omitting-Mode Branch-C Branch-D --:-- Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A 1-C 2-C 2-D D Branch to A Omitting! 3-C Top F EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Tracing-Mode Branch-C Branch-D D:T Omitting-Mode Branch-C Branch-D --:-- Operation Example Iteration Count Initial 12 3 45 A Top B Branch to F C Branch to A D Branch to A F Top 1-C 2-C 2-D 3-C Performing! Performing!3-D Omitting! EFT EFF MODE (NM,OM,TM) Branch-C Branch-D --:-- PBAreg Omitting-Mode Branch-C Branch-D --:-Tracing-Mode Branch-C Branch-D C:N Tracing-Mode Branch-C Branch-D D:T Omitting-Mode Branch-C Branch-D --:-Omitting-Mode Branch-C Branch-D --:-- Performance/Energy Overhead Energy Energy for execution-footprint access for reading (every BTB look-up) for writing (BTB hit in TMode) for invalidating (Cache miss or BTB replace) Performance BTB access conflict (1 stall cycle) for execution footprint writing for execution footprint invalidation Normalized Tag-Comparison Count Evaluation – Tag-Check Counts – 0.4 ITC HBTC Comb. 0.3 0.2 0.1 0 129.compress adpcm(e) mpeg2(e) epic(e) pegwit(e) 132.ijpeg adpcm(d) mpeg2(d) epic(d) pegwit(d) The HBTC makes better results than the ITC. The hybrid approach makes significant reduction! Normalized Cache Energy Evaluation –Energy and Performance– 1 0 .9 0 .8 0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .1 0 Ebtbadd Eoutput Etag Edata 129.compress adpcm(e) mpeg2(e) epic(e) pegwit(e) 132.ijpeg adpcm(d) mpeg2(d) epic(d) pegwit(d) 1 .0 4 Norm. Exe. Time BTB energy overhead does not have large impact. 17% of Ecache saving (adpcm enc) Performance degradation is trivial (0.8%: worst case) 1 .0 2 1 0 .9 8 0 .9 6 0 .9 4 0 .9 2 0 .9 129.compress adpcm(e) mpeg2(e) epic(e) pegwit(e) 132.ijpeg adpcm(d) mpeg2(d) epic(d) pegwit(d) Conclusions History-Based Tag-Comparison Instruction Cache 1. Exploits execution footprints recorded in the BTB. 2. Reduces tag-comparison count by 95% (adpcm(e)). 3. Achieves 17 % of cache energy saving! Future work • Analyze energy consumption based on chip design. Buck Up Slides (History-based Tag-Comparison Cache) Cache Energy Consumption Energy consumed in Cache Edecode + Esram + Eio Etag + Edata Breakdown of Esram per access N=1 N=2 Data Memory (Cache Lines) Others Data (bit-lines) Cache Subbanking Tag Memory Word: 64 bits Cache Size: 64 KB Line Size: 64 B Tag N=3 N=4 (bit-lines) N: # of Subbanks Subbank This calculation is based on Kamble, et. Al., “Analytical energy Dissipation Models for Low Power Caches”, ISLPED’97 Low Power Caches - Reducing both Etag and Edata Adding a small L0 cache •Filter Cache •S-Cache •Block Buffering Processor L0 Cache L1 Cache Dividing cache module Multiple accessing •MRU Cache •Hash-Rehash Cache Cache Sequential Way-Access way0 way1 way2 way3 •MDM Cache Low Power Caches - Reducing Edata Dividing cache module Tag Line Tag Line •Cache Sub-Banking Accessing sequentially •Phased Cache •Pipelined Cache Miss! Replace Hit! Direct-mapped I-Cache Ecache = Edecode + Esram + Eio Esram_tag + Esram_data Reference Address Tag Index Offset Tag SRAM Data SRAM Tag Line MUX Hit? Word Data Breakdown of Cache Energy Word: 64 bits Cache Size: 64 KB Line Size: 64 B Energy consumed in Cache Edecode + Esram + Eio Breakdown of Esram per access Others 100% 90% Etag + Edata Cache Subbanking Tag Memory Data Memory (Cache Lines) 80% 70% Data 60% (bit-lines) 50% 40% 30% 20% Tag 10% (bit-lines) 0% 8(1) 4(2) 2(4) 1(8) # of words in a Subbank-entry (Total # of Subbanks) Subbank This calculation is based on Kamble, et. Al., “Analytical energy Dissipation Models for Low Power Caches”, ISLPED’97 Breakdown of Esram 32-bit CPU CS = 32 KB L S = 32 B 100% 100% Breakdown of Energy 64-bit CPU 90% 90% 80% 80% 70% 70% 60% 60% 50% 50% 40% 40% 30% 30% 20% 20% 10% 10% 0% CS = 64 KB LS = 64 B 8(1) 4(2) 2(4) 1(8) 0% 8(1) 4(2) 2(4) # of words in a Subbank (Total # of Subbanks) Esram_others Esram_data_bit Esram_tag_bit 1(8) CS: Cache Size LS: Line Size This calculation is based on Kamble, et. Al., “Analytical energy Dissipation Models for Low Power Caches”, ISLPED’97 Conventional Tag-Check Scheme Hit rate of instruction cache (I-$) is quite HIGH! Most of the tag-comparisons result in MATCH Performs tag check in EVERY cache access despite that some instructions obviously exist in the cache. Wastes unnecessary energy! Ave. # of references Per stable-time Conventional I-$ 16 KB Direct-Mapped Cache (32 B Lines) 132.ijpeg 100 10 4 10 255 Cache-line address 511 History-Based Tag-Comparison (HBTC) Instruction Cache -MotivationHit rate of instruction cache (I-$) is quite HIGH! Most of the tag-comparisons result in HIT Conventional I-$ HBTC I-$ Performs tag-comparison in EVERY cache access despite that some instructions obviously exist in cache. OMITS tag-comparison for some cache accesses if it knows that the corresponding instructions obviously exist in cache. Dissipates energy unnecessarily! Eliminate the energy consumed by the redundant tag-checks!
© Copyright 2025 ExpyDoc