スライド タイトルなし

A History-Based I-Cache for Low
Energy Multimedia Applications
K. Inoue, V. Moshnyaga, and K. Murakami
Introduction
Increase in cache size
Power consumed in on-chip caches
DEC 21164 CPU* StrongARM SA-110 CPU* Bipolar ECL CPU**
25%
43%
50%
* Kamble et. al., “Analytical energy Dissipation Models for Low Power Caches”, ISLPED’97
** Joouppi et. al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor” ,IEEE Journal
of Solid-State Circuits’93
Cache Access Energy
Cache Subbanking
Tag
Memory
Ecache = Etag + Edata
17%
90%
Data Memory
(Cache Lines)
Subbank
Breakdown of
Esram per access
“HBTC cache”
Detect and Eliminate
Unnecessary Tag-Checks!
for Direct-mapped I-cache
N=1
N=2
Word: 64 bits
Cache Size: 64 KB
Line Size: 64 B
Others
Data
(bit-lines)
Tag
N=3
N=4
N: # of Subbanks
(bit-lines)
Conventional Tag-Check Scheme
Cache is updated only when replacement occurs!
A loaded data stays at the same location at least
until the next cache-miss takes place
Programs include a lot of loops!
A number of instructions are executed repetitively
Inst. A is referenced N times
Miss!
Ref. A
Ref. A Ref. A Ref. A
Tag
Tag
Tag
Tag
Check! Check! Check! Check!
Ref. A
Miss!
Tag
Check!
N-1 of tag-checks are unnecessary!
Cache-miss interval
Cache is stable!
time
History-Based Tag-Comparison
(HBTC) Scheme
Attempts to detect and eliminate
unnecessary tag-checks at run time!
 The target instruction has been referenced
before, and
 No cache miss has occurred since the previous
reference of the target instruction.
Miss!
Ref. A
Ref. A Ref. A Ref. A
Tag
Tag
Tag
Tag
Check! Check! Check! Check!
Ref. A
Tag
Check!
Detect and Eliminate!
Cache-miss interval
Miss!
time
But, How?
Exploit Execution Footprints
Provided by an Extended BTB!
1. Execute an instruction block A at time T
A?
Leave the execution footprint in
$ the corresponding BTB-entry.
(2. If a cache miss occurs, then erase all the footprints.)
3. Execute the instruction block A at time T+X
A
$
If the footprint is detected in the BTB,
then omit the tag comparisons for all
the instructions in the block A!
HBTC I-$ Architecture
PBAreg
Branch Inst. Addr.
Pred. Address for EF writing
Result
T
F
(Execution Footprint for Target)
(EF for Fall-through)
Branch Inst. Addr. Target Address
Target Instruction
Block
I-$
BTB
Not-taken
Branch Inst. Addr. Target Address
PC
Branch Prediction Result
Mode
Controller
Tag-check control
HBTC I-$ Operation
Normal Mode (NM): w/ Tag checks
Omitting Mode (OM): w/o Tag checks
Tracing Mode (TM): w/ Tag checks
(on a BTB hit, the EF addressed
by the PBAreg is set to 1)
Mode Transition
GOtoNM
T/F==1
I-Cache miss or
OM
BTB Hit
BTB replacement or
RAS address or
GOtoNM
T/F==0
Branch misprediction
NM
All EFs are invalidated!
GOtoNM
TM
PC and
Pred.-result
PBAreg
Operation Example
Iteration
Count
EFT EFF MODE (NM,OM,TM)
1-C
12 3 45
A
Adr:T/N
Top
Branch Target Buffer
B Branch to F
Time
(Iteration Count – Address of Branch)
C Branch to A
D Branch to A
F
PBAreg
State of BTB
Top
Execution Flow
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C Branch to A
D Branch to A
Performing!
Top
F
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C Branch to A
D Branch to A
F
Top
1-C
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:--
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C Branch to A
D Branch to A
Omitting!
Top
F
1-C
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:--
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D Branch to A
Performing!
Top
F
1-C
2-C
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D
Branch to A
F
Top
1-C
2-C
2-D
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Branch-C
Branch-D
D:T BTB hit on TM
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D Branch to A
F
Top
1-C
2-C
2-D
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Tracing-Mode
Branch-C
Branch-D
D:T
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D Branch to A
Performing!
Top
F
1-C
2-C
2-D
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Tracing-Mode
Branch-C
Branch-D
D:T
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D Branch to A
F
Top
1-C
2-C
2-D
3-C
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Tracing-Mode
Branch-C
Branch-D
D:T
Branch-C
Branch-D
--:-- BTB hit on TM
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D Branch to A
F
Top
1-C
2-C
2-D
3-C
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Tracing-Mode
Branch-C
Branch-D
D:T
Omitting-Mode
Branch-C
Branch-D
--:--
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
1-C
2-C
2-D
D Branch to A
Omitting! 3-C
Top
F
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Tracing-Mode
Branch-C
Branch-D
D:T
Omitting-Mode
Branch-C
Branch-D
--:--
Operation Example
Iteration
Count Initial
12 3 45
A
Top
B Branch to F
C
Branch to A
D Branch to A
F
Top
1-C
2-C
2-D
3-C
Performing!
Performing!3-D
Omitting!
EFT EFF MODE (NM,OM,TM)
Branch-C
Branch-D
--:--
PBAreg
Omitting-Mode
Branch-C
Branch-D
--:-Tracing-Mode
Branch-C
Branch-D
C:N
Tracing-Mode
Branch-C
Branch-D
D:T
Omitting-Mode
Branch-C
Branch-D
--:-Omitting-Mode
Branch-C
Branch-D
--:--
Performance/Energy Overhead
Energy
Energy for execution-footprint access
 for reading (every BTB look-up)
 for writing (BTB hit in TMode)
 for invalidating (Cache miss or BTB replace)
Performance
BTB access conflict (1 stall cycle)
 for execution footprint writing
 for execution footprint invalidation
Normalized
Tag-Comparison Count
Evaluation – Tag-Check Counts –
0.4
ITC
HBTC
Comb.
0.3
0.2
0.1
0
129.compress
adpcm(e)
mpeg2(e)
epic(e)
pegwit(e)
132.ijpeg
adpcm(d)
mpeg2(d)
epic(d)
pegwit(d)
The HBTC makes better results than the ITC.
The hybrid approach makes significant reduction!
Normalized
Cache Energy
Evaluation
–Energy and Performance–
1
0 .9
0 .8
0 .7
0 .6
0 .5
0 .4
0 .3
0 .2
0 .1
0
Ebtbadd
Eoutput
Etag
Edata
129.compress adpcm(e) mpeg2(e)
epic(e)
pegwit(e)
132.ijpeg adpcm(d) mpeg2(d) epic(d)
pegwit(d)
1 .0 4
Norm. Exe. Time
 BTB energy overhead
does not have large
impact.
 17% of Ecache saving
(adpcm enc)
 Performance degradation
is trivial
(0.8%: worst case)
1 .0 2
1
0 .9 8
0 .9 6
0 .9 4
0 .9 2
0 .9
129.compress adpcm(e) mpeg2(e)
epic(e)
pegwit(e)
132.ijpeg adpcm(d) mpeg2(d) epic(d) pegwit(d)
Conclusions
History-Based Tag-Comparison Instruction Cache
1. Exploits execution footprints recorded in the BTB.
2. Reduces tag-comparison count by 95% (adpcm(e)).
3. Achieves 17 % of cache energy saving!
Future work
• Analyze energy consumption based on chip design.
Buck Up Slides
(History-based Tag-Comparison Cache)
Cache Energy Consumption
Energy consumed in Cache
Edecode + Esram + Eio
Etag + Edata
Breakdown of
Esram per access
N=1
N=2
Data Memory
(Cache Lines)
Others
Data
(bit-lines)
Cache Subbanking
Tag
Memory
Word: 64 bits
Cache Size: 64 KB
Line Size: 64 B
Tag
N=3
N=4
(bit-lines)
N: # of Subbanks
Subbank
This calculation is based on Kamble, et. Al., “Analytical energy
Dissipation Models for Low Power Caches”, ISLPED’97
Low Power Caches
- Reducing both Etag and Edata Adding a small L0 cache
•Filter Cache
•S-Cache
•Block Buffering
Processor
L0 Cache
L1 Cache
Dividing cache module
Multiple accessing
•MRU Cache
•Hash-Rehash Cache
Cache
Sequential Way-Access
way0
way1
way2
way3
•MDM Cache
Low Power Caches
- Reducing Edata Dividing cache module
Tag
Line
Tag
Line
•Cache Sub-Banking
Accessing sequentially
•Phased Cache
•Pipelined Cache
Miss!
Replace
Hit!
Direct-mapped I-Cache
Ecache = Edecode + Esram + Eio
Esram_tag + Esram_data
Reference Address
Tag Index Offset
Tag SRAM Data SRAM
Tag
Line
MUX
Hit?
Word Data
Breakdown of Cache Energy
Word: 64 bits
Cache Size: 64 KB
Line Size: 64 B
Energy consumed in Cache
Edecode + Esram + Eio
Breakdown of
Esram per access
Others
100%
90%
Etag + Edata
Cache Subbanking
Tag
Memory
Data Memory
(Cache Lines)
80%
70%
Data
60%
(bit-lines)
50%
40%
30%
20%
Tag
10%
(bit-lines)
0%
8(1)
4(2)
2(4)
1(8)
# of words in a Subbank-entry
(Total # of Subbanks)
Subbank
This calculation is based on Kamble, et. Al., “Analytical energy
Dissipation Models for Low Power Caches”, ISLPED’97
Breakdown of Esram
32-bit CPU
CS = 32 KB
L S = 32 B
100%
100%
Breakdown of Energy
64-bit CPU
90%
90%
80%
80%
70%
70%
60%
60%
50%
50%
40%
40%
30%
30%
20%
20%
10%
10%
0%
CS = 64 KB
LS = 64 B
8(1)
4(2)
2(4)
1(8)
0%
8(1)
4(2)
2(4)
# of words in a Subbank (Total # of Subbanks)
Esram_others
Esram_data_bit
Esram_tag_bit
1(8)
CS: Cache Size
LS: Line Size
This calculation is based on Kamble, et. Al., “Analytical energy
Dissipation Models for Low Power Caches”, ISLPED’97
Conventional Tag-Check Scheme
Hit rate of instruction cache (I-$) is quite HIGH!
Most of the tag-comparisons result in MATCH
Performs tag check in EVERY
cache access despite that
some instructions obviously
exist in the cache.
Wastes unnecessary
energy!
Ave. # of references
Per stable-time
Conventional I-$
16 KB Direct-Mapped Cache (32 B Lines)
132.ijpeg
100
10
4
10
255
Cache-line address
511
History-Based Tag-Comparison (HBTC)
Instruction Cache -MotivationHit rate of instruction cache (I-$) is quite HIGH!
Most of the tag-comparisons result in HIT
Conventional I-$
HBTC I-$
Performs tag-comparison in
EVERY cache access despite
that some instructions obviously
exist in cache.
OMITS tag-comparison for some
cache accesses if it knows that
the corresponding instructions
obviously exist in cache.
Dissipates energy
unnecessarily!
Eliminate the energy
consumed by the
redundant tag-checks!