Pipeline III - Department of Computer Sciences

CS429: Computer Organization and Architecture
Pipeline III
Warren Hunt, Jr. and Bill Young
Department of Computer Sciences
University of Texas at Austin
Last updated: November 4, 2014 at 12:58
CS429 Slideset 16: 1
How Do We Fix the Pipeline?
- Pad the program with NOPs: Yuck!
- Stall the pipeline
  - Data hazards:
    - Wait for the producing instruction to complete
    - Then proceed with the consuming instruction
  - Control hazards:
    - Wait until the new PC has been determined
    - Then begin fetching
  - How is this better than inserting NOPs into the program?
- Forward data within the pipeline
  - Grab the result from somewhere in the pipe
    - after it has been computed
    - but before it has been written back
  - This gives an opportunity to avoid performance degradation due to hazards!
Data Forwarding
- Naive pipeline:
  - A register isn't written until the write-back stage completes.
  - Source operands are read from the register file in the decode stage.
    - They need to be in the register file at the start of that stage.
- Observation: the value is generated in the execute or memory stage.
- Trick:
  - Pass the value directly from the generating instruction to the decode stage.
  - It needs to be available by the end of the decode stage.
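The timing argument behind this trick can be checked with a little arithmetic. The Python sketch below is illustrative only: `stall_cycles` and its stage offsets are hypothetical, and it assumes one instruction fetched per cycle, a register file written at the end of write-back, and forwarded values usable in decode during the same cycle they are generated.

```python
# Stage offsets relative to an instruction's fetch cycle, assuming one
# instruction enters the pipeline per cycle and no other stalls (hypothetical model).
STAGE = {"F": 0, "D": 1, "E": 2, "M": 3, "W": 4}


def stall_cycles(distance: int, produced_in: str, forwarding: bool) -> int:
    """Cycles a consumer must wait in decode for a producer's result.

    distance    : how many instructions after the producer the consumer is
                  fetched (1 = the very next instruction)
    produced_in : stage that generates the value, "E" (ALU) or "M" (load)
    forwarding  : True if values can be bypassed to the decode stage
    """
    decode = distance + 1  # consumer's decode cycle, relative to producer's fetch
    if forwarding:
        # The value can be picked up in the same cycle it is generated.
        ready = STAGE[produced_in]
    else:
        # The register file is updated at the end of write-back, so decode
        # can only read the new value on the following cycle.
        ready = STAGE["W"] + 1
    return max(0, ready - decode)


print(stall_cycles(1, "E", forwarding=False))  # 3: adjacent ALU use, no forwarding
print(stall_cycles(3, "E", forwarding=False))  # 1: prog2's two NOPs fall one short
print(stall_cycles(1, "E", forwarding=True))   # 0: forwarded from e_valE
print(stall_cycles(1, "M", forwarding=True))   # 1: load/use, even with forwarding
```

Under this model, forwarding removes every stall for ALU results, while a value loaded from memory still costs one cycle when it is used by the very next instruction.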
Data Forwarding Example
# prog2                 1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
0x00e: addl %edx,%eax               F  D  E  M  W
0x010: halt                            F  D  E  M  W
Cycle 6
W: R[%eax] ← 3
   W_dstE = %eax
   W_valE = 3
D: srcA = %edx
   srcB = %eax
   valA ← R[%edx] = 10
   valB ← W_valE = 3
- irmovl in write-back stage
- Destination value in W pipeline register
- Forward as valB for decode stage
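A pipeline diagram like this one follows from a single rule: with no stalls, the instruction fetched in cycle f occupies stage k (F, D, E, M, W for k = 0..4) in cycle f + k. The hypothetical helpers `stage_at` and `occupancy` below are a Python sketch, not part of any Y86 tool; they apply that rule to prog2 to recover which instruction sits in each stage during cycle 6.

```python
STAGES = "FDEMW"


def stage_at(fetch_cycle: int, cycle: int) -> str:
    """Stage an instruction fetched in `fetch_cycle` occupies in `cycle`,
    assuming no stalls; '' if it is not in the pipeline then."""
    k = cycle - fetch_cycle
    return STAGES[k] if 0 <= k < len(STAGES) else ""


def occupancy(program: list, cycle: int) -> dict:
    """Map each stage to the instruction in it, fetching program[i] in cycle i + 1."""
    snapshot = {}
    for i, instr in enumerate(program):
        stage = stage_at(i + 1, cycle)
        if stage:
            snapshot[stage] = instr
    return snapshot


prog2 = [
    "0x000: irmovl $10,%edx",
    "0x006: irmovl $3,%eax",
    "0x00c: nop",
    "0x00d: nop",
    "0x00e: addl %edx,%eax",
    "0x010: halt",
]

snap = occupancy(prog2, 6)
for stage in STAGES:
    print(stage, snap.get(stage, "-"))
# F 0x010: halt
# D 0x00e: addl %edx,%eax
# E 0x00d: nop
# M 0x00c: nop
# W 0x006: irmovl $3,%eax
```

The irmovl that writes %eax is in W while addl is in D, which is exactly why valB must be forwarded from W_valE; the irmovl that writes %edx has already left the pipeline, so %edx can be read from the register file.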
Bypass Paths
[Figure: pipelined datapath (fetch, decode, execute, memory, write-back) with bypass paths from the later stages into the decode stage.]
- Decode stage:
  - Forwarding logic selects valA and valB.
  - Normally from the register file.
  - Forwarding: get valA or valB from a later pipeline stage.
- Forwarding sources:
  - Execute: valE
  - Memory: valE, valM
  - Write back: valE, valM
Data Forwarding Example 2
# prog4                 1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: addl %edx,%eax         F  D  E  M  W
0x00e: halt                      F  D  E  M  W
Cycle 4
M: M_dstE = %edx
   M_valE = 10
E: E_dstE = %eax
   e_valE ← 0 + 3 = 3
D: srcA = %edx
   srcB = %eax
   valA ← M_valE = 10
   valB ← e_valE = 3
- Register %edx: generated by the ALU during the previous cycle; forwarded from the memory stage as valA.
- Register %eax: value just generated by the ALU; forwarded from the execute stage as valB.
Implementing Forwarding
- Add new feedback paths from the E, M, and W pipeline registers into the decode stage.
- Create logic blocks to select from multiple sources for valA and valB in the decode stage.
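One way to model the selection logic is a priority list over the bypass sources (e_valE, m_valM, M_valE, W_valM, W_valE) that falls back to the register file. The Python sketch below is a simplified model, not the actual hardware description: the `Bypass` record and `forward` function are hypothetical, register names stand in for register IDs, and the "Sel" part of Sel+Fwd A is left out. The priority order (nearest stage first, and valM before valE within the memory and write-back stages) is an assumption taken from the CS:APP PIPE design.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Bypass:
    """Hypothetical snapshot of the forwarding sources during one cycle.
    A dst of None means that stage writes no register (RNONE)."""
    E_dstE: Optional[str] = None
    e_valE: int = 0
    M_dstM: Optional[str] = None
    m_valM: int = 0
    M_dstE: Optional[str] = None
    M_valE: int = 0
    W_dstM: Optional[str] = None
    W_valM: int = 0
    W_dstE: Optional[str] = None
    W_valE: int = 0


def forward(src: Optional[str], b: Bypass, regs: Dict[str, int]) -> Tuple[int, str]:
    """Value for decode operand register `src`, plus where it came from."""
    if src is None:  # operand not used by this instruction
        return 0, "none"
    for dst, val, name in (
        (b.E_dstE, b.e_valE, "e_valE"),  # nearest stage first
        (b.M_dstM, b.m_valM, "m_valM"),
        (b.M_dstE, b.M_valE, "M_valE"),
        (b.W_dstM, b.W_valM, "W_valM"),
        (b.W_dstE, b.W_valE, "W_valE"),
    ):
        if dst == src:
            return val, name
    return regs[src], f"R[{src}]"  # normal case: read the register file


# prog4, cycle 4: irmovl $10,%edx is in M, irmovl $3,%eax is in E.
c4 = Bypass(E_dstE="%eax", e_valE=3, M_dstE="%edx", M_valE=10)
print(forward("%edx", c4, {"%eax": 0, "%edx": 0}))  # (10, 'M_valE') -> valA
print(forward("%eax", c4, {"%eax": 0, "%edx": 0}))  # (3, 'e_valE') -> valB

# prog2, cycle 6: %edx already written back; irmovl $3,%eax is in W.
c6 = Bypass(W_dstE="%eax", W_valE=3)
print(forward("%edx", c6, {"%eax": 0, "%edx": 10}))  # (10, 'R[%edx]') -> valA
print(forward("%eax", c6, {"%eax": 0, "%edx": 10}))  # (3, 'W_valE') -> valB
```

Both worked examples come out as on the slides: in prog4's cycle 4, %edx comes from M_valE and %eax from e_valE; in prog2's cycle 6, %edx comes from the register file and %eax from W_valE.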
[Figure: pipelined hardware with forwarding. In the decode stage, the Sel+Fwd A and Fwd B blocks choose valA and valB from the register file or from the bypass values e_valE, M_valE, m_valM, W_valE, and W_valM. The Select PC logic in the fetch stage also receives M_valA and W_valM.]
Limitation of Forwarding
# prog5                                 1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  E  M  W
0x020: halt                                               F  D  E  M  W
Cycle 7
M: M_dstE = %ebx
   M_valE = 10
D: valA ← M_valE = 10
   valB ← R[%eax] = 0   Error
Cycle 8
M: M_dstM = %eax
   m_valM ← M[128] = 3
Load-use dependency:
Value needed by end of decode stage in cycle 7.
Value read from memory in memory stage of cycle 8.
Avoiding Load/Use Hazard
# prog5                                 1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
bubble                                                       E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W
Cycle 8
W: W_dstE = %ebx
   W_valE = 10
M: M_dstM = %eax
   m_valM ← M[128] = 3
D: valA ← W_valE = 10
   valB ← m_valM = 3
- Stall the using instruction for one cycle.
- It can then pick up the loaded value by forwarding from the memory stage.
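The effect of this one-cycle stall on the whole program can be reproduced with a small scheduling model. The Python sketch below is hypothetical (`Instr` and `schedule` are made up for illustration) and assumes full forwarding, so the only stalls come from load/use dependencies: a consumer can finish decode no earlier than its producer's E cycle for an ALU result, or its M cycle for a loaded value.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dst: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()  # registers read in decode
    is_load: bool = False       # True if the value comes from memory (M stage)


def schedule(program):
    """Stage cycles for each instruction in a 5-stage pipeline with full
    forwarding, where the only stalls are load/use stalls (hypothetical model).

    Returns rows (text, first_fetch, decode_start, decode_end, E, M, W);
    decode_end - decode_start is the number of stall cycles (bubbles)."""
    rows = []
    producer = {}  # register -> (E cycle, M cycle, is_load) of its latest writer
    fetch, prev_decode = 1, 0
    for ins in program:
        d_start = max(fetch + 1, prev_decode + 1)
        ready = d_start
        for reg in ins.srcs:
            if reg in producer:
                p_exec, p_mem, p_load = producer[reg]
                # ALU results can be forwarded from e_valE during E;
                # loaded values only from m_valM during M.
                ready = max(ready, p_mem if p_load else p_exec)
        d_end = ready
        e, m, w = d_end + 1, d_end + 2, d_end + 3
        rows.append((ins.text, fetch, d_start, d_end, e, m, w))
        if ins.dst:
            producer[ins.dst] = (e, m, ins.is_load)
        # The next instruction is fetched when this one enters decode and
        # stays in F for as long as this one is stalled there.
        fetch, prev_decode = d_start, d_end
    return rows


prog5 = [
    Instr("irmovl $128,%edx", dst="%edx"),
    Instr("irmovl $3,%ecx", dst="%ecx"),
    Instr("rmmovl %ecx,0(%edx)", srcs=("%ecx", "%edx")),
    Instr("irmovl $10,%ebx", dst="%ebx"),
    Instr("mrmovl 0(%edx),%eax", dst="%eax", srcs=("%edx",), is_load=True),
    Instr("addl %ebx,%eax", dst="%eax", srcs=("%ebx", "%eax")),
    Instr("halt"),
]

for row in schedule(prog5):
    print(row)
```

For prog5 the addl row comes out as ('addl %ebx,%eax', 6, 7, 8, 9, 10, 11): decode starts in cycle 7 and finishes in cycle 8, which is the one bubble. halt stays in F for cycles 7 and 8 and finishes write-back in cycle 12, matching the diagram.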
Control for Load/Use Hazard
# prog5                                 1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
bubble                                                       E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W
Cycle 8
W: W_dstE = %ebx
   W_valE = 10
M: M_dstM = %eax
   m_valM ← M[128] = 3
- Stall the instructions in the fetch and decode stages.
- Inject a bubble into the execute stage.
Condition         F       D       E       M       W
Load/Use Hazard   stall   stall   bubble  normal  normal
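Stall and bubble can be read as operations on what each stage holds: a stall keeps the current contents, a bubble replaces them with a nop, and normal lets the instruction move up from the previous stage. The Python `clock` function below is a simplified, hypothetical occupancy model (real pipeline registers hold fields such as icode and valE, not instruction text); it applies the load/use actions from the table to prog5 in cycle 7.

```python
STAGES = ["F", "D", "E", "M", "W"]


def clock(occ, actions, fetched):
    """Advance a simplified, hypothetical occupancy model by one cycle.

    occ     : stage -> instruction in that stage this cycle
    actions : stage -> "normal" | "stall" | "bubble" (missing means normal)
    fetched : instruction the fetch stage brings in if F is not stalled
    """
    new = {}
    for i, stage in enumerate(STAGES):
        act = actions.get(stage, "normal")
        if act == "stall":
            new[stage] = occ[stage]          # hold the same instruction
        elif act == "bubble":
            new[stage] = "bubble"            # a nop takes its place
        elif stage == "F":
            new[stage] = fetched             # next instruction enters fetch
        else:
            new[stage] = occ[STAGES[i - 1]]  # move up from the previous stage
    return new


# prog5 in cycle 7: mrmovl in E loads %eax, which addl in D needs.
cycle7 = {"F": "halt", "D": "addl %ebx,%eax", "E": "mrmovl 0(%edx),%eax",
          "M": "irmovl $10,%ebx", "W": "rmmovl %ecx,0(%edx)"}
load_use = {"F": "stall", "D": "stall", "E": "bubble", "M": "normal", "W": "normal"}
cycle8 = clock(cycle7, load_use, fetched=None)
print(cycle8)
```

In cycle 8 halt and addl are still in F and D, a bubble occupies E, mrmovl has moved to M, and irmovl $10,%ebx is in W, which is the cycle 8 state shown above. One more normal clock moves addl into E, where it receives %eax from m_valM's successor stages as in the earlier example.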
Branch Misprediction Example
0x000:    xorl %eax, %eax
0x002:    jne t               # Not taken
0x007:    irmovl $1, %eax     # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $2, %edx     # Target (should not execute)
0x017:    irmovl $3, %ecx     # Should not execute
0x01d:    irmovl $4, %edx     # Should not execute
Should only execute the first 7 instructions.
Handling Misprediction
# prog8                               1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                 F  D  E  M  W
0x002: jne target # Not taken            F  D  E  M  W
0x00e: irmovl $2,%edx # Target              F  D
bubble                                            E  M  W
0x014: irmovl $3,%ebx # Target+1               F
bubble                                            D  E  M  W
0x007: irmovl $2,%edx # Fall through              F  D  E  M  W
0x00d: halt                                          F  D  E  M  W
- Predict branch as taken:
  - Fetch 2 instructions at the target.
- Cancel when mispredicted:
  - Detect branch not taken in the execute stage.
  - On the following cycle, replace the instructions in the execute and decode stages with bubbles.
  - No side effects have occurred yet.
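The predict-taken and cancel scheme can be traced cycle by cycle. This Python sketch is a hypothetical model, not the PIPE implementation: `PROG8` uses shorthand labels, `None` stands for an empty stage or a bubble, and the fall-through address is taken from the branch's valP (which the hardware carries down the pipe and presents to Select PC as M_valA).

```python
STAGES = ["F", "D", "E", "M", "W"]

# prog8 as pc -> (label, valP, jump target); labels are shorthand.
PROG8 = {
    0x000: ("xorl", 0x002, None),
    0x002: ("jne", 0x007, 0x00E),            # not taken: xorl sets %eax to 0
    0x007: ("fall-through irmovl", 0x00D, None),
    0x00D: ("halt", None, None),             # this model stops fetching after halt
    0x00E: ("target irmovl", 0x014, None),
    0x014: ("target+1 irmovl", None, None),  # successor omitted
}


def trace(prog, branch_taken, cycles):
    """Stage occupancy per cycle (pc, or None for empty/bubble) with
    predict-taken fetch and squashing of the wrong path on a misprediction."""
    occ = dict.fromkeys(STAGES)  # every stage starts empty
    pc, history = 0x000, []
    for _ in range(cycles):
        e = occ["E"]
        # The branch outcome is known while the jump is in E; the pipeline
        # reacts on the following cycle (this iteration).
        mispredicted = e is not None and prog[e][2] is not None and not branch_taken
        new = {"W": occ["M"], "M": occ["E"]}
        if mispredicted:
            new["E"] = new["D"] = None  # bubbles replace the two target-path instrs
            pc = prog[e][1]             # resume at the fall-through address (valP)
        else:
            new["E"], new["D"] = occ["D"], occ["F"]
        new["F"] = pc
        if pc is not None:
            _, val_p, target = prog[pc]
            pc = target if target is not None else val_p  # predict taken
        occ = new
        history.append(occ)
    return history


for cycle, occ in enumerate(trace(PROG8, branch_taken=False, cycles=10), start=1):
    print(cycle, [PROG8[occ[s]][0] if occ[s] is not None else "-" for s in STAGES])
```

With `branch_taken=False`, cycle 5 shows the fall-through instruction in F, bubbles in D and E, jne in M, and xorl in W; halt reaches W in cycle 10, and neither target-path instruction ever reaches E.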
Control for Misprediction
# prog8                               1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                 F  D  E  M  W
0x002: jne target # Not taken            F  D  E  M  W
0x00e: irmovl $2,%edx # Target              F  D
bubble                                            E  M  W
0x014: irmovl $3,%ebx # Target+1               F
bubble                                            D  E  M  W
0x007: irmovl $2,%edx # Fall through              F  D  E  M  W
0x00d: halt                                          F  D  E  M  W
Condition             F       D       E       M       W
Mispredicted Branch   normal  bubble  bubble  normal  normal
Return Example
0x000:    irmovl Stack,%esp   # Initialize stack pointer
0x006:    call p              # Procedure call
0x00b:    irmovl $5,%esi      # Return point
0x011:    halt
0x020:    .pos 0x20
0x020: p: irmovl $-1,%edi     # procedure
0x026:    ret
0x027:    irmovl $1,%eax      # should not be executed
0x02d:    irmovl $2,%ecx      # should not be executed
0x033:    irmovl $3,%edx      # should not be executed
0x039:    irmovl $4,%ebx      # should not be executed
0x100:    .pos 0x100
0x100: Stack:                 # Stack pointer
Previously executed three additional instructions.
Correct Return Example
0x026:    ret
          bubble
          bubble
          bubble
0x00b:    irmovl $5,%esi   # Return
- As ret passes through the pipeline, stall the fetch stage while ret is in the decode, execute, and memory stages.
- Inject a bubble into the decode stage.
- Release the stall when ret reaches the write-back stage.
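The ret rule amounts to a fixed three-cycle window. The short Python sketch below (a hypothetical `ret_trace`, assuming nothing else stalls the pipeline) walks a ret through the five stages and reports what the fetch stage does in each cycle; the three stall cycles correspond to the three bubbles above.

```python
STAGES = ["F", "D", "E", "M", "W"]


def ret_trace(ret_fetch_cycle):
    """(cycle, ret's stage, fetch-stage behaviour) as a ret moves through."""
    rows = []
    for k, stage in enumerate(STAGES):
        cycle = ret_fetch_cycle + k
        if stage == "F":
            fetch = "fetch ret"
        elif stage in ("D", "E", "M"):
            # Trigger: IRET in {D_icode, E_icode, M_icode}. The fetch stage
            # is stalled and a bubble goes into decode on the next cycle.
            fetch = "stall"
        else:
            # ret is in write-back: the return address it read is in W_valM.
            fetch = "fetch return point (PC from W_valM)"
        rows.append((cycle, stage, fetch))
    return rows


for row in ret_trace(4):  # e.g. a ret fetched in cycle 4 (hypothetical)
    print(row)
```

For example, `ret_trace(4)` ends with `(8, 'W', 'fetch return point (PC from W_valM)')`, so the instruction at the return point is fetched four cycles after the ret itself.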
Control for Return
0x026:    ret
          bubble
          bubble
          bubble
0x00b:    irmovl $5,%esi   # Return
Condition        F       D       E       M       W
Processing ret   stall   bubble  normal  normal  normal
Special Control Cases
Detection:
Condition             Trigger
Processing ret        IRET in { D_icode, E_icode, M_icode }
Load/Use Hazard       E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
Mispredicted Branch   E_icode == IJXX && !e_Bch
Action (on next cycle):
Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Mispredicted Branch   normal  bubble  bubble  normal  normal
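The two tables translate directly into a function from the trigger conditions to per-stage actions. In this Python sketch the icode constants are hypothetical string labels (not the Y86 encodings), and when no condition fires every stage is normal. The tables cover one condition at a time; the order in which this sketch applies the updates when several fire together is an assumption, not something the tables specify.

```python
# Hypothetical icode labels (strings, not the real Y86 encodings).
IRET, IMRMOVL, IPOPL, IJXX = "ret", "mrmovl", "popl", "jXX"


def pipeline_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Actions for each pipeline register on the next cycle."""
    processing_ret = IRET in (D_icode, E_icode, M_icode)
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    mispredicted = E_icode == IJXX and not e_Bch

    actions = dict.fromkeys("FDEMW", "normal")
    if processing_ret:
        actions.update(F="stall", D="bubble")
    if load_use:
        actions.update(F="stall", D="stall", E="bubble")
    if mispredicted:
        actions.update(D="bubble", E="bubble")
    return actions


# Load/use: mrmovl in E writes %eax, which the instruction in decode reads.
print(pipeline_control(D_icode="addl", E_icode=IMRMOVL, M_icode="irmovl",
                       E_dstM="%eax", d_srcA="%ebx", d_srcB="%eax", e_Bch=False))
```

The printed dictionary is {'F': 'stall', 'D': 'stall', 'E': 'bubble', 'M': 'normal', 'W': 'normal'}, the Load/Use Hazard row of the action table.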