CS429: Computer Organization and Architecture
Pipeline III

Warren Hunt, Jr. and Bill Young
Department of Computer Sciences
University of Texas at Austin
Last updated: November 4, 2014 at 12:58

CS429 Slideset 16: Pipeline III

How Do We Fix the Pipeline?

- Pad the program with NOPs: Yuck!
- Stall the pipeline
  - Data hazards: Wait for producing instruction to complete, then proceed with consuming instruction.
  - Control hazards: Wait until new PC has been determined, then begin fetching.
  - How is this better than inserting NOPs into the program?
- Forward data within the pipeline
  - Grab the result from somewhere in the pipe
  - After it has been computed
  - But before it has been written back

This gives an opportunity to avoid performance degradation due to hazards!

Data Forwarding

Naive pipeline:
- Register isn’t written until completion of write-back stage.
- Source operands read from register file in decode stage.
- Needs to be in register file at start of stage.

Observation: value is generated in execute or memory stage.

Trick:
- Pass value directly from generating instruction to decode stage.
- Needs to be available at end of decode stage.
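The forwarding trick can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the signal names (e_valE, M_valE, m_valM, W_valE, W_valM and the matching dst fields) are the ones used on later slides in this set, while the function name, the dictionaries, and the priority order (the most recent in-flight instruction wins) are assumptions made for this sketch.

```python
def select_operand(src, regfile, pipe):
    """Pick the decode-stage value for the source register `src`.

    `pipe` maps pipeline signal names to their current values.
    Assumption of this sketch: candidates are checked from the most
    recent in-flight instruction to the oldest, and the register file
    is used only when no later stage holds a pending write to `src`.
    """
    candidates = [
        ("E_dstE", "e_valE"),  # value just computed by the ALU
        ("M_dstM", "m_valM"),  # value just read from data memory
        ("M_dstE", "M_valE"),  # ALU result held in the memory stage
        ("W_dstM", "W_valM"),  # memory result held in write back
        ("W_dstE", "W_valE"),  # ALU result held in write back
    ]
    for dst_name, val_name in candidates:
        if pipe.get(dst_name) == src:
            return pipe[val_name]
    return regfile[src]


# Two irmovl instructions immediately followed by addl %edx,%eax:
# %edx is pending in the memory stage, %eax was just computed by the ALU.
regfile = {"%edx": 0, "%eax": 0}
pipe = {"M_dstE": "%edx", "M_valE": 10, "E_dstE": "%eax", "e_valE": 3}
val_a = select_operand("%edx", regfile, pipe)
val_b = select_operand("%eax", regfile, pipe)
print(val_a, val_b)  # 10 3
```

Neither value has reached the register file yet, so both operands come from the bypass candidates rather than from `regfile`.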
Data Forwarding Example

# prog2
                        1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
0x00e: addl %edx,%eax               F  D  E  M  W
0x010: halt                            F  D  E  M  W

Cycle 6:
  W: W_dstE = %eax, W_valE = 3; R[%eax] <- 3
  D: srcA = %edx, srcB = %eax; valA <- R[%edx] = 10; valB <- W_valE = 3

- irmovl in write-back stage
- Destination value in W pipeline register
- Forward as valB for decode stage

Bypass Paths

[Figure: pipeline datapath with bypass paths. The values e_valE, M_valE, m_valM, W_valE, and W_valM are routed back to the forwarding logic in the decode stage.]

Decode stage:
- Forwarding logic selects valA and valB.
- Normally from register file.
- Forwarding: get valA or valB from later pipeline stage.

Forwarding sources:
- Execute: valE
- Memory: valE, valM
- Write back: valE, valM

Data Forwarding Example 2

# prog4
                        1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: addl %edx,%eax         F  D  E  M  W
0x00e: halt                      F  D  E  M  W

Cycle 4:
  M: M_dstE = %edx, M_valE = 10
  E: E_dstE = %eax, e_valE <- 0 + 3 = 3
  D: srcA = %edx, srcB = %eax; valA <- M_valE = 10; valB <- e_valE = 3

- Register %edx: generated by ALU during previous cycle; forwarded from memory as valA.
- Register %eax: value just generated by ALU; forwarded from execute as valB.

Implementing Forwarding

- Add new feedback paths from E, M, and W pipeline registers into decode stage.
- Create logic blocks to select from multiple sources for valA and valB in decode stage.

[Figure: pipeline datapath with forwarding implemented. In the decode stage, the "Sel+Fwd A" and "Fwd B" blocks choose among the register file outputs and the forwarded values e_valE, M_valE, m_valM, W_valE, and W_valM.]

Limitation of Forwarding

# prog5
                                        1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  E  M  W
0x020: halt                                               F  D  E  M  W

Cycle 7:
  M: M_dstE = %ebx, M_valE = 10
  D: valA <- M_valE = 10; valB <- R[%eax] = 0 (Error)
Cycle 8:
  M: M_dstM = %eax, m_valM <- M[128] = 3

Load-use dependency:
- Value needed by end of decode stage in cycle 7.
- Value read from memory in memory stage of cycle 8.

Avoiding Load/Use Hazard

# prog5
                                        1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
       bubble                                                E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W

Cycle 8:
  W: W_dstE = %ebx, W_valE = 10
  M: M_dstM = %eax, m_valM <- M[128] = 3
  D: valA <- W_valE = 10; valB <- m_valM = 3

- Stall the using instruction for one cycle.
- It can then pick up the loaded value by forwarding from the memory stage.
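The cost of the load/use stall can be checked with a small timing sketch. It is a simplified model rather than a simulator: it assumes five stages, one instruction entering execute per cycle, forwarding that covers every data hazard except load/use, and no control hazards. The function name and the tuple encoding of instructions are hypothetical.

```python
def writeback_cycles(program):
    """Return the cycle in which each instruction reaches write back.

    `program` is a list of (opcode, register_written, source_registers).
    For a load, `register_written` is the register filled from memory.
    Simplifying assumptions: stages F, D, E, M, W; forwarding handles
    all data hazards except load/use, which costs exactly one bubble.
    """
    loads = {"mrmovl", "popl"}
    cycles = []
    execute = 2  # chosen so that the first instruction executes in cycle 3
    prev_op, prev_dst = None, None
    for op, dst, srcs in program:
        execute += 1
        if prev_op in loads and prev_dst in srcs:
            execute += 1  # loaded value is not available in time: one bubble
        cycles.append(execute + 2)  # M follows E, W follows M
        prev_op, prev_dst = op, dst
    return cycles


prog5 = [
    ("irmovl", "%edx", []),
    ("irmovl", "%ecx", []),
    ("rmmovl", None, ["%ecx", "%edx"]),
    ("irmovl", "%ebx", []),
    ("mrmovl", "%eax", ["%edx"]),        # Load %eax
    ("addl", "%eax", ["%ebx", "%eax"]),  # Use %eax
    ("halt", None, []),
]
result = writeback_cycles(prog5)
print(result)  # [5, 6, 7, 8, 9, 11, 12]
```

Under these assumptions addl and halt each finish one cycle later than they would without the hazard, which agrees with the 12-cycle diagram for prog5.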
Control for Load/Use Hazard

# prog5
                                        1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
       bubble                                                E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W

Cycle 8:
  W: W_dstE = %ebx, W_valE = 10
  M: M_dstM = %eax, m_valM <- M[128] = 3

- Stall instructions in fetch and decode stages.
- Inject bubble into execute stage.

Condition          F       D       E       M       W
Load/Use Hazard    stall   stall   bubble  normal  normal

Branch Misprediction Example

0x000:    xorl %eax, %eax
0x002:    jne t                # Not taken
0x007:    irmovl $1, %eax      # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $2, %edx      # Target (should not execute)
0x017:    irmovl $3, %ecx      # Should not execute
0x01d:    irmovl $4, %edx      # Should not execute

Should only execute the first 7 instructions.

Handling Misprediction

# prog8
                                        1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                   F  D  E  M  W
0x002: jne target # Not taken              F  D  E  M  W
0x00e: irmovl $2,%edx # Target                F  D
       bubble                                       E  M  W
0x014: irmovl $3,%ebx # Target+1                 F
       bubble                                       D  E  M  W
0x007: irmovl $2,%edx # Fall through                F  D  E  M  W
0x00d: halt                                            F  D  E  M  W

Predict branch as taken:
- Fetch 2 instructions at target.

Cancel when mispredicted:
- Detect branch not taken in execute stage.
- On the following cycle, replace the instructions in the execute and decode stages with bubbles.
- No side effects have occurred yet.
Control for Misprediction

# prog8
                                        1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                   F  D  E  M  W
0x002: jne target # Not taken              F  D  E  M  W
0x00e: irmovl $2,%edx # Target                F  D
       bubble                                       E  M  W
0x014: irmovl $3,%ebx # Target+1                 F
       bubble                                       D  E  M  W
0x007: irmovl $2,%edx # Fall through                F  D  E  M  W
0x00d: halt                                            F  D  E  M  W

Condition             F       D       E       M       W
Mispredicted Branch   normal  bubble  bubble  normal  normal

Return Example

0x000:        irmovl Stack,%esp   # Initialize stack pointer
0x006:        call p              # Procedure call
0x00b:        irmovl $5,%esi      # Return point
0x011:        halt
0x020:        .pos 0x20
0x020: p:     irmovl $-1,%edi     # procedure
0x026:        ret
0x027:        irmovl $1,%eax      # should not be executed
0x02d:        irmovl $2,%ecx      # should not be executed
0x033:        irmovl $3,%edx      # should not be executed
0x039:        irmovl $4,%ebx      # should not be executed
0x100:        .pos 0x100
0x100: Stack:                     # Stack pointer

Previously executed three additional instructions.

Correct Return Example

0x026: ret
       bubble
       bubble
       bubble
0x00b: irmovl $5,%esi   # Return

- As ret passes through the pipeline, stall at the fetch stage while ret is in the decode, execute, and memory stages.
- Inject bubble into decode stage.
- Release the stall when ret reaches the write-back stage.

Control for Return

0x026: ret
       bubble
       bubble
       bubble
0x00b: irmovl $5,%esi   # Return

Condition         F       D       E       M       W
Processing ret    stall   bubble  normal  normal  normal

Special Control Cases

Detection:

Condition             Trigger
Processing ret        IRET in { D_icode, E_icode, M_icode }
Load/Use Hazard       E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
Mispredicted Branch   E_icode == IJXX && !e_Bch

Action (on next cycle):

Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Mispredicted Branch   normal  bubble  bubble  normal  normal
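The two tables above translate almost directly into code. The following Python sketch is one way to express them; the function name, the dictionary of signals, and the use of strings for instruction codes are assumptions made for illustration. Cycles in which two conditions hold at once are not covered by the tables, so the sketch refuses to guess in that case.

```python
def pipeline_control(s):
    """Return the action each pipeline register takes on the next cycle.

    `s` maps the signal names used in the trigger table to their values.
    """
    ret = "IRET" in (s["D_icode"], s["E_icode"], s["M_icode"])
    load_use = (s["E_icode"] in ("IMRMOVL", "IPOPL")
                and s["E_dstM"] in (s["d_srcA"], s["d_srcB"]))
    mispredict = s["E_icode"] == "IJXX" and not s["e_Bch"]

    # Combinations are outside the scope of the tables in this slide set.
    if ret + load_use + mispredict > 1:
        raise NotImplementedError("combined conditions are not handled here")

    actions = {stage: "normal" for stage in "FDEMW"}
    if ret:
        actions.update(F="stall", D="bubble")
    elif load_use:
        actions.update(F="stall", D="stall", E="bubble")
    elif mispredict:
        actions.update(D="bubble", E="bubble")
    return actions


# A load of %eax is in execute while an instruction reading %eax is in decode.
state = {
    "D_icode": "IOPL", "E_icode": "IMRMOVL", "M_icode": "IIRMOVL",
    "E_dstM": "%eax", "d_srcA": "%ebx", "d_srcB": "%eax", "e_Bch": False,
}
load_use_actions = pipeline_control(state)
print(load_use_actions)
```

For this state the sketch reports a stall in F and D and a bubble in E, matching the Load/Use Hazard row.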