Parallel Processing & Parallel Algorithm May 8, 2003 B4 Yuuki Horita Chapter 3 Principles of Parallel Programming Principles of Parallel Programming • • • • Data Dependency Processors Communication Mapping Granularity Data Dependency • • • • • data flow dependency data anti-dependency data output dependency data input dependency data control dependency data flow dependency & data anti-dependency - data flow dependency S1: A = B + C S2: D = A + E - data anti-dependency S1: A = B + C S2: B = D + E data output dependency & data input dependency - data output dependency S1: A = B + C S2: A = D + E - data input dependency S1: A = B + C S2: D = B + E data control dependency S1: A = B - C if ( A > 0 ) S2: then D = 1; S3: else D = 2; endif S1⇒S2,S3 Dependency Graph • G = ( V, E ) V(Vertices) : statements E(Edges) : dependency ex) S1: A = B + C S2: B = A * 3 S3: A = 2 * C S4: P = B ≧0 if ( P is True) S5: then D = 1 S6: else D = 2 endif S1 → S2 : flow dep. , anti-dep. S1 → S3 : output dep. , input dep. S2 → S3 : anti-dep. S2 → S4 : flow dep. S4 → S5 , S6 : control dep. Elimination of the dependencies • data output dependency • data anti-dependency ⇒ renaming can remove these form of dependency Elimination of the dependencies(2) Ex) S1: A = B + C S2: B = D + E S3: A = F + G Renaming S1,S2 : anti-dep. S1,S3 : output-dep. S1: A = B + C S2: B’ = D + E S3: A’ = F + G No dependency! Processors communication • Message passing communication - processors communicate via communication links • Shared memory communication - processors communicate via common memory Message Passing System Interconnection Network PE1 PE2 PE3 ・・・ PEm M1 M2 M3 ・・・ Mm Message Passing System(2) • Send and Receive operations - blocking - nonblocking • System - synchronous blocking send and blocking receive operations - asynchronous nonblocking send and blocking receive operations ( the messages are buffered ) Shared memory system Global Memory M1 M2 M3 ・・・ Mm Interconnection Network PE1 PE2 PE3 ・・・ PEm Mapping … matching parallel algorithms to parallel architecture mapping to - asynchronous architecture - synchronous architecture - distributed architecture Mapping to Asynchronous Architecture Mapping a program to an asynchronous sharedmemory computer has the following steps: 1. 2. 3. Allocate each statement in the program to a processor Allocate each variable to a memory Specify the control flow for each processor the sender may send many messages without the receiver removing them from the channel ⇒ the need for buffering messages Mapping to Synchronous Architecture • • • • Mapping a program is the same as Asynchronous Architecture a common clock for synchronization purposes each processor executes an instruction at each step ( at each clock tick) only one message exists at any one time on the channel ⇒ no buffering is needed Mapping to Distributed Architecture • A local memory accessible only by owning processor • only a pair of processor along the channel • Mapping a program is the same as in the asynchronous shared-memory architecture, except that each variable is allocated either to a processor or a channel Granularity relates to the ratio of the amount of computation to the amount of communication • fine : at statement level • medium : at procedure level • coarse : at program level Program Level Parallelism • a program creates a new process by creating a complete copy of itself - Fork() (UNIX) Statement Level Parallelism ・ Parbegin-Parend block Par-begin Statement1 Statement2 : Statementn Par-end ⇒ the statements Statement1~Statementn are executed in parallel Statement Level Parallelism(2) ex) (a + b) * (c + d) – (e / f ) Par-begin Par-begin t1 = a + b t2 = c + d Par-end t4 = t1 * t2 t3 = e / f Par-end t5 = t4 – t3 Statement Level Parallelism(3) ・ Fork, Join, Quit - Fork x cause a new process to be created and to start executing at the instruction labeled x - Join t, y t=t–1 if ( t = 0) then go to y - Quit the process terminates Statement Level Parallelism(4) ex) (a + b) * (c + d) – (e / f ) P1: P2: P4: P3: P5: n=2 m=2 Fork P2 Fork P3 t1 = a + b; Join m, P4; Quit; t2 = c + d; Join m, P4; Quit; t4 = t1*t2; Join n, P5; Quit; t3 = e / f; Join n, P5; Quit; t5 = t4 – t3
© Copyright 2025 ExpyDoc