January 25, 2011
1
Sample of the correct code for the last week's report.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mpi.h"
#define N 1000
int main(int argc, char *argv[]) {
double *a, *b, local_c, c, r, *temp;
int
i, p, myid, procs, local_size;
struct timeval tv;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &procs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
local_size = N/procs;
a = (double *)malloc(local_size * sizeof(double));
b = (double *)malloc(local_size * sizeof(double));
if (myid == 0){
gettimeofday(&tv, NULL);
srand(tv.tv_usec);
for (i = 0; i < local_size; i++){
r = rand();
a[i]=r/RAND_MAX;
}
for (i = 0; i < local_size; i++){
r = rand();
b[i]=r/RAND_MAX;
}
temp = (double *)malloc(local_size * sizeof(double));
for (p = 1; p < procs; p++){
for (i = 0; i < local_size; i++){
r = rand();
temp[i]=r/RAND_MAX;
}
MPI_Send(temp, local_size, MPI_DOUBLE, p, 0,
MPI_COMM_WORLD);
for (i = 0; i < local_size; i++){
r = rand();
temp[i]=r/RAND_MAX;
}
MPI_Send(temp, local_size, MPI_DOUBLE, p, 0,
MPI_COMM_WORLD);
}
} else{
MPI_Recv(a, local_size, MPI_DOUBLE, 0, 0,
MPI_COMM_WORLD, &status);
MPI_Recv(b, local_size, MPI_DOUBLE, 0, 0,
MPI_COMM_WORLD, &status);
}
local_c = 0.0;
for (i = 0; i < local_size; i++)
local_c = local_c + a[i]*b[i];
MPI_Reduce(&local_c, &c, 1, MPI_DOUBLE, MPI_SUM, 0,
MPI_COMM_WORLD);
if (myid == 0)
printf("Result: %e \n", c);
free(a);
free(b);
if (myid == 0)
free(temp);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}
2
An answer that uses MPI_Bcast
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mpi.h"
#define N 1000
int main(int argc, char *argv[]) {
double *a, *b, c, sum, r;
double t1, t2;
int i;
int myid,procs;
struct timeval tv;
a = (double *)malloc(N * sizeof(double));
b = (double *)malloc(N * sizeof(double));
c = 0.0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
MPI_Comm_size(MPI_COMM_WORLD, &procs);
MPI_Barrier(MPI_COMM_WORLD);
if(0 == myid){
t1 = MPI_Wtime();
gettimeofday(&tv, NULL);
srand(tv.tv_usec);
for (i = 0; i < N; i++){
r = rand();
a[i]=r/RAND_MAX;
}
for (i = 0; i < N; i++){
r = rand();
b[i]=r/RAND_MAX;
}
}
MPI_Bcast(a, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
for (i = (N/procs)*myid; i < (N/procs)*(myid+1); i++){
c = c + a[i]*b[i];
}
MPI_Reduce(&c, &sum, 1, MPI_DOUBLE, MPI_SUM, 0,
MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
if(0 == myid){
t2 = MPI_Wtime();
printf("Result: %e\n", sum);
printf("number of processes:%d\n",procs);
printf("Elapsed time: %e sec.\n", t2 - t1);
}
free(a);
free(b);
MPI_Finalize();
}
Compare to the previous slide:
- The structure of the program is simpler.
- Consumes more memory.
- Does more amount of communications.
--> Can be slower.
3
実行結果
Result of execution
プロセス数を増やすと遅くなる!
The program gets slow down as the number of processes increased.
$ mpirun -np 8 ./mpi-2.exe
Result: 2.582018e+02
number of processes: 8
Elapsed time: 2.067089e-03 sec.
$ mpirun -np 2 ./mpi-2.exe
Result: 2.565114e+02
number of processes: 2
Elapsed time: 5.409718e-04 sec.
$ mpirun -np 1 ./mpi-2.exe
Result: 2.539768e+02
number of processes: 1
Elapsed time: 1.161098e-04 sec.
今回のプログラムは計算量が少なすぎて
並列化の効果よりも通信時間の方が
長かった.
ベクトルの内積: O(N)
行列同士の積: O(N3)
Amount of calculation in this program is too small.
So, the effect of the parallelization is much
smaller than the time for communication.
Dot-product of vectors: O(N)
Matrix-Matrix multiply: O(N3)
4
Today's Topic
ノンブロッキング通信
Non-Blocking Communication
通信の完了を待つ間に他の処理を行う
Execute other instructions while waiting for the completion of a communication.
集団通信関数の実装
Implementation of collective communications
デッドロック Deadlock
5
ノンブロッキング通信関数
Non-blocking communication functions
ノンブロッキング = ある命令の完了を待たずに次の命令に移る
Non-blocking = Do not wait for the completion of an
instruction and proceed to the next instruction
Example) MPI_Irecv & MPI_Wait
Blocking
Non-Blocking
MPI_Recv
Wait for the
arrival of data
data
Proceed to the next
instruction without
waiting for the data
MPI_Irecv
next instructions
MPI_Wait
data
next instructions
6
MPI_Irecv
Usage:
int MPI_Irecv(void *b, int c, MPI_Datatype d, int src,
int t, MPI_Comm comm, MPI_Request *r);
Non-Blocking Receive
Parameters:
start address for storing received data,
number of elements, data type,
rank of the source, tag (= 0, in most cases),
communicator (= MPI_COMM_WORLD, in most cases),
request
request: 通信要求 Communication Request
この通信の完了を待つ際に用いる
Used for Waiting completion of this communication
Example)
MPI_Request req;
...
MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
...
MPI_Wait(&req, &status);
7
MPI_Isend
Usage:
int MPI_Send(void *b, int c, MPI_Datatype d,
int dest, int t, MPI_Comm comm);
Non-Blocking Send
Parameters:
start address for sending data,
number of elements, data type,
rank of the destination, tag (= 0, in most cases),
communicator (= MPI_COMM_WORLD, in most cases),
request
Example)
MPI_Request req;
...
MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
...
MPI_Wait(&req, &status);
8
Non-Blocking Send?
Blocking send (MPI_Send):
送信データが別の場所にコピーされるのを待つ
Wait for the data to be copied to somewhere else.
ネットワークにデータを送出し終わるか、一時的にデータのコピーを作成
するまで。
Until completion of the data to be transferred to the network
or, until completion of the data to be copied to a temporal memory.
Non-Blocking send (MPI_Recv):
待たない
9
Notice: ノンブロッキング通信中はデータが不定
Data is not sure in non-blocking communications
MPI_Irecv:
受信データの格納場所と指定した変数の値は MPI_Waitまで不定
Value of the variable specified for receiving data is not fixed
before MPI_Wait
Value of A at here
can be 10 or 50
MPI_Irecv to A
A
10
arrived
data
...
~=A
...
A
50
50
MPI_Wait
Value of A is 10
~=A
10
Notice: ノンブロッキング通信中はデータが不定
Data is not sure in non-blocking communications
MPI_Isend:
送信データを格納した変数を MPI_Waitより前に書き換えると、実際
に送信される値は不定
If the variable that stored the data to be sent is modified before
MPI_Wait, the value to be actually sent is unpredictable.
Modifying value of A here
causes incorrect
communication
You can modify value of A
at here without any
problem
MPI_Isend A
...
A = 50
...
A
10
A
50
data sent
10 or 50
MPI_Wait
A = 100
11
MPI_Wait
Usage:
int MPI_Wait(MPI_Request *req, MPI_Status *stat);
ノンブロッキング通信(MPI_Isend、 MPI_Irecv)の完了を待つ。
Wait for the completion of MPI_Isend or MPI_Irecv
送信データの書き換えや受信データの参照が行える
Make sure that sending data can be modified,
or receiving data can be referred.
Parameters:
request, status
status:
MPI_Irecv 完了時に受信データの statusを格納
The status of the received data is stored at the
completion of MPI_Irecv
12
MPI_Waitall
Usage:
int MPI_Waitall(int c,
MPI_Request *requests, MPI_Status *statuses);
指定した数のノンブロッキング通信の完了を待つ
Wait for the completion of specified number of nonblocking communications
Parameters:
count, requests, statuses
count:
ノンブロッキング通信の数
The number of non-blocking communications
requests, statuses:
少なくとも count個の要素を持つ MPI_Request と MPI_Status
の配列
Arrays of MPI_Request or MPI_Status that consists at
least 'count' number of elements.
13
集団通信関数の中身
Inside of the functions of collective communications
通常,集団通信関数は,
MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv
等の一対一通信で実装される
Usually, functions of collective communications are
implemented by using message passing functions.
14
Inside of MPI_Bcast
One of the most simple implementations
int MPI_Bcast(char *a, int c, MPI_Datatype d, int root,
MPI_Comm comm)
{
int i, myid, procs;
MPI_Status st;
MPI_Comm_rank(comm, &myid);
MPI_Comm_rank(comm, &procs);
if (myid == root){
for (i = 0; i < procs)
if (i != root)
MPI_Send(a, c, d, i, 0, comm);
} else{
MPI_Recv(a, c, d, root, 0, comm, &st);
}
return 0;
}
15
Another implementation: With MPI_Isend
int MPI_Bcast(char *a, int c, MPI_Datatype d, int root,
MPI_Comm comm)
{
int i, myid, procs;
MPI_Status st, *stats;
MPI_Request *reqs;
MPI_Comm_rank(comm, &myid);
MPI_Comm_rank(comm, &procs);
if (myid == root){
stats = (MPI_Status *)malloc(sizeof(MPI_Status)*procs);
reqs = (MPI_Request *)malloc(sizeof(MPI_Request)*procs);
for (i = 0; i < procs)
if (i != root)
MPI_Isend(a, c, d, i, 0, comm);
MPI_Waitall(procs, reqs, stats);
free(stats);
free(reqs);
} else{
MPI_Recv(a, c, d, root, 0, comm, &st);
}
return 0;
}
16
Another implementation: Binomial Tree
int MPI_Bcast(char *a, int c, MPI_Datatype d,
int root, MPI_Comm comm)
{
int i, myid, procs;
MPI_Status st;
int mask, relative_rank, src, dst;
int tag = 1, success = 0;
MPI_Comm_rank(comm, &myid);
MPI_Comm_rank(comm, &procs);
relative_rank = myid - root;
if (relative_rank < 0)
relative_rank += procs;
mask = 1;
while (mask < num_procs){
if (relative_rank & mask){
src = myid - mask;
if (src < 0) src += procs;
MPI_Recv(a, c, d, src, 0, comm, &st);
break;
}
mask <<= 1;
}
mask >>= 1;
while (mask > 0){
if (relative_rank + mask < procs){
dst = myid + mask;
if (dst >= procs) dst -= procs;
MPI_Send (a, c, d, dst, 0, comm);
}
mask >>= 1;
}
return 0;
}
17
Flow of Binomial Tree
Use 'mask' to determine when and how to Send/Recv
Rank 0
Rank 1
Rank 2
Rank 3
Rank 4
Rank 5
Rank 6
Rank 7
mask = 1 mask = 1 mask = 1 mask = 1 mask = 1 mask = 1 mask = 1 mask = 1
mask = 2 Recv
mask = 2 Recv
mask = 2 Recv
mask = 2 Recv
mask = 4
mask = 4
from 6
from 4 Recv
from 0 Recv
from 2 Recv
from 4
mask = 4
from 0
from
0
Send to 4
mask = 2
Send to 2
mask = 1
Send to 3
mask = 2
Send to 6
mask = 1
Send to 5
mask = 1
Send to 7
mask = 1
Send to 1
18
Deadlock
何らかの理由で、プログラムを進行させることができなくなった状態
A status of a program in which it cannot proceed by some reasons.
MPIプログラムでデッドロックが発生しやすい場所:
Places you need to be careful for deadlocks:
1. MPI_Recv, MPI_Wait, MPI_Waitall
Wrong case:
if (myid == 0){
MPI_Recv from rank 1
MPI_Send to rank 1
}
if (myid == 1){
MPI_Recv from rank 0
MPI_Send to rank 1
}
One solution:
use MPI_Irecv
if (myid == 0){
MPI_Irecv from rank 1
MPI_Send to rank 1
MPI_Wait
}
if (myid == 1){
MPI_Irecv from rank 0
MPI_Send to rank 0
MPI_Wait
}
2. Collective communications
全部のプロセスが同じ集団通信関数を実行するまで先に進めない
A program cannot proceed until all processes call
the same collective communication function
Summary
並列プログラムの作成には,
計算の分割,データの分割,通信が必要
Parallel programs need distribution of computation,
distribution of data and communications.
並列化で必ず高速化できるとは限らない
Parallelization does not always speed up programs.
並列化出来ないプログラムがある
There are non-parallelizable programs
並列プログラムではデッドロックに注意
Be careful about deadlocks.
20
Report) Make Reduce function by yourself
次のページのプログラムの my_reduce関数の中身を追加
してプログラムを完成させる
Fill the inside of 'my_reduce' function in the program
shown in the next slide
my_reduce: MPI_Reduceの簡略版
Simplified version of MPI_Reduce
整数の総和のみ. ルートランクは 0限定.
コミュニケータは MPI_COMM_WORLD
Calculates total sum of integer numbers. The root rank is always 0.
The communicator is always MPI_COMM_WORLD.
アルゴリズムは好きなものを考えてよい
Any algorithm is OK.
21
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define N 20
int my_reduce(int *a, int *b, int c)
{
complete here by yourself
return 0;
}
int main(int argc, char *argv[])
{
int i, myid, procs;
int a[N], b[N];
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
MPI_Comm_size(MPI_COMM_WORLD, &procs);
for (i = 0; i < N; i++){
a[i] = i;
b[i] = 0;
}
my_reduce(a, b, N);
if (myid == 0)
for (i = 0; i < N; i++)
printf("b[%d] = %d , correct answer = %d\n", i, b[i], i*procs);
MPI_Finalize();
return 0;
}
22
© Copyright 2026 ExpyDoc