
January 10, 2012

Parallel Processing: Share a job among multiple processors
- Multi-Core / Multi-CPU PC
  - Within one PC
  - For small problems
- Cluster of PCs / Supercomputer
  - Interconnected computers
  - For medium- to large-scale problems
  - Communication is required among computers

How to Describe Communications in a Program?
- TCP, UDP?
  - Good:
    - Portable: available on many networks.
  - Bad:
    - The procedures for connections and data transfer are complicated.
    - High overhead, since they are designed for wide-area (= unreliable) networks.
- Possible, but not suitable for parallel processing.

MPI (Message Passing Interface)
- A set of communication functions designed for parallel processing
- Can be called from C, C++, and Fortran programs.
- "Message Passing" = Send + Receive
  - In practice, many functions other than Send and Receive are available.
- Let's look at a sample program first.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, ierr, i;
    double myval, val;
    MPI_Status status;
    FILE *fp;
    char s[64];

    /* Set up the MPI environment */
    MPI_Init(&argc, &argv);
    /* Get own process ID (= rank) */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    /* Get total number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    /* If my ID is 0 */
    if (myid == 0) {
        fp = fopen("test.dat", "r");
        /* Input data for this process and keep it in myval */
        fscanf(fp, "%lf", &myval);
        /* i = 1 to procs-1 */
        for (i = 1; i < procs; i++) {
            /* Input data and keep it in val */
            fscanf(fp, "%lf", &val);
            /* Use MPI_Send to send the value in val to process i */
            MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        }
        fclose(fp);
    } else {
        /* Processes with ID other than 0: use MPI_Recv to receive
           data from process 0 and keep it in myval */
        MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    }

    /* Print out its own myval */
    printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);
    /* End of parallel computing */
    MPI_Finalize();
    return 0;
}

Flow of the sample program
- Multiple "processes" execute the program, each acting according to its number (= rank).
[Figure: execution timeline of three ranks]
- rank 0: read data from a file into myval; read data from a file into val and send val to rank 1; read data from a file into val and send val to rank 2; print myval
- rank 1: wait for the arrival of the data; receive data from rank 0 into myval; print myval
- rank 2: wait for the arrival of the data; receive data from rank 0 into myval; print myval

Sample of the Result of Execution
- The order of the output may change from run to run, since each process prints independently.
PROCS: 4 MYID: 1 MYVAL: 20.0000000000000000   (rank 1)
PROCS: 4 MYID: 2 MYVAL: 30.0000000000000000   (rank 2)
PROCS: 4 MYID: 0 MYVAL: 10.0000000000000000   (rank 0)
PROCS: 4 MYID: 3 MYVAL: 40.0000000000000000   (rank 3)

Characteristics of MPI Interface
- MPI programs are ordinary C programs
  - Not a new language
- Every process executes the same program
  - Each process does its own work according to its rank (= process number)
- A process cannot directly read or write variables of another process.
[Figure: rank 0 reads the file and sends val to ranks 1 and 2; each rank receives into its own myval and prints it]

TCP, UDP vs MPI
- MPI: a simple interface dedicated to parallel computing
  - SPMD (Single Program, Multiple Data-stream) model
  - All processes execute the same program
- TCP, UDP: generic interfaces for various kinds of communication, such as internet servers
  - Server/Client model
  - Each process executes its own program.

TCP Client
/* initialize */
sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
memset(&echoServAddr, 0, sizeof(echoServAddr));
echoServAddr.sin_family = AF_INET;
echoServAddr.sin_addr.s_addr = inet_addr(servIP);
echoServAddr.sin_port = htons(echoServPort);
connect(sock, (struct sockaddr *) &echoServAddr,
        sizeof(echoServAddr));
echoStringLen = strlen(echoString);
send(sock, echoString, echoStringLen, 0);
totalBytesRcvd = 0;
printf("Received: ");
while (totalBytesRcvd < echoStringLen) {
    bytesRcvd = recv(sock, echoBuffer, RCVBUFSIZE - 1, 0);
    if (bytesRcvd <= 0)
        break;                  /* connection closed or error */
    totalBytesRcvd += bytesRcvd;
    echoBuffer[bytesRcvd] = '\0';
    printf("%s", echoBuffer);   /* do not use received data as the format string */
}
printf("\n");
close(sock);

TCP Server
/* initialize */
servSock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
memset(&echoServAddr, 0, sizeof(echoServAddr));
echoServAddr.sin_family = AF_INET;
echoServAddr.sin_addr.s_addr = htonl(INADDR_ANY);
echoServAddr.sin_port = htons(echoServPort);
bind(servSock, (struct sockaddr *) &echoServAddr,
     sizeof(echoServAddr));
listen(servSock, MAXPENDING);
for (;;) {
    clntLen = sizeof(echoClntAddr);
    clntSock = accept(servSock, (struct sockaddr *) &echoClntAddr,
                      &clntLen);
    recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
    while (recvMsgSize > 0) {
        send(clntSock, echoBuffer, recvMsgSize, 0);
        recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
    }
    close(clntSock);
}

MPI
#include <stdio.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    int myid, procs, ierr, i;
    double myval, val;
    MPI_Status status;
    FILE *fp;
    char s[64];
    /* initialize */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    if (myid == 0) {
        fp = fopen("test.dat", "r");
        fscanf(fp, "%lf", &myval);
        for (i = 1; i < procs; i++) {
            fscanf(fp, "%lf", &val);
            MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        }
        fclose(fp);
    } else
        MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);
    MPI_Finalize();
    return 0;
}
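
Layer of MPI
- MPI hides the differences among networks
[Figure: layer diagram. Applications sit on top of MPI. Below MPI are interfaces such as Sockets and XTI, then TCP and UDP, then IP, then the Ethernet driver and Ethernet card. A high-speed interconnect (InfiniBand, etc.) appears alongside as another network under MPI.]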

How to compile & execute MPI programs
- Compile command: mpicc
  Example)
  mpicc -O3 test.c -o test.exe
  - -O3: optimization option (the letter O, not the digit 0)
  - test.c: source file to compile
  - test.exe: executable file to create
- Execution command: mpirun
  Example)
  mpirun -np 8 ./test.exe
  - -np 8: number of processes
  - ./test.exe: executable file to execute

Ex 0) Execution of an MPI program
- Log in to psihexa and try the following commands.
$ cp /tmp/test-mpi.c .
$ cp /tmp/test.dat .
$ cat test-mpi.c
$ cat test.dat
$ mpicc test-mpi.c -o test-mpi
$ mpirun -np 8 ./test-mpi
- If you have time, try changing the number of processes or modifying the source program.

MPI Library
- The bodies of the MPI functions are stored in the "MPI library".
- mpicc automatically links the library to the program.
[Figure: mpicc compiles the source program and links it with the MPI library (MPI_Init, MPI_Comm_rank, ...) to produce the executable file]
Source program:
main()
{
    MPI_Init(...);
    ...
    MPI_Comm_rank(...);
    ...
    MPI_Send(...);
    ...
}

Basic Structure of MPI Programs
#include <stdio.h>
#include "mpi.h"              /* crucial line: header file "mpi.h" */
int main(int argc, char *argv[])
{
    ...
    MPI_Init(&argc, &argv);   /* crucial line: function for start-up */
    ...
    /* You can call MPI functions in this area */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    ...
    MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
    ...
    MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    ...
    MPI_Finalize();           /* crucial line: function for finish */
    return 0;
}

MPI Functions Today
- MPI_Init: initialization
- MPI_Finalize: finalization
- MPI_Comm_size: get the number of processes
- MPI_Comm_rank: get the rank (= process number) of this process
- MPI_Send & MPI_Recv: message passing
- MPI_Bcast & MPI_Gather: collective communication (= group communication)

MPI_Init
Usage:
int MPI_Init(int *argc, char ***argv);
- Starts parallel execution in MPI
  - Starts processes and establishes connections among them.
- Must be called once before calling other MPI functions.
- Parameters:
  - Pass pointers to both arguments of the 'main' function.
  - They are referenced at start-up so that every process shares the name of the executable file and the options given to the mpirun command.
Example
#include <stdio.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    int myid, procs, ierr;
    double myval, val;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    ...

MPI_Finalize
Usage:
int MPI_Finalize(void);
- Finishes parallel execution
  - MPI functions cannot be called after this function.
- Every process must call this function before exiting the program.
Example
main()
{
    ...
    MPI_Finalize();
}

MPI_Comm_rank
Usage:
int MPI_Comm_rank(MPI_Comm comm, int *rank);
- Gets the rank (= process number) of the calling process
  - Returned in the second argument
- 1st argument = "communicator"
  - An identifier for a group of processes
  - In most cases, just specify MPI_COMM_WORLD here.
  - MPI_COMM_WORLD: the group that consists of all of the processes in this execution
  - Processes can also be divided into multiple groups, each assigned a different job.
Example
...
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
...

MPI_Comm_size
Usage:
int MPI_Comm_size(MPI_Comm comm, int *size);
- Gets the number of processes
  - Returned in the second argument
Example
...
MPI_Comm_size(MPI_COMM_WORLD, &procs);
...
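The four functions introduced so far (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize) are enough for a complete program. The following is a minimal sketch rather than part of the course material; the message text is an arbitrary choice.

```c
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs;

    /* Must come before the other MPI calls in this program. */
    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &myid);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &procs);  /* total number of processes */

    printf("Hello from rank %d of %d\n", myid, procs);

    /* No MPI calls after this point. */
    MPI_Finalize();
    return 0;
}
```

It can be built with mpicc and started with mpirun -np 4 as described earlier; each process prints one line, and the order of the lines may differ between runs.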

Message Passing
- Communication between a "sender" and a "receiver"
- The send and receive functions must be called in a matching manner.
  - The "From" rank and the "To" rank are correct
  - The specified size of the data to be transferred is the same on both sides
  - The same "Tag" is specified on both sides
[Figure: a matching Send and Receive]
- Rank 0: Send. To: Rank 1, Size: 10 integers, Tag: 100
- Rank 1: Receive (waits for the message). From: Rank 0, Size: 10 integers, Tag: 100
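The matching rules in the figure (rank 0 sends 10 integers with tag 100 to rank 1) can be written out as follows. This is a sketch under the figure's assumptions; the buffer contents and the macro names COUNT and TAG are made up for illustration, and the program needs at least two processes.

```c
#include <stdio.h>
#include "mpi.h"

#define COUNT 10   /* number of integers, as in the figure */
#define TAG   100  /* tag, as in the figure */

int main(int argc, char *argv[])
{
    int myid, procs, i;
    int buf[COUNT];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    if (procs < 2) {
        if (myid == 0)
            fprintf(stderr, "run with at least 2 processes\n");
    } else if (myid == 0) {
        for (i = 0; i < COUNT; i++)
            buf[i] = i * i;   /* arbitrary sample data */
        /* To: rank 1, size: 10 integers, tag: 100 */
        MPI_Send(buf, COUNT, MPI_INT, 1, TAG, MPI_COMM_WORLD);
    } else if (myid == 1) {
        /* From: rank 0, size: 10 integers, tag: 100 */
        MPI_Recv(buf, COUNT, MPI_INT, 0, TAG, MPI_COMM_WORLD, &status);
        printf("rank 1 received buf[9] = %d\n", buf[9]);
    }

    MPI_Finalize();
    return 0;
}
```

If the receiver named a different source rank or tag, the receive would not match this message and rank 1 would keep waiting.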

MPI_Send
Usage:
int MPI_Send(void *b, int c, MPI_Datatype d,
             int dest, int t, MPI_Comm comm);
- Information of the message to send:
  - start address of the data,
  - number of elements,
  - data type,
  - rank of the destination,
  - tag,
  - communicator (= MPI_COMM_WORLD, in most cases)
- Data types:
  - Integer: MPI_INT
  - Real (single): MPI_FLOAT
  - Real (double): MPI_DOUBLE
  - Character: MPI_CHAR
- tag: the number (an integer) attached to each message
  - Used in programs that handle messages arriving from unspecified processes.
  - Usually, you can specify 0.
Example
...
MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
...

Example of MPI_Send
- Send the value of an integer variable 'd' (one integer)
MPI_Send(&d, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
- Send the first 100 elements of the real array 'mat' (MPI_DOUBLE type)
MPI_Send(mat, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
- Send 50 elements of the integer array 'data', starting at data[10] (that is, data[10] through data[59])
MPI_Send(&(data[10]), 50, MPI_INT, 1, 0, MPI_COMM_WORLD);

MPI_Recv
Usage:
int MPI_Recv(void *b, int c, MPI_Datatype d, int src,
             int t, MPI_Comm comm, MPI_Status *st);
- Information of the message to receive:
  - start address for storing the received data,
  - number of elements,
  - data type,
  - rank of the source,
  - tag (= 0, in most cases),
  - communicator (= MPI_COMM_WORLD, in most cases),
  - status
- status: storage for information about the arrived message (an MPI_Status structure in C; an integer array in Fortran)
  - Contains the source rank and the tag of the message. (Not used in most cases.)
Example
...
MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0,
         MPI_COMM_WORLD, &status);
...
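One situation where status is useful is a receiver that accepts messages from any sender. The sketch below assumes the standard wildcards MPI_ANY_SOURCE and MPI_ANY_TAG, which the slides do not cover; the payload (rank times 10) and the use of the rank as the tag are arbitrary choices for illustration.

```c
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, i, value, count;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    if (myid == 0) {
        /* Accept one message from each other rank, in arrival order. */
        for (i = 1; i < procs; i++) {
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            /* status tells us who sent the message and with which tag. */
            MPI_Get_count(&status, MPI_INT, &count);
            printf("got %d (%d element) from rank %d, tag %d\n",
                   value, count, status.MPI_SOURCE, status.MPI_TAG);
        }
    } else {
        value = myid * 10;
        /* The sender's rank doubles as the tag, only so there is
           something to inspect on the receiving side. */
        MPI_Send(&value, 1, MPI_INT, 0, myid, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Because rank 0 takes messages as they arrive, the order of the printed lines can vary between runs.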

Collective Communications
- Communications among all of the processes in the group
- Examples)
  - MPI_Bcast: copy data to the other processes
    Rank 0: 3 1 8 2  ->  Rank 1: 3 1 8 2, Rank 2: 3 1 8 2
  - MPI_Gather: gather data from the other processes into an array
    Rank 0: 7, Rank 1: 5, Rank 2: 9  ->  7 5 9
  - MPI_Reduce: apply a 'reduction' operation to the distributed data to produce one array
    Rank 0: 1 2 3, Rank 1: 4 5 6, Rank 2: 7 8 9  ->  12 15 18

MPI_Bcast
Usage:
int MPI_Bcast(void *b, int c, MPI_Datatype d,
              int root, MPI_Comm comm);
- Copies data on one process to all of the processes
- Parameters:
  - start address, number of elements, data type, root rank, communicator
  - root rank: rank of the process that has the original data
- Example:
MPI_Bcast(a, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
[Figure: array a on rank 0 is copied to array a on ranks 1, 2 and 3]
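A complete program around the MPI_Bcast example might look like the sketch below. The values placed in a on rank 0 are arbitrary sample data; the other ranks start with zeros so that the effect of the broadcast is visible.

```c
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid;
    double a[3] = {0.0, 0.0, 0.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        /* Only the root rank holds the original data (sample values). */
        a[0] = 1.5;
        a[1] = 2.5;
        a[2] = 3.5;
    }

    /* Every rank makes the same call; rank 0 is the root. */
    MPI_Bcast(a, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("rank %d: a = %.1f %.1f %.1f\n", myid, a[0], a[1], a[2]);

    MPI_Finalize();
    return 0;
}
```

After the call, every rank prints the same three values.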

MPI_Gather
Usage:
int MPI_Gather(void *sb, int sc, MPI_Datatype st, void *rb, int rc,
               MPI_Datatype rt, int root, MPI_Comm comm);
- Gathers data from all of the processes to construct one array
- Parameters:
  - send data: start address, number of elements, data type,
  - receive data: start address, number of elements, data type (meaningful only on the root rank),
  - root rank, communicator
  - root rank: rank of the process that stores the result array
- Example:
MPI_Gather(a, 3, MPI_DOUBLE, b, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
[Figure: array a on each of ranks 0, 1, 2 and 3 is collected into array b on rank 0]
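The MPI_Gather example can be expanded into a full program as sketched below. One detail worth noting: the receive count (3 here) is the number of elements taken from each process, so the array b on the root needs room for 3 elements per process. The sample values in a are arbitrary and chosen so that the sender of each element can be recognized.

```c
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, i;
    double a[3];
    double *b = NULL;   /* result array, used only on the root */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);

    for (i = 0; i < 3; i++)
        a[i] = myid * 10 + i;   /* sample data that identifies the sender */

    if (myid == 0) {
        /* Room for 3 elements from every process. */
        b = malloc((size_t)procs * 3 * sizeof(double));
        if (b == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Every rank calls MPI_Gather; b is only used on the root. */
    MPI_Gather(a, 3, MPI_DOUBLE, b, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        /* Data from rank r is stored in b[3*r] .. b[3*r + 2]. */
        for (i = 0; i < procs * 3; i++)
            printf("%.0f ", b[i]);
        printf("\n");
        free(b);
    }

    MPI_Finalize();
    return 0;
}
```

Unlike the point-to-point examples, the gathered array is ordered by rank regardless of which process finishes first.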

Usage of Collective Communications
- Write the program so that every process calls the same function.
  - For example, MPI_Bcast must be called not only by the root rank but also by all of the other ranks.
- For collective functions that take separate send and receive buffers, the address ranges of the send data and the receive data must not overlap.
  - MPI_Gather, MPI_Allgather, MPI_Gatherv, MPI_Allgatherv, MPI_Reduce, MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv, etc.

Summary
- In MPI, multiple processes run the same program
- Work is assigned to each process according to its rank (its number)
- Each process runs in its own memory space
  - Data held by other processes can be accessed only through explicit communication among processes
- MPI functions
  - MPI_Init, MPI_Finalize, MPI_Comm_rank
  - MPI_Send, MPI_Recv
  - MPI_Bcast, MPI_Gather

References
- MPI Forum
  http://www.mpi-forum.org/
  - Specification of the "MPI standard"
- MPI specification (Japanese translation)
  http://phase.hpcc.jp/phase/mpi-j/ml/
- Training course materials from RIKEN
  http://accc.riken.jp/HPC/training/mpi/mpi_all_2007-0207.pdf

Ex 1) A program that displays random numbers
- Make a program in which each process displays its own rank and one random integer.
- Sample:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main(int argc, char *argv[])
{
    int r;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    srand(tv.tv_usec);   /* seed from the microsecond part of the current time */
    r = rand();
    printf("%d\n", r);
    return 0;
}

Ex 1) (cont.)
- Example of the result of execution
1: 520391
0: 947896500
3: 1797525940
2: 565917780
4: 1618651506
5: 274032293
6: 1248787350
7: 828046128

Ex 1) Sample of the answer
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    int r, myid, procs;
    struct timeval tv;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    gettimeofday(&tv, NULL);
    srand(tv.tv_usec);   /* seed from the microsecond part of the current time */
    r = rand();
    printf("%d: %d\n", myid, r);
    MPI_Finalize();
    return 0;
}

Report: Display in order
- Modify the program from Ex 1) so that the random numbers generated by the processes are printed in rank order, starting from rank 0.
- Example of the result of the execution
0: 1524394631
1: 999094501
2: 941763604
3: 526956378
4: 152374643
5: 1138154117
6: 1926814754
7: 156004811
