Parallel Computing I SS 2015 - Thread programming with Pthreads

Thorsten Grahs, 20. April 2015
Table of contents
Basics
Processes & threads
Posix threads
Managing threads
Mutually exclusive threads
Parallel programming patterns
Parallel programs consist of a collection of tasks.
In general, these tasks should be executed as processes or threads on multiple processors.
To handle such processes/tasks, special coordination and control structures are needed.
The simplest/lowest-level structure is a thread.
Shared Memory Programming
Shared memory programming can be done on a system which has more than one computing unit (core) sharing the same physical memory.
Data is shared between the different computing units in the form of shared variables.
There are many tools (Application Programming Interfaces, APIs) such as
OpenMP
pthreads
Intel TBB (Threading Building Blocks)
that support this programming model.
Today we focus on pthreads
Threads
The smallest unit of processing that a scheduler works on.
Threads executing under the same process share the address space.
They are light-weight, i.e. easy to create.
Context switching between threads is very fast.
Processes & threads
The building blocks of a Linux system are processes.
Each process has its own data, instructions and memory space.
Threads are sub-units of processes; they are easy to create (no data protection is needed between them) and they share the memory space.
The ability of threads to run simultaneously can be used to
do many independent tasks concurrently.
Threading APIs provide tools to assign IDs to threads, to share data and instructions, and for synchronization.
In general, multi-threading problems follow the fork-join
model.
Programming model
(Figure: the fork-join programming model. A main thread forks worker threads, which later join the main thread again.)
Linux processes
A process on a Linux system is an object through which the resources used by a program (memory, processor time, I/O) are managed and monitored.
Processes are building blocks of any Linux system and
can run in the kernel space or in the user space.
A process consists of an address space
(a set of memory pages) and a set of data structures.
The address space of a process contains the code and libraries that the process is executing, the process variables, its stacks, and various extra information needed by the kernel while the process is running.
Some common Linux commands to monitor and manage processes are ps -ef, top, strace, kill etc.
What are threads?
Threads are light-weight sub-processes within a process.
For instance, opening a new tab in a browser launches a new thread.
Threads inherit many attributes of the parent process (such as the process address space).
A tentative definition
A thread is a sequence of related instructions that is executed independently of other instruction sequences.
Threads and parallel programming
Why are threads important for parallel computing?
They are light-weight, independent streams of instructions which can easily create concurrency.
They offer a natural programming model for shared address spaces.
They are one of the basic building blocks of parallel APIs:
OpenMP is based on cooperating threads running simultaneously on multiple processors or cores.
Multi-threading is a building block for modern computer architectures.
Multi-threading
If we run multiple threads concurrently, we are doing multi-threading, i.e. we run several tasks in parallel.
In a multi-threaded process the processor can switch execution resources between threads, resulting in concurrent execution.
Concurrency on a single processor (core) system
indicates that more than one thread is making progress,
but the threads are not actually running simultaneously.
On a multi-core system each thread in the process can
run concurrently on a separate core i.e., true parallelism.
Posix Threads (PThreads)
Portable Operating System Interface Threads
pthreads (POSIX threads) is an API or library, which can be
used for shared memory/address space programming.
A low-level threading library for C programming on Linux.
Provides more than 100 functions to manage threads.
However, fewer than 10 of them are commonly used.
When a thread is created, a new thread of control is
added to the current process.
Each thread in the process runs simultaneously, and has
access to the calling process’s global data.
In addition each thread has its own private attributes and
call stack.
Managing pthreads
To create a new thread, a running thread calls the
pthread_create() function, and passes a pointer to a
function for the new thread to run.
One argument for the new thread’s function can also be
passed, along with thread attributes.
The execution of a thread begins with the successful return from the pthread_create() function.
The thread ends when the function that was called with
the thread completes normally.
A thread can also be terminated if the thread calls a
pthread_exit() routine, or if any other thread calls
pthread_cancel() to explicitly terminate that thread.
Example: Creating threads | pt_join.c
// EXAMPLE pthread: create & join - shows how two threads are created and joined
#include <stdio.h>
#include <pthread.h>

void *print_char(void *ch) {
  int i;
  for (i = 0; i < 10; i++)
    printf("%c", *(char *)ch);
  return NULL;
}

int main() {
  char ch1 = '-', ch2 = '*';
  pthread_t p1, p2;

  pthread_create(&p1, NULL, print_char, &ch1);
  pthread_create(&p2, NULL, print_char, &ch2);

  pthread_join(p1, NULL);
  pthread_join(p2, NULL);

  printf("\n");
  return 0;
}
Compile with
gcc pt_join.c -lpthread -o ptjoin
Example: Creating threads
In this program the “main thread”, which is started when
the program is executed, creates two additional threads:
p1
p2
Both threads loop through the function print_char, where they print “-” and “*” respectively.
Once each thread has printed 10 characters, it returns.
The main thread waits for them to return, prints a newline and finishes:
./ptjoin
----------**********
pthread_create
int pthread_create(pthread_t *th, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
The first argument points to allocated memory for the thread (here: &p1 for pthread_t p1).
The second argument attr may hold attributes for the created thread; passing NULL yields default attributes.
The third argument void *(*start_routine)(void *) is a pointer to the function which will be executed by the thread.
The fourth argument arg is passed as an argument to this function.
Example: Creating threads
/*--------------------------------------------------
 * EXAMPLE pthread: return
 * This program shows how data can be returned by a
 * subprogram which every thread executes in parallel.
 *--------------------------------------------------*/
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *child_thread(void *param) {
  long id, jd;
  id = (long)param;
  jd = id * id;
  return (void *)jd;
}

int main(int argc, char *argv[]) {
  long i;
  pthread_t *thread;
  long *return_value;
  int num_threads;
Listing 1: pt_return.c
Example: Creating threads
  if (argc < 2) {
    fprintf(stderr, "./return_pthreads <# threads>\n");
    return (-1);
  }

  num_threads = atoi(argv[1]);
  thread = (pthread_t *)malloc(num_threads * sizeof(pthread_t));
  return_value = (long *)malloc(num_threads * sizeof(long));

  for (i = 0; i < num_threads; i++) {
    pthread_create(&thread[i], NULL, &child_thread, (void *)i);
  }
  for (i = 0; i < num_threads; i++) {
    pthread_join(thread[i], (void **)&return_value[i]);
    printf("input = %ld output=%ld\n", i, return_value[i]);
  }
  return (0);
}
Listing 2: pt_return.c cont.
Example: Creating threads
Compile with
gcc pt_return.c -lpthread -o ptreturn
$ ./ptreturn 10
input = 0 output=0
input = 1 output=1
input = 2 output=4
input = 3 output=9
input = 4 output=16
input = 5 output=25
input = 6 output=36
input = 7 output=49
input = 8 output=64
input = 9 output=81
pthread_exit
Terminate the calling thread
void pthread_exit (void *retval);
The pthread_exit() function terminates the calling thread
It returns a value via retval
If the thread is joinable, the return value is available to
another thread in the same process
A thread can also be cancelled by another thread (more on this later).
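A minimal sketch of this (the names worker and result are chosen for this illustration): pthread_exit() passes a value back just like a return statement from the thread function:

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
  long id = (long)arg;
  /* terminate this thread; the square of id becomes the return value */
  pthread_exit((void *)(id * id));
}

int main() {
  pthread_t t;
  void *result;
  pthread_create(&t, NULL, worker, (void *)3L);
  pthread_join(t, &result);                /* receives the value passed to pthread_exit */
  printf("result = %ld\n", (long)result);  /* prints: result = 9 */
  return 0;
}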
Join with a terminated thread
Joining is one way to accomplish synchronization between
threads.
This can be accomplished with the routine pthread_join (used in pt_join.c above).
pthread_join
int pthread_join(pthread_t th, void **returnval);
The function waits for the thread th to terminate.
pthread_join() suspends execution of the calling thread until the target thread completes execution.
The thread specified by th must be joinable.
If returnval != NULL, returnval holds the return value from thread th
If multiple threads simultaneously try to join with the same
thread, the results are undefined.
Shared memory model
All threads have access to the same global, shared
memory
Threads also have their own private data
Programmers are responsible for synchronizing access to (protecting) globally shared data.
Sharing data
Threads can collaborate on the same data
(shared memory)
Suppose that two (or more) concurrently running threads
both try to increment an integer variable I:
thread 1: I=I+2
thread 2: I=I+3
This is a legitimate activity if the variable is an accumulator for values computed by independent processes.
Could this be problematic?
Race condition
It is problematic!
The result of these updates depends on the order of
execution:
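For instance, with I = 0 initially, one possible interleaving is:

thread 1: reads I            (sees 0)
thread 2: reads I            (sees 0)
thread 1: writes I = 0 + 2   (I is now 2)
thread 2: writes I = 0 + 3   (I is now 3)

The update of thread 1 is lost: the final value is 3 instead of the expected 5.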
This situation is called a
race condition
Could we avoid this behaviour?
Spawning threads
We already mentioned that threads share the same
memory space.
This can cause trouble.
The following examples illustrate this (the details are not so important).
Spawning threads
In the following example
1. several threads are created,
2. they update a shared variable,
3. the threads join again.
(Figure: the main thread forks, each thread runs the function adder(), then the threads join.)
Spawning threads
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int sum = 0;

void *adder() {
  sum = sum + 1;
  return NULL;
}

#define NTHREADS 50

int main() {
  int i;
  pthread_t th[NTHREADS];

  printf("forking\n");
  for (i = 0; i < NTHREADS; i++)
    if (pthread_create(&th[i], NULL, &adder, NULL) != 0) return i + 1;

  printf("joining\n");
  for (i = 0; i < NTHREADS; i++)
    if (pthread_join(th[i], NULL) != 0) return NTHREADS + i + 1;

  printf("Sum computed: %d\n", sum);
  return 0;
}
Listing 3: spawnThreads.c
Spawning threads
$ gcc -o spawnThreads -pthread spawnThreads.c
$ ./spawnThreads
forking
joining
Sum computed: 50
This expected result is a coincidence.
It only happens because updating the variable is much
quicker than creating threads
If we artificially increase the time for the update, we will no
longer get the right result
Spawning threads modified
We change the adder function:

void *adder() {
  int t = sum;
  sleep(1);      /* requires #include <unistd.h> */
  sum = t + 1;
  return NULL;
}
Listing 4: spawnThreadsMod.c (adder)
$ gcc -o spawnThreadsMod -pthread spawnThreadsMod.c
$ ./spawnThreadsMod
forking
joining
Sum computed: 1
Now all threads read out the value of sum, wait a while (presumably calculating something) and then update it, so each thread writes the value 1 and all but one of the updates are lost.
Mutually exclusive
We can fix this behaviour by putting a lock on the code region that should be mutually exclusive:

pthread_mutex_t lock;

void *adder() {
  int t;
  pthread_mutex_lock(&lock);
  t = sum;
  sleep(1);
  sum = t + 1;
  pthread_mutex_unlock(&lock);
  return NULL;
}

/* ... in main(), before the threads are created: */
pthread_t th[NTHREADS];
pthread_mutex_init(&lock, NULL);
Listing 5: spawnThreadsMutex.c
$ gcc -o mutex -pthread spawnThreadsMutex.c
$ ./mutex
forking
joining
Sum computed: 50
Mutex variables
Mutex
Mutex is an abbreviation for mutual exclusion.
They can be used to prevent race conditions.
Mutex variables are one of the primary means of
implementing thread synchronization and for protecting
shared data when multiple writes occur.
A mutex variable acts like a lock protecting access to a shared data resource.
The basic concept is that only one thread can lock
(or own) a mutex variable at any given time.
Thus, even if several threads try to lock a mutex only one
thread will be successful.
Mutex sequence
A typical sequence in the use of a mutex is as follows:
Create and initialize a mutex variable
Several threads attempt to lock the mutex
Only one succeeds and that thread owns the mutex
The owner thread performs some set of actions
The owner unlocks the mutex
Another thread acquires the mutex and repeats the
process
Finally the mutex is destroyed
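The same sequence as a code skeleton (a minimal sketch with default attributes; error checking omitted):

pthread_mutex_t m;

pthread_mutex_init(&m, NULL);   /* create and initialize the mutex */

/* in each thread: */
pthread_mutex_lock(&m);         /* only one thread at a time succeeds */
/* ... act on the protected shared data ... */
pthread_mutex_unlock(&m);       /* let another thread acquire the mutex */

/* after all threads are done with the mutex: */
pthread_mutex_destroy(&m);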
Locks
When your code accesses some memory, you lock it up:
mutex: the lock.
critical section: the code locked with a
mutex.
Now if a thread wants to run this code, it needs the key. So only one thread can run the code at a time.
Critical section
(Figure: code in a critical section. “Have some privacy!”)
Mutex creation
Declaration
Mutex variables must be declared with type
pthread_mutex_t
They must be initialized before they can be used.
Initialization
There are two ways to initialize a mutex variable:
Statically, when it is declared. For example:
pthread_mutex_t mymutex =
PTHREAD_MUTEX_INITIALIZER;
Dynamically, with the pthread_mutex_init() routine.
Mutex locking
pthread_mutex_lock()
This routine is used by a thread to acquire a lock on the
specified mutex variable.
If the mutex is already locked by another thread, this call
will block the calling thread until the mutex is unlocked.
pthread_mutex_trylock()
This routine will attempt to lock a mutex.
However, if the mutex is already locked, the routine will return immediately with a “busy” error code.
Useful in preventing deadlock conditions,
as in a priority-inversion situation.
pthread_mutex_unlock()
This routine unlocks a mutex if called by the owning thread.
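A minimal sketch of using pthread_mutex_trylock(); the helper do_other_work() is a hypothetical placeholder for work that does not need the lock:

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void do_other_work(void);   /* hypothetical: useful work that needs no lock */

void try_update(void) {
  if (pthread_mutex_trylock(&lock) == 0) {
    /* we own the mutex: safe to access the shared data here */
    pthread_mutex_unlock(&lock);
  } else {
    /* the mutex was busy: do something else instead of blocking */
    do_other_work();
  }
}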
Example: Scalar product
// This program computes the scalar product of two vectors using pthreads.
#include <pthread.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>

int time_estimate(struct timeval t1, struct timeval t2) {
  float t;
  t = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) / 1000000.0;
  printf("Time [seconds]= %12.6f\n", t);
  return (0);
}

typedef struct {
  int thread_id;
  int num_threads;
  int nsize;
} thread_data;

float *vecA, *vecB, global_sum;
int vec_length;
pthread_mutex_t mutex_sum = PTHREAD_MUTEX_INITIALIZER;
Listing 6: scalarProd.c
Example: Scalar product
/* This is the function which every thread computes in parallel */
void *VecSum(void *arg) {
  thread_data *p = (thread_data *)arg;
  int myid = p->thread_id;
  int nthreads = p->num_threads;
  int n = p->nsize;
  int i;
  float local_sum = 0.0;

  for (i = myid * n / nthreads; i < (myid + 1) * n / nthreads; i++)
    local_sum += vecA[i] * vecB[i];

  fprintf(stdout, "local_sum=%.2f\n", local_sum);

  /* updating global sum using mutex lock */
  pthread_mutex_lock(&mutex_sum);
  global_sum += local_sum;
  pthread_mutex_unlock(&mutex_sum);

  return NULL;
}
Listing 7: scalarProd (cont.)
Example: Scalar product | main
int main(int argc, char *argv[]) {
  int ret_count;
  pthread_t *threads;
  pthread_attr_t pta;
  int i, num_threads, vec_length;
  thread_data *arg;
  struct timeval t1, t2;
  struct timezone tzp;

  if (argc < 3) {
    fprintf(stderr, "./scalar_prod_pthreads <vect size> <num threads>\n");
    return (-1);
  }

  vec_length = abs(atoi(argv[1]));
  num_threads = abs(atoi(argv[2]));

  vecA = (float *)malloc(sizeof(float) * vec_length);
  vecB = (float *)malloc(sizeof(float) * vec_length);
  arg = (thread_data *)malloc(num_threads * sizeof(thread_data));

  pthread_attr_init(&pta);

  threads = (pthread_t *)malloc(sizeof(pthread_t) * num_threads);

  for (i = 0; i < vec_length; i++) {
    vecA[i] = (float)i + 1;
    vecB[i] = 1.0;
  }
Listing 8: scalarProd (cont.)
Example: Scalar product
  /* Now find out the starting time */
  gettimeofday(&t1, &tzp);

  /* Thread creation */
  for (i = 0; i < num_threads; i++) {
    arg[i].thread_id = i;
    arg[i].num_threads = num_threads;
    arg[i].nsize = vec_length;
    ret_count = pthread_create(&threads[i], NULL, VecSum, (void *)(arg + i));
  }

  /* Joining threads */
  for (i = 0; i < num_threads; i++)
    ret_count = pthread_join(threads[i], NULL);

  /* Find out the final time */
  gettimeofday(&t2, &tzp);

  printf("A = ");
  for (i = 0; i < vec_length; i++)
    fprintf(stdout, " %.2f ", vecA[i]);
  printf("\n");
Listing 9: scalarProd (cont.)
Example: Scalar product
  printf("B = ");
  for (i = 0; i < vec_length; i++)
    printf(" %.2f ", vecB[i]);
  printf("\n");

  printf("A.B = %.6f\n", global_sum);

  ret_count = pthread_attr_destroy(&pta);
  time_estimate(t1, t2);
  return (0);
}
Listing 10: scalarProd (cont.)
$ gcc -o scalar -pthread scalarProd.c
$ ./scalar 10 3
local_sum=6.00
local_sum=15.00
local_sum=34.00
A = 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00
B = 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
A.B = 55.000000
Time [seconds]=     0.000171
Summary
Why are threads important for parallel computing?
Threads are independent streams of instructions which can easily create concurrency.
They are light-weight, i.e. easy to create with little overhead.
They offer a natural programming model for shared address spaces.
OpenMP is based on cooperating threads running simultaneously on multiple processors or cores.
OpenMP will be covered in the next lectures
Further readings
POSIX Threads Programming
Blaise Barney, Lawrence Livermore National Laboratory
https://computing.llnl.gov/tutorials/pthreads
Advanced Linux Programming
CodeSourcery
Chapter 4: Threads
http://www.advancedlinuxprogramming.com
Parallel Programming
T. Rauber & G. Rünger
Chapter 6: Thread programming
Springer 2007.