Parallel Computing
Thread programming with Pthreads
Thorsten Grahs, 20. April 2015

Table of contents
- Basics
- Processes & threads
- POSIX threads
- Managing threads
- Mutually exclusive threads

Parallel programming patterns
Parallel programs consist of a collection of tasks. In general, these tasks should be executed as processes or threads on multiple processors. To handle such processes/tasks, special coordination and control structures are needed. The simplest, lowest-level structure is the thread.

Shared memory programming
Shared memory programming can be done on a system that has more than one computing unit (core) sharing the same physical memory. Data is shared between the computing units in the form of shared variables. There are many tools (Application Programming Interfaces, APIs) supporting this programming model, like
- OpenMP
- Pthreads
- ITBB (Intel Threading Building Blocks)
Today we focus on Pthreads.

Threads
The thread is the smallest unit of processing that a scheduler works on.
- Threads executing under the same process share the address space.
- They are light-weight, i.e. easy to create.
- Context switching between threads is very fast.

Processes & threads
The building blocks of a Linux system are processes. Each process has its own data, instructions and memory space. Threads are sub-units of processes; they are easy to create (because no data protection is needed) and share the memory space of their parent process. The ability of threads to run simultaneously can be used to do many independent tasks concurrently. Threading APIs provide tools to assign IDs to threads, to share data and instructions, and to synchronize. In general, multi-threaded programs follow the fork-join model.

Programming model
(Figure: fork-join programming model; not reproduced.)

Linux processes
A process on a Linux system is an object through which the resources used by a program, like memory, processor time and I/O, are managed and monitored. Processes are the building blocks of any Linux system and can run in kernel space or in user space. A process consists of an address space (a set of memory pages) and a set of data structures. The address space of a process contains the code and libraries that the process is executing, the process variables, its stacks, and various extra information needed by the kernel while the process is running. Some of the common Linux commands to monitor and manage processes are ps -ef, top, strace, kill, etc.

What are threads?
Threads are light-weight sub-processes within a process. For instance, opening a new tab in a browser launches a new thread. Threads inherit many attributes of the parent process (such as the process address space).
Attempt at a definition: a thread is a sequence of related instructions executed independently of other instruction sequences.

Threads and parallel programming
Why are threads important for parallel computing?
- They are light-weight, independent streams of instructions that can easily create concurrency.
- They offer a natural programming model for shared address spaces.
- They are one of the basic building blocks of parallel APIs. OpenMP, for example, is based on cooperating threads running simultaneously on multiple processors or cores.
- Multi-threading is a building block of modern computer architectures.
A short sketch after this list illustrates the shared address space that distinguishes threads from processes.
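The following minimal sketch (not from the original slides; the file name is illustrative) makes the shared address space concrete: a counter incremented inside a thread is visible in main(), while a forked child process only increments its own copy.

/* threads_vs_fork.c (illustrative) -- threads share memory, processes do not */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int counter = 0;                 /* global: lives in the shared address space */

void *increment(void *arg) {     /* runs in a thread, writes the same counter */
    counter++;
    return NULL;
}

int main(void) {
    pthread_t th;
    pthread_create(&th, NULL, increment, NULL);
    pthread_join(th, NULL);
    printf("after thread: counter = %d\n", counter);   /* prints 1 */

    pid_t pid = fork();
    if (pid == 0) {              /* child: increments a *copy* of counter */
        counter++;
        return 0;
    }
    wait(NULL);
    printf("after fork:   counter = %d\n", counter);   /* still 1 in the parent */
    return 0;
}

Compile with gcc threads_vs_fork.c -pthread -o demo; the parent prints counter = 1 both times.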
Multi threads
If we run multiple threads concurrently, we are doing multi-threading, i.e. we run several tasks in parallel.
- In a multi-threaded process the processor can switch execution resources between threads, resulting in concurrent execution.
- Concurrency on a single-processor (single-core) system means that more than one thread is making progress, but the threads are not actually running simultaneously.
- On a multi-core system each thread in the process can run concurrently on a separate core, i.e. true parallelism.

POSIX Threads (Pthreads)
Portable Operating System Interface Threads. Pthreads (POSIX threads) is an API or library which can be used for shared memory/address space programming.
- A low-level threading library for C programming on Linux.
- Provides more than 100 functions to manage threads; however, fewer than 10 are commonly used.
When a thread is created, a new thread of control is added to the current process. Each thread in the process runs simultaneously and has access to the calling process's global data. In addition, each thread has its own private attributes and call stack.

Managing pthreads
To create a new thread, a running thread calls the pthread_create() function and passes a pointer to a function for the new thread to run. One argument for the new thread's function can also be passed, along with thread attributes. The execution of a thread begins with the successful return from the pthread_create() function. The thread ends when the function that was called with the thread completes normally. A thread can also be terminated if it calls the pthread_exit() routine, or if any other thread calls pthread_cancel() to explicitly terminate it (a sketch of cancellation follows the next example).

Example: Creating threads | pt_join.c

// EXAMPLE pthread: create & join -- two threads run the same function on different data
#include <stdio.h>
#include <pthread.h>

void *print_char(void *ch) {
    int i;
    for (i = 0; i < 10; i++)
        printf("%c", *(char *)ch);
    return NULL;
}

int main() {
    char ch1 = '-', ch2 = '*';
    pthread_t p1, p2;
    pthread_create(&p1, NULL, print_char, &ch1);
    pthread_create(&p2, NULL, print_char, &ch2);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("\n");
    return 0;
}

Compile with gcc pt_join.c -lpthread -o ptjoin

Example: Creating threads
In this program the "main thread", which is started when the program is executed, creates two additional threads: p1 and p2. Both threads loop through the function print_char, where they print "-" and "*" respectively. When each thread has printed 10 characters, it returns. The main thread waits for them to return, prints a newline and finishes:

$ ./ptjoin
----------**********

(Since the two threads run concurrently, their output may also interleave.)
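The slide on managing pthreads mentions pthread_cancel(), which none of the examples use. Here is a minimal, hedged sketch of cancellation (file name and timings are illustrative):

/* pt_cancel.c (illustrative) -- main cancels a worker thread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *worker(void *arg) {
    for (;;) {
        sleep(1);                     /* sleep() is a cancellation point */
        printf("still working...\n");
    }
    return NULL;                      /* never reached */
}

int main(void) {
    pthread_t th;
    void *res;
    pthread_create(&th, NULL, worker, NULL);
    sleep(2);
    pthread_cancel(th);               /* ask the worker to terminate */
    pthread_join(th, &res);
    if (res == PTHREAD_CANCELED)      /* join reports the cancellation */
        printf("worker was cancelled\n");
    return 0;
}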
pthread_create

int pthread_create(pthread_t *th, pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);

- The first argument points to an allocated memory space for the thread (here: &p1 for pthread_t p1).
- The second argument attr may hold attributes for the created thread; passing NULL selects the default properties.
- The third argument, void *(*start_routine)(void *), is a pointer to the function which will be started by the thread.
- The fourth argument arg is passed as the argument to this function.

Example: Creating threads

/*-------------------------------------------------
  EXAMPLE pthread: return
  This program shows how data can be returned by a
  subprogram which every thread executes in parallel.
--------------------------------------------------*/
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *child_thread(void *param) {
    long id, jd;
    id = (long)param;
    jd = id * id;
    return (void *)jd;
}

int main(int argc, char *argv[]) {
    long i;
    pthread_t *thread;
    long *return_value;
    int num_threads;

Listing 1: pt_return.c

    if (argc < 2) {
        fprintf(stderr, "./return_pthreads <# threads>\n");
        return (-1);
    }

    num_threads = atoi(argv[1]);
    thread = (pthread_t *)malloc(num_threads * sizeof(pthread_t));
    return_value = (long *)malloc(num_threads * sizeof(long));

    for (i = 0; i < num_threads; i++) {
        pthread_create(&thread[i], NULL, &child_thread, (void *)i);
    }
    for (i = 0; i < num_threads; i++) {
        pthread_join(thread[i], (void **)&return_value[i]);
        printf("input = %ld output=%ld\n", i, return_value[i]);
    }
    return (0);
}

Listing 2: pt_return.c (cont.)

Example: Creating threads
Compile with gcc pt_return.c -lpthread -o ptreturn

$ ./ptreturn 10
input = 0 output=0
input = 1 output=1
input = 2 output=4
input = 3 output=9
input = 4 output=16
input = 5 output=25
input = 6 output=36
input = 7 output=49
input = 8 output=64
input = 9 output=81

pthread_exit
Terminate the calling thread:

void pthread_exit(void *retval);

- The pthread_exit() function terminates the calling thread.
- It returns a value via retval.
- If the thread is joinable, the return value is available to another thread in the same process.
- A thread can also be cancelled by another one (see later).

Join with a terminated thread
Joining is one way to accomplish synchronization between threads. It is accomplished with the routine pthread_join.
(Figure: joining terminated threads; not reproduced.)

pthread_join

int pthread_join(pthread_t th, void **returnval);

- The function waits for the thread th to terminate: pthread_join() suspends execution of the calling thread until the target thread completes execution.
- The thread specified by th must be joinable.
- If returnval != NULL, returnval holds the return value of thread th.
- If multiple threads simultaneously try to join with the same thread, the results are undefined.
A sketch below shows pthread_exit() and pthread_join() working together.
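To make the interplay of pthread_exit() and pthread_join() concrete, here is a small sketch (an illustration, not from the slides). The result is allocated on the heap, because the thread's local variables vanish when it terminates:

/* pt_exit.c (illustrative) -- returning a heap-allocated result */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *compute(void *arg) {
    int *result = malloc(sizeof(int));  /* heap storage outlives the thread */
    *result = 42;
    pthread_exit(result);               /* same effect as 'return result;' */
}

int main(void) {
    pthread_t th;
    void *ret;
    pthread_create(&th, NULL, compute, NULL);
    pthread_join(th, &ret);             /* blocks until compute() finishes */
    printf("thread returned %d\n", *(int *)ret);
    free(ret);
    return 0;
}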
Shared memory model
- All threads have access to the same global, shared memory.
- Threads also have their own private data.
- Programmers are responsible for synchronizing (protecting) access to globally shared data.

Sharing data
Threads can collaborate on the same data (shared memory). Suppose that two (or more) concurrently running threads both try to increment an integer variable I:
- thread 1: I = I + 2
- thread 2: I = I + 3
This is a legitimate activity if the variable is an accumulator for values computed by independent processes. Could this be problematic?

Race condition
It is problematic! The result of these updates depends on the order of execution. For example, with I = 0: if both threads read I before either writes it back, thread 1 stores 0 + 2 and thread 2 then overwrites it with 0 + 3, so I ends up as 3 instead of the expected 5. This situation is called a race condition. Can we avoid this behaviour?

Spawning threads
We already mentioned that threads share the same memory space. This can cause some trouble, as the following examples show (the details are not so important). In the following example
1. some threads are created,
2. they update a shared variable,
3. the threads join again.
(Figure: forking, function adder(), joining; not reproduced.)

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int sum = 0;

void *adder(void *arg) {
    sum = sum + 1;
    return NULL;
}

#define NTHREADS 50

int main() {
    int i;
    pthread_t th[NTHREADS];
    printf("forking\n");
    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&th[i], NULL, &adder, NULL) != 0)
            return i + 1;
    printf("joining\n");
    for (i = 0; i < NTHREADS; i++)
        if (pthread_join(th[i], NULL) != 0)
            return NTHREADS + i + 1;
    printf("Sum computed: %d\n", sum);
    return 0;
}

Listing 3: spawnThreads.c

Spawning threads

$ gcc -o spawnThreads -pthread spawnThreads.c
$ ./spawnThreads
forking
joining
Sum computed: 50

This expected result is a coincidence. It only happens because updating the variable is much quicker than creating the threads. If we artificially increase the time for the update, we no longer get the right result.

Spawning threads, modified
We change the adder function:

void *adder(void *arg) {
    int t = sum;
    sleep(1);
    sum = t + 1;
    return NULL;
}

Listing 4: spawnThreadsMod.c (adder)

$ gcc -o spawnThreadsMod -pthread spawnThreadsMod.c
$ ./spawnThreadsMod
forking
joining
Sum computed: 1

Now all threads read out the value of sum, wait a while (presumably calculating something) and then update it, so each of them writes 0 + 1 back.

Mutually exclusive
We can fix this behaviour by putting a lock around the code region that should be mutually exclusive:

pthread_mutex_t lock;

void *adder(void *arg) {
    int t;
    pthread_mutex_lock(&lock);
    t = sum;
    sleep(1);
    sum = t + 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* in main(), before the threads are created: */
pthread_t th[NTHREADS];
pthread_mutex_init(&lock, NULL);

Listing 5: spawnThreadsMutex.c (fragments)

$ gcc -o mutex -pthread spawnThreadsMutex.c
$ ./mutex
forking
joining
Sum computed: 50
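The slides show only the changed lines of spawnThreadsMutex.c. For reference, a complete version assembled from those fragments might look as follows (a sketch; the mutex destruction at the end is our addition):

/* spawnThreadsMutex.c -- complete version assembled from the fragments above */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#define NTHREADS 50

int sum = 0;
pthread_mutex_t lock;

void *adder(void *arg) {
    pthread_mutex_lock(&lock);    /* enter the critical section */
    int t = sum;
    sleep(1);                     /* artificially slow update */
    sum = t + 1;
    pthread_mutex_unlock(&lock);  /* leave the critical section */
    return NULL;
}

int main() {
    int i;
    pthread_t th[NTHREADS];
    pthread_mutex_init(&lock, NULL);
    printf("forking\n");
    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&th[i], NULL, &adder, NULL) != 0)
            return i + 1;
    printf("joining\n");
    for (i = 0; i < NTHREADS; i++)
        if (pthread_join(th[i], NULL) != 0)
            return NTHREADS + i + 1;
    printf("Sum computed: %d\n", sum);
    pthread_mutex_destroy(&lock);
    return 0;
}

Note that the lock serializes the sleep(1) calls, so this version needs roughly NTHREADS seconds to run; correctness is bought at the price of lost parallelism inside the critical section.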
Mutex variables
Mutex is an abbreviation for mutual exclusion.
- Mutexes can be used to prevent race conditions.
- Mutex variables are one of the primary means of implementing thread synchronization and of protecting shared data when multiple writes occur.
- A mutex variable acts like a lock protecting access to a shared data resource.
The basic concept is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex, only one thread will be successful.

Mutex sequence
A typical sequence in the use of a mutex is as follows:
1. Create and initialize a mutex variable.
2. Several threads attempt to lock the mutex.
3. Only one succeeds, and that thread owns the mutex.
4. The owner thread performs some set of actions.
5. The owner unlocks the mutex.
6. Another thread acquires the mutex and repeats the process.
7. Finally the mutex is destroyed.

Locks
When your code accesses some memory, you lock it up:
- mutex: the lock.
- critical section: the code locked with a mutex.
Now if a thread wants to run this code, it needs the key. So only one thread can run the code at a time.

Critical section
(Figure: code in a critical section, "Have some privacy!"; not reproduced.)

Mutex creation
Declaration: mutex variables must be declared with type pthread_mutex_t, and they must be initialized before they can be used.
Initialization: there are two ways to initialize a mutex variable:
- Statically, when it is declared. For example:
    pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;
- Dynamically, with the pthread_mutex_init() routine.

Mutex locking
pthread_mutex_lock()
  This routine is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call blocks the calling thread until the mutex is unlocked.
pthread_mutex_trylock()
  This routine attempts to lock a mutex. However, if the mutex is already locked, the routine returns immediately with a "busy" error code. Useful for preventing deadlock conditions, as in a priority-inversion situation (see the sketch below).
pthread_mutex_unlock()
  This routine unlocks a mutex if called by the owning thread.
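A brief sketch of pthread_mutex_trylock() (our illustration, not from the slides): main() holds the mutex while a second thread tries to acquire it and, instead of blocking, receives EBUSY and can do something else.

/* trylock.c (illustrative) */
#include <pthread.h>
#include <stdio.h>
#include <errno.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;   /* static initialization */

void *try_it(void *arg) {
    if (pthread_mutex_trylock(&m) == EBUSY) {
        printf("mutex busy, doing other work instead of blocking\n");
    } else {
        pthread_mutex_unlock(&m);                /* we got the lock, release it */
    }
    return NULL;
}

int main(void) {
    pthread_t th;
    pthread_mutex_lock(&m);                      /* main owns the mutex... */
    pthread_create(&th, NULL, try_it, NULL);
    pthread_join(th, NULL);                      /* ...while the thread tries it */
    pthread_mutex_unlock(&m);
    return 0;
}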
Example: Scalar product

// This program computes the scalar product of two vectors using pthreads.
#include <pthread.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>

int time_estimate(struct timeval t1, struct timeval t2) {
    float t;
    t = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) / 1000000.0;
    printf("Time [seconds]= %12.6f\n", t);
    return (0);
}

typedef struct {
    int thread_id;
    int num_threads;
    int nsize;
} thread_data;

float *vecA, *vecB, global_sum;
pthread_mutex_t mutex_sum = PTHREAD_MUTEX_INITIALIZER;

Listing 6: scalarProd.c

Example: Scalar product

/* This is the function which every thread computes in parallel */
void *VecSum(void *arg) {
    thread_data *p = (thread_data *)arg;
    int myid = p->thread_id;
    int nthreads = p->num_threads;
    int n = p->nsize;
    int i;
    float local_sum = 0.0;   /* must be initialized */

    for (i = myid * n / nthreads; i < (myid + 1) * n / nthreads; i++)
        local_sum += vecA[i] * vecB[i];

    fprintf(stdout, "local_sum=%.2f\n", local_sum);

    /* update the global sum using the mutex lock */
    pthread_mutex_lock(&mutex_sum);
    global_sum += local_sum;
    pthread_mutex_unlock(&mutex_sum);

    return NULL;
}

Listing 7: scalarProd.c (cont.)

Example: Scalar product | main

int main(int argc, char *argv[]) {
    int ret_count;
    pthread_t *threads;
    pthread_attr_t pta;
    int i, num_threads, vec_length;
    thread_data *arg;
    struct timeval t1, t2;
    struct timezone tzp;

    if (argc < 3) {
        fprintf(stderr, "./scalar_prod_pthreads <vect size> <num threads>\n");
        return (-1);
    }

    vec_length = abs(atoi(argv[1]));
    num_threads = abs(atoi(argv[2]));

    vecA = (float *)malloc(sizeof(float) * vec_length);
    vecB = (float *)malloc(sizeof(float) * vec_length);
    arg = (thread_data *)malloc(num_threads * sizeof(thread_data));

    pthread_attr_init(&pta);

    threads = (pthread_t *)malloc(sizeof(pthread_t) * num_threads);

    for (i = 0; i < vec_length; i++) {
        vecA[i] = (float)i + 1;
        vecB[i] = 1.0;
    }

Listing 8: scalarProd.c (cont.)

Example: Scalar product

    /* Now find out the starting time */
    gettimeofday(&t1, &tzp);

    /* Thread creation */
    for (i = 0; i < num_threads; i++) {
        arg[i].thread_id = i;
        arg[i].num_threads = num_threads;
        arg[i].nsize = vec_length;
        ret_count = pthread_create(&threads[i], NULL, VecSum, (void *)(arg + i));
    }

    /* joining threads */
    for (i = 0; i < num_threads; i++)
        ret_count = pthread_join(threads[i], NULL);

    /* Find out the final time */
    gettimeofday(&t2, &tzp);

    printf("A = ");
    for (i = 0; i < vec_length; i++)
        fprintf(stdout, " %.2f ", vecA[i]);
    printf("\n");

Listing 9: scalarProd.c (cont.)

Example: Scalar product

    printf("B = ");
    for (i = 0; i < vec_length; i++)
        printf(" %.2f ", vecB[i]);
    printf("\n");

    printf("A.B = %.6f\n", global_sum);

    ret_count = pthread_attr_destroy(&pta);
    time_estimate(t1, t2);
    return (0);
}

Listing 10: scalarProd.c (cont.)

$ gcc -o scalar -pthread scalarProd.c
$ ./scalar 10 3
local_sum=6.00
local_sum=15.00
local_sum=34.00
A = 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00
B = 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
A.B = 55.000000
Time [seconds]= 0.000171
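A common design alternative to the mutex in VecSum(), shown here as a hedged sketch with fixed sizes (the names and constants are our own): each thread writes its partial sum into a private array slot, and the main thread adds the slots up after joining. No two threads ever write to the same location, so no lock is needed.

/* scalar_nolock.c (illustrative) -- scalar product without a mutex */
#include <pthread.h>
#include <stdio.h>

#define N  10        /* vector length */
#define NT 3         /* number of threads */

float vecA[N], vecB[N];
float partial[NT];   /* one slot per thread */

void *vec_sum(void *arg) {
    long id = (long)arg;
    float s = 0.0f;
    for (int i = id * N / NT; i < (id + 1) * N / NT; i++)
        s += vecA[i] * vecB[i];
    partial[id] = s;             /* exclusive slot: no race, no lock */
    return NULL;
}

int main(void) {
    pthread_t th[NT];
    long i;
    float sum = 0.0f;
    for (i = 0; i < N; i++) { vecA[i] = (float)i + 1; vecB[i] = 1.0f; }
    for (i = 0; i < NT; i++)
        pthread_create(&th[i], NULL, vec_sum, (void *)i);
    for (i = 0; i < NT; i++) {
        pthread_join(th[i], NULL);
        sum += partial[i];       /* safe: thread i has already finished */
    }
    printf("A.B = %.6f\n", sum); /* 55.000000, as above */
    return 0;
}

The join acts as the synchronization point: once pthread_join(th[i], NULL) returns, thread i's write to partial[i] is guaranteed to be visible to the main thread.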
Resume
Why are threads important for parallel computing?
- Threads are independent streams of instructions that can easily create concurrency.
- They are light-weight, i.e. easy to create with little overhead.
- They offer a natural programming model for shared address spaces.
- OpenMP is based on cooperating threads running simultaneously on multiple processors or cores.
OpenMP will be covered in the next lectures.

Further readings
- POSIX Threads Programming. Blaise Barney, Lawrence Livermore National Laboratory. https://computing.llnl.gov/tutorials/pthreads
- Advanced Linux Programming. CodeSourcery. Chapter 4: Threads. http://www.advancedlinuxprogramming.com
- Parallel Programming. T. Rauber & G. Rünger. Chapter 6: Thread programming. Springer, 2007.