Harness the Power of GPUs: An Introduction to GPGPU Programming
Lecture 10: OpenCL
Guido Juckeland, Visiting Scholar
Technische Universität Dresden, Germany
June 20, 2014

Members and implementations

Watch out!
• By now, four versions have been released, each with different features
• Downwards compatible, but…
Source: http://www.khronos.org/assets/uploads/developers/library/overview/opencl-overview.pdf

OpenCL ecosystem
• AMD GPU
• NVIDIA GPU
• Exotic stuff
• AMD CPU
• Intel CPU
• Intel MIC

Conceptual OpenCL Device Architecture
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana-Champaign

Mapping OpenCL Memory Types to CUDA
OpenCL memory type → CUDA equivalent:
• global memory → global memory
• constant memory → constant memory
• local memory → shared memory
• private memory → local memory
Note the name clash: OpenCL local memory is shared by a work group (CUDA shared memory), whereas CUDA local memory is per-thread, like OpenCL private memory.

OpenCL to CUDA Data Parallelism Model Mapping
OpenCL parallelism concept → CUDA equivalent:
• kernel → kernel
• host program → host program
• NDRange (index space) → grid
• work item → thread
• work group → block

Overview of OpenCL Execution Model

Mapping of OpenCL Dimensions and Indices to CUDA
• get_global_id(0): global index of the work item in the x dimension. CUDA: blockIdx.x × blockDim.x + threadIdx.x
• get_local_id(0): local index of the work item within its work group in the x dimension. CUDA: threadIdx.x
• get_global_size(0): size of the NDRange in the x dimension. CUDA: gridDim.x × blockDim.x
• get_local_size(0): size of each work group in the x dimension. CUDA: blockDim.x

A Simple OpenCL Kernel Example
    // One work item per vector element; there is no bounds check,
    // so the global size must not exceed the vector length.
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *result)
    {
        int id = get_global_id(0);
        result[id] = a[id] + b[id];
    }

OpenCL as a whole
• Compile code
• Create data & arguments
• Send to execution

OpenCL Context for Device Management

Creating OpenCL Context and Device Queue
    …
    cl_int clerr = CL_SUCCESS;
    // Context with all available devices; with no properties the platform is implementation-defined
    cl_context clctx = clCreateContextFromType(0, CL_DEVICE_TYPE_ALL, NULL, NULL, &clerr);
    // First query the size of the device list, then fetch the list itself
    size_t parmsz;
    clerr = clGetContextInfo(clctx, CL_CONTEXT_DEVICES, 0, NULL, &parmsz);
    cl_device_id *cldevs = (cl_device_id *) malloc(parmsz);
    clerr = clGetContextInfo(clctx, CL_CONTEXT_DEVICES, parmsz, cldevs, NULL);
    // Command queue for the first device in the context
    cl_command_queue clcmdq = clCreateCommandQueue(clctx, cldevs[0], 0, &clerr);

Memory management (~cudaMalloc)

Data transfers (~cudaMemcpy)

Different from CUDA: Compiling kernels

And go…

Managing kernel execution

Kernels in one queue

Kernels in multiple queues

Building and running OpenCL programs
• In your host code: #include <CL/opencl.h>
• When compiling/linking: gcc foo.c -lOpenCL
• Execution: ./a.out

QUESTIONS?