Harness the power of GPUs - Lecture 10

Harness the Power of GPUs:
An Introduction to GPGPU Programming
Lecture 10: OpenCL
Guido Juckeland
Visiting Scholar
Technische Universität Dresden, Germany
June 20, 2014
Members and implementations
Watch out!
• Four versions have been released in the meantime, each with a different feature set
• Backward compatible, but…
http://www.khronos.org/assets/uploads/developers/library/overview/opencl-overview.pdf
OpenCL eco system
AMD GPU
NVIDIA GPU
Exotic stuff
AMD CPU
Intel CPU
Intel MIC
Conceptual OpenCL Device Architecture
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010
ECE408, University of Illinois, Urbana-Champaign
Mapping OpenCL Memory Types to CUDA
OpenCL Memory Type    CUDA Equivalent
global memory         global memory
constant memory       constant memory
local memory          shared memory
private memory        local memory
OpenCL to CUDA Data Parallelism Model Mapping
OpenCL Parallelism Concept    CUDA Equivalent
kernel                        kernel
host program                  host program
NDRange (index space)         grid
work item                     thread
work group                    block
Overview of OpenCL Execution Model
Mapping of OpenCL Dimensions and Indices to CUDA
OpenCL API Call       Explanation                                                              CUDA Equivalent
get_global_id(0)      global index of the work item in the x dimension                         blockIdx.x × blockDim.x + threadIdx.x
get_local_id(0)       local index of the work item within its work group in the x dimension    threadIdx.x
get_global_size(0)    size of the NDRange in the x dimension                                   gridDim.x × blockDim.x
get_local_size(0)     size of each work group in the x dimension                               blockDim.x
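The relationship between these indices can be checked on the host with ordinary C. The sketch below assumes a one-dimensional NDRange with no global offset and a global size that is an exact multiple of the work-group size. The type and function names (`work_item_ids`, `ids_for_global`, `global_from_ids`, `num_groups`) are hypothetical helpers for illustration and are not part of the OpenCL API; they only mirror what `get_group_id(0)`, `get_local_id(0)` and `get_global_id(0)` would report inside a kernel.

```c
#include <stddef.h>

/* Hypothetical helper type: the ids one work item would see in dimension 0. */
typedef struct {
    size_t group_id;   /* what get_group_id(0) would return */
    size_t local_id;   /* what get_local_id(0) would return */
} work_item_ids;

/* Split a global index into (group, local), assuming no global offset. */
work_item_ids ids_for_global(size_t global_id, size_t local_size)
{
    work_item_ids ids;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Recombine: group_id * local_size + local_id, the OpenCL counterpart of
   blockIdx.x * blockDim.x + threadIdx.x in CUDA. */
size_t global_from_ids(work_item_ids ids, size_t local_size)
{
    return ids.group_id * local_size + ids.local_id;
}

/* Number of work groups in the NDRange (gridDim.x in CUDA terms), assuming
   the global size is an exact multiple of the work-group size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}
```

For a work-group size of 64, global index 133 belongs to work group 2 with local index 5, and an NDRange of 1024 work items consists of 16 work groups.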
A Simple OpenCL Kernel Example
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *result) {
    /* One work item per array element: the global id is the element index. */
    int id = get_global_id(0);
    result[id] = a[id] + b[id];
}
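The kernel above relies on exactly one work item being launched per array element. When the element count is not a multiple of the work-group size, a common pattern is to round the global size up and guard the kernel body. The following is a minimal sketch of that pattern, not the lecture's code: the extra kernel argument `n`, the kernel name `vadd_guarded`, and the helper `round_up_global_size` are assumptions made for illustration. The kernel is kept as a C string, the form in which it would be handed to `clCreateProgramWithSource`.

```c
#include <stddef.h>

/* Kernel source as a host-side string. The bounds check makes surplus
   work items (those with id >= n) do nothing. */
const char *vadd_guarded_src =
    "__kernel void vadd_guarded(__global const float *a,\n"
    "                           __global const float *b,\n"
    "                           __global float *result,\n"
    "                           const unsigned int n) {\n"
    "    size_t id = get_global_id(0);\n"
    "    if (id < n)\n"
    "        result[id] = a[id] + b[id];\n"
    "}\n";

/* Smallest multiple of local_size that is >= n (local_size must be > 0). */
size_t round_up_global_size(size_t n, size_t local_size)
{
    return ((n + local_size - 1) / local_size) * local_size;
}
```

With 1000 elements and a work-group size of 64, the global size becomes 1024, and the last 24 work items fall through the guard without touching memory.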
OpenCL as a whole
Compile code
Create data & arguments
Send to execution
OpenCL Context for Device Management
Creating OpenCL Context and Device Queue
…
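cl_int clerr = CL_SUCCESS;
/* Create a context over all device types. Passing 0 for the properties
   leaves the choice of platform to the implementation. */
cl_context clctx = clCreateContextFromType(0, CL_DEVICE_TYPE_ALL, NULL, NULL, &clerr);
/* Query the size of the device list first, then fetch the list itself. */
size_t parmsz;
clerr = clGetContextInfo(clctx, CL_CONTEXT_DEVICES, 0, NULL, &parmsz);
cl_device_id *cldevs = (cl_device_id *) malloc(parmsz);
clerr = clGetContextInfo(clctx, CL_CONTEXT_DEVICES, parmsz, cldevs, NULL);
/* Create a command queue on the first device of the context. */
cl_command_queue clcmdq = clCreateCommandQueue(clctx, cldevs[0], 0, &clerr);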
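The listing stores every status code in `clerr` but never inspects it. A small helper that turns a status into a readable name makes failures such as a missing platform visible early. This is a sketch and not part of the lecture: `cl_error_name` is a hypothetical helper, it covers only a handful of codes, and it takes a plain `int` with the numeric values written out (they match the definitions in `CL/cl.h`) so that it compiles without the OpenCL headers.

```c
/* Hypothetical helper: map a few common OpenCL status codes to their names.
   The numeric values are those defined in CL/cl.h. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -32: return "CL_INVALID_PLATFORM";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    default:  return "unknown OpenCL error";
    }
}
```

After each API call one could then test `clerr != CL_SUCCESS` and print `cl_error_name(clerr)` before continuing.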
Memory management (~cudaMalloc)
Data transfers (~cudaMemcpy)
Different from CUDA: Compiling kernels
And go…
Managing kernel execution
Kernels in one queue
Kernels in multiple queues
Building and running OpenCL programs
• In your host code:
  #include <CL/opencl.h>
• When compiling/linking:
  gcc foo.c -lOpenCL
• Execution:
  ./a.out
QUESTIONS?