OpenCL Runtime: Programs and Kernels
Program
-
class pyopencl.Program(context, src)
-
class pyopencl.Program(context, devices, binaries)
binaries must contain one binary for each entry in devices. If src is a bytes object starting with a valid SPIR-V magic number, it will be handed off to the OpenCL implementation as such, rather than as OpenCL C source code. (SPIR-V support requires OpenCL 2.1.)
Changed in version 2016.2: Add support for SPIR-V.
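The check described above can be pictured with a small standalone sketch. It assumes the magic number 0x07230203 from the SPIR-V specification; the helper name looks_like_spirv is hypothetical and not part of pyopencl, whose own detection logic may differ (for example in how it treats byte order).

```python
import struct

# The SPIR-V magic number, as defined by the SPIR-V specification.
SPIRV_MAGIC = 0x07230203


def looks_like_spirv(src):
    """Return True if src is a bytes object beginning with the SPIR-V magic.

    Hypothetical helper for illustration only; not part of pyopencl.
    """
    if not isinstance(src, bytes) or len(src) < 4:
        return False
    # Read the first 32-bit word in both byte orders and accept either.
    (little,) = struct.unpack("<I", src[:4])
    (big,) = struct.unpack(">I", src[:4])
    return SPIRV_MAGIC in (little, big)


spirv_like = struct.pack("<I", SPIRV_MAGIC) + b"\x00" * 16
opencl_c = b"__kernel void f(__global float *a) { }"
print(looks_like_spirv(spirv_like))  # True
print(looks_like_spirv(opencl_c))    # False
```

A str object is never treated as SPIR-V in this sketch, which mirrors the requirement above that src be a bytes object.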
-
info
Lower case versions of the program_info constants may be used as attributes on instances of this class to directly query info attributes.
-
get_info(param)
See program_info for values of param.
-
get_build_info(device, param)
See program_build_info for values of param.
-
build(options=[], devices=None, cache_dir=None)
options is a string of compiler flags. Returns self.
If cache_dir is not None, built binaries are cached in an on-disk cache at the given path. If cache_dir is None but the context of this program was created with a cache_dir that is not None, that directory is used as the cache directory. If cache_dir is None and the context was also created with a cache_dir of None, built binaries are cached in an on-disk cache called pyopencl-compiler-cache-vN-uidNAME-pyVERSION in the directory returned by tempfile.gettempdir(). Setting the environment variable PYOPENCL_NO_CACHE to any non-empty value suppresses this caching. Any options found in the environment variable PYOPENCL_BUILD_OPTIONS will be appended to options.
Changed in version 2011.1: options may now also be a list of str.
Changed in version 2013.1: Added PYOPENCL_NO_CACHE. Added PYOPENCL_BUILD_OPTIONS.
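As a rough illustration of the two environment variables, the sketch below sets them from Python and models how the extra options might be appended. The helper effective_build_options is hypothetical: pyopencl does this internally, and its exact splitting rules are an assumption here. The flags shown are standard OpenCL C compiler options.

```python
import os


def effective_build_options(options):
    """Sketch of how build options and PYOPENCL_BUILD_OPTIONS might combine.

    Hypothetical helper; pyopencl performs this step internally and its
    exact splitting rules may differ.
    """
    if isinstance(options, str):
        options = options.split()
    extra = os.environ.get("PYOPENCL_BUILD_OPTIONS", "").split()
    return list(options) + extra


# Any non-empty value suppresses the on-disk binary cache.
os.environ["PYOPENCL_NO_CACHE"] = "1"
# Flags given here are appended to the options passed to build().
os.environ["PYOPENCL_BUILD_OPTIONS"] = "-cl-mad-enable"

print(effective_build_options("-cl-fast-relaxed-math"))
print(effective_build_options(["-I", "include"]))
```

Both variables would need to be set before build() is called for them to have any effect on that build.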
-
compile(self, options=[], devices=None, headers=[])
Parameters: headers – a list of tuples (name, program).
Only available with CL 1.2.
New in version 2011.2.
-
kernel_name
You may use program.kernel_name to obtain a Kernel object from a program. Note that every lookup of this type produces a new kernel object, so the following will not work:
    prg.sum.set_args(a_g, b_g, res_g)
    ev = cl.enqueue_nd_range_kernel(queue, prg.sum, a_np.shape, None)
Instead, either use the (recommended, stateless) calling interface:
    prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)
or keep the kernel in a temporary variable:
    sum_knl = prg.sum
    sum_knl.set_args(a_g, b_g, res_g)
    ev = cl.enqueue_nd_range_kernel(queue, sum_knl, a_np.shape, None)
Note that the Program has to be built (see build()) in order for this to work simply by attribute lookup.
Note
The program_info attributes live in the same name space and take precedence over Kernel names.
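The pitfall can be reproduced without an OpenCL device. The following is a pure-Python analogy, not PyOpenCL code: FakeProgram and FakeKernel are hypothetical stand-ins that only mimic the "new object on every attribute lookup" behavior described above.

```python
class FakeKernel:
    """Stand-in for a kernel object; it only remembers its arguments."""

    def __init__(self, name):
        self.name = name
        self.args = None

    def set_args(self, *args):
        self.args = args


class FakeProgram:
    """Stand-in for a built program: each attribute lookup makes a new kernel."""

    def __getattr__(self, name):
        return FakeKernel(name)


prg = FakeProgram()

# Arguments set on one lookup are invisible to the next lookup,
# because the second prg.sum is a different object.
prg.sum.set_args(1, 2, 3)
print(prg.sum.args)  # None

# Holding on to a single kernel object keeps the arguments.
sum_knl = prg.sum
sum_knl.set_args(1, 2, 3)
print(sum_knl.args)  # (1, 2, 3)
```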
-
static from_int_ptr(int_ptr_value)
Constructs a pyopencl handle from a C-level pointer (given as the integer int_ptr_value). If retain is True (the default), pyopencl will call clRetainXXX on the provided object. If the previous owner of the object will not release the reference, retain should be set to False, to effectively transfer ownership to pyopencl.
Changed in version 2016.1: retain added
-
int_ptr
Instances of this class are hashable, and two instances of this class may be compared using “==” and “!=”. (Hashability was added in version 2011.2.) Two objects are considered the same if the underlying OpenCL object is the same, as established by C pointer equality.
-
-
pyopencl.create_program_with_built_in_kernels(context, devices, kernel_names)
Only available with CL 1.2.
New in version 2011.2.
-
pyopencl.link_program(context, programs, options=[], devices=None)
Only available with CL 1.2.
New in version 2011.2.
-
pyopencl.unload_platform_compiler(platform)
Only available with CL 1.2.
New in version 2011.2.
Kernel
-
class pyopencl.Kernel(program, name)
-
info
Lower case versions of the kernel_info constants may be used as attributes on instances of this class to directly query info attributes.
-
get_info(param)
See kernel_info for values of param.
-
get_work_group_info(param, device)
See kernel_work_group_info for values of param.
-
get_arg_info(arg_index, param)
See kernel_arg_info for values of param.
Only available in OpenCL 1.2 and newer.
-
set_arg(self, index, arg)
arg may be one of the following:
None: This may be passed for __global memory references to pass a NULL pointer to the kernel.
Anything that satisfies the Python buffer interface, in particular numpy.ndarray, bytes, or numpy’s sized scalars, such as numpy.int32 or numpy.float64.
Note
Python’s own int or float objects will not work out of the box. See Kernel.set_scalar_arg_dtypes() for a way to make them work. Alternatively, the standard library module struct can be used to convert Python’s native number types to binary data in a bytes object.
An instance of MemoryObject (e.g. Buffer, Image, etc.).
An instance of LocalMemory.
An instance of Sampler.
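The struct route mentioned in the note above can be sketched as follows. The values and variable names are illustrative; the format codes assume the kernel parameters are an OpenCL C int and float, both of which are 32 bits wide.

```python
import struct

# "=" selects native byte order with standard sizes and no padding,
# so "i" and "f" are each exactly 4 bytes.
n_bytes = struct.pack("=i", 1000)     # matches an OpenCL C `int`
scale_bytes = struct.pack("=f", 0.5)  # matches an OpenCL C `float`

print(len(n_bytes), len(scale_bytes))       # 4 4
print(struct.unpack("=i", n_bytes)[0])      # 1000
print(struct.unpack("=f", scale_bytes)[0])  # 0.5
```

The resulting bytes objects satisfy the buffer interface, so they could be handed to set_arg() in place of a plain Python number.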
-
set_scalar_arg_dtypes(arg_dtypes)
Inform the wrapper about the sized types of scalar Kernel arguments. For each argument, arg_dtypes contains an entry. For non-scalars, this must be None. For scalars, it must be an object acceptable to the numpy.dtype constructor, indicating that the corresponding scalar argument is of that type.
After invoking this function with the proper information, most suitable number types will automatically be cast to the right type for kernel invocation.
Note
The information set by this routine is attached to a single kernel instance. A new kernel instance is created every time you use program.kernel attribute access. The following will therefore not work:
    prg = cl.Program(...).build()
    prg.kernel.set_scalar_arg_dtypes(...)
    prg.kernel(queue, n_globals, None, args)
-
__call__(queue, global_size, local_size, *args, global_offset=None, wait_for=None, g_times_l=False)
Use enqueue_nd_range_kernel() to enqueue a kernel execution, after using set_args() to set each argument in turn. See the documentation for set_arg() to see what argument types are allowed. Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
None may be passed for local_size.
If g_times_l is specified, the global size will be multiplied by the local size (which makes the behavior more like Nvidia CUDA). In this case, global_size and local_size also do not have to have the same number of dimensions.
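The size arithmetic behind g_times_l can be sketched in plain Python. This is an illustration only: the helper scaled_global_size is hypothetical, and treating a missing dimension in the shorter tuple as 1 is an assumption of this sketch rather than something stated above.

```python
def scaled_global_size(global_size, local_size):
    """Multiply global_size by local_size element-wise.

    Assumption for this sketch: a missing dimension in the shorter
    tuple is treated as 1.
    """
    ndim = max(len(global_size), len(local_size))
    g_pad = tuple(global_size) + (1,) * (ndim - len(global_size))
    l_pad = tuple(local_size) + (1,) * (ndim - len(local_size))
    return tuple(g * l for g, l in zip(g_pad, l_pad))


# With g_times_l, global_size counts work-groups rather than work-items.
print(scaled_global_size((4, 2), (64, 8)))  # (256, 16)
print(scaled_global_size((4,), (64, 8)))    # (256, 8)
```

Under this reading, global_size plays the role of a CUDA grid size and local_size that of a CUDA block size.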
Note
__call__() is not thread-safe. It sets the arguments using set_args() and then runs enqueue_nd_range_kernel(). Another thread could race it in doing the same things, with undefined outcome. This issue is inherited from the C-level OpenCL API. The recommended solution is to make a kernel (i.e. access prg.kernel_name, which corresponds to making a new kernel) for every thread that may enqueue calls to the kernel.
A solution involving implicit locks was discussed and decided against on the mailing list in October 2012.
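One way to arrange "one kernel per thread" is thread-local storage from the standard library. The sketch below is an assumption-laden illustration: kernel_for_this_thread is a hypothetical helper, and the demonstration uses plain object instances as stand-ins so that it runs without an OpenCL device.

```python
import threading

_local = threading.local()


def kernel_for_this_thread(make_kernel):
    """Return this thread's kernel, creating it on first use.

    make_kernel is a zero-argument callable; with PyOpenCL it could be
    something like `lambda: prg.kernel_name` (prg being a built Program).
    """
    if not hasattr(_local, "kernel"):
        _local.kernel = make_kernel()
    return _local.kernel


# Demonstration with a stand-in factory instead of a real kernel.
seen = []
lock = threading.Lock()


def worker():
    knl = kernel_for_this_thread(object)
    # A second call in the same thread returns the same object.
    assert knl is kernel_for_this_thread(object)
    with lock:
        seen.append(knl)


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len({id(k) for k in seen}))  # 3
```

Each thread ends up with its own object, so argument state set in one thread cannot be overwritten by another.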
Changed in version 0.92: local_size was promoted to third positional argument from being a keyword argument. The old keyword argument usage will continue to be accepted with a warning throughout the 0.92 release cycle. This is a backward-compatible change (just barely!) because local_size as third positional argument can only be a tuple or None. tuple instances are never valid Kernel arguments, and None is valid as an argument, but its treatment in the wrapper had a bug (now fixed) that prevented it from working.
Changed in version 2011.1: Added the g_times_l keyword arg.
-
capture_call(filename, queue, global_size, local_size, *args, global_offset=None, wait_for=None, g_times_l=False)
This method supports the exact same interface as __call__(), but instead of invoking the kernel, it writes a self-contained PyOpenCL program to filename that reproduces this invocation. Data and kernel source code will be packaged up in filename’s source code.
This is mainly intended as a debugging aid. For example, it can be used to automate the task of creating a small, self-contained test case for an observed problem. It can also help separate a misbehaving kernel from a potentially large or time-consuming outer code.
To use, simply change:
evt = my_kernel(queue, gsize, lsize, arg1, arg2, ...)
to:
evt = my_kernel.capture_call("bug.py", queue, gsize, lsize, arg1, arg2, ...)
New in version 2013.1.
-
classmethod from_int_ptr(int_ptr_value, retain=True)
Constructs a pyopencl handle from a C-level pointer (given as the integer int_ptr_value). If retain is True (the default), pyopencl will call clRetainXXX on the provided object. If the previous owner of the object will not release the reference, retain should be set to False, to effectively transfer ownership to pyopencl.
Changed in version 2016.1: retain added
-
int_ptr
Instances of this class are hashable, and two instances of this class may be compared using “==” and “!=”. (Hashability was added in version 2011.2.) Two objects are considered the same if the underlying OpenCL object is the same, as established by C pointer equality.
-
-
class pyopencl.LocalMemory(size)
A helper class to pass __local memory arguments to kernels.
New in version 0.91.2.
-
size
The size, in bytes, of the local buffer to be provided.
-
-
pyopencl.enqueue_nd_range_kernel(queue, kernel, global_work_size, local_work_size, global_work_offset=None, wait_for=None, g_times_l=False)
Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
If g_times_l is specified, the global size will be multiplied by the local size (which makes the behavior more like Nvidia CUDA). In this case, global_work_size and local_work_size also do not have to have the same number of dimensions.
Changed in version 2011.1: Added the g_times_l keyword arg.
-
pyopencl.enqueue_task(queue, kernel, wait_for=None)
Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.