Overview
Teaching: 0 min
Exercises: 0 min

Questions
What types of operations can be accelerated using libraries?
How can libraries be used to accelerate calculations?
How can CUDA Python be used to write my own kernels?
Worked examples moving from division between vectors to sum reduction
Objectives
Learn to use CUDA libraries
Learn to accelerate Python code using CUDA
Show examples for each of the CUDA use scenarios mentioned
After visiting a great number of web pages this week, this NVIDIA page is the main source I have settled on.
There are two examples here using Anaconda NumbaPro.
There is a lot of documentation to read on the Continuum Analytics website, linked from the above site.
Anaconda Accelerate provides access to numerical libraries optimised for performance on Intel CPUs and NVIDIA GPUs. Using Accelerate, you can access
I read about @vectorize for automatically accelerating functions, but everything pointed to NumbaPro, which has been deprecated. This blog post indicates what has gone where (NumbaPro was paid-for software, now split into Numba (open source) and Accelerate (free for academic use)).
CUDA functionality can be accessed directly from Python code. Information on this page is a bit sparse.
Thankfully the Numba documentation looks fairly comprehensive and includes some examples.
This looks to be just a wrapper enabling kernels written in CUDA C to be called, which would seem to be out of the scope of this course.
FIXME: Find some examples for some of the above (more on GPU obviously). Some material here, the most useful being examples on github:
Continuum Analytics NumbaPro repo
I have tried the Mandelbrot example on Zrek, and only the first part works.
I have emailed NVIDIA and the GitHub repo owner asking for help updating
this code, which uses the deprecated NumbaPro.
No response received initially; help required!
Update: I have since received a response suggesting this has now been fixed.
Key Points