Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What types of operations can be accelerated using libraries?

  • How can libraries be used to accelerate calculations?

  • How can CUDA Python be used to write my own kernels?

  • Worked examples moving from division between vectors to sum reduction

Objectives
  • Learn to use CUDA libraries

  • Learn to accelerate Python code using CUDA

Show examples for each of the CUDA use scenarios mentioned:

After visiting a great number of web pages this week, this NVIDIA page is the main source I have settled on.

There are two examples here using Anaconda NumbaPro.

There is lots of documentation to read on the Continuum Analytics website, linked from the page above.

Libraries

Anaconda Accelerate provides access to numerical libraries optimised for performance on Intel CPUs and NVIDIA GPUs. Using Accelerate, you can access GPU-accelerated routines such as FFTs, BLAS linear algebra, and random number generation without writing any CUDA code yourself.
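To illustrate the drop-in pattern, the sketch below computes an FFT round trip with plain NumPy on the CPU; Accelerate offered equivalently-shaped GPU routines, but the exact GPU module path shown in the comment is an assumption from the old Accelerate documentation, not something verified here.

```python
import numpy as np

# CPU version: plain NumPy FFT of a real-valued signal.
x = np.random.rand(1024)
X = np.fft.fft(x)

# GPU version would require Accelerate and a CUDA device; the module path
# below is an assumption and may differ between Accelerate releases:
# from accelerate.cuda.fft import fft

# Round-trip check: the inverse FFT recovers the original signal.
x_back = np.fft.ifft(X).real
assert np.allclose(x, x_back)
print("FFT round-trip OK")
```

The point of a library call like this is that the interface stays the same whether the work runs on the CPU or the GPU; only the import changes.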

Compiler directives

I read about @vectorize for automatically accelerating functions, but everything pointed to NumbaPro, which has been deprecated. This blog post explains what has gone where: NumbaPro was paid-for software, now split into Numba (open source) and Accelerate (free for academic use).

Some Numba examples

Numba user manual

CUDA Python

CUDA functionality can be accessed directly from Python code. Information on this page is a bit sparse.

Thankfully the Numba documentation looks fairly comprehensive and includes some examples.

PyCUDA

PyCUDA looks to be just a wrapper that enables calling kernels written in CUDA C from Python. This would seem to be out of the scope of this course?

FIXME: Find some examples for some of the above (more on the GPU, obviously). Some material here, the most useful being examples on GitHub:

Continuum Analytics NumbaPro repo

NVIDIA NumbaPro repo

To do list for understanding:

To do list for lesson structure:

Key Points