CUDA¶
If you have an Nvidia GPU, you can accelerate some CPU-intensive tasks of the EnsensoSDK with CUDA.
You can globally enable CUDA support with the Enabled node. Once it is enabled, all commands that support CUDA will be executed on the GPU. The following commands can be accelerated with CUDA:
Note
Executing the ComputeDisparityMap command automatically rectifies the camera images. If you need a disparity map, you should therefore not execute the RectifyImages command explicitly, but call ComputeDisparityMap directly. This yields better performance, because it allows the CUDA implementation to interleave the rectification with the stereo matching and to avoid unnecessary copy operations to the GPU.
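For illustration, a minimal sketch using the NxLib C++ API (the serial number is a placeholder, and the camera is assumed to already be open with a captured raw image pair):

```cpp
#include <string>
#include "nxLib.h"

int main() {
    nxLibInitialize();

    NxLibItem root;
    // Enable CUDA globally; all CUDA-capable commands now run on the GPU.
    root["CUDA"]["Enabled"] = true;

    // Placeholder serial number; the camera is assumed to have been opened
    // and to have captured a raw image pair (Open and Capture commands).
    std::string serial = "12345";

    // Compute the disparity map directly. The raw images are rectified
    // implicitly, so no explicit RectifyImages call is needed.
    NxLibCommand computeDisparityMap("ComputeDisparityMap");
    computeDisparityMap.parameters()["Cameras"] = serial;
    computeDisparityMap.execute();

    nxLibFinalize();
    return 0;
}
```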
Note
GPU support in the PartFinder command is in an alpha stage. Currently, only the LineMod hypothesis search can be performed on the GPU. However, the GPU implementation of this search differs from the CPU implementation, so the results of the PartFinder can differ significantly between CPU and GPU.
To switch the search from the CPU to the GPU, CUDA has to be enabled explicitly, see the CUDA/Enabled parameter.
The hypothesis search on the GPU is not yet supported on Nvidia Jetson systems.
Multiple GPUs¶
You can see a list of all available GPUs in the Devices node. By default, all commands will use the first GPU in that list. The Nvidia driver will also order the devices such that the most powerful one becomes the first entry. In most cases you therefore won’t need to change anything.
If you want to use a different GPU for computations you can globally select it with the Device node. Additionally, you can override the device to be used in the parameters of all commands that support CUDA (see the parameter descriptions of the commands for more information). This way, you can also distribute the computations for different cameras to multiple CUDA devices.
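A sketch of this, assuming the Devices and Device nodes live below the CUDA node and that Device takes the index of an entry in the Devices list:

```cpp
#include <iostream>
#include "nxLib.h"

int main() {
    nxLibInitialize();

    NxLibItem root;
    NxLibItem cuda = root["CUDA"];

    // Print all CUDA devices the NxLib has detected. The structure of the
    // entries is not assumed here, so each one is simply dumped as JSON.
    NxLibItem devices = cuda["Devices"];
    for (int i = 0; i < devices.count(); i++) {
        std::cout << "GPU " << i << ": " << devices[i].asJson(true) << std::endl;
    }

    // Globally select the second GPU (index 1) for all CUDA-accelerated
    // commands, assuming Device is an index into the Devices list.
    if (devices.count() > 1) {
        cuda["Device"] = 1;
    }

    nxLibFinalize();
    return 0;
}
```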
Hints and Limitations¶
You need to have a GPU with at least Compute Capability 3.0. To check whether your GPU satisfies this requirement, you can use this overview on the Nvidia website.
By default, the NxLib keeps memory allocated on the GPU in between commands that use CUDA. This increases performance. You can limit or disable this behavior with the StaticBuffers setting.
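Since the exact format of the StaticBuffers node is not described here, the following sketch only reads its current value instead of assuming how to change it:

```cpp
#include <iostream>
#include "nxLib.h"

int main() {
    nxLibInitialize();

    NxLibItem root;
    // Inspect the current StaticBuffers setting as JSON; its exact format
    // (e.g. a flag or a size limit) is not assumed here.
    std::cout << root["CUDA"]["StaticBuffers"].asJson(true) << std::endl;

    nxLibFinalize();
    return 0;
}
```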
The CUDA implementation of Semi-Global Matching needs the number of disparities to be a multiple of 32. For compatibility with the CPU implementation, you can still set it to a multiple of 16, but the computation will automatically use the next larger multiple of 32.
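As an illustration of this rounding rule (the function below is hypothetical; the NxLib applies the rounding internally):

```cpp
// Next multiple of 32 that is >= the configured value, e.g. 48 -> 64, 96 -> 96.
int effectiveGpuDisparities(int configuredDisparities) {
    return ((configuredDisparities + 31) / 32) * 32;
}
```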