toil.lib.accelerators

Accelerator (i.e. GPU) utilities for Toil

Functions

have_working_nvidia_smi()

Return True if the nvidia-smi binary, from NVIDIA's CUDA userspace utilities, is installed and can be run successfully.

get_host_accelerator_numbers()

Work out what accelerator is what.

have_working_nvidia_docker_runtime()

Return True if Docker exists and can handle an "nvidia" runtime and the "--gpus" option.

count_nvidia_gpus()

Return the number of NVIDIA GPUs seen by nvidia-smi, or 0 if it is not working.

count_amd_gpus()

Return the number of AMD GPUs seen by rocm-smi, or 0 if it is not working.

get_individual_local_accelerators()

Determine all the local accelerators available. Report each with count 1, in the order of the number that can be used to assign them.

get_restrictive_environment_for_local_accelerators(...)

Get environment variables which can be applied to a process to restrict it to using only the given accelerator numbers.

Module Contents

toil.lib.accelerators.have_working_nvidia_smi()

Return True if the nvidia-smi binary, from NVIDIA’s CUDA userspace utilities, is installed and can be run successfully.

TODO: This isn’t quite the same as the check that cwltool uses to decide if it can fulfill a CUDARequirement.

Return type:

bool
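A check of this kind usually comes down to running the binary and treating any failure as "not available". The following is a minimal, generic sketch of that idea, not Toil's implementation; `can_run` is a hypothetical helper, and the choice of a plain `nvidia-smi` invocation with a 30-second timeout is an assumption.

```python
import subprocess
import sys
from typing import Sequence


def can_run(command: Sequence[str], timeout: float = 30.0) -> bool:
    """Return True if the command starts and exits with status 0.

    Hypothetical helper for illustration; not part of toil.lib.accelerators.
    """
    try:
        subprocess.run(
            list(command),
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a non-zero exit status, and a timeout.
        return False
    return True


# The Python interpreter itself is a command that is known to exist here.
interpreter_ok = can_run([sys.executable, "-c", "pass"])

# On a host with the NVIDIA tools installed, this would be the analogous check.
nvidia_smi_ok = can_run(["nvidia-smi"])
```

Catching both `OSError` and `subprocess.SubprocessError` means the caller gets a plain boolean whether the binary is absent, broken, or hung.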

toil.lib.accelerators.get_host_accelerator_numbers()

Work out what accelerator is what.

For each accelerator visible to us, returns the host-side (for example, outside-of-Slurm-job) number for that accelerator. It is often the same as the apparent number.

Can be used with Docker’s --gpus='"device=#,#,#"' option to forward the right GPUs as seen from a Docker daemon.

Return type:

List[int]
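To illustrate how a list of host-side numbers could be turned into the Docker option mentioned above, here is a small sketch. `docker_gpus_args` is a hypothetical helper, not a Toil function, and the input list stands in for whatever `get_host_accelerator_numbers()` would return on a real host.

```python
from typing import List, Sequence


def docker_gpus_args(host_numbers: Sequence[int]) -> List[str]:
    """Build argv entries for `docker run` that forward the given GPUs.

    Hypothetical helper for illustration; not part of toil.lib.accelerators.
    """
    if not host_numbers:
        raise ValueError("at least one accelerator number is required")
    device_list = ",".join(str(n) for n in host_numbers)
    # The inner double quotes are part of the value Docker receives; they keep
    # the comma-separated device list together as a single field.
    return ["--gpus", f'"device={device_list}"']


# Host-side numbers as they might be reported inside a job that was given
# the second and fourth GPU on the machine (made-up values).
args = docker_gpus_args([1, 3])
print(args)
```

When the arguments are passed as a list to `subprocess`, no extra shell quoting is needed; the single quotes in the documented form exist only to protect the double quotes from the shell.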

toil.lib.accelerators.have_working_nvidia_docker_runtime()

Return True if Docker exists and can handle an “nvidia” runtime and the “--gpus” option.

Return type:

bool

toil.lib.accelerators.count_nvidia_gpus()

Return the number of NVIDIA GPUs seen by nvidia-smi, or 0 if it is not working.

Return type:

int
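The "count, or 0 if it is not working" contract can be sketched as follows. This is an illustration under stated assumptions, not Toil's implementation: it assumes the `nvidia-smi -L` listing format (one `GPU <n>: ...` line per device), both function names are hypothetical, and the sample text uses made-up device names and UUIDs.

```python
import subprocess
from typing import Sequence


def count_listed_gpus(listing: str) -> int:
    """Count lines of the form 'GPU <n>: ...' in `nvidia-smi -L` style output."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def count_gpus_or_zero(command: Sequence[str] = ("nvidia-smi", "-L")) -> int:
    """Run the listing command and count GPUs, returning 0 on any failure."""
    try:
        output = subprocess.check_output(
            list(command), stderr=subprocess.DEVNULL, text=True, timeout=30
        )
    except (OSError, ValueError, subprocess.SubprocessError):
        # Missing binary, undecodable output, non-zero exit, or timeout.
        return 0
    return count_listed_gpus(output)


# Made-up sample output for two devices.
sample = (
    "GPU 0: Example GPU (UUID: GPU-00000000-0000-0000-0000-000000000000)\n"
    "GPU 1: Example GPU (UUID: GPU-11111111-1111-1111-1111-111111111111)\n"
)
sample_count = count_listed_gpus(sample)
```

Separating the parsing from the subprocess call keeps the counting logic testable on machines that have no GPU at all.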

toil.lib.accelerators.count_amd_gpus()

Return the number of AMD GPUs seen by rocm-smi, or 0 if it is not working.

Return type:

int

toil.lib.accelerators.get_individual_local_accelerators()

Determine all the local accelerators available. Report each with count 1, in the order of the number that can be used to assign them.

TODO: How will numbers work with multiple types of accelerator? We need an accelerator assignment API.

Return type:

List[toil.job.AcceleratorRequirement]
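As a rough picture of "each with count 1, in assignment order", the sketch below builds such a list from two device counts. Everything here is an assumption made for illustration: `AcceleratorEntry` is a hypothetical stand-in for `toil.job.AcceleratorRequirement`, its `kind` and `brand` fields are guesses at useful keys, and listing NVIDIA devices before AMD ones is an arbitrary choice (the TODO above notes that numbering across accelerator types is unresolved).

```python
from typing import List, TypedDict


class AcceleratorEntry(TypedDict):
    """Hypothetical stand-in for toil.job.AcceleratorRequirement."""

    count: int
    kind: str
    brand: str


def individual_accelerators(nvidia_count: int, amd_count: int) -> List[AcceleratorEntry]:
    """Return one entry per device, so the list index can act as its number."""
    entries: List[AcceleratorEntry] = []
    for brand, device_count in (("nvidia", nvidia_count), ("amd", amd_count)):
        for _ in range(device_count):
            entries.append({"count": 1, "kind": "gpu", "brand": brand})
    return entries


# Two NVIDIA devices followed by one AMD device (made-up counts).
entries = individual_accelerators(nvidia_count=2, amd_count=1)
for number, entry in enumerate(entries):
    print(number, entry)
```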

toil.lib.accelerators.get_restrictive_environment_for_local_accelerators(accelerator_numbers)

Get environment variables which can be applied to a process to restrict it to using only the given accelerator numbers.

The numbers are in the space of accelerators returned by get_individual_local_accelerators().

Parameters:

accelerator_numbers (Union[Set[int], List[int]])

Return type:

Dict[str, str]
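The general technique can be sketched as below. This is not Toil's implementation: `restrictive_environment` is a hypothetical function, and restricting only through `CUDA_VISIBLE_DEVICES` (the variable the CUDA runtime reads to limit visible devices) is an assumption; the real function may set other or additional variables.

```python
import os
from typing import Dict, List, Set, Union


def restrictive_environment(
    accelerator_numbers: Union[Set[int], List[int]]
) -> Dict[str, str]:
    """Return environment variables limiting a process to the given devices.

    Hypothetical sketch; only CUDA_VISIBLE_DEVICES is set here.
    """
    # Sorting makes the result deterministic when a set is passed in.
    gpu_list = ",".join(str(n) for n in sorted(accelerator_numbers))
    return {"CUDA_VISIBLE_DEVICES": gpu_list}


# Restrict a child process to accelerators 0 and 2 (made-up numbers).
env_update = restrictive_environment({2, 0})

# Overlay the restriction on the current environment for the child process.
child_env = {**os.environ, **env_update}
```

The merged `child_env` mapping can then be passed as the `env` argument of `subprocess.run` so that only the child sees the restriction, leaving the parent's environment untouched.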