toil.lib.accelerators

Accelerator (i.e. GPU) utilities for Toil

Attributes

memoize

Memoize a function result based on its parameters using this decorator.

Classes

AcceleratorRequirement

Requirement for one or more computational accelerators, like a GPU or FPGA.

Functions

have_working_nvidia_smi()

Return True if the nvidia-smi binary, from nvidia's CUDA userspace utilities, is installed and can be run successfully.

get_host_accelerator_numbers()

Map each accelerator visible to us to its host-side number.

have_working_nvidia_docker_runtime()

Return True if Docker exists and can handle an "nvidia" runtime and the "--gpus" option.

count_nvidia_gpus()

Return the number of nvidia GPUs seen by nvidia-smi, or 0 if it is not working.

get_individual_local_accelerators()

Determine all the local accelerators available. Report each with count 1, in the order of the number that can be used to assign them.

get_restrictive_environment_for_local_accelerators(...)

Get environment variables which can be applied to a process to restrict it to using only the given accelerator numbers.

Module Contents

class toil.lib.accelerators.AcceleratorRequirement[source]

Bases: TypedDict

Requirement for one or more computational accelerators, like a GPU or FPGA.

count: int

How many of the accelerator are needed to run the job.

kind: str

What kind of accelerator is required. Can be “gpu”. Other kinds defined in the future might be “fpga”, etc.

model: typing_extensions.NotRequired[str]

What model of accelerator is needed. The exact set of values available depends on what the backing scheduler calls its accelerators; strings like “nvidia-tesla-k80” might be expected to work. If a specific model of accelerator is not required, this should be absent.

brand: typing_extensions.NotRequired[str]

What brand or manufacturer of accelerator is required. The exact set of values available depends on what the backing scheduler calls the brands of its accelerators; strings like “nvidia” or “amd” might be expected to work. If a specific brand of accelerator is not required (for example, because the job can use multiple brands of accelerator that support a given API) this should be absent.

api: typing_extensions.NotRequired[str]

What API is to be used to communicate with the accelerator. This can be “cuda”. Other APIs supported in the future might be “rocm”, “opencl”, “metal”, etc. If the job does not need a particular API to talk to the accelerator, this should be absent.
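As an illustration of the fields described above, the following minimal sketch builds two requirement dictionaries. To stay self-contained it declares a local stand-in type, AcceleratorRequirementSketch (a hypothetical name), that mirrors the documented keys instead of importing the real class from Toil.

```python
from typing import TypedDict


class _RequiredFields(TypedDict):
    # Keys that must always be present.
    count: int
    kind: str


class AcceleratorRequirementSketch(_RequiredFields, total=False):
    # Stand-in for the documented class (assumption, not Toil's definition).
    # Optional keys: leave them out when the job has no preference.
    model: str
    brand: str
    api: str


# Any single GPU, with no constraint on model, brand, or API.
any_gpu: AcceleratorRequirementSketch = {"count": 1, "kind": "gpu"}

# Two NVIDIA GPUs that the job will drive through CUDA.
two_cuda_gpus: AcceleratorRequirementSketch = {
    "count": 2,
    "kind": "gpu",
    "brand": "nvidia",
    "api": "cuda",
}
```

Optional keys are omitted entirely, not set to None, which matches the "should be absent" wording in the field descriptions.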

toil.lib.accelerators.memoize

Memoize a function result based on its parameters using this decorator.

For example, this can be used in place of lazy initialization. If the decorating function is invoked by multiple threads, the decorated function may be called more than once with the same arguments.
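The lazy-initialization use mentioned above might look like the sketch below. It assumes the decorator behaves like an unbounded cache keyed on the call arguments, and uses functools.lru_cache as a stand-in so the example runs without Toil installed. The function load_settings and the list call_log are hypothetical names used only for illustration.

```python
from functools import lru_cache

# Stand-in for the documented decorator (assumption: an unbounded,
# argument-keyed cache; arguments must therefore be hashable).
memoize = lru_cache(maxsize=None)

call_log = []


@memoize
def load_settings(profile: str) -> dict:
    # Pretend this is expensive; record each real invocation.
    call_log.append(profile)
    return {"profile": profile}


first = load_settings("default")
second = load_settings("default")  # Served from the cache.
```

After both calls, call_log holds a single entry and first and second are the same object, since the second call never reaches the function body.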

toil.lib.accelerators.have_working_nvidia_smi()[source]

Return True if the nvidia-smi binary, from nvidia’s CUDA userspace utilities, is installed and can be run successfully.

TODO: This isn’t quite the same as the check that cwltool uses to decide if it can fulfill a CUDARequirement.

Return type:

bool
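A check of this kind can be approximated by running the binary and treating any failure as "not working". The sketch below is an illustration under that assumption, not Toil's implementation; nvidia_smi_runs is a hypothetical name.

```python
import subprocess


def nvidia_smi_runs() -> bool:
    """Return True if nvidia-smi can be started and exits successfully."""
    try:
        subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,   # Non-zero exit status raises CalledProcessError.
            timeout=30,   # Do not hang forever on a wedged driver.
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing or non-executable binary;
        # SubprocessError covers a failing exit status and timeouts.
        return False
    return True
```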

toil.lib.accelerators.get_host_accelerator_numbers()[source]

Map each accelerator visible to us to its host-side number.

For each accelerator visible to us, returns the host-side (for example, outside-of-Slurm-job) number for that accelerator. It is often the same as the apparent number.

Can be used with Docker’s --gpus='"device=#,#,#"' option to forward the right GPUs as seen from a Docker daemon.

Return type:

List[int]
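To show how the returned numbers could feed Docker's --gpus option, here is a small sketch that formats a list of host-side numbers into a single command-line argument. The helper docker_gpus_option is hypothetical, and the input list stands in for the function's return value.

```python
from typing import List


def docker_gpus_option(host_numbers: List[int]) -> str:
    """Format host-side accelerator numbers as a Docker --gpus argument."""
    if not host_numbers:
        raise ValueError("at least one accelerator number is required")
    devices = ",".join(str(number) for number in host_numbers)
    # The double quotes are part of the value that Docker must receive.
    return f'--gpus="device={devices}"'


# Stand-in for a value returned by get_host_accelerator_numbers().
option = docker_gpus_option([0, 2])
```

When the argument is passed in an argv list (no shell involved), it can be used as-is. On a shell command line it needs the extra single quotes shown in the option syntax above so the inner double quotes survive.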

toil.lib.accelerators.have_working_nvidia_docker_runtime()[source]

Return True if Docker exists and can handle an “nvidia” runtime and the “--gpus” option.

Return type:

bool

toil.lib.accelerators.count_nvidia_gpus()[source]

Return the number of nvidia GPUs seen by nvidia-smi, or 0 if it is not working.

Return type:

int
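One way to derive such a count, shown here only as a sketch and not necessarily how Toil does it, is to count the device lines printed by nvidia-smi -L. The helper count_gpus_in_listing is hypothetical, and the sample text uses placeholder UUIDs.

```python
def count_gpus_in_listing(listing: str) -> int:
    """Count device lines in the text printed by `nvidia-smi -L`."""
    # Each physical GPU is reported on a line that starts with "GPU ".
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


# Placeholder output in the shape printed by `nvidia-smi -L`.
sample_listing = (
    "GPU 0: Tesla K80 (UUID: GPU-aaaa)\n"
    "GPU 1: Tesla K80 (UUID: GPU-bbbb)\n"
)
gpu_count = count_gpus_in_listing(sample_listing)
```

Passing an empty string, as would happen when nvidia-smi is not working and produces no output, yields 0, which matches the documented fallback.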

toil.lib.accelerators.get_individual_local_accelerators()[source]

Determine all the local accelerators available. Report each with count 1, in the order of the number that can be used to assign them.

TODO: How will numbers work with multiple types of accelerator? We need an accelerator assignment API.

Return type:

List[toil.job.AcceleratorRequirement]
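The shape of the result can be sketched as follows, assuming only NVIDIA GPUs are detected and that each is reported as a CUDA-capable GPU. The function individual_accelerators and the field values are illustrative assumptions, not Toil's implementation.

```python
from typing import Any, Dict, List


def individual_accelerators(gpu_count: int) -> List[Dict[str, Any]]:
    """Build one count-1 requirement per detected GPU (assumed NVIDIA/CUDA)."""
    return [
        {"kind": "gpu", "brand": "nvidia", "api": "cuda", "count": 1}
        for _ in range(gpu_count)
    ]


accelerators = individual_accelerators(3)

# A list position doubles as the accelerator number used for assignment.
numbered = dict(enumerate(accelerators))
```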

toil.lib.accelerators.get_restrictive_environment_for_local_accelerators(accelerator_numbers)[source]

Get environment variables which can be applied to a process to restrict it to using only the given accelerator numbers.

The numbers are in the space of accelerators returned by get_individual_local_accelerators().

Parameters:

accelerator_numbers (Union[Set[int], List[int]])

Return type:

Dict[str, str]
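The returned mapping is meant to be layered onto a child process's environment. The sketch below shows one way to do that with the standard library. The helper run_restricted is hypothetical, and the example mapping, which uses CUDA_VISIBLE_DEVICES, is an assumption about what the returned dictionary might contain.

```python
import os
import subprocess
import sys
from typing import Dict, List


def run_restricted(command: List[str], restrictive_env: Dict[str, str]) -> str:
    """Run a command with extra variables layered over the current environment."""
    env = dict(os.environ)       # Copy, so the parent process is unchanged.
    env.update(restrictive_env)  # Restriction variables take precedence.
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return result.stdout


# Hypothetical mapping standing in for the function's return value.
example_env = {"CUDA_VISIBLE_DEVICES": "0,2"}
child_code = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
output = run_restricted([sys.executable, "-c", child_code], example_env)
```

Copying os.environ before updating it keeps the restriction local to the child process, so other jobs started by the same parent are unaffected.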