toil.batchSystems.slurm

Attributes

EXIT_STATUS_UNAVAILABLE_VALUE

logger

TERMINAL_STATES

NONTERMINAL_STATES

Exceptions

InsufficientSystemResources

Common base class for all non-exit exceptions.

CalledProcessErrorStderr

Version of CalledProcessError that includes stderr in the error message if it is set

Classes

BatchJobExitReason

Enum where members are also (and must be) ints

AbstractGridEngineBatchSystem

A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster.

OptionSetter

Protocol for the setOption function we get to let us set up CLI options for each batch system.

Requirer

Base class implementing the storage and presentation of requirements.

SlurmBatchSystem

A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster.

Functions

call_command(cmd, *args[, input, timeout, useCLocale, ...])

Simplified calling of external commands.

Module Contents

class toil.batchSystems.slurm.BatchJobExitReason[source]

Bases: enum.IntEnum

Enum where members are also (and must be) ints

FINISHED: int = 1

Successfully finished.

FAILED: int = 2

Job finished, but failed.

LOST: int = 3

Preemptable failure (job’s executing host went away).

KILLED: int = 4

Job killed before finishing.

ERROR: int = 5

Internal error.

MEMLIMIT: int = 6

Job hit batch system imposed memory limit.

MISSING: int = 7

Job disappeared from the scheduler without actually stopping, so Toil killed it.

MAXJOBDURATION: int = 8

Job ran longer than --maxJobDuration, so Toil killed it.

PARTITION: int = 9

Job was not able to talk to the leader via the job store, so Toil declared it failed.

classmethod to_string(value)[source]

Convert to human-readable string.

Given an int that may be, or may be equal to, a value from the enum, produce the string value of its matching enum entry, or a stringified int if there is no match.

Parameters:

value (int)

Return type:

str
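
The lookup-with-fallback behavior described for to_string() can be sketched as follows. This is a minimal stand-in, not Toil's code: the class name ExitReason is hypothetical, only the first three documented members are reproduced, and it assumes the "string value" of a member is its name.

```python
from enum import IntEnum


class ExitReason(IntEnum):
    """Hypothetical stand-in for BatchJobExitReason (first three members only)."""

    FINISHED = 1
    FAILED = 2
    LOST = 3

    @classmethod
    def to_string(cls, value: int) -> str:
        """Return the matching member's name, or str(value) if none matches."""
        try:
            return cls(value).name
        except ValueError:
            # Not a known exit reason, so fall back to the plain number.
            return str(value)


print(ExitReason.to_string(2))                # FAILED
print(ExitReason.to_string(ExitReason.LOST))  # LOST
print(ExitReason.to_string(42))               # 42
```

Because the members are ints, a raw integer read back from a scheduler and an enum member can be passed interchangeably.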

toil.batchSystems.slurm.EXIT_STATUS_UNAVAILABLE_VALUE = 255
exception toil.batchSystems.slurm.InsufficientSystemResources(requirer, resource, available=None, batch_system=None, source=None, details=[])[source]

Bases: Exception

Common base class for all non-exit exceptions.

Parameters:
  • requirer (toil.job.Requirer)

  • resource (str)

  • available (Optional[toil.job.ParsedRequirement])

  • batch_system (Optional[str])

  • source (Optional[str])

  • details (List[str])

__str__()[source]

Explain the exception.

Return type:

str

class toil.batchSystems.slurm.AbstractGridEngineBatchSystem(config, maxCores, maxMemory, maxDisk)[source]

Bases: toil.batchSystems.cleanup_support.BatchSystemCleanupSupport

A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster. By default auto-deployment is not implemented.

exception GridEngineThreadException[source]

Bases: Exception

Common base class for all non-exit exceptions.

class GridEngineThread(newJobsQueue, updatedJobsQueue, killQueue, killedJobsQueue, boss)[source]

Bases: threading.Thread

A class that represents a thread of control.

This class can be safely subclassed in a limited fashion. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass.

getBatchSystemID(jobID)[source]

Get batch system-specific job ID

Note: for the moment this is the only consistent way to cleanly get the batch system job ID

Parameters:

jobID (int) – Toil BatchSystem numerical job ID

Return type:

str

forgetJob(jobID)[source]

Remove jobID passed

Parameters:

jobID (int) – toil job ID

Return type:

None

createJobs(newJob)[source]

Create a new job with the given attributes.

Implementation-specific; called by GridEngineThread.run()

Parameters:

newJob (JobTuple)

Return type:

bool

killJobs()[source]

Kill any running jobs within thread

checkOnJobs()[source]

Check and update status of all running jobs.

Respects statePollingWait and returns cached results if it is too soon to query the scheduler again.

run()[source]

Run any new jobs

abstract coalesce_job_exit_codes(batch_job_id_list)[source]

Returns exit codes and possibly exit reasons for a list of jobs, or None if they are running.

Called by GridEngineThread.checkOnJobs().

This is an optional part of the interface. It should raise NotImplementedError if not actually implemented for a particular scheduler.

Parameters:

batch_job_id_list (string) – List of batch system job IDs

Return type:

List[Union[int, Tuple[int, Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason]], None]]

abstract prepareSubmission(cpu, memory, jobID, command, jobName, job_environment=None, gpus=None)[source]

Prepare the command line for submitting a job to the batch system (via submitJob()).

Parameters:
  • cpu (int)

  • memory (int)

  • jobID (int) – Toil job ID

  • command (str) – the command line string to be called

  • jobName (str) – the name of the Toil job, to provide metadata to batch systems if desired

  • job_environment (Optional[Dict[str, str]]) – the environment variables to be set on the worker

  • gpus (Optional[int])

Return type:

List[str]

abstract submitJob(subLine)[source]

Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID

Parameters:

subLine (string) – the literal command line string to be called

Returns:

the batch system job ID, which will be stored internally

Return type:

string

abstract getRunningJobIDs()[source]

Get a list of running job IDs. Implementation-specific; called by boss AbstractGridEngineBatchSystem implementation via AbstractGridEngineBatchSystem.getRunningBatchJobIDs()

Return type:

list

abstract killJob(jobID)[source]

Kill specific job with the Toil job ID. Implementation-specific; called by GridEngineThread.killJobs()

Parameters:

jobID (string) – Toil job ID

abstract getJobExitCode(batchJobID)[source]

Returns job exit code and possibly an instance of abstractBatchSystem.BatchJobExitReason.

Returns None if the job is still running.

If the job is not running but the exit code is not available, it will be EXIT_STATUS_UNAVAILABLE_VALUE. Implementation-specific; called by GridEngineThread.checkOnJobs().

The exit code will only be 0 if the job affirmatively succeeded.

Parameters:

batchJobID (string) – batch system job ID

Return type:

Union[int, Tuple[int, Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason]], None]
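
A caller has to handle three shapes of result: None, a bare exit code, or an (exit code, exit reason) pair. The sketch below shows one way to normalize them. The helper name describe_exit is hypothetical, and exit reasons are plain ints here where Toil would use BatchJobExitReason members.

```python
from typing import Optional, Tuple, Union

# Value documented for this module's EXIT_STATUS_UNAVAILABLE_VALUE attribute.
EXIT_STATUS_UNAVAILABLE_VALUE = 255

# Exit reasons are plain ints in this sketch.
ExitResult = Union[int, Tuple[int, Optional[int]], None]


def describe_exit(result: ExitResult) -> str:
    """Hypothetical helper: summarize a getJobExitCode()-style result."""
    if result is None:
        return "still running"
    # Normalize the bare-int and (code, reason) forms to one shape.
    code, reason = result if isinstance(result, tuple) else (result, None)
    if code == 0:
        return "succeeded"
    if code == EXIT_STATUS_UNAVAILABLE_VALUE:
        summary = "stopped, exit code unavailable"
    else:
        summary = f"failed with exit code {code}"
    if reason is not None:
        summary += f" (reason {reason})"
    return summary


print(describe_exit(None))
print(describe_exit(0))
print(describe_exit((137, 6)))
```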

classmethod supportsAutoDeployment()[source]

Whether this batch system supports auto-deployment of the user script itself.

If it does, the setUserScript() can be invoked to set the resource object representing the user script.

Note to implementors: If your implementation returns True here, it should also override

issueBatchJob(command, jobDesc, job_environment=None)[source]

Issues a job with the specified command to the batch system and returns a unique job ID number.

Parameters:
  • command (str) – the command to execute somewhere to run the Toil worker process

  • jobDesc – the JobDescription for the job being run

  • job_environment (Optional[Dict[str, str]]) – a collection of job-specific environment variables to be set on the worker.

Returns:

a unique job ID number that can be used to reference the newly issued job

killBatchJobs(jobIDs)[source]

Kills the given jobs, identified by job ID, then confirms they are dead by checking that they are no longer in the list of issued jobs.

getIssuedBatchJobIDs()[source]

Gets the list of issued jobs

getRunningBatchJobIDs()[source]

Retrieve running job IDs from local and batch scheduler.

Respects statePollingWait and returns cached results if it is too soon to query the scheduler again.

getUpdatedBatchJob(maxWait)[source]

Returns information about job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.

Does not return info for jobs killed by killBatchJobs, although they may cause None to be returned earlier than maxWait.

Parameters:

maxWait – the number of seconds to block, waiting for a result

Returns:

If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
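
The "block for up to maxWait seconds, then return None" contract can be sketched with a standard queue. This is an illustration under assumptions, not Toil's implementation: UpdatedInfo is a hypothetical stand-in for UpdatedBatchJobInfo with only two fields, and get_updated is a hypothetical free function.

```python
import queue
from typing import NamedTuple, Optional


class UpdatedInfo(NamedTuple):
    """Hypothetical stand-in for UpdatedBatchJobInfo."""

    jobID: int
    exitStatus: int


def get_updated(updated_jobs: "queue.Queue[UpdatedInfo]",
                max_wait: float) -> Optional[UpdatedInfo]:
    """Return the next finished job, or None if none arrives within max_wait."""
    try:
        # Taking the item off the queue is what makes each job
        # come back exactly once.
        return updated_jobs.get(timeout=max_wait)
    except queue.Empty:
        return None


updates: "queue.Queue[UpdatedInfo]" = queue.Queue()
updates.put(UpdatedInfo(jobID=7, exitStatus=0))
first = get_updated(updates, max_wait=0.01)
second = get_updated(updates, max_wait=0.01)
print(first, second)
```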

shutdown()[source]

Signals thread to shutdown (via sentinel) then cleanly joins the thread

Return type:

None
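
Shutting a thread down "via sentinel" generally means placing a special marker on the queue the thread reads from, then joining it. A minimal sketch of the pattern, assuming None is the sentinel and using hypothetical names throughout:

```python
import queue
import threading
from typing import List, Optional


def worker(jobs: "queue.Queue[Optional[int]]", done: List[int]) -> None:
    """Process queued items until the sentinel (None) arrives."""
    while True:
        item = jobs.get()
        if item is None:
            # Sentinel received: stop the loop so join() can return.
            break
        done.append(item * 2)


jobs: "queue.Queue[Optional[int]]" = queue.Queue()
done: List[int] = []
thread = threading.Thread(target=worker, args=(jobs, done))
thread.start()

jobs.put(1)
jobs.put(2)
jobs.put(None)   # signal shutdown
thread.join()    # wait for the worker to exit cleanly
print(done)
```

Because the sentinel is queued behind any pending work, items submitted before shutdown are still processed.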

setEnv(name, value=None)[source]

Set an environment variable for the worker process before it is launched.

The worker process will typically inherit the environment of the machine it is running on, but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different from the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns.

Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.

If no value is provided it will be looked up from the current environment.

Parameters:
  • name – the environment variable to be set on the worker.

  • value – if given, the environment variable given by name will be set to this value. If None, the variable’s current value will be used as the value on the worker

Raises:

RuntimeError – if value is None and the name cannot be found in the environment
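
The lookup rule (an explicit value wins, otherwise use the current environment, otherwise raise RuntimeError) can be sketched as below. EnvHolder and set_env are hypothetical names used for illustration; the sketch also copies the mapping at issue time, as the note to implementors suggests.

```python
import os
from typing import Dict, Optional


class EnvHolder:
    """Hypothetical holder for variables to pass on to workers."""

    def __init__(self) -> None:
        self.environment: Dict[str, str] = {}

    def set_env(self, name: str, value: Optional[str] = None) -> None:
        if value is None:
            try:
                # Fall back to the current process environment.
                value = os.environ[name]
            except KeyError:
                raise RuntimeError(f"{name} is not set in the environment")
        self.environment[name] = value

    def snapshot(self) -> Dict[str, str]:
        """Copy the variables so later set_env() calls do not affect a queued job."""
        return dict(self.environment)


holder = EnvHolder()
holder.set_env("SKETCH_MODE", "fast")
queued_env = holder.snapshot()
holder.set_env("SKETCH_MODE", "slow")
print(queued_env["SKETCH_MODE"], holder.environment["SKETCH_MODE"])
```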

classmethod getWaitDuration()[source]
sleepSeconds(sleeptime=1)[source]

Helper function to drop on all state-querying functions to avoid over-querying.

with_retries(operation, *args, **kwargs)[source]

Call operation with args and kwargs. If one of the calls to a command fails, sleep and try again.
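
A retry wrapper of this kind might look like the sketch below. It is not Toil's implementation: retry_call, attempts, and delay are hypothetical, and the sketch assumes the failure to retry on is subprocess.CalledProcessError.

```python
import subprocess
import time
from typing import Any, Callable, List


def retry_call(operation: Callable[..., Any], *args: Any,
               attempts: int = 3, delay: float = 0.1, **kwargs: Any) -> Any:
    """Call operation(*args, **kwargs), sleeping and retrying on command failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation(*args, **kwargs)
        except subprocess.CalledProcessError:
            if attempt == attempts:
                # Out of attempts: let the caller see the last failure.
                raise
            time.sleep(delay)


calls: List[str] = []


def flaky(value: str) -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    calls.append(value)
    if len(calls) < 3:
        raise subprocess.CalledProcessError(1, ["squeue"])
    return value


result = retry_call(flaky, "ok", delay=0.0)
print(result, len(calls))
```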

class toil.batchSystems.slurm.OptionSetter[source]

Bases: Protocol

Protocol for the setOption function we get to let us set up CLI options for each batch system.

Actual functionality is defined in the Config class.

OptionType
__call__(option_name, parsing_function=None, check_function=None, default=None, env=None, old_names=None)[source]
Parameters:
  • option_name (str)

  • parsing_function (Optional[Callable[[Any], OptionType]])

  • check_function (Optional[Callable[[OptionType], Union[None, bool]]])

  • default (Optional[OptionType])

  • env (Optional[List[str]])

  • old_names (Optional[List[str]])

Return type:

bool

class toil.batchSystems.slurm.Requirer(requirements)[source]

Base class implementing the storage and presentation of requirements.

Has cores, memory, disk, and preemptability as properties.

Parameters:

requirements (Mapping[str, ParseableRequirement])

assignConfig(config)[source]

Assign the given config object to be used to provide default values.

Must be called exactly once on a loaded JobDescription before any requirements are queried.

Parameters:

config (toil.common.Config) – Config object to query

Return type:

None

__getstate__()[source]

Return the dict to use as the instance’s __dict__ when pickling.

Return type:

Dict[str, Any]

__copy__()[source]

Return a semantically-shallow copy of the object, for copy.copy().

Return type:

Requirer

__deepcopy__(memo)[source]

Return a semantically-deep copy of the object, for copy.deepcopy().

Parameters:

memo (Any)

Return type:

Requirer

property requirements: RequirementsDict

Get dict containing all non-None, non-defaulted requirements.

Return type:

RequirementsDict

property disk: int

Get the maximum number of bytes of disk required.

Return type:

int

property memory: int

Get the maximum number of bytes of memory required.

Return type:

int

property cores: int | float

Get the number of CPU cores required.

Return type:

Union[int, float]

property preemptible: bool

Whether a preemptible node is permitted, or a nonpreemptible one is required.

Return type:

bool

preemptable(val)[source]
Parameters:

val (ParseableFlag)

Return type:

None

property accelerators: List[AcceleratorRequirement]

Any accelerators, such as GPUs, that are needed.

Return type:

List[AcceleratorRequirement]

scale(requirement, factor)[source]

Return a copy of this object with the given requirement scaled up or down.

Only works on requirements where that makes sense.

Return type:

Requirer
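
The copy-then-scale behavior can be illustrated with a small stand-in class. Everything here is an assumption made for the sketch: the class name Needs, the private _requirements dict, and the choice of cores, memory, and disk as the requirements where scaling "makes sense".

```python
import copy
from typing import Dict

# Assumption: only numeric resource amounts can be scaled.
SCALABLE = ("cores", "memory", "disk")


class Needs:
    """Hypothetical stand-in for a Requirer-like object."""

    def __init__(self, requirements: Dict[str, float]) -> None:
        self._requirements = dict(requirements)

    def scale(self, requirement: str, factor: float) -> "Needs":
        """Return a copy with one requirement multiplied by factor."""
        if requirement not in SCALABLE:
            raise ValueError(f"cannot scale requirement {requirement!r}")
        scaled = copy.copy(self)
        # copy.copy is shallow, so give the copy its own dict before editing.
        scaled._requirements = dict(self._requirements)
        scaled._requirements[requirement] = self._requirements[requirement] * factor
        return scaled


original = Needs({"cores": 2, "memory": 1024})
doubled = original.scale("memory", 2)
print(original._requirements["memory"], doubled._requirements["memory"])
```

Returning a copy leaves the original untouched, which matters when the same requirements object is shared between a job and a retry of that job.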

requirements_string()[source]

Get a nice human-readable string of our requirements.

Return type:

str

exception toil.batchSystems.slurm.CalledProcessErrorStderr(returncode, cmd, output=None, stderr=None)[source]

Bases: subprocess.CalledProcessError

Version of CalledProcessError that includes stderr in the error message if it is set

__str__()[source]

Return str(self).

Return type:

str

toil.batchSystems.slurm.call_command(cmd, *args, input=None, timeout=None, useCLocale=True, env=None, quiet=False)[source]

Simplified calling of external commands.

If the process fails, CalledProcessErrorStderr is raised.

The captured stderr is always printed, whether or not an exception occurs, so it can be logged.

Always logs the command at debug log level.

Parameters:
  • quiet (Optional[bool]) – If True, do not log the command output. If False (the default), do log the command output at debug log level.

  • useCLocale (bool) – If True, C locale is forced, to prevent failures that can occur in some batch systems when using UTF-8 locale.

  • cmd (List[str])

  • args (str)

  • input (Optional[str])

  • timeout (Optional[float])

  • env (Optional[Dict[str, str]])

Returns:

Command standard output, decoded as utf-8.

Return type:

str
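
The behavior described above (capture output, force the C locale, raise an error that carries stderr) can be approximated with the standard library alone. This is a sketch under assumptions rather than Toil's code: ErrorWithStderr and run_command are hypothetical names, and forcing the locale through LC_ALL is one plausible way to do it.

```python
import logging
import os
import subprocess
import sys
from typing import Dict, List, Optional

log = logging.getLogger(__name__)


class ErrorWithStderr(subprocess.CalledProcessError):
    """Hypothetical analogue of CalledProcessErrorStderr."""

    def __str__(self) -> str:
        base = super().__str__()
        # Append stderr only when the process actually produced some.
        return f"{base}\n{self.stderr.strip()}" if self.stderr else base


def run_command(cmd: List[str], input: Optional[str] = None,
                timeout: Optional[float] = None, use_c_locale: bool = True,
                env: Optional[Dict[str, str]] = None) -> str:
    """Run cmd and return its stdout decoded as UTF-8."""
    child_env = dict(os.environ if env is None else env)
    if use_c_locale:
        child_env["LC_ALL"] = "C"  # assumption: this is how the locale is forced
    log.debug("Running: %s", cmd)
    proc = subprocess.run(cmd, input=input, timeout=timeout, env=child_env,
                          capture_output=True, text=True, encoding="utf-8")
    if proc.stderr:
        # Report stderr on success and on failure alike.
        log.debug("stderr: %s", proc.stderr.strip())
    if proc.returncode != 0:
        raise ErrorWithStderr(proc.returncode, cmd,
                              output=proc.stdout, stderr=proc.stderr)
    return proc.stdout


greeting = run_command([sys.executable, "-c", "print('hello')"])
print(greeting.strip())
```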

toil.batchSystems.slurm.logger
toil.batchSystems.slurm.TERMINAL_STATES: Dict[str, toil.batchSystems.abstractBatchSystem.BatchJobExitReason]
toil.batchSystems.slurm.NONTERMINAL_STATES: Set[str]
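
Given the declared types, a scheduler state string can be classified by checking both collections. The entries below are illustrative only: they are common Slurm state names chosen for the sketch, not the contents of Toil's tables, and the reason strings stand in for BatchJobExitReason members. Raising on an unknown state is also an assumption.

```python
from typing import Dict, Optional, Set

# Illustrative entries, not copied from Toil.
TERMINAL: Dict[str, str] = {
    "COMPLETED": "FINISHED",
    "FAILED": "FAILED",
    "NODE_FAIL": "LOST",
}
NONTERMINAL: Set[str] = {"PENDING", "RUNNING", "SUSPENDED"}


def classify_state(state: str) -> Optional[str]:
    """Return an exit reason for a finished job, or None if it is still active."""
    if state in TERMINAL:
        return TERMINAL[state]
    if state in NONTERMINAL:
        return None
    # Assumption: an unknown state is treated as an error rather than guessed at.
    raise ValueError(f"unrecognized scheduler state: {state}")


print(classify_state("COMPLETED"))
print(classify_state("RUNNING"))
```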
class toil.batchSystems.slurm.SlurmBatchSystem(config, maxCores, maxMemory, maxDisk)[source]

Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem

A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster. By default auto-deployment is not implemented.

class GridEngineThread(newJobsQueue, updatedJobsQueue, killQueue, killedJobsQueue, boss)[source]

Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem.GridEngineThread

A class that represents a thread of control.

This class can be safely subclassed in a limited fashion. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass.

getRunningJobIDs()[source]

Get a list of running job IDs. Implementation-specific; called by boss AbstractGridEngineBatchSystem implementation via AbstractGridEngineBatchSystem.getRunningBatchJobIDs()

Return type:

list

killJob(jobID)[source]

Kill specific job with the Toil job ID. Implementation-specific; called by GridEngineThread.killJobs()

Parameters:

jobID (string) – Toil job ID

prepareSubmission(cpu, memory, jobID, command, jobName, job_environment=None, gpus=None)[source]

Prepare the command line for submitting a job to the batch system (via submitJob()).

Parameters:
  • cpu (int)

  • memory (int)

  • jobID (int) – Toil job ID

  • command (str) – the command line string to be called

  • jobName (str) – the name of the Toil job, to provide metadata to batch systems if desired

  • job_environment (Optional[Dict[str, str]]) – the environment variables to be set on the worker

  • gpus (Optional[int])

Return type:

List[str]

submitJob(subLine)[source]

Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID

Parameters:

subLine (string) – the literal command line string to be called

Returns:

the batch system job ID, which will be stored internally

Return type:

string

coalesce_job_exit_codes(batch_job_id_list)[source]

Collect all job exit codes in a single call.

Parameters:

batch_job_id_list (list) – list of job ID strings, where each string has the form “<job>[.<task>]”

Returns:

list of job exit codes, or (exit code, exit reason) pairs, associated with the list of job IDs

Return type:

List[Union[int, Tuple[int, Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason]], None]]
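
Job IDs of the form “<job>[.<task>]” can be split into their parts as in the sketch below. The helper name split_batch_job_id is hypothetical, and the sketch assumes both parts are decimal integers.

```python
from typing import Optional, Tuple


def split_batch_job_id(batch_job_id: str) -> Tuple[int, Optional[int]]:
    """Split '<job>[.<task>]' into (job, task), with task None when absent."""
    job, _, task = batch_job_id.partition(".")
    return int(job), (int(task) if task else None)


print(split_batch_job_id("1234"))    # (1234, None)
print(split_batch_job_id("1234.5"))  # (1234, 5)
```

Stripping the task suffix lets many IDs be passed to the scheduler in one query and the results be matched back to the original list by job number.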

getJobExitCode(batchJobID)[source]

Get job exit code for given batch job ID.

Parameters:

batchJobID (str) – string of the form “<job>[.<task>]”

Returns:

integer job exit code

Return type:

Union[int, Tuple[int, Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason]], None]

prepareSbatch(cpu, mem, jobID, jobName, job_environment, gpus)[source]

Returns the sbatch command line to run to queue the job.

Parameters:
  • cpu (int)

  • mem (int)

  • jobID (int)

  • jobName (str)

  • job_environment (Optional[Dict[str, str]])

  • gpus (Optional[int])

Return type:

List[str]

parse_elapsed(elapsed)[source]
classmethod add_options(parser)[source]

If this batch system provides any command line options, add them to the given parser.

Parameters:

parser (Union[argparse.ArgumentParser, argparse._ArgumentGroup])

OptionType
classmethod setOptions(setOption)[source]

Process command line or configuration options relevant to this batch system.

Parameters:

setOption (toil.batchSystems.options.OptionSetter) – A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.

Return type:

None