Batch System API

The batch system interface is used by Toil to abstract over different ways of running batches of jobs, for example Slurm, GridEngine, Mesos, Parasol and a single node. The toil.batchSystems.abstractBatchSystem.AbstractBatchSystem API is implemented to run jobs using a given job management system, e.g. Mesos.

Batch System Environmental Variables

Environmental variables allow scheduler-specific parameters to be passed to the batch system.

For SLURM there are two environment variables. The first applies to all jobs, while the second defines the partition to use for parallel jobs:

export TOIL_SLURM_ARGS="-t 1:00:00 -q fatq"
export TOIL_SLURM_PE='multicore'

Depending on your SLURM configuration and Python environment, you may need to add --export=ALL to TOIL_SLURM_ARGS in order for the started jobs to properly inherit the environment.

For TORQUE there are two environment variables: one for everything but the resource requirements, and another for the resource requirements (without the -l prefix):

export TOIL_TORQUE_ARGS="-q fatq"
export TOIL_TORQUE_REQS="walltime=1:00:00"

For GridEngine (SGE, UGE), there is an additional environmental variable to define the parallel environment for running multicore jobs:

export TOIL_GRIDENGINE_PE='smp'
export TOIL_GRIDENGINE_ARGS='-q batch.q'

For HTCondor, additional parameters can be included in the submit file passed to condor_submit:

export TOIL_HTCONDOR_PARAMS='requirements = TARGET.has_sse4_2 == true; accounting_group = test'

The environment variable is parsed as a semicolon-separated string of parameter = value pairs.
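
The parsing rule above can be illustrated with a minimal sketch. This is not Toil's actual parser; the function name parse_htcondor_params is hypothetical, and the sketch assumes that values never contain a semicolon. Each pair is split on the first "=" only, so that a value such as TARGET.has_sse4_2 == true survives intact.

```python
from typing import Dict


def parse_htcondor_params(raw: str) -> Dict[str, str]:
    """Split 'parameter = value; parameter = value' into a dict (illustrative only)."""
    params: Dict[str, str] = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so values containing '==' stay intact.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected 'parameter = value', got {pair!r}")
        params[key.strip()] = value.strip()
    return params


example = "requirements = TARGET.has_sse4_2 == true; accounting_group = test"
parsed = parse_htcondor_params(example)
```

Here parsed maps requirements to the full expression and accounting_group to test.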

Batch System API

class toil.batchSystems.abstractBatchSystem.AbstractBatchSystem

An abstract base class to represent the interface the batch system must provide to Toil.

classmethod supportsAutoDeployment() → bool

Whether this batch system supports auto-deployment of the user script itself.

If it does, setUserScript() can be invoked to set the resource object representing the user script.

Note to implementors: If your implementation returns True here, it should also override

classmethod supportsWorkerCleanup() → bool

Indicates whether this batch system invokes BatchSystemSupport.workerCleanup() after the last job for a particular workflow invocation finishes. Note that the term worker refers to an entire node, not just a worker process. A worker process may run more than one job sequentially, and more than one concurrent worker process may exist on a worker node, for the same workflow. The batch system is said to shut down after the last worker process terminates.

setUserScript(userScript: toil.resource.Resource) → None

Set the user script for this workflow. This method must be called before the first job is issued to this batch system, and only if supportsAutoDeployment() returns True; otherwise it will raise an exception.

Parameters: userScript – the resource object representing the user script or module and the modules it depends on.

set_message_bus(message_bus: toil.bus.MessageBus) → None

Give the batch system an opportunity to connect directly to the message bus, so that it can send informational messages about the jobs it is running to other Toil components.

Currently the only message a batch system may send is JobAnnotationMessage.

issueBatchJob(jobDesc: toil.job.JobDescription, job_environment: Optional[Dict[str, str]] = None) → int

Issues a job with the specified command to the batch system and returns a unique jobID.

Parameters:
  • jobDesc – a toil.job.JobDescription
  • job_environment – a collection of job-specific environment variables to be set on the worker.
Returns: a unique jobID that can be used to reference the newly issued job

killBatchJobs(jobIDs: List[int]) → None

Kills the given job IDs. After returning, the killed jobs will not appear in the results of getRunningBatchJobIDs. The killed job will not be returned from getUpdatedBatchJob.

Parameters: jobIDs – list of IDs of jobs to kill

getIssuedBatchJobIDs() → List[int]

Gets all currently issued jobs.

Returns: A list of jobs (as jobIDs) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.

getRunningBatchJobIDs() → Dict[int, float]

Gets a map of jobs as jobIDs that are currently running (not just waiting) and how long they have been running, in seconds.

Returns: dictionary with currently running jobID keys and how many seconds they have been running as the value

getUpdatedBatchJob(maxWait: int) → Optional[toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo]

Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.

Does not return info for jobs killed by killBatchJobs, although they may cause None to be returned earlier than maxWait.

Parameters: maxWait – the number of seconds to block, waiting for a result

Returns: If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
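
The job lifecycle contract described by issueBatchJob, killBatchJobs, getIssuedBatchJobIDs, getRunningBatchJobIDs and getUpdatedBatchJob can be sketched with a small in-memory stand-in. This is only an illustration under stated assumptions: ToyBatchSystem does not subclass AbstractBatchSystem, ToyJobInfo is a hypothetical substitute for UpdatedBatchJobInfo, jobs are plain strings rather than JobDescription objects, and finish() is an invented test hook that is not part of the interface.

```python
import time
from typing import Dict, List, NamedTuple, Optional


class ToyJobInfo(NamedTuple):
    """Hypothetical stand-in for UpdatedBatchJobInfo."""
    jobID: int
    exitStatus: int
    wallTime: Optional[float]


class ToyBatchSystem:
    """In-memory stand-in that mimics the method contract described above."""

    def __init__(self) -> None:
        self._next_id = 0
        self._start_times: Dict[int, float] = {}  # jobID -> start time
        self._finished: List[ToyJobInfo] = []     # results not yet reported

    def issueBatchJob(self, command: str) -> int:
        # Real batch systems take a JobDescription; a plain string is used here.
        job_id = self._next_id
        self._next_id += 1
        self._start_times[job_id] = time.monotonic()
        return job_id

    def getIssuedBatchJobIDs(self) -> List[int]:
        return list(self._start_times)

    def getRunningBatchJobIDs(self) -> Dict[int, float]:
        # Every issued job counts as running in this toy; there is no queue.
        now = time.monotonic()
        return {job_id: now - start for job_id, start in self._start_times.items()}

    def killBatchJobs(self, jobIDs: List[int]) -> None:
        for job_id in jobIDs:
            # Killed jobs vanish without ever being reported as updated.
            self._start_times.pop(job_id, None)

    def finish(self, job_id: int, exit_status: int) -> None:
        """Test hook (not part of the interface): mark a job as completed."""
        start = self._start_times.pop(job_id)
        self._finished.append(ToyJobInfo(job_id, exit_status, time.monotonic() - start))

    def getUpdatedBatchJob(self, maxWait: int) -> Optional[ToyJobInfo]:
        # A real implementation would block for up to maxWait seconds.
        return self._finished.pop(0) if self._finished else None


bs = ToyBatchSystem()
first, second = bs.issueBatchJob("echo a"), bs.issueBatchJob("echo b")
bs.killBatchJobs([second])
bs.finish(first, 0)
update = bs.getUpdatedBatchJob(maxWait=0)
```

In this sketch the finished job is reported exactly once, and the killed job never shows up in either getRunningBatchJobIDs or getUpdatedBatchJob.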

getSchedulingStatusMessage() → Optional[str]

Get a log message fragment for the user about anything that might be going wrong in the batch system, if available.

If no useful message is available, return None.

This can be used to report what resource is the limiting factor when scheduling jobs, for example. If the leader thinks the workflow is stuck, the message can be displayed to the user to help them diagnose why it might be stuck.

Returns: User-directed message about scheduling state.

shutdown() → None

Called at the completion of a Toil invocation. Should cleanly terminate all worker threads.

setEnv(name: str, value: Optional[str] = None) → None

Set an environment variable for the worker process before it is launched. The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.

If no value is provided it will be looked up from the current environment.
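
The note to implementors about copying variables can be made concrete with a small sketch. The class EnvTrackingBatchSystem and its enqueue() method are hypothetical and exist only for illustration; raising an error when a variable is missing from the current environment is a choice made in this sketch, not something stated above.

```python
import os
from typing import Dict, List, Optional


class EnvTrackingBatchSystem:
    """Hypothetical helper showing one way to honour the setEnv() contract."""

    def __init__(self) -> None:
        self.environment: Dict[str, str] = {}
        self.queue: List[Dict[str, str]] = []  # one environment snapshot per job

    def setEnv(self, name: str, value: Optional[str] = None) -> None:
        if value is None:
            # No explicit value: fall back to the current environment.
            if name not in os.environ:
                raise RuntimeError(f"{name} is not set in the current environment")
            value = os.environ[name]
        self.environment[name] = value

    def enqueue(self) -> int:
        # Copy the mapping so that later setEnv() calls do not alter this job.
        self.queue.append(dict(self.environment))
        return len(self.queue) - 1


bs = EnvTrackingBatchSystem()
bs.setEnv("TOY_MODE", "fast")
first = bs.enqueue()
bs.setEnv("TOY_MODE", "slow")
second = bs.enqueue()
```

Because each job receives its own copy, the first job keeps TOY_MODE=fast even though the variable was changed before the second job was enqueued.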

classmethod add_options(parser: Union[argparse.ArgumentParser, argparse._ArgumentGroup]) → None

If this batch system provides any command line options, add them to the given parser.
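
As a rough sketch of what such a hook might look like, the following uses only the standard argparse module. The class ToyOptionsBatchSystem and the --toyQueue flag are invented for illustration and are not real Toil options.

```python
import argparse
from typing import Union


class ToyOptionsBatchSystem:
    """Hypothetical batch system; only the option hook is shown."""

    @classmethod
    def add_options(
        cls, parser: Union[argparse.ArgumentParser, argparse._ArgumentGroup]
    ) -> None:
        # --toyQueue is an invented flag used purely for illustration.
        parser.add_argument(
            "--toyQueue",
            dest="toy_queue",
            default=None,
            help="Queue to submit jobs to (hypothetical option).",
        )


parser = argparse.ArgumentParser()
group = parser.add_argument_group("toy batch system options")
ToyOptionsBatchSystem.add_options(group)
options = parser.parse_args(["--toyQueue", "batch.q"])
```

Accepting either a parser or an argument group lets the caller decide whether the options appear under their own heading in the help output.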

classmethod setOptions(setOption: Callable[[str, Optional[Callable[[Any], OptionType]], Optional[Callable[[OptionType], None]], Optional[OptionType], Optional[List[str]]], None]) → None

Process command line or configuration options relevant to this batch system.

Parameters: setOption – A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.

getWorkerContexts() → List[AbstractContextManager[Any]]

Get a list of picklable context manager objects to wrap worker work in, in order.

Can be used to ask the Toil worker to do things in-process (such as configuring environment variables, hot-deploying user scripts, or cleaning up a node) that would otherwise require a wrapping “executor” process.
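
A minimal sketch of the idea follows, assuming a worker that enters the contexts in list order and unwinds them in reverse. SetEnvVar and run_wrapped are hypothetical names and do not reflect how the Toil worker is actually implemented. The context manager stores only plain strings and is defined at module level, which is what would normally make instances picklable.

```python
import os
from contextlib import ExitStack
from typing import Any, Callable, List, Optional


class SetEnvVar:
    """Context manager that sets one environment variable; holds only strings."""

    def __init__(self, name: str, value: str) -> None:
        self.name = name
        self.value = value
        self._previous: Optional[str] = None

    def __enter__(self) -> "SetEnvVar":
        self._previous = os.environ.get(self.name)
        os.environ[self.name] = self.value
        return self

    def __exit__(self, exc_type: Any, exc: Any, tb: Any) -> None:
        # Restore whatever was there before the wrapped work started.
        if self._previous is None:
            os.environ.pop(self.name, None)
        else:
            os.environ[self.name] = self._previous


def run_wrapped(contexts: List[Any], work: Callable[[], Any]) -> Any:
    """Enter each context in list order, run the work, then unwind in reverse."""
    with ExitStack() as stack:
        for context in contexts:
            stack.enter_context(context)
        return work()


contexts = [SetEnvVar("TOY_STAGE", "outer"), SetEnvVar("TOY_STAGE", "inner")]
seen = run_wrapped(contexts, lambda: os.environ["TOY_STAGE"])
```

Since the second context is entered last, the wrapped work sees the value it set, and both changes are undone once the work returns.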