The batch system interface
The batch system interface is used by Toil to abstract over different ways of running
batches of jobs, for example Slurm, GridEngine, Mesos, Parasol and a single node. The
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem API is implemented to
run jobs using a given job management system, e.g. Mesos.
Environment variables allow scheduler-specific parameters to be passed, for example:
export TOIL_SLURM_ARGS="-t 1:00:00 -q fatq"
For GridEngine (SGE, UGE), an additional environment variable defines the parallel environment used to run multicore jobs:
export TOIL_GRIDENGINE_PE='smp'
export TOIL_GRIDENGINE_ARGS='-q batch.q'
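As a rough illustration of how a batch system implementation might consume such a variable, the sketch below reads a value like TOIL_SLURM_ARGS and splits it into an argument list. The helper name scheduler_args is hypothetical and not part of Toil; the way Toil itself parses these variables may differ.

```python
import os
import shlex


def scheduler_args(var_name, environ=None):
    """Return the extra scheduler arguments stored in var_name as a list.

    Hypothetical helper for illustration only; not part of Toil's API.
    """
    environ = os.environ if environ is None else environ
    # shlex.split honours shell-style quoting, so quoted values stay intact.
    return shlex.split(environ.get(var_name, ""))


# A plain dict stands in for the real environment to keep the example reproducible.
args = scheduler_args("TOIL_SLURM_ARGS", {"TOIL_SLURM_ARGS": "-t 1:00:00 -q fatq"})
print(args)  # ['-t', '1:00:00', '-q', 'fatq']
```

An unset variable yields an empty list, so the result can be appended to a submission command without further checks.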
An abstract (as far as Python currently allows) base class to represent the interface the batch system must provide to Toil.
Whether this batch system supports hot deployment of the user script itself. If it does, the
setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
Return type: bool
Indicates whether this batch system invokes
workerCleanup() after the last job for a particular workflow invocation finishes. Note that the term worker refers to an entire node, not just a worker process. A worker process may run more than one job sequentially, and more than one concurrent worker process may exist on a worker node, for the same workflow. The batch system is said to shut down after the last worker process terminates.
Return type: bool
Set the user script for this workflow. This method must be called before the first job is issued to this batch system, and only if
supportsHotDeployment() returns True, otherwise it will raise an exception.
Parameters: userScript (toil.resource.Resource) – the resource object representing the user script or module and the modules it depends on.
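The contract between supportsHotDeployment() and setUserScript() can be sketched with a toy class. This is a stand-in for illustration, not Toil's AbstractBatchSystem: the class names, the issue() method and the choice of exception types are assumptions, as is raising an error when the script is set after a job has already been issued.

```python
class ToyBatchSystem:
    """Toy stand-in for a batch system; not Toil's actual base class."""

    def __init__(self):
        self.userScript = None
        self.jobsIssued = 0

    def supportsHotDeployment(self):
        return True

    def setUserScript(self, userScript):
        # Only valid when hot deployment is supported.
        if not self.supportsHotDeployment():
            raise NotImplementedError("hot deployment is not supported")
        # Assumption: a late call is rejected rather than silently accepted.
        if self.jobsIssued > 0:
            raise RuntimeError("the user script must be set before the first job is issued")
        self.userScript = userScript

    def issue(self, command):
        # Hypothetical method that only counts issued jobs.
        self.jobsIssued += 1
        return self.jobsIssued


class NoHotDeployBatchSystem(ToyBatchSystem):
    """Variant that reports no hot deployment support."""

    def supportsHotDeployment(self):
        return False
```

A caller would check supportsHotDeployment() first and call setUserScript() only when it returns True.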
Issues a job with the specified command to the batch system and returns a unique jobID.
Parameters:
- command (str) – the string to run as a command
- memory (int) – int giving the number of bytes of memory the job needs to run
- cores (float) – the number of cores needed for the job
- disk (int) – int giving the number of bytes of disk space the job needs to run
- preemptable (bool) – True if the job can be run on a preemptable node
Returns: a unique jobID that can be used to reference the newly issued job
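A minimal sketch of the issuing step, assuming an in-memory bookkeeping table rather than a real scheduler. The class ToyIssuer and its issue() method are hypothetical names; the parameters mirror the list above, and integer job IDs are an arbitrary choice for the example.

```python
import itertools


class ToyIssuer:
    """Hands out unique job IDs and records each job's resource request."""

    def __init__(self):
        self._ids = itertools.count()
        self.issued = {}

    def issue(self, command, memory, cores, disk, preemptable):
        # Reject requests that could never be scheduled.
        if memory < 0 or disk < 0 or cores <= 0:
            raise ValueError("resource requirements must be positive")
        jobID = next(self._ids)
        self.issued[jobID] = {
            "command": command,
            "memory": memory,            # bytes
            "cores": cores,              # may be fractional
            "disk": disk,                # bytes
            "preemptable": preemptable,
        }
        return jobID


issuer = ToyIssuer()
first = issuer.issue("echo hello", memory=2 * 1024**3, cores=0.5, disk=10 * 1024**3, preemptable=True)
second = issuer.issue("echo world", memory=1024**3, cores=1, disk=1024**3, preemptable=False)
print(first, second)  # 0 1
```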
Kills the given job IDs.
Parameters: jobIDs (list[int]) – list of IDs of jobs to kill
Gets all currently issued jobs
Returns: A list of jobs (as jobIDs) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
Return type: list[str]
Gets a map of jobs as jobIDs that are currently running (not just waiting) and how long they have been running, in seconds.
Returns: dictionary with currently running jobID keys and how many seconds they have been running as the value
Return type: dict[str,float]
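One way an implementation might build this mapping is to record a start timestamp per running job and subtract it from the current time on request. The function running_times below is a hypothetical helper, not part of Toil.

```python
import time


def running_times(start_times, now=None):
    """Map each running jobID to the number of seconds it has been running.

    start_times maps jobID -> start timestamp in seconds, as returned by time.time().
    """
    now = time.time() if now is None else now
    return {jobID: now - started for jobID, started in start_times.items()}


# A fixed "now" keeps the example deterministic.
print(running_times({"job-a": 10.0, "job-b": 12.5}, now=15.0))
# {'job-a': 5.0, 'job-b': 2.5}
```

Jobs that are issued but still waiting would simply have no entry in start_times, which keeps them out of the result.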
Returns a job that has updated its status.
Parameters: maxWait (float) – the number of seconds to block, waiting for a result
Return type: (str, int)|None
Returns: If a result is available, returns a tuple (jobID, exitValue, wallTime). Otherwise it returns None. wallTime is the number of seconds (a float) in wall-clock time the job ran for or None if this batch system does not support tracking wall time. Returns None for jobs that were killed.
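The blocking behaviour described above maps naturally onto a thread-safe queue with a timeout. The sketch below assumes that finished jobs are pushed onto a queue.Queue by some other thread; the function name get_updated_job is hypothetical and this is not how any particular Toil batch system is implemented.

```python
import queue


def get_updated_job(updates, maxWait):
    """Return the next (jobID, exitValue, wallTime) tuple, or None on timeout."""
    try:
        # Blocks for at most maxWait seconds.
        return updates.get(timeout=maxWait)
    except queue.Empty:
        return None


updates = queue.Queue()
updates.put(("job-1", 0, 12.5))

print(get_updated_job(updates, maxWait=0.1))  # ('job-1', 0, 12.5)
print(get_updated_job(updates, maxWait=0.1))  # None
```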
Called at the completion of a Toil invocation. Should cleanly terminate all worker threads.
Set an environment variable for the worker process before it is launched. The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.
If no value is provided it will be looked up from the current environment.
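The note to implementors can be illustrated with a toy class that snapshots the environment each time a job is enqueued, so that later calls affect only jobs issued afterwards. The class name, the enqueue() method, the parameter names (name, value) and the error raised for a variable missing from the current environment are all assumptions made for this sketch.

```python
import os


class EnvTrackingBatchSystem:
    """Toy example of per-job environment snapshots; not Toil's implementation."""

    def __init__(self):
        self.environment = {}
        self.queue = []

    def setEnv(self, name, value=None):
        if value is None:
            # Fall back to the current process environment.
            try:
                value = os.environ[name]
            except KeyError:
                raise RuntimeError("%s is not set in the current environment" % name)
        self.environment[name] = value

    def enqueue(self, command):
        # Copy the dict so later setEnv calls do not alter queued jobs.
        self.queue.append((command, dict(self.environment)))


bs = EnvTrackingBatchSystem()
bs.setEnv("FOO", "1")
bs.enqueue("first")
bs.setEnv("FOO", "2")
bs.enqueue("second")
print([env["FOO"] for _, env in bs.queue])  # ['1', '2']
```

Without the copy in enqueue(), both queued jobs would share one dictionary and the first job would see FOO=2.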
Gets the period of time to wait (floating point, in seconds) between checking for missing/overlong jobs.