Job Methods API

Jobs are the units of work in Toil which are composed into workflows.

class toil.job.Job(memory=None, cores=None, disk=None, preemptable=None, unitName=None, checkpoint=False, displayName=None)[source]

Represents a unit of work in Toil.

__init__(memory=None, cores=None, disk=None, preemptable=None, unitName=None, checkpoint=False, displayName=None)[source]

This method must be called by any overriding constructor.

Parameters:
  • memory (int or string convertible by toil.lib.humanize.human2bytes to an int) – the maximum number of bytes of memory the job will require to run.
  • cores (int or string convertible by toil.lib.humanize.human2bytes to an int) – the number of CPU cores required.
  • disk (int or string convertible by toil.lib.humanize.human2bytes to an int) – the amount of local disk space required by the job, expressed in bytes.
  • preemptable (bool) – if the job can be run on a preemptable node.
  • checkpoint (bool) – if any of this job’s successor jobs fails completely, exhausting all its retries, remove any successor jobs and rerun this job to restart the subtree. The job must be a leaf vertex in the job graph when initially defined; see toil.job.Job.checkNewCheckpointsAreLeafVertices().
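
As an illustration of the constructor contract above, here is a minimal sketch of a Job subclass that forwards its resource requirements to the base constructor. The class name LineCounter and its path argument are hypothetical. The fallback Job stub is not part of Toil; it is only there so the sketch can be loaded where Toil is not installed.

```python
try:
    from toil.job import Job
except ImportError:
    # Fallback stub (NOT Toil code): mirrors the documented constructor
    # signature so that the sketch below can be loaded without Toil.
    class Job:
        def __init__(self, memory=None, cores=None, disk=None, preemptable=None,
                     unitName=None, checkpoint=False, displayName=None):
            self.memory, self.cores, self.disk = memory, cores, disk


class LineCounter(Job):  # hypothetical example class
    def __init__(self, path):
        # Any overriding constructor must call the base constructor.
        # Requirements may be ints (bytes) or human-readable strings.
        Job.__init__(self, memory="100M", cores=1, disk="1G")
        self.path = path

    def run(self, fileStore):
        # The return value can be passed to other jobs by means of rv().
        with open(self.path) as handle:
            return sum(1 for _ in handle)
```

Within a workflow, Toil would call run() with a real file store; the sketch ignores the fileStore argument because it only reads a file given by path.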
run(fileStore)[source]

Override this function to perform work and dynamically create successor jobs.

Parameters:fileStore (toil.fileStores.abstractFileStore.AbstractFileStore) – Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns:The return value of the function can be passed to other jobs by means of toil.job.Job.rv().
addChild(childJob)[source]

Adds childJob to be run as child of this job. Child jobs will be run directly after this job’s toil.job.Job.run() method has completed.

Parameters:childJob (toil.job.Job) –
Returns:childJob
Return type:toil.job.Job
hasChild(childJob)[source]

Check if childJob is already a child of this job.

Parameters:childJob (toil.job.Job) –
Returns:True if childJob is a child of the job, else False.
Return type:bool
addFollowOn(followOnJob)[source]

Adds a follow-on job. Follow-on jobs will be run after the child jobs and their successors have been run.

Parameters:followOnJob (toil.job.Job) –
Returns:followOnJob
Return type:toil.job.Job
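
The ordering rules for children and follow-ons can be sketched with a toy model. ToyJob and execute are hypothetical stand-ins written for this illustration, not Toil classes, and the model runs everything serially, whereas a real scheduler may be free to run independent jobs concurrently.

```python
class ToyJob:
    """Tiny stand-in for a job; NOT toil.job.Job."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.follow_ons = []

    def addChild(self, job):
        self.children.append(job)
        return job

    def addFollowOn(self, job):
        self.follow_ons.append(job)
        return job


def execute(job, order):
    """Record a serial execution order that respects the documented rules."""
    order.append(job.name)            # the job's own run() comes first
    for child in job.children:        # then each child and all of its successors
        execute(child, order)
    for follow_on in job.follow_ons:  # follow-ons only after all of the above
        execute(follow_on, order)
    return order


root = ToyJob("root")
child = root.addChild(ToyJob("child"))
child.addChild(ToyJob("grandchild"))
root.addFollowOn(ToyJob("cleanup"))
print(execute(root, []))  # ['root', 'child', 'grandchild', 'cleanup']
```

Note that "cleanup" runs after "grandchild" even though it was attached directly to the root: a follow-on waits for the children and their successors.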
hasFollowOn(followOnJob)[source]

Check if given job is already a follow-on of this job.

Parameters:followOnJob (toil.job.Job) –
Returns:True if the followOnJob is a follow-on of this job, else False.
Return type:bool
addService(service, parentService=None)[source]

Add a service.

The toil.job.Job.Service.start() method of the service will be called after the run method has completed but before any successors are run. The service’s toil.job.Job.Service.stop() method will be called once the successors of the job have been run.

Services allow things like databases and servers to be started and accessed by jobs in a workflow.

Raises:

toil.job.JobException – If service has already been made the child of a job or another service.

Parameters:
  • service (toil.job.Job.Service) – Service to add.
  • parentService (toil.job.Job.Service) – Service that will be started before ‘service’ is started. Allows trees of services to be established. parentService must be a service of this job.
Returns:

a promise that will be replaced with the return value from toil.job.Job.Service.start() of service in any successor of the job.

Return type:

toil.job.Promise
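
The service lifecycle described above can be sketched as a toy model. ToyService, run_job_with_service and the connection string are hypothetical; the sketch only shows the order of calls and how the value returned by start() reaches the successors. What happens when a successor fails is not modelled.

```python
class ToyService:
    """Stand-in for a service; NOT toil.job.Job.Service."""

    def __init__(self, events):
        self.events = events

    def start(self):
        self.events.append("service.start")
        return "db://example-host:5432"  # hypothetical connection string

    def stop(self):
        self.events.append("service.stop")


def run_job_with_service(run, service, successors):
    """Toy model of the documented call order."""
    run()                      # 1. the job's own run method
    handle = service.start()   # 2. start the service, keep its return value
    for successor in successors:
        successor(handle)      # 3. successors see the value returned by start()
    service.stop()             # 4. stop once the successors have been run


events = []
run_job_with_service(
    run=lambda: events.append("job.run"),
    service=ToyService(events),
    successors=[lambda handle: events.append("successor uses " + handle)],
)
print(events)
```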

addChildFn(fn, *args, **kwargs)[source]

Adds a function as a child job.

Parameters:fn – Function to be run as a child job with *args and **kwargs as arguments to this function. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns:The new child job that wraps fn.
Return type:toil.job.FunctionWrappingJob
addFollowOnFn(fn, *args, **kwargs)[source]

Adds a function as a follow-on job.

Parameters:fn – Function to be run as a follow-on job with *args and **kwargs as arguments to this function. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns:The new follow-on job that wraps fn.
Return type:toil.job.FunctionWrappingJob
addChildJobFn(fn, *args, **kwargs)[source]

Adds a job function as a child job. See toil.job.JobFunctionWrappingJob for a definition of a job function.

Parameters:fn – Job function to be run as a child job with *args and **kwargs as arguments to this function. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns:The new child job that wraps fn.
Return type:toil.job.JobFunctionWrappingJob
addFollowOnJobFn(fn, *args, **kwargs)[source]

Add a follow-on job function. See toil.job.JobFunctionWrappingJob for a definition of a job function.

Parameters:fn – Job function to be run as a follow-on job with *args and **kwargs as arguments to this function. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns:The new follow-on job that wraps fn.
Return type:toil.job.JobFunctionWrappingJob
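
The difference between the Fn and JobFn variants can be sketched as follows, assuming that a job function is one whose first argument is the job that wraps it. The function names double, fan_out and build_root are hypothetical. The functions are defined at module level, and Toil is imported only inside build_root so that the rest of the sketch can be read and exercised without it.

```python
def double(x):
    """Plain function: receives only its own arguments."""
    return 2 * x


def fan_out(job, n):
    """Job function: the first argument is the wrapping job, so the function
    can extend the workflow while it runs."""
    for i in range(n):
        job.addChildFn(double, i)  # each call creates one child job
    return n


def build_root():
    """Wrap fan_out as a job (requires Toil to be installed)."""
    from toil.job import Job
    # memory is a resource requirement keyword and is not passed on to fan_out.
    return Job.wrapJobFn(fan_out, 3, memory="100M")
```

A plain function such as double would be attached with addChildFn or wrapFn; fan_out needs the JobFn variants because it expects the job as its first argument.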
tempDir

Shortcut to calling job.fileStore.getLocalTempDir(). The temporary directory is created on the first call, and the same path is returned on that and all later calls.

Returns:Path to tempDir. See job.fileStore.getLocalTempDir().
Return type:str

log(text, level=20)[source]

Convenience wrapper for fileStore.logToMaster().

static wrapFn(fn, *args, **kwargs)[source]

Makes a Job out of a function. Convenience function for constructor of toil.job.FunctionWrappingJob.

Parameters:fn – Function to be run with *args and **kwargs as arguments. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns:The new job that wraps fn.
Return type:toil.job.FunctionWrappingJob
static wrapJobFn(fn, *args, **kwargs)[source]

Makes a Job out of a job function. Convenience function for constructor of toil.job.JobFunctionWrappingJob.

Parameters:fn – Job function to be run with *args and **kwargs as arguments. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns:The new job that wraps fn.
Return type:toil.job.JobFunctionWrappingJob
encapsulate()[source]

Encapsulates the job; see toil.job.EncapsulatedJob. Convenience function for constructor of toil.job.EncapsulatedJob.

Returns:an encapsulated version of this job.
Return type:toil.job.EncapsulatedJob
rv(*path)[source]

Creates a promise (toil.job.Promise) representing a return value of the job’s run method, or, in case of a function-wrapping job, the wrapped function’s return value.

Parameters:path ((Any)) – Optional path for selecting a component of the promised return value. If absent or empty, the entire return value will be used. Otherwise, the first element of the path is used to select an individual item of the return value. For that to work, the return value must be a list, a dictionary or any other type implementing the __getitem__() magic method. If the selected item is yet another composite value, the second element of the path can be used to select an item from it, and so on. For example, if the return value is [6, {'a': 42}], .rv(0) would select 6, .rv(1) would select {'a': 42} and .rv(1, 'a') would select 42. To select a slice from a return value that is sliceable, e.g. a tuple or list, the path element should be a slice object. For example, assuming that the return value is [6, 7, 8, 9], .rv(slice(1, 3)) would select [7, 8]. Note that slicing really only makes sense at the end of the path.
Returns:A promise representing the return value of this job’s toil.job.Job.run() method.
Return type:toil.job.Promise
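
The path semantics of rv() can be mimicked on plain values. The helper select below is hypothetical and does not involve promises at all; it only shows which component each path would pick once the return value is known, assuming one __getitem__() lookup per path element.

```python
def select(value, *path):
    """Apply one __getitem__ lookup per path element, left to right."""
    for key in path:
        value = value[key]
    return value


result = [6, {"a": 42}]
print(select(result))          # [6, {'a': 42}]  (empty path: whole value)
print(select(result, 0))       # 6
print(select(result, 1))       # {'a': 42}
print(select(result, 1, "a"))  # 42
print(select([6, 7, 8, 9], slice(1, 3)))  # [7, 8]
```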
prepareForPromiseRegistration(jobStore)[source]

Ensure that a promise by this job (the promissor) can register with the promissor when another job referring to the promise (the promissee) is being serialized. The promissee holds the reference to the promise (usually as part of the job arguments), and when it is pickled, so are the promises it refers to. Pickling a promise triggers it to be registered with the promissor.

checkJobGraphForDeadlocks()[source]

See toil.job.Job.checkJobGraphConnected(), toil.job.Job.checkJobGraphAcylic() and toil.job.Job.checkNewCheckpointsAreLeafVertices() for more info.

Raises:toil.job.JobGraphDeadlockException – if the job graph is cyclic, contains multiple roots or contains checkpoint jobs that are not leaf vertices when defined (see toil.job.Job.checkNewCheckpointsAreLeafVertices()).
getRootJobs()[source]
Returns:The roots of the connected component of jobs that contains this job. A root is a job with no predecessors.

Return type:set of toil.job.Job instances

checkJobGraphConnected()[source]
Raises:toil.job.JobGraphDeadlockException – if toil.job.Job.getRootJobs() does not contain exactly one root job.

As execution always starts from one root job, having multiple root jobs will cause a deadlock to occur.
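
A minimal sketch of the root computation on a plain adjacency mapping, not Toil's implementation. The function root_jobs and the dictionary representation (job name mapped to a list of successor names) are assumptions made for this illustration.

```python
def root_jobs(start, successors):
    """Return the roots of the connected component that contains `start`.

    `successors` maps each job to the jobs that directly depend on it.
    """
    # Invert the edges so that predecessors can be looked up.
    predecessors = {job: set() for job in successors}
    for job, succs in successors.items():
        for succ in succs:
            predecessors.setdefault(succ, set()).add(job)

    # Explore the component, following edges in both directions.
    seen, stack = set(), [start]
    while stack:
        job = stack.pop()
        if job in seen:
            continue
        seen.add(job)
        stack.extend(successors.get(job, ()))
        stack.extend(predecessors.get(job, ()))

    # A root is a job with no predecessors.
    return {job for job in seen if not predecessors.get(job)}


connected = {"A": ["B", "C"], "B": [], "C": []}
two_roots = {"A": ["C"], "B": ["C"], "C": []}
print(sorted(root_jobs("C", connected)))  # ['A']
print(sorted(root_jobs("C", two_roots)))  # ['A', 'B']
```

In the second graph the check described above would fail, because the component has two roots rather than exactly one.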

checkJobGraphAcylic()[source]
Raises:toil.job.JobGraphDeadlockException – if the connected component of jobs containing this job contains any cycles of child/followOn dependencies in the augmented job graph (see below). Such cycles are not allowed in valid job graphs.

A follow-on edge (A, B) between two jobs A and B is equivalent to adding a child edge to B from (1) A, (2) each child of A, and (3) the successors of each child of A. We call each such edge an “implied” edge. The augmented job graph is a job graph including all the implied edges.

For a job graph G = (V, E) the algorithm is O(|V|^2). It is O(|V| + |E|) for a graph with no follow-ons. The O(|V|^2) bound for graphs with follow-ons could be improved.
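
The augmented job graph can be sketched on plain dictionaries. This is an illustrative reading of the definition above, not Toil's algorithm; augmented_edges, has_cycle and the two-dictionary representation (child edges and follow-on edges kept separately) are assumptions.

```python
def augmented_edges(children, follow_ons):
    """Return the child edges plus the implied edges of every follow-on edge."""
    def successors(job):
        return list(children.get(job, ())) + list(follow_ons.get(job, ()))

    edges = {(a, c) for a, cs in children.items() for c in cs}
    for a, targets in follow_ons.items():
        # Collect each child of A and all successors of those children.
        below, stack = set(), list(children.get(a, ()))
        while stack:
            job = stack.pop()
            if job not in below:
                below.add(job)
                stack.extend(successors(job))
        for b in targets:
            edges.add((a, b))                        # (1) from A
            edges.update((job, b) for job in below)  # (2) and (3)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(graph, WHITE)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # reached a job that is still on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# C is a child of both A and B, and B is a follow-on of A.
children = {"A": ["C"], "B": ["C"]}
follow_ons = {"A": ["B"]}
raw = {("A", "C"), ("B", "C"), ("A", "B")}
print(has_cycle(raw))                                    # False
print(has_cycle(augmented_edges(children, follow_ons)))  # True
```

The raw edges look acyclic, but the follow-on edge implies C before B while the child edge requires B before C, so the augmented graph contains a cycle.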

checkNewCheckpointsAreLeafVertices()[source]

A checkpoint job is a job that is restarted if either it fails, or if any of its successors fails completely, exhausting its retries.

A job is a leaf if it has no successors.

A checkpoint job must be a leaf when initially added to the job graph. When its run method is invoked it can then create direct successors. This restriction is made to simplify implementation.

Raises:toil.job.JobGraphDeadlockException – if there exists a job being added to the graph for which checkpoint=True and which is not a leaf.
defer(function, *args, **kwargs)[source]

Register a deferred function, i.e. a callable that will be invoked after the current attempt at running this job concludes. A job attempt is said to conclude when the job function (or the toil.job.Job.run() method for class-based jobs) returns, raises an exception or after the process running it terminates abnormally. A deferred function will be called on the node that attempted to run the job, even if a subsequent attempt is made on another node. A deferred function should be idempotent because it may be called multiple times on the same node or even in the same process. More than one deferred function may be registered per job attempt by calling this method repeatedly with different arguments. If the same function is registered twice with the same or different arguments, it will be called twice per job attempt.

Examples for deferred functions are ones that handle cleanup of resources external to Toil, like Docker containers, files outside the work directory, etc.

Parameters:
  • function (callable) – The function to be called after this job concludes.
  • args (list) – The arguments to the function.
  • kwargs (dict) – The keyword arguments to the function.
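
The behaviour of deferred functions within a single attempt can be sketched with a toy runner. run_attempt, remove_if_present and failing_body are hypothetical names; the sketch covers only the cases where the job body returns or raises, not abnormal process termination, and the order in which the deferred functions are called here is an assumption.

```python
import os
import tempfile


def run_attempt(job_body):
    """Toy model: call every deferred function once the attempt concludes."""
    deferred = []

    def defer(function, *args, **kwargs):
        deferred.append((function, args, kwargs))

    try:
        return job_body(defer)
    finally:
        # Runs whether job_body returned normally or raised.
        for function, args, kwargs in deferred:
            function(*args, **kwargs)


def remove_if_present(path):
    """Idempotent cleanup: calling it again after the file is gone is harmless."""
    if os.path.exists(path):
        os.remove(path)


created = []


def failing_body(defer):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    created.append(path)
    defer(remove_if_present, path)
    defer(remove_if_present, path)  # registered twice, so called twice
    raise RuntimeError("simulated job failure")


try:
    run_attempt(failing_body)
except RuntimeError:
    pass
print(os.path.exists(created[0]))  # False
```

Registering the cleanup twice is deliberate: the second call finds nothing to remove, which is what makes an idempotent deferred function safe to repeat.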
getTopologicalOrderingOfJobs()[source]
Returns:a list of jobs such that for all pairs of indices i, j for which i < j, the job at index i can be run before the job at index j.
Return type:list
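
The ordering property stated above can be illustrated with a standard topological sort (Kahn's algorithm) on a plain adjacency mapping. This is a generic sketch, not the method Toil uses; topological_order and the dictionary representation are assumptions.

```python
from collections import deque


def topological_order(successors):
    """Return the jobs so that every job appears before its successors."""
    indegree = {job: 0 for job in successors}
    for succs in successors.values():
        for succ in succs:
            indegree[succ] = indegree.get(succ, 0) + 1

    ready = deque(job for job, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for succ in successors.get(job, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:  # all predecessors have been placed
                ready.append(succ)

    if len(order) != len(indegree):
        raise ValueError("job graph contains a cycle")
    return order


graph = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(topological_order(graph))  # ['root', 'a', 'b', 'c']
```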