toil.common

Module Contents

Classes

Config

Class to represent configuration operations for a Toil workflow run.

Toil

A context manager that represents a Toil workflow.

ToilMetrics

Functions

parser_with_common_options([provisioner_options, ...])

addOptions(parser[, config, jobstore_as_flag])

Add Toil command line options to a parser.

parseBool(val)

getNodeID()

Return unique ID of the current node (host). The resulting string will be convertible to a uuid.UUID.

parseSetEnv(l)

Parse a list of strings of the form "NAME=VALUE" or just "NAME" into a dictionary.

iC(minValue[, maxValue])

Returns a function that checks if a given int is in the given half-open interval.

fC(minValue[, maxValue])

Returns a function that checks if a given float is in the given half-open interval.

parse_accelerator_list(specs)

Parse a string description of one or more accelerator requirements.

cacheDirName(workflowID)

Return the name of the cache directory.

getDirSizeRecursively(dirPath)

Return the cumulative number of bytes occupied by the files on disk in the directory and its subdirectories.

getFileSystemSize(dirPath)

Return the free space, and total size of the file system hosting dirPath.

safeUnpickleFromStream(stream)

Attributes

defaultTargetTime

SYS_MAX_SIZE

UUID_LENGTH

logger

JOBSTORE_HELP

toil.common.defaultTargetTime = 1800
toil.common.SYS_MAX_SIZE = 9223372036854775807
toil.common.UUID_LENGTH = 32
toil.common.logger
class toil.common.Config[source]

Class to represent configuration operations for a Toil workflow run.

logFile: Optional[str]
logRotating: bool
cleanWorkDir: str
max_jobs: int
max_local_jobs: int
run_local_jobs_on_workers: bool
tes_endpoint: str
tes_user: str
tes_password: str
tes_bearer_token: str
jobStore: str
batchSystem: str
batch_logs_dir: Optional[str]

The backing scheduler will be instructed, if possible, to save logs to this directory, where the leader can read them.

workflowAttemptNumber: int
disableAutoDeployment: bool
workflowID: Optional[str]

This attribute uniquely identifies the job store and therefore the workflow. It is needed to distinguish between two consecutive workflows for which self.jobStore is the same, e.g. when a job store name is reused after a previous run has finished successfully and its job store has been cleaned up.

prepare_start()[source]

After options are set, prepare for initial start of workflow.

Return type

None

prepare_restart()[source]

Before restart options are set, prepare for a restart of a workflow. Set up any execution-specific parameters and clear out any stale ones.

Return type

None

setOptions(options)[source]

Creates a config object from the options object.

Parameters

options (argparse.Namespace) –

Return type

None

__eq__(other)[source]

Return self==value.

Parameters

other (object) –

Return type

bool

__hash__()[source]

Return hash(self).

Return type

int

toil.common.JOBSTORE_HELP = Multiline-String
"""The location of the job store for the workflow.  A job store holds persistent information about the jobs, stats, and files in a workflow. If the workflow is run with a distributed batch system, the job store must be accessible by all worker nodes. Depending on the desired job store implementation, the location should be formatted according to one of the following schemes:

file:<path> where <path> points to a directory on the file system

aws:<region>:<prefix> where <region> is the name of an AWS region like us-west-2 and <prefix> will be prepended to the names of any top-level AWS resources in use by job store, e.g. S3 buckets.

 google:<project_id>:<prefix> TODO: explain

For backwards compatibility, you may also specify ./foo (equivalent to file:./foo or just file:foo) or /bar (equivalent to file:/bar)."""
toil.common.parser_with_common_options(provisioner_options=False, jobstore_option=True)[source]
Parameters
  • provisioner_options (bool) –

  • jobstore_option (bool) –

Return type

argparse.ArgumentParser

toil.common.addOptions(parser, config=None, jobstore_as_flag=False)[source]

Add Toil command line options to a parser.

Parameters
  • config (Optional[Config]) – If specified, take defaults from the given Config.

  • jobstore_as_flag (bool) – Make the job store option a --jobStore flag instead of a required jobStore positional argument.

  • parser (argparse.ArgumentParser) –

Return type

None

toil.common.parseBool(val)[source]
Parameters

val (str) –

Return type

bool

toil.common.getNodeID()[source]

Return unique ID of the current node (host). The resulting string will be convertible to a uuid.UUID.

Tries several methods until success. The returned ID should be identical across calls from different processes on the same node at least until the next OS reboot.

The last-resort method is uuid.getnode(), which in some rare OS configurations may return a random ID each time it is called. However, this method should never be reached on a Linux system, because reading from /proc/sys/kernel/random/boot_id is tried first. If uuid.getnode() is reached, it is called twice, and an exception is raised if the two values are not identical.

Return type

str
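The fallback order described above can be sketched as follows. This is an illustrative reconstruction, not Toil's implementation: the name `node_id_sketch`, its two parameters, and the choice of `RuntimeError` are assumptions, and only the two methods named in the text are shown (the real function tries several).

```python
import uuid
from typing import Callable


def node_id_sketch(
    boot_id_path: str = "/proc/sys/kernel/random/boot_id",
    getnode: Callable[[], int] = uuid.getnode,
) -> str:
    """Hypothetical helper: return a 32-character hex ID for this host."""
    # Preferred method on Linux: the kernel's per-boot random ID.
    try:
        with open(boot_id_path) as handle:
            candidate = handle.read().strip().replace("-", "")
        uuid.UUID(candidate)  # raises ValueError if the content is malformed
        return candidate
    except (OSError, ValueError):
        pass  # fall through to the last-resort method

    # Last resort: uuid.getnode() may be random on some hosts, so call it
    # twice and refuse to continue if the two answers differ.
    first, second = getnode(), getnode()
    if first != second:
        raise RuntimeError("uuid.getnode() is not stable on this host")
    return uuid.UUID(int=first).hex
```

Making the path and the `getnode` callable injectable is a choice made for this sketch so that each branch can be exercised in isolation.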

class toil.common.Toil(options)[source]

Bases: ContextManager[Toil]


A context manager that represents a Toil workflow.

Specifically the batch system, job store, and its configuration.

Parameters

options (argparse.Namespace) –

config: Config
__enter__()[source]

Derive configuration from the command line options.

Then load the job store and, on restart, consolidate the derived configuration with the one from the previous invocation of the workflow.

Return type

Toil

__exit__(exc_type, exc_val, exc_tb)[source]

Clean up after a workflow invocation.

Depending on the configuration, delete the job store.

Return type

Literal[False]

start(rootJob)[source]

Invoke a Toil workflow with the given job as the root for an initial run.

This method must be called in the body of a with Toil(...) as toil: statement. This method should not be called more than once for a workflow that has not finished.

Parameters

rootJob (toil.job.Job) – The root job of the workflow

Returns

The root job’s return value

Return type

Any

restart()[source]

Restarts a workflow that has been interrupted.

Returns

The root job’s return value

Return type

Any
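A typical driver chooses between start() and restart() inside the with block. The sketch below assumes Toil is installed and that the options come from Job.Runner.getDefaultArgumentParser(); the names `hello` and `run_or_restart` are hypothetical and exist only for illustration.

```python
from typing import Any, List


def hello(job: Any, name: str) -> str:
    """Job function; Toil passes the running job as the first argument."""
    return f"Hello, {name}!"


def run_or_restart(argv: List[str]) -> Any:
    """Hypothetical driver: start a new workflow or resume an interrupted one."""
    # Imported here so that loading this module does not itself require Toil.
    from toil.common import Toil
    from toil.job import Job

    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args(argv)

    with Toil(options) as workflow:
        if options.restart:
            # Resume from the state saved in the existing job store.
            return workflow.restart()
        # Initial run: wrap the function as the root job and start it.
        return workflow.start(Job.wrapJobFn(hello, "Toil"))
```

Calling `run_or_restart(["file:./my-jobstore"])` would be an initial run; adding `--restart` to the argument list would take the restart branch instead.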

classmethod getJobStore(locator)[source]

Create an instance of the concrete job store implementation that matches the given locator.

Parameters

locator (str) – The location of the job store to be represented by the instance

Returns

an instance of a concrete subclass of AbstractJobStore

Return type

toil.jobStores.abstractJobStore.AbstractJobStore

static parseLocator(locator)[source]
Parameters

locator (str) –

Return type

Tuple[str, str]

static buildLocator(name, rest)[source]
Parameters
  • name (str) –

  • rest (str) –

Return type

str
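Neither parseLocator() nor buildLocator() is described on this page, so the following is only a guess at their contract, derived from the locator schemes listed in JOBSTORE_HELP (a scheme name, a colon, then the rest, with bare paths treated as file locators). The function names and the `ValueError` are assumptions of this sketch.

```python
from typing import Tuple


def parse_locator_sketch(locator: str) -> Tuple[str, str]:
    """Hypothetical helper: split a locator into (scheme name, rest)."""
    # Bare paths such as ./foo or /bar are treated as file job stores,
    # matching the backwards-compatibility note in JOBSTORE_HELP.
    if locator.startswith(("/", ".")) or ":" not in locator:
        return "file", locator
    name, rest = locator.split(":", 1)
    return name, rest


def build_locator_sketch(name: str, rest: str) -> str:
    """Hypothetical inverse: join a scheme name and the rest with a colon."""
    if ":" in name:
        raise ValueError("scheme name must not contain ':'")
    return f"{name}:{rest}"
```

Splitting only on the first colon keeps `aws:<region>:<prefix>` intact as a name plus a region-and-prefix remainder.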

classmethod resumeJobStore(locator)[source]
Parameters

locator (str) –

Return type

toil.jobStores.abstractJobStore.AbstractJobStore

static createBatchSystem(config)[source]

Create an instance of the batch system specified in the given config.

Parameters

config (Config) – the current configuration

Returns

an instance of a concrete subclass of AbstractBatchSystem

Return type

toil.batchSystems.abstractBatchSystem.AbstractBatchSystem

importFile(srcUrl: str, sharedFileName: str, symlink: bool = False) -> None[source]
importFile(srcUrl: str, sharedFileName: None = None, symlink: bool = False) -> toil.fileStores.FileID
import_file(src_uri: str, shared_file_name: str, symlink: bool = False) -> None[source]
import_file(src_uri: str, shared_file_name: None = None, symlink: bool = False) -> toil.fileStores.FileID

Import the file at the given URL into the job store.

See toil.jobStores.abstractJobStore.AbstractJobStore.importFile() for a full description

exportFile(jobStoreFileID, dstUrl)[source]
Return type

None

export_file(file_id, dst_uri)[source]

Export file to destination pointed at by the destination URL.

See toil.jobStores.abstractJobStore.AbstractJobStore.exportFile() for a full description

Return type

None

static normalize_uri(uri, check_existence=False)[source]

Given a URI, if it has no scheme, prepend “file:”.

Parameters
  • check_existence (bool) – If set, raise an error if a URI points to a local file that does not exist.

  • uri (str) –

Return type

str
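The rule stated above (leave URIs with a scheme alone, turn everything else into a file URI) can be sketched with the standard library. This is an approximation under stated assumptions, not the library's code: the name `normalize_uri_sketch`, the use of an absolute path, and `FileNotFoundError` as the error type are all choices made here.

```python
import os
from urllib.parse import urlparse


def normalize_uri_sketch(uri: str, check_existence: bool = False) -> str:
    """Hypothetical helper: give scheme-less local paths a file scheme."""
    if urlparse(uri).scheme:
        # Already has a scheme (s3:, http:, file:, ...); return unchanged.
        return uri
    abs_path = os.path.abspath(uri)
    if check_existence and not os.path.exists(abs_path):
        raise FileNotFoundError(abs_path)
    return f"file://{abs_path}"
```

In this sketch only scheme-less inputs are checked for existence; whether the real method also checks inputs that already carry a file scheme is not stated on this page.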

static getToilWorkDir(configWorkDir=None)[source]

Return a path to a writable directory under which per-workflow directories exist.

This directory is always required to exist on a machine, even if the Toil worker has not run yet. If your workers and leader have different temp directories, you may need to set TOIL_WORKDIR.

Parameters

configWorkDir (Optional[str]) – Value passed to the program using the --workDir flag

Returns

Path to the Toil work directory, constant across all machines

Return type

str

classmethod get_toil_coordination_dir(config_work_dir, config_coordination_dir)[source]

Return a path to a writable directory, which will be in memory if convenient. Ought to be used for file locking and coordination.

Parameters
  • config_work_dir (Optional[str]) – Value passed to the program using the --workDir flag

  • config_coordination_dir (Optional[str]) – Value passed to the program using the --coordinationDir flag

Returns

Path to the Toil coordination directory. Ought to be on a POSIX filesystem that allows directories containing open files to be deleted.

Return type

str

classmethod getLocalWorkflowDir(workflowID, configWorkDir=None)[source]

Return the directory where worker directories and the cache will be located for this workflow on this machine.

Parameters
  • configWorkDir (Optional[str]) – Value passed to the program using the --workDir flag

  • workflowID (str) –

Returns

Path to the local workflow directory on this machine

Return type

str

classmethod get_local_workflow_coordination_dir(workflow_id, config_work_dir, config_coordination_dir)[source]

Return the directory where coordination files should be located for this workflow on this machine. These include internal Toil databases and lock files for the machine.

If an in-memory filesystem is available, it is used. Otherwise, the local workflow directory, which may be on a shared network filesystem, is used.

Parameters
  • workflow_id (str) – Unique ID of the current workflow.

  • config_work_dir (Optional[str]) – Value used for the work directory in the current Toil Config.

  • config_coordination_dir (Optional[str]) – Value used for the coordination directory in the current Toil Config.

Returns

Path to the local workflow coordination directory on this machine.

Return type

str

exception toil.common.ToilRestartException(message)[source]

Bases: Exception


Common base class for all non-exit exceptions.

Parameters

message (str) –

exception toil.common.ToilContextManagerException[source]

Bases: Exception


Common base class for all non-exit exceptions.

class toil.common.ToilMetrics(bus, provisioner=None)[source]
startDashboard(clusterName, zone)[source]
Parameters
  • clusterName (str) –

  • zone (str) –

Return type

None

add_prometheus_data_source()[source]
Return type

None

log(message)[source]
Parameters

message (str) –

Return type

None

logClusterSize(m)[source]
Parameters

m (toil.bus.ClusterSizeMessage) –

Return type

None

logClusterDesiredSize(m)[source]
Parameters

m (toil.bus.ClusterDesiredSizeMessage) –

Return type

None

logQueueSize(m)[source]
Parameters

m (toil.bus.QueueSizeMessage) –

Return type

None

logMissingJob(m)[source]
Parameters

m (toil.bus.JobMissingMessage) –

Return type

None

logIssuedJob(m)[source]
Parameters

m (toil.bus.JobIssuedMessage) –

Return type

None

logFailedJob(m)[source]
Parameters

m (toil.bus.JobFailedMessage) –

Return type

None

logCompletedJob(m)[source]
Parameters

m (toil.bus.JobCompletedMessage) –

Return type

None

shutdown()[source]
Return type

None

toil.common.parseSetEnv(l)[source]

Parse a list of strings of the form “NAME=VALUE” or just “NAME” into a dictionary.

Strings of the latter form will result in dictionary entries whose value is None.

>>> parseSetEnv([])
{}
>>> parseSetEnv(['a'])
{'a': None}
>>> parseSetEnv(['a='])
{'a': ''}
>>> parseSetEnv(['a=b'])
{'a': 'b'}
>>> parseSetEnv(['a=a', 'a=b'])
{'a': 'b'}
>>> parseSetEnv(['a=b', 'c=d'])
{'a': 'b', 'c': 'd'}
>>> parseSetEnv(['a=b=c'])
{'a': 'b=c'}
>>> parseSetEnv([''])
Traceback (most recent call last):
...
ValueError: Empty name
>>> parseSetEnv(['=1'])
Traceback (most recent call last):
...
ValueError: Empty name
Parameters

l (List[str]) –

Return type

Dict[str, Optional[str]]
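One way to obtain the behaviour shown in the doctests above is to split each item on its first "=" only. The function below is a stand-alone sketch with a hypothetical name, not the library's source.

```python
from typing import Dict, List, Optional


def parse_set_env_sketch(items: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical re-implementation of the NAME=VALUE parsing rules."""
    result: Dict[str, Optional[str]] = {}
    for item in items:
        # partition() splits on the first '=' only, so 'a=b=c' -> ('a', '=', 'b=c').
        name, sep, value = item.partition("=")
        if not name:
            raise ValueError("Empty name")
        # No '=' at all means "NAME" only, which maps to None.
        result[name] = value if sep else None
    return result
```

Later entries overwrite earlier ones with the same name, which matches the `['a=a', 'a=b']` doctest.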

toil.common.iC(minValue, maxValue=SYS_MAX_SIZE)[source]

Returns a function that checks if a given int is in the given half-open interval.

Parameters
  • minValue (int) –

  • maxValue (int) –

Return type

Callable[[int], bool]

toil.common.fC(minValue, maxValue=None)[source]

Returns a function that checks if a given float is in the given half-open interval.

Parameters
  • minValue (float) –

  • maxValue (Optional[float]) –

Return type

Callable[[float], bool]
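Both iC() and fC() return a predicate over a half-open interval. The sketch below assumes the interval includes the lower bound and excludes the upper one, and that a maxValue of None means "no upper bound"; neither detail is stated on this page, and the function names are hypothetical.

```python
import sys
from typing import Callable, Optional


def int_in_range(min_value: int, max_value: int = sys.maxsize) -> Callable[[int], bool]:
    """Hypothetical analogue of iC: check min_value <= x < max_value."""
    return lambda x: min_value <= x < max_value


def float_in_range(
    min_value: float, max_value: Optional[float] = None
) -> Callable[[float], bool]:
    """Hypothetical analogue of fC: the upper bound is optional."""
    if max_value is None:
        # Assumption: no upper bound means only the lower bound is checked.
        return lambda x: min_value <= x
    return lambda x: min_value <= x < max_value
```

A checker built this way can be reused for many values, for example `is_valid = int_in_range(1, 10)` followed by `is_valid(5)`.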

toil.common.parse_accelerator_list(specs)[source]

Parse a string description of one or more accelerator requirements.

Parameters

specs (Optional[str]) –

Return type

List[toil.job.AcceleratorRequirement]

toil.common.cacheDirName(workflowID)[source]
Returns

Name of the cache directory.

Parameters

workflowID (str) –

Return type

str

toil.common.getDirSizeRecursively(dirPath)[source]

This method will return the cumulative number of bytes occupied by the files on disk in the directory and its subdirectories.

If the method is unable to access a file or directory (due to insufficient permissions, or due to the file or directory having been removed while this function was attempting to traverse it), the error will be handled internally, and a (possibly 0) lower bound on the size of the directory will be returned.

The environment variable BLOCKSIZE=512 is set instead of the much cleaner --block-size=1 because du on macOS does not support that option.

Parameters

dirPath (str) – A valid path to a directory or file.

Returns

Total size, in bytes, of the file or directory at dirPath.

Return type

int
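The text above implies the real function shells out to du. As a rough illustration of the same contract (on-disk bytes, errors swallowed, a lower bound returned), here is a pure-Python walk that counts 512-byte blocks instead. It is a substitute technique, not Toil's method; `dir_size_sketch` is a hypothetical name, and `st_blocks` is only available on POSIX systems.

```python
import os


def dir_size_sketch(path: str) -> int:
    """Hypothetical helper: lower bound on the on-disk bytes used under path."""
    total = 0

    def add(entry: str) -> None:
        nonlocal total
        try:
            # st_blocks counts 512-byte units actually allocated on disk.
            total += os.lstat(entry).st_blocks * 512
        except OSError:
            # Vanished or unreadable entries are skipped, as described above.
            pass

    add(path)
    # onerror swallows listing failures so the walk keeps going.
    for root, dirs, files in os.walk(path, onerror=lambda err: None):
        for name in dirs + files:
            add(os.path.join(root, name))
    return total
```

Unlike du, this sketch counts a hard-linked file once per link, so the two can disagree on trees that contain hard links.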

toil.common.getFileSystemSize(dirPath)[source]

Return the free space, and total size of the file system hosting dirPath.

Parameters

dirPath (str) – A valid path to a directory.

Returns

free space and total size of file system

Return type

Tuple[int, int]
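On POSIX systems the two numbers can be read from os.statvfs(). The sketch below shows that approach with a hypothetical function name. It reports the space available to unprivileged users as "free", which is an assumption about what the real function means by free space.

```python
import os
from typing import Tuple


def fs_size_sketch(dir_path: str) -> Tuple[int, int]:
    """Hypothetical helper: (free bytes, total bytes) of the hosting file system."""
    stats = os.statvfs(dir_path)
    # f_frsize is the fragment size; f_bavail and f_blocks are counted in it.
    free_bytes = stats.f_frsize * stats.f_bavail
    total_bytes = stats.f_frsize * stats.f_blocks
    return free_bytes, total_bytes
```

For example, `free, total = fs_size_sketch(".")` inspects the file system that holds the current directory.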

toil.common.safeUnpickleFromStream(stream)[source]
Parameters

stream (IO[Any]) –

Return type

Any