toil.resource

Attributes

exactPython

logger

Exceptions

ResourceException

Common base class for all non-exit exceptions.

Classes

concat

A literal iterable to combine sequence literals (lists, sets) with generators or list comprehensions.

ErrorCondition

A wrapper describing an error condition.

Resource

Represents a file or directory that will be deployed to each node before any jobs in the user script are invoked.

FileResource

A resource read from a file on the leader.

DirectoryResource

A resource read from a directory on the leader.

VirtualEnvResource

A resource read from a virtualenv on the leader.

ModuleDescriptor

A path to a Python module decomposed into a namedtuple of three elements

Functions

inVirtualEnv()

Test if we are inside a virtualenv or Conda virtual environment.

mkdtemp([suffix, prefix, dir])

Make a temporary directory like tempfile.mkdtemp, but with relaxed permissions.

strict_bool(s)

Variant of bool() that only accepts two possible string values.

retry([intervals, infinite_retries, errors, ...])

Retry a function if it fails with any Exception defined in "errors".

Module Contents

toil.resource.inVirtualEnv()[source]

Test if we are inside a virtualenv or Conda virtual environment.

Return type:

bool
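
As a rough illustration of what such a check can look like, the sketch below uses only the standard library. The function name in_virtual_env_sketch and the specific signals it inspects (sys.base_prefix, VIRTUAL_ENV, CONDA_DEFAULT_ENV) are assumptions made for this example; it is not necessarily how inVirtualEnv() is implemented.

```python
import os
import sys


def in_virtual_env_sketch() -> bool:
    """Hypothetical stand-in for illustration; not Toil's actual implementation."""
    # venv and recent virtualenv releases make sys.prefix differ from sys.base_prefix.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    # Activating a virtualenv or a Conda environment typically exports these variables.
    in_activated_env = "VIRTUAL_ENV" in os.environ or "CONDA_DEFAULT_ENV" in os.environ
    return in_venv or in_activated_env


print(in_virtual_env_sketch())
```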

toil.resource.mkdtemp(suffix=None, prefix=None, dir=None)[source]

Make a temporary directory like tempfile.mkdtemp, but with relaxed permissions.

The permissions on the directory will be 711 instead of 700, allowing the group and all other users to traverse the directory. This is necessary if the directory is on NFS and the Docker daemon would like to mount it or a file inside it into a container, because on NFS even the Docker daemon appears bound by the file permissions.

See <https://github.com/DataBiosphere/toil/issues/4644>, and <https://stackoverflow.com/a/67928880> which talks about a similar problem but in the context of user namespaces.

Parameters:
  • suffix (Optional[str])

  • prefix (Optional[str])

  • dir (Optional[str])

Return type:

str
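
The permission change described above can be sketched with the standard library alone. The helper name mkdtemp_relaxed is hypothetical and this is only an approximation of the documented behaviour, assuming a POSIX file system.

```python
import os
import stat
import tempfile


def mkdtemp_relaxed(suffix=None, prefix=None, dir=None):
    """Hypothetical helper: create a temp directory, then relax its mode to 711."""
    path = tempfile.mkdtemp(suffix=suffix, prefix=prefix, dir=dir)
    # 0o711 = rwx for the owner, execute (traverse) only for group and others.
    os.chmod(path, 0o711)
    return path


path = mkdtemp_relaxed(prefix="example-")
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o711 on POSIX systems
os.rmdir(path)
```

Group and other users can traverse such a directory to reach a file whose name they already know, but they cannot list its contents.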

class toil.resource.concat(*args)[source]

A literal iterable to combine sequence literals (lists, sets) with generators or list comprehensions.

Instead of

>>> [ -1 ] + [ x * 2 for x in range( 3 ) ] + [ -1 ]
[-1, 0, 2, 4, -1]

you can write

>>> list( concat( -1, ( x * 2 for x in range( 3 ) ), -1 ) )
[-1, 0, 2, 4, -1]

This is slightly shorter (not counting the list constructor) and does not involve array construction or concatenation.

Note that concat() flattens (or chains) all iterable arguments into a single result iterable:

>>> list( concat( 1, range( 2, 4 ), 4 ) )
[1, 2, 3, 4]

It only does so one level deep. If you need to recursively flatten a data structure, check out crush().

If you want to prevent that flattening for an iterable argument, wrap it in concat():

>>> list( concat( 1, concat( range( 2, 4 ) ), 4 ) )
[1, range(2, 4), 4]

Some more examples:

>>> list( concat() ) # empty concat
[]
>>> list( concat( 1 ) ) # non-iterable
[1]
>>> list( concat( concat() ) ) # empty iterable
[]
>>> list( concat( concat( 1 ) ) ) # singleton iterable
[1]
>>> list( concat( 1, concat( 2 ), 3 ) ) # flattened iterable
[1, 2, 3]
>>> list( concat( 1, [2], 3 ) ) # flattened iterable
[1, 2, 3]
>>> list( concat( 1, concat( [2] ), 3 ) ) # protecting an iterable from being flattened
[1, [2], 3]
>>> list( concat( 1, concat( [2], 3 ), 4 ) ) # protection only works with a single argument
[1, 2, 3, 4]
>>> list( concat( 1, 2, concat( 3, 4 ), 5, 6 ) )
[1, 2, 3, 4, 5, 6]
>>> list( concat( 1, 2, concat( [ 3, 4 ] ), 5, 6 ) )
[1, 2, [3, 4], 5, 6]

Note that while strings are technically iterable, concat() does not flatten them.

>>> list( concat( 'ab' ) )
['ab']
>>> list( concat( concat( 'ab' ) ) )
['ab']
Parameters:

args (Any)

__iter__()[source]
Return type:

Iterator[Any]

toil.resource.strict_bool(s)[source]

Variant of bool() that only accepts two possible string values.

Parameters:

s (str)

Return type:

bool
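
The difference from bool() can be illustrated with a small sketch. The name strict_bool_sketch is hypothetical, and the example assumes that the two accepted strings are 'True' and 'False' and that anything else raises ValueError; neither detail is stated above.

```python
def strict_bool_sketch(s: str) -> bool:
    """Hypothetical stand-in; assumes the two accepted strings are 'True' and 'False'."""
    if s == "True":
        return True
    if s == "False":
        return False
    # Anything else, including 'true', '1' or '', is rejected rather than coerced.
    raise ValueError(s)


# bool("False") is True because any non-empty string is truthy;
# the strict variant parses the text instead.
print(strict_bool_sketch("False"))  # False
```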

class toil.resource.ErrorCondition(error=None, error_codes=None, boto_error_codes=None, error_message_must_include=None, retry_on_this_condition=True)[source]

A wrapper describing an error condition.

ErrorCondition events may be used to define errors in more detail to determine whether to retry.

Parameters:
  • error (Optional[Any])

  • error_codes (List[int])

  • boto_error_codes (List[str])

  • error_message_must_include (str)

  • retry_on_this_condition (bool)

toil.resource.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)[source]

Retry a function if it fails with any Exception defined in “errors”.

Does so every x seconds, where x is defined by a list of numbers (ints or floats) in “intervals”. Also accepts ErrorCondition events for more detailed retry attempts.

Parameters:
  • intervals (Optional[List]) – A list of wait times, in seconds, between successive attempts; once the list is exhausted, the failure is reported. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s

  • infinite_retries (bool) – If this is True, reset the intervals when they run out. Defaults to: False.

  • errors (Optional[Sequence[Union[ErrorCondition, Type[Exception]]]]) –

    A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:

      - Exception (required)
      - Error codes that must match to be retried (optional; defaults to not checking)
      - A string that must be in the error message to be retried (optional; defaults to not checking)
      - A bool that can be set to False to always error on this condition.

    If not specified, this will default to a generic Exception.

  • log_message (Optional[Tuple[Callable, str]]) – Optional tuple of (“log/print function()”, “message string”) that will precede each attempt.

  • prepare (Optional[List[Callable]]) – Optional list of functions to call, with the function’s arguments, between retries, to reset state.

Returns:

The result of the wrapped function, or the error is raised once the retries are exhausted.

Return type:

Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
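
To make the interval-driven behaviour concrete, here is a simplified, self-contained sketch of a retry decorator. The name retry_sketch is hypothetical, and the code covers only the intervals and errors parameters with plain exception classes; the ErrorCondition, infinite_retries, log_message and prepare features are omitted. The exact number of attempts made by the real decorator is not stated above, so the "one attempt per interval plus a final one" policy here is an assumption.

```python
import functools
import time


def retry_sketch(intervals=None, errors=None):
    """Hypothetical, simplified decorator; not the implementation documented above."""
    delays = [1, 1, 2, 4, 8, 16] if intervals is None else list(intervals)
    caught = (Exception,) if errors is None else tuple(errors)

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for delay in delays:
                try:
                    return func(*args, **kwargs)
                except caught:
                    # Wait for the configured interval before the next attempt.
                    time.sleep(delay)
            # Intervals exhausted: make one last attempt and let any exception propagate.
            return func(*args, **kwargs)

        return wrapper

    return decorate


attempts = []


@retry_sketch(intervals=[0, 0, 0], errors=[ConnectionError])
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky()
print(result, len(attempts))  # ok 3
```

Zero-length intervals are used so the example finishes immediately; exceptions that are not listed in errors propagate on the first failure.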

toil.resource.exactPython = 'python3.9'
toil.resource.logger
class toil.resource.Resource[source]

Bases: namedtuple('Resource', ('name', 'pathHash', 'url', 'contentHash'))

Represents a file or directory that will be deployed to each node before any jobs in the user script are invoked.

Each instance is a namedtuple with the following elements:

The pathHash element contains the MD5 (in hexdigest form) of the path to the resource on the leader node. The path, and therefore its hash, is unique within a job store.

The url element is a “file:” or “http:” URL at which the resource can be obtained.

The contentHash element is an MD5 checksum of the resource, allowing for validation and caching of resources.

If the resource is a regular file, the type attribute will be ‘file’.

If the resource is a directory, the type attribute will be ‘dir’ and the URL will point at a ZIP archive of that directory.
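
The two hashes described above can be sketched with hashlib. The helper names path_hash and content_hash are hypothetical, the example path is made up, and encoding the path as UTF-8 before hashing is an assumption of this sketch.

```python
import hashlib


def path_hash(leader_path: str) -> str:
    """MD5 hexdigest of the resource's path on the leader (hypothetical helper)."""
    return hashlib.md5(leader_path.encode("utf-8")).hexdigest()


def content_hash(data: bytes) -> str:
    """MD5 hexdigest of the resource's bytes (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# The path hash identifies the resource; the content hash detects changed content.
print(path_hash("/home/user/workflow/helpers.py"))
print(content_hash(b""))  # d41d8cd98f00b204e9800998ecf8427e
```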

resourceEnvNamePrefix = 'JTRES_'
rootDirPathEnvName
classmethod create(jobStore, leaderPath)[source]

Saves the content of the file or directory at the given path to the given job store and returns a resource object representing that content for the purpose of obtaining it again at a generic, public URL. This method should be invoked on the leader node.

Parameters:
  • jobStore (toil.jobStores.abstractJobStore.AbstractJobStore)

  • leaderPath (str)

Return type:

Resource

refresh(jobStore)[source]
Parameters:

jobStore (toil.jobStores.abstractJobStore.AbstractJobStore)

Return type:

Resource

classmethod prepareSystem()[source]

Prepares this system for the downloading and lookup of resources. This method should only be invoked on a worker node. It is idempotent but not thread-safe.

Return type:

None

classmethod cleanSystem()[source]

Remove all downloaded, localized resources.

Return type:

None

register()[source]

Register this resource for later retrieval via lookup(), possibly in a child process.

Return type:

None

classmethod lookup(leaderPath)[source]

Return a resource object representing a resource created from a file or directory at the given path on the leader.

This method should be invoked on the worker. The given path does not need to refer to an existing file or directory on the worker; it only identifies the resource within an instance of Toil. This method returns None if no resource for the given path exists.

Parameters:

leaderPath (str)

Return type:

Optional[Resource]

download(callback=None)[source]

Download this resource from its URL to a file on the local system.

This method should only be invoked on a worker node after the node has been set up for accessing resources via prepareSystem().

Parameters:

callback (Optional[Callable[[str], None]])

Return type:

None

property localPath: str
Abstractmethod:

Get the path to the resource on the worker.

The file or directory at the returned path may or may not yet exist. Invoking download() will ensure that it does.

Return type:

str

property localDirPath: str

The path to the directory containing the resource on the worker.

Return type:

str

pickle()[source]
Return type:

str

classmethod unpickle(s)[source]
Parameters:

s (str)

Return type:

Resource

class toil.resource.FileResource[source]

Bases: Resource

A resource read from a file on the leader.

property localPath: str

Get the path to the resource on the worker.

The file or directory at the returned path may or may not yet exist. Invoking download() will ensure that it does.

Return type:

str

class toil.resource.DirectoryResource[source]

Bases: Resource

A resource read from a directory on the leader.

The URL will point to a ZIP archive of the directory. All files in that directory (and any subdirectories) will be included. The directory may be a package but it does not need to be.
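
Archiving a directory together with its subdirectories can be sketched with the standard zipfile module. The helper name zip_directory and the sample file names are hypothetical; how DirectoryResource actually builds its archive may differ.

```python
import os
import tempfile
import zipfile


def zip_directory(dir_path: str, zip_path: str) -> None:
    """Hypothetical helper: archive every file under dir_path, including subdirectories."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(dir_path):
            for file_name in files:
                full_path = os.path.join(root, file_name)
                # Store each file under its path relative to the archived directory.
                archive.write(full_path, os.path.relpath(full_path, dir_path))


with tempfile.TemporaryDirectory() as work_dir:
    src = os.path.join(work_dir, "pkg")
    os.makedirs(os.path.join(src, "sub"))
    for rel in ("__init__.py", os.path.join("sub", "mod.py")):
        with open(os.path.join(src, rel), "w") as f:
            f.write("# placeholder\n")
    zip_path = os.path.join(work_dir, "pkg.zip")
    zip_directory(src, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())

print(names)  # ['__init__.py', 'sub/mod.py']
```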

property localPath: str

Get the path to the resource on the worker.

The file or directory at the returned path may or may not yet exist. Invoking download() will ensure that it does.

Return type:

str

class toil.resource.VirtualEnvResource[source]

Bases: DirectoryResource

A resource read from a virtualenv on the leader.

All modules and packages found in the virtualenv’s site-packages directory will be included.

class toil.resource.ModuleDescriptor[source]

Bases: namedtuple('ModuleDescriptor', ('dirPath', 'name', 'fromVirtualEnv'))

A path to a Python module decomposed into a namedtuple of three elements:

  • dirPath, the path to the directory that should be added to sys.path before importing the module,

  • name, the fully qualified name of the module with leading package names separated by dots, and

  • fromVirtualEnv, a flag indicating whether the module comes from a virtualenv.

>>> import toil.resource
>>> ModuleDescriptor.forModule('toil.resource') 
ModuleDescriptor(dirPath='/.../src', name='toil.resource', fromVirtualEnv=False)
>>> import subprocess, sys, tempfile, os
>>> dirPath = tempfile.mkdtemp()
>>> path = os.path.join( dirPath, 'foo.py' )
>>> with open(path,'w') as f:
...     _ = f.write('from toil.resource import ModuleDescriptor\n'
...                 'print(ModuleDescriptor.forModule(__name__))')
>>> subprocess.check_output([ sys.executable, path ]) 
b"ModuleDescriptor(dirPath='...', name='foo', fromVirtualEnv=False)\n"
>>> from shutil import rmtree
>>> rmtree( dirPath )

Now test a collision. 'collections' is part of the standard library in Python 2 and 3.

>>> dirPath = tempfile.mkdtemp()
>>> path = os.path.join( dirPath, 'collections.py' )
>>> with open(path,'w') as f:
...     _ = f.write('from toil.resource import ModuleDescriptor\n'
...                 'ModuleDescriptor.forModule(__name__)')

This should fail and return exit status 1 due to the collision with the built-in module:

>>> subprocess.call([ sys.executable, path ])
1

Clean up

>>> rmtree( dirPath )
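
The role of the dirPath and name elements can be illustrated without Toil. In the sketch below, Descriptor is a hypothetical stand-in namedtuple with the same three fields, and example_user_module is a made-up module name; the code only shows the general idea of adding a directory to sys.path and importing a module by name, not what ModuleDescriptor does internally.

```python
import importlib
import os
import sys
import tempfile
from collections import namedtuple

# Hypothetical stand-in mirroring the three documented elements.
Descriptor = namedtuple("Descriptor", ("dirPath", "name", "fromVirtualEnv"))

with tempfile.TemporaryDirectory() as dir_path:
    with open(os.path.join(dir_path, "example_user_module.py"), "w") as f:
        f.write("ANSWER = 42\n")
    descriptor = Descriptor(dirPath=dir_path, name="example_user_module", fromVirtualEnv=False)

    # Make the module importable, import it by name, then undo the path change.
    sys.path.insert(0, descriptor.dirPath)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(descriptor.name)
    finally:
        sys.path.remove(descriptor.dirPath)

print(module.ANSWER)  # 42
```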

dirPath: str
name: str
classmethod forModule(name)[source]

Return an instance of this class representing the module of the given name.

If the given module name is “__main__”, it will be translated to the actual file name of the top-level script without the .py or .pyc extension. This method assumes that the module with the specified name has already been loaded.

Parameters:

name (str)

Return type:

ModuleDescriptor

property belongsToToil: bool

True if this module is part of the Toil distribution

Return type:

bool

saveAsResourceTo(jobStore)[source]

Store the file containing this module (or even the Python package directory hierarchy containing that file) as a resource in the given job store and return the corresponding resource object. Should only be called on a leader node.

Parameters:

jobStore (toil.jobStores.abstractJobStore.AbstractJobStore)

Return type:

Resource

localize()[source]

Check if this module was saved as a resource.

If it was, return a new module descriptor that points to a local copy of that resource. Should only be called on a worker node. On the leader, this method returns this descriptor, i.e. self.

Return type:

ModuleDescriptor

globalize()[source]

Reverse the effect of localize().

Return type:

ModuleDescriptor

toCommand()[source]
Return type:

Sequence[str]

classmethod fromCommand(command)[source]
Parameters:

command (Sequence[str])

Return type:

ModuleDescriptor

makeLoadable()[source]
Return type:

ModuleDescriptor

load()[source]
Return type:

Optional[types.ModuleType]

exception toil.resource.ResourceException[source]

Bases: Exception

Common base class for all non-exit exceptions.