toil.resource¶
Attributes¶
Exceptions¶
Common base class for all non-exit exceptions. |
Classes¶
concat | A literal iterable to combine sequence literals (lists, set) with generators or list comprehensions.
ErrorCondition | A wrapper describing an error condition.
Resource | Represents a file or directory that will be deployed to each node before any jobs in the user script are invoked.
FileResource | A resource read from a file on the leader.
DirectoryResource | A resource read from a directory on the leader.
VirtualEnvResource | A resource read from a virtualenv on the leader.
ModuleDescriptor | A path to a Python module decomposed into a namedtuple of three elements
Functions¶
inVirtualEnv | Test if we are inside a virtualenv or Conda virtual environment.
mkdtemp | Make a temporary directory like tempfile.mkdtemp, but with relaxed permissions.
strict_bool | Variant of bool() that only accepts two possible string values.
retry | Retry a function if it fails with any Exception defined in "errors".
Module Contents¶
- toil.resource.inVirtualEnv()[source]¶
Test if we are inside a virtualenv or Conda virtual environment.
- Return type:
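As a rough illustration of what such a check can look like, the sketch below uses only the standard library. The name `in_virtual_env_sketch` and the signals it inspects (`sys.prefix` versus `sys.base_prefix`, plus the `VIRTUAL_ENV` and `CONDA_DEFAULT_ENV` environment variables) are assumptions made for this example, not necessarily what Toil itself checks.

```python
import os
import sys


def in_virtual_env_sketch() -> bool:
    """Hypothetical check; not Toil's implementation."""
    # venv and recent virtualenv releases make sys.prefix differ from sys.base_prefix.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    # Activation scripts typically export VIRTUAL_ENV; conda exports CONDA_DEFAULT_ENV.
    has_env_marker = "VIRTUAL_ENV" in os.environ or "CONDA_DEFAULT_ENV" in os.environ
    return in_venv or has_env_marker


print(in_virtual_env_sketch())
```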
- toil.resource.mkdtemp(suffix=None, prefix=None, dir=None)[source]¶
Make a temporary directory like tempfile.mkdtemp, but with relaxed permissions.
The permissions on the directory will be 711 instead of 700, allowing the group and all other users to traverse the directory. This is necessary if the directory is on NFS and the Docker daemon would like to mount it or a file inside it into a container, because on NFS even the Docker daemon appears bound by the file permissions.
See <https://github.com/DataBiosphere/toil/issues/4644>, and <https://stackoverflow.com/a/67928880> which talks about a similar problem but in the context of user namespaces.
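The permission change described above can be sketched with the standard library alone. The helper name `mkdtemp_relaxed` is hypothetical, and the sketch assumes a POSIX file system where `os.chmod` sets the full mode; it is not Toil's code.

```python
import os
import stat
import tempfile


def mkdtemp_relaxed(suffix=None, prefix=None, dir=None) -> str:
    """Hypothetical helper: create a temp directory, then relax its mode to 711."""
    path = tempfile.mkdtemp(suffix=suffix, prefix=prefix, dir=dir)
    # 0o711 = rwx for the owner, traverse-only (x) for group and others.
    os.chmod(path, 0o711)
    return path


path = mkdtemp_relaxed(prefix="demo-")
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
os.rmdir(path)
```

Because group and others get only the execute bit, they can traverse the directory to reach a file whose name they already know, but they cannot list its contents.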
- class toil.resource.concat(*args)[source]¶
A literal iterable to combine sequence literals (lists, set) with generators or list comprehensions.
Instead of
>>> [ -1 ] + [ x * 2 for x in range( 3 ) ] + [ -1 ]
[-1, 0, 2, 4, -1]
you can write
>>> list( concat( -1, ( x * 2 for x in range( 3 ) ), -1 ) )
[-1, 0, 2, 4, -1]
This is slightly shorter (not counting the list constructor) and does not involve array construction or concatenation.
Note that concat() flattens (or chains) all iterable arguments into a single result iterable:
>>> list( concat( 1, range( 2, 4 ), 4 ) )
[1, 2, 3, 4]
It only does so one level deep. If you need to recursively flatten a data structure, check out crush().
If you want to prevent that flattening for an iterable argument, wrap it in concat():
>>> list( concat( 1, concat( range( 2, 4 ) ), 4 ) )
[1, range(2, 4), 4]
Some more examples:
>>> list( concat() ) # empty concat
[]
>>> list( concat( 1 ) ) # non-iterable
[1]
>>> list( concat( concat() ) ) # empty iterable
[]
>>> list( concat( concat( 1 ) ) ) # singleton iterable
[1]
>>> list( concat( 1, concat( 2 ), 3 ) ) # flattened iterable
[1, 2, 3]
>>> list( concat( 1, [2], 3 ) ) # flattened iterable
[1, 2, 3]
>>> list( concat( 1, concat( [2] ), 3 ) ) # protecting an iterable from being flattened
[1, [2], 3]
>>> list( concat( 1, concat( [2], 3 ), 4 ) ) # protection only works with a single argument
[1, 2, 3, 4]
>>> list( concat( 1, 2, concat( 3, 4 ), 5, 6 ) )
[1, 2, 3, 4, 5, 6]
>>> list( concat( 1, 2, concat( [ 3, 4 ] ), 5, 6 ) )
[1, 2, [3, 4], 5, 6]
Note that while strings are technically iterable, concat() does not flatten them.
>>> list( concat( 'ab' ) )
['ab']
>>> list( concat( concat( 'ab' ) ) )
['ab']
- Parameters:
args (Any)
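To make the flattening rules concrete, here is a minimal re-implementation sketch that reproduces the documented examples. The class name `ConcatSketch` is hypothetical and the code is written from the behaviour described above, not copied from Toil, so details may differ from the real `concat`.

```python
from itertools import chain


class ConcatSketch:
    """Hypothetical stand-in that mimics the documented concat() behaviour."""

    def __init__(self, *args):
        self.args = args

    def __iter__(self):
        def expand(x):
            # A single-argument wrapper protects its argument from flattening.
            if isinstance(x, ConcatSketch) and len(x.args) == 1:
                return x.args
            # Strings are iterable but are deliberately kept whole.
            if isinstance(x, (str, bytes)):
                return (x,)
            try:
                return iter(x)   # flatten one level
            except TypeError:
                return (x,)      # non-iterable: yield as a single item

        return chain.from_iterable(map(expand, self.args))


flat = list(ConcatSketch(1, range(2, 4), 4))
protected = list(ConcatSketch(1, ConcatSketch([2]), 3))
unprotected = list(ConcatSketch(1, ConcatSketch([2], 3), 4))
text = list(ConcatSketch("ab"))
print(flat, protected, unprotected, text)
```

The single-argument check is what turns nesting into an escape mechanism: a wrapper with two or more arguments is treated like any other iterable and flattened.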
- toil.resource.strict_bool(s)[source]¶
Variant of bool() that only accepts two possible string values.
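The description does not say which two strings are accepted. The sketch below assumes they are `'True'` and `'False'` and that anything else raises `ValueError`. The name `strict_bool_sketch` and both of those choices are assumptions for illustration only.

```python
def strict_bool_sketch(s: str) -> bool:
    """Hypothetical strict parser; the accepted strings are an assumption."""
    if s == "True":
        return True
    if s == "False":
        return False
    raise ValueError(f"not a strict boolean string: {s!r}")


# The built-in bool() treats every non-empty string as true, including "False".
print(bool("False"), strict_bool_sketch("False"))
```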
- class toil.resource.ErrorCondition(error=None, error_codes=None, boto_error_codes=None, error_message_must_include=None, retry_on_this_condition=True)[source]¶
A wrapper describing an error condition.
ErrorCondition events may be used to define errors in more detail to determine whether to retry.
- toil.resource.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)[source]¶
Retry a function if it fails with any Exception defined in “errors”.
Does so every x seconds, where x is defined by a list of numbers (ints or floats) in “intervals”. Also accepts ErrorCondition events for more detailed retry attempts.
- Parameters:
intervals (Optional[List]) – A list of times in seconds we keep retrying until returning failure. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s
infinite_retries (bool) – If this is True, reset the intervals when they run out. Defaults to: False.
errors (Optional[Sequence[Union[ErrorCondition, Type[Exception]]]]) –
A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:
- Exception (required)
- Error codes that must match to be retried (optional; defaults to not checking)
- A string that must be in the error message to be retried (optional; defaults to not checking)
- A bool that can be set to False to always error on this condition.
If not specified, this will default to a generic Exception.
log_message (Optional[Tuple[Callable, str]]) – Optional tuple of (“log/print function()”, “message string”) that will precede each attempt.
prepare (Optional[List[Callable]]) – Optional list of functions to call, with the function’s arguments, between retries, to reset state.
- Returns:
The result of the wrapped function; if every retry fails, the exception is raised.
- Return type:
Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
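The interval-driven retry pattern described above can be sketched as a small decorator. `retry_sketch` is a hypothetical, simplified stand-in: it supports only plain exception types (no ErrorCondition, `infinite_retries`, `log_message`, or `prepare`), and the exact number of attempts Toil makes is an assumption here.

```python
import functools
import time
from typing import Callable, Optional, Sequence, Type


def retry_sketch(
    intervals: Optional[Sequence[float]] = None,
    errors: Optional[Sequence[Type[Exception]]] = None,
) -> Callable:
    """Hypothetical decorator illustrating interval-based retries."""
    # Defaults follow the values documented above: 1s, 1s, 2s, 4s, 8s, 16s.
    delays = list(intervals) if intervals is not None else [1, 1, 2, 4, 8, 16]
    caught = tuple(errors) if errors is not None else (Exception,)

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for delay in delays:
                try:
                    return func(*args, **kwargs)
                except caught:
                    time.sleep(delay)  # wait before the next attempt
            # Final attempt: any exception now propagates to the caller.
            return func(*args, **kwargs)

        return wrapper

    return decorate


calls = []


@retry_sketch(intervals=[0, 0], errors=[ConnectionError])
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"


result = flaky()
print(result, len(calls))
```

Exceptions that are not listed in `errors` are never caught, so they surface immediately instead of consuming retry intervals.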
- toil.resource.exactPython = 'python3.9'¶
- toil.resource.logger¶
- class toil.resource.Resource[source]¶
Bases:
namedtuple('Resource', ('name','pathHash','url','contentHash'))
Represents a file or directory that will be deployed to each node before any jobs in the user script are invoked.
Each instance is a namedtuple with the following elements:
The pathHash element contains the MD5 (in hexdigest form) of the path to the resource on the leader node. The path, and therefore its hash, is unique within a job store.
The url element is a “file:” or “http:” URL at which the resource can be obtained.
The contentHash element is an MD5 checksum of the resource, allowing for validation and caching of resources.
If the resource is a regular file, the type attribute will be ‘file’.
If the resource is a directory, the type attribute will be ‘dir’ and the URL will point at a ZIP archive of that directory.
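The two hashes described above can be computed with `hashlib`. This is a minimal sketch: the function names, the UTF-8 encoding of the path, and the chunked file read are assumptions for illustration and may not match how Toil derives its values.

```python
import hashlib
import os
import tempfile


def path_hash_sketch(leader_path: str) -> str:
    """MD5 hexdigest of the path string itself (assumes UTF-8 encoding)."""
    return hashlib.md5(leader_path.encode("utf-8")).hexdigest()


def content_hash_sketch(file_path: str) -> str:
    """MD5 hexdigest of a file's bytes, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "script.py")
    with open(demo, "wb") as f:
        f.write(b"print('hello')\n")
    p_hash = path_hash_sketch(demo)
    c_hash = content_hash_sketch(demo)
print(p_hash, c_hash)
```

The path hash identifies where the resource came from, while the content hash changes whenever the bytes change, which is what makes it usable for validation and caching.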
- resourceEnvNamePrefix = 'JTRES_'¶
- rootDirPathEnvName¶
- classmethod create(jobStore, leaderPath)[source]¶
Saves the content of the file or directory at the given path to the given job store and returns a resource object representing that content for the purpose of obtaining it again at a generic, public URL. This method should be invoked on the leader node.
- Parameters:
leaderPath (str)
- Return type:
- classmethod prepareSystem()[source]¶
Prepares this system for the downloading and lookup of resources. This method should only be invoked on a worker node. It is idempotent but not thread-safe.
- Return type:
None
- register()[source]¶
Register this resource for later retrieval via lookup(), possibly in a child process.
- Return type:
None
- classmethod lookup(leaderPath)[source]¶
Return a resource object representing a resource created from a file or directory at the given path on the leader.
This method should be invoked on the worker. The given path does not need to refer to an existing file or directory on the worker; it only identifies the resource within an instance of Toil. This method returns None if no resource for the given path exists.
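One way register() and lookup() could cooperate across a parent and child process is through environment variables keyed by the path hash. The sketch below assumes exactly that, reusing the `JTRES_` prefix shown above and JSON as the serialization. The names `ResourceSketch`, `register_sketch`, and `lookup_sketch`, the example path, and the storage format are all hypothetical.

```python
import hashlib
import json
import os
from collections import namedtuple
from typing import Optional

ResourceSketch = namedtuple("ResourceSketch", ("name", "pathHash", "url", "contentHash"))
ENV_PREFIX = "JTRES_"  # same value as resourceEnvNamePrefix


def _path_hash(leader_path: str) -> str:
    return hashlib.md5(leader_path.encode("utf-8")).hexdigest()


def register_sketch(resource: ResourceSketch) -> None:
    # Environment variables are inherited by child processes started later.
    os.environ[ENV_PREFIX + resource.pathHash] = json.dumps(resource._asdict())


def lookup_sketch(leader_path: str) -> Optional[ResourceSketch]:
    # Only the hash of the path is needed; the path need not exist locally.
    raw = os.environ.get(ENV_PREFIX + _path_hash(leader_path))
    return None if raw is None else ResourceSketch(**json.loads(raw))


leader_path = "/home/user/workflow/helpers.py"  # hypothetical leader-side path
res = ResourceSketch("helpers.py", _path_hash(leader_path), "file:///tmp/helpers.py", "0" * 32)
register_sketch(res)
found = lookup_sketch(leader_path)
missing = lookup_sketch("/no/such/registered/path.py")
print(found == res, missing)
```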
- download(callback=None)[source]¶
Download this resource from its URL to a file on the local system.
This method should only be invoked on a worker node after the node was set up for accessing resources via prepareSystem().
- Parameters:
callback (Optional[Callable[[str], None]])
- Return type:
None
- property localPath: str¶
- Abstractmethod:
Get the path to the resource on the worker.
The file or directory at the returned path may or may not yet exist. Invoking download() will ensure that it does.
- Return type:
str
- class toil.resource.FileResource[source]¶
Bases:
Resource
A resource read from a file on the leader.
- class toil.resource.DirectoryResource[source]¶
Bases:
Resource
A resource read from a directory on the leader.
The URL will point to a ZIP archive of the directory. All files in that directory (and any subdirectories) will be included. The directory may be a package but it does not need to be.
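Archiving a directory together with all of its subdirectories can be sketched with `zipfile` and `os.walk`. The function name `zip_directory_sketch` and the choice to store paths relative to the directory root are assumptions for illustration; Toil's archive layout may differ.

```python
import os
import tempfile
import zipfile


def zip_directory_sketch(dir_path: str, zip_path: str) -> None:
    """Hypothetical helper: write every file under dir_path into a ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(dir_path):
            for file_name in files:
                full_path = os.path.join(root, file_name)
                # Store paths relative to dir_path so the archive unpacks cleanly.
                archive.write(full_path, os.path.relpath(full_path, dir_path))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pkg")
    os.makedirs(os.path.join(src, "sub"))
    for rel in ("__init__.py", os.path.join("sub", "mod.py")):
        with open(os.path.join(src, rel), "w") as f:
            f.write("# demo\n")
    zip_path = os.path.join(tmp, "pkg.zip")
    zip_directory_sketch(src, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())
print(names)
```

Note that this sketch only adds files, so empty subdirectories would not appear in the archive.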
- class toil.resource.VirtualEnvResource[source]¶
Bases:
DirectoryResource
A resource read from a virtualenv on the leader.
All modules and packages found in the virtualenv’s site-packages directory will be included.
- class toil.resource.ModuleDescriptor[source]¶
Bases:
namedtuple('ModuleDescriptor', ('dirPath','name','fromVirtualEnv'))
A path to a Python module decomposed into a namedtuple of three elements:
- dirPath, the path to the directory that should be added to sys.path before importing the module,
- name, the fully qualified name of the module with leading package names separated by dots, and
- fromVirtualEnv, a flag indicating whether the module resides in a virtualenv.
>>> import toil.resource
>>> ModuleDescriptor.forModule('toil.resource')
ModuleDescriptor(dirPath='/.../src', name='toil.resource', fromVirtualEnv=False)

>>> import subprocess, sys, tempfile, os
>>> dirPath = tempfile.mkdtemp()
>>> path = os.path.join( dirPath, 'foo.py' )
>>> with open(path,'w') as f:
...     _ = f.write('from toil.resource import ModuleDescriptor\n'
...                 'print(ModuleDescriptor.forModule(__name__))')
>>> subprocess.check_output([ sys.executable, path ])
b"ModuleDescriptor(dirPath='...', name='foo', fromVirtualEnv=False)\n"

>>> from shutil import rmtree
>>> rmtree( dirPath )

Now test a collision. 'collections' is part of the standard library in Python 2 and 3.

>>> dirPath = tempfile.mkdtemp()
>>> path = os.path.join( dirPath, 'collections.py' )
>>> with open(path,'w') as f:
...     _ = f.write('from toil.resource import ModuleDescriptor\n'
...                 'ModuleDescriptor.forModule(__name__)')

This should fail and return exit status 1 due to the collision with the built-in module:

>>> subprocess.call([ sys.executable, path ])
1

Clean up

>>> rmtree( dirPath )
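The decomposition of a module's file path into a directory for sys.path plus a fully qualified name can be sketched as follows. `decompose_sketch`, `DescriptorSketch`, and the example path are hypothetical; the sketch omits the virtualenv flag and the collision check exercised above.

```python
import os
from collections import namedtuple

DescriptorSketch = namedtuple("DescriptorSketch", ("dirPath", "name"))


def decompose_sketch(file_path: str, module_name: str) -> DescriptorSketch:
    """Hypothetical helper: find the directory from which module_name is importable."""
    dir_path, base = os.path.split(os.path.abspath(file_path))
    stem, _ext = os.path.splitext(base)
    if stem == "__init__":
        # A package's __init__ file lives inside the directory named after the package.
        dir_path = os.path.dirname(dir_path)
    # Strip one directory level per leading package name.
    for _ in module_name.split(".")[:-1]:
        dir_path = os.path.dirname(dir_path)
    return DescriptorSketch(dir_path, module_name)


d = decompose_sketch("/opt/app/src/toil/resource.py", "toil.resource")
print(d)
```

Adding `d.dirPath` to sys.path is then enough for `import toil.resource` to resolve to the original file.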
- classmethod forModule(name)[source]¶
Return an instance of this class representing the module of the given name.
If the given module name is “__main__”, it will be translated to the actual file name of the top-level script without the .py or .pyc extension. This method assumes that the module with the specified name has already been loaded.
- Parameters:
name (str)
- Return type:
- saveAsResourceTo(jobStore)[source]¶
Store the file containing this module, or even the Python package directory hierarchy containing that file, as a resource to the given job store and return the corresponding resource object. Should only be called on a leader node.
- Parameters:
- Return type:
- localize()[source]¶
Check if this module was saved as a resource.
If it was, return a new module descriptor that points to a local copy of that resource. Should only be called on a worker node. On the leader, this method returns this resource, i.e. self.
- Return type:
- load()[source]¶
- Return type:
Optional[types.ModuleType]
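load() is not described here. Given that dirPath is the directory to add to sys.path before importing, one plausible reading is sketched below with `importlib`. The name `load_sketch`, the demo module name, and returning None on a failed import are assumptions for illustration, not documented behaviour.

```python
import importlib
import os
import sys
import tempfile
from types import ModuleType
from typing import Optional


def load_sketch(dir_path: str, name: str) -> Optional[ModuleType]:
    """Hypothetical loader: make dir_path importable, then import the module by name."""
    if dir_path not in sys.path:
        sys.path.append(dir_path)
    # Ensure the import system notices files created after startup.
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "jtres_demo_module.py"), "w") as f:
        f.write("ANSWER = 42\n")
    module = load_sketch(tmp, "jtres_demo_module")
    sys.path.remove(tmp)  # undo the sys.path change made for the demo
print(module.ANSWER)
```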