toil.wdl.wdltoil

Attributes

logger

file_digest

WDLContext

F

WDLBindings

SHARED_PATH_ATTR

DirectoryNamingStateDict

Exceptions

InsufficientMountDiskSpace

Common base class for all non-exit exceptions.

Classes

ReadableFileObj

Protocol that is more specific than what file_digest takes as an argument.

FileDigester

Protocol for the features we need from hashlib.file_digest.

NonDownloadingSize

WDL size() implementation that avoids downloading files.

ToilWDLStdLibBase

Standard library implementation for WDL as run on Toil.

ToilWDLStdLibWorkflow

Standard library implementation for workflow scope.

ToilWDLStdLibTaskCommand

Standard library implementation to use inside a WDL task command evaluation.

ToilWDLStdLibTaskOutputs

Standard library implementation for WDL as run on Toil, with additional functions only allowed in task output sections.

WDLBaseJob

Base job class for all WDL-related jobs.

WDLTaskWrapperJob

Job that determines the resources needed to run a WDL job.

WDLTaskJob

Job that runs a WDL task.

WDLWorkflowNodeJob

Job that evaluates a WDL workflow node.

WDLWorkflowNodeListJob

Job that evaluates a list of WDL workflow nodes, which are in the same

WDLCombineBindingsJob

Job that collects the results from WDL workflow nodes and combines their

WDLWorkflowGraph

Represents a graph of WDL WorkflowNodes.

WDLSectionJob

Job that can create more graph for a section of the workflow.

WDLScatterJob

Job that evaluates a scatter in a WDL workflow. Runs the body for each

WDLArrayBindingsJob

Job that takes all new bindings created in an array of input environments,

WDLConditionalJob

Job that evaluates a conditional in a WDL workflow.

WDLWorkflowJob

Job that evaluates an entire WDL workflow.

WDLOutputsJob

Job which evaluates an outputs section for a workflow.

WDLStartJob

Job that evaluates an entire WDL workflow, and returns the workflow outputs

WDLInstallImportsJob

Class represents a unit of work in toil.

WDLImportWrapper

Job to organize importing files on workers instead of the leader. Responsible for extracting filenames and metadata,

Functions

wdl_error_reporter(task[, exit, log])

Run code in a context where WDL errors will be reported with pretty formatting.

report_wdl_errors(task[, exit, log])

Create a decorator to report WDL errors with the given task message.

remove_common_leading_whitespace(expression[, ...])

Remove "common leading whitespace" as defined in the WDL 1.1 spec.

toil_read_source(uri, path, importer)

Implementation of a MiniWDL read_source function that can use any filename or URL supported by Toil.

virtualized_equal(value1, value2)

Check if two WDL values are equal when taking into account file virtualization.

combine_bindings(all_bindings)

Combine variable bindings from multiple predecessor tasks into one set for the current task.

log_bindings(log_function, message, all_bindings)

Log bindings to the console, even if some are still promises.

get_supertype(types)

Get the supertype that can hold values of all the given types.

for_each_node(root)

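Iterate over all WDL workflow nodes in the given node, including inputs, internal nodes of conditionals and scatters, and gather nodes.

recursive_dependencies(root)

Get the combined workflow_node_dependencies of root and everything under it, which are not on anything in that subtree.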

parse_disks(spec, disks_spec)

Parse a WDL disk spec into a disk mount specification.

pack_toil_uri(file_id, task_path, dir_id, file_basename)

Encode a Toil file ID and metadata about who wrote it as a URI.

unpack_toil_uri(toil_uri)

Unpack a URI made by pack_toil_uri to retrieve the FileID and the basename (no path prefix) that the file is supposed to have.

clone_metadata(old_file, new_file)

Copy all Toil metadata from one WDL File to another.

set_file_value(file, new_value)

Return a copy of a WDL File with all metadata intact but the value changed.

set_file_nonexistent(file, nonexistent)

Return a copy of a WDL File with all metadata intact but the nonexistent flag set to the given value.

get_file_nonexistent(file)

Return the nonexistent flag for a file.

set_file_virtualized_value(file, virtualized_value)

Return a copy of a WDL File with all metadata intact but the virtualized_value attribute set to the given value.

get_file_virtualized_value(file)

Get the virtualized storage location for a file.

get_shared_fs_path(file)

If a File has a shared filesystem path, get that path.

set_shared_fs_path(file, path)

Return a copy of the given File associated with the given shared filesystem path.

view_shared_fs_paths(bindings)

Given WDL bindings, return a copy where all files have their shared filesystem paths as their values.

poll_execution_cache(node, bindings)

Return the cached result of calling this workflow or task, and its key.

fill_execution_cache(cache_key, output_bindings, ...)

Cache the result of calling a workflow or task.

choose_human_readable_directory(root_dir, ...)

Select a good directory to save files from a task and source directory in.

evaluate_decls_to_bindings(decls, all_bindings, ...[, ...])

Evaluate decls with a given bindings environment and standard library.

extract_workflow_inputs(environment)

convert_files(environment, file_to_id, file_to_data, ...)

Resolve relative-URI files in the given environment and convert the file values to a new value made from a given mapping.

convert_remote_files(environment, file_source, task_path)

Resolve relative-URI files in the given environment and import all files.

evaluate_named_expression(context, name, ...)

Evaluate an expression when we know the name of it.

evaluate_decl(node, environment, stdlib)

Evaluate the expression of a declaration node, or raise an error.

evaluate_call_inputs(context, expressions, ...[, ...])

Evaluate a bunch of expressions with names, and make them into a fresh set of bindings.

evaluate_defaultable_decl(node, environment, stdlib)

If the name of the declaration is already defined in the environment, return its value. Otherwise, return the evaluated expression.

devirtualize_files(environment, stdlib)

Make sure all the File values embedded in the given bindings point to files that are actually available to command line commands.

virtualize_files(environment, stdlib[, enforce_existence])

Make sure all the File values embedded in the given bindings point to files that are usable from other machines.

add_paths(task_container, host_paths)

Based off of WDL.runtime.task_container.add_paths from miniwdl.

drop_if_missing(file, standard_library)

Return None if a file doesn't exist, or its path if it does.

drop_missing_files(environment, standard_library)

Make sure all the File values embedded in the given bindings point to files that exist, or are null.

get_file_paths_in_bindings(environment)

Get the paths of all files in the bindings. Doesn't guarantee that duplicates are removed.

map_over_files_in_bindings(environment, transform)

Run all File values embedded in the given bindings through the given transformation function.

map_over_files_in_binding(binding, transform)

Run all File values' types and values embedded in the given binding's value through the given transformation function.

map_over_typed_files_in_value(value, transform)

Run all File values embedded in the given value through the given transformation function.

ensure_null_files_are_nullable(value, original_value, ...)

Run through all nested values embedded in the given value and check that the null values are valid.

make_root_job(target, inputs, inputs_search_path, ...)

main()

A Toil workflow to interpret WDL input files.

Module Contents

toil.wdl.wdltoil.logger
class toil.wdl.wdltoil.ReadableFileObj

Bases: Protocol

Protocol that is more specific than what file_digest takes as an argument. Also guarantees a read() method.

Would extend the protocol from Typeshed for hashlib but those are only declared for 3.11+.

readinto(buf, /)
Parameters:

buf (bytearray)

Return type:

int

readable()
Return type:

bool

read(number)
Parameters:

number (int)

Return type:

bytes

class toil.wdl.wdltoil.FileDigester

Bases: Protocol

Protocol for the features we need from hashlib.file_digest.

__call__(__f, __alg_name)
Parameters:
Return type:

hashlib._Hash
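
As a rough illustration of a callable that would satisfy this protocol on Python versions without hashlib.file_digest, the sketch below reads a binary file object in chunks through readinto() and feeds a hashlib hash. The name fallback_file_digest and the chunk size are assumptions made for this example; this is not Toil's implementation.

```python
import hashlib
import io


def fallback_file_digest(f, alg_name, chunk_size=65536):
    """Hypothetical stand-in for hashlib.file_digest (not Toil's code)."""
    hasher = hashlib.new(alg_name)
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    while True:
        # readinto() fills the buffer and returns how many bytes it wrote.
        count = f.readinto(buf)
        if not count:
            break
        hasher.update(view[:count])
    return hasher


# Any object with readinto() works here; BytesIO is used for the demo.
digest = fallback_file_digest(io.BytesIO(b"hello"), "sha256").hexdigest()
```

The object returned exposes hexdigest() and digest(), which is all a caller of file_digest typically needs.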

toil.wdl.wdltoil.file_digest: FileDigester
toil.wdl.wdltoil.WDLContext
exception toil.wdl.wdltoil.InsufficientMountDiskSpace(mount_targets, desired_bytes, available_bytes)

Bases: Exception

Common base class for all non-exit exceptions.

Parameters:
  • mount_targets (list[str])

  • desired_bytes (int)

  • available_bytes (int)
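
The three constructor arguments suggest an exception raised when a mount source has less free space than requested. A minimal sketch of that pattern follows, assuming a free-space check with shutil.disk_usage. MountSpaceError and require_space are hypothetical names for this example, and the message format is invented.

```python
import shutil


class MountSpaceError(Exception):
    """Hypothetical stand-in for InsufficientMountDiskSpace."""

    def __init__(self, mount_targets, desired_bytes, available_bytes):
        super().__init__(
            f"Not enough space for mount targets {mount_targets}: "
            f"wanted {desired_bytes} bytes, found {available_bytes} bytes"
        )
        self.mount_targets = mount_targets
        self.desired_bytes = desired_bytes
        self.available_bytes = available_bytes


def require_space(source_dir, mount_targets, desired_bytes):
    """Raise if source_dir has fewer free bytes than desired_bytes."""
    available = shutil.disk_usage(source_dir).free
    if available < desired_bytes:
        raise MountSpaceError(mount_targets, desired_bytes, available)
    return available
```

Keeping the requested and available sizes as attributes lets a caller decide whether to reschedule or fail outright.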

toil.wdl.wdltoil.wdl_error_reporter(task, exit=False, log=logger.critical)

Run code in a context where WDL errors will be reported with pretty formatting.

Parameters:
  • task (str)

  • exit (bool)

  • log (Callable[[str], None])

Return type:

Generator[None]

toil.wdl.wdltoil.F
toil.wdl.wdltoil.report_wdl_errors(task, exit=False, log=logger.critical)

Create a decorator to report WDL errors with the given task message.

Decorator can then be applied to a function, and if a WDL error happens it will say that it could not {task}.

Parameters:
  • task (str)

  • exit (bool)

  • log (Callable[[str], None])

Return type:

Callable[[F], F]
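
The relationship between the context manager and the decorator factory can be sketched with contextlib and functools. Everything below is an assumption for illustration: FakeWDLError stands in for the MiniWDL error types, the message format is invented, and whether the real functions re-raise after logging is not stated in this reference.

```python
import contextlib
import functools
import logging
import sys

logger = logging.getLogger(__name__)


class FakeWDLError(Exception):
    """Hypothetical stand-in for the MiniWDL error types."""


@contextlib.contextmanager
def error_reporter(task, exit=False, log=logger.critical):
    """Log WDL-style errors raised inside the with block."""
    try:
        yield
    except FakeWDLError as e:
        log(f"Could not {task}: {e}")
        if exit:
            sys.exit(1)
        # Assumption: without exit=True the error still propagates.
        raise


def report_errors(task, exit=False, log=logger.critical):
    """Build a decorator that wraps a function in error_reporter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with error_reporter(task, exit=exit, log=log):
                return func(*args, **kwargs)
        return wrapper
    return decorator


messages = []


@report_errors("evaluate workflow", log=messages.append)
def broken():
    raise FakeWDLError("bad expression")


try:
    broken()
except FakeWDLError:
    pass
```

Passing a list's append method as log is a convenient way to capture messages in tests; in normal use the default logger.critical would be left in place.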

toil.wdl.wdltoil.remove_common_leading_whitespace(expression, tolerate_blanks=True, tolerate_dedents=False, tolerate_all_whitespace=True, debug=False)

Remove “common leading whitespace” as defined in the WDL 1.1 spec.

See <https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#stripping-leading-whitespace>.

Operates on a WDL.Expr.String expression that has already been parsed.

Parameters:
  • tolerate_blanks (bool) – If True, don’t allow totally blank lines to zero the common whitespace.

  • tolerate_dedents (bool) – If True, remove as much of the whitespace on the first indented line as is found on subsequent lines, regardless of whether later lines are out-dented relative to it.

  • tolerate_all_whitespace (bool) – If True, don’t allow all-whitespace lines to reduce the common whitespace prefix.

  • debug (bool) – If True, the function will show its work by logging at debug level.

  • expression (WDL.Expr.String)

Return type:

WDL.Expr.String
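
To make the idea of "common leading whitespace" concrete, here is a simplified sketch that works on a plain Python string rather than a parsed WDL.Expr.String. It ignores blank and all-whitespace lines when computing the common prefix, which roughly corresponds to the defaults of tolerate_blanks and tolerate_all_whitespace, and it counts characters without distinguishing tabs from spaces. The function name is hypothetical and the real function handles more cases.

```python
def strip_common_leading_whitespace(text):
    """Simplified, string-only sketch; not the Toil implementation."""
    lines = text.split("\n")
    # Blank and all-whitespace lines do not count toward the common prefix.
    indents = [
        len(line) - len(line.lstrip(" \t"))
        for line in lines
        if line.strip()
    ]
    common = min(indents) if indents else 0
    stripped = []
    for line in lines:
        if line.strip():
            stripped.append(line[common:])
        else:
            # Whitespace-only lines are emptied rather than sliced.
            stripped.append("")
    return "\n".join(stripped)


result = strip_common_leading_whitespace(
    "    echo hi\n\n      echo there\n    done"
)
```

Relative indentation survives: the line indented two extra spaces keeps those two spaces after the shared four are removed.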

async toil.wdl.wdltoil.toil_read_source(uri, path, importer)

Implementation of a MiniWDL read_source function that can use any filename or URL supported by Toil.

Needs to be async because MiniWDL will await its result.

Parameters:
  • uri (str)

  • path (list[str])

  • importer (WDL.Tree.Document | None)

Return type:

WDL.Tree.ReadSourceResult
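
The shape of an awaitable read_source callback can be sketched without MiniWDL installed. In the example below, ReadSourceResultSketch is a stand-in whose field names (source_text, abspath) are assumed to mirror WDL.Tree.ReadSourceResult, and read_source_sketch only handles local files, whereas the real function accepts any filename or URL supported by Toil.

```python
import asyncio
import os
import tempfile
from typing import NamedTuple


class ReadSourceResultSketch(NamedTuple):
    """Stand-in for WDL.Tree.ReadSourceResult (field names assumed)."""
    source_text: str
    abspath: str


async def read_source_sketch(uri, path, importer):
    """Look for uri as given, then under each search directory in path."""
    candidates = [uri] + [os.path.join(p, uri) for p in path]
    for candidate in candidates:
        if os.path.isfile(candidate):
            with open(candidate, encoding="utf-8") as handle:
                text = handle.read()
            return ReadSourceResultSketch(text, os.path.abspath(candidate))
    raise FileNotFoundError(uri)


with tempfile.TemporaryDirectory() as tmp:
    name = "sketch_only_example.wdl"
    with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
        handle.write("version 1.1\n")
    # The caller awaits the coroutine; asyncio.run does that here.
    result = asyncio.run(read_source_sketch(name, [tmp], None))
```

The function does no asynchronous I/O itself; it is a coroutine only because the caller awaits the result.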

toil.wdl.wdltoil.virtualized_equal(value1, value2)

Check if two WDL values are equal when taking into account file virtualization.

Treats virtualized and non-virtualized Files referring to the same underlying file as equal.

Parameters:
  • value1 (WDL.Value.Base) – WDL value

  • value2 (WDL.Value.Base) – WDL value

Returns:

Whether the two values are equal with file virtualization accounted for

Return type:

bool

toil.wdl.wdltoil.WDLBindings
toil.wdl.wdltoil.combine_bindings(all_bindings)

Combine variable bindings from multiple predecessor tasks into one set for the current task.

Parameters:

all_bindings (Sequence[WDLBindings])

Return type:

WDLBindings

toil.wdl.wdltoil.log_bindings(log_function, message, all_bindings)

Log bindings to the console, even if some are still promises.

Parameters:
  • log_function (Callable[Ellipsis, None]) – Function (like logger.info) to call to log data

  • message (str) – Message to log before the bindings

  • all_bindings (Sequence[toil.job.Promised[WDLBindings]]) – A list of bindings or promises for bindings, to log

Return type:

None

toil.wdl.wdltoil.get_supertype(types)

Get the supertype that can hold values of all the given types.

Parameters:

types (Sequence[WDL.Type.Base])

Return type:

WDL.Type.Base

toil.wdl.wdltoil.for_each_node(root)

Iterate over all WDL workflow nodes in the given node, including inputs, internal nodes of conditionals and scatters, and gather nodes.

Parameters:

root (WDL.Tree.WorkflowNode)

Return type:

Iterator[WDL.Tree.WorkflowNode]

toil.wdl.wdltoil.recursive_dependencies(root)

Get the combined workflow_node_dependencies of root and everything under it, which are not on anything in that subtree.

Useful because section nodes can have internal nodes with dependencies not reflected in those of the section node itself.

Parameters:

root (WDL.Tree.WorkflowNode)

Return type:

set[str]

toil.wdl.wdltoil.parse_disks(spec, disks_spec)

Parse a WDL disk spec into a disk mount specification.

Parameters:
  • spec (str) – Disks spec to parse

  • disks_spec (list[WDL.Value.String] | str) – All disks spec as specified in the WDL file. Only used for better error messages.

Returns:

Specified mount point (None if omitted or local-disk), number of units, size of unit (ex GB)

Return type:

tuple[str | None, float, str]
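
A simplified sketch of turning one disk spec into the (mount point, number of units, unit) tuple is shown below. The accepted grammar is an assumption for illustration: an optional mount point, a number, and an optional unit, with GiB assumed when the unit is missing. The real parser's rules and error handling may differ, and parse_disk_spec is a hypothetical name.

```python
def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_disk_spec(spec):
    """Hypothetical, simplified parser; real parsing rules may differ."""
    parts = spec.split()
    mount_point = None
    # A leading token that is not a number is treated as the mount point.
    if parts and not _is_number(parts[0]):
        mount_point = parts.pop(0)
        if mount_point == "local-disk":
            mount_point = None
    if not parts or not _is_number(parts[0]):
        raise ValueError(f"No disk size in spec: {spec!r}")
    size = float(parts.pop(0))
    # Assumption for this sketch: default to GiB when no unit is given.
    unit = parts.pop(0) if parts else "GiB"
    return mount_point, size, unit
```

For instance, under these assumptions `parse_disk_spec("/mnt/data 10 GiB")` yields a mount point of `/mnt/data`, while a `local-disk` prefix is reported as None.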

toil.wdl.wdltoil.pack_toil_uri(file_id, task_path, dir_id, file_basename)

Encode a Toil file ID and metadata about who wrote it as a URI.

The URI will start with the scheme in TOIL_URI_SCHEME.

Parameters:
Return type:

str

toil.wdl.wdltoil.unpack_toil_uri(toil_uri)

Unpack a URI made by pack_toil_uri to retrieve the FileID and the basename (no path prefix) that the file is supposed to have.

Parameters:

toil_uri (str)

Return type:

tuple[toil.fileStores.FileID, str, str, str]
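
The pack and unpack pair forms a round trip over four fields. The sketch below shows one way such an encoding could work, using percent-encoding so that separators inside a field cannot be confused with the separators between fields. The scheme string, the separator, and the use of plain strings in place of a FileID are all assumptions for this example, not the actual wire format.

```python
from urllib.parse import quote, unquote

SCHEME = "toilfile:"  # assumed value, for illustration only


def pack_uri(file_id, task_path, dir_id, file_basename):
    """Join four percent-encoded fields after the scheme."""
    parts = [file_id, task_path, dir_id, file_basename]
    # safe="" makes quote() encode "/" too, so it is free to act as separator.
    return SCHEME + "/".join(quote(p, safe="") for p in parts)


def unpack_uri(uri):
    """Reverse pack_uri, returning the four fields as a tuple."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"Not a packed URI: {uri}")
    parts = uri[len(SCHEME):].split("/")
    if len(parts) != 4:
        raise ValueError(f"Expected 4 fields, found {len(parts)}")
    return tuple(unquote(p) for p in parts)


fields = ("id/with/slash", "wf.task", "uuid-1", "out put.txt")
packed = pack_uri(*fields)
```

Encoding each field separately is what keeps a slash inside the file ID from being read as a field boundary.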

toil.wdl.wdltoil.SHARED_PATH_ATTR = '_shared_fs_path'
toil.wdl.wdltoil.clone_metadata(old_file, new_file)

Copy all Toil metadata from one WDL File to another.

Parameters:
  • old_file (WDL.Value.File)

  • new_file (WDL.Value.File)

Return type:

None

toil.wdl.wdltoil.set_file_value(file, new_value)

Return a copy of a WDL File with all metadata intact but the value changed.

Parameters:
  • file (WDL.Value.File)

  • new_value (str)

Return type:

WDL.Value.File

toil.wdl.wdltoil.set_file_nonexistent(file, nonexistent)

Return a copy of a WDL File with all metadata intact but the nonexistent flag set to the given value.

Parameters:
  • file (WDL.Value.File)

  • nonexistent (bool)

Return type:

WDL.Value.File

toil.wdl.wdltoil.get_file_nonexistent(file)

Return the nonexistent flag for a file.

Parameters:

file (WDL.Value.File)

Return type:

bool

toil.wdl.wdltoil.set_file_virtualized_value(file, virtualized_value)

Return a copy of a WDL File with all metadata intact but the virtualized_value attribute set to the given value.

Parameters:
  • file (WDL.Value.File)

  • virtualized_value (str)

Return type:

WDL.Value.File

toil.wdl.wdltoil.get_file_virtualized_value(file)

Get the virtualized storage location for a file.

Parameters:

file (WDL.Value.File)

Return type:

Optional[str]

toil.wdl.wdltoil.get_shared_fs_path(file)

If a File has a shared filesystem path, get that path.

This will be the path the File was initially imported from, or the path that it has in the call cache.

Parameters:

file (WDL.Value.File)

Return type:

Optional[str]

toil.wdl.wdltoil.set_shared_fs_path(file, path)

Return a copy of the given File associated with the given shared filesystem path.

This should be the path it was initially imported from, or the path that it has in the call cache.

Parameters:
  • file (WDL.Value.File)

  • path (str)

Return type:

WDL.Value.File

toil.wdl.wdltoil.view_shared_fs_paths(bindings)

Given WDL bindings, return a copy where all files have their shared filesystem paths as their values.

Parameters:

bindings (WDL.Env.Bindings[WDL.Value.Base])

Return type:

WDL.Env.Bindings[WDL.Value.Base]

toil.wdl.wdltoil.poll_execution_cache(node, bindings)

Return the cached result of calling this workflow or task, and its key.

Returns None and the key if the cache has no result for us.

Deals in un-namespaced bindings.

Parameters:
  • node (Union[WDL.Tree.Workflow, WDL.Tree.Task])

  • bindings (WDLBindings)

Return type:

tuple[WDLBindings | None, str]

toil.wdl.wdltoil.fill_execution_cache(cache_key, output_bindings, file_store, wdl_options, miniwdl_logger=None, miniwdl_config=None)

Cache the result of calling a workflow or task.

Deals in un-namespaced bindings.

Returns:

possibly modified bindings to continue on with, that may reference the cache.

Parameters:
Return type:

WDLBindings
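
The contract described for poll_execution_cache and fill_execution_cache (poll returns a result or None together with a key, fill stores a result under that key and returns bindings to continue with) can be illustrated with an in-memory dictionary. The key derivation, the storage, and both function names below are hypothetical; the real functions work with MiniWDL's call cache and WDL bindings.

```python
import hashlib
import json

_cache = {}  # in-memory stand-in for a call cache


def poll_cache(name, inputs):
    """Return (cached outputs or None, cache key) for a call."""
    payload = json.dumps([name, inputs], sort_keys=True).encode("utf-8")
    key = hashlib.sha256(payload).hexdigest()
    return _cache.get(key), key


def fill_cache(key, outputs):
    """Store outputs under key and return the bindings to continue with."""
    _cache[key] = outputs
    return outputs


first_result, cache_key = poll_cache("wf.align", {"reads": "a.fq"})
if first_result is None:
    # Cache miss: do the work, then record it under the key from the poll.
    outputs = fill_cache(cache_key, {"bam": "a.bam"})
second_result, second_key = poll_cache("wf.align", {"reads": "a.fq"})
```

Returning the key even on a miss means the caller does not need to recompute it when filling the cache afterwards.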

toil.wdl.wdltoil.DirectoryNamingStateDict
toil.wdl.wdltoil.choose_human_readable_directory(root_dir, source_task_path, parent_id, state)

Select a good directory to save files from a task and source directory in.

The directories involved may not exist.

Parameters:
  • root_dir (str) – Directory that the path will be under

  • source_task_path (str) – The dotted WDL name of whatever generated the file. We assume this is an acceptable filename component.

  • parent_id (str) – UUID of the directory that the file came from. All files with the same parent ID will be placed as sibling files in a shared parent directory.

  • state (DirectoryNamingStateDict) – A state dict that must be passed to repeated calls.

Return type:

str

toil.wdl.wdltoil.evaluate_decls_to_bindings(decls, all_bindings, standard_library, include_previous=False, drop_missing_files=False)

Evaluate decls with a given bindings environment and standard library.

Creates a new bindings object that only contains the bindings from the given decls. Guarantees that each decl in decls can access the variables defined by the previous ones.

Parameters:
  • decls (list[WDL.Tree.Decl]) – Decls to evaluate

  • all_bindings (WDL.Env.Bindings[WDL.Value.Base]) – Environment to use when evaluating decls

  • standard_library (ToilWDLStdLibBase) – Standard library

  • include_previous (bool) – Whether to include the existing environment in the new returned environment. This will be false for outputs where only defined decls should be included.

  • drop_missing_files (bool) – Whether to coerce nonexistent files to null. The coerced elements will be checked that the transformation is valid. Currently should only be enabled in output sections; see https://github.com/openwdl/wdl/issues/673#issuecomment-2248828116

Returns:

New bindings object

Return type:

WDL.Env.Bindings[WDL.Value.Base]
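
The ordering guarantee and the include_previous switch can be shown with plain dictionaries. In this sketch each "decl" is a hypothetical (name, function) pair whose function receives the environment built so far; real decls are WDL.Tree.Decl nodes evaluated through the standard library, and file handling is left out entirely.

```python
def evaluate_in_order(decls, all_bindings, include_previous=False):
    """Evaluate (name, expr) pairs so later ones see earlier results."""
    env = dict(all_bindings)  # working environment, grows as we go
    new_bindings = {}         # only the names defined by decls
    for name, expr in decls:
        value = expr(env)
        env[name] = value
        new_bindings[name] = value
    return env if include_previous else new_bindings


decls = [
    ("a", lambda env: env["x"] + 1),
    # "b" depends on "a", which was defined by the previous decl.
    ("b", lambda env: env["a"] * 2),
]
only_new = evaluate_in_order(decls, {"x": 1})
with_previous = evaluate_in_order(decls, {"x": 1}, include_previous=True)
```

With include_previous left at False, the input name x is absent from the result, matching the described behavior for output sections.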

class toil.wdl.wdltoil.NonDownloadingSize

Bases: WDL.StdLib._Size

WDL size() implementation that avoids downloading files.

MiniWDL’s default size() implementation downloads the whole file to get its size. We want to be able to get file sizes from code running on the leader, where there may not be space to download the whole file. So we override the fancy class that implements it so that we can handle sizes for FileIDs using the FileID’s stored size info.

toil.wdl.wdltoil.extract_workflow_inputs(environment)
Parameters:

environment (WDLBindings)

Return type:

list[str]

toil.wdl.wdltoil.convert_files(environment, file_to_id, file_to_data, task_path)

Resolve relative-URI files in the given environment and convert the file values to a new value made from a given mapping.

Will return bindings with file values set to their corresponding relative-URI.

Parameters:
Returns:

new bindings object

Return type:

WDLBindings

toil.wdl.wdltoil.convert_remote_files(environment, file_source, task_path, search_paths=None, import_remote_files=True, execution_dir=None)

Resolve relative-URI files in the given environment and import all files.

Returns an environment where each File’s value is set to the URI it was found at, its virtualized value is set to what it was loaded into the filestore as (if applicable), and its shared filesystem path is set if it came from the local filesystem.

Parameters:
  • environment (WDLBindings) – Bindings to evaluate on

  • file_source (toil.jobStores.abstractJobStore.AbstractJobStore) – Context to search for files with

  • task_path (str) – Dotted WDL name of the user-level code doing the importing (probably the workflow name).

  • search_paths (Optional[list[str]]) – If set, try resolving input location relative to the URLs or directories in this list.

  • import_remote_files (bool) – If set, import files from remote locations. Else leave them as URI references.

  • execution_dir (Optional[str])

Return type:

WDLBindings

class toil.wdl.wdltoil.ToilWDLStdLibBase(file_store, wdl_options, share_files_with=None)

Bases: WDL.StdLib.Base

Standard library implementation for WDL as run on Toil.

Parameters:
size
property execution_dir: str | None
Return type:

str | None

property task_path: str
Return type:

str

get_local_paths()

Get all the local paths of files devirtualized (or virtualized) through the stdlib.

Return type:

list[str]

static devirtualize_to(filename, dest_dir, file_source, state, wdl_options, devirtualized_to_virtualized=None, virtualized_to_devirtualized=None, export=None)

Download or export a WDL virtualized filename/URL to the given directory.

The destination directory must already exist. No other devirtualize_to call may be writing to it, including the case of another workflow writing the same task to the same place in the call cache at the same time.

Makes sure sibling files stay siblings and files with the same name don’t clobber each other. Called from within this class for tasks, and statically at the end of the workflow for outputs.

Returns the local path to the file. If the file is already a local path, or if it already has an entry in virtualized_to_devirtualized, that path will be re-used instead of creating a new copy in dest_dir.

The input filename could already be devirtualized. In this case, the filename should not be added to the cache.

Parameters:
  • state (DirectoryNamingStateDict) – State dict which must be shared among successive calls into a dest_dir.

  • wdl_options (WDLContext) – WDL options to carry through.

  • export (bool | None) – Always create exported copies of files rather than views that a FileStore might clean up.

  • filename (str)

  • dest_dir (str)

  • file_source (toil.fileStores.abstractFileStore.AbstractFileStore | toil.common.Toil)

  • devirtualized_to_virtualized (dict[str, str] | None)

  • virtualized_to_devirtualized (dict[str, str] | None)

Return type:

str

class toil.wdl.wdltoil.ToilWDLStdLibWorkflow(*args, **kwargs)

Bases: ToilWDLStdLibBase

Standard library implementation for workflow scope.

Handles deduplicating files generated by write_* calls at workflow scope with copies already in the call cache, so that tasks that depend on them can also be fulfilled from the cache.

Parameters:
  • args (Any)

  • kwargs (Any)

class toil.wdl.wdltoil.ToilWDLStdLibTaskCommand(file_store, container, wdl_options)

Bases: ToilWDLStdLibBase

Standard library implementation to use inside a WDL task command evaluation.

Expects all the filenames in variable bindings to be container-side paths; these are the “virtualized” filenames, while the “devirtualized” filenames are host-side paths.

Parameters:
container
class toil.wdl.wdltoil.ToilWDLStdLibTaskOutputs(file_store, stdout_path, stderr_path, file_to_mountpoint, wdl_options, share_files_with=None)

Bases: ToilWDLStdLibBase, WDL.StdLib.TaskOutputs

Standard library implementation for WDL as run on Toil, with additional functions only allowed in task output sections.

Parameters:
stdout_used()

Return True if the standard output was read by the WDL.

Return type:

bool

stderr_used()

Return True if the standard error was read by the WDL.

Return type:

bool

toil.wdl.wdltoil.evaluate_named_expression(context, name, expected_type, expression, environment, stdlib)

Evaluate an expression when we know the name of it.

Parameters:
  • context (WDL.Error.SourceNode | WDL.Error.SourcePosition)

  • name (str)

  • expected_type (WDL.Type.Base | None)

  • expression (WDL.Expr.Base | None)

  • environment (WDLBindings)

  • stdlib (WDL.StdLib.Base)

Return type:

WDL.Value.Base

toil.wdl.wdltoil.evaluate_decl(node, environment, stdlib)

Evaluate the expression of a declaration node, or raise an error.

Parameters:
  • node (WDL.Tree.Decl)

  • environment (WDLBindings)

  • stdlib (WDL.StdLib.Base)

Return type:

WDL.Value.Base

toil.wdl.wdltoil.evaluate_call_inputs(context, expressions, environment, stdlib, inputs_dict=None)

Evaluate a bunch of expressions with names, and make them into a fresh set of bindings. inputs_dict is a mapping of variable names to their expected type for the input decls in a task.

Parameters:
  • context (WDL.Error.SourceNode | WDL.Error.SourcePosition)

  • expressions (dict[str, WDL.Expr.Base])

  • environment (WDLBindings)

  • stdlib (WDL.StdLib.Base)

  • inputs_dict (dict[str, WDL.Type.Base] | None)

Return type:

WDLBindings

toil.wdl.wdltoil.evaluate_defaultable_decl(node, environment, stdlib)

If the name of the declaration is already defined in the environment, return its value. Otherwise, return the evaluated expression.

Parameters:
  • node (WDL.Tree.Decl)

  • environment (WDLBindings)

  • stdlib (WDL.StdLib.Base)

Return type:

WDL.Value.Base

toil.wdl.wdltoil.devirtualize_files(environment, stdlib)

Make sure all the File values embedded in the given bindings point to files that are actually available to command line commands. The same virtual file always maps to the same devirtualized filename, even with duplicates.

Parameters:
Return type:

WDLBindings

toil.wdl.wdltoil.virtualize_files(environment, stdlib, enforce_existence=True)

Make sure all the File values embedded in the given bindings point to files that are usable from other machines.

Parameters:
Return type:

WDLBindings

toil.wdl.wdltoil.add_paths(task_container, host_paths)

Based off of WDL.runtime.task_container.add_paths from miniwdl. Maps the host paths to the container paths.

Parameters:
  • task_container (WDL.runtime.task_container.TaskContainer)

  • host_paths (Iterable[str])

Return type:

None

toil.wdl.wdltoil.drop_if_missing(file, standard_library)

Return None if a file doesn’t exist, or its path if it does.

filename represents a URI or file name belonging to a WDL value of type value_type. work_dir represents the current working directory of the job and is where all relative paths will be interpreted from

Parameters:
Return type:

WDL.Value.File | None

toil.wdl.wdltoil.drop_missing_files(environment, standard_library)

Make sure all the File values embedded in the given bindings point to files that exist, or are null.

Files must not be virtualized.

Parameters:
Return type:

WDLBindings

toil.wdl.wdltoil.get_file_paths_in_bindings(environment)

Get the paths of all files in the bindings. Doesn’t guarantee that duplicates are removed.

TODO: Duplicative with WDL.runtime.task._fspaths, except that is internal and supports Directory objects.

Parameters:

environment (WDLBindings)

Return type:

list[str]

toil.wdl.wdltoil.map_over_files_in_bindings(environment, transform)

Run all File values embedded in the given bindings through the given transformation function.

The transformation function must not mutate the original File.

TODO: Replace with WDL.Value.rewrite_env_paths or WDL.Value.rewrite_files

Parameters:
  • environment (WDLBindings)

  • transform (Callable[[WDL.Value.File], WDL.Value.File | None])

Return type:

WDLBindings

toil.wdl.wdltoil.map_over_files_in_binding(binding, transform)

Run all File values’ types and values embedded in the given binding’s value through the given transformation function.

The transformation function must not mutate the original File.

Parameters:
  • binding (WDL.Env.Binding[WDL.Value.Base])

  • transform (Callable[[WDL.Value.File], WDL.Value.File | None])

Return type:

WDL.Env.Binding[WDL.Value.Base]

toil.wdl.wdltoil.map_over_typed_files_in_value(value, transform)

Run all File values embedded in the given value through the given transformation function.

The transformation function must not mutate the original File.

If the transform returns None, the file value is changed to Null.

The transform has access to the type information for the value, so it knows if it may return None, depending on if the value is optional or not.

The transform is allowed to return None only if the mapping result won’t actually be used, to allow for scans. So error checking needs to be part of the transform itself.

Parameters:
  • value (WDL.Value.Base)

  • transform (Callable[[WDL.Value.File], WDL.Value.File | None])

Return type:

WDL.Value.Base
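
The recursion described above can be sketched over ordinary Python containers. FakeFile is a hypothetical stand-in for WDL.Value.File, and lists and dicts stand in for WDL arrays, maps, and structs. The real function also tracks WDL types, which this sketch omits.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeFile:
    """Hypothetical stand-in for WDL.Value.File."""
    value: str


def map_over_files(value, transform):
    """Return a copy of value with every FakeFile passed through transform."""
    if isinstance(value, FakeFile):
        # A None result means the file becomes a null value.
        return transform(value)
    if isinstance(value, list):
        return [map_over_files(item, transform) for item in value]
    if isinstance(value, dict):
        return {k: map_over_files(v, transform) for k, v in value.items()}
    return value


def drop_tmp(file):
    """Null out .tmp files; return a modified copy of everything else."""
    if file.value.endswith(".tmp"):
        return None
    # replace() builds a new object, so the original is not mutated.
    return replace(file, value="/data/" + file.value)


original = {"a": [FakeFile("x.txt"), FakeFile("y.tmp")], "n": 3}
mapped = map_over_files(original, drop_tmp)
```

Using a frozen dataclass makes the "must not mutate the original File" rule hard to break by accident in the example.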

toil.wdl.wdltoil.ensure_null_files_are_nullable(value, original_value, expected_type)

Run through all nested values embedded in the given value and check that the null values are valid.

If a null value is found that does not have a valid corresponding expected_type, raise an error.

(This is currently only used to check that null values arising from File coercion are in locations with a nullable File? type. If this is to be used elsewhere, the error message should be changed to describe the appropriate types and not just talk about files.)

For example: If one of the nested values is null but the equivalent nested expected_type is not optional, a FileNotFoundError will be raised.

Parameters:
  • value (WDL.Value.Base) – WDL base value to check. This is the WDL value that has been transformed and has the null elements

  • original_value (WDL.Value.Base) – The original WDL base value prior to the transformation. Only used for error messages

  • expected_type (WDL.Type.Base) – The WDL type of the value

Return type:

None

class toil.wdl.wdltoil.WDLBaseJob(wdl_options, **kwargs)

Bases: toil.job.Job

Base job class for all WDL-related jobs.

Responsible for post-processing returned bindings, to do things like add in null values for things not defined in a section. Post-processing operations can be added onto any job before it is saved, and will be applied as long as the job’s run method calls postprocess().

Also responsible for remembering the Toil WDL configuration keys and values.

Parameters:
  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Run a WDL-related job.

Remember to decorate non-trivial overrides with report_wdl_errors().

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

Any

then_underlay(underlay)

Apply an underlay of backup bindings to the result.

Parameters:

underlay (toil.job.Promised[WDLBindings])

Return type:

None

then_remove(remove)

Remove the given bindings from the result.

Parameters:

remove (toil.job.Promised[WDLBindings])

Return type:

None

then_namespace(namespace)

Put the result bindings into a namespace.

Parameters:

namespace (str)

Return type:

None

then_overlay(overlay)

Overlay the given bindings on top of the (possibly namespaced) result.

Parameters:

overlay (toil.job.Promised[WDLBindings])

Return type:

None

postprocess(bindings)

Apply queued changes to bindings.

Should be applied by subclasses’ run() implementations to their return values.

Parameters:

bindings (WDLBindings)

Return type:

WDLBindings

defer_postprocessing(other)

Give our postprocessing steps to a different job.

Use this when you are returning a promise for bindings, on the job that issues the promise.

Parameters:

other (WDLBaseJob)

Return type:

None
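
The queued post-processing described for this class can be pictured as a list of functions applied in order by postprocess(). The sketch below uses plain dicts in place of WDL bindings and a dotted prefix for namespacing. The class name and the internal _steps list are hypothetical, and the real class also deals with promises and Toil job serialization.

```python
class PostprocessingSketch:
    """Dict-based illustration of queued post-processing steps."""

    def __init__(self):
        self._steps = []

    def then_underlay(self, underlay):
        # Backup values: existing bindings win over the underlay.
        self._steps.append(lambda b: {**underlay, **b})

    def then_remove(self, remove):
        self._steps.append(
            lambda b: {k: v for k, v in b.items() if k not in remove}
        )

    def then_namespace(self, namespace):
        self._steps.append(
            lambda b: {f"{namespace}.{k}": v for k, v in b.items()}
        )

    def then_overlay(self, overlay):
        # Overlay values win over the (possibly namespaced) bindings.
        self._steps.append(lambda b: {**b, **overlay})

    def postprocess(self, bindings):
        for step in self._steps:
            bindings = step(bindings)
        return bindings

    def defer_postprocessing(self, other):
        # Hand our queued steps to the job that will produce the bindings.
        other._steps.extend(self._steps)
        self._steps = []


job = PostprocessingSketch()
job.then_underlay({"x": None, "y": None})
job.then_namespace("call")
job.then_overlay({"z": 1})
result = job.postprocess({"x": 5})
```

Because the steps run in the order they were queued, the overlay added after then_namespace is not itself namespaced.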

class toil.wdl.wdltoil.WDLTaskWrapperJob(task, prev_node_results, task_id, wdl_options, **kwargs)

Bases: WDLBaseJob

Job that determines the resources needed to run a WDL job.

Responsible for evaluating the input declarations for unspecified inputs, evaluating the runtime section, and scheduling or chaining to the real WDL job.

All bindings are in terms of task-internal names.

Parameters:
  • task (WDL.Tree.Task)

  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • task_id (list[str])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Evaluate inputs and runtime and schedule the task.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLTaskJob(task, task_internal_bindings, runtime_bindings, task_id, mount_spec, wdl_options, cache_key=None, **kwargs)

Bases: WDLBaseJob

Job that runs a WDL task.

Responsible for re-evaluating input declarations for unspecified inputs, evaluating the runtime section, re-scheduling if resources are not available, running any command, and evaluating the outputs.

All bindings are in terms of task-internal names.

Parameters:
  • task (WDL.Tree.Task)

  • task_internal_bindings (toil.job.Promised[WDLBindings])

  • runtime_bindings (toil.job.Promised[WDLBindings])

  • task_id (list[str])

  • mount_spec (dict[str | None, int])

  • wdl_options (WDLContext)

  • cache_key (str | None)

  • kwargs (Any)

INJECTED_MESSAGE_DIR = '.toil_wdl_runtime'

add_injections(command_string, task_container)

Inject extra Bash code from the Toil WDL runtime into the command for the container.

Currently doesn’t implement the MiniWDL plugin system, but does add resource usage monitoring to Docker containers.

Parameters:
  • command_string (str)

  • task_container (WDL.runtime.task_container.TaskContainer)

Return type:

str

handle_injection_messages(outputs_library)

Handle any data received from injected runtime code in the container.

Parameters:

outputs_library (ToilWDLStdLibTaskOutputs)

Return type:

None

handle_message_file(file_path)

Handle a message file received from in-container injected code.

Takes the host-side path of the file.

Parameters:

file_path (str)

Return type:

None

can_fake_root()

Determine if --fakeroot is likely to work for Singularity.

Return type:

bool

can_mount_proc()

Determine if --containall will work for Singularity.

On Kubernetes, this results in an "operation not permitted" error. See: https://github.com/apptainer/singularity/issues/5857

If Kubernetes is detected, return False.

Return type:

bool
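
A minimal sketch of the kind of check described above. It assumes Kubernetes is detected through the KUBERNETES_SERVICE_HOST environment variable, which Kubernetes sets inside pod containers. Whether Toil uses exactly this test is an assumption, and both function names are hypothetical.

```python
import os
from collections.abc import Mapping


def probably_on_kubernetes(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: the presence of this variable means we are inside a pod.
    return "KUBERNETES_SERVICE_HOST" in environ


def can_mount_proc_sketch(environ: Mapping[str, str] = os.environ) -> bool:
    # Mounting /proc is assumed to fail on Kubernetes, so report False there.
    return not probably_on_kubernetes(environ)


print(can_mount_proc_sketch({}))
print(can_mount_proc_sketch({"KUBERNETES_SERVICE_HOST": "10.0.0.1"}))
```

Passing the environment in as a parameter keeps the check easy to test without modifying the real process environment.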

ensure_mount_point(file_store, mount_spec)

Ensure the mount point sources are available.

Will check if the mount point source has the requested amount of space available.

Note: We depend on Toil’s job scheduling backend to raise an error when the sum of the disk requests of multiple mount points is greater than the total available. For example, if a task has two mount points requesting 100 GB each but only 100 GB is available, the df check may pass, but Toil should fail to schedule the jobs internally.

Parameters:
Returns:

Dict mapping mount point target to mount point source

Return type:

dict[str, str]
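
The per-mount-point space check can be illustrated with the standard library. This is a sketch under stated assumptions, not Toil's code: check_free_space and NotEnoughSpace are hypothetical names, and shutil.disk_usage stands in for whatever df-style query is actually used.

```python
import shutil
import tempfile


class NotEnoughSpace(Exception):
    """Hypothetical stand-in for an insufficient-disk-space error."""


def check_free_space(source_dir: str, requested_bytes: int) -> int:
    """Return the free bytes at source_dir, or raise if fewer than requested."""
    free = shutil.disk_usage(source_dir).free
    if free < requested_bytes:
        raise NotEnoughSpace(
            f"{source_dir} has {free} bytes free, "
            f"but {requested_bytes} were requested"
        )
    return free


with tempfile.TemporaryDirectory() as scratch:
    # Each mount point is checked on its own, so two requests that each fit
    # can both pass even when their sum does not fit on the same filesystem.
    free_now = check_free_space(scratch, 0)
```

The comment in the usage block is the limitation the note describes: an independent check per mount point cannot catch oversubscription across mount points that share one filesystem.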

run(file_store)

Actually run the task.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLWorkflowNodeJob(node, prev_node_results, wdl_options, **kwargs)

Bases: WDLBaseJob

Job that evaluates a WDL workflow node.

Parameters:
  • node (WDL.Tree.WorkflowNode)

  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Actually execute the workflow node.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLWorkflowNodeListJob(nodes, prev_node_results, wdl_options, **kwargs)

Bases: WDLBaseJob

Job that evaluates a list of WDL workflow nodes, which are in the same scope and in a topological dependency order, and which do not call out to any other workflows or tasks or sections.

Parameters:
  • nodes (list[WDL.Tree.WorkflowNode])

  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Actually execute the workflow nodes.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLCombineBindingsJob(prev_node_results, **kwargs)

Bases: WDLBaseJob

Job that collects the results from WDL workflow nodes and combines their environment changes.

Parameters:
  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • kwargs (Any)

run(file_store)

Aggregate incoming results.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

WDLBindings

class toil.wdl.wdltoil.WDLWorkflowGraph(nodes)

Represents a graph of WDL WorkflowNodes.

Operates at a certain level of instantiation (i.e. sub-sections are represented by single nodes).

Assumes all relevant nodes are provided; dependencies outside the provided nodes are assumed to be satisfied already.

Parameters:

nodes (Sequence[WDL.Tree.WorkflowNode])

real_id(node_id)

Map multiple IDs for what we consider the same node to one ID.

This elides/resolves gathers.

Parameters:

node_id (str)

Return type:

str

is_decl(node_id)

Return True if a node represents a WDL declaration, and False otherwise.

Parameters:

node_id (str)

Return type:

bool

get(node_id)

Get a node by ID.

Parameters:

node_id (str)

Return type:

WDL.Tree.WorkflowNode

get_dependencies(node_id)

Get all the nodes that a node depends on, recursively (into the node if it has a body) but not transitively.

Produces dependencies after resolving gathers and internal-to-section dependencies, on nodes that are also in this graph.

Parameters:

node_id (str)

Return type:

set[str]

get_transitive_dependencies(node_id)

Get all the nodes that a node depends on, transitively.

Parameters:

node_id (str)

Return type:

set[str]
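
As an illustration of the difference between direct and transitive dependencies, here is a small sketch that works on a plain mapping from node ID to direct dependencies. The mapping, the node IDs, and the function name are hypothetical; the real class works on WDL.Tree.WorkflowNode objects.

```python
def transitive_dependencies(direct: dict[str, set[str]], node_id: str) -> set[str]:
    """Collect everything reachable from node_id by following dependency edges."""
    seen: set[str] = set()
    stack = list(direct.get(node_id, set()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # Follow this dependency's own dependencies too.
            stack.extend(direct.get(dep, set()))
    return seen


# Hypothetical node IDs: c depends on b, and b depends on a.
direct_deps = {"a": set(), "b": {"a"}, "c": {"b"}}
all_deps = transitive_dependencies(direct_deps, "c")
print(sorted(all_deps))
```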

topological_order()

Get a topological order of the nodes, based on their dependencies.

Return type:

list[str]

leaves()

Get all the workflow node IDs that have no dependents in the graph.

Return type:

list[str]
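
The ideas behind topological_order() and leaves() can be sketched with Python's graphlib module. This only illustrates the concepts on a hypothetical dependency mapping; it makes no claim about how WDLWorkflowGraph computes its results.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each node ID maps to the IDs it depends on.
direct_deps = {
    "a": set(),
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
    "e": {"a"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(direct_deps).static_order())

# A leaf is a node that no other node in the graph depends on.
depended_on = set().union(*direct_deps.values())
leaves = [node for node in direct_deps if node not in depended_on]

print(order)
print(leaves)
```

Several valid topological orders usually exist, so code consuming the order should rely only on dependencies coming before dependents.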

class toil.wdl.wdltoil.WDLSectionJob(wdl_options, **kwargs)

Bases: WDLBaseJob

Job that can create more graph for a section of the workflow.

Parameters:
  • wdl_options (WDLContext)

  • kwargs (Any)

static coalesce_nodes(order, section_graph)

Given a topological order of WDL workflow node IDs, produce a list of lists of IDs, still in topological order, where each list of IDs can be run under a single Toil job.

Parameters:
Return type:

list[list[str]]
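
A simplified sketch of coalescing: consecutive declarations in the topological order are merged into one bucket, and every other node gets a bucket of its own. The actual rule Toil applies is not given here and may be stricter; the function name, the is_decl callback, and the node IDs are all hypothetical.

```python
from typing import Callable


def coalesce_consecutive(
    order: list[str], is_decl: Callable[[str], bool]
) -> list[list[str]]:
    """Group runs of consecutive declarations; keep other nodes separate."""
    buckets: list[list[str]] = []
    for node_id in order:
        if buckets and is_decl(node_id) and is_decl(buckets[-1][-1]):
            # Extend the current run of declarations.
            buckets[-1].append(node_id)
        else:
            # Start a new bucket for this node.
            buckets.append([node_id])
    return buckets


order = ["decl-x", "decl-y", "call-t", "decl-z"]
buckets = coalesce_consecutive(order, lambda n: n.startswith("decl-"))
print(buckets)
```

Flattening the buckets gives back the input order, so the result is still a topological order.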

create_subgraph(nodes, gather_nodes, environment, local_environment=None, subscript=None)

Make a Toil job to evaluate a subgraph inside a workflow or workflow section.

Returns:

A child Job that will return the aggregated environment after running all the things in the section.

Parameters:
  • gather_nodes (Sequence[WDL.Tree.Gather]) – Names exposed by these will always be defined with something, even if the code that defines them does not actually run.

  • environment (WDLBindings) – Bindings in this environment will be used to evaluate the subgraph and will be passed through.

  • local_environment (WDLBindings | None) – Bindings in this environment will be used to evaluate the subgraph but will go out of scope at the end of the section.

  • subscript (int | None) – If the subgraph is being evaluated multiple times, this should be a disambiguating integer for logging.

  • nodes (Sequence[WDL.Tree.WorkflowNode])

Return type:

WDLBaseJob

make_gather_bindings(gathers, undefined)

Given a collection of Gathers, create bindings from every identifier gathered, to the given “undefined” placeholder (which would be Null for a single execution of the body, or an empty array for a completely unexecuted scatter).

These bindings can be overlaid with bindings from the actual execution, so that references to names defined in unexecuted code get a proper default undefined value, and not a KeyError at runtime.

The information to do this comes from MiniWDL’s “gathers” system: <https://miniwdl.readthedocs.io/en/latest/WDL.html#WDL.Tree.WorkflowSection.gathers>

TODO: This approach will scale O(n^2) when run on n nested conditionals, because generating these bindings for the outer conditional will visit all the bindings from the inner ones.

Parameters:
  • gathers (Sequence[WDL.Tree.Gather])

  • undefined (WDL.Value.Base)

Return type:

WDLBindings
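
The placeholder-then-overlay idea can be shown with plain dicts standing in for WDLBindings. The names below are hypothetical, and Python's None stands in for the WDL Null value.

```python
def placeholder_bindings(names: list[str], undefined: object) -> dict[str, object]:
    """Bind every gathered name to the same "undefined" placeholder."""
    return {name: undefined for name in names}


# Hypothetical names exposed by a conditional's gathers.
gathered_names = ["result", "log"]
defaults = placeholder_bindings(gathered_names, None)

# Bindings from the code that actually ran; "log" was never defined.
executed = {"result": 42}

# Overlaying the real bindings on the defaults leaves every name defined.
merged = {**defaults, **executed}
print(merged)
```

Looking up merged["log"] yields the placeholder instead of raising KeyError, which is the behavior the description asks for.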

class toil.wdl.wdltoil.WDLScatterJob(scatter, prev_node_results, wdl_options, **kwargs)

Bases: WDLSectionJob

Job that evaluates a scatter in a WDL workflow. Runs the body for each value in an array, and makes arrays of the new bindings created in each instance of the body. If an instance of the body doesn’t create a binding, it gets a null value in the corresponding array.

Parameters:
  • scatter (WDL.Tree.Scatter)

  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Run the scatter.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLArrayBindingsJob(input_bindings, base_bindings, **kwargs)

Bases: WDLBaseJob

Job that takes all new bindings created in an array of input environments, relative to a base environment, and produces bindings where each new binding name is bound to an array of the values in all the input environments.

Useful for producing the results of a scatter.

Parameters:
  • input_bindings (Sequence[toil.job.Promised[WDLBindings]])

  • base_bindings (WDLBindings)

  • kwargs (Any)

run(file_store)

Actually produce the array-ified bindings now that promised values are available.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

WDLBindings
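
A toy version of this array-ification, again with dicts standing in for WDLBindings and None standing in for a WDL null. The function name and the sample data are hypothetical.

```python
def arrayify(input_envs: list[dict[str, object]], base: dict[str, object]) -> dict[str, list]:
    """Collect, per new name, the value from each input environment."""
    new_names: list[str] = []
    for env in input_envs:
        for name in env:
            # A name is "new" if the base environment does not define it.
            if name not in base and name not in new_names:
                new_names.append(name)
    # Environments that lack a name contribute None at their position.
    return {name: [env.get(name) for env in input_envs] for name in new_names}


base = {"samples": ["s1", "s2"]}
envs = [
    {"samples": ["s1", "s2"], "out": "a.txt"},
    {"samples": ["s1", "s2"], "out": "b.txt", "extra": 1},
]
arrays = arrayify(envs, base)
print(arrays)
```

Each resulting list has one entry per input environment, in order, so position i always corresponds to scatter instance i.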

class toil.wdl.wdltoil.WDLConditionalJob(conditional, prev_node_results, wdl_options, **kwargs)

Bases: WDLSectionJob

Job that evaluates a conditional in a WDL workflow.

Parameters:
  • conditional (WDL.Tree.Conditional)

  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Run the conditional.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLWorkflowJob(workflow, prev_node_results, workflow_id, wdl_options, **kwargs)

Bases: WDLSectionJob

Job that evaluates an entire WDL workflow.

Parameters:
  • workflow (WDL.Tree.Workflow)

  • prev_node_results (Sequence[toil.job.Promised[WDLBindings]])

  • workflow_id (list[str])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Run the workflow. Return the result of the workflow.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLOutputsJob(workflow, bindings, wdl_options, cache_key=None, **kwargs)

Bases: WDLBaseJob

Job which evaluates an outputs section for a workflow.

Returns an environment with just the outputs bound, in no namespace.

Parameters:
  • workflow (WDL.Tree.Workflow)

  • bindings (toil.job.Promised[WDLBindings])

  • wdl_options (WDLContext)

  • cache_key (str | None)

  • kwargs (Any)

run(file_store)

Make bindings for the outputs.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

WDLBindings

class toil.wdl.wdltoil.WDLStartJob(target, inputs, wdl_options, **kwargs)

Bases: WDLSectionJob

Job that evaluates an entire WDL workflow, and returns the workflow outputs namespaced with the workflow name. Inputs may or may not be namespaced with the workflow name; both forms are accepted.

Parameters:
  • target (WDL.Tree.Workflow | WDL.Tree.Task)

  • inputs (toil.job.Promised[WDLBindings])

  • wdl_options (WDLContext)

  • kwargs (Any)

run(file_store)

Actually build the subgraph.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]
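
One way to accept inputs both with and without the workflow-name prefix is to normalize them to a single form first. The sketch below assumes namespaced inputs look like "<workflow name>.<input name>"; the function name is hypothetical, and how Toil actually reconciles the two forms is not stated here.

```python
def strip_workflow_prefix(inputs: dict[str, object], workflow_name: str) -> dict[str, object]:
    """Return inputs with any leading "<workflow_name>." removed from names."""
    prefix = workflow_name + "."
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in inputs.items()
    }


# Hypothetical inputs mixing both accepted forms.
mixed = {"align.reference": "ref.fa", "threads": 4}
normalized = strip_workflow_prefix(mixed, "align")
print(normalized)
```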

class toil.wdl.wdltoil.WDLInstallImportsJob(task_path, inputs, import_data, **kwargs)

Bases: toil.job.Job

Class represents a unit of work in toil.

Parameters:

run(file_store)

Convert the filenames in the workflow inputs into URIs.

Returns a promise of the transformed workflow inputs.

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

class toil.wdl.wdltoil.WDLImportWrapper(target, inputs, wdl_options, inputs_search_path, import_remote_files, import_workers_threshold, import_workers_disk, **kwargs)

Bases: WDLSectionJob

Job to organize importing files on workers instead of the leader. Responsible for extracting filenames and metadata, calling ImportsJob, applying imports to input bindings, and scheduling the start workflow job.

This class is only used when runImportsOnWorkers is enabled.

Parameters:
  • target (Union[WDL.Tree.Workflow, WDL.Tree.Task])

  • inputs (WDLBindings)

  • wdl_options (WDLContext)

  • inputs_search_path (list[str])

  • import_remote_files (bool)

  • import_workers_threshold (toil.job.ParseableIndivisibleResource)

  • import_workers_disk (toil.job.ParseableIndivisibleResource)

  • kwargs (Any)

run(file_store)

Run a WDL-related job.

Remember to decorate non-trivial overrides with report_wdl_errors().

Parameters:

file_store (toil.fileStores.abstractFileStore.AbstractFileStore)

Return type:

toil.job.Promised[WDLBindings]

toil.wdl.wdltoil.make_root_job(target, inputs, inputs_search_path, toil, wdl_options, options)

Parameters:
  • target (WDL.Tree.Workflow | WDL.Tree.Task)

  • inputs (WDLBindings)

  • inputs_search_path (list[str])

  • toil (toil.common.Toil)

  • wdl_options (WDLContext)

  • options (configargparse.Namespace)

Return type:

WDLSectionJob

toil.wdl.wdltoil.main()

A Toil workflow to interpret WDL input files.

Return type:

None