toil.wdl.wdl_synthesis

Module Contents

Classes

SynthesizeWDL

SynthesizeWDL takes the "workflows_dictionary" and "tasks_dictionary" produced by wdl_analysis.py and uses them to write a native python script for use with Toil.

Attributes

logger

toil.wdl.wdl_synthesis.logger
class toil.wdl.wdl_synthesis.SynthesizeWDL(version, tasks_dictionary, workflows_dictionary, output_directory, json_dict, docker_user, jobstore=None, destBucket=None)[source]

SynthesizeWDL takes the “workflows_dictionary” and “tasks_dictionary” produced by wdl_analysis.py and uses them to write a native python script for use with Toil.

A WDL “workflow” section roughly corresponds to the python “main()” function, where functions are wrapped as Toil “jobs”, output dependencies specified, and called.

A WDL “task” section corresponds to a unique python function, which will be wrapped as a Toil “job” and defined outside of the “main()” function that calls it.

Generally this handles breaking sections into their corresponding Toil counterparts.

For example: write the imports, then write all functions defining jobs (which have subsections like: write header, define variables, read “File” types into the jobstore, docker call, etc.), then write the main and all of its subsections.

Parameters
  • version (str) –

  • tasks_dictionary (dict) –

  • workflows_dictionary (dict) –

  • output_directory (str) –

  • json_dict (dict) –

  • docker_user (str) –

  • jobstore (Optional[str]) –

  • destBucket (Optional[str]) –
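
The write order described above (imports, then job functions, then main) can be sketched with a toy generator. This is only an illustration under stated assumptions, not Toil's code: the class TinySynthesizer, its synthesize() method, and the shape of its tasks mapping are hypothetical; only the method names write_modules, write_functions, and write_main mirror the ones documented on this page.

```python
class TinySynthesizer:
    """Hypothetical stand-in that mirrors the imports -> functions -> main order."""

    def __init__(self, tasks):
        # tasks: hypothetical mapping of task name -> shell command string
        self.tasks = tasks

    def write_modules(self):
        # Section 1: the imports the generated script needs.
        return "import subprocess\n\n"

    def write_functions(self):
        # Section 2: one python function per task, defined outside main().
        out = ""
        for name, command in self.tasks.items():
            out += f"def {name}():\n"
            out += f"    return subprocess.check_output({command!r}, shell=True)\n\n"
        return out

    def write_main(self):
        # Section 3: a main() that calls each task function in order.
        out = "def main():\n"
        for name in self.tasks:
            out += f"    {name}()\n"
        out += "\n\nif __name__ == '__main__':\n    main()\n"
        return out

    def synthesize(self):
        return self.write_modules() + self.write_functions() + self.write_main()


script = TinySynthesizer({"say_hello": "echo hello"}).synthesize()
# compile() raises SyntaxError if the generated text is not valid Python.
compile(script, "<generated>", "exec")
```

The point of the sketch is the ordering: every function that main() calls is already defined by the time the main section is appended.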

write_modules()[source]
write_main()[source]

Writes out a huge string representing the main section of the python compiled toil script.

Currently looks at and writes 5 sections:

1. JSON Variables (includes importing and preparing files as tuples)
2. TSV Variables (includes importing and preparing files as tuples)
3. CSV Variables (includes importing and preparing files as tuples)
4. Wrapping each WDL “task” function as a toil job
5. List out children and encapsulated jobs by priority, then start job0.

This should create the variable declarations necessary for function calls, map file paths appropriately and store them in the toil fileStore so that they persist from job to job, and create job wrappers for toil. Finally, it writes out the code that runs the jobs in order of priority using the addChild and encapsulate commands provided by toil.

Returns

giant string containing the main def for the toil script.
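
The last of the sections above, ordering jobs by priority with addChild and encapsulate, might be emitted roughly as follows. This is a sketch under assumptions: the helper write_job_order and its (priority, name) input format are hypothetical, and the exact text that Toil generates may differ. The emitted lines use Job.wrapJobFn, addChild, encapsulate, and toil.start from Toil's public API.

```python
def write_job_order(jobs):
    """Emit addChild/encapsulate lines for (priority, name) pairs, lowest priority first."""
    lines = ["job0 = Job.wrapJobFn(initialize_jobs)"]
    current_priority = None
    for priority, name in sorted(jobs):
        if priority != current_priority:
            # Children added to an encapsulated job run only after the wrapped
            # job and all of its existing successors have finished.
            lines.append("job0 = job0.encapsulate()")
            current_priority = priority
        lines.append(f"job0.addChild({name})")
    lines.append("toil.start(job0)")
    return "\n".join(lines)


order_text = write_job_order([(1, "align"), (0, "fetch"), (1, "count")])
print(order_text)
```

Jobs that share a priority are added as children of the same encapsulated job0, so they may run in parallel; each new priority level waits for the previous one.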

write_main_header()[source]
write_main_jobwrappers()[source]

Writes out ‘jobs’ as wrapped toil objects in preparation for calling.

Returns

A string representing this.

write_main_jobwrappers_declaration(declaration)[source]
write_main_destbucket()[source]

Writes out a loop for exporting outputs to a cloud bucket.

Returns

A string representing this.

fetch_ignoredifs(assignments, breaking_assignment)[source]
fetch_ignoredifs_chain(assignments, breaking_assignment)[source]
write_main_jobwrappers_if(if_statement)[source]
write_main_jobwrappers_scatter(task, assignment)[source]
fetch_scatter_outputs(task)[source]
fetch_scatter_inputs(assigned)[source]
fetch_scatter_inputs_chain(inputs, assigned, ignored_ifs, inputs_list)[source]
write_main_jobwrappers_call(task)[source]
fetch_call_outputs(task)[source]
write_functions()[source]

Writes out a python function for each WDL “task” object.

Returns

a giant string containing the meat of the job defs.

write_scatterfunctions_within_if(ifstatement)[source]
write_scatterfunction(job, scattername)[source]

Writes out a python function for each WDL “scatter” object.

write_scatterfunction_header(scattername)[source]
Returns

write_scatterfunction_outputreturn(scatter_outputs)[source]
Returns

write_scatterfunction_lists(scatter_outputs)[source]
Returns

write_scatterfunction_loop(job, scatter_outputs)[source]
Returns

write_scatter_callwrapper(job, previous_dependency)[source]
write_function(job)[source]

Writes out a python function for each WDL “task” object.

Each python function is a unit of work written out as a string in preparation for being written out to a file. In WDL, each “job” is called a “task”. Each WDL task is written out in multiple steps:

1: Header and inputs (e.g. ‘def mapping(self, input1, input2)’)
2: Log job name (e.g. ‘job.fileStore.logToMaster(‘initialize_jobs’)’)
3: Create temp dir (e.g. ‘tempDir = fileStore.getLocalTempDir()’)
4: Import filenames and use readGlobalFile() to get files from the jobStore
5: Reformat commandline variables (like converting to ‘ ‘.join(files)).
6: Commandline call using subprocess.Popen().
7: Write the section returning the outputs. Also logs stats.

Returns

a giant string containing the meat of the job defs for the toil script.
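
The seven steps can be illustrated with a small generator that emits one function as text. This is a simplified sketch, not Toil's implementation: write_function_sketch, its arguments, and the shape of the returned dictionary are hypothetical, and the generated function assumes that the imports section of the script has already imported subprocess.

```python
def write_function_sketch(name, inputs, command):
    """Hypothetical generator: emit one job function as text, step by step."""
    params = ", ".join(["job", *inputs])
    paths = ", ".join(f"{var}_path" for var in inputs)
    lines = []
    # 1: header and inputs
    lines.append(f"def {name}({params}):")
    # 2: log the job name
    lines.append(f"    job.fileStore.logToMaster({name!r})")
    # 3: create a temp dir
    lines.append("    tempDir = job.fileStore.getLocalTempDir()")
    # 4: fetch each input file from the jobStore
    for var in inputs:
        lines.append(f"    {var}_path = job.fileStore.readGlobalFile({var})")
    # 5: reformat the command-line variables into a single string
    lines.append(f"    cmd = ' '.join([{command!r}, {paths}])")
    # 6: command-line call using subprocess.Popen()
    lines.append("    process = subprocess.Popen(cmd, shell=True, cwd=tempDir, stdout=subprocess.PIPE)")
    lines.append("    stdout, _ = process.communicate()")
    # 7: return the outputs
    lines.append("    return {'stdout': stdout}")
    return "\n".join(lines) + "\n"


function_text = write_function_sketch("mapping", ["input1", "input2"], "cat")
# compile() raises SyntaxError if the generated text is not valid Python.
compile(function_text, "<generated>", "exec")
```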

write_function_header(job)[source]

Writes the header that starts each function. For example, this function can write and return:

‘def write_function_header(self, job, job_declaration_array):’

Parameters
  • job – A list such that: (job priority #, job ID #, Job Skeleton Name, Job Alias)

  • job_declaration_array – A list of all inputs that job requires.

Returns

A string representing this.

json_var(var, task=None, wf=None)[source]
Parameters
  • var

  • task

  • wf

Returns

needs_file_import(var_type)[source]

Check if the given type contains a File type. A return value of True means that the value with this type has files to import.

Parameters

var_type (toil.wdl.wdl_types.WDLType) –

Return type

bool
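
A check of this kind is naturally recursive, because a File can be nested inside compound types. The sketch below shows the idea with hypothetical stand-in classes (FileType, StringType, ArrayType, PairType); they are not the classes in toil.wdl.wdl_types, and the real method may handle more types.

```python
from dataclasses import dataclass


@dataclass
class FileType:
    """Hypothetical stand-in for a WDL File type."""


@dataclass
class StringType:
    """Hypothetical stand-in for a WDL String type."""


@dataclass
class ArrayType:
    """Hypothetical stand-in for Array[element]."""
    element: object


@dataclass
class PairType:
    """Hypothetical stand-in for Pair[left, right]."""
    left: object
    right: object


def needs_file_import_sketch(var_type):
    """Return True if var_type is, or contains, a FileType."""
    if isinstance(var_type, FileType):
        return True
    if isinstance(var_type, ArrayType):
        return needs_file_import_sketch(var_type.element)
    if isinstance(var_type, PairType):
        return (needs_file_import_sketch(var_type.left)
                or needs_file_import_sketch(var_type.right))
    return False


nested = ArrayType(PairType(StringType(), FileType()))
print(needs_file_import_sketch(nested))  # True: a File is nested inside
```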

write_declaration_type(var_type)[source]

Return a string that preserves the construction of the given WDL type so it can be passed into the compiled script.

Parameters

var_type (toil.wdl.wdl_types.WDLType) –

write_function_bashscriptline(job)[source]

Writes a function to create a bashscript for injection into the docker container.

Parameters
  • job_task_reference – The job referenced in WDL’s Task section.

  • job_alias – The actual job name to be written.

Returns

A string writing all of this.

write_function_dockercall(job)[source]

Writes a string containing the apiDockerCall() that will run the job.

Parameters
  • job_task_reference – The name of the job calling docker.

  • docker_image – The corresponding name of the docker image. e.g. “ubuntu:latest”

Returns

A string containing the apiDockerCall() that will run the job.

write_function_cmdline(job)[source]

Write a series of commandline variables to be concatenated together eventually and either called with subprocess.Popen() or with apiDockerCall() if a docker image is called for.

Parameters

job – A list such that: (job priority #, job ID #, Job Skeleton Name, Job Alias)

Returns

A string representing this.

write_function_subprocesspopen()[source]

Write a subprocess.Popen() call for this function and write it out as a string.

Parameters

job – A list such that: (job priority #, job ID #, Job Skeleton Name, Job Alias)

Returns

A string representing this.
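
As a rough illustration of writing a subprocess.Popen() call out as a string, the sketch below emits two lines of code and then executes them to show what they do at runtime. The helper write_popen_sketch and the variable name cmd are hypothetical; the text that Toil actually generates may be different.

```python
import subprocess


def write_popen_sketch(command_variable):
    """Emit code that runs the command held in command_variable and captures its output."""
    return (
        f"process = subprocess.Popen({command_variable}, shell=True, "
        "stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n"
        "stdout, stderr = process.communicate()\n"
    )


generated = write_popen_sketch("cmd")
# Run the generated text in a namespace that supplies the names it refers to.
namespace = {"subprocess": subprocess, "cmd": "echo hello"}
exec(generated, namespace)
```

After exec(), namespace["stdout"] holds the captured bytes and namespace["process"].returncode holds the exit status.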

write_function_outputreturn(job, docker=False)[source]

Find the output values that this function needs and write them out as a string.

Parameters
  • job – A list such that: (job priority #, job ID #, Job Skeleton Name, Job Alias)

  • job_task_reference – The name of the job to look up values for.

Returns

A string representing this.

indent(string2indent)[source]

Indent the input string by 4 spaces.

Parameters

string2indent (str) –

Return type

str
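
One minimal way to get this behaviour is shown below. It is a sketch rather than a copy of Toil's code, and the name indent_sketch is hypothetical. It prefixes the first line and every line that follows a newline, including empty lines.

```python
def indent_sketch(string2indent: str) -> str:
    """Indent every line of the input by 4 spaces."""
    return "    " + string2indent.replace("\n", "\n    ")


print(repr(indent_sketch("a = 1\nreturn a")))  # '    a = 1\n    return a'
```

Applying the helper twice yields 8 spaces, which is how nested blocks in generated code can be built up.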

needsdocker(job)[source]
Parameters

job

Returns

write_python_file(module_section, fn_section, main_section, output_file)[source]

Just takes three strings and writes them to output_file.

Parameters
  • module_section – A string of ‘import modules’.

  • fn_section – A string of python ‘def functions()’.

  • main_section – A string declaring toil options and main’s header.

  • job_section – A string importing files into toil and declaring jobs.

  • output_file – The file to write the compiled toil script to.
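
Writing the three sections to a file in order could look like the following. This is a sketch assuming the signature shown above; the name write_python_file_sketch and the sample strings are hypothetical.

```python
import os
import tempfile


def write_python_file_sketch(module_section, fn_section, main_section, output_file):
    """Write the three sections to output_file in the order given."""
    with open(output_file, "w") as handle:
        handle.write(module_section)
        handle.write(fn_section)
        handle.write(main_section)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compiled.py")
    write_python_file_sketch("import os\n", "def f():\n    return 1\n", "print(f())\n", path)
    with open(path) as handle:
        written = handle.read()
```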