Running in the cloud¶
Toil jobs can be run on a variety of cloud platforms. Of these, Amazon Web Services is currently the best-supported solution.
Run screen to open a new session. Later, type ctrl-a and then d to disconnect from it, and run screen -r to reconnect to it. Commands running under screen continue running even when you are disconnected, allowing you to unplug your laptop and take it home without ending your Toil jobs. See Toil Provisioner for complications that can occur when using screen within the Toil Appliance.
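The screen workflow above can be sketched as a short session; the session name, script, and job store below are placeholders:

```shell
# Start a named screen session on the leader and launch a workflow inside it.
$ screen -S toil-run
$ python HelloWorld.py file:my-jobstore
# Press ctrl-a, then d, to detach; the workflow keeps running.

# Later, from a new SSH connection, reattach to check on progress.
$ screen -r toil-run
```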
To begin, launch a Toil leader instance using your choice of provisioner.
Once the leader instance is launched, the only remaining step is
to kick off our Toil run with special autoscaling options. Now might be
an opportune time to read up on Toil's extensive configuration options,
which can be listed by appending --help to your Toil script invocation.
There are a number of autoscaling-specific options, but only two options are
strictly necessary to enable autoscaling:
--provisioner=aws and --nodeType=<>.
These options, respectively, tell Toil that we are running on AWS (currently the
only supported autoscaling environment) and which instance type to use for the
Toil worker instances.
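As a sketch of what such an invocation might look like (the instance type, region, master IP, and job store name here are placeholders, not recommendations):

```shell
# Hypothetical autoscaling run kicked off from the leader node.
$ python HelloWorld.py \
      --provisioner=aws \
      --nodeType=c3.8xlarge \
      --batchSystem=mesos \
      --mesosMaster=master-private-ip:5050 \
      aws:us-west-2:my-autoscaling-jobstore
```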
Toil can run on a heterogeneous cluster of both preemptable and non-preemptable nodes.
Our preemptable node type can be set by using the
--preemptableNodeType=<> flag. While individual jobs can
each explicitly specify whether or not they should be run on preemptable nodes
via the boolean preemptable resource requirement, the
--defaultPreemptable flag will allow jobs without an explicit
preemptable requirement to run on preemptable machines.
We can set the maximum number of preemptable and non-preemptable nodes via the
--maxPreemptableNodes and --maxNodes flags.
Specify Preemptability Carefully
Ensure that your choices for --maxNodes and --maxPreemptableNodes make
sense for your workflow and won't cause it to hang: if the workflow requires preemptable nodes, set
--maxPreemptableNodes to some non-zero value, and if any job requires
non-preemptable nodes, set
--maxNodes to some non-zero value.
The --preemptableCompensation flag can be used to handle
cases where preemptable nodes may not be available but are required for your
workflow.
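Putting the preemptability flags together, a run might look something like the following; the node types, spot bid, node counts, and job store name are all placeholder values:

```shell
# Hypothetical run mixing preemptable and non-preemptable autoscaling.
$ python HelloWorld.py \
      --provisioner=aws \
      --nodeType=c3.8xlarge \
      --preemptableNodeType=c3.8xlarge:0.85 \
      --defaultPreemptable \
      --maxNodes=2 \
      --maxPreemptableNodes=10 \
      aws:us-west-2:my-aws-jobstore
```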
Using Mesos with Toil on AWS
The Mesos master and agent processes bind to the private IP addresses of their
EC2 instances, so be sure to use the master's private IP when specifying
--mesosMaster. Using the public IP will prevent the nodes from properly
discovering each other.
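When SSHed into the leader, one way to look up its private IP is the EC2 instance metadata service (reachable only from inside the instance):

```shell
# Print this EC2 instance's private IPv4 address.
$ curl http://169.254.169.254/latest/meta-data/local-ipv4
```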
Running on AWS¶
See Amazon Web Services to get set up for running on AWS.
Having followed the Quickstart: A simple workflow guide, the user can run their
HelloWorld.py script on a distributed cluster just by modifying the run
command. Since our cluster is distributed, we'll use the
aws job store,
which uses a combination of one S3 bucket and a couple of SimpleDB domains.
This allows all nodes in the cluster to access the job store, which would not be
possible if we were to use the
file job store with a locally mounted file
system on the leader.
Copy HelloWorld.py to the leader node, and run:
$ python HelloWorld.py \
      --batchSystem=mesos \
      --mesosMaster=master-private-ip:5050 \
      aws:us-west-2:my-aws-jobstore
Alternatively, to run a CWL workflow:
$ cwltoil --batchSystem=mesos \
      --mesosMaster=master-private-ip:5050 \
      --jobStore=aws:us-west-2:my-aws-jobstore \
      example.cwl \
      example-job.yml
When running a CWL workflow on AWS, input files can be provided either on the
local file system or in S3 buckets using
s3:// URL references. Final output
files will be copied to the local file system of the leader node.
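For illustration, a hypothetical job file referencing an S3 input might look like this (the bucket, key, and input name are made up; your workflow's actual inputs depend on its CWL definition):

```shell
$ cat example-job.yml
message:
  class: File
  path: s3://my-bucket/inputs/message.txt
```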
Running on Azure¶
See Azure to get set up for running on Azure. This section assumes that you are SSHed into your cluster's leader node.
The Azure templates do not create a shared filesystem; you need to use the
azure job store for which you need to create an Azure storage account.
You can store multiple job stores in a single storage account.
To create a new storage account, if you do not already have one:
- Navigate to
https://portal.azure.com/#create/Microsoft.StorageAccount in your browser.
- If necessary, log into the Microsoft Account that you use for Azure.
- Fill out the presented form. The Name for the account, notably, must be a globally unique string of 3 to 24 lowercase letters and numbers. For Deployment model, choose Resource manager. For Resource group, choose or create a resource group different than the one in which you created your cluster. For Location, choose the same region that you used for your cluster.
- Press the Create button. Wait for your storage account to be created; you should get a notification in the notifications area at the upper right when that is done.
Once you have a storage account, you need to authorize the cluster to access the storage account by giving it the access key. To find your storage account's access key:
- When your storage account has been created, open it up and click the “Settings” icon.
- In the Settings panel, select Access keys.
- Select the text in the Key1 box and copy it to the clipboard, or use the copy-to-clipboard icon.
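If you prefer the command line and have the Azure CLI installed, the keys can also be listed there; the resource group and account names below are placeholders:

```shell
# List the access keys for a storage account (requires a prior `az login`).
$ az storage account keys list \
      --resource-group my-resource-group \
      --account-name myaccountname
```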
You then need to share the key with the cluster. To do this temporarily, for the duration of an SSH or screen session:
- On the leader node, run
export AZURE_ACCOUNT_KEY="<KEY>", replacing
<KEY> with the access key you copied from the Azure portal.
To do this permanently:
- On the leader node, run nano ~/.toilAzureCredentials.
- In the editor that opens, navigate with the arrow keys, and give the file the following contents:

  [AzureStorageCredentials]
  <accountname>=<accountkey>

  Be sure to replace <accountname> with the name that you used for your Azure storage account, and <accountkey> with the key you obtained above. (If you want, you can have multiple accounts with different keys in this file, by adding multiple lines. If you do this, be sure to leave the AZURE_ACCOUNT_KEY environment variable unset.)
- Press ctrl-o to save the file, and ctrl-x to exit the editor.
Once that's done, you are ready to actually execute a job, storing your job
store in that Azure storage account. Assuming you have followed the
Quickstart: A simple workflow guide above, created an Azure storage account, and
placed the storage account's access key on the cluster, you can run the
HelloWorld.py script by doing the following:
Place your script on the leader node, either by downloading it from the command line or typing or copying it into a command-line editor.
Run the command:
$ python HelloWorld.py \
      --batchSystem=mesos \
      --mesosMaster=10.0.0.5:5050 \
      azure:<accountname>:hello-world-001
To run a CWL workflow:
$ cwltoil --batchSystem=mesos \
      --mesosMaster=10.0.0.5:5050 \
      --jobStore=azure:<accountname>:hello-world-001 \
      example.cwl \
      example-job.yml
Be sure to replace
<accountname> with the name of your Azure storage account.
Note that once you run a job with a particular job store name (the part after the account name) in a particular storage account, you cannot re-use that name in that account unless one of the following happens:
- You are restarting the same job with the --restart option.
- You clean the job store with
toil clean azure:<accountname>:<jobstore>.
- You delete all the items created by that job, and the main job store table used by Toil, from the account (destroying all other job stores using the account).
- The job finishes successfully and cleans itself up.
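For example, to discard a job store whose run you do not intend to restart, so that its name can be reused (the account and job store names below are placeholders):

```shell
# Delete the Azure job store created by an earlier run.
$ toil clean azure:<accountname>:hello-world-001
```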
Running on OpenStack¶
After setting up Toil on OpenStack, Toil scripts can be run
by designating a job store location as shown in Quickstart: A simple workflow.
Be sure to specify a temporary directory that Toil can use to run jobs in, with the --workDir flag:
$ python HelloWorld.py --workDir=/tmp file:jobStore
Running on Google Compute Engine¶
If you wish to use the Google Storage job store, install Toil with the
google extra (pip install toil[google]). Then create a file named
.boto with your
credentials and some configuration:
[Credentials]
gs_access_key_id = KEY_ID
gs_secret_access_key = SECRET_KEY

[Boto]
https_validate_certificates = True

[GSUtil]
content_language = en
default_api_version = 2
The gs_access_key_id and gs_secret_access_key values can be generated by navigating
to your Google Cloud Storage console and clicking on Settings. On
the Settings page, navigate to the Interoperability tab and click Enable
interoperability access. On this page you can now click Create a new key to
generate an access key and a matching secret. Insert these into their
respective places in the
.boto file and you will be able to use a Google
job store when invoking a Toil script, as in the following example:
$ python HelloWorld.py google:projectID:jobStore
The projectID component of the job store argument above refers to your Google
Cloud Project ID in the Google Cloud Console, and will be visible in the
console's banner at the top of the screen. The
jobStore component is a name
of your choosing that you will use to refer to this job store.
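If you have the Google Cloud SDK installed, one convenient (though optional) way to look up the active project ID from the command line is:

```shell
# Print the project ID of the currently active gcloud configuration.
$ gcloud config list --format='value(core.project)'
```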