This section details how to properly set up Toil and its dependencies in various cloud environments.
Amazon Web Services
The native Toil provisioner is installed with Toil's [aws] extra and allows us to spin up a cluster without any external dependencies. It is built around the Toil Appliance, a Docker image that bundles Toil and all its requirements (e.g. Mesos). This makes deployment simple across platforms, and you can even simulate a cluster locally (see Developing with the Toil Appliance for details).
When using the Toil provisioner, the appliance image is chosen automatically based on the pip-installed version of Toil on your system. That choice can be overridden by setting the TOIL_APPLIANCE_SELF environment variable. See Environment variable options for more information on this variable. If you are developing autoscaling and want to build and test your own appliance, have a look at Developing with the Toil Appliance.
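The following is a minimal sketch of how you might override the appliance image from a Bash-compatible shell. The image reference quay.io/your-org/toil:your-tag is a hypothetical placeholder, not a published image. Replace it with the registry, name and tag of the appliance you built.

```shell
# Hypothetical image reference: replace with the registry, name and tag of
# the appliance image you built and pushed.
export TOIL_APPLIANCE_SELF="quay.io/your-org/toil:your-tag"

# Cluster commands started from this shell (e.g. toil launch-cluster)
# inherit the exported variable.
printf 'Using Toil appliance: %s\n' "$TOIL_APPLIANCE_SELF"
```

Running `unset TOIL_APPLIANCE_SELF` should restore the automatic, version-based choice.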
Using the provisioner to launch a Toil leader instance is simple:
$ toil launch-cluster CLUSTER-NAME-HERE --nodeType=t2.micro \
    -z us-west-2a --keyPairName=your-AWS-key-pair-name
The cluster name is used to uniquely identify your cluster and will be used to populate the instance's Name tag. In addition, the Toil provisioner will automatically tag your cluster with an Owner tag that corresponds to your keypair name to facilitate cost tracking.
The -z parameter is important since it specifies which EC2 availability zone to launch the cluster in. Alternatively, you can specify this option via the TOIL_AWS_ZONE environment variable. This is generally preferable since it lets us avoid repeating the -z option for every subsequent cluster command. We will assume this environment variable is set for the rest of the tutorial. Note: the zone is different from an EC2 region. A region corresponds to a geographical area like us-west-2 (Oregon), and availability zones are partitions of this area like us-west-2a.
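Here is a minimal sketch of setting the zone once, assuming a Bash-compatible shell. It also shows the zone/region distinction: for standard zone names, the region is the zone name without its trailing letter. The region variable exists only for illustration and is not a Toil setting.

```shell
# Set the zone once (for example in ~/.profile) so later cluster commands
# can omit -z.
export TOIL_AWS_ZONE=us-west-2a

# For standard zone names the region is the zone minus its final letter,
# so us-west-2a belongs to the us-west-2 region.
region="${TOIL_AWS_ZONE%?}"
printf 'zone=%s region=%s\n' "$TOIL_AWS_ZONE" "$region"
```

With TOIL_AWS_ZONE exported, you can drop -z from the launch command shown earlier: `toil launch-cluster CLUSTER-NAME-HERE --nodeType=t2.micro --keyPairName=your-AWS-key-pair-name`.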
An important caveat: there is currently no parameter to specify the size of the instance's root volume, which is fixed at 50 GB. Support for this will be added soon, but in the meantime, use instances with ephemeral SSD volumes if any job in the pipeline will need more than 50 GB of disk. See the AWS documentation for a full list of EC2 instance types.
Once the leader is running, you can use the rsync-cluster utility to copy files to and from the instance:
$ toil rsync-cluster CLUSTER-NAME-HERE \
    ~/localFile :/remoteDestination
The most frequent use case for the rsync-cluster utility is deploying your Toil script to the Toil leader. The syntax is the same as traditional rsync, except that you omit the hostname before the colon: toil rsync-cluster determines it automatically.
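The remote path takes a leading colon but no hostname, so small wrappers can make the direction of a copy explicit. push_to_leader and pull_from_leader are hypothetical helpers, not Toil commands. pull_from_leader also assumes that rsync-cluster accepts a remote source the same way rsync does. The rsync-style syntax suggests this, but this section does not say so.

```shell
# Hypothetical helpers (not part of Toil) wrapping toil rsync-cluster.
# As with rsync, the remote side is written as ":PATH"; there is no hostname
# before the colon because rsync-cluster determines the leader itself.
push_to_leader() {
  local cluster="$1" src="$2" dest="$3"
  toil rsync-cluster "$cluster" "$src" ":$dest"
}

# Assumes a remote source (":PATH") works as the first path, as in rsync.
pull_from_leader() {
  local cluster="$1" src="$2" dest="$3"
  toil rsync-cluster "$cluster" ":$src" "$dest"
}

# Usage (placeholder names):
# push_to_leader CLUSTER-NAME-HERE ~/localFile /remoteDestination
```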
The last utility provided by the Toil provisioner is ssh-cluster, which can be used as follows:
$ toil ssh-cluster CLUSTER-NAME-HERE
This will give you a shell on the Toil leader, where you can then start your Autoscaling run. This shell actually originates from within the Toil leader container, and as such has a couple of restrictions involving the use of the
The shell doesn’t know that it is a TTY, which prevents it from properly allocating
a new screen session. This can be worked around via:
$ script
$ screen
Running script before screen gets things working properly again.
Finally, you can execute remote commands with the following syntax:
$ toil ssh-cluster CLUSTER-NAME-HERE remoteCommand
Running your Toil workflow via remote execution like this is not advised unless you use a tool such as nohup to ensure the process does not die if the SSH connection is interrupted.
Clusters set up with CGCloud have the benefit of coming pre-packaged with Toil and Mesos, our preferred batch system for running on AWS.
Create and activate a virtualenv:
$ virtualenv ~/cgcloud
$ source ~/cgcloud/bin/activate
Install CGCloud and the CGCloud Toil plugin:
$ pip install cgcloud-toil
Add the following to your ~/.profile, using the appropriate availability zone for your account:
export CGCLOUD_ZONE=us-west-2a
export CGCLOUD_PLUGINS="cgcloud.toil:$CGCLOUD_PLUGINS"
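If you prefer to script this step, the sketch below appends the two lines only once. add_cgcloud_settings is a hypothetical helper. The part worth copying is the single quoting: it writes $CGCLOUD_PLUGINS into the file literally, so the variable is expanded each time the profile is sourced rather than when the line is written.

```shell
# Hypothetical helper: append the CGCloud settings to a profile file once.
add_cgcloud_settings() {
  local profile="${1:-$HOME/.profile}" zone="${2:-us-west-2a}"
  # Skip if a previous run already added the plugin line.
  if grep -qF 'cgcloud.toil:' "$profile" 2>/dev/null; then
    return 0
  fi
  printf 'export CGCLOUD_ZONE=%s\n' "$zone" >> "$profile"
  # Single quotes keep $CGCLOUD_PLUGINS literal in the file, so it is
  # expanded whenever the profile is sourced, not now.
  printf '%s\n' 'export CGCLOUD_PLUGINS="cgcloud.toil:$CGCLOUD_PLUGINS"' >> "$profile"
}
```

For example, run `add_cgcloud_settings ~/.profile us-west-2a`, then `. ~/.profile` to load the settings into the current shell.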
Set up credentials for your AWS account in
[default]
aws_access_key_id=PASTE_YOUR_FOO_ACCESS_KEY_ID_HERE
aws_secret_access_key=PASTE_YOUR_FOO_SECRET_KEY_ID_HERE
region=us-west-2
Register your SSH key. If you don’t have one, create it with
$ cgcloud register-key ~/.ssh/id_rsa.pub
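If you do not have a key yet, one option is to generate one with OpenSSH's ssh-keygen before registering it. The sketch below is a hypothetical helper (ensure_and_register_key is not a CGCloud command). It creates a key only when the public key file is missing, so it is safe to rerun.

```shell
# Hypothetical helper (not part of CGCloud): create an RSA key pair at the
# given path if its public half is missing, then register it with CGCloud.
ensure_and_register_key() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  if [ ! -f "$key.pub" ]; then
    # Create the key directory with owner-only permissions if it is missing.
    mkdir -p -m 700 "$(dirname "$key")"
    # ssh-keygen prompts for a passphrase interactively.
    ssh-keygen -t rsa -b 4096 -f "$key"
  fi
  cgcloud register-key "$key.pub"
}

# Usage, with the default key location used above:
# ensure_and_register_key
```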
Create a toil-box template, which will contain the necessary prerequisites:
$ cgcloud create -IT toil-box
Create a small leader/worker cluster:
$ cgcloud create-cluster toil -s 2 -t m3.large
SSH into the leader:
$ cgcloud ssh toil-leader
At this point, any Toil script can be run on the distributed AWS cluster by following instructions in Running on AWS.
Finally, if you wish to tear down the cluster and remove all its data permanently, CGCloud allows you to do so without logging into the AWS web interface:
$ cgcloud terminate-cluster toil
Azure
Toil comes with a cluster template that makes it easy to deploy clusters running Toil on Microsoft Azure. The template allows these clusters to be created and managed through the Azure portal. To set up a Toil Mesos cluster on Azure with the template, use the deploy button or open the deploy link in your browser.
OpenStack
Our group is working to expand distributed cluster support to OpenStack by providing convenient Docker containers to launch Mesos from. Currently, OpenStack nodes can be set up to run Toil in single machine mode by following the Installation instructions.
Google Compute Engine
Support for running on Google Cloud is currently experimental. Our group is working to expand distributed cluster support to Google Compute Engine with a cluster provisioning tool based around a Dockerized Mesos setup. Currently, Google Compute Engine nodes can be configured to run Toil in single machine mode by following the Installation instructions.