Running in Google Compute Engine (GCE)
Preparing your Google environment
Toil supports using the Google Cloud Platform. Setting this up is easy!
Make sure that the
Follow Google’s Instructions to download credentials and set the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Create a new ssh key with the proper format. To create a new ssh key run the command
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -C [USERNAME]
[USERNAME] is something like email@example.com. Make sure to leave your password blank.
This command could overwrite an old ssh key you may be using. If you have an existing ssh key you would like to use, it will need to be called id_rsa and it needs to have no password set.
Make sure only you can read the SSH keys:
$ chmod 400 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub
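To confirm that the permissions took effect, the check can be sketched in Python with the standard library alone. The helper name `is_private_to_owner` is hypothetical and not part of Toil; it only tests that no group or other permission bits are set on a file.

```python
import os
import stat


def is_private_to_owner(path):
    """Return True if no group or other permission bits are set on path."""
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO cover read/write/execute for group and others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    for name in ("id_rsa", "id_rsa.pub"):
        key = os.path.expanduser(os.path.join("~", ".ssh", name))
        if not os.path.exists(key):
            print(key, "missing")
        elif is_private_to_owner(key):
            print(key, "ok")
        else:
            print(key, "accessible by group or others")
```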
Add your newly formatted public key to Google. To do this, log into your Google Cloud account and go to the Metadata section under the Compute tab.
Near the top of the screen click on ‘SSH Keys’, then Edit, then Add item, and paste the key. Then save.
For more details look at Google’s instructions for adding SSH keys.
Google Job Store
To use the Google Job Store you will need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable by following Google’s instructions.
Then to run the sort example with the Google job store you would type:
$ python sort.py google:my-project-id:my-google-sort-jobstore
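The job store argument in the command above has the shape `google:<project-id>:<job-store-name>`. The following is a minimal sketch for assembling that string and for checking the credentials variable before starting a run. Both function names are hypothetical helpers, not Toil API, and the check only confirms that the variable names an existing file, not that the credentials are valid.

```python
import os


def google_job_store_locator(project_id, name):
    """Build a locator string of the form google:<project-id>:<name>."""
    if not project_id or not name:
        raise ValueError("project_id and name must be non-empty")
    return f"google:{project_id}:{name}"


def credentials_file_is_set(environ=os.environ):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)


if __name__ == "__main__":
    if not credentials_file_is_set():
        print("GOOGLE_APPLICATION_CREDENTIALS is unset or does not name a file")
    print(google_job_store_locator("my-project-id", "my-google-sort-jobstore"))
```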
Running a Workflow with Autoscaling
Google Autoscaling is in beta!
The steps to run a GCE workflow are similar to those of AWS (Running a Workflow with Autoscaling), except you will
need to explicitly specify the
--provisioner gce option which otherwise defaults to
Launch the leader node in GCE using the launch-cluster command:
(venv) $ toil launch-cluster <CLUSTER-NAME> \
    --provisioner gce \
    --leaderNodeType n1-standard-1 \
    --keyPairName <SSH-KEYNAME> \
    --zone us-west1-a
<SSH-KEYNAME> is the first part of the [USERNAME] used when setting up your ssh key. The --keyPairName option is for an SSH key that was added to the Google account.
For example, if your ssh key [USERNAME] was firstname.lastname@example.org, then your key pair name will be just the part before the @, firstname.lastname.
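The rule for deriving the key pair name can be written as a one-line helper. This is a sketch that assumes "first part" means everything before the `@`; the function name `key_pair_name` is hypothetical and not part of Toil.

```python
def key_pair_name(username):
    """Return the part of username before the first '@'.

    A username without '@' is returned unchanged.
    """
    return username.split("@", 1)[0]


if __name__ == "__main__":
    print(key_pair_name("firstname.lastname@example.org"))  # firstname.lastname
```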
Upload the sort example and ssh into the leader:
(venv) $ toil rsync-cluster --provisioner gce <CLUSTER-NAME> sort.py :/root
(venv) $ toil ssh-cluster --provisioner gce <CLUSTER-NAME>
Run the workflow:
$ python /root/sort.py google:<PROJECT-ID>:<JOBSTORE-NAME> \
    --provisioner gce \
    --batchSystem mesos \
    --nodeTypes n1-standard-2 \
    --maxNodes 2
$ exit  # this exits the ssh from the leader node
(venv) $ toil destroy-cluster --provisioner gce <CLUSTER-NAME>
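Because a forgotten cluster keeps running, it can help to script the launch and teardown so that `destroy-cluster` is attempted even when the work in between fails. The sketch below builds the same command lines shown above and wraps them in `try`/`finally`. The function names and the `work` callback are hypothetical, and the `run` parameter exists only so the sequence can be dry-run without touching GCE. If the launch itself fails, no teardown is attempted, so a partially created cluster would still need manual cleanup.

```python
import subprocess


def launch_args(cluster, key_pair, zone="us-west1-a", leader_type="n1-standard-1"):
    """Argument list for the launch-cluster command shown above."""
    return ["toil", "launch-cluster", cluster,
            "--provisioner", "gce",
            "--leaderNodeType", leader_type,
            "--keyPairName", key_pair,
            "--zone", zone]


def destroy_args(cluster):
    """Argument list for the destroy-cluster command shown above."""
    return ["toil", "destroy-cluster", "--provisioner", "gce", cluster]


def with_cluster(cluster, key_pair, work, run=subprocess.check_call):
    """Launch the cluster, call work(), then always attempt to destroy it."""
    run(launch_args(cluster, key_pair))
    try:
        work()
    finally:
        # Runs whether work() returned normally or raised.
        run(destroy_args(cluster))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with_cluster("my-cluster", "firstname.lastname",
                 work=lambda: None,
                 run=lambda argv: print(" ".join(argv)))
```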