toil.provisioners.aws.awsProvisioner

Attributes

AWSRegionName

AWSServerErrors

CLUSTER_LAUNCHING_PERMISSIONS

a_short_time

E2Instances

memoize

Memoize a function result based on its parameters using this decorator.

logger

F

Exceptions

ClusterCombinationNotSupportedException

Indicates that a provisioner does not support making a given type of cluster with a given architecture.

NoSuchClusterException

Indicates that the specified cluster does not exist.

NoSuchZoneException

Indicates that a valid zone could not be found.

ManagedNodesNotSupportedException

Raised when attempting to add managed nodes (which autoscale up and down by

InvalidClusterStateException

Common base class for all non-exit exceptions.

Classes

AWSConnectionManager

Class that represents a connection to AWS. Caches Boto 3 and Boto 2 objects

InstanceType

ErrorCondition

A wrapper describing an error condition.

AbstractProvisioner

Interface for provisioning worker nodes to use in a Toil cluster.

Shape

Represents a job or a node's "shape", in terms of the dimensions of memory, cores, disk and

Node

AWSProvisioner

Interface for provisioning worker nodes to use in a Toil cluster.

Functions

zone_to_region(zone)

Get a region (e.g. us-west-2) from a zone (e.g. us-west-2c).

get_flatcar_ami(ec2_client[, architecture])

Retrieve the flatcar AMI image to use as the base for all Toil autoscaling instances.

get_policy_permissions(region)

Returns an action collection containing lists of all permission grant patterns keyed by resource

policy_permissions_allow(given_permissions[, ...])

Check whether given set of actions are a subset of another given set of actions, returns true if they are

create_s3_bucket(s3_resource, bucket_name, region)

Create an AWS S3 bucket, using the given Boto3 S3 session, with the

flatten_tags(tags)

Convert tags from a key to value dict into a list of 'Key': xxx, 'Value': xxx dicts.

boto3_pager(requestor_callable, result_attribute_name, ...)

Yield all the results from calling the given Boto 3 method with the

human2bytes(string)

Given a string representation of some memory (e.g. '1024 Mib'), return the

create_auto_scaling_group(autoscaling_client, ...[, ...])

Create a new Auto Scaling Group with the given name (which is also its

create_instances(ec2_resource, image_id, key_name, ...)

Replaces create_ondemand_instances. Uses boto3 and returns a list of Boto3 Instance objects.

create_launch_template(ec2_client, template_name, ...)

Creates a launch template with the given name for launching instances with the given parameters.

create_ondemand_instances(boto3_ec2, image_id, spec[, ...])

Requests the RunInstances EC2 API call but accounts for the race between recently created

increase_instance_hop_limit(boto3_ec2, boto_instance_list)

Increase the default HTTP hop limit, as we are running Toil and Kubernetes inside a Docker container, so the default

create_spot_instances(boto3_ec2, price, image_id, spec)

Create instances on the spot market.

wait_instances_running(boto3_ec2, instances)

Wait until no instance in the given iterable is 'pending'. Yield every instance that

wait_transition(boto3_ec2, resource, from_states, to_state)

Wait until the specified EC2 resource (instance, image, volume, ...) transitions from any

wait_until_instance_profile_arn_exists(...)

truncExpBackoff()

get_error_body(e)

Get the body from a Boto 2 or 3 error, or compatible types.

get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

get_error_status(e)

Get the HTTP status code from a compatible source.

old_retry([delays, timeout, predicate])

Deprecated.

retry([intervals, infinite_retries, errors, ...])

Retry a function if it fails with any Exception defined in "errors".

get_best_aws_zone([spotBid, nodeType, boto3_ec2, ...])

Get the right AWS zone to use.

get_client(…)

Get a Boto 3 client for a particular AWS service, usable by the current thread.

awsRetryPredicate(e)

expectedShutdownErrors(e)

Matches errors that we expect to occur during shutdown, and which indicate

awsRetry(f)

This decorator retries the wrapped function if aws throws unexpected errors.

awsFilterImpairedNodes(nodes, boto3_ec2)

collapse_tags(instance_tags)

Collapse tags from boto3 format to node format

Module Contents

toil.provisioners.aws.awsProvisioner.zone_to_region(zone)

Get a region (e.g. us-west-2) from a zone (e.g. us-west-2c).

Parameters:

zone (str)

Return type:

AWSRegionName
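
As a rough illustration of the mapping, the sketch below assumes that a zone name is just the region name followed by one lowercase letter. The helper name zone_to_region_sketch is hypothetical, and the real function may validate its input differently.

```python
def zone_to_region_sketch(zone: str) -> str:
    """Hypothetical stand-in: drop the trailing availability-zone letter."""
    # Assumption: zones look like "<region><letter>", e.g. "us-west-2c".
    if len(zone) < 2 or not zone[-1].isalpha() or not zone[-2].isdigit():
        raise ValueError(f"{zone!r} does not look like an availability zone")
    return zone[:-1]


print(zone_to_region_sketch("us-west-2c"))
```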

toil.provisioners.aws.awsProvisioner.AWSRegionName
toil.provisioners.aws.awsProvisioner.AWSServerErrors
toil.provisioners.aws.awsProvisioner.get_flatcar_ami(ec2_client, architecture='amd64')

Retrieve the flatcar AMI image to use as the base for all Toil autoscaling instances.

AMI must be available to the user on AWS (attempting to launch will return a 403 otherwise).

Priority is:
  1. User specified AMI via TOIL_AWS_AMI

  2. Official AMI from stable.release.flatcar-linux.net

  3. Search the AWS Marketplace

If all of these sources fail, we raise an error to complain.

Parameters:
  • ec2_client (botocore.client.BaseClient) – Boto3 EC2 Client

  • architecture (str) – The architecture type for the new AWS machine. Can be either amd64 or arm64

Return type:

str
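
The priority order above amounts to a fallback chain. The sketch below is only an illustration of that pattern: the three source functions are hypothetical placeholders (only the TOIL_AWS_AMI variable name comes from the text), and the AMI ID is made up.

```python
import os
from typing import Callable, Iterable, Optional


def first_available_ami(sources: Iterable[Callable[[], Optional[str]]]) -> str:
    """Return the first non-empty AMI ID produced by the sources, in order."""
    for source in sources:
        ami = source()
        if ami:
            return ami
    # Mirrors "if all of these sources fail, we raise an error".
    raise RuntimeError("No usable AMI could be found from any source")


def from_environment() -> Optional[str]:
    # 1. User-specified AMI via the TOIL_AWS_AMI environment variable.
    return os.environ.get("TOIL_AWS_AMI")


def from_release_feed() -> Optional[str]:
    # 2. Placeholder for a lookup against the official release feed.
    return None


def from_marketplace() -> Optional[str]:
    # 3. Placeholder for a Marketplace search; the ID below is invented.
    return "ami-0123456789abcdef0"


ami_id = first_available_ami([from_environment, from_release_feed, from_marketplace])
```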

toil.provisioners.aws.awsProvisioner.CLUSTER_LAUNCHING_PERMISSIONS = ['iam:CreateRole', 'iam:CreateInstanceProfile', 'iam:TagInstanceProfile', 'iam:DeleteRole',...
toil.provisioners.aws.awsProvisioner.get_policy_permissions(region)

Returns an action collection containing lists of all permission grant patterns keyed by resource that they are allowed upon. Requires AWS credentials to be associated with a user or assumed role.

Parameters:
  • region (str) – AWS region to connect to

Return type:

AllowedActionCollection

toil.provisioners.aws.awsProvisioner.policy_permissions_allow(given_permissions, required_permissions=[])

Check whether the required actions are a subset of the granted actions. Returns True if they are; otherwise prints a warning and returns False.

Parameters:
  • required_permissions (List[str]) – Dictionary containing actions required, keyed by resource

  • given_permissions (AllowedActionCollection) – Set of actions that are granted to a user or role

Return type:

bool

class toil.provisioners.aws.awsProvisioner.AWSConnectionManager

Class that represents a connection to AWS. Caches Boto 3 and Boto 2 objects by region.

Access to any kind of item goes through the particular method for the thing you want (session, resource, service, Boto2 Context), and then you pass the region you want to work in, and possibly the type of thing you want, as arguments.

This class is intended to eventually enable multi-region clusters, where connections to multiple regions may need to be managed in the same provisioner.

We also support None for a region, in which case no region will be passed to Boto/Boto3. The caller is responsible for implementing e.g. TOIL_AWS_REGION support.

Since connection objects may not be thread safe (see <https://boto3.amazonaws.com/v1/documentation/api/1.14.31/guide/session.html#multithreading-or-multiprocessing-with-sessions>), one is created for each thread that calls the relevant lookup method.

session(region)

Get the Boto3 Session to use for the given region.

Parameters:

region (Optional[str])

Return type:

boto3.session.Session

resource(region: str | None, service_name: Literal['s3'], endpoint_url: str | None = None) mypy_boto3_s3.S3ServiceResource
resource(region: str | None, service_name: Literal['iam'], endpoint_url: str | None = None) mypy_boto3_iam.IAMServiceResource
resource(region: str | None, service_name: Literal['ec2'], endpoint_url: str | None = None) mypy_boto3_ec2.EC2ServiceResource

Get the Boto3 Resource to use with the given service (like ‘ec2’) in the given region.

Parameters:

endpoint_url – AWS endpoint URL to use for the client. If not specified, a default is used.

client(region: str | None, service_name: Literal['ec2'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) mypy_boto3_ec2.EC2Client
client(region: str | None, service_name: Literal['iam'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) mypy_boto3_iam.IAMClient
client(region: str | None, service_name: Literal['s3'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) mypy_boto3_s3.S3Client
client(region: str | None, service_name: Literal['sts'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) mypy_boto3_sts.STSClient
client(region: str | None, service_name: Literal['sdb'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) mypy_boto3_sdb.SimpleDBClient
client(region: str | None, service_name: Literal['autoscaling'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) mypy_boto3_autoscaling.AutoScalingClient

Get the Boto3 Client to use with the given service (like ‘ec2’) in the given region.

Parameters:
  • endpoint_url – AWS endpoint URL to use for the client. If not specified, a default is used.

  • config – Custom configuration to use for the client.

toil.provisioners.aws.awsProvisioner.create_s3_bucket(s3_resource, bucket_name, region)

Create an AWS S3 bucket, using the given Boto3 S3 session, with the given name, in the given region.

Supports the us-east-1 region, where bucket creation is special.

ALL S3 bucket creation should use this function.

Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket_name (str)

  • region (toil.lib.aws.AWSRegionName)

Return type:

mypy_boto3_s3.service_resource.Bucket
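
The us-east-1 special case refers to a Boto3 behaviour: create_bucket must not receive a LocationConstraint for us-east-1, while other regions require one. The sketch below only builds the keyword arguments and does not call AWS; the helper name bucket_creation_kwargs is hypothetical and is not necessarily how this function is written internally.

```python
from typing import Any, Dict


def bucket_creation_kwargs(bucket_name: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for a Boto3 create_bucket call."""
    kwargs: Dict[str, Any] = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Every region except us-east-1 needs an explicit location constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(bucket_creation_kwargs("example-bucket", "us-west-2"))
print(bucket_creation_kwargs("example-bucket", "us-east-1"))
```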

toil.provisioners.aws.awsProvisioner.flatten_tags(tags)

Convert tags from a key to value dict into a list of ‘Key’: xxx, ‘Value’: xxx dicts.

Parameters:

tags (Dict[str, str])

Return type:

List[Dict[str, str]]
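
A minimal sketch of the conversion described above, written as a standalone function (flatten_tags_sketch is a hypothetical name) so it runs without Toil installed:

```python
from typing import Dict, List


def flatten_tags_sketch(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {'k': 'v'} into [{'Key': 'k', 'Value': 'v'}], one dict per tag."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


flat = flatten_tags_sketch({"Owner": "alice", "Name": "toil-cluster"})
```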

toil.provisioners.aws.awsProvisioner.boto3_pager(requestor_callable, result_attribute_name, **kwargs)

Yield all the results from calling the given Boto 3 method with the given keyword arguments, paging through the results using the Marker or NextToken, and fetching out and looping over the list in the response with the given attribute name.

Parameters:
  • requestor_callable (Callable[Ellipsis, Any])

  • result_attribute_name (str)

  • kwargs (Any)

Return type:

Iterable[Any]
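
The paging loop can be sketched as follows. This is an illustration under assumptions, not the function's actual code: it assumes the continuation token is returned under the same key it is sent with (NextToken or Marker), and fake_list_things is an invented stand-in for a Boto 3 method.

```python
from typing import Any, Callable, Dict, Iterable


def pager_sketch(
    requestor: Callable[..., Dict[str, Any]],
    result_attribute_name: str,
    **kwargs: Any,
) -> Iterable[Any]:
    """Yield items from every page of a token-paginated response."""
    params = dict(kwargs)
    while True:
        response = requestor(**params)
        # Loop over the list stored under the requested attribute name.
        yield from response.get(result_attribute_name, [])
        if "NextToken" in response:
            params["NextToken"] = response["NextToken"]
        elif "Marker" in response:
            params["Marker"] = response["Marker"]
        else:
            break  # No continuation token: this was the last page.


def fake_list_things(NextToken: str = "", **_: Any) -> Dict[str, Any]:
    """Invented two-page API used only to exercise the sketch."""
    pages = {
        "": {"Things": [1, 2], "NextToken": "page2"},
        "page2": {"Things": [3]},
    }
    return pages[NextToken]


results = list(pager_sketch(fake_list_things, "Things"))
```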

toil.provisioners.aws.awsProvisioner.human2bytes(string)

Given a string representation of some memory (e.g. ‘1024 Mib’), return the integer number of bytes.

Parameters:

string (str)

Return type:

int
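
To make the idea concrete, here is a small parser in the same spirit. It is a sketch under assumptions: the accepted unit spellings and the case-insensitive matching are choices made for this example, and the real function may accept a different set of formats.

```python
import re

# Assumed unit table for this sketch: decimal (KB, MB, ...) and binary (KiB, MiB, ...).
_UNITS = {
    "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}


def human2bytes_sketch(text: str) -> int:
    """Parse strings such as '1024 Mib' into an integer number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"Cannot parse memory string {text!r}")
    number, unit = match.groups()
    unit = (unit or "b").lower()
    if unit not in _UNITS:
        raise ValueError(f"Unknown unit in {text!r}")
    return int(float(number) * _UNITS[unit])


print(human2bytes_sketch("1024 Mib"))
```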

toil.provisioners.aws.awsProvisioner.a_short_time = 5
toil.provisioners.aws.awsProvisioner.create_auto_scaling_group(autoscaling_client, asg_name, launch_template_ids, vpc_subnets, min_size, max_size, instance_types=None, spot_bid=None, spot_cheapest=False, tags=None)

Create a new Auto Scaling Group with the given name (which is also its unique identifier).

Parameters:
  • autoscaling_client (mypy_boto3_autoscaling.client.AutoScalingClient) – Boto3 client for autoscaling.

  • asg_name (str) – Unique name for the autoscaling group.

  • launch_template_ids (Dict[str, str]) – ID of the launch template to make instances from, for each instance type.

  • vpc_subnets (List[str]) – One or more subnet IDs to place instances in the group into. Determines the availability zone(s) instances will launch into.

  • min_size (int) – Minimum number of instances to have in the group at all times.

  • max_size (int) – Maximum number of instances to allow in the group at any time.

  • instance_types (Optional[Iterable[str]]) – Use a pool over the given instance types, instead of the type given in the launch template. For on-demand groups, this is a prioritized list. For spot groups, we let AWS balance according to spot_strategy. Must be 20 types or shorter.

  • spot_bid (Optional[float]) – If set, the ASG will be a spot market ASG. Bid is in dollars per instance hour. All instance types in the group are bid on equivalently.

  • spot_cheapest (bool) – If true, use the cheapest spot instances available out of instance_types, instead of the spot instances that minimize eviction probability.

  • tags (Optional[Dict[str, str]]) – Tags to apply to the ASG only. Tags for the instances should be added to the launch template instead.

Return type:

None

The default version of the launch template is used.

toil.provisioners.aws.awsProvisioner.create_instances(ec2_resource, image_id, key_name, instance_type, num_instances=1, security_group_ids=None, user_data=None, block_device_map=None, instance_profile_arn=None, placement_az=None, subnet_id=None, tags=None)

Replaces create_ondemand_instances. Uses boto3 and returns a list of Boto3 Instance objects.

See “create_instances” (returns a list of ec2.Instance objects):

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances

Not to be confused with “run_instances” (same input args; returns a dictionary):

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.run_instances

Tags, if given, are applied to the instances, and all volumes.

Parameters:
  • ec2_resource (mypy_boto3_ec2.service_resource.EC2ServiceResource)

  • image_id (str)

  • key_name (str)

  • instance_type (str)

  • num_instances (int)

  • security_group_ids (Optional[List])

  • user_data (Optional[Union[str, bytes]])

  • block_device_map (Optional[List[Dict]])

  • instance_profile_arn (Optional[str])

  • placement_az (Optional[str])

  • subnet_id (str)

  • tags (Optional[Dict[str, str]])

Return type:

List[mypy_boto3_ec2.service_resource.Instance]

toil.provisioners.aws.awsProvisioner.create_launch_template(ec2_client, template_name, image_id, key_name, instance_type, security_group_ids=None, user_data=None, block_device_map=None, instance_profile_arn=None, placement_az=None, subnet_id=None, tags=None)

Creates a launch template with the given name for launching instances with the given parameters.

We only ever use the default version of any launch template.

Internally calls https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html?highlight=create_launch_template#EC2.Client.create_launch_template

Parameters:
  • tags (Optional[Dict[str, str]]) – Tags, if given, are applied to the template itself, all instances, and all volumes.

  • user_data (Optional[Union[str, bytes]]) – non-base64-encoded user data to pass to the instances.

  • ec2_client (mypy_boto3_ec2.client.EC2Client)

  • template_name (str)

  • image_id (str)

  • key_name (str)

  • instance_type (str)

  • security_group_ids (Optional[List])

  • block_device_map (Optional[List[Dict]])

  • instance_profile_arn (Optional[str])

  • placement_az (Optional[str])

  • subnet_id (Optional[str])

Returns:

the ID of the launch template.

Return type:

str

toil.provisioners.aws.awsProvisioner.create_ondemand_instances(boto3_ec2, image_id, spec, num_instances=1)

Requests the RunInstances EC2 API call but accounts for the race between recently created instance profiles, IAM roles and an instance creation that refers to them.

Return type:

List[InstanceTypeDef]

Parameters:
  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

  • image_id (str)

  • spec (Mapping[str, Any])

  • num_instances (int)

toil.provisioners.aws.awsProvisioner.increase_instance_hop_limit(boto3_ec2, boto_instance_list)

Increase the default HTTP hop limit. Because Toil and Kubernetes run inside a Docker container, the default hop limit of 1 is not enough when fetching metadata with ec2_metadata.

Must be called after the instances are guaranteed to be running.

Parameters:
  • boto_instance_list (List[mypy_boto3_ec2.type_defs.InstanceTypeDef]) – List of boto instances to modify

  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

Return type:

None

toil.provisioners.aws.awsProvisioner.create_spot_instances(boto3_ec2, price, image_id, spec, num_instances=1, timeout=None, tentative=False, tags=None)

Create instances on the spot market.

Parameters:

boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

Return type:

Generator[mypy_boto3_ec2.type_defs.DescribeInstancesResultTypeDef, None, None]

toil.provisioners.aws.awsProvisioner.wait_instances_running(boto3_ec2, instances)

Wait until no instance in the given iterable is ‘pending’. Yield every instance that entered the running state as soon as it does.

Parameters:
  • boto3_ec2 (EC2Client) – the EC2 connection to use for making requests

  • instances (Iterable[InstanceTypeDef]) – the instances to wait on

Return type:

Iterable[InstanceTypeDef]

toil.provisioners.aws.awsProvisioner.wait_transition(boto3_ec2, resource, from_states, to_state, state_getter=lambda x: ...)

Wait until the specified EC2 resource (instance, image, volume, …) transitions from any of the given ‘from’ states to the specified ‘to’ state. If the resource is found in a state other than the ‘to’ state or any of the ‘from’ states, an exception will be thrown.

Parameters:
  • resource (mypy_boto3_ec2.type_defs.InstanceTypeDef) – the resource to monitor

  • from_states (Iterable[str]) – a set of states that the resource is expected to be in before the transition occurs

  • to_state (str) – the state of the resource when this method returns

  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

  • state_getter (Callable[[mypy_boto3_ec2.type_defs.InstanceTypeDef], str])
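
The waiting logic described above can be sketched as a polling loop. This is a simplified illustration: it takes a zero-argument get_state callable instead of an EC2 client and resource, and the RuntimeError and poll_seconds parameter are assumptions made for the example.

```python
import time
from typing import Callable, Iterable


def wait_transition_sketch(
    get_state: Callable[[], str],
    from_states: Iterable[str],
    to_state: str,
    poll_seconds: float = 0.0,
) -> None:
    """Poll until get_state() returns to_state; fail on any unexpected state."""
    allowed = set(from_states)
    while True:
        state = get_state()
        if state == to_state:
            return
        if state not in allowed:
            raise RuntimeError(f"Unexpected state {state!r} while waiting for {to_state!r}")
        time.sleep(poll_seconds)


# Simulated resource that is 'pending' twice and then 'running'.
states = iter(["pending", "pending", "running"])
wait_transition_sketch(lambda: next(states), {"pending"}, "running")
```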

toil.provisioners.aws.awsProvisioner.wait_until_instance_profile_arn_exists(instance_profile_arn)
Parameters:

instance_profile_arn (str)

class toil.provisioners.aws.awsProvisioner.InstanceType(name, cores, memory, disks, disk_capacity, architecture)
__slots__ = ('name', 'cores', 'memory', 'disks', 'disk_capacity', 'architecture')
__str__()

Return str(self).

Return type:

str

__eq__(other)

Return self==value.

Parameters:

other (object)

Return type:

bool

toil.provisioners.aws.awsProvisioner.E2Instances
toil.provisioners.aws.awsProvisioner.memoize

Memoize a function result based on its parameters using this decorator.

For example, this can be used in place of lazy initialization. If the decorating function is invoked by multiple threads, the decorated function may be called more than once with the same arguments.
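
The lazy-initialization use can be illustrated with the standard library. Note that this sketch uses functools.lru_cache as a stand-in rather than the memoize decorator documented here, and expensive_lookup is an invented function.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=None)
def expensive_lookup(region: str) -> str:
    """Pretend to build a costly object; runs once per distinct argument."""
    global call_count
    call_count += 1
    return f"client-for-{region}"


first = expensive_lookup("us-west-2")
second = expensive_lookup("us-west-2")  # Served from the cache.
```

After both calls, call_count is 1 because the second call never reaches the function body.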

toil.provisioners.aws.awsProvisioner.truncExpBackoff()
Return type:

Iterator[float]
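
The source gives no description for this function. Going by its name and return type, it presumably yields delays for a truncated exponential backoff; the sketch below shows that general technique, with a base and cap chosen arbitrarily for the example rather than taken from Toil.

```python
import itertools
from typing import Iterator


def trunc_exp_backoff_sketch(base: float = 1.0, cap: float = 32.0) -> Iterator[float]:
    """Yield delays that double each time and are truncated at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


delays = list(itertools.islice(trunc_exp_backoff_sketch(), 8))
```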

class toil.provisioners.aws.awsProvisioner.ErrorCondition(error=None, error_codes=None, boto_error_codes=None, error_message_must_include=None, retry_on_this_condition=True)

A wrapper describing an error condition.

ErrorCondition events may be used to define errors in more detail to determine whether to retry.

Parameters:
  • error (Optional[Any])

  • error_codes (List[int])

  • boto_error_codes (List[str])

  • error_message_must_include (str)

  • retry_on_this_condition (bool)

toil.provisioners.aws.awsProvisioner.get_error_body(e)

Get the body from a Boto 2 or 3 error, or compatible types.

Returns the code and message if the error does not have a body.

Parameters:

e (Exception)

Return type:

str

toil.provisioners.aws.awsProvisioner.get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

Returns empty string for other errors.

Parameters:

e (Exception)

Return type:

str

toil.provisioners.aws.awsProvisioner.get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

Note that error message conditions also check more than this; this function does not fall back to the traceback for incompatible types.

Parameters:

e (Exception)

Return type:

str

toil.provisioners.aws.awsProvisioner.get_error_status(e)

Get the HTTP status code from a compatible source.

Such as a Boto 2 or 3 error, kubernetes.client.rest.ApiException, http.client.HTTPException, urllib3.exceptions.HTTPError, requests.exceptions.HTTPError, urllib.error.HTTPError, or compatible type

Returns 0 from other errors.

Parameters:

e (Exception)

Return type:

int
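
One way such a lookup can work is to probe a few common attribute layouts, as sketched below. The attribute names tried here are assumptions for the example (botocore client errors do carry the status under response['ResponseMetadata']['HTTPStatusCode']); the real function may check different or additional sources.

```python
def get_status_sketch(e: Exception) -> int:
    """Best-effort HTTP status extraction; returns 0 when nothing matches."""
    # Boto 3 style: a 'response' dict carrying ResponseMetadata.
    response = getattr(e, "response", None)
    if isinstance(response, dict):
        code = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        if isinstance(code, int):
            return code
    # Other libraries: a plain integer attribute on the exception.
    for attr in ("status", "code", "status_code"):
        value = getattr(e, attr, None)
        if isinstance(value, int):
            return value
    return 0


class FakeApiError(Exception):
    """Invented exception type with a 'status' attribute."""
    status = 404


print(get_status_sketch(FakeApiError()), get_status_sketch(ValueError("no status")))
```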

toil.provisioners.aws.awsProvisioner.old_retry(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=lambda e: ...)

Deprecated.

Retry an operation while the failure matches a given predicate and until a given timeout expires, waiting a given amount of time in between attempts. This function is a generator that yields contextmanagers. See doctests below for example usage.

Parameters:
  • delays (Iterable[float]) – an iterable yielding the time in seconds to wait before each retried attempt; the last element of the iterable will be repeated.

  • timeout (float) – an overall timeout that should not be exceeded for all attempts together. This is a best-effort mechanism only and it won’t abort an ongoing attempt, even if the timeout expires during that attempt.

  • predicate (Callable[[Exception],bool]) – a unary callable returning True if another attempt should be made to recover from the given exception. The default value for this parameter will prevent any retries!

Returns:

a generator yielding context managers, one per attempt

Return type:

Iterator

Retry for a limited amount of time:

>>> true = lambda _:True
>>> false = lambda _:False
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
...         raise RuntimeError('foo')
Traceback (most recent call last):
...
RuntimeError: foo
>>> i > 1
True

If timeout is 0, do exactly one attempt:

>>> i = 0
>>> for attempt in old_retry( timeout=0 ):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1

Don’t retry on success:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
>>> i
1

Don’t retry unless predicate returns True:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=false):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1
toil.provisioners.aws.awsProvisioner.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)

Retry a function if it fails with any Exception defined in “errors”.

Does so every x seconds, where x is defined by a list of numbers (ints or floats) in “intervals”. Also accepts ErrorCondition events for more detailed retry attempts.

Parameters:
  • intervals (Optional[List]) – A list of delays, in seconds, to wait between retries before giving up. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s

  • infinite_retries (bool) – If this is True, reset the intervals when they run out. Defaults to: False.

  • errors (Optional[Sequence[Union[ErrorCondition, Type[Exception]]]]) –

    A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:

      - Exception (required)
      - Error codes that must match to be retried (optional; defaults to not checking)
      - A string that must be in the error message to be retried (optional; defaults to not checking)
      - A bool that can be set to False to always error on this condition.

    If not specified, this will default to a generic Exception.

  • log_message (Optional[Tuple[Callable, str]]) – Optional tuple of (“log/print function()”, “message string”) that will precede each attempt.

  • prepare (Optional[List[Callable]]) – Optional list of functions to call, with the function’s arguments, between retries, to reset state.

Returns:

The result of the wrapped function, or raises the error if the retries are exhausted.

Return type:

Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
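
The decorator pattern described above can be sketched in a few lines. This is a simplified stand-in, not Toil's implementation: retry_sketch is a hypothetical name, it supports only plain exception types (no ErrorCondition, log_message, prepare or infinite_retries), and the example uses zero-second intervals so it runs instantly.

```python
import functools
import time
from typing import Any, Callable, List, Optional, Sequence, Type


def retry_sketch(
    intervals: Optional[List[float]] = None,
    errors: Optional[Sequence[Type[Exception]]] = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Retry the wrapped function, sleeping for each interval in turn."""
    waits = list(intervals) if intervals is not None else [1, 1, 2, 4, 8, 16]
    catch = tuple(errors) if errors else (Exception,)

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for wait in waits:
                try:
                    return func(*args, **kwargs)
                except catch:
                    time.sleep(wait)
            # Final attempt: any error now propagates to the caller.
            return func(*args, **kwargs)

        return wrapper

    return decorate


attempts = []


@retry_sketch(intervals=[0, 0, 0], errors=[ConnectionError])
def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky()
```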

exception toil.provisioners.aws.awsProvisioner.ClusterCombinationNotSupportedException(provisioner_class, cluster_type, architecture, reason=None)

Bases: Exception

Indicates that a provisioner does not support making a given type of cluster with a given architecture.

Parameters:
  • provisioner_class (Type)

  • cluster_type (str)

  • architecture (str)

  • reason (Optional[str])

exception toil.provisioners.aws.awsProvisioner.NoSuchClusterException(cluster_name)

Bases: Exception

Indicates that the specified cluster does not exist.

Parameters:

cluster_name (str)

exception toil.provisioners.aws.awsProvisioner.NoSuchZoneException

Bases: Exception

Indicates that a valid zone could not be found.

class toil.provisioners.aws.awsProvisioner.AbstractProvisioner(clusterName=None, clusterType='mesos', zone=None, nodeStorage=50, nodeStorageOverrides=None, enable_fuse=False)

Bases: abc.ABC

Interface for provisioning worker nodes to use in a Toil cluster.

Parameters:
  • clusterName (Optional[str])

  • clusterType (Optional[str])

  • zone (Optional[str])

  • nodeStorage (int)

  • nodeStorageOverrides (Optional[List[str]])

  • enable_fuse (bool)

LEADER_HOME_DIR = '/root/'
cloud: str = None
abstract supportedClusterTypes()

Get all the cluster types that this provisioner implementation supports.

Return type:

Set[str]

abstract createClusterSettings()

Initialize class for a new cluster, to be deployed, when running outside the cloud.

abstract readClusterSettings()

Initialize class from an existing cluster. This method assumes that the instance we are running on is the leader.

Implementations must call _setLeaderWorkerAuthentication().

setAutoscaledNodeTypes(nodeTypes)

Set node types, shapes and spot bids for Toil-managed autoscaling.

Parameters:

nodeTypes (List[Tuple[Set[str], Optional[float]]]) – A list of node types, as parsed with parse_node_types.

hasAutoscaledNodeTypes()

Check if node types have been configured on the provisioner (via setAutoscaledNodeTypes).

Returns:

True if node types are configured for autoscaling, and false otherwise.

Return type:

bool

getAutoscaledInstanceShapes()

Get all the node shapes and their named instance types that the Toil autoscaler should manage.

Return type:

Dict[Shape, str]

static retryPredicate(e)

Return true if the exception e should be retried by the cluster scaler. For example, should return true if the exception was due to exceeding an API rate limit. The error will be retried with exponential backoff.

Parameters:

e – exception raised during execution of setNodeCount

Returns:

boolean indicating whether the exception e should be retried

abstract launchCluster(*args, **kwargs)

Initialize a cluster and create a leader node.

Implementations must call _setLeaderWorkerAuthentication() with the leader so that workers can be launched.

Parameters:
  • leaderNodeType – The leader instance.

  • leaderStorage – The amount of disk to allocate to the leader in gigabytes.

  • owner – Tag identifying the owner of the instances.

abstract addNodes(nodeTypes, numNodes, preemptible, spotBid=None)

Used to add worker nodes to the cluster

Parameters:
  • numNodes (int) – The number of nodes to add

  • preemptible (bool) – whether or not the nodes will be preemptible

  • spotBid (Optional[float]) – The bid for preemptible nodes if applicable (this can be set in config, also).

  • nodeTypes (Set[str])

Returns:

number of nodes successfully added

Return type:

int

addManagedNodes(nodeTypes, minNodes, maxNodes, preemptible, spotBid=None)

Add a group of managed nodes of the given type, up to the given maximum. The nodes will automatically be launched and terminated depending on cluster load.

Raises ManagedNodesNotSupportedException if the provisioner implementation or cluster configuration can’t have managed nodes.

Parameters:
  • minNodes – The minimum number of nodes to scale to

  • maxNodes – The maximum number of nodes to scale to

  • preemptible – whether or not the nodes will be preemptible

  • spotBid – The bid for preemptible nodes if applicable (this can be set in config, also).

  • nodeTypes (Set[str])

Return type:

None

abstract terminateNodes(nodes)

Terminate the nodes represented by given Node objects

Parameters:

nodes (List[toil.provisioners.node.Node]) – list of Node objects

Return type:

None

abstract getLeader()
Returns:

The leader node.

abstract getProvisionedWorkers(instance_type=None, preemptible=None)

Gets all nodes, optionally of the given instance type or preemptability, from the provisioner. Includes both static and autoscaled nodes.

Parameters:
  • preemptible (Optional[bool]) – Boolean value to restrict to preemptible nodes or non-preemptible nodes

  • instance_type (Optional[str])

Returns:

list of Node objects

Return type:

List[toil.provisioners.node.Node]

abstract getNodeShape(instance_type, preemptible=False)

The shape of a preemptible or non-preemptible node managed by this provisioner. The node shape defines key properties of a machine, such as its number of cores or the time between billing intervals.

Parameters:

instance_type (str) – Instance type name to return the shape of.

Return type:

Shape

abstract destroyCluster()

Terminates all nodes in the specified cluster and cleans up all resources associated with the cluster.

Parameters:

clusterName – identifier of the cluster to terminate.

Return type:

None

class InstanceConfiguration

Allows defining the initial setup for an instance and then turning it into an Ignition configuration for instance user data.

addFile(path, filesystem='root', mode='0755', contents='', append=False)

Make a file on the instance with the given filesystem, mode, and contents.

See the storage.files section: https://github.com/kinvolk/ignition/blob/flatcar-master/doc/configuration-v2_2.md

addUnit(name, enabled=True, contents='')

Make a systemd unit on the instance with the given name (including .service), and content. Units will be enabled by default.

Unit logs can be investigated with:

systemctl status whatever.service

or:

journalctl -xe

addSSHRSAKey(keyData)

Authorize the given bare, encoded RSA key (without “ssh-rsa”).

Parameters:

keyData (str)

toIgnitionConfig()

Return an Ignition configuration describing the desired config.

Return type:

str

getBaseInstanceConfiguration()

Get the base configuration for both leader and worker instances for all cluster types.

Return type:

InstanceConfiguration

addVolumesService(config)

Add a service to prepare and mount local scratch volumes.

Parameters:

config (InstanceConfiguration)

addNodeExporterService(config)

Add the node exporter service for Prometheus to an instance configuration.

Parameters:

config (InstanceConfiguration)

toil_service_env_options()
Return type:

str

add_toil_service(config, role, keyPath=None, preemptible=False)

Add the Toil leader or worker service to an instance configuration.

Will run Mesos master or agent as appropriate in Mesos clusters. For Kubernetes clusters, will just sleep to provide a place to shell into on the leader, and shouldn’t run on the worker.

Parameters:
  • role (str) – Should be ‘leader’ or ‘worker’. Will not work for ‘worker’ until leader credentials have been collected.

  • keyPath (str) – path on the node to a server-side encryption key that will be added to the node after it starts. The service will wait until the key is present before starting.

  • preemptible (bool) – Whether a worker should identify itself as preemptible or not to the scheduler.

  • config (InstanceConfiguration)

getKubernetesValues(architecture='amd64')

Returns a dict of Kubernetes component versions and paths for formatting into Kubernetes-related templates.

Parameters:

architecture (str)

addKubernetesServices(config, architecture='amd64')

Add installing Kubernetes and Kubeadm and setting up the Kubelet to run when configured to an instance configuration. The same process applies to leaders and workers.

Parameters:
  • config (InstanceConfiguration)

  • architecture (str)

abstract getKubernetesAutoscalerSetupCommands(values)

Return Bash commands that set up the Kubernetes cluster autoscaler for provisioning from the environment supported by this provisioner.

Should only be implemented if Kubernetes clusters are supported.

Parameters:

values (Dict[str, str]) – Contains definitions of cluster variables, like AUTOSCALER_VERSION and CLUSTER_NAME.

Returns:

Bash snippet

Return type:

str

getKubernetesCloudProvider()

Return the Kubernetes cloud provider (for example, ‘aws’), to pass to the kubelets in a Kubernetes cluster provisioned using this provisioner.

Defaults to None if not overridden, in which case no cloud provider integration will be used.

Returns:

Cloud provider name, or None

Return type:

Optional[str]

addKubernetesLeader(config)

Add services to configure as a Kubernetes leader, if Kubernetes is already set to be installed.

Parameters:

config (InstanceConfiguration)

addKubernetesWorker(config, authVars, preemptible=False)

Add services to configure as a Kubernetes worker, if Kubernetes is already set to be installed.

Authenticate back to the leader using the JOIN_TOKEN, JOIN_CERT_HASH, and JOIN_ENDPOINT set in the given authentication data dict.

Parameters:
  • config (InstanceConfiguration) – The configuration to add services to

  • authVars (Dict[str, str]) – Dict with authentication info

  • preemptible (bool) – Whether the worker should be labeled as preemptible or not
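The shape of the authentication dict can be sketched as below. Only the three key names come from the description above; the helper missing_join_keys and the placeholder values are hypothetical.

```python
from typing import Dict, List

# Key names taken from the addKubernetesWorker description.
REQUIRED_JOIN_KEYS = ("JOIN_TOKEN", "JOIN_CERT_HASH", "JOIN_ENDPOINT")


def missing_join_keys(auth_vars: Dict[str, str]) -> List[str]:
    """Return the required join keys absent from auth_vars (hypothetical check)."""
    return [key for key in REQUIRED_JOIN_KEYS if key not in auth_vars]


# Placeholder values only; real values would come from the leader.
example_auth_vars = {
    "JOIN_TOKEN": "<token>",
    "JOIN_CERT_HASH": "<hash>",
    "JOIN_ENDPOINT": "<endpoint>",
}
```

Checking for missing keys before building a worker configuration gives an early, readable failure instead of a worker that cannot join.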

exception toil.provisioners.aws.awsProvisioner.ManagedNodesNotSupportedException

Bases: RuntimeError

Raised when attempting to add managed nodes (which autoscale up and down by themselves, without the provisioner doing the work) to a provisioner that does not support them.

Polling with this and try/except is the Right Way to check if managed nodes are available from a provisioner.
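The try/except pattern described above might look like the following sketch. The exception class here is a local stand-in with the same name and base class, and FakeProvisioner and try_managed_nodes are hypothetical, so the example runs without Toil or AWS.

```python
class ManagedNodesNotSupportedException(RuntimeError):
    """Local stand-in for the Toil exception of the same name."""


class FakeProvisioner:
    """Hypothetical provisioner that cannot host managed nodes."""

    def addManagedNodes(self, nodeTypes, minNodes, maxNodes, preemptible, spotBid=None):
        raise ManagedNodesNotSupportedException("managed nodes unavailable")


def try_managed_nodes(provisioner, node_types, min_nodes, max_nodes):
    """Return True if managed nodes were added, False if they are unsupported."""
    try:
        provisioner.addManagedNodes(node_types, min_nodes, max_nodes, preemptible=False)
        return True
    except ManagedNodesNotSupportedException:
        # The caller can fall back to adding nodes itself.
        return False


supported = try_managed_nodes(FakeProvisioner(), {"t2.medium"}, 0, 4)
```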

class toil.provisioners.aws.awsProvisioner.Shape(wallTime, memory, cores, disk, preemptible)

Represents a job or a node’s “shape”, in terms of the dimensions of memory, cores, disk and wall-time allocation.

The wallTime attribute stores the number of seconds of a node allocation, e.g. 3600 for AWS. FIXME: and for jobs?

The memory and disk attributes store the number of bytes required by a job (or provided by a node) in RAM or on disk (SSD or HDD), respectively.
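To make the units concrete, here is a minimal sketch using a stand-in named tuple rather than the real Shape class. The field order follows the documented constructor; ExampleShape, GIB, and the specific values are illustrative assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in; field order follows the documented constructor.
ExampleShape = namedtuple("ExampleShape", "wallTime memory cores disk preemptible")

GIB = 2 ** 30  # bytes per gibibyte

node_shape = ExampleShape(
    wallTime=3600,    # seconds of node allocation
    memory=4 * GIB,   # bytes of RAM
    cores=2,
    disk=50 * GIB,    # bytes of disk
    preemptible=False,
)
```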

Parameters:
__eq__(other)

Return self==value.

Parameters:

other (Any)

Return type:

bool

greater_than(other)
Parameters:

other (Any)

Return type:

bool

__gt__(other)

Return self>value.

Parameters:

other (Any)

Return type:

bool

__repr__()

Return repr(self).

Return type:

str

__str__()

Return str(self).

Return type:

str

__hash__()

Return hash(self).

Return type:

int

toil.provisioners.aws.awsProvisioner.get_best_aws_zone(spotBid=None, nodeType=None, boto3_ec2=None, zone_options=None)

Get the right AWS zone to use.

Reports the TOIL_AWS_ZONE environment variable if set.

Otherwise, if we are running on EC2 or ECS, reports the zone we are running in.

Otherwise, if a spot bid, node type, and Boto 3 EC2 client (boto3_ec2) are specified, picks a zone where instances are easy to buy from the zones in the region of that client. These parameters must always be specified together, or not at all.

In this case, zone_options can be used to restrict to a subset of the zones in the region.

Otherwise, if we have the TOIL_AWS_REGION variable set, chooses a zone in that region.

Finally, if a default region is configured in Boto 2, chooses a zone in that region.

Returns None if no method can produce a zone to use.

Parameters:
  • spotBid (Optional[float])

  • nodeType (Optional[str])

  • boto3_ec2 (Optional[botocore.client.BaseClient])

  • zone_options (Optional[List[str]])

Return type:

Optional[str]
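The fallback order above amounts to trying a list of strategies and taking the first one that yields a zone. The sketch below shows that pattern with a reduced, hypothetical chain (environment override, then the zone we are running in); first_available_zone and example_zone are not part of Toil, and the spot-market and region-based steps are left out.

```python
from typing import Callable, Mapping, Optional, Sequence


def first_available_zone(
    strategies: Sequence[Callable[[], Optional[str]]]
) -> Optional[str]:
    """Return the first non-None zone produced by the strategies, in order."""
    for strategy in strategies:
        zone = strategy()
        if zone is not None:
            return zone
    return None


def example_zone(env: Mapping[str, str], running_zone: Optional[str]) -> Optional[str]:
    """Hypothetical, reduced chain: environment override, then current zone."""
    return first_available_zone([
        lambda: env.get("TOIL_AWS_ZONE"),
        lambda: running_zone,
    ])
```

Keeping each step as a separate callable makes the priority order explicit and lets the final None result fall out naturally when no step succeeds.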

class toil.provisioners.aws.awsProvisioner.Node(publicIP, privateIP, name, launchTime, nodeType, preemptible, tags=None, use_private_ip=None)
Parameters:
maxWaitTime
__str__()

Return str(self).

__repr__()

Return repr(self).

__hash__()

Return hash(self).

remainingBillingInterval()

If the node has a launch time, this function returns a floating point value between 0 and 1.0 representing how far we are into the current billing cycle for the given instance. If the return value is .25, we are one quarter into the billing cycle, with three quarters remaining before we will be charged again for that instance.

Assumes a billing cycle of one hour.

Returns:

Float between 0 and 1.0 representing the fraction of pre-paid time left in the cycle.

Return type:

float
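The arithmetic can be sketched as follows, assuming the one-hour cycle stated above. The helper billing_cycle_fractions is hypothetical and is not Toil's implementation; it returns both the elapsed and the remaining fraction of the current cycle so that either quantity can be inspected.

```python
def billing_cycle_fractions(seconds_since_launch: float, cycle_seconds: float = 3600.0):
    """Return (elapsed, remaining) fractions of the current billing cycle."""
    # Position within the current cycle, as a fraction of the cycle length.
    elapsed = (seconds_since_launch % cycle_seconds) / cycle_seconds
    return elapsed, 1.0 - elapsed


# 4500 seconds is 1 hour 15 minutes after launch: a quarter into the second cycle.
elapsed, remaining = billing_cycle_fractions(4500)
```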

waitForNode(role, keyName='core')
Parameters:
Return type:

None

copySshKeys(keyName)

Copy authorized_keys file to the core user from the keyName user.

injectFile(fromFile, toFile, role)

Rsync a file to the container with the given role.

extractFile(fromFile, toFile, role)

Rsync a file from the container with the given role.

sshAppliance(*args, **kwargs)
Parameters:
  • args – arguments to execute in the appliance

  • kwargs – tty=bool tells Docker whether to create a TTY shell for interactive SSHing; the default is False. input=string is passed as input to the Popen call.

sshInstance(*args, **kwargs)

Run a command on the instance. Returns the binary output of the command.

coreSSH(*args, **kwargs)

If strict=False, strict host key checking will be temporarily disabled. This is provided as a convenience for internal/automated functions and ought to be set to True whenever feasible, or whenever the user is directly interacting with a resource (e.g. rsync-cluster or ssh-cluster). Assumed to be False by default.

kwargs: input, tty, appliance, collectStdout, sshOptions, strict

Parameters:

input (bytes) – UTF-8 encoded input bytes to send to the command

coreRsync(args, applianceName='toil_leader', **kwargs)
Parameters:
  • args (List[str])

  • applianceName (str)

  • kwargs (Any)

Return type:

int

toil.provisioners.aws.awsProvisioner.get_client(service_name: Literal['ec2'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_ec2.EC2Client
toil.provisioners.aws.awsProvisioner.get_client(service_name: Literal['iam'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_iam.IAMClient
toil.provisioners.aws.awsProvisioner.get_client(service_name: Literal['s3'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_s3.S3Client
toil.provisioners.aws.awsProvisioner.get_client(service_name: Literal['sts'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_sts.STSClient
toil.provisioners.aws.awsProvisioner.get_client(service_name: Literal['sdb'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_sdb.SimpleDBClient
toil.provisioners.aws.awsProvisioner.get_client(service_name: Literal['autoscaling'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_autoscaling.AutoScalingClient

Get a Boto 3 client for a particular AWS service, usable by the current thread.

Global alternative to AWSConnectionManager.
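One common way to hand out clients that are safe to use from the current thread is a thread-local cache. The sketch below illustrates that idea with a stand-in factory; whether get_client works this way internally is not stated on this page, and get_cached_client is a hypothetical name.

```python
import threading

# Each thread sees its own attributes on this object.
_thread_state = threading.local()


def get_cached_client(service_name, factory):
    """Return this thread's client for service_name, creating it on first use."""
    cache = getattr(_thread_state, "clients", None)
    if cache is None:
        cache = _thread_state.clients = {}
    if service_name not in cache:
        cache[service_name] = factory(service_name)
    return cache[service_name]


# Stand-in factory: a real one would build a Boto 3 client.
first = get_cached_client("ec2", lambda name: object())
second = get_cached_client("ec2", lambda name: object())
```

Within one thread the second call reuses the first client; a different thread would get its own instance.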

toil.provisioners.aws.awsProvisioner.logger
toil.provisioners.aws.awsProvisioner.awsRetryPredicate(e)
Parameters:

e (Exception)

Return type:

bool

toil.provisioners.aws.awsProvisioner.expectedShutdownErrors(e)

Matches errors that we expect to occur during shutdown, and which indicate that we need to wait or try again.

Should not match any errors which indicate that an operation is impossible or unnecessary (such as errors resulting from a thing not existing to be deleted).

Parameters:

e (Exception)

Return type:

bool

toil.provisioners.aws.awsProvisioner.F
toil.provisioners.aws.awsProvisioner.awsRetry(f)

This decorator retries the wrapped function if AWS throws unexpected errors.

It should wrap any function that makes use of Boto.

Parameters:

f (Callable[Ellipsis, F])

Return type:

Callable[Ellipsis, F]
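A predicate-driven retry decorator of this general kind can be sketched as below. This is a generic illustration, not the awsRetry implementation: retry_if, its attempts and base_delay parameters, and the use of ConnectionError as the retryable error are all assumptions.

```python
import functools
import time


def retry_if(predicate, attempts=3, base_delay=0.0):
    """Retry the wrapped function while predicate(exception) is true."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    # Give up on the last attempt or on non-retryable errors.
                    if attempt == attempts - 1 or not predicate(e):
                        raise
                    # Exponential backoff between attempts.
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator


calls = {"count": 0}


@retry_if(lambda e: isinstance(e, ConnectionError))
def flaky():
    """Fail twice with a retryable error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient")
    return "ok"


result = flaky()
```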

toil.provisioners.aws.awsProvisioner.awsFilterImpairedNodes(nodes, boto3_ec2)
Parameters:
  • nodes (List[mypy_boto3_ec2.type_defs.InstanceTypeDef])

  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

Return type:

List[mypy_boto3_ec2.type_defs.InstanceTypeDef]

exception toil.provisioners.aws.awsProvisioner.InvalidClusterStateException

Bases: Exception

Common base class for all non-exit exceptions.

toil.provisioners.aws.awsProvisioner.collapse_tags(instance_tags)

Collapse tags from Boto 3 format to node format.

Parameters:

instance_tags (List[mypy_boto3_ec2.type_defs.TagTypeDef]) – tags as list of TagTypeDef

Returns:

Dict of tags

Return type:

Dict[str, str]
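The conversion can be sketched in a few lines, assuming the usual Boto 3 tag layout of a list of dicts with Key and Value entries. The function name collapse_tags_sketch and the sample tags are hypothetical; plain dicts stand in for TagTypeDef.

```python
from typing import Dict, List


def collapse_tags_sketch(instance_tags: List[Dict[str, str]]) -> Dict[str, str]:
    """Turn [{'Key': k, 'Value': v}, ...] into {k: v, ...}."""
    return {tag["Key"]: tag["Value"] for tag in instance_tags}


boto3_style = [
    {"Key": "Name", "Value": "my-cluster"},
    {"Key": "Owner", "Value": "alice"},
]
node_style = collapse_tags_sketch(boto3_style)
```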

class toil.provisioners.aws.awsProvisioner.AWSProvisioner(clusterName, clusterType, zone, nodeStorage, nodeStorageOverrides, sseKey, enable_fuse)

Bases: toil.provisioners.abstractProvisioner.AbstractProvisioner

Interface for provisioning worker nodes to use in a Toil cluster.

Parameters:
  • clusterName (Optional[str])

  • clusterType (Optional[str])

  • zone (Optional[str])

  • nodeStorage (int)

  • nodeStorageOverrides (Optional[List[str]])

  • sseKey (Optional[str])

  • enable_fuse (bool)

supportedClusterTypes()

Get all the cluster types that this provisioner implementation supports.

Return type:

Set[str]

createClusterSettings()

Create a new set of cluster settings for a cluster to be deployed into AWS.

Return type:

None

readClusterSettings()

Read the cluster settings from the instance metadata. This assumes the instance is the leader.

Return type:

None

launchCluster(leaderNodeType, leaderStorage, owner, keyName, botoPath, userTags, vpcSubnet, awsEc2ProfileArn, awsEc2ExtraSecurityGroupIds, **kwargs)

Starts a single leader node and populates this class with the leader’s metadata.

Parameters:
  • leaderNodeType (str) – An AWS instance type, like “t2.medium”, for example.

  • leaderStorage (int) – An integer number of gigabytes to provide the leader instance with.

  • owner (str) – Resources will be tagged with this owner string.

  • keyName (str) – The ssh key to use to access the leader node.

  • botoPath (str) – The path to the boto credentials directory.

  • userTags (Optional[Dict[str, str]]) – Optionally provided user tags to put on the cluster.

  • vpcSubnet (Optional[str]) – Optionally specify the VPC subnet for the leader.

  • awsEc2ProfileArn (Optional[str]) – Optionally provide the profile ARN.

  • awsEc2ExtraSecurityGroupIds (Optional[List[str]]) – Optionally provide additional security group IDs.

  • kwargs (Dict[str, Any])

Returns:

None

Return type:

None

toil_service_env_options()

Set AWS tags in the user's Docker container.

Return type:

str

getKubernetesAutoscalerSetupCommands(values)

Get the Bash commands necessary to configure the Kubernetes Cluster Autoscaler for AWS.

Parameters:

values (Dict[str, str])

Return type:

str

getKubernetesCloudProvider()

Use the “aws” Kubernetes cloud provider when setting up Kubernetes.

Return type:

Optional[str]

getNodeShape(instance_type, preemptible=False)

Get the Shape for the given instance type (e.g. ‘t2.medium’).

Parameters:
  • instance_type (str)

  • preemptible (bool)

Return type:

toil.provisioners.abstractProvisioner.Shape

static retryPredicate(e)

Return true if the exception e should be retried by the cluster scaler. For example, should return true if the exception was due to exceeding an API rate limit. The error will be retried with exponential backoff.

Parameters:

e (Exception) – exception raised during execution of setNodeCount

Returns:

boolean indicating whether the exception e should be retried

Return type:

bool

destroyCluster()

Terminate instances and delete the profile and security group.

Return type:

None

terminateNodes(nodes)

Terminate the nodes represented by the given Node objects.

Parameters:

nodes (List[toil.provisioners.node.Node]) – list of Node objects

Return type:

None

addNodes(nodeTypes, numNodes, preemptible, spotBid=None)

Add worker nodes to the cluster.

Parameters:
  • numNodes (int) – The number of nodes to add

  • preemptible (bool) – whether or not the nodes will be preemptible

  • spotBid (Optional[float]) – The bid for preemptible nodes if applicable (this can be set in config, also).

  • nodeTypes (Set[str])

Returns:

number of nodes successfully added

Return type:

int

addManagedNodes(nodeTypes, minNodes, maxNodes, preemptible, spotBid=None)

Add a group of managed nodes of the given type, up to the given maximum. The nodes will automatically be launched and terminated depending on cluster load.

Raises ManagedNodesNotSupportedException if the provisioner implementation or cluster configuration can’t have managed nodes.

Parameters:
  • minNodes (int) – The minimum number of nodes to scale to

  • maxNodes (int) – The maximum number of nodes to scale to

  • preemptible (bool) – whether or not the nodes will be preemptible

  • spotBid (Optional[float]) – The bid for preemptible nodes if applicable (this can be set in config, also).

  • nodeTypes (Set[str])

Return type:

None

getProvisionedWorkers(instance_type=None, preemptible=None)

Gets all nodes, optionally of the given instance type or preemptibility, from the provisioner. Includes both static and autoscaled nodes.

Parameters:
  • preemptible (Optional[bool]) – Boolean value to restrict to preemptible nodes or non-preemptible nodes

  • instance_type (Optional[str])

Returns:

list of Node objects

Return type:

List[toil.provisioners.node.Node]

getLeader(wait=False)

Get the leader for the cluster as a Toil Node object.

Parameters:

wait (bool)

Return type:

toil.provisioners.node.Node

full_policy(resource)

Produce a dict describing the JSON form of a full-access-granting AWS IAM policy for the service with the given name (e.g. ‘s3’).

Parameters:

resource (str)

Return type:

Dict[str, Any]
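A full-access policy document for a single service generally has the shape sketched below, using the standard IAM policy JSON elements. The function full_policy_sketch is hypothetical, and the exact dict that full_policy returns is not shown on this page.

```python
from typing import Any, Dict


def full_policy_sketch(service: str) -> Dict[str, Any]:
    """Build a policy allowing every action of one service on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Resource": "*", "Action": f"{service}:*"}
        ],
    }


s3_policy = full_policy_sketch("s3")
```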

kubernetes_policy()

Get the Kubernetes policy grants not provided by the full grants on EC2 and IAM. See <https://github.com/DataBiosphere/toil/wiki/Manual-Autoscaling-Kubernetes-Setup#leader-policy> and <https://github.com/DataBiosphere/toil/wiki/Manual-Autoscaling-Kubernetes-Setup#worker-policy>.

These are mostly needed to support Kubernetes’ AWS CloudProvider, and some are for the Kubernetes Cluster Autoscaler’s AWS integration.

Some of these are really only needed on the leader.

Return type:

Dict[str, Any]