toil.lib.ec2

Attributes

a_short_time

a_long_time

logger

INCONSISTENCY_ERRORS

iam_client

Exceptions

UserError

Unspecified run-time error.

UnexpectedResourceState

Common base class for all non-exit exceptions.

Classes

panic

The Python idiom for reraising a primary exception fails when the except block raises a secondary exception, e.g. while trying to clean up.

ErrorCondition

A wrapper describing an error condition.

Functions

establish_boto3_session([region_name])

Get a Boto 3 session usable by the current thread.

flatten_tags(tags)

Convert tags from a key to value dict into a list of {'Key': xxx, 'Value': xxx} dicts.

get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

old_retry([delays, timeout, predicate])

Deprecated.

retry([intervals, infinite_retries, errors, ...])

Retry a function if it fails with any Exception defined in "errors".

not_found(e)

inconsistencies_detected(e)

retry_ec2([t, retry_for, retry_while])

wait_transition(boto3_ec2, resource, from_states, to_state)

Wait until the specified EC2 resource (instance, image, volume, ...) transitions from any of the given 'from' states to the specified 'to' state.

wait_instances_running(boto3_ec2, instances)

Wait until no instance in the given iterable is 'pending'. Yield every instance that entered the running state as soon as it does.

wait_spot_requests_active(boto3_ec2, requests[, ...])

Wait until no spot request in the given iterator is in the 'open' state or, optionally, a timeout occurs.

create_spot_instances(boto3_ec2, price, image_id, spec)

Create instances on the spot market.

create_ondemand_instances(boto3_ec2, image_id, spec[, ...])

Requests the RunInstances EC2 API call but accounts for the race between recently created instance profiles, IAM roles and an instance creation that refers to them.

increase_instance_hop_limit(boto3_ec2, boto_instance_list)

Increase the default HTTP hop limit. Because Toil and Kubernetes run inside a Docker container, the default hop limit of 1 is not enough when grabbing metadata information with ec2_metadata.

prune(bushy)

Prune entries in the given dict with false-y values.

wait_until_instance_profile_arn_exists(...)

create_instances(ec2_resource, image_id, key_name, ...)

Replaces create_ondemand_instances. Uses boto3 and returns a list of Boto3 instance dicts.

create_launch_template(ec2_client, template_name, ...)

Creates a launch template with the given name for launching instances with the given parameters.

create_auto_scaling_group(autoscaling_client, ...[, ...])

Create a new Auto Scaling Group with the given name (which is also its unique identifier).

Module Contents

toil.lib.ec2.establish_boto3_session(region_name=None)

Get a Boto 3 session usable by the current thread.

This function may not always establish a new session; it can be memoized.

Parameters:

region_name (Optional[str])

Return type:

boto3.Session
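
The note about memoization can be illustrated with a minimal per-thread cache. This is only a sketch of one way such caching might work, not Toil's implementation: `session_for_current_thread`, `_thread_local` and the `factory` argument are hypothetical names, and a plain `object` stands in for a real session so the example runs without AWS credentials.

```python
import threading

# Hypothetical per-thread storage; the real module may organize this differently.
_thread_local = threading.local()


def session_for_current_thread(factory):
    """Return this thread's cached session, creating it on first use."""
    if not hasattr(_thread_local, "session"):
        _thread_local.session = factory()
    return _thread_local.session


# Repeated calls in one thread reuse the same object.
first = session_for_current_thread(object)
second = session_for_current_thread(object)

# Another thread gets its own, separate object.
seen_in_other_thread = []
worker = threading.Thread(
    target=lambda: seen_in_other_thread.append(session_for_current_thread(object))
)
worker.start()
worker.join()
```

With Boto 3 installed, the factory could be something like `lambda: boto3.Session(region_name=region_name)`.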

toil.lib.ec2.flatten_tags(tags)

Convert tags from a key to value dict into a list of {‘Key’: xxx, ‘Value’: xxx} dicts.

Parameters:

tags (Dict[str, str])

Return type:

List[Dict[str, str]]
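
The conversion described above can be sketched in a few lines. `flatten_tags_sketch` is a stand-in written for illustration, assuming the output shape is exactly one `Key`/`Value` dict per tag; it is not copied from Toil.

```python
from typing import Dict, List


def flatten_tags_sketch(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {"name": "value"} into the [{"Key": ..., "Value": ...}] list form."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


example = flatten_tags_sketch({"Name": "worker", "Owner": "alice"})
```

Here `example` holds two dicts, one per tag, in the insertion order of the input dict.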

class toil.lib.ec2.panic(log=None)

The Python idiom for reraising a primary exception fails when the except block raises a secondary exception, e.g. while trying to clean up. In that case the original exception is lost and the secondary exception is reraised. The solution seems to be to save the primary exception info as returned from sys.exc_info() and then reraise that.

This is a contextmanager that should be used like this:

    try:
        # do something that can fail
    except:
        with panic( log ):
            # do cleanup that can also fail

If a logging logger is passed to panic(), any secondary Exception raised within the with block will be logged. Otherwise those exceptions are swallowed. At the end of the with block the primary exception will be reraised.

__enter__()
__exit__(*exc_info)
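
The behavior described for panic can be sketched as a small context manager. `panic_sketch` and `demo` are illustrative names; the sketch follows the description above (save `sys.exc_info()` on entry, optionally log the secondary exception, reraise the primary one on exit) and should not be read as Toil's exact code. It must be entered from inside an `except` block, since it relies on an exception being handled at that point.

```python
import sys


class panic_sketch:
    """Reraise the primary exception even if cleanup raises a secondary one."""

    def __init__(self, log=None):
        self.log = log
        self.primary = None

    def __enter__(self):
        # Capture the exception currently being handled by the enclosing except block.
        self.primary = sys.exc_info()

    def __exit__(self, *exc_info):
        # Log the secondary exception, if there is one and a logger was given.
        if self.log is not None and exc_info[0] is not None:
            self.log.warning("Exception during panic", exc_info=exc_info)
        _, value, traceback = self.primary
        # Raising here replaces any secondary exception from the with block.
        raise value.with_traceback(traceback)


def demo():
    """Return the name of the exception type that finally propagates."""
    try:
        try:
            raise ValueError("primary")
        except:
            with panic_sketch():
                raise KeyError("secondary cleanup failure")
    except Exception as caught:
        return type(caught).__name__
```

Calling `demo()` shows that the `ValueError` survives even though the cleanup raised a `KeyError`.
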
class toil.lib.ec2.ErrorCondition(error=None, error_codes=None, boto_error_codes=None, error_message_must_include=None, retry_on_this_condition=True)

A wrapper describing an error condition.

ErrorCondition events may be used to define errors in more detail to determine whether to retry.

Parameters:
  • error (Optional[Any])

  • error_codes (List[int])

  • boto_error_codes (List[str])

  • error_message_must_include (str)

  • retry_on_this_condition (bool)

toil.lib.ec2.get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

Returns empty string for other errors.

Parameters:

e (Exception)

Return type:

str
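
A hedged sketch of how an error code might be pulled from the two error shapes: Boto 2 errors expose an `error_code` attribute, and botocore's `ClientError` carries a `response` dict with an `Error` entry. `error_code_sketch` and `FakeClientError` are hypothetical names used so the example runs without Boto installed; the real function may check more cases.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in shaped like botocore's ClientError."""

    def __init__(self, code, message):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


def error_code_sketch(e: Exception) -> str:
    """Return the error code name, or an empty string for other errors."""
    code = getattr(e, "error_code", None)  # Boto 2 style attribute
    if isinstance(code, str):
        return code
    response = getattr(e, "response", None)  # Boto 3 style response dict
    if isinstance(response, dict):
        found = response.get("Error", {}).get("Code")
        if isinstance(found, str):
            return found
    return ""


boto3_style = error_code_sketch(FakeClientError("InvalidInstanceID.NotFound", "no such instance"))
unrelated = error_code_sketch(ValueError("not an AWS error"))
```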

toil.lib.ec2.get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

Note that error message conditions also check more than this; this function does not fall back to the traceback for incompatible types.

Parameters:

e (Exception)

Return type:

str

toil.lib.ec2.old_retry(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=lambda e: ...)

Deprecated.

Retry an operation while the failure matches a given predicate and until a given timeout expires, waiting a given amount of time in between attempts. This function is a generator that yields contextmanagers. See doctests below for example usage.

Parameters:
  • delays (Iterable[float]) – an iterable yielding the time in seconds to wait before each retried attempt; the last element of the iterable will be repeated.

  • timeout (float) – an overall timeout that should not be exceeded for all attempts together. This is a best-effort mechanism only and it won’t abort an ongoing attempt, even if the timeout expires during that attempt.

  • predicate (Callable[[Exception],bool]) – a unary callable returning True if another attempt should be made to recover from the given exception. The default value for this parameter will prevent any retries!

Returns:

a generator yielding context managers, one per attempt

Return type:

Iterator

Retry for a limited amount of time:

>>> true = lambda _:True
>>> false = lambda _:False
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
...         raise RuntimeError('foo')
Traceback (most recent call last):
...
RuntimeError: foo
>>> i > 1
True

If timeout is 0, do exactly one attempt:

>>> i = 0
>>> for attempt in old_retry( timeout=0 ):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1

Don’t retry on success:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
>>> i
1

Don’t retry unless the predicate returns True:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=false):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1
toil.lib.ec2.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)

Retry a function if it fails with any Exception defined in “errors”.

Does so every x seconds, where x is defined by a list of numbers (ints or floats) in “intervals”. Also accepts ErrorCondition events for more detailed retry attempts.

Parameters:
  • intervals (Optional[List]) – A list of wait times, in seconds, between successive retries before giving up. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s

  • infinite_retries (bool) – If this is True, reset the intervals when they run out. Defaults to: False.

  • errors (Optional[Sequence[Union[ErrorCondition, Type[Exception]]]]) –

    A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:

    - Exception (required)
    - Error codes that must match to be retried (optional; defaults to not checking)
    - A string that must be in the error message to be retried (optional; defaults to not checking)
    - A bool that can be set to False to always error on this condition.

    If not specified, this will default to a generic Exception.

  • log_message (Optional[Tuple[Callable, str]]) – Optional tuple of (“log/print function()”, “message string”) that will precede each attempt.

  • prepare (Optional[List[Callable]]) – Optional list of functions to call, with the function’s arguments, between retries, to reset state.

Returns:

The result of the wrapped function or raise.

Return type:

Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
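
The interval-based behavior can be sketched with a simplified decorator. `retry_sketch` is a stand-in written for this example: it covers only plain exception types and the `intervals` list, and leaves out ErrorCondition matching, `infinite_retries`, `log_message` and `prepare`. The `sleep` parameter is an addition for illustration so the delays can be replaced.

```python
import functools
import time


def retry_sketch(intervals=None, errors=None, sleep=time.sleep):
    """Retry the wrapped function, waiting intervals[i] seconds after failure i."""
    delays = [1, 1, 2, 4, 8, 16] if intervals is None else list(intervals)
    caught = (Exception,) if errors is None else tuple(errors)

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            remaining = list(delays)  # fresh copy for every call
            while True:
                try:
                    return func(*args, **kwargs)
                except caught:
                    if not remaining:
                        raise  # intervals exhausted: give up
                    sleep(remaining.pop(0))

        return wrapper

    return decorate


calls = []


@retry_sketch(intervals=[0, 0, 0], errors=[ConnectionError])
def flaky():
    """Fail twice with a retryable error, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"


result = flaky()
```

The decorated function is called three times here: two failures that are retried and one success.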

toil.lib.ec2.a_short_time = 5
toil.lib.ec2.a_long_time
toil.lib.ec2.logger
exception toil.lib.ec2.UserError(message=None, cause=None)

Bases: RuntimeError

Unspecified run-time error.

toil.lib.ec2.not_found(e)
toil.lib.ec2.inconsistencies_detected(e)
toil.lib.ec2.INCONSISTENCY_ERRORS
toil.lib.ec2.retry_ec2(t=a_short_time, retry_for=10 * a_short_time, retry_while=not_found)
exception toil.lib.ec2.UnexpectedResourceState(resource, to_state, state)

Bases: Exception

Common base class for all non-exit exceptions.

toil.lib.ec2.wait_transition(boto3_ec2, resource, from_states, to_state, state_getter=lambda x: ...)

Wait until the specified EC2 resource (instance, image, volume, …) transitions from any of the given ‘from’ states to the specified ‘to’ state. If the resource is found in a state other than the ‘to’ state or any of the ‘from’ states, an exception will be thrown.

Parameters:
  • resource (mypy_boto3_ec2.type_defs.InstanceTypeDef) – the resource to monitor

  • from_states (Iterable[str]) – a set of states that the resource is expected to be in before the transition occurs

  • to_state (str) – the state of the resource when this method returns

  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

  • state_getter (Callable[[mypy_boto3_ec2.type_defs.InstanceTypeDef], str])
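
The waiting logic described above amounts to a polling loop. The sketch below is an assumption-laden illustration, not Toil's code: `wait_transition_sketch`, `get_state`, `sleep` and `interval` are hypothetical names, the state comes from a plain callable instead of an EC2 describe call, and a `RuntimeError` stands in for UnexpectedResourceState.

```python
def wait_transition_sketch(get_state, from_states, to_state, sleep, interval=5):
    """Poll until the state leaves from_states, then require it to be to_state."""
    state = get_state()
    while state in from_states:
        sleep(interval)
        state = get_state()
    if state != to_state:
        # The real function raises UnexpectedResourceState in this situation.
        raise RuntimeError(f"Expected state {to_state!r} but found {state!r}")
    return state


# Simulated resource that reports 'pending' twice before 'running'.
states = iter(["pending", "pending", "running"])
final = wait_transition_sketch(
    lambda: next(states), {"pending"}, "running", sleep=lambda seconds: None
)
```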

toil.lib.ec2.wait_instances_running(boto3_ec2, instances)

Wait until no instance in the given iterable is ‘pending’. Yield every instance that entered the running state as soon as it does.

Parameters:
  • boto3_ec2 (EC2Client) – the EC2 connection to use for making requests

  • instances (Iterable[InstanceTypeDef]) – the instances to wait on

Return type:

Iterable[InstanceTypeDef]

toil.lib.ec2.wait_spot_requests_active(boto3_ec2, requests, timeout=None, tentative=False)

Wait until no spot request in the given iterator is in the ‘open’ state or, optionally, a timeout occurs. Yield spot requests as soon as they leave the ‘open’ state.

Parameters:
  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client) – ec2 client

  • requests (Iterable[mypy_boto3_ec2.type_defs.SpotInstanceRequestTypeDef]) – The requests to wait on.

  • timeout (float) – Maximum time in seconds to spend waiting or None to wait forever. If a timeout occurs, the remaining open requests will be cancelled.

  • tentative (bool) – if True, give up on a spot request at the earliest indication of it not being fulfilled immediately

Return type:

Iterable[List[mypy_boto3_ec2.type_defs.SpotInstanceRequestTypeDef]]

toil.lib.ec2.create_spot_instances(boto3_ec2, price, image_id, spec, num_instances=1, timeout=None, tentative=False, tags=None)

Create instances on the spot market.

Parameters:

boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

Return type:

Generator[mypy_boto3_ec2.type_defs.DescribeInstancesResultTypeDef, None, None]

toil.lib.ec2.create_ondemand_instances(boto3_ec2, image_id, spec, num_instances=1)

Requests the RunInstances EC2 API call but accounts for the race between recently created instance profiles, IAM roles and an instance creation that refers to them.

Return type:

List[InstanceTypeDef]

Parameters:
  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

  • image_id (str)

  • spec (Mapping[str, Any])

  • num_instances (int)

toil.lib.ec2.increase_instance_hop_limit(boto3_ec2, boto_instance_list)

Increase the default HTTP hop limit. Because Toil and Kubernetes run inside a Docker container, the default hop limit of 1 is not enough when grabbing metadata information with ec2_metadata.

Must be called after the instances are guaranteed to be running.

Parameters:
  • boto_instance_list (List[mypy_boto3_ec2.type_defs.InstanceTypeDef]) – List of boto instances to modify

  • boto3_ec2 (mypy_boto3_ec2.client.EC2Client)

Return type:

None
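
A sketch of what raising the hop limit could look like using the Boto 3 EC2 client call `modify_instance_metadata_options`. The helper name `raise_hop_limit_sketch`, the `RecordingClient` stand-in, and the hop limit value of 3 are assumptions for illustration; the source only states that the default of 1 is too low.

```python
class RecordingClient:
    """Hypothetical stand-in for a Boto 3 EC2 client that records calls."""

    def __init__(self):
        self.calls = []

    def modify_instance_metadata_options(self, **kwargs):
        self.calls.append(kwargs)


def raise_hop_limit_sketch(ec2_client, instances, hop_limit=3):
    """Raise the metadata hop limit on each instance dict in the list."""
    for instance in instances:
        ec2_client.modify_instance_metadata_options(
            InstanceId=instance["InstanceId"],
            HttpPutResponseHopLimit=hop_limit,  # assumed value, not from the source
            HttpEndpoint="enabled",
        )


client = RecordingClient()
raise_hop_limit_sketch(client, [{"InstanceId": "i-0123456789abcdef0"}])
```

With a real client, the same helper would issue one API request per instance, which is why the instances need to be running first.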

toil.lib.ec2.prune(bushy)

Prune entries in the given dict with false-y values. Boto3 may not like None and instead wants no key.

Parameters:

bushy (dict)

Return type:

dict
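
Pruning false-y values can be sketched with a dict comprehension. `prune_sketch` is an illustrative stand-in, and the request keys below are example data only.

```python
def prune_sketch(bushy: dict) -> dict:
    """Return a copy of the dict without entries whose values are false-y."""
    return {key: value for key, value in bushy.items() if value}


pruned = prune_sketch(
    {"ImageId": "ami-12345678", "KeyName": None, "UserData": "", "MinCount": 1}
)
```

Note that this drops every false-y value, so `0`, `False` and empty containers are removed along with `None`.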

toil.lib.ec2.iam_client
toil.lib.ec2.wait_until_instance_profile_arn_exists(instance_profile_arn)
Parameters:

instance_profile_arn (str)

toil.lib.ec2.create_instances(ec2_resource, image_id, key_name, instance_type, num_instances=1, security_group_ids=None, user_data=None, block_device_map=None, instance_profile_arn=None, placement_az=None, subnet_id=None, tags=None)

Replaces create_ondemand_instances. Uses boto3 and returns a list of Boto3 instance dicts.

See “create_instances” (returns a list of ec2.Instance objects):

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances

Not to be confused with “run_instances” (same input args; returns a dictionary):

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.run_instances

Tags, if given, are applied to the instances, and all volumes.

Parameters:
  • ec2_resource (mypy_boto3_ec2.service_resource.EC2ServiceResource)

  • image_id (str)

  • key_name (str)

  • instance_type (str)

  • num_instances (int)

  • security_group_ids (Optional[List])

  • user_data (Optional[Union[str, bytes]])

  • block_device_map (Optional[List[Dict]])

  • instance_profile_arn (Optional[str])

  • placement_az (Optional[str])

  • subnet_id (str)

  • tags (Optional[Dict[str, str]])

Return type:

List[mypy_boto3_ec2.service_resource.Instance]

toil.lib.ec2.create_launch_template(ec2_client, template_name, image_id, key_name, instance_type, security_group_ids=None, user_data=None, block_device_map=None, instance_profile_arn=None, placement_az=None, subnet_id=None, tags=None)

Creates a launch template with the given name for launching instances with the given parameters.

We only ever use the default version of any launch template.

Internally calls https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html?highlight=create_launch_template#EC2.Client.create_launch_template

Parameters:
  • tags (Optional[Dict[str, str]]) – Tags, if given, are applied to the template itself, all instances, and all volumes.

  • user_data (Optional[Union[str, bytes]]) – non-base64-encoded user data to pass to the instances.

  • ec2_client (mypy_boto3_ec2.client.EC2Client)

  • template_name (str)

  • image_id (str)

  • key_name (str)

  • instance_type (str)

  • security_group_ids (Optional[List])

  • block_device_map (Optional[List[Dict]])

  • instance_profile_arn (Optional[str])

  • placement_az (Optional[str])

  • subnet_id (Optional[str])

Returns:

the ID of the launch template.

Return type:

str

toil.lib.ec2.create_auto_scaling_group(autoscaling_client, asg_name, launch_template_ids, vpc_subnets, min_size, max_size, instance_types=None, spot_bid=None, spot_cheapest=False, tags=None)

Create a new Auto Scaling Group with the given name (which is also its unique identifier).

Parameters:
  • autoscaling_client (mypy_boto3_autoscaling.client.AutoScalingClient) – Boto3 client for autoscaling.

  • asg_name (str) – Unique name for the autoscaling group.

  • launch_template_ids (Dict[str, str]) – ID of the launch template to make instances from, for each instance type.

  • vpc_subnets (List[str]) – One or more subnet IDs to place instances in the group into. Determines the availability zone(s) instances will launch into.

  • min_size (int) – Minimum number of instances to have in the group at all times.

  • max_size (int) – Maximum number of instances to allow in the group at any time.

  • instance_types (Optional[Iterable[str]]) – Use a pool over the given instance types, instead of the type given in the launch template. For on-demand groups, this is a prioritized list. For spot groups, we let AWS balance according to spot_strategy. Must contain at most 20 types.

  • spot_bid (Optional[float]) – If set, the ASG will be a spot market ASG. Bid is in dollars per instance hour. All instance types in the group are bid on equivalently.

  • spot_cheapest (bool) – If true, use the cheapest spot instances available out of instance_types, instead of the spot instances that minimize eviction probability.

  • tags (Optional[Dict[str, str]]) – Tags to apply to the ASG only. Tags for the instances should be added to the launch template instead.

Return type:

None

The default version of the launch template is used.