toil.lib.retry

This file holds the retry() decorator function and the ErrorCondition object.

retry() can be used to decorate any function based on the list of errors one wishes to retry on.

This list of errors can contain normal Exception objects, and/or ErrorCondition objects wrapping Exceptions to include additional conditions.

For example, retrying on one Exception (HTTPError):

from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import retry

@retry(errors=[HTTPError])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')

Or on several Exceptions (HTTPError and ValueError):

from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import retry

@retry(errors=[HTTPError, ValueError])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')

The examples above will retry at the default intervals on any of the errors specified in the “errors=” arg list.

To retry specifically on 500/502/503/504 errors, you could specify an ErrorCondition object instead, for example:

from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_codes=[500, 502, 503, 504]
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')

To retry specifically on errors containing the phrase “NotFound”:

from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_message_must_include="NotFound"
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')

To retry on all HTTPError errors EXCEPT an HTTPError containing the phrase “NotFound”:

from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    HTTPError,
    ErrorCondition(
        error=HTTPError,
        error_message_must_include="NotFound",
        retry_on_this_condition=False
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
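
The way a plain exception type and an excluding ErrorCondition interact can be pictured with the simplified, self-contained sketch below. It is not Toil's implementation: Condition and should_retry are hypothetical stand-ins, only the message check is modelled, and the rule that a matching exclusion overrides every other match is an assumption drawn from the example above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Type


@dataclass
class Condition:
    """Hypothetical stand-in for ErrorCondition (message check only)."""
    error: Type[Exception]
    error_message_must_include: Optional[str] = None
    retry_on_this_condition: bool = True


def should_retry(e: Exception, conditions: Sequence[Condition]) -> bool:
    """Return True if some condition matches and no matching condition vetoes."""
    matched = False
    for c in conditions:
        if not isinstance(e, c.error):
            continue  # wrong exception type for this condition
        phrase = c.error_message_must_include
        if phrase is not None and phrase not in str(e):
            continue  # required phrase is missing from the message
        if not c.retry_on_this_condition:
            return False  # assumption: an explicit exclusion always wins
        matched = True
    return matched


conditions = [
    Condition(error=OSError),
    Condition(error=OSError, error_message_must_include="NotFound",
              retry_on_this_condition=False),
]
retry_busy = should_retry(OSError("Busy"), conditions)
retry_missing = should_retry(OSError("NotFound: x"), conditions)
retry_other = should_retry(ValueError("Busy"), conditions)
```

In this sketch an OSError is retried unless its message contains "NotFound", and a ValueError is never retried because no condition names it.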

To retry on boto3’s specific status errors, an example of the implementation is:

import boto3
from botocore.exceptions import ClientError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
               error=ClientError,
               boto_error_codes=["BucketNotFound"]
           )])
def boto_bucket(bucket_name):
    boto_session = boto3.session.Session()
    s3_resource = boto_session.resource('s3')
    return s3_resource.Bucket(bucket_name)

Any combination of these will also work, provided the codes are matched to the correct exceptions. A ValueError will not return a 404, for example.

The retry function as a decorator should make retrying functions easier and clearer. It also encourages smaller independent functions, as opposed to lumping many different things that may need to be retried on different conditions in the same function.

The ErrorCondition object tries to take some of the heavy lifting out of writing specific retry conditions and boil it down to an API that covers all common use-cases without the user having to write any new bespoke functions.

Use-cases covered currently:

  1. Retrying on a normal error, like a KeyError.

  2. Retrying on HTTP error codes (use ErrorCondition).

  3. Retrying on boto3’s specific status errors, like “BucketNotFound” (use ErrorCondition).

  4. Retrying when an error message contains a certain phrase (use ErrorCondition).

  5. Explicitly NOT retrying on a condition (use ErrorCondition).

If new functionality is needed, it’s currently best practice in Toil to add functionality to the ErrorCondition itself rather than making a new custom retry method.

Attributes

SUPPORTED_HTTP_ERRORS

kubernetes

botocore

logger

RT

DEFAULT_DELAYS

DEFAULT_TIMEOUT

E

retry_flaky_test

Classes

ErrorCondition

A wrapper describing an error condition.

Functions

retry([intervals, infinite_retries, errors, ...])

Retry a function if it fails with any Exception defined in "errors".

return_status_code(e)

get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

get_error_status(e)

Get the HTTP status code from a compatible source.

get_error_body(e)

Get the body from a Boto 2 or 3 error, or compatible types.

meets_error_message_condition(e, error_message)

meets_error_code_condition(e, error_codes)

These are expected to be normal HTTP error codes, like 404 or 500.

meets_boto_error_code_condition(e, boto_error_codes)

These are expected to be AWS's custom error aliases, like 'BucketNotFound' or 'AccessDenied'.

error_meets_conditions(e, error_conditions)

old_retry([delays, timeout, predicate])

Deprecated.

Module Contents

toil.lib.retry.SUPPORTED_HTTP_ERRORS
toil.lib.retry.kubernetes = None
toil.lib.retry.botocore = None
toil.lib.retry.logger
class toil.lib.retry.ErrorCondition(error=None, error_codes=None, boto_error_codes=None, error_message_must_include=None, retry_on_this_condition=True)[source]

A wrapper describing an error condition.

ErrorCondition events may be used to define errors in more detail to determine whether to retry.

Parameters:
  • error (Optional[Any])

  • error_codes (List[int])

  • boto_error_codes (List[str])

  • error_message_must_include (str)

  • retry_on_this_condition (bool)

error
error_codes
boto_error_codes
error_message_must_include
retry_on_this_condition
toil.lib.retry.RT
toil.lib.retry.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)[source]

Retry a function if it fails with any Exception defined in “errors”.

Does so every x seconds, where x is defined by a list of numbers (ints or floats) in “intervals”. Also accepts ErrorCondition events for more detailed retry attempts.

Parameters:
  • intervals (Optional[List]) – A list of wait times in seconds between successive retries; once the list is exhausted, the failure is raised. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s

  • infinite_retries (bool) – If this is True, reset the intervals when they run out. Defaults to: False.

  • errors (Optional[Sequence[Union[ErrorCondition, Type[Exception]]]]) –

    A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:

    - Exception (required)
    - Error codes that must match to be retried (optional; defaults to not checking)
    - A string that must be in the error message to be retried (optional; defaults to not checking)
    - A bool that can be set to False to always error on this condition.

    If not specified, this will default to a generic Exception.

  • log_message (Optional[Tuple[Callable, str]]) – Optional tuple of (“log/print function()”, “message string”) that will precede each attempt.

  • prepare (Optional[List[Callable]]) – Optional list of functions to call, with the function’s arguments, between retries, to reset state.

Returns:

The result of the wrapped function; if it never succeeds, the error is raised.

Return type:

Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
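
The behaviour described by the intervals and errors parameters can be pictured with the following minimal sketch. It is a simplified illustration, not Toil's implementation: simple_retry is a hypothetical name, it handles plain exception types only, and the injectable sleep argument is an assumption added so the example runs without real delays.

```python
import functools
import time
from typing import Callable, Sequence, Type


def simple_retry(intervals: Sequence[float] = (1, 1, 2, 4, 8, 16),
                 errors: Sequence[Type[Exception]] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep):
    """Hypothetical decorator: retry on the listed errors, waiting between tries."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            remaining = list(intervals)  # fresh copy for every call
            while True:
                try:
                    return func(*args, **kwargs)
                except tuple(errors):
                    if not remaining:
                        raise  # intervals exhausted: propagate the last error
                    sleep(remaining.pop(0))
        return wrapper
    return decorate


calls = []


@simple_retry(intervals=[0, 0], errors=[KeyError], sleep=lambda seconds: None)
def flaky():
    """Fail twice with KeyError, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise KeyError("not yet")
    return "ok"


result = flaky()
```

With two intervals the function is attempted at most three times; here the third attempt succeeds, so the caller only sees the return value.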

toil.lib.retry.return_status_code(e)[source]
toil.lib.retry.get_error_code(e)[source]

Get the error code name from a Boto 2 or 3 error, or compatible types.

Returns empty string for other errors.

Parameters:

e (Exception)

Return type:

str
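
As a rough illustration of what reading an error code name involves, the sketch below looks up the code in the response dictionary that botocore's ClientError carries. code_of and FakeClientError are hypothetical names used only for this example; the real get_error_code also handles Boto 2 and other compatible types, which this sketch does not attempt.

```python
def code_of(e: Exception) -> str:
    """Hypothetical helper: return the Boto 3 style error code, or "" if absent."""
    response = getattr(e, "response", None)
    if isinstance(response, dict):
        return str(response.get("Error", {}).get("Code", ""))
    return ""


class FakeClientError(Exception):
    """Stand-in mimicking the `response` layout of botocore's ClientError."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


boto_code = code_of(FakeClientError("AccessDenied", "no access"))
plain_code = code_of(ValueError("oops"))
```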

toil.lib.retry.get_error_message(e)[source]

Get the error message string from a Boto 2 or 3 error, or compatible types.

Note that error message conditions also check more than this; this function does not fall back to the traceback for incompatible types.

Parameters:

e (Exception)

Return type:

str

toil.lib.retry.get_error_status(e)[source]

Get the HTTP status code from a compatible source.

Compatible sources include a Boto 2 or 3 error, kubernetes.client.rest.ApiException, http.client.HTTPException, urllib3.exceptions.HTTPError, requests.exceptions.HTTPError, urllib.error.HTTPError, or a compatible type.

Returns 0 from other errors.

Parameters:

e (Exception)

Return type:

int
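
A best-effort status lookup of this kind might look like the sketch below. It is an assumption-laden illustration rather than Toil's code: status_of is a hypothetical name, and only two of the listed sources are covered (urllib.error.HTTPError through its code attribute, and requests-style errors through response.status_code).

```python
import io
import urllib.error


def status_of(e: Exception) -> int:
    """Hypothetical helper: best-effort HTTP status lookup, 0 when unknown."""
    # urllib.error.HTTPError stores the status in its `code` attribute.
    if isinstance(e, urllib.error.HTTPError):
        return int(e.code)
    # requests.exceptions.HTTPError carries a `response` with `status_code`.
    response = getattr(e, "response", None)
    status = getattr(response, "status_code", None)
    if isinstance(status, int):
        return status
    return 0


http_error = urllib.error.HTTPError(
    "https://example.invalid", 503, "Service Unavailable", None, io.BytesIO(b""))
status_503 = status_of(http_error)
status_unknown = status_of(ValueError("not an HTTP error"))
```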

toil.lib.retry.get_error_body(e)[source]

Get the body from a Boto 2 or 3 error, or compatible types.

Returns the code and message if the error does not have a body.

Parameters:

e (Exception)

Return type:

str

toil.lib.retry.meets_error_message_condition(e, error_message)[source]
toil.lib.retry.meets_error_code_condition(e, error_codes)[source]

These are expected to be normal HTTP error codes, like 404 or 500.

toil.lib.retry.meets_boto_error_code_condition(e, boto_error_codes)[source]

These are expected to be AWS’s custom error aliases, like ‘BucketNotFound’ or ‘AccessDenied’.

toil.lib.retry.error_meets_conditions(e, error_conditions)[source]
toil.lib.retry.DEFAULT_DELAYS = (0, 1, 1, 4, 16, 64)
toil.lib.retry.DEFAULT_TIMEOUT = 300
toil.lib.retry.E
toil.lib.retry.old_retry(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=lambda e: ...)[source]

Deprecated.

Retry an operation while the failure matches a given predicate and until a given timeout expires, waiting a given amount of time in between attempts. This function is a generator that yields contextmanagers. See doctests below for example usage.

Parameters:
  • delays (Iterable[float]) – an iterable yielding the time in seconds to wait before each retried attempt; the last element of the iterable will be repeated.

  • timeout (float) – an overall timeout that should not be exceeded for all attempts together. This is a best-effort mechanism only and it won’t abort an ongoing attempt, even if the timeout expires during that attempt.

  • predicate (Callable[[Exception],bool]) – a unary callable returning True if another attempt should be made to recover from the given exception. The default value for this parameter will prevent any retries!

Returns:

a generator yielding context managers, one per attempt

Return type:

Iterator

Retry for a limited amount of time:

>>> true = lambda _:True
>>> false = lambda _:False
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
...         raise RuntimeError('foo')
Traceback (most recent call last):
...
RuntimeError: foo
>>> i > 1
True

If timeout is 0, do exactly one attempt:

>>> i = 0
>>> for attempt in old_retry( timeout=0 ):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1

Don’t retry on success:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
>>> i
1

Don’t retry unless the predicate returns True:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=false):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1
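
The "generator that yields context managers" pattern used by old_retry can be sketched as follows. This is a simplified illustration, not the Toil implementation: attempts is a hypothetical name, and it is bounded by a maximum attempt count rather than by delays and a timeout, so that the example is deterministic.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


def attempts(max_attempts: int,
             predicate: Callable[[Exception], bool]) -> Iterator:
    """Hypothetical sketch: yield one context manager per attempt."""
    done = False

    @contextmanager
    def attempt(is_last: bool):
        nonlocal done
        try:
            yield
        except Exception as e:
            if is_last or not predicate(e):
                raise  # out of attempts, or not a retryable error
            # otherwise swallow the error so the for loop tries again
        else:
            done = True  # the body succeeded: stop handing out attempts

    for n in range(max_attempts):
        if done:
            return
        yield attempt(is_last=(n == max_attempts - 1))


count = 0
for a in attempts(max_attempts=5,
                  predicate=lambda e: isinstance(e, RuntimeError)):
    with a:
        count += 1
        if count < 3:
            raise RuntimeError("transient")
```

The body fails twice and succeeds on the third pass, after which the generator stops, so the loop does not use the remaining two attempts.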
toil.lib.retry.retry_flaky_test