toil.jobStores.aws.utils

Attributes

AWSServerErrors

DEFAULT_DELAYS

DEFAULT_TIMEOUT

logger

DIAL_SPECIFIC_REGION_CONFIG

Exceptions

ServerSideCopyProhibitedError

Raised when AWS refuses to perform a server-side copy between S3 keys, and insists that you pay to download and upload the data yourself instead.

Classes

SDBHelper

A mixin with methods for storing limited amounts of binary data in an SDB item

Functions

connection_error(e)

Return True if an error represents a failure to make a network connection.

get_bucket_region(bucket_name[, endpoint_url, ...])

Get the AWS region name associated with the given S3 bucket.

compat_bytes(s)

get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

get_error_status(e)

Get the HTTP status code from a compatible source.

old_retry([delays, timeout, predicate])

Deprecated.

retry([intervals, infinite_retries, errors, ...])

Retry a function if it fails with any Exception defined in "errors".

fileSizeAndTime(localFilePath)

uploadFromPath(localFilePath, resource, bucketName, fileID)

Uploads a file to s3, using multipart uploading if applicable

uploadFile(readable, resource, bucketName, fileID[, ...])

Upload a readable object to s3, using multipart uploading if applicable.

copyKeyMultipart(resource, srcBucketName, srcKeyName, ...)

Copies a key from a source key to a destination key in multiple parts.

monkeyPatchSdbConnection(sdb)

Parameters:

sdb (SDBConnection)

sdb_unavailable(e)

no_such_sdb_domain(e)

retryable_ssl_error(e)

retryable_sdb_errors(e)

retry_sdb([delays, timeout, predicate])

Module Contents

toil.jobStores.aws.utils.AWSServerErrors
toil.jobStores.aws.utils.connection_error(e)

Return True if an error represents a failure to make a network connection.

Parameters:

e (Exception)

Return type:

bool

toil.jobStores.aws.utils.get_bucket_region(bucket_name, endpoint_url=None, only_strategies=None)

Get the AWS region name associated with the given S3 bucket.

Takes an optional S3 API URL override.

Parameters:
  • only_strategies (Optional[Set[int]]) – For testing, use only strategies with 1-based numbers in this set.

  • bucket_name (str)

  • endpoint_url (Optional[str])

Return type:

str

toil.jobStores.aws.utils.compat_bytes(s)
Parameters:

s (Union[bytes, str])

Return type:

str

toil.jobStores.aws.utils.DEFAULT_DELAYS = (0, 1, 1, 4, 16, 64)
toil.jobStores.aws.utils.DEFAULT_TIMEOUT = 300
toil.jobStores.aws.utils.get_error_code(e)

Get the error code name from a Boto 2 or 3 error, or compatible types.

Returns empty string for other errors.

Parameters:

e (Exception)

Return type:

str
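The duck typing that such a lookup implies can be sketched as follows. This is a minimal illustration, not Toil's implementation: `error_code_sketch` and `FakeClientError` are hypothetical names, and the sketch assumes that Boto 2 style errors expose an `error_code` attribute while Boto 3 style errors carry a `response` dict with an `Error` / `Code` entry.

```python
def error_code_sketch(e: Exception) -> str:
    """Return an error code name if one can be found, else the empty string."""
    # Assumption: Boto 2 style errors expose the code as an attribute.
    code = getattr(e, "error_code", None)
    if isinstance(code, str):
        return code
    # Assumption: Boto 3 (botocore) style errors carry a response dict.
    response = getattr(e, "response", None)
    if isinstance(response, dict):
        code = response.get("Error", {}).get("Code")
        if isinstance(code, str):
            return code
    # Anything else is treated as "no code available".
    return ""


class FakeClientError(Exception):
    """Stand-in shaped like a Boto 3 error, for demonstration only."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}
```

Calling `error_code_sketch(FakeClientError("NoSuchBucket"))` gives `"NoSuchBucket"`, while an unrelated exception such as `ValueError` gives the empty string.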

toil.jobStores.aws.utils.get_error_message(e)

Get the error message string from a Boto 2 or 3 error, or compatible types.

Note that error message conditions also check more than this; this function does not fall back to the traceback for incompatible types.

Parameters:

e (Exception)

Return type:

str

toil.jobStores.aws.utils.get_error_status(e)

Get the HTTP status code from a compatible source.

Compatible sources include a Boto 2 or 3 error, kubernetes.client.rest.ApiException, http.client.HTTPException, urllib3.exceptions.HTTPError, requests.exceptions.HTTPError, urllib.error.HTTPError, or a compatible type.

Returns 0 from other errors.

Parameters:

e (Exception)

Return type:

int
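One way to read a status code from such a mixed set of error types is to probe a few well-known attributes in turn. The sketch below is an assumption-laden illustration rather than Toil's code: `error_status_sketch` and `FakeHTTPError` are hypothetical names, and the attribute layout it probes (a `response` dict with `ResponseMetadata` / `HTTPStatusCode`, a `response.status_code`, or a plain `status` or `code` attribute) is assumed from how those libraries commonly shape their errors.

```python
def error_status_sketch(e: Exception) -> int:
    """Return an HTTP status code if one can be found, else 0."""
    response = getattr(e, "response", None)
    if isinstance(response, dict):
        # Assumption: botocore-style errors keep the status in response metadata.
        status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        if isinstance(status, int):
            return status
    elif response is not None:
        # Assumption: requests-style errors wrap a response object.
        status = getattr(response, "status_code", None)
        if isinstance(status, int):
            return status
    # Assumption: other libraries put the status directly on the exception.
    for attr in ("status", "code"):
        value = getattr(e, attr, None)
        if isinstance(value, int):
            return value
    return 0


class FakeHTTPError(Exception):
    """Stand-in with a urllib-style `code` attribute, for demonstration only."""

    def __init__(self, code: int) -> None:
        super().__init__(str(code))
        self.code = code
```

With these definitions, `error_status_sketch(FakeHTTPError(404))` gives `404` and `error_status_sketch(ValueError("x"))` gives `0`.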

toil.jobStores.aws.utils.old_retry(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=lambda e: ...)

Deprecated.

Retry an operation while the failure matches a given predicate and until a given timeout expires, waiting a given amount of time in between attempts. This function is a generator that yields context managers. See doctests below for example usage.

Parameters:
  • delays (Iterable[float]) – an iterable yielding the time in seconds to wait before each retried attempt; the last element of the iterable will be repeated.

  • timeout (float) – an overall timeout that should not be exceeded for all attempts together. This is a best-effort mechanism only and it won’t abort an ongoing attempt, even if the timeout expires during that attempt.

  • predicate (Callable[[Exception],bool]) – a unary callable returning True if another attempt should be made to recover from the given exception. The default value for this parameter will prevent any retries!

Returns:

a generator yielding context managers, one per attempt

Return type:

Iterator

Retry for a limited amount of time:

>>> true = lambda _:True
>>> false = lambda _:False
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
...         raise RuntimeError('foo')
Traceback (most recent call last):
...
RuntimeError: foo
>>> i > 1
True

If timeout is 0, do exactly one attempt:

>>> i = 0
>>> for attempt in old_retry( timeout=0 ):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1

Don’t retry on success:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
>>> i
1

Don’t retry unless predicate returns True:

>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=false):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1
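The "generator that yields context managers" pattern used in the doctests above can be hard to picture, so here is a minimal sketch of how such a generator might be built. It is not Toil's implementation: `old_retry_sketch` is a hypothetical name, it assumes `delays` is non-empty, and it only mirrors the documented behaviour (repeat the last delay, best-effort timeout, default predicate that never retries).

```python
import time
from contextlib import contextmanager


def old_retry_sketch(delays=(0, 1, 1, 4, 16, 64), timeout=300,
                     predicate=lambda e: False):
    """Yield one context manager per attempt until an attempt succeeds."""
    deadline = time.monotonic() + timeout
    delays = iter(delays)
    delay = next(delays)  # assumption: at least one delay is given
    done = False

    @contextmanager
    def attempt(wait):
        nonlocal done
        try:
            yield
        except Exception as e:
            # Swallow the error only if it is retryable and time remains.
            if predicate(e) and time.monotonic() + wait < deadline:
                time.sleep(wait)
            else:
                raise
        else:
            done = True  # success: stop handing out attempts

    while not done:
        yield attempt(delay)
        delay = next(delays, delay)  # repeat the last delay when exhausted


# Usage: fail twice, then succeed on the third attempt.
calls = 0
for a in old_retry_sketch(delays=[0], timeout=5, predicate=lambda e: True):
    with a:
        calls += 1
        if calls < 3:
            raise RuntimeError("foo")
```

Leaving the `with` block normally sets the flag that ends the loop, which is why a successful attempt is never repeated. With `timeout=0` the deadline check fails immediately, so the first exception propagates after exactly one attempt.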
toil.jobStores.aws.utils.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)

Retry a function if it fails with any Exception defined in “errors”.

Does so every x seconds, where x is defined by a list of numbers (ints or floats) in “intervals”. Also accepts ErrorCondition events for more detailed retry attempts.

Parameters:
  • intervals (Optional[List]) – A list of wait times in seconds between retries; once the list is exhausted, the failure is returned. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s

  • infinite_retries (bool) – If this is True, reset the intervals when they run out. Defaults to: False.

  • errors (Optional[Sequence[Union[ErrorCondition, Type[Exception]]]]) –

    A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:

    - Exception (required)
    - Error codes that must match to be retried (optional; defaults to not checking)
    - A string that must be in the error message to be retried (optional; defaults to not checking)
    - A bool that can be set to False to always error on this condition.

    If not specified, this will default to a generic Exception.

  • log_message (Optional[Tuple[Callable, str]]) – Optional tuple of (“log/print function()”, “message string”) that will precede each attempt.

  • prepare (Optional[List[Callable]]) – Optional list of functions to call, with the function’s arguments, between retries, to reset state.

Returns:

The result of the wrapped function; raises if all retries fail.

Return type:

Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
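The return type above is that of a decorator factory. As a rough illustration of that shape, and not of Toil's actual code, the sketch below retries on plain exception types only. `retry_sketch` and `flaky` are hypothetical names, ErrorCondition handling, `log_message`, `prepare`, and `infinite_retries` are omitted, and making one final attempt after the last interval is an assumption.

```python
import functools
import time


def retry_sketch(intervals=None, errors=None):
    """Build a decorator that retries the wrapped function on given errors."""
    # Default back-off taken from the documented defaults: 1, 1, 2, 4, 8, 16.
    waits = list(intervals) if intervals is not None else [1, 1, 2, 4, 8, 16]
    caught = tuple(errors) if errors else (Exception,)

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for wait in waits:
                try:
                    return func(*args, **kwargs)
                except caught:
                    time.sleep(wait)  # wait, then fall through to retry
            # Assumption: one last attempt whose failure is allowed to propagate.
            return func(*args, **kwargs)
        return wrapper
    return decorate


attempts = 0


@retry_sketch(intervals=[0, 0], errors=[ConnectionError])
def flaky():
    """Fail twice with a retryable error, then succeed."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient")
    return "ok"


result = flaky()
```

Here `flaky()` fails on its first two calls and succeeds on the third, so `result` is `"ok"` after three attempts.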

toil.jobStores.aws.utils.logger
toil.jobStores.aws.utils.DIAL_SPECIFIC_REGION_CONFIG
class toil.jobStores.aws.utils.SDBHelper

A mixin with methods for storing limited amounts of binary data in an SDB item

>>> import os
>>> H=SDBHelper
>>> H.presenceIndicator() 
u'numChunks'
>>> H.binaryToAttributes(None)['numChunks']
0
>>> H.attributesToBinary({u'numChunks': 0})
(None, 0)
>>> H.binaryToAttributes(b'') 
{u'000': b'VQ==', u'numChunks': 1}
>>> H.attributesToBinary({u'numChunks': 1, u'000': b'VQ=='}) 
(b'', 1)

Good pseudo-random data is very likely smaller than its bzip2ed form. Subtract 1 for the type character, i.e. ‘C’ or ‘U’, with which the string is prefixed. We should get one full chunk:

>>> s = os.urandom(H.maxRawValueSize-1)
>>> d = H.binaryToAttributes(s)
>>> len(d), len(d['000'])
(2, 1024)
>>> H.attributesToBinary(d) == (s, 1)
True

One byte more and we should overflow four bytes into the second chunk, two bytes for base64-encoding the additional character and two bytes for base64-padding to the next quartet.

>>> s += s[0:1]
>>> d = H.binaryToAttributes(s)
>>> len(d), len(d['000']), len(d['001'])
(3, 1024, 4)
>>> H.attributesToBinary(d) == (s, 2)
True
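The chunk sizes in the doctests above follow from base64 arithmetic: every 3 raw bytes become 4 encoded characters, so a 1024-character value holds 768 raw bytes. The sketch below reproduces the numbers under stated assumptions; it is not the class's implementation. `encode_chunks` is a hypothetical helper, and it assumes the payload is prefixed with a one-byte type character (here always ‘U’), base64-encoded, and then cut into pieces of at most `maxValueSize` characters.

```python
import base64
import os
from typing import List

MAX_VALUE_SIZE = 1024                         # mirrors maxValueSize above
MAX_RAW_VALUE_SIZE = MAX_VALUE_SIZE * 3 // 4  # 768 raw bytes per full chunk


def encode_chunks(payload: bytes, prefix: bytes = b"U") -> List[bytes]:
    """Prefix, base64-encode, and split into pieces of MAX_VALUE_SIZE chars."""
    encoded = base64.b64encode(prefix + payload)
    return [encoded[i:i + MAX_VALUE_SIZE]
            for i in range(0, len(encoded), MAX_VALUE_SIZE)]


# The empty payload is just the prefix: base64 of b"U" is b"VQ==".
empty = encode_chunks(b"")

# One byte short of the raw limit, plus the prefix, fills exactly one chunk.
full = [len(c) for c in encode_chunks(os.urandom(MAX_RAW_VALUE_SIZE - 1))]

# One more byte spills 4 characters (2 data + 2 padding) into a second chunk.
spill = [len(c) for c in encode_chunks(os.urandom(MAX_RAW_VALUE_SIZE))]
```

Under these assumptions `empty` is `[b"VQ=="]`, `full` is `[1024]`, and `spill` is `[1024, 4]`, matching the attribute lengths shown in the doctests.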
maxAttributesPerItem = 256
maxValueSize = 1024
maxRawValueSize
classmethod maxBinarySize(extraReservedChunks=0)
classmethod binaryToAttributes(binary)

Turn a bytestring, or None, into SimpleDB attributes.

Return type:

Dict[str, str]

classmethod attributeDictToList(attributes)

Convert the attribute dict (e.g. from binaryToAttributes) into a list of attribute typed dicts, to be compatible with boto3 argument syntax.

Parameters:

attributes (Dict[str, str]) – attributes in dict form

Return type:

List[mypy_boto3_sdb.type_defs.AttributeTypeDef]

Returns:

list of attributes in typed dict form

classmethod attributeListToDict(attributes)

Convert the boto3 representation of attributes, a list of attribute typed dicts, back to a dictionary of name, value pairs.

Parameters:

attributes (List[mypy_boto3_sdb.type_defs.AttributeTypeDef]) – attributes in typed dict form

Return type:

Dict[str, str]

Returns:

attributes in dict form
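The two representations handled by attributeDictToList and attributeListToDict can be illustrated with plain dictionaries. This is a sketch under the assumption that each typed dict carries `Name` and `Value` keys, as in boto3's SimpleDB attribute shape; `dict_to_list` and `list_to_dict` are hypothetical stand-ins, not the classmethods themselves.

```python
from typing import Dict, List


def dict_to_list(attributes: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {'name': 'value'} pairs into [{'Name': ..., 'Value': ...}] items."""
    return [{"Name": name, "Value": value} for name, value in attributes.items()]


def list_to_dict(attributes: List[Dict[str, str]]) -> Dict[str, str]:
    """Inverse of dict_to_list: collapse Name/Value items back into a dict."""
    return {item["Name"]: item["Value"] for item in attributes}


as_dict = {"000": "VQ==", "numChunks": "1"}
as_list = dict_to_list(as_dict)
round_trip = list_to_dict(as_list)
```

Converting to the list form and back returns the original dictionary, so `round_trip == as_dict` holds.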

classmethod get_attributes_from_item(item, keys)
Parameters:
  • item (mypy_boto3_sdb.type_defs.ItemTypeDef)

  • keys (List[str])

Return type:

List[Optional[str]]

classmethod presenceIndicator()

The key that is guaranteed to be present in the return value of binaryToAttributes(). Assuming that binaryToAttributes() is used with SDB’s PutAttributes, the return value of this method could be used to detect the presence/absence of an item in SDB.

classmethod attributesToBinary(attributes)
Return type:

(str|None,int)

Returns:

the binary data and the number of chunks it was composed from

Parameters:

attributes (List[mypy_boto3_sdb.type_defs.AttributeTypeDef])

toil.jobStores.aws.utils.fileSizeAndTime(localFilePath)
toil.jobStores.aws.utils.uploadFromPath(localFilePath, resource, bucketName, fileID, headerArgs=None, partSize=50 << 20)

Uploads a file to s3, using multipart uploading if applicable

Parameters:
  • localFilePath (str) – Path of the file to upload to s3

  • resource (S3.Resource) – boto3 resource

  • bucketName (str) – name of the bucket to upload to

  • fileID (str) – the name of the file to upload to

  • headerArgs (dict) – http headers to use when uploading - generally used for encryption purposes

  • partSize (int) – max size of each part in the multipart upload, in bytes

Returns:

version of the newly uploaded file
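The default `partSize` of `50 << 20` is 52428800 bytes, i.e. 50 MiB. As a small worked example of what that means for an upload, the sketch below computes how many parts a file of a given size would need if each part holds at most `partSize` bytes. `part_count` is a hypothetical helper, and the sketch makes no claim about the size at which a multipart upload is actually chosen.

```python
DEFAULT_PART_SIZE = 50 << 20  # 52428800 bytes, i.e. 50 MiB


def part_count(file_size: int, part_size: int = DEFAULT_PART_SIZE) -> int:
    """Number of parts needed when each part holds at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Ceiling division; an empty file still counts as a single (empty) part.
    return max(1, -(-file_size // part_size))


one_gib_parts = part_count(1 << 30)
```

A 1 GiB file is 20.48 times the default part size, so `one_gib_parts` is 21.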

toil.jobStores.aws.utils.uploadFile(readable, resource, bucketName, fileID, headerArgs=None, partSize=50 << 20)

Upload a readable object to s3, using multipart uploading if applicable.

Parameters:
  • readable – a readable stream or a file path to upload to s3

  • resource (S3.Resource) – boto3 resource

  • bucketName (str) – name of the bucket to upload to

  • fileID (str) – the name of the file to upload to

  • headerArgs (Optional[dict]) – http headers to use when uploading - generally used for encryption purposes

  • partSize (int) – max size of each part in the multipart upload, in bytes

Returns:

version of the newly uploaded file

exception toil.jobStores.aws.utils.ServerSideCopyProhibitedError

Bases: RuntimeError

Raised when AWS refuses to perform a server-side copy between S3 keys, and insists that you pay to download and upload the data yourself instead.

toil.jobStores.aws.utils.copyKeyMultipart(resource, srcBucketName, srcKeyName, srcKeyVersion, dstBucketName, dstKeyName, sseAlgorithm=None, sseKey=None, copySourceSseAlgorithm=None, copySourceSseKey=None)

Copies a key from a source key to a destination key in multiple parts. Note that if the destination key exists it will be overwritten implicitly, and if it does not exist a new key will be created. If the destination bucket does not exist an error will be raised.

This function will always do a fast, server-side copy, at least until/unless <https://github.com/boto/boto3/issues/3270> is fixed. In some situations, a fast, server-side copy is not actually possible. For example, when residing in an AWS VPC with an S3 VPC Endpoint configured, copying from a bucket in another region to a bucket in your own region cannot be performed server-side. This is because the VPC Endpoint S3 API servers refuse to perform server-side copies between regions, the source region’s API servers refuse to initiate the copy and refer you to the destination bucket’s region’s API servers, and the VPC routing tables are configured to redirect all access to the current region’s S3 API servers to the S3 Endpoint API servers instead.

If a fast server-side copy is not actually possible, a ServerSideCopyProhibitedError will be raised.

Parameters:
  • resource (mypy_boto3_s3.S3ServiceResource) – boto3 resource

  • srcBucketName (str) – The name of the bucket to be copied from.

  • srcKeyName (str) – The name of the key to be copied from.

  • srcKeyVersion (str) – The version of the key to be copied from.

  • dstBucketName (str) – The name of the destination bucket for the copy.

  • dstKeyName (str) – The name of the destination key that will be created or overwritten.

  • sseAlgorithm (str) – Server-side encryption algorithm for the destination.

  • sseKey (str) – Server-side encryption key for the destination.

  • copySourceSseAlgorithm (str) – Server-side encryption algorithm for the source.

  • copySourceSseKey (str) – Server-side encryption key for the source.

Return type:

str

Returns:

The version of the copied file (or None if versioning is not enabled for dstBucket).
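Because a prohibited server-side copy surfaces as an exception, a caller can choose to fall back to a slower path. The sketch below shows only that control flow and is not taken from Toil: the exception class is a local stand-in for the one documented above, and `copy_with_fallback` and both callables are hypothetical.

```python
from typing import Callable, Optional


class ServerSideCopyProhibitedError(RuntimeError):
    """Local stand-in for the documented exception, for demonstration only."""


def copy_with_fallback(
    server_side_copy: Callable[[], Optional[str]],
    download_and_upload: Callable[[], Optional[str]],
) -> Optional[str]:
    """Try the fast copy first; fall back only if it is prohibited."""
    try:
        return server_side_copy()
    except ServerSideCopyProhibitedError:
        # The data must be moved through the caller instead.
        return download_and_upload()


def prohibited() -> Optional[str]:
    """Simulate AWS refusing the server-side copy."""
    raise ServerSideCopyProhibitedError("cross-region copy refused")


version = copy_with_fallback(prohibited, lambda: "fallback-version")
```

Other exceptions are deliberately not caught, so unrelated failures such as a missing destination bucket still reach the caller.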

toil.jobStores.aws.utils.monkeyPatchSdbConnection(sdb)
toil.jobStores.aws.utils.sdb_unavailable(e)
toil.jobStores.aws.utils.no_such_sdb_domain(e)
toil.jobStores.aws.utils.retryable_ssl_error(e)
toil.jobStores.aws.utils.retryable_sdb_errors(e)
toil.jobStores.aws.utils.retry_sdb(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=retryable_sdb_errors)