toil.jobStores.aws.utils

Module Contents

Classes

SDBHelper

A mixin with methods for storing limited amounts of binary data in an SDB item

Functions

fileSizeAndTime(localFilePath)

uploadFromPath(localFilePath, resource, bucketName, fileID)

Uploads a file to S3, using multipart uploading if applicable.

uploadFile(readable, resource, bucketName, fileID[, ...])

Upload a readable object to S3, using multipart uploading if applicable.

copyKeyMultipart(resource, srcBucketName, srcKeyName, ...)

Copies a key from a source key to a destination key in multiple parts.

monkeyPatchSdbConnection(sdb)

Parameters:
  • sdb (SDBConnection)

sdb_unavailable(e)

no_such_sdb_domain(e)

retryable_ssl_error(e)

retryable_sdb_errors(e)

retry_sdb([delays, timeout, predicate])

Attributes

logger

DIAL_SPECIFIC_REGION_CONFIG

toil.jobStores.aws.utils.logger
toil.jobStores.aws.utils.DIAL_SPECIFIC_REGION_CONFIG
class toil.jobStores.aws.utils.SDBHelper

A mixin with methods for storing limited amounts of binary data in an SDB item

>>> import os
>>> H=SDBHelper
>>> H.presenceIndicator() 
u'numChunks'
>>> H.binaryToAttributes(None)['numChunks']
0
>>> H.attributesToBinary({u'numChunks': 0})
(None, 0)
>>> H.binaryToAttributes(b'') 
{u'000': b'VQ==', u'numChunks': 1}
>>> H.attributesToBinary({u'numChunks': 1, u'000': b'VQ=='}) 
(b'', 1)

Good pseudo-random data is very likely smaller than its bzip2ed form, so it is stored uncompressed. Subtracting 1 from maxRawValueSize for the type character, i.e. ‘C’ (compressed) or ‘U’ (uncompressed), with which the string is prefixed, we should get exactly one full chunk:

>>> s = os.urandom(H.maxRawValueSize-1)
>>> d = H.binaryToAttributes(s)
>>> len(d), len(d['000'])
(2, 1024)
>>> H.attributesToBinary(d) == (s, 1)
True

One byte more and we should overflow four bytes into the second chunk, two bytes for base64-encoding the additional character and two bytes for base64-padding to the next quartet.

>>> s += s[0:1]
>>> d = H.binaryToAttributes(s)
>>> len(d), len(d['000']), len(d['001'])
(3, 1024, 4)
>>> H.attributesToBinary(d) == (s, 2)
True
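
The chunk arithmetic above can be sanity-checked directly. The relation maxRawValueSize == maxValueSize * 3 // 4 is an assumption inferred from the doctest, not a quote of the implementation:

>>> import base64
>>> maxValueSize = 1024
>>> maxRawValueSize = maxValueSize * 3 // 4   # assumed relation
>>> # the type prefix plus a (maxRawValueSize - 1)-byte payload fills one chunk:
>>> len(base64.b64encode(b'\x00' * maxRawValueSize))
1024
>>> # one extra raw byte spills into a second chunk: 2 data chars + 2 '=' pads
>>> len(base64.b64encode(b'\x00'))
4
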
maxAttributesPerItem = 256
maxValueSize = 1024
maxRawValueSize
classmethod maxBinarySize(extraReservedChunks=0)
classmethod binaryToAttributes(binary)

Turn a bytestring, or None, into SimpleDB attributes.

classmethod presenceIndicator()

The key that is guaranteed to be present in the return value of binaryToAttributes(). Assuming that binaryToAttributes() is used with SDB’s PutAttributes, the return value of this method could be used to detect the presence/absence of an item in SDB.
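
For illustration, a hedged sketch of such a presence check, assuming boto 2’s SDB API and using made-up domain and item names:

>>> import boto.sdb
>>> from toil.jobStores.aws.utils import SDBHelper
>>> sdb = boto.sdb.connect_to_region('us-west-2')
>>> domain = sdb.get_domain('toil-registry')          # hypothetical domain
>>> attrs = domain.get_attributes('some-item', consistent_read=True)
>>> if SDBHelper.presenceIndicator() in attrs:        # the item exists in SDB
...     binary, numChunks = SDBHelper.attributesToBinary(attrs)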

classmethod attributesToBinary(attributes)
Return type:

(bytes | None, int)

Returns:

the binary data and the number of chunks it was composed from

toil.jobStores.aws.utils.fileSizeAndTime(localFilePath)
toil.jobStores.aws.utils.uploadFromPath(localFilePath, resource, bucketName, fileID, headerArgs=None, partSize=50 << 20)

Uploads a file to S3, using multipart uploading if applicable.

Parameters:
  • localFilePath (str) – Path of the file to upload to S3

  • resource (S3.Resource) – boto3 resource

  • bucketName (str) – name of the bucket to upload to

  • fileID (str) – the key name to store the file under in the bucket

  • headerArgs (dict) – HTTP headers to use when uploading, generally used for encryption purposes

  • partSize (int) – max size of each part in the multipart upload, in bytes

Returns:

version of the newly uploaded file
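
A minimal usage sketch; the bucket name and key are hypothetical, the bucket is assumed to exist, and AWS credentials are assumed to be configured:

>>> import boto3
>>> from toil.jobStores.aws.utils import uploadFromPath
>>> resource = boto3.resource('s3')
>>> version = uploadFromPath('/tmp/reads.bam', resource,
...                          bucketName='my-toil-bucket',
...                          fileID='files/reads.bam')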

toil.jobStores.aws.utils.uploadFile(readable, resource, bucketName, fileID, headerArgs=None, partSize=50 << 20)

Upload a readable object to S3, using multipart uploading if applicable.

Parameters:
  • readable – a readable stream or a file path to upload to S3

  • resource (S3.Resource) – boto3 resource

  • bucketName (str) – name of the bucket to upload to

  • fileID (str) – the key name to store the file under in the bucket

  • headerArgs (Optional[dict]) – HTTP headers to use when uploading, generally used for encryption purposes

  • partSize (int) – max size of each part in the multipart upload, in bytes

Returns:

version of the newly uploaded file
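
Analogously, a sketch of streaming an in-memory object, again with a hypothetical bucket and key:

>>> import io
>>> import boto3
>>> from toil.jobStores.aws.utils import uploadFile
>>> resource = boto3.resource('s3')
>>> version = uploadFile(io.BytesIO(b'payload'), resource,
...                      bucketName='my-toil-bucket',
...                      fileID='files/payload.bin')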

exception toil.jobStores.aws.utils.ServerSideCopyProhibitedError

Bases: RuntimeError

Raised when AWS refuses to perform a server-side copy between S3 keys, and insists that you pay to download and upload the data yourself instead.

toil.jobStores.aws.utils.copyKeyMultipart(resource, srcBucketName, srcKeyName, srcKeyVersion, dstBucketName, dstKeyName, sseAlgorithm=None, sseKey=None, copySourceSseAlgorithm=None, copySourceSseKey=None)

Copies a key from a source key to a destination key in multiple parts. Note that if the destination key exists it will be overwritten implicitly, and if it does not exist a new key will be created. If the destination bucket does not exist an error will be raised.

This function will always attempt a fast, server-side copy, at least until/unless <https://github.com/boto/boto3/issues/3270> is fixed. In some situations a fast, server-side copy is not actually possible. For example, when residing in an AWS VPC with an S3 VPC Endpoint configured, copying from a bucket in another region to a bucket in your own region cannot be performed server-side: the VPC Endpoint S3 API servers refuse to perform server-side copies between regions, the source region’s API servers refuse to initiate the copy and instead refer you to the destination bucket’s region’s API servers, and the VPC routing tables redirect all access to the current region’s S3 API servers to the S3 Endpoint API servers.

If a fast server-side copy is not actually possible, a ServerSideCopyProhibitedError will be raised.

Parameters:
  • resource (mypy_boto3_s3.S3ServiceResource) – boto3 resource

  • srcBucketName (str) – The name of the bucket to be copied from.

  • srcKeyName (str) – The name of the key to be copied from.

  • srcKeyVersion (str) – The version of the key to be copied from.

  • dstBucketName (str) – The name of the destination bucket for the copy.

  • dstKeyName (str) – The name of the destination key that will be created or overwritten.

  • sseAlgorithm (str) – Server-side encryption algorithm for the destination.

  • sseKey (str) – Server-side encryption key for the destination.

  • copySourceSseAlgorithm (str) – Server-side encryption algorithm for the source.

  • copySourceSseKey (str) – Server-side encryption key for the source.

Return type:

Optional[str]

Returns:

The version of the copied file (or None if versioning is not enabled for dstBucket).
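
A hedged sketch of calling this with a client-side fallback. The bucket and key names are made up, passing srcKeyVersion=None for an unversioned source is an assumption, and the fallback shown is illustrative rather than what Toil itself does:

>>> import boto3
>>> from toil.jobStores.aws.utils import (ServerSideCopyProhibitedError,
...                                       copyKeyMultipart)
>>> resource = boto3.resource('s3')
>>> try:
...     version = copyKeyMultipart(resource,
...                                srcBucketName='src-bucket',
...                                srcKeyName='inputs/data.bin',
...                                srcKeyVersion=None,
...                                dstBucketName='dst-bucket',
...                                dstKeyName='inputs/data.bin')
... except ServerSideCopyProhibitedError:
...     # Fall back to a slow, client-side download and re-upload.
...     body = resource.Object('src-bucket', 'inputs/data.bin').get()['Body'].read()
...     response = resource.Object('dst-bucket', 'inputs/data.bin').put(Body=body)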

toil.jobStores.aws.utils.monkeyPatchSdbConnection(sdb)
toil.jobStores.aws.utils.sdb_unavailable(e)
toil.jobStores.aws.utils.no_such_sdb_domain(e)
toil.jobStores.aws.utils.retryable_ssl_error(e)
toil.jobStores.aws.utils.retryable_sdb_errors(e)
toil.jobStores.aws.utils.retry_sdb(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=retryable_sdb_errors)
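
Given its delays/timeout/predicate signature, retry_sdb() appears to follow Toil’s retry-generator pattern, yielding context managers: an exception raised inside the with block is retried with backoff for as long as the predicate (retryable_sdb_errors by default) approves of it. A sketch of that pattern, with an assumed boto 2 SDB connection and a made-up domain name:

>>> import boto.sdb
>>> from toil.jobStores.aws.utils import retry_sdb
>>> sdb = boto.sdb.connect_to_region('us-west-2')
>>> for attempt in retry_sdb():
...     with attempt:
...         domain = sdb.get_domain('toil-registry')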