toil.lib.aws.s3

Attributes

logger

AWS_MAX_MULTIPART_COUNT

AWS_MAX_CHUNK_SIZE

AWS_MIN_CHUNK_SIZE

DEFAULT_AWS_CHUNK_SIZE

Exceptions

AWSKeyNotFoundError

Common base class for all non-exit exceptions.

AWSKeyAlreadyExistsError

Common base class for all non-exit exceptions.

AWSBadEncryptionKeyError

Common base class for all non-exit exceptions.

Classes

MultiPartPipe

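An object-oriented wrapper for os.pipe. Clients should subclass it, implement readFrom() to consume the readable end of the pipe, then instantiate the class as a context manager to get the writable end.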

Functions

create_s3_bucket(s3_resource, bucket_name, region[, ...])

Create an AWS S3 bucket, using the given Boto3 S3 session, with the given name, in the given region.

delete_s3_bucket(s3_resource, bucket_name[, quiet])

Delete the bucket with 'bucket_name'.

bucket_exists(s3_resource, bucket)

head_s3_object(bucket, key, header[, region])

Attempt to HEAD an s3 object and return its response.

list_multipart_uploads(bucket, region, prefix[, ...])

copy_s3_to_s3(s3_resource, src_bucket, src_key, ...[, ...])

copy_local_to_s3(s3_resource, local_file_path, ...[, ...])

copy_s3_to_local(s3_resource, local_file_path, ...[, ...])

parse_s3_uri(uri)

list_s3_items(s3_resource, bucket, prefix[, startafter])

upload_to_s3(readable, s3_resource, bucket, key[, ...])

Upload a readable object to s3, using multipart uploading if applicable.

download_stream(s3_resource, bucket, key[, ...])

Context manager that gives out a download stream to download data.

download_fileobject(s3_resource, bucket, key, fileobj)

s3_key_exists(s3_resource, bucket, key[, check, ...])

Return True if the S3 object exists, and False if not. Raises an error if the encryption arguments are incorrect.

get_s3_object(s3_resource, bucket, key[, extra_args])

put_s3_object(s3_resource, bucket, key, body[, extra_args])

generate_presigned_url(s3_resource, bucket, key_name, ...)

create_public_url(s3_resource, bucket, key)

get_s3_bucket_region(s3_resource, bucket)

Module Contents

toil.lib.aws.s3.logger
toil.lib.aws.s3.AWS_MAX_MULTIPART_COUNT = 10000
toil.lib.aws.s3.AWS_MAX_CHUNK_SIZE = 5000000000000
toil.lib.aws.s3.AWS_MIN_CHUNK_SIZE = 5000000
toil.lib.aws.s3.DEFAULT_AWS_CHUNK_SIZE = 134217728
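
The constants above suggest how a multipart upload might be sized: S3 accepts at most 10000 parts per upload, so very large objects need parts bigger than the default. The following is only a rough sketch of that arithmetic, not Toil's actual logic. The helper names choose_part_size and part_count are hypothetical, and the constant values are copied locally so the snippet runs without Toil installed.

```python
# Values copied from the module listing so the sketch runs without Toil.
AWS_MAX_MULTIPART_COUNT = 10000
AWS_MIN_CHUNK_SIZE = 5000000
DEFAULT_AWS_CHUNK_SIZE = 134217728


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def choose_part_size(total_bytes: int, preferred: int = DEFAULT_AWS_CHUNK_SIZE) -> int:
    """Hypothetical helper: pick a part size that keeps the part count within the limit."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    # Smallest part size that fits the whole object into the allowed number of parts.
    needed = ceil_div(total_bytes, AWS_MAX_MULTIPART_COUNT)
    return max(preferred, needed, AWS_MIN_CHUNK_SIZE)


def part_count(total_bytes: int, part_size: int) -> int:
    """Number of parts needed to cover total_bytes (at least one)."""
    return max(1, ceil_div(total_bytes, part_size))
```

For small objects the preferred (default) size wins; only when the object would need more than 10000 default-sized parts does the computed size grow.
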
exception toil.lib.aws.s3.AWSKeyNotFoundError

Bases: Exception

Common base class for all non-exit exceptions.

exception toil.lib.aws.s3.AWSKeyAlreadyExistsError

Bases: Exception

Common base class for all non-exit exceptions.

exception toil.lib.aws.s3.AWSBadEncryptionKeyError

Bases: Exception

Common base class for all non-exit exceptions.

toil.lib.aws.s3.create_s3_bucket(s3_resource, bucket_name, region, tags=None, public=False, encryptable=False)

Create an AWS S3 bucket, using the given Boto3 S3 session, with the given name, in the given region.

Supports the us-east-1 region, where bucket creation is special.

ALL S3 bucket creation should use this function.

Parameters:
  • public (bool) – If True, objects in the bucket can be made publicly accessible. If False, they cannot. Historically, the default was to allow public access, but in 2023 Amazon made blocking public access to all objects the default. This function uses the new default value of False.

  • encryptable (bool) – If True, allow server-side encryption with customer-provided keys (SSE-C) in the bucket. Historically, all buckets supported this, but in November 2025 and April 2026, Amazon made disallowing it the default. This function uses the new default value of False.

  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket_name (str)

  • region (mypy_boto3_s3.literals.BucketLocationConstraintType | Literal['us-east-1'])

  • tags (dict[str, str] | None)

Return type:

mypy_boto3_s3.service_resource.Bucket
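
The us-east-1 special case mentioned above comes from the S3 API itself: a bucket in us-east-1 is created without a location constraint, while any other region has to be named explicitly. A minimal sketch of that branching is shown here. The helper create_bucket_kwargs is hypothetical and is not part of toil.lib.aws.s3; it only builds the keyword arguments and makes no network calls.

```python
from typing import Any


def create_bucket_kwargs(bucket_name: str, region: str) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for a boto3 create_bucket call."""
    kwargs: dict[str, Any] = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Every region other than us-east-1 is passed as a location constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With a boto3 S3 resource in hand, the result could be passed along as s3_resource.create_bucket(**kwargs). Within Toil, create_s3_bucket remains the function to call.
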

toil.lib.aws.s3.delete_s3_bucket(s3_resource, bucket_name, quiet=True)

Delete the bucket with ‘bucket_name’.

Note: ‘quiet’ is False when used by the cleanup utility script (contrib/admin/cleanup_aws_resources.py), which prints progress rather than logging. Logging should be used for all other internal Toil usage.

Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket_name (str)

  • quiet (bool)

Return type:

None

toil.lib.aws.s3.bucket_exists(s3_resource, bucket)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

Return type:

bool | mypy_boto3_s3.service_resource.Bucket

toil.lib.aws.s3.head_s3_object(bucket, key, header, region=None)

Attempt to HEAD an s3 object and return its response.

Parameters:
Return type:

mypy_boto3_s3.type_defs.HeadObjectOutputTypeDef

toil.lib.aws.s3.list_multipart_uploads(bucket, region, prefix, max_uploads=1)
Parameters:
Return type:

mypy_boto3_s3.type_defs.ListMultipartUploadsOutputTypeDef

toil.lib.aws.s3.copy_s3_to_s3(s3_resource, src_bucket, src_key, dst_bucket, dst_key, extra_args=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • src_bucket (str)

  • src_key (str)

  • dst_bucket (str)

  • dst_key (str)

  • extra_args (dict[Any, Any] | None)

Return type:

None

toil.lib.aws.s3.copy_local_to_s3(s3_resource, local_file_path, dst_bucket, dst_key, extra_args=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • local_file_path (str)

  • dst_bucket (str)

  • dst_key (str)

  • extra_args (dict[Any, Any] | None)

Return type:

None

toil.lib.aws.s3.copy_s3_to_local(s3_resource, local_file_path, src_bucket, src_key, extra_args=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • local_file_path (str)

  • src_bucket (str)

  • src_key (str)

  • extra_args (dict[Any, Any] | None)

Return type:

None

class toil.lib.aws.s3.MultiPartPipe(part_size, s3_resource, bucket_name, file_id, encryption_args, encoding=None, errors=None)

Bases: toil.lib.pipes.WritablePipe

An object-oriented wrapper for os.pipe. Clients should subclass it, implement readFrom() to consume the readable end of the pipe, then instantiate the class as a context manager to get the writable end. See the example below.

>>> import os, sys, shutil, codecs
>>> from toil.lib.pipes import WritablePipe
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         shutil.copyfileobj(codecs.getreader('utf-8')(readable), sys.stdout)
>>> with MyPipe() as writable:
...     _ = writable.write('Hello, world!\n'.encode('utf-8'))
Hello, world!

Each instance of this class creates a thread and invokes the readFrom method in that thread. The thread will be join()ed upon exit from the context manager, i.e. the body of the with statement.

Now, exceptions in the reader thread will be reraised in the main thread:

>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         raise RuntimeError('Hello, world!')
>>> with MyPipe() as writable:
...     pass
Traceback (most recent call last):
...
RuntimeError: Hello, world!

More complicated, less illustrative tests:

Same as above, but proving that handles are closed:

>>> x = os.dup(0); os.close(x)
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         raise RuntimeError('Hello, world!')
>>> with MyPipe() as writable:
...     pass
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> y = os.dup(0); os.close(y); x == y
True

Exceptions in the body of the with statement aren’t masked, and handles are closed:

>>> x = os.dup(0); os.close(x)
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         pass
>>> with MyPipe() as writable:
...     raise RuntimeError('Hello, world!')
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> y = os.dup(0); os.close(y); x == y
True

Exceptions in the body of the with statement will cause an error that the readFrom method can detect, visible by the time the empty-read EOF marker is visible.

>>> seen_errors = []
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         while readable.read(100):
...             pass
...         if self.writer_error is not None:
...             seen_errors.append(self.writer_error)
>>> with MyPipe() as writable:
...     raise RuntimeError('Hello, world!')
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> len(seen_errors)
1
>>> type(seen_errors[0])
<class 'RuntimeError'>
Parameters:
  • part_size (int)

  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket_name (str)

  • file_id (str)

  • encryption_args (dict[Any, Any] | None)

  • encoding (str | None)

  • errors (str | None)

encoding = None
errors = None
part_size
s3_client
bucket_name
file_id
encryption_args
readFrom(readable)

Implement this method to read data from the pipe. This method should support both binary and text mode output.

If this method needs to do any sort of cleanup on failure, it should check self.writer_error after observing EOF, to distinguish normal and abnormal termination of the writer.

Parameters:

readable (file) – the file object representing the readable end of the pipe. Do not explicitly invoke the close() method of the object; that will be done automatically.

Return type:

None
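
Since this class carries a part_size and consumes the readable end of a pipe, one plausible shape for a readFrom() body is a loop that cuts the stream into parts of at most part_size bytes. The sketch below illustrates only that chunking step, under the assumption that the stream yields bytes. The helper iter_parts is hypothetical and is not taken from Toil's implementation; no S3 calls are made.

```python
import io
from typing import IO, Iterator


def iter_parts(readable: IO[bytes], part_size: int) -> Iterator[bytes]:
    """Hypothetical helper: yield successive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    while True:
        chunks = []
        remaining = part_size
        # A pipe may return short reads, so keep reading until the part is full or EOF.
        while remaining > 0:
            data = readable.read(remaining)
            if not data:
                break
            chunks.append(data)
            remaining -= len(data)
        if not chunks:
            return  # EOF with nothing buffered: no more parts.
        yield b"".join(chunks)


# Example with an in-memory stream standing in for the readable end of the pipe.
parts = list(iter_parts(io.BytesIO(b"abcdefghij"), 4))
```

Every part except possibly the last is exactly part_size bytes long. After the loop ends, a real readFrom() would check self.writer_error, as described above, before finalizing anything.
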

toil.lib.aws.s3.parse_s3_uri(uri)
Parameters:

uri (str)

Return type:

tuple[str, str]
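
The listing gives only the signature, so as an illustration of what splitting an S3 URI into two strings can look like, here is a small standalone sketch. It assumes the tuple is ordered (bucket, key), which the listing does not confirm, and the function split_s3_uri is hypothetical rather than Toil's implementation.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path begins with the '/' separating bucket from key; drop only that one.
    # Caveat: urlparse splits off '?' and '#', so keys containing them need other handling.
    return parsed.netloc, parsed.path[1:]
```
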

toil.lib.aws.s3.list_s3_items(s3_resource, bucket, prefix, startafter=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • prefix (str)

  • startafter (str | None)

Return type:

collections.abc.Iterator[mypy_boto3_s3.type_defs.ObjectTypeDef]

toil.lib.aws.s3.upload_to_s3(readable, s3_resource, bucket, key, extra_args=None)

Upload a readable object to s3, using multipart uploading if applicable.

Parameters:
  • readable (IO[Any]) – a readable stream or a local file path to upload to S3

  • s3_resource (mypy_boto3_s3.S3ServiceResource) – boto3 S3 resource

  • bucket (str) – name of the bucket to upload to

  • key (str) – the key (object name) to upload to

  • extra_args (dict) – HTTP headers to use when uploading, generally used for encryption purposes

Returns:

version of the newly uploaded file

Return type:

None

toil.lib.aws.s3.download_stream(s3_resource, bucket, key, checksum_to_verify=None, extra_args=None, encoding=None, errors=None)

Context manager that gives out a download stream to download data.

Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • key (str)

  • checksum_to_verify (str | None)

  • extra_args (dict[Any, Any] | None)

  • encoding (str | None)

  • errors (str | None)

Return type:

collections.abc.Iterator[IO[Any]]

toil.lib.aws.s3.download_fileobject(s3_resource, bucket, key, fileobj, extra_args=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (mypy_boto3_s3.service_resource.Bucket)

  • key (str)

  • fileobj (io.BytesIO)

  • extra_args (dict[Any, Any] | None)

Return type:

None

toil.lib.aws.s3.s3_key_exists(s3_resource, bucket, key, check=False, extra_args=None)

Return True if the S3 object exists, and False if not. Raises an error if the encryption arguments are incorrect.

Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • key (str)

  • check (bool)

  • extra_args (dict[Any, Any] | None)

Return type:

bool

toil.lib.aws.s3.get_s3_object(s3_resource, bucket, key, extra_args=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • key (str)

  • extra_args (dict[Any, Any] | None)

Return type:

mypy_boto3_s3.type_defs.GetObjectOutputTypeDef

toil.lib.aws.s3.put_s3_object(s3_resource, bucket, key, body, extra_args=None)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • key (str)

  • body (str | bytes)

  • extra_args (dict[Any, Any] | None)

Return type:

mypy_boto3_s3.type_defs.PutObjectOutputTypeDef

toil.lib.aws.s3.generate_presigned_url(s3_resource, bucket, key_name, expiration)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • key_name (str)

  • expiration (int)

Return type:

str

toil.lib.aws.s3.create_public_url(s3_resource, bucket, key)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

  • key (str)

Return type:

str

toil.lib.aws.s3.get_s3_bucket_region(s3_resource, bucket)
Parameters:
  • s3_resource (mypy_boto3_s3.S3ServiceResource)

  • bucket (str)

Return type:

str