toil.test.provisioners.clusterScalerTest

Module Contents

Classes

BinPackingTest

A common base class for Toil tests.

ClusterScalerTest

A common base class for Toil tests.

ScalerThreadTest

A common base class for Toil tests.

MockBatchSystemAndProvisioner

Mimics a leader, job batcher, provisioner and scalable batch system.

Attributes

logger

c4_8xlarge_preemptible

c4_8xlarge

r3_8xlarge

r5_2xlarge

r5_4xlarge

t2_micro

toil.test.provisioners.clusterScalerTest.logger
toil.test.provisioners.clusterScalerTest.c4_8xlarge_preemptible
toil.test.provisioners.clusterScalerTest.c4_8xlarge
toil.test.provisioners.clusterScalerTest.r3_8xlarge
toil.test.provisioners.clusterScalerTest.r5_2xlarge
toil.test.provisioners.clusterScalerTest.r5_4xlarge
toil.test.provisioners.clusterScalerTest.t2_micro
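
For reference, these module-level shapes can plausibly be defined along the following lines (a sketch: the t2.micro figures come from the test docstrings below, while the c4.8xlarge figures are illustrative assumptions):

    from toil.provisioners.abstractProvisioner import Shape

    GiB = 1024 ** 3

    # t2.micro: 1 cpu / 8G disk / 1G RAM, per the test docstrings in this module.
    t2_micro = Shape(wallTime=3600, memory=1 * GiB, cores=1, disk=8 * GiB, preemptible=False)
    # The c4.8xlarge figures below are assumptions for illustration only.
    c4_8xlarge = Shape(wallTime=3600, memory=60 * GiB, cores=36, disk=100 * GiB, preemptible=False)
    c4_8xlarge_preemptible = Shape(wallTime=3600, memory=60 * GiB, cores=36, disk=100 * GiB, preemptible=True)
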
class toil.test.provisioners.clusterScalerTest.BinPackingTest(methodName='runTest')[source]

Bases: toil.test.ToilTest

A common base class for Toil tests.

Please have every test case directly or indirectly inherit this one.

When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn’t exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system’s default location for such files, and any temporary files or directories left over from tests will be removed automatically during tear down. Otherwise, left-over files will not be removed.
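
For example, to keep test temp files under a directory relative to the project root (an illustrative path, not project policy):

    import os

    # With TOIL_TEST_TEMP set, the directory is created if missing and
    # left-over files are kept rather than removed during tear down.
    os.environ["TOIL_TEST_TEMP"] = "test-tmp"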

setUp()[source]

Hook method for setting up the test fixture before exercising it.

testPackingOneShape()[source]

Pack one shape and check that the resulting reservations look sane.

testSorting()[source]

Test that sorting is correct: preemptible, then memory, then cores, then disk, then wallTime.

testAddingInitialNode()[source]

Pack one shape when no nodes are available and confirm that we fit one node properly.

testLowTargetTime()[source]

Test that a low targetTime (0) parallelizes jobs aggressively (1000 queued jobs require 1000 nodes).

Ideally, low targetTime means: Start quickly and maximize parallelization after the cpu/disk/mem have been packed.

Disk/cpu/mem packing is prioritized first, so we set job resource requirements so that each t2.micro (1 cpu/8G disk/1G RAM) can only run one job at a time with its resources.

Each job is parametrized to take 300 seconds; since only the minimum of one job fits into each node’s 0-second window, we expect 1000 nodes.

testHighTargetTime()[source]

Test that a high targetTime (3600 seconds) maximizes packing within the targetTime.

Ideally, high targetTime means: Maximize packing within the targetTime after the cpu/disk/mem have been packed.

Disk/cpu/mem packing is prioritized first, so we set job resource requirements so that each t2.micro (1 cpu/8G disk/1G RAM) can only run one job at a time with its resources.

Each job is parametrized to take 300 seconds, so 12 of them should fit into each node’s 3600-second window. 1000/12 = 83.33, so we expect 84 nodes.

testZeroResourceJobs()[source]

Test that jobs requiring zero cpu/disk/mem pack first, regardless of targetTime.

Disk/cpu/mem packing is prioritized first, so we set job resource requirements so that each t2.micro (1 cpu/8G disk/1G RAM) can run a seemingly infinite number of jobs with its resources.

Since all jobs should pack cpu/disk/mem-wise on a single t2.micro, we expect only one t2.micro to be provisioned. If the resource requirements were raised instead, as in testLowTargetTime, 1000 t2.micros would be launched.

testLongRunningJobs()[source]

Test that jobs with long run times (especially service jobs) are aggressively parallelized.

This is important, because services are one case where the degree of parallelization really, really matters. If you have multiple services, they may all need to be running simultaneously before any real work can be done.

Despite setting globalTargetTime=3600, this should launch 1000 t2.micros because each job’s estimated runtime (30000 seconds) extends well beyond 3600 seconds.

run1000JobsOnMicros(jobCores, jobMem, jobDisk, jobTime, globalTargetTime)[source]

Test packing 1000 jobs on t2.micros. Depending on the targetTime and resources, these should pack differently.
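
The expected node counts in the tests above follow from simple arithmetic over the target-time window. A minimal sketch (not Toil’s bin-packing code) that reproduces those counts:

    import math

    def expected_node_count(num_jobs, job_seconds, target_time):
        """Nodes needed when each node runs one job at a time within a
        target_time-second window, but always accepts at least one job."""
        jobs_per_node = max(1, target_time // job_seconds)
        return math.ceil(num_jobs / jobs_per_node)

    print(expected_node_count(1000, 300, 0))       # testLowTargetTime   -> 1000
    print(expected_node_count(1000, 300, 3600))    # testHighTargetTime  -> 84
    print(expected_node_count(1000, 30000, 3600))  # testLongRunningJobs -> 1000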

testPathologicalCase()[source]

Test a pathological case where only one node can be requested to fit months’ worth of jobs.

If the reservation is extended to fit a long job, and the bin-packer naively searches through all the reservation slices to find the first slice that fits, it will happily assign the first slot that fits the job, even if that slot occurs days in the future.
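
A toy illustration of that naive search (hypothetical code, not Toil’s bin packer): a long job has filled the near-term slices of one node’s reservation, so first-fit schedules a small job 90 days out instead of requesting a second node.

    def naive_first_fit(slices, cores_needed):
        """Return the start time (seconds from now) of the first reservation
        slice with enough free cores -- however far in the future that is."""
        for start, free_cores in slices:
            if free_cores >= cores_needed:
                return start
        return None

    # Hour-long slices for one node; a long job occupies the first 90 days.
    slices = [(hour * 3600, 0) for hour in range(90 * 24)] + [(90 * 24 * 3600, 1)]
    print(naive_first_fit(slices, 1))  # 7776000 seconds, i.e. 90 days from now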

testJobTooLargeForAllNodes()[source]

If a job is too large for all node types, the scaler should print a warning, but definitely not crash.

class toil.test.provisioners.clusterScalerTest.ClusterScalerTest(methodName='runTest')[source]

Bases: toil.test.ToilTest

A common base class for Toil tests.

Please have every test case directly or indirectly inherit this one.

When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn’t exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system’s default location for such files, and any temporary files or directories left over from tests will be removed automatically during tear down. Otherwise, left-over files will not be removed.

setUp()[source]

Hook method for setting up the test fixture before exercising it.

testRounding()[source]

Test to make sure the ClusterScaler’s rounding rounds properly.

testMaxNodes()[source]

Set the scaler to be very aggressive, give it a ton of jobs, and make sure it doesn’t go over maxNodes.

testMinNodes()[source]

Without any jobs queued, the scaler should still estimate “minNodes” nodes.

testPreemptibleDeficitResponse()[source]

When a preemptible deficit was detected by a previous run of the loop, the scaler should add non-preemptible nodes to compensate in proportion to preemptibleCompensation.
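
A minimal sketch of that compensation rule, assuming the deficit and the compensation factor are plain numbers (the rounding choice here is an assumption, not Toil’s exact code):

    def compensating_non_preemptible(deficit, preemptible_compensation):
        """Extra non-preemptible nodes requested to stand in for missing
        preemptible ones, scaled by a compensation factor in [0.0, 1.0]."""
        return int(round(deficit * preemptible_compensation))

    print(compensating_non_preemptible(10, 0.5))  # -> 5 extra non-preemptible nodes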

testPreemptibleDeficitIsSet()[source]

Make sure that updateClusterSize sets the preemptible deficit if it can’t launch preemptible nodes properly. That way, the deficit can be communicated to the next run of estimateNodeCount.

testNoLaunchingIfDeltaAlreadyMet()[source]

Check that the scaler doesn’t try to launch “0” more instances if the delta was able to be met by unignoring nodes.

testBetaInertia()[source]
test_overhead_accounting_large()[source]

If a node has a certain raw memory or disk capacity, that won’t all be available when it actually comes up; some disk and memory will be used by the OS, and the backing scheduler (Mesos, Kubernetes, etc.).

Make sure this overhead is accounted for on large nodes.

test_overhead_accounting_small()[source]

If a node has a certain raw memory or disk capacity, that won’t all be available when it actually comes up; some disk and memory will be used by the OS, and the backing scheduler (Mesos, Kubernetes, etc.).

Make sure this overhead is accounted for on small nodes.

test_overhead_accounting_observed()[source]

If a node has a certain raw memory or disk capacity, that won’t all be available when it actually comes up; some disk and memory will be used by the OS, and the backing scheduler (Mesos, Kubernetes, etc.).

Make sure this overhead is accounted for so that real-world observed failures cannot happen again.
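
A rough sketch of the accounting these three tests exercise (all constants are illustrative assumptions, not Toil’s actual overhead model): reserve a fixed amount plus a fraction of the raw capacity, which bites differently on large and small nodes.

    GiB = 1024 ** 3

    def usable_capacity(raw, fixed_overhead, overhead_fraction):
        """Capacity left for jobs after reserving a fixed amount plus a
        fraction of raw capacity for the OS and the backing scheduler."""
        return max(0, raw - fixed_overhead - int(raw * overhead_fraction))

    # Large node: the proportional reservation dominates.
    print(usable_capacity(256 * GiB, 1 * GiB, 0.05))  # roughly 242 GiB usable
    # Small node: the fixed reservation dominates and can consume everything.
    print(usable_capacity(1 * GiB, 1 * GiB, 0.05))    # -> 0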

class toil.test.provisioners.clusterScalerTest.ScalerThreadTest(methodName='runTest')[source]

Bases: toil.test.ToilTest

A common base class for Toil tests.

Please have every test case directly or indirectly inherit this one.

When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn’t exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system’s default location for such files, and any temporary files or directories left over from tests will be removed automatically during tear down. Otherwise, left-over files will not be removed.

testClusterScaling()[source]

Test scaling for a batch of non-preemptible jobs and no preemptible jobs (makes debugging easier).

testClusterScalingMultipleNodeTypes()[source]
testClusterScalingWithPreemptibleJobs()[source]

Test scaling simultaneously for a batch of preemptible and non-preemptible jobs.

class toil.test.provisioners.clusterScalerTest.MockBatchSystemAndProvisioner(config, secondsPerJob)[source]

Bases: toil.batchSystems.abstractBatchSystem.AbstractScalableBatchSystem, toil.provisioners.abstractProvisioner.AbstractProvisioner

Mimics a leader, job batcher, provisioner and scalable batch system.

start()[source]
shutDown()[source]
nodeInUse(nodeIP)[source]

Can be used to determine if a worker node is running any tasks. If the node doesn’t exist, this function should simply return False.

Parameters:

nodeIP – The worker node’s private IP address.

Returns:

True if the worker node has been issued any tasks, else False

ignoreNode(nodeAddress)[source]

Stop sending jobs to this node. Used in autoscaling when the autoscaler is ready to terminate a node, but jobs are still running. This allows the node to be terminated after the current jobs have finished.

Parameters:

nodeAddress – IP address of node to ignore.

unignoreNode(nodeAddress)[source]

Stop ignoring this address, presumably after a node with this address has been terminated. This allows for the possibility of a new node having the same address as a terminated one.
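
Together, ignoreNode and unignoreNode support a drain-then-terminate flow roughly like this sketch (the batch, provisioner, and node objects are assumed to exist, and the privateIP attribute name is an assumption; this is not verbatim autoscaler code):

    import time

    def drain_and_terminate(batch, provisioner, node, poll_seconds=5):
        """Stop scheduling to a node, wait for its jobs to finish,
        terminate it, then free its address for reuse by a future node."""
        batch.ignoreNode(node.privateIP)
        while batch.nodeInUse(node.privateIP):
            time.sleep(poll_seconds)
        provisioner.terminateNodes([node])
        batch.unignoreNode(node.privateIP)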

supportedClusterTypes()[source]

Get all the cluster types that this provisioner implementation supports.

createClusterSettings()[source]

Initialize the class for a new cluster to be deployed, when running outside the cloud.

readClusterSettings()[source]

Initialize the class from an existing cluster. This method assumes that the instance we are running on is the leader.

Implementations must call _setLeaderWorkerAuthentication().

setAutoscaledNodeTypes(node_types)[source]

Set node types, shapes, and spot bids for Toil-managed autoscaling.

Parameters:

node_types (List[Tuple[Set[toil.provisioners.abstractProvisioner.Shape], Optional[float]]]) – A list of node types, as parsed with parse_node_types.
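
Per the signature, each entry pairs a set of equivalent Shape objects with an optional spot bid. A hypothetical call, assuming an instance named mock and the module-level shapes listed earlier (the bid value is made up):

    mock.setAutoscaledNodeTypes([
        ({t2_micro}, None),                # on-demand node type, no spot bid
        ({c4_8xlarge_preemptible}, 0.42),  # preemptible node type with a spot bid
    ])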

getProvisionedWorkers(instance_type=None, preemptible=None)[source]

Returns a list of Node objects, each representing a worker node in the cluster.

Parameters:

preemptible – If True, only return preemptible nodes; otherwise, return non-preemptible nodes.

Returns:

list of Node

terminateNodes(nodes)[source]

Terminate the nodes represented by the given Node objects.

Parameters:

nodes – list of Node objects

remainingBillingInterval(node)[source]
addJob(jobShape, preemptible=False)[source]

Add a job to the job queue

getNumberOfJobsIssued(preemptible=None)[source]
getJobs()[source]
getNodes(preemptible=False, timeout=600)[source]

Returns a dictionary mapping node identifiers of preemptible or non-preemptible nodes to NodeInfo objects, one for each node.

Parameters:
  • preemptible (Optional[bool]) – If True (False) only (non-)preemptible nodes will be returned. If None, all nodes will be returned.

  • timeout (int)

addNodes(nodeTypes, numNodes, preemptible)[source]

Used to add worker nodes to the cluster.

Parameters:
  • numNodes – The number of nodes to add

  • preemptible – Whether or not the nodes will be preemptible

  • spotBid – The bid for preemptible nodes, if applicable (this can also be set in the config).

  • nodeTypes (Set[str])

Returns:

number of nodes successfully added

Return type:

int

getNodeShape(nodeType, preemptible=False)[source]

The shape of a preemptible or non-preemptible node managed by this provisioner. The node shape defines key properties of a machine, such as its number of cores or the time between billing intervals.

Parameters:

nodeType (str) – Instance type name to return the shape of.

getWorkersInCluster(nodeShape)[source]
launchCluster(leaderNodeType, keyName, userTags=None, vpcSubnet=None, leaderStorage=50, nodeStorage=50, botoPath=None, **kwargs)[source]

Initialize a cluster and create a leader node.

Implementations must call _setLeaderWorkerAuthentication() with the leader so that workers can be launched.

Parameters:
  • leaderNodeType – The instance type to use for the leader node.

  • leaderStorage – The amount of disk to allocate to the leader in gigabytes.

  • owner – Tag identifying the owner of the instances.

destroyCluster()[source]

Terminates all nodes in the specified cluster and cleans up all resources associated with the cluster.

Parameters:

clusterName – Identifier of the cluster to terminate.

Return type:

None

getLeader()[source]
Returns:

The leader node.

getNumberOfNodes(nodeType=None, preemptible=None)[source]