toil.leader¶
The leader script (of the leader/worker pair) for running jobs.
Attributes¶
Classes¶
Represents the Toil leader. |
Module Contents¶
- toil.leader.logger¶
- class toil.leader.Leader(config, batchSystem, provisioner, jobStore, rootJob, jobCache=None)[source]¶
Represents the Toil leader.
Responsible for determining what jobs are ready to be scheduled, by consulting the job store, and issuing them in the batch system.
- Parameters:
config (toil.common.Config)
batchSystem (toil.batchSystems.abstractBatchSystem.AbstractBatchSystem)
provisioner (Optional[toil.provisioners.abstractProvisioner.AbstractProvisioner])
rootJob (toil.job.JobDescription)
jobCache (Optional[dict[Union[str, toil.job.TemporaryID], toil.job.JobDescription]])
- config¶
- jobStore¶
- jobStoreLocator¶
- toilState¶
- batchSystem¶
- preemptibleJobsIssued = 0¶
- serviceJobsIssued = 0¶
- preemptibleServiceJobsIssued = 0¶
- timeSinceJobsLastRescued = None¶
- provisioner¶
- clusterScaler = None¶
- serviceManager¶
- statsAndLogging¶
- potentialDeadlockTime = 0¶
- toilMetrics: toil.common.ToilMetrics | None = None¶
- debugJobNames = ('CWLJob', 'CWLWorkflow', 'CWLScatter', 'CWLGather', 'ResolveIndirect')¶
- deadlockThrottler¶
- statusThrottler¶
- kill_throttler¶
- progress_overall = None¶
- progress_failed = None¶
- GOOD_COLOR = (0, 60, 108)¶
- BAD_COLOR = (253, 199, 0)¶
- PROGRESS_BAR_FORMAT = '{desc}{desc_pad}{percentage:3.0f}%|{bar}| {count:{len_total}d}/{total:d} ({count_1:d} failures)...¶
- recommended_fail_exit_code = 1¶
- run()[source]¶
Run the leader process to issue and manage jobs.
- Raises:
toil.exceptions.FailedJobsException if failed jobs remain after running.
- Returns:
The return value of the root job’s run function.
- Return type:
Any
- create_status_sentinel_file(fail)[source]¶
Create a file in the jobstore indicating failure or success.
- Parameters:
fail (bool)
- Return type:
None
- feed_deadlock_watchdog()[source]¶
Note that progress has been made and any pending deadlock checks should be reset.
- Return type:
None
- issueJob(jobNode)[source]¶
Add a job to the queue of jobs currently trying to run.
- Parameters:
jobNode (toil.job.JobDescription)
- Return type:
None
- issueServiceJob(service_id)[source]¶
Issue a service job.
Put it on a queue if the maximum number of service jobs to be scheduled has been reached.
- Parameters:
service_id (str)
- Return type:
None
- issueQueingServiceJobs()[source]¶
Issues any queuing service jobs up to the limit of the maximum allowed.
- getNumberOfJobsIssued(preemptible=None)[source]¶
Get number of jobs that have been added by issueJob(s) and not removed by removeJob.
- removeJob(jobBatchSystemID)[source]¶
Remove a job from the system by batch system ID.
- Returns:
Job description as it was issued.
- Parameters:
jobBatchSystemID (int)
- Return type:
- getJobs(preemptible=None)[source]¶
Get all issued jobs.
- Parameters:
preemptible (Optional[bool]) – If specified, select only preemptible or only non-preemptible jobs.
- Return type:
- killJobs(jobsToKill, exit_reason=BatchJobExitReason.KILLED)[source]¶
Kills the given set of jobs and then sends them for processing.
Returns the jobs that, upon processing, were reissued.
- Parameters:
exit_reason (toil.batchSystems.abstractBatchSystem.BatchJobExitReason)
- reissueOverLongJobs()[source]¶
Check each issued job.
If a job is running for longer than desirable issue a kill instruction. Wait for the job to die then we pass the job to process_finished_job.
- Return type:
None
- reissueMissingJobs(killAfterNTimesMissing=3)[source]¶
Check all the current job ids are in the list of currently issued batch system jobs.
If a job is missing, we mark it as so, if it is missing for a number of runs of this function (say 10).. then we try deleting the job (though its probably lost), we wait then we pass the job to process_finished_job.
- process_finished_job(batch_system_id, result_status, wall_time=None, exit_reason=None)[source]¶
Process finished jobs.
Called when an attempt to run a job finishes, either successfully or otherwise.
Takes the job out of the issued state, and then works out what to do about the fact that it succeeded or failed.
- Returns:
True if the job is going to run again, and False if the job is fully done or completely failed.
- Return type:
- process_finished_job_description(finished_job, result_status, wall_time=None, exit_reason=None, batch_system_id=None)[source]¶
Process a finished JobDescription based upon its success or failure.
If wall-clock time is available, informs the cluster scaler about the job finishing.
If the job failed and a batch system ID is available, checks for and reports batch system logs.
Checks if it succeeded and was removed, or if it failed and needs to be set up after failure, and dispatches to the appropriate function.
- Returns:
True if the job is going to run again, and False if the job is fully done or completely failed.
- Parameters:
finished_job (toil.job.JobDescription)
result_status (int)
wall_time (Optional[float])
exit_reason (Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason])
batch_system_id (Optional[int])
- Return type: