toil.batchSystems.abstractGridEngineBatchSystem¶
Attributes¶
Exceptions¶
ExceededRetryAttempts | Common base class for all non-exit exceptions.
Classes¶
AbstractGridEngineBatchSystem | A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster.
Module Contents¶
- toil.batchSystems.abstractGridEngineBatchSystem.logger¶
- toil.batchSystems.abstractGridEngineBatchSystem.JobTuple¶
- exception toil.batchSystems.abstractGridEngineBatchSystem.ExceededRetryAttempts[source]¶
Bases:
Exception
Common base class for all non-exit exceptions.
- class toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem(config, maxCores, maxMemory, maxDisk)[source]¶
Bases:
toil.batchSystems.cleanup_support.BatchSystemCleanupSupport
A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster. By default auto-deployment is not implemented.
- Parameters:
config (toil.common.Config)
maxCores (float)
maxMemory (int)
maxDisk (int)
- exception GridEngineThreadException[source]¶
Bases:
Exception
Common base class for all non-exit exceptions.
- class GridEngineThread(newJobsQueue, updatedJobsQueue, killQueue, killedJobsQueue, boss)[source]¶
Bases:
threading.Thread
A class that represents a thread of control.
This class can be safely subclassed in a limited fashion. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass.
- Parameters:
newJobsQueue (queue.Queue)
updatedJobsQueue (queue.Queue)
killQueue (queue.Queue)
killedJobsQueue (queue.Queue)
- boss¶
- newJobsQueue¶
- updatedJobsQueue¶
- killQueue¶
- killedJobsQueue¶
- runningJobs¶
- runningJobsLock¶
- exception = None¶
- getBatchSystemID(jobID)[source]¶
Get batch system-specific job ID
Note: for the moment this is the only consistent way to cleanly get the batch system job ID
- forgetJob(jobID)[source]¶
Remove the given job ID from this thread's tracking.
- Parameters:
jobID (int) – toil job ID
- Return type:
None
- createJobs(newJob)[source]¶
Create a new job with the given attributes.
Implementation-specific; called by GridEngineThread.run()
- Parameters:
newJob (JobTuple)
- Return type:
- checkOnJobs()[source]¶
Check and update status of all running jobs.
Respects statePollingWait: if called again before that interval has elapsed, returns cached results instead of contacting the scheduler.
- coalesce_job_exit_codes(batch_job_id_list)[source]¶
Returns exit codes and possibly exit reasons for a list of jobs, or None if they are running.
Called by GridEngineThread.checkOnJobs().
The default implementation falls back on self.getJobExitCode and polls each job individually
- Parameters:
batch_job_id_list (list of string) – list of batch system job IDs
- Return type:
list[Union[int, tuple[int, Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason]], None]]
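The fallback described above (poll each job individually, preserving order) can be sketched as follows. This is a minimal, standalone illustration and not Toil's implementation: `coalesce_by_polling` and its `get_job_exit_code` argument are hypothetical names standing in for the method and for self.getJobExitCode.

```python
from typing import Callable, List, Optional, Tuple, Union

# Mirrors the documented return shape: an exit code, an (exit code, reason)
# pair, or None while the job is still running.
ExitStatus = Union[int, Tuple[int, Optional[object]], None]


def coalesce_by_polling(
    batch_job_id_list: List[str],
    get_job_exit_code: Callable[[str], ExitStatus],
) -> List[ExitStatus]:
    """Hypothetical fallback: ask about every job one at a time.

    The result has one entry per input ID, in the same order.
    """
    return [get_job_exit_code(job_id) for job_id in batch_job_id_list]
```

A scheduler that can report many jobs in one query would override this to make a single call instead of one call per job.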
- abstract prepareSubmission(cpu, memory, jobID, command, jobName, job_environment=None, gpus=None)[source]¶
Prepare the command line, as a list of strings, for submitting a job to the batch system (via submitJob()).
- Parameters:
cpu (int)
memory (int)
jobID (int) – Toil job ID
command (string) – the command line string to be called
jobName (string) – the name of the Toil job, to provide metadata to batch systems if desired
job_environment (dict) – the environment variables to be set on the worker
- Return type:
List[str]
- abstract submitJob(subLine)[source]¶
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID
- Parameters:
subLine (string) – the literal command line string to be called
- Returns:
the batch system job ID, which will be stored internally
- Return type:
string
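As an illustration of "processing the output to get the batch system job ID", here is a minimal sketch that extracts the ID from a submission message. It assumes output of the form printed by Slurm's sbatch ("Submitted batch job <id>"); the helper name `parse_batch_job_id` is hypothetical, and other schedulers need a different pattern.

```python
import re


def parse_batch_job_id(submission_output: str) -> str:
    """Hypothetical helper: pull the job ID out of sbatch-style output."""
    match = re.search(r"Submitted batch job (\d+)", submission_output)
    if match is None:
        # Fail loudly rather than storing a bogus ID.
        raise ValueError(f"Could not find a job ID in: {submission_output!r}")
    return match.group(1)
```

The ID is returned as a string, matching the documented return type.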
- abstract getRunningJobIDs()[source]¶
Get a list of running job IDs. Implementation-specific; called by boss AbstractGridEngineBatchSystem implementation via AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
- Return type:
- abstract killJob(jobID)[source]¶
Kill specific job with the Toil job ID. Implementation-specific; called by GridEngineThread.killJobs()
- Parameters:
jobID (string) – Toil job ID
- abstract getJobExitCode(batchJobID)[source]¶
Returns job exit code and possibly an instance of abstractBatchSystem.BatchJobExitReason.
Returns None if the job is still running.
If the job is not running but the exit code is not available, it will be EXIT_STATUS_UNAVAILABLE_VALUE. Implementation-specific; called by GridEngineThread.checkOnJobs().
The exit code will only be 0 if the job affirmatively succeeded.
- Parameters:
batchJobID (string) – batch system job ID
- Return type:
Union[int, tuple[int, Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason]], None]
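Because the return value can take three shapes, callers have to normalize it before use. The sketch below shows one way to do that; `split_exit_status` is a hypothetical helper, not part of Toil, and it treats the exit reason as an opaque object.

```python
def split_exit_status(status):
    """Hypothetical helper: normalize a getJobExitCode()-style result.

    Returns None while the job is still running, otherwise a pair
    (exit_code, exit_reason) where exit_reason may be None.
    """
    if status is None:
        return None  # job has not finished yet
    if isinstance(status, tuple):
        exit_code, exit_reason = status
        return exit_code, exit_reason
    return status, None  # bare integer exit code, no reason given
```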
- config¶
- currentJobs¶
- newJobsQueue¶
- updatedJobsQueue¶
- killQueue¶
- killedJobsQueue¶
- background_thread¶
- classmethod supportsAutoDeployment()[source]¶
Whether this batch system supports auto-deployment of the user script itself.
If it does, setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
- count_needed_gpus(job_desc)[source]¶
Count the number of cluster-allocatable GPUs we want to allocate for the given job.
- Parameters:
job_desc (toil.job.JobDescription)
- issueBatchJob(command, job_desc, job_environment=None)[source]¶
Issues a job with the specified command to the batch system and returns a unique job ID number.
- Parameters:
command (str) – the command to execute somewhere to run the Toil worker process
job_desc (toil.job.JobDescription) – the JobDescription for the job being run
job_environment (Optional[dict[str, str]]) – a collection of job-specific environment variables to be set on the worker.
- Returns:
a unique job ID number that can be used to reference the newly issued job
- killBatchJobs(jobIDs)[source]¶
Kills the given jobs, identified by their job IDs, then confirms they are dead by checking that they are no longer in the list of issued jobs.
- getRunningBatchJobIDs()[source]¶
Retrieve running job IDs from local and batch scheduler.
Respects statePollingWait: if called again before that interval has elapsed, returns cached results instead of contacting the scheduler.
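The caching behaviour tied to statePollingWait can be sketched as a small throttle around the real scheduler query. This is an assumption-laden illustration, not Toil's code: `ThrottledPoller`, `poll_fn` and the injectable `clock` are hypothetical names.

```python
import time


class ThrottledPoller:
    """Hypothetical helper: query the scheduler at most once per interval."""

    def __init__(self, poll_fn, state_polling_wait, clock=time.monotonic):
        self._poll_fn = poll_fn          # callable that really talks to the scheduler
        self._wait = state_polling_wait  # minimum seconds between real queries
        self._clock = clock              # injectable so tests can control time
        self._last_poll = None
        self._cached = None

    def poll(self):
        now = self._clock()
        if self._last_poll is None or now - self._last_poll >= self._wait:
            # Interval elapsed (or first call): refresh the cache.
            self._cached = self._poll_fn()
            self._last_poll = now
        return self._cached
```

Throttling like this keeps a busy leader process from flooding the scheduler with status queries, at the cost of results that can be up to one interval stale.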
- getUpdatedBatchJob(maxWait)[source]¶
Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return info for jobs killed by killBatchJobs, although they may cause None to be returned earlier than maxWait.
- Parameters:
maxWait – the number of seconds to block, waiting for a result
- Returns:
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
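The "block for up to maxWait seconds, then return None" contract maps naturally onto a timed read from a queue.Queue. The following sketch assumes updates arrive on such a queue; `get_update` is a hypothetical stand-in and the queued items here are plain tuples rather than UpdatedBatchJobInfo objects.

```python
import queue


def get_update(updated_jobs_queue, max_wait):
    """Hypothetical helper: wait up to max_wait seconds for one update."""
    try:
        return updated_jobs_queue.get(timeout=max_wait)
    except queue.Empty:
        # Nothing finished within the allowed time.
        return None
```

Each item is removed from the queue when it is returned, which is one simple way to make sure every update is delivered only once.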
- shutdown()[source]¶
Signals the thread to shut down (via a sentinel), then cleanly joins the thread.
- Return type:
None
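Sentinel-based shutdown is a common pattern for queue-driven threads: a special value placed on the queue tells the worker to exit its loop, after which the caller joins it. The sketch below is a generic illustration that assumes None is the sentinel; the names `worker` and `run_and_shutdown` are hypothetical.

```python
import queue
import threading


def worker(jobs, handled):
    """Consume items until the sentinel (None) arrives."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: stop processing
            break
        handled.append(item)


def run_and_shutdown(items):
    """Start a worker, feed it items, then shut it down cleanly."""
    jobs = queue.Queue()
    handled = []
    thread = threading.Thread(target=worker, args=(jobs, handled))
    thread.start()
    for item in items:
        jobs.put(item)
    jobs.put(None)  # signal shutdown via the sentinel
    thread.join()   # wait for the worker to finish
    return handled, thread.is_alive()
```

Because the sentinel is queued after the real items, everything submitted earlier is processed before the thread exits.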
- setEnv(name, value=None)[source]¶
Set an environment variable for the worker process before it is launched. The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.
If no value is provided it will be looked up from the current environment.
- Parameters:
name – the environment variable to be set on the worker.
value – if given, the environment variable given by name will be set to this value. If None, the variable’s current value will be used as the value on the worker
- Raises:
RuntimeError – if value is None and the name cannot be found in the environment
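The lookup rule for value (use it if given, otherwise fall back to the current environment, otherwise raise RuntimeError) can be sketched as below. `resolve_env_value` and its `environ` argument are hypothetical; the real method also records the variable for later jobs, which this sketch leaves out.

```python
import os


def resolve_env_value(name, value=None, environ=None):
    """Hypothetical helper: decide which value a worker should receive."""
    environ = os.environ if environ is None else environ
    if value is not None:
        return value  # an explicit value always wins
    try:
        return environ[name]  # fall back to the current environment
    except KeyError:
        raise RuntimeError(f"{name} is not set in the current environment") from None
```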