Overview

When submitted through the prisms_jobs Python package or the included scripts, cluster jobs are stored in a SQLite jobs database. This allows for convenient monitoring and searching of submitted jobs.

It is often necessary to submit multiple jobs until a particular task is complete, whether due to walltime or other limitations. prisms-jobs distinguishes and tracks both individual “jobstatus” (‘R’, ‘Q’, ‘C’, ‘E’, etc.) and “taskstatus”. Jobs marked as ‘auto’ can be automatically or easily resubmitted until the “taskstatus” is “Complete”.

Possible values for “taskstatus” are:

“Complete” Job and task are complete.
“Incomplete” Job or task are incomplete.
“Continued” Job is complete, but task was not complete.
“Check” Non-auto job is complete and requires user input for status.
“Error:.*” Some kind of error was noted.
“Aborted” The job and task have been aborted.

Jobs are marked ‘auto’ either by submitting through the python class prisms_jobs.Job with the attribute auto=True, or by submitting a script which contains the line #auto=True using the included psub command line program.

Jobs can be monitored using the command line program pstat. All ‘auto’ jobs which have stopped can be resubmitted using pstat --continue. In this case, ‘continuation_jobid’ is set with the jobid for the next job in the series of jobs comprising a task.

Example screen shot:

$ pstat


Tracked:
JobID        JobName                  Nodes Procs     Walltime S      Runtime Task                     A ContJobID
------------ ------------------------ ----- ----- ------------ - ------------ ------------------------ - ------------
11791024     STDIN                      1     1     0:01:00:00 Q            - Incomplete               1 -
11791025     STDIN                      1     1     0:01:00:00 Q            - Incomplete               1 -


Untracked:
JobID        JobName                  Nodes Procs     Walltime S      Runtime Task                     A ContJobID
------------ ------------------------ ----- ----- ------------ - ------------ ------------------------ - ------------
11791026     taskmaster                 1     1     0:01:00:00 W   0:01:00:00 Untracked                0 -

Additionally, when scheduling periodic jobs is not allowed other ways, the taskmaster script can fully automate this process. taskmaster executes pstat --continue and then resubmits itself to execute again periodically. As not all compute resources allow this behavior, remember check the policy prior to using taskmaster on a new compute resource.

A script marked ‘auto’ should check itself for completion and when reached execute pstat --complete $JOBID --force in bash, or prisms_jobs.complete_job() in Python. If an ‘auto’ job script does not set its taskstatus to “Complete” it may continue to be resubmitted indefinitely.

Jobs not marked ‘auto’ are shown with the status “Check” in pstat until the user marks them as “Complete”.