Main JOBMANAGER concept: Job
This chapter explains the main JOBMANAGER concept: a job.
What is a job?
A job is a unit of work that a user wants to run on a computation resource (a single computer or a cluster). The JOBMANAGER provides different types of jobs depending on what the user wants to do.
There are three types of jobs, described in the table below.
Type of job | Description |
---|---|
Command script | A shell script containing the user's commands. This kind of job is not related to SALOME and can be used to launch any code. |
SALOME Python script | A Python script that will be launched in a SALOME session dedicated to this script. |
YACS schema | A YACS schema that will be launched in a SALOME session dedicated to this schema. |
Job content description
All job types share a common set of attributes. Some job types may have specific attributes; these exceptions will be indicated in a future version of this documentation. A job has two kinds of attributes: attributes that describe the job itself, and attributes that describe its computation requirements.
The first table below describes the attributes of a job.
Attribute | Mandatory | Description |
---|---|---|
Name | Yes | This is the name of the job. It is unique within a SALOME session. |
Type | Yes | This is the type of the job. Currently, there are three types: command, python_salome and yacs_file. |
Job file | Yes | This is the name, with the location, of the file containing the job's data, e.g. /home/user/work.sh. Depending on the job type, it can be a shell script, a Python script or a YACS schema. |
Env file | No | An environment file can be attached to the job. It will be executed before the job. |
Input files | No | A list of files or directories on the user's computer that have to be copied into the job's work directory. |
Output files | No | A list of files or directories that have to be copied from the job's resource into the result directory on the user's computer. |
Work directory | Yes | It’s the directory on the job’s resource where the job will be executed. |
Result directory | Yes | The directory on the user's computer where the job's results are copied at the end of the job. |
WC Key | No | The Workload Characterization Key is used on some clusters to associate each job with a project or organization. |
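As an illustration only, the attribute table above can be modelled as a small Python data structure that checks the mandatory fields and the job type. The class name `JobDescription`, its field names, and the `problems` method are hypothetical and are not part of the JOBMANAGER API; only the three type identifiers and the mandatory/optional split come from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Type identifiers quoted in the attribute table.
JOB_TYPES = ("command", "python_salome", "yacs_file")


@dataclass
class JobDescription:
    """Hypothetical container mirroring the job attribute table."""
    # Mandatory attributes
    name: str
    job_type: str
    job_file: str
    work_directory: str
    result_directory: str
    # Optional attributes
    env_file: Optional[str] = None
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)
    wckey: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means valid."""
        found = []
        mandatory = {
            "Name": self.name,
            "Type": self.job_type,
            "Job file": self.job_file,
            "Work directory": self.work_directory,
            "Result directory": self.result_directory,
        }
        for label, value in mandatory.items():
            if not value:
                found.append(f"{label} is mandatory")
        if self.job_type and self.job_type not in JOB_TYPES:
            found.append(f"Type must be one of {', '.join(JOB_TYPES)}")
        return found
```

A description built with `job_type="command"` and `job_file="/home/user/work.sh"` (plus placeholder directories) passes the check, while an empty name or an unknown type is reported in the returned list.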
The second table below describes the attributes of computation requirements.
Attribute | Description |
---|---|
Maximum duration | It’s the maximum expected duration of the job. When a batch manager is used, this time is interpreted as a walltime and not as a cputime. If maximum duration is not set or set to 0, the time will be set to the default value of the batch queue selected. |
Number of cpu | It’s the number of cpus/cores requested. |
Memory | The amount of required memory. It is generally specified per node. With some batch managers, it is possible to specify the required memory per core (only available with SLURM for now). |
Queue | Optional. It selects a specific batch queue on the target cluster. If it is not defined, most batch systems will assign the job to the queue that fits the other attribute requirements. |
Exclusive | It indicates if the job can share nodes with other jobs or not. |
In addition to those attributes, the user can also specify some extra parameters with a few lines that will be added “as is” to the job submission file.
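The two rules stated in the requirements table (a maximum duration of 0 falls back to the queue default, and per-core memory is only available with SLURM) can be sketched as follows. All names here (`ComputationRequirements`, `effective_walltime`, `memory_mode_supported`) and the units (minutes, megabytes) are assumptions made for the example; they do not describe the actual JOBMANAGER implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputationRequirements:
    """Hypothetical container mirroring the requirements table."""
    maximum_duration_minutes: int = 0   # 0 means "not set" (unit assumed)
    nb_cpus: int = 1
    memory_mb: Optional[int] = None     # unit assumed for the example
    memory_per_core: bool = False       # memory is per node by default
    queue: Optional[str] = None         # None lets the batch system choose
    exclusive: bool = False             # True: do not share nodes
    extra_params: str = ""              # lines added "as is" to the submission file


def effective_walltime(req: ComputationRequirements,
                       queue_default_minutes: int) -> int:
    """Walltime that applies: the requested one, else the queue default."""
    if req.maximum_duration_minutes > 0:
        return req.maximum_duration_minutes
    return queue_default_minutes


def memory_mode_supported(req: ComputationRequirements,
                          batch_manager: str) -> bool:
    """Per-core memory is described as available with SLURM only."""
    return (not req.memory_per_core) or batch_manager.upper() == "SLURM"
```

For instance, a request that leaves the maximum duration at 0 and targets a queue whose default is 120 minutes would run with a 120-minute walltime under this sketch.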
Job states
A job can be in one of several states in the JOBMANAGER. The table below describes the normal states.
State | Description |
---|---|
Created | The job has been created correctly and can be launched. |
In_Process | It’s a transient state between Created and Queued. |
Queued | The job is queued in the resource's batch manager. |
Paused | The job is paused. Currently the JOBMANAGER GUI does not allow pausing a job. |
Running | The job is running on the resource. |
Finished | The job has run and it’s finished. |
The table below describes the error states.
State | Description |
---|---|
Not Created | The job cannot be created with its current description. It is often a problem with the selected resource. |
Failed | The execution of the job on the resource failed. |
Error | This state is used when a job is loaded but cannot be followed. It mainly happens when a job was launched on an ssh resource: if the job list is saved, an error will occur when the list is loaded, because a job on an ssh resource cannot be followed. |
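A client script that reacts to job states might group them as in the two tables above. The sketch below is a minimal illustration: the `JobState` enum, `ERROR_STATES`, and the helper functions are hypothetical names, and the string values are simply the labels used in the tables, which may differ from the identifiers the JOBMANAGER uses internally.

```python
from enum import Enum


class JobState(Enum):
    """State labels as written in the tables (values are an assumption)."""
    # Normal states
    CREATED = "Created"
    IN_PROCESS = "In_Process"
    QUEUED = "Queued"
    PAUSED = "Paused"
    RUNNING = "Running"
    FINISHED = "Finished"
    # Error states
    NOT_CREATED = "Not Created"
    FAILED = "Failed"
    ERROR = "Error"


ERROR_STATES = frozenset(
    {JobState.NOT_CREATED, JobState.FAILED, JobState.ERROR}
)


def parse_state(label: str) -> JobState:
    """Convert a table label into a JobState; raises ValueError if unknown."""
    return JobState(label)


def is_error(state: JobState) -> bool:
    """True for the states listed in the error table."""
    return state in ERROR_STATES
```

With this grouping, `is_error(parse_state("Failed"))` is true while `is_error(parse_state("Running"))` is false, so a script can decide whether to fetch results or report a problem.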