Submitting Batch Jobs on Dawson
The SP nodes process batch jobs through IBM LoadLeveler, a load-balancing
batch queueing system. LoadLeveler uses a backfill algorithm: when nodes
would otherwise sit idle, it runs shorter jobs ahead of longer ones that are
waiting. Apart from backfilling, jobs are scheduled first come, first served.
We have defined six batch job queues (pa through pf) to serve this cluster,
plus the restricted serial queue sa and the interactive queue ia. See
"Using Queues on Dawson" for a more detailed description of Dawson's queue
organization, which is summarized in the following table for quick reference.
Queues
| Class Name | Maximum Nodes | CPU Limit (hours) | Other Limits |
| pa | 4 | 1 | anytime |
| pb | 4 | 4 | midnight to 8 am |
| pc | 2 | 4 | |
| pd | 2 | 8 | |
| pe | 1 | 4 | |
| pf | 1 | 8 | |
| sa (restricted) | 1 processor | 24 | 8 serial jobs (2 nodes total) |
| ia | | 20 minutes | interactive jobs |
Detailed documentation on submitting and managing jobs with LoadLeveler can be found in IBM's
LoadLeveler documentation.
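The backfill policy described at the top of this page can be illustrated with a toy simulation. The Python sketch below is not LoadLeveler's implementation; it assumes a single pool of identical nodes, that every job runs for exactly its requested wall-clock time, and that no job asks for more nodes than exist. Jobs start in submission order, but when the job at the head of the queue is blocked, a later job may start early if it fits on the idle nodes and will finish before the head job could start, so the head job is never delayed.

```python
import heapq
from dataclasses import dataclass


@dataclass(eq=False)  # compare jobs by identity, so queue.remove() hits the right one
class Job:
    name: str
    nodes: int
    hours: float  # requested wall-clock time


def schedule(jobs, total_nodes):
    """Return {job name: start time} under first come, first served plus backfill.

    Toy model: each job runs exactly its requested time, and no job
    asks for more than total_nodes nodes.
    """
    queue = list(jobs)  # waiting jobs in submission order
    running = []        # min-heap of (end time, nodes in use)
    free = total_nodes
    now = 0.0
    starts = {}

    def start(job):
        nonlocal free
        starts[job.name] = now
        heapq.heappush(running, (now + job.hours, job.nodes))
        free -= job.nodes

    while queue:
        # First come, first served: start jobs from the head while they fit.
        while queue and queue[0].nodes <= free:
            start(queue.pop(0))
        if not queue:
            break
        # The head job is blocked; find the earliest time enough nodes free up.
        head, avail, shadow = queue[0], free, now
        for end, used in sorted(running):
            avail += used
            shadow = end
            if avail >= head.nodes:
                break
        # Backfill: start later jobs that fit now and finish by that time.
        for job in list(queue[1:]):
            if job.nodes <= free and now + job.hours <= shadow:
                queue.remove(job)
                start(job)
        # Advance to the next completion and release every job ending then.
        now, used = heapq.heappop(running)
        free += used
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[1]
    return starts
```

With 4 nodes and jobs A (2 nodes, 2 h), B (4 nodes, 1 h), C (2 nodes, 1 h) and D (2 nodes, 3 h) submitted in that order, C is backfilled at time 0 alongside A, B starts at 2 and D at 3. Under strict first come, first served, C would have waited behind B until time 3.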
Before you can submit a job or perform any other job-related task, you
need to build a job command file. A job command file describes the job
you want to submit. It can itself be a shell script, and it can include
LoadLeveler keyword statements. For example, to specify a binary to be
executed, you can use the executable keyword, which is described in the
LoadLeveler documentation on job command file keywords. The executable
keyword can also name a shell script to run; if the keyword is omitted,
LoadLeveler assumes that the job command file itself is the executable.
The job command file can include the following:
- LoadLeveler keyword statements: A keyword is a word that can appear in
job command files. A keyword statement is a statement that begins with a
LoadLeveler keyword (the lines starting with "# @" in the example below).
These keywords are described in "Job command file keywords".
- Comment statements: You can use comments to document your job command
files. You can add comment lines to the file as you would in a shell script.
- Shell command statements: If you use a shell script as the executable,
the job command file can include shell commands.
- LoadLeveler variables: See "Job command file variables" for more
information.
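The statement types listed above can be told apart mechanically. The Python sketch below is a simplified classifier, assuming keyword statements start with "#" and "@" separated by optional blanks (as in the example file on this page); it ignores continuation lines and is not LoadLeveler's own parser. The names classify and keywords are hypothetical.

```python
import re

# Simplified sketch: a keyword statement starts with "#", optional blanks, then "@".
KEYWORD_PREFIX = re.compile(r"^\s*#\s*@")
# Captures "keyword = value" or a bare keyword such as "queue".
KEYWORD_RE = re.compile(r"^\s*#\s*@\s*(\w+)\s*(?:=\s*(.*?))?\s*$")


def classify(line):
    """Label one line as 'keyword', 'comment', 'shell' or 'blank'."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if KEYWORD_PREFIX.match(line):
        return "keyword"
    if stripped.startswith("#"):
        return "comment"  # any other "#" line, including a "#!" interpreter line
    return "shell"


def keywords(text):
    """Return (keyword, value) pairs in file order; value is None for bare keywords."""
    pairs = []
    for line in text.splitlines():
        match = KEYWORD_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

Running keywords() over a job command file gives a quick way to review which keywords a script actually sets before submitting it.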
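Once the job command file is written, submit it with the llsubmit command.
For a file named batch.cmd:
$ llsubmit batch.cmd
An example job command file is as follows: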
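# Run the MPI program under poe
# @ executable = /usr/bin/poe
# @ arguments = /gpfs/home/username/mpiexample/main
# @ job_type = parallel
# Standard output and error files, named with the job's ID number
# @ output = out.$(jobid)
# @ error = err.$(jobid)
# Request between 2 and 4 processors
# @ min_processors = 2
# @ max_processors = 4
# Wall-clock limit as HH:MM:SS (1 minute here)
# @ wall_clock_limit = 00:01:00
# @ class = pd
# @ queue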
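If you submit many similar jobs, a script can write the command file for you. The Python sketch below is hypothetical: make_job_file and submit are illustrative names, the keyword set simply mirrors the example above, and submit assumes llsubmit is on your PATH on the machine where you normally submit jobs.

```python
import subprocess


def make_job_file(program, min_procs, max_procs, wall_clock, job_class):
    """Return the text of a parallel job command file (hypothetical helper).

    The keywords mirror the example above; output and error files are
    named with LoadLeveler's $(jobid) variable.
    """
    lines = [
        "# @ executable = /usr/bin/poe",
        f"# @ arguments = {program}",
        "# @ job_type = parallel",
        "# @ output = out.$(jobid)",
        "# @ error = err.$(jobid)",
        f"# @ min_processors = {min_procs}",
        f"# @ max_processors = {max_procs}",
        f"# @ wall_clock_limit = {wall_clock}",
        f"# @ class = {job_class}",
        "# @ queue",
    ]
    return "\n".join(lines) + "\n"


def submit(path):
    """Run llsubmit on a job command file and return its standard output.

    Assumes llsubmit is on PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["llsubmit", str(path)], check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, writing make_job_file("/gpfs/home/username/mpiexample/main", 2, 4, "00:01:00", "pd") to batch.cmd reproduces the keyword lines of the example file, which submit("batch.cmd") then passes to llsubmit.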
Be sure to specify a reasonable (and accurate) wall_clock_limit in your
command file. Telling the scheduler how long your job will actually run
lets the backfill algorithm fit it onto nodes that would otherwise sit idle,
so the system is used more efficiently. If you do not set a time limit, your
job is given a default limit that is much lower than the queue's maximum.
Also, if you consistently request much more time than your jobs ever use, a
system administrator may contact you to discuss how to improve your job
configurations.
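One way to keep the limit accurate is to base it on how long similar runs actually took. The Python sketch below assumes the HH:MM:SS form used in the example file; suggest_limit is a hypothetical helper, and its 20% safety margin is an illustrative choice, not a site policy.

```python
import math


def parse_hms(text):
    """Convert an 'HH:MM:SS' wall_clock_limit value to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Convert a number of seconds to 'HH:MM:SS'."""
    hours, rest = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def suggest_limit(past_runtimes, margin=0.2):
    """Suggest a wall_clock_limit from past run times given in seconds.

    Hypothetical policy: longest observed run plus a safety margin
    (20% by default), rounded up to a whole minute.
    """
    padded = max(past_runtimes) * (1 + margin)
    minutes = math.ceil(padded / 60)
    return format_hms(minutes * 60)
```

For instance, if earlier runs took 1010 and 600 seconds, suggest_limit([1010, 600]) returns "00:21:00", far closer to reality than requesting the queue's full CPU limit.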