What is TORQUE/PBS?
Introduction
TORQUE, also known by its historical name, Portable Batch System
(PBS), is the resource manager on the Quarry system at
Indiana University. Tools for job submission and management are
available in /usr/local/bin; most tools have associated
man pages. For detailed online information, such as user
manuals and administrator guides, see the additional
documentation below.
TORQUE manages jobs that users submit to various queues on the
system, each queue representing a group of resources with attributes
necessary for the queue's jobs. Commonly used TORQUE tools include
qsub, for job submission; qstat, for
monitoring the status of jobs; and qdel, for terminating
jobs prior to completion. More detailed information regarding these
commands and others is available below, or in the documentation
mentioned above.
Policy
Jobs that are run interactively on the user nodes are limited to 20 minutes of CPU time; monitoring scripts on the user nodes kill processes that exceed this limit.
Run jobs that require more than 20 minutes of CPU time on the
interactive nodes, b005-b008. To access one
of these nodes, you must first log into Quarry and from
there use ssh to connect to b005,
b006, b007, or b008.
Queues
The following queues are available on Quarry:
Note: Cluster-wide, the maximum number of tasks is 928 (116 available compute nodes [112 in the queues below plus 4 in the user-selectable debug queue] × 8 tasks per node).
SERIAL queue properties
- Nodes: 5 serial (q029-q033) + 33 normal (q034-q066) + 46 long (q067-q112) + 28 himem (q113-q140) = 112 total
- Maximum walltime: 12 hours
- Maximum nodes per job: 1 node
- Maximum cores per job: 8 cores
- Maximum number of jobs per queue: 3,500
- Maximum number of jobs per user: 1,500
- Direct submission: No
NORMAL queue properties
- Nodes: 33 normal (q034-q066) + 46 long (q067-q112) = 79 total
- Maximum walltime: 7 days
- Maximum nodes per job: 2 nodes
- Maximum cores per job: 16 cores
- Maximum number of jobs per queue: None
- Maximum number of jobs per user: None
- Direct submission: No
LONG queue properties
- Nodes: 46 long (q067-q112) = 46 total
- Maximum walltime: 14 days
- Maximum nodes per job: 42 nodes
- Maximum cores per job: 336 cores
- Maximum number of jobs per queue: 4,000
- Maximum number of jobs per user: 500
- Direct submission: No
Note: The LONG queue was formerly known as DEFAULTQ.
An alias for the old name is in place for job submissions, but when
you use commands like llclass and showq -w
class=XXXXX, you must use the new name.
DEBUG queue properties
- Nodes: 4 blades dedicated
- Maximum walltime: 30 min
- Maximum nodes per job: 2 nodes
- Maximum cores per job: 16 cores
- Maximum number of jobs per queue: None
- Maximum number of jobs per user: 2
- Direct submission: Yes
Note: The DEBUG queue was formerly known as FASTQ.
An alias for the old name is in place for job submissions, but when
you use commands like llclass and showq -w
class=XXXXX, you must use the new name.
HIMEM queue properties
- Nodes: 28 himem (q113-q140) = 28 total
- Maximum walltime: 14 days
- Maximum nodes per job: 28 (= maximum in queue)
- Maximum cores per job: 224 (= maximum in queue)
- Maximum number of jobs per queue: None
- Maximum number of jobs per user: None
- Direct submission: Yes, for now
OSG queue properties (group restricted access)
- Nodes: 16 blades dedicated
- Maximum walltime: 14 days
- Maximum nodes per job: None
- Maximum cores per job: None
- Maximum number of jobs per queue: None
- Maximum number of jobs per user: None
- Direct submission: Yes
Note: Access to the osg queue is restricted, and jobs are routed to this queue via Globus.
Note: The OSG queue was formerly known as OSGQ.
An alias for the old name is in place for job submissions, but when
you use commands like llclass and showq -w
class=XXXXX, you must use the new name.
If you do not specify a queue when you submit a job, it will
automatically go into the long queue. The debug queue is intended for test jobs requiring 30 minutes or less of wall clock time, and it is for debugging only; once your code has been debugged, submit it to the long queue. To use the debug queue, you must submit your job directly to it with the qsub -q switch (qsub -q debug).
If you have questions about the job queues, email High Performance Systems.
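As a sketch, a debug-queue submission might look like the following. It assumes a job script named job.script (a hypothetical name) and requests the queue's full 30 minutes. The run_or_show helper is also hypothetical: it runs the command when qsub is installed and otherwise only prints it, so the block can be tried away from Quarry.

```shell
#!/bin/bash
# Hypothetical helper: run a TORQUE command if it is installed,
# otherwise print what would have been run.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# job.script is a hypothetical file name. The walltime request stays
# within the debug queue's 30-minute maximum.
run_or_show qsub -q debug -l walltime=00:30:00 job.script
```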
Jobs
Scripts
TORQUE most commonly handles job scripts, although interactive jobs are
also supported. A job script may be as simple as a bash
or tcsh shell script, but it may also include a number of
TORQUE job directives. Job scripts are executed under your preferred login
shell; always begin them with a "shebang" line specifying which command
interpreter the script should run under, for example, #!/bin/bash.
TORQUE directives, which are lines beginning with the string
#PBS, include switches for specifying such useful
information as wall clock time required to complete the job, number of
nodes and processors necessary, and filenames for job output and
errors. These directives must be at the top of the script following
the "shebang" line. An example TORQUE job script might look like this:
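Reconstructed from the line-by-line description that follows, such a script could read as shown below. The directive spellings are standard qsub options, but the exact original script is an assumption, as is the mpirun -machinefile form (MPICH uses it; Open MPI accepts it as a synonym for -hostfile). The block uses a here-document to save the script as job.script, a hypothetical file name; the lines between the EOF markers are the job script itself.

```shell
#!/bin/bash
# Save the example job script as job.script (hypothetical file name).
# The quoted 'EOF' keeps $PBS_NODEFILE from being expanded now;
# TORQUE sets that variable when the job actually runs.
cat > job.script <<'EOF'
#!/bin/bash
#PBS -k o
#PBS -l nodes=4:ppn=2,walltime=30:00
#PBS -M username@indiana.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe

mpirun -np 8 -machinefile $PBS_NODEFILE ~/bin/binaryname
EOF
```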
Line by line, this script says:
- Use bash as the command interpreter for this script.
- Keep the job output.
- This job requires four nodes, two processors per node, and 30 minutes of wall clock time.
- Send job-related email to username@indiana.edu.
- Send email if the job is aborted (a), when it begins (b), and when it ends (e).
- The job name is JobName.
- Join standard output and standard error.
- Execute ~/bin/binaryname on eight processors from the machines in $PBS_NODEFILE using mpirun.
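Running on a fixed count of eight processors means keeping that number in step with the nodes and ppn request. A sketch of an alternative, assuming an MPI whose mpirun accepts -machinefile: count the lines of $PBS_NODEFILE, which TORQUE fills with one entry per allocated processor. Outside a job, the block builds a sample file with made-up node names and only prints the mpirun command it would use.

```shell
#!/bin/bash
# Sketch: derive the MPI process count from $PBS_NODEFILE instead of
# hard-coding 8. TORQUE lists each allocated node once per processor,
# so nodes=4:ppn=2 yields an 8-line file.
if [ -z "${PBS_NODEFILE:-}" ]; then
    # Not inside a TORQUE job: build a sample node file with hypothetical
    # node names so the sketch can be tried anywhere.
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' q067 q067 q068 q068 q069 q069 q070 q070 > "$PBS_NODEFILE"
fi

NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')   # one line per processor slot

if command -v mpirun >/dev/null 2>&1 && [ -x ~/bin/binaryname ]; then
    mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ~/bin/binaryname
else
    echo "would run: mpirun -np $NP -machinefile $PBS_NODEFILE ~/bin/binaryname"
fi
```

Inside a real job, PBS_NODEFILE is already set, so the sample-file branch is skipped and the count follows whatever nodes and ppn values were requested.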
For additional details on TORQUE directives, see the qsub man page by
entering man qsub.
Submission
Submit jobs with the qsub command. If the command exits
successfully, it writes the new job's ID to standard output.
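A sketch of capturing that job ID in a shell variable for later qstat or qdel commands, assuming a job script named job.script (hypothetical). When qsub is not available, the block substitutes a made-up ID in TORQUE's usual sequence.server form rather than a real Quarry job ID.

```shell
#!/bin/bash
# Capture the job ID that qsub prints, for later use with qstat or qdel.
# job.script is a hypothetical file name.
if command -v qsub >/dev/null 2>&1; then
    JOBID=$(qsub job.script) || { echo "submission failed" >&2; exit 1; }
else
    # Off Quarry: use a made-up ID in TORQUE's "sequence.server" form.
    JOBID=123456.example-server
fi

echo "job ID: $JOBID"
echo "sequence number: ${JOBID%%.*}"   # text before the first dot
```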
If you require attribute values different from the defaults, but less
than the maximum allowed, specify them either in the job script with
TORQUE directives or on the command line with the
-l switch. For example, to submit a job that
needs more than the default 30 minutes of walltime on Quarry, add
-l walltime=hh:mm:ss to the qsub command line.
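For instance, a 10-hour request (an illustrative value) for a hypothetical job.script might look like this. run_or_show is a hypothetical helper that prints the command when qsub is not installed. Because command-line arguments take precedence over directives in the script, this value would override a #PBS -l walltime line in job.script.

```shell
#!/bin/bash
# Hypothetical helper: run the command if installed, otherwise print it.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Request 10 hours of walltime (illustrative) instead of the 30-minute
# default. Stay under the target queue's maximum, e.g. 14 days for long.
run_or_show qsub -l walltime=10:00:00 job.script
```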
Note that command-line arguments override directives in the job
script, and that you may specify many attributes on the command line,
either as comma-separated options following a single
-l switch or each with its own
-l switch; the two forms are equivalent.
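Side by side, the two forms could look like this (illustrative values; job.script is a hypothetical name). Rather than submitting anything, the block prints both commands and uses a small hypothetical helper, resources, to show that they carry the same resource requests. The helper understands only the separate "-l value" spelling.

```shell
#!/bin/bash
# Hypothetical helper: print the resource requests in a qsub argument
# list, one per line and sorted, so two command lines can be compared.
resources() {
    take=0
    for arg in "$@"; do
        if [ "$take" = 1 ]; then
            printf '%s\n' "$arg" | tr ',' '\n'
            take=0
        elif [ "$arg" = -l ]; then
            take=1
        fi
    done | sort
}

# Two equivalent ways to request 4 nodes x 2 processors and 1 hour.
form1="qsub -l nodes=4:ppn=2,walltime=1:00:00 job.script"
form2="qsub -l nodes=4:ppn=2 -l walltime=1:00:00 job.script"

echo "$form1"
echo "$form2"
# $form1 and $form2 are left unquoted on purpose so they split into words.
if [ "$(resources $form1)" = "$(resources $form2)" ]; then
    echo "same resource requests"
fi
```

Submit with either form, not both; each qsub call creates a separate job.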
Useful qsub switches include:
-q queue_name | Submit the job to a queue other than the default
-r y (or n) | Declare whether the job is rerunnable
-a date_time | Execute the job only after date_time
-V | Export the environment variables in your current environment to the job
-I | Run interactively, usually for testing purposes
See the qsub man page for more information.
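As an illustration, the -I, -V, and -q switches can be combined to open an interactive test session in the debug queue. The node and core counts are assumptions chosen to fit that queue's limits, and run_or_show is a hypothetical helper that only prints the command when qsub is not installed.

```shell
#!/bin/bash
# Hypothetical helper: run the command if installed, otherwise print it.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Interactive session (-I) in the debug queue, exporting the current
# environment (-V): one node, 8 cores, 30 minutes (the queue maximum).
run_or_show qsub -I -V -q debug -l nodes=1:ppn=8,walltime=00:30:00
```

When the session starts you get a shell on the allocated node; exiting that shell ends the job.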
Monitoring
The qstat command is useful for monitoring the status of
a queued or running job. Switches include:
-u user_list | Display jobs for the users in user_list
-a | Display all jobs (in qstat's alternative format)
-r | Display running jobs
-f | Display a full listing of jobs (very detailed)
-n | Display the nodes allocated to jobs
For example, to see all the running jobs in the Quarry
long queue, enter qstat -r long at the Quarry shell prompt.
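Building on qstat, a job can be watched until it finishes. This is a sketch, not an IU-provided tool: JOBID is a placeholder, the five-minute interval is arbitrary, and it reads the state from column 5 of qstat's default output, treating C (completed) or a missing entry as finished.

```shell
#!/bin/bash
# Sketch: wait for a job to finish by polling qstat.
# JOBID is a hypothetical placeholder; use the ID that qsub printed.
JOBID=${JOBID:-123456}

job_state() {
    # Column 5 of qstat's default output is the state (Q, R, E, C, ...);
    # the first two lines are headers. Prints nothing for unknown jobs.
    qstat "$1" 2>/dev/null | awk 'NR > 2 { print $5 }'
}

if command -v qstat >/dev/null 2>&1; then
    state=$(job_state "$JOBID")
    while [ -n "$state" ] && [ "$state" != C ]; do
        echo "job $JOBID is in state $state"
        sleep 300   # poll every 5 minutes to keep the load on the server low
        state=$(job_state "$JOBID")
    done
    echo "job $JOBID has completed or is no longer listed"
else
    echo "would poll: qstat $JOBID"
fi
```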
Another useful command for monitoring jobs is showq, part of the Moab
scheduler. To list the queued jobs in dispatch order, enter showq -i.
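showq can also restrict its listing with -w constraints. The queue name follows the renaming notes in the Queues section (the new name is required here). run_or_show is a hypothetical helper that only prints each command when showq is not installed.

```shell
#!/bin/bash
# Hypothetical helper: run the command if installed, otherwise print it.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run_or_show showq -w class=long      # jobs in the long queue only
run_or_show showq -w user="$USER"    # your own jobs only
```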
See the showq man page for more information.
Deleting
Use the qdel command to delete queued or running
jobs. Occasionally, a node will become so unresponsive that
it cannot answer the TORQUE server's requests to kill a job. In
that case, try adding the force option, -W force (uppercase W),
to qdel. If that doesn't work, contact High
Performance Systems for help.
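The escalation to -W force might be scripted as follows. This is a sketch under stated assumptions: JOBID is a placeholder, the 60-second grace period is arbitrary, and the job's state is read from column 5 of qstat's default output.

```shell
#!/bin/bash
# Sketch: delete a job, escalating to "qdel -W force" only if the job is
# still listed (and not completed) after a grace period.
# JOBID is a hypothetical placeholder; use the ID that qsub printed.
JOBID=${JOBID:-123456}

job_state() {
    # Column 5 of qstat's default output is the state (Q, R, E, C, ...);
    # the first two lines are headers. Prints nothing for unknown jobs.
    qstat "$1" 2>/dev/null | awk 'NR > 2 { print $5 }'
}

if command -v qdel >/dev/null 2>&1; then
    qdel "$JOBID"
    sleep 60   # grace period; the length is an arbitrary choice
    state=$(job_state "$JOBID")
    if [ -n "$state" ] && [ "$state" != C ]; then
        echo "job $JOBID is still in state $state; forcing deletion" >&2
        qdel -W force "$JOBID"
    fi
else
    echo "would run: qdel $JOBID"
    echo "and, if the job is still listed afterward: qdel -W force $JOBID"
fi
```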