The Quarry Cluster
TORQUE
TORQUE is the resource manager on Quarry. Tools for job submission and management are available in /usr/local/bin, and most of them have associated man pages. Documentation is also available online.
The PBS paradigm, in a nutshell, is as follows: users submit jobs to various queues on the system, each separate queue representing a group of resources with attributes necessary for the queue's jobs. Commonly used PBS tools include qsub, for job submission; qstat, for monitoring the status of jobs; and qdel, to terminate jobs prior to completion. More detailed information regarding these commands and others is available below, or in the documentation described earlier.
Policy
Interactive jobs, roughly defined as jobs that are not managed by PBS, and therefore that will run on the user nodes, are limited to 20 minutes of CPU time. Any job requiring more than 20 minutes should be submitted to a PBS queue and run on the cluster compute nodes. Monitoring scripts on the user nodes will kill processes exceeding 20 minutes of wallclock time.
Queues
Quarry compute node resources are divided into four queues:
- defaultq - the "default queue", includes 64 Quarry compute nodes
- fastq - "fast queue", includes 4 Quarry compute nodes
- osg - "Open Science Grid queue", includes 16 Quarry compute nodes
- lead - "LEAD queue", includes 16 Quarry compute nodes
Jobs submitted from Quarry login nodes are automatically sent to defaultq. The fastq is intended for test jobs requiring 30 minutes of walltime or less.
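The queue choice described above can be sketched as a small shell helper. This is only an illustration: pick_queue is a hypothetical function, not a Quarry tool, and it considers nothing but the 30-minute walltime guideline for fastq.

```shell
#!/bin/bash
# pick_queue MINUTES (hypothetical helper): suggest a queue from the
# estimated walltime in minutes. fastq is meant for jobs of 30 minutes
# or less; everything else goes to defaultq.
pick_queue() {
    if [ "$1" -le 30 ]; then
        echo fastq
    else
        echo defaultq
    fi
}

# Print the qsub command one might run for a 20-minute test job
echo "qsub -q $(pick_queue 20) -l walltime=00:20:00 job.script"
```

The snippet only prints the command, so it is safe to run on any machine, including one without PBS installed.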
Jobs
Scripts
PBS most commonly handles job scripts, although interactive jobs are also supported. A job script may be as simple as a bash or tcsh shell script, but also may include a number of PBS job directives. In any case, it is always recommended that PBS job scripts, which will be executed under your preferred login shell, begin with a "sha-bang" line specifying which command interpreter it should run under. For example:
#!/bin/bash
PBS directives, which are lines beginning with the string #PBS, include switches for specifying such useful information as walltime required to complete the job, number of nodes and processors necessary, and filenames for job output and error. An example PBS job script might look like this:
#!/bin/bash
#PBS -k o
#PBS -l nodes=4:ppn=2,cput=4:00:00,walltime=30:00
#PBS -M username@indiana.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
mpiexec -np 8 -machinefile $PBS_NODEFILE ~/bin/binaryname
Line by line, this script says:
- use bash as the command interpreter for this script
- job output should be kept
- this job requires 4 nodes, 2 processors per node, 4 hours of CPU time and 30 minutes of wall clock time
- send job-related email to username@indiana.edu
- send email if the job is aborted (a), begins (b) and ends (e)
- the job name is JobName
- standard output and standard error should be joined
- execute ~/bin/binaryname on 8 processors from the machines in $PBS_NODEFILE using mpiexec
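The script above hard-codes the processor count (8) to match nodes=4:ppn=2. One possible variation, sketched here, derives the count from $PBS_NODEFILE instead, assuming that PBS writes one line per allocated processor to that file. count_procs is a hypothetical helper name, and the directives are trimmed to the minimum for illustration.

```shell
#!/bin/bash
#PBS -l nodes=4:ppn=2,walltime=30:00
#PBS -j oe

# count_procs FILE (hypothetical helper): number of processor slots
# listed in a PBS node file, assuming one line per allocated processor.
count_procs() {
    # tr strips the padding some wc implementations add
    wc -l < "$1" | tr -d ' '
}

# PBS sets PBS_NODEFILE inside a job; skip the launch anywhere else
if [ -n "${PBS_NODEFILE:-}" ]; then
    mpiexec -np "$(count_procs "$PBS_NODEFILE")" \
        -machinefile "$PBS_NODEFILE" ~/bin/binaryname
fi
```

With this approach, changing the nodes or ppn request does not require editing the mpiexec line as well.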
Submission
Submit jobs with the qsub command. If the command exits successfully, a job id will be returned to standard output. For example:
[jdoe@Quarry]$ qsub job.script
123456
[jdoe@Quarry]$
If you require attribute values different from the defaults, but less than the maximum allowed, specify these either in the job script with PBS directives, or on the command line with the -l switch. For example, to submit a job that needs more than the default two hours of walltime on Quarry:
qsub -l walltime=10:00:00 job.script
There are a couple of things to note here. First, command line arguments override directives in the job script, and second, you may specify many attributes on the command line, either as comma-separated options following the -l switch, or each with its own -l switch. The following two commands are equivalent:
qsub -l cput=01:30:00,ncpus=16,mem=1024mb job.script
qsub -l cput=01:30:00 -l ncpus=16 -l mem=1024mb job.script
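When submissions are scripted, it can help to build the comma-separated resource list programmatically and to keep the job id that qsub prints. The following is a minimal sketch: join_resources is a hypothetical helper, and the actual submission is attempted only where qsub and a file named job.script exist.

```shell
#!/bin/bash
# join_resources ATTR... (hypothetical helper): join resource requests
# into the single comma-separated form accepted by one -l switch.
# The body runs in a subshell so the IFS change does not leak out.
join_resources() (
    IFS=,
    printf '%s\n' "$*"
)

resources=$(join_resources cput=01:30:00 ncpus=16 mem=1024mb)
echo "qsub -l $resources job.script"

# Submit only where qsub and the script are present, and capture the
# job id that qsub writes to standard output
if command -v qsub >/dev/null 2>&1 && [ -f job.script ]; then
    jobid=$(qsub -l "$resources" job.script)
    echo "submitted as $jobid"
fi
```

The captured job id can later be passed to qstat or qdel.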
Useful qsub switches include:
- -q queue_name (to specify a non-default queue)
- -r (job is rerunnable)
- -a date_time (only execute the job after date_time)
- -V (export environment variables in qsub command's environment to the job)
- -I (run interactively, usually for testing purposes)
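A few of these switches in use are sketched below. The -a switch takes a timestamp of the form CCYYMMDDhhmm (shorter forms are also accepted); pbs_datetime is a hypothetical helper that produces it from a "YYYY-MM-DD HH:MM" string. The qsub lines are left as comments so the snippet is harmless on a machine without PBS.

```shell
#!/bin/bash
# pbs_datetime "YYYY-MM-DD HH:MM" (hypothetical helper): strip the
# separators to get the CCYYMMDDhhmm form that qsub -a accepts.
pbs_datetime() {
    printf '%s\n' "$1" | tr -d ' :-'
}

start=$(pbs_datetime '2030-01-02 03:04')
echo "$start"

# Example invocations:
#   qsub -q fastq -V job.script                  # non-default queue, export environment
#   qsub -a "$start" job.script                  # do not start before the given time
#   qsub -I -q fastq -l walltime=00:20:00        # short interactive test session
```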
Monitoring
The qstat command is useful for monitoring the status of a queued or running job. Switches include:
- -u user_list (display jobs for users in user_list)
- -a (display all jobs)
- -r (display running jobs)
- -f (display full listing of jobs, excessive detail)
- -n (display nodes allocated to jobs)
For example, to see all the running jobs in the Quarry defaultq, type this at a Quarry shell prompt:
qstat -r defaultq | less
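For scripted monitoring, the state of a single job can be pulled out of the qstat -f listing, which contains a line of the form "job_state = R". The sketch below assumes that format; job_state and wait_for_job are hypothetical helpers, and the 60-second polling interval is arbitrary.

```shell
#!/bin/bash
# job_state: read `qstat -f JOBID` output on stdin and print the value
# of its job_state line (for example Q for queued, R for running).
job_state() {
    awk -F' = ' '/job_state/ { print $2 }'
}

# wait_for_job JOBID (hypothetical helper): poll until the job is
# completed (state C) or no longer known to the server (empty state).
wait_for_job() {
    while state=$(qstat -f "$1" 2>/dev/null | job_state) &&
          [ -n "$state" ] && [ "$state" != "C" ]; do
        sleep 60
    done
}

# Usage on the cluster:  wait_for_job 123456 && echo "job finished"
```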
Deleting
You may delete queued or running jobs with the qdel command. Occasionally, a node will become unresponsive to the point that it cannot respond to the PBS server's requests that a job be killed. In that case, try adding the -W force option to qdel. Otherwise, contact High Performance Systems, hps-admin@iu.edu, for assistance.
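The two-step approach described above (a normal qdel first, then -W force if that fails) might be wrapped as follows. This is a sketch only: delete_job is a hypothetical helper, and the QDEL variable exists solely so the command can be replaced for a dry run.

```shell
#!/bin/bash
# Command used to delete jobs; override QDEL (for example QDEL=echo)
# to see what would be run without touching any job.
QDEL=${QDEL:-qdel}

# delete_job JOBID (hypothetical helper): try a normal delete first and
# fall back to the -W force option only if that fails.
delete_job() {
    $QDEL "$1" || $QDEL -W force "$1"
}

# Usage on the cluster:  delete_job 123456
```

If the forced delete also fails, the function returns a non-zero status, which is the point at which to contact the administrators.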
Default and Maximum Configurations
Server-wide
- Default Walltime: 2 hours
- Default CPUs per job: 1
- Default Nodes per job: 1
Queue-specific
defaultq
- Maximum Walltime: 336 hours (14 days)
- Maximum jobs per user: 16
- Maximum Walltime: 336 hours (14 days)
- Maximum Walltime: 336 hours (14 days)
- Maximum CPUs per job: 8
- Maximum Nodes per job: 2
- Maximum Walltime: 30 minutes
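A walltime request can be checked against a limit before submission. The sketch below converts a PBS-style walltime ([[HH:]MM:]SS) to seconds and compares it with the 336-hour defaultq maximum listed above. Both function names are hypothetical, and the check is purely local: the PBS server still has the final say.

```shell
#!/bin/bash
# walltime_to_seconds TIME: convert HH:MM:SS, MM:SS or SS to seconds.
# awk is used so that fields such as "08" are read as decimal numbers.
walltime_to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# defaultq maximum walltime: 336 hours, expressed in seconds
MAX_DEFAULTQ=$((336 * 3600))

# within_defaultq_limit TIME: succeed if TIME does not exceed the maximum
within_defaultq_limit() {
    [ "$(walltime_to_seconds "$1")" -le "$MAX_DEFAULTQ" ]
}

if within_defaultq_limit 10:00:00; then
    echo "10:00:00 fits within the defaultq limit"
fi
```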




