The AVIDD-O Cluster
Maui
Maui is an advanced job scheduler for clusters and supercomputers. It is highly optimized and configurable, supporting a wide range of scheduling and fairness policies, dynamic priorities, and extensive reservations, and it is described by many as 'the most advanced scheduler in the world'. It is in use at hundreds of leading government, academic, and commercial sites worldwide, where it improves the manageability and efficiency of machines ranging from clusters of a few processors to multi-teraflop supercomputers.
On the AVIDD-O cluster, Maui serves as the job scheduler for the Torque resource manager. Once a job has been submitted to a Torque queue, it may become eligible for dispatch by Maui. Several Maui commands report useful information about a queued or running job:
- showq - display active, idle, or all jobs
- showstart <jobid> - display estimated dispatch time for <jobid>
- checkjob <jobid> - display attributes for <jobid>
For more information about these commands as well as other Maui utilities, please see the Maui User's Manual.
Fairshare Scheduling
Fairshare scheduling allows historical resource usage to be considered when making job priority decisions.
Administrators can set target utilization goals for each user, group, class, or service group. When one usage class exceeds its goal, jobs from other usage classes take precedence over jobs from the offending class.
Currently, the AVIDD fairshare policy records usage over the last 7 days in one-day intervals and applies a decay factor of 0.80 per day: usage from each earlier day counts 80% as much as usage from the day after it (weights of 1.0, 0.8, 0.64, and so on). Each usage class (usually a username) has a target of 20% usage. Usage above that target lowers the scheduling priority of that user's jobs.
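To make the decay concrete, the short sketch below computes the weight applied to each of the seven daily intervals, assuming a decay factor of 0.80 and that interval 0 is the most recent day. The function name fairshare_weights is hypothetical; the code only reproduces the arithmetic behind the FSWeight row of the diagnose output, not Maui itself.

```python
# Hypothetical helper: reproduces the per-interval weights shown in the
# FSWeight row of `diagnose -f` (decay factor 0.80, 7 daily intervals).
DECAY = 0.80
DEPTH = 7


def fairshare_weights(decay: float = DECAY, depth: int = DEPTH) -> list:
    """Weight for interval i (0 = most recent day) is decay ** i."""
    return [decay ** i for i in range(depth)]


if __name__ == "__main__":
    for day, weight in enumerate(fairshare_weights()):
        print(f"interval {day}: weight {weight:.4f}")
```

With these settings, usage from six days ago counts only about a quarter (0.2621) as much as today's usage, so a heavy user's priority recovers gradually rather than all at once.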
The diagnose -f command displays the fairshare usage table. In the output below, users baikgrp and dsheen have exceeded their 20% fairshare target, so their jobs will receive lower priority over the coming week as that usage decays out of the 7-day window.
[root@ih1 maui-3.2.6]# diagnose -f
FairShare Information
Depth: 7 intervals Interval Length: 1:00:00:00 Decay Rate: 0.80
FS Policy: DEDICATEDPS
System FS Settings: Target Usage: 0.00 Flags: 0
FSInterval % Target 0 1 2 3 4 5 6
FSWeight ------- ------- 1.0000 0.8000 0.6400 0.5120 0.4096 0.3277 0.2621
TotalUsage 100.00 ------- 1872.2 1605.8 631.7 1868.0 3222.6 1857.5 1439.1
USER
-------------
haiyang* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
baikgrp* 45.91 20.00 81.11 45.57 79.98 49.70 4.88 20.49 10.79
balin* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
akewalra* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
kevidale* 0.25 20.00 ------- ------- ------- 0.23 0.74 0.78 -------
dlauer* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
kmane* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
bramley* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
qzou* 0.18 20.00 ------- ------- ------- 0.05 0.34 0.53 1.01
mathess* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
iyengar* 0.54 20.00 ------- ------- ------- ------- 0.63 2.58 3.34
pewang* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
rrepasky* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
agopu* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
heap* 0.02 20.00 0.09 ------- ------- ------- ------- ------- -------
vsingan* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
huili* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
dsheen* 39.26 20.00 14.97 43.03 ------- 33.74 86.48 62.68 -------
turnerg* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
ejolson* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
ssrivast* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
smiddha* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
mburland* 4.90 20.00 0.17 5.37 4.83 11.14 3.11 5.19 16.89
febertra* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
lsandvos* 0.01 20.00 ------- 0.06 ------- ------- ------- ------- -------
mswat* 0.00 20.00 ------- ------- ------- ------- ------- ------- -------
acolubri* 5.72 20.00 3.67 5.97 15.20 5.14 3.75 7.75 10.01
mbaik* 3.22 20.00 ------- ------- ------- ------- 0.07 ------- 57.96
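The "%" column can be reproduced from the per-interval columns. The sketch below assumes that each interval value is the user's percentage of that interval's TotalUsage and that intervals are weighted by 0.80 ** i; the TotalUsage row and the two user rows are copied from the table above, and "-------" (no usage) is represented as None. The function name weighted_usage is hypothetical.

```python
# Sketch: recompute a user's decay-weighted fairshare usage (the "%" column
# of `diagnose -f`) from the per-interval percentages.
# Assumption: each interval percentage is the user's share of that
# interval's TotalUsage, and interval i is weighted by DECAY ** i.
DECAY = 0.80
TOTAL_USAGE = [1872.2, 1605.8, 631.7, 1868.0, 3222.6, 1857.5, 1439.1]
TARGET = 20.0  # per-user fairshare target, in percent


def weighted_usage(percents, totals=TOTAL_USAGE, decay=DECAY):
    """Return weighted usage in percent; None marks an interval with no usage."""
    num = 0.0
    den = 0.0
    for i, (pct, total) in enumerate(zip(percents, totals)):
        weighted_total = (decay ** i) * total
        den += weighted_total
        if pct is not None:
            num += weighted_total * pct / 100.0
    return 100.0 * num / den


baikgrp = [81.11, 45.57, 79.98, 49.70, 4.88, 20.49, 10.79]
dsheen = [14.97, 43.03, None, 33.74, 86.48, 62.68, None]

for name, row in [("baikgrp", baikgrp), ("dsheen", dsheen)]:
    usage = weighted_usage(row)
    status = "over target" if usage > TARGET else "within target"
    print(f"{name}: {usage:.2f}% ({status})")
```

This yields roughly 45.91% for baikgrp and 39.26% for dsheen, matching the table, which suggests the column is a decay-weighted share of total cluster usage rather than a simple 7-day average.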
When will your job start?
Maui uses job priorities, which take the fairshare table into account, to decide which jobs get the next free processors. The showq command shows the state of submitted jobs:
[root@ih1 maui-3.2.6]# showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
17164 heap Running 1 2:29:45 Wed Sep 17 10:57:18
17182 heap Running 1 2:41:49 Wed Sep 17 11:09:22
17185 heap Running 1 2:43:49 Wed Sep 17 11:11:22
17194 heap Running 1 2:49:51 Wed Sep 17 11:17:24
17195 heap Running 1 2:50:31 Wed Sep 17 11:18:04
17196 heap Running 1 2:51:11 Wed Sep 17 11:18:44
17197 heap Running 1 2:51:52 Wed Sep 17 11:19:25
17198 heap Running 1 2:52:32 Wed Sep 17 11:20:05
17199 heap Running 1 2:53:12 Wed Sep 17 11:20:45
17200 heap Running 1 2:53:52 Wed Sep 17 11:21:25
17201 heap Running 1 2:54:32 Wed Sep 17 11:22:05
17202 heap Running 1 2:55:13 Wed Sep 17 11:22:46
17203 heap Running 1 2:55:53 Wed Sep 17 11:23:26
17204 heap Running 1 2:56:33 Wed Sep 17 11:24:06
17205 heap Running 1 2:57:13 Wed Sep 17 11:24:46
.
.
.
104 Active Jobs 104 of 182 Processors Active (57.14%)
53 of 91 Nodes Active (58.24%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
16672 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:05
16673 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16674 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16675 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16676 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16677 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16678 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16679 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
16680 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16681 ejolson Idle 1 16:16:00:00 Tue Sep 16 23:27:06
16682 ejolson Idle 1 33:08:00:00 Tue Sep 16 23:27:06
16683 ejolson Idle 1 33:08:00:00 Tue Sep 16 23:27:07
12 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 116 Active Jobs: 104 Idle Jobs: 12 Blocked Jobs: 0
The job at the top of the IDLE JOBS list will run next when enough resources are available. Reservations can keep waiting jobs from starting if they have blocked off resources those jobs would need. Use the showres command to list current reservations.
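If you want to pick out the idle queue order in a script, a minimal sketch is to parse the text that showq prints. The function idle_jobs and the SAMPLE text are hypothetical; the column layout (JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME) is assumed from the sample output above and may differ in other Maui versions.

```python
# Hypothetical helper: extract the IDLE JOBS section of captured `showq`
# output, preserving the order in which showq lists the jobs.
# Assumes the column layout shown in the sample session above.
SAMPLE = """ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
17164 heap Running 1 2:29:45 Wed Sep 17 10:57:18
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
16672 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:05
16673 ejolson Idle 1 8:08:00:00 Tue Sep 16 23:27:06
12 Idle Jobs
BLOCKED JOBS----------------
"""


def idle_jobs(showq_text: str) -> list:
    """Return (jobname, user, procs, wclimit) tuples for idle jobs, in order."""
    jobs = []
    in_idle = False
    for line in showq_text.splitlines():
        if line.startswith("IDLE JOBS"):
            in_idle = True
            continue
        if line.startswith("BLOCKED JOBS"):
            break  # idle section ends where the blocked section begins
        if not in_idle:
            continue
        fields = line.split()
        # Data rows look like: 16672 ejolson Idle 1 8:08:00:00 Tue Sep 16 ...
        if len(fields) >= 5 and fields[0].isdigit() and fields[2] == "Idle":
            jobs.append((fields[0], fields[1], int(fields[3]), fields[4]))
    return jobs


if __name__ == "__main__":
    print(idle_jobs(SAMPLE)[0])
```

On the cluster you could feed it live output, for example subprocess.run(["showq"], capture_output=True, text=True).stdout. The first entry is the job expected to start next, subject to free resources and any reservations reported by showres.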
But when will your job really start?
To find the estimated start time of a particular job try:
showstart $JOBID
[root@ih1 maui-3.2.6]# showstart 16672
job 16672 requires 1 proc for 8:08:00:00
Earliest start in 5:03:54:32 on Mon Sep 22 17:00:00
Earliest completion in 13:11:54:32 on Wed Oct 1 01:00:00
Best Partition: DEFAULT
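The showstart figures are consistent with simple time arithmetic: the earliest completion is the earliest start plus the job's wallclock limit (8:08:00:00, in days:hours:minutes:seconds). The sketch below checks this; the function name parse_maui_duration is hypothetical, and because showstart does not print the year, 2003 is assumed purely for illustration.

```python
from datetime import datetime, timedelta


def parse_maui_duration(text: str) -> timedelta:
    """Parse Maui-style [[[DD:]HH:]MM:]SS durations such as '8:08:00:00'."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected duration format: {text!r}")
    # Pad on the left so there are always days, hours, minutes, seconds.
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    wclimit = parse_maui_duration("8:08:00:00")  # job 16672's wallclock limit
    start_in = parse_maui_duration("5:03:54:32")  # "Earliest start in ..."
    start = datetime(2003, 9, 22, 17, 0, 0)  # year assumed; not shown by showstart
    print("completion:", start + wclimit)  # earliest completion time
    print("completion in:", start_in + wclimit)  # matches 13:11:54:32
    print("showstart run at:", start - start_in)  # implied time of the query
```

Keep in mind that these are estimates: the start time can move earlier if running jobs finish before their limits, or later if higher-priority jobs are submitted.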




