login.deepthought.umd.edu.
Here's an example of a simple script, which we'll call test.sh:
#PBS -lwalltime=1:00
#PBS -lncpus=4
hostname
date
The first two lines specify parameters to the scheduler. The first, walltime, specifies the maximum amount of time you expect your job to run, in the form HH:MM:SS. If you leave off leading fields, the ones you provide are taken to be the smallest units, so a walltime of 1:00 is one minute. Give a reasonable estimate: if the value is too large, your job may not be scheduled appropriately, and if it is too small, your job will be terminated before it completes.
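As a quick check on how a walltime string is interpreted, the following is a minimal sketch in POSIX shell. The function name walltime_to_seconds is hypothetical and not part of the scheduler; it only mirrors the rule described above, where missing leading fields count as zero.

```shell
# Convert a walltime string (SS, MM:SS, or HH:MM:SS) to seconds.
# Missing leading fields are treated as zero, so "1:00" means one minute.
walltime_to_seconds() {
    # Split on ":" and fold left to right: total = total * 60 + field.
    echo "$1" | awk -F: '{ t = 0; for (i = 1; i <= NF; i++) t = t * 60 + $i; print t }'
}

walltime_to_seconds 1:00       # prints 60
walltime_to_seconds 00:00:60   # prints 60
walltime_to_seconds 72:00:00   # prints 259200
```

Running a value through a helper like this before submitting can catch a walltime that is off by a factor of 60.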
The second line requests processors. In the nodes and ppn form used
in the examples section (for instance nodes=10:ppn=4), nodes tells
the scheduler how many nodes you want your job to run on, and ppn
defines how many processors you need on each node. Currently the
definition of nodes is rather misleading: if you specify a ppn value
smaller than the number of processors on a given node, you may end up
with fewer actual nodes than you specify. For example, if you specify
nodes=2:ppn=2 and a 4 CPU machine is available, you'll be allocated
4 CPUs on that one node. If you want to be sure to get multiple
machines, specifying ppn=4 is your best bet.
For a more detailed method of specifying CPU/machine requirements,
check out the examples section.
The remaining lines in the file are just standard commands; you will
replace them with whatever your job requires. In this case, once the
job runs, it will print the hostname and the date to the output file.
By default the script will be run in whatever shell you use to log in
to the cluster, so if your normal shell is tcsh then the
script will be run inside tcsh. If you want to change
this, check out the examples section.
To submit your job, pick a queue that fits your needs and pass it to qsub. We'll choose the serial queue for this test. (The serial queue is the default queue, but for this example we'll specify it anyway.)
deepthought:~: qsub -q serial test.sh
4178.deepthought.umd.edu
The number that is returned to you is the identifier for the job, and you should use that anytime you want to find out more information about your job. For information on how to verify that your job is running, see the section Monitoring Your Jobs.
Once your job completes, unless you've specified otherwise, your
output and any errors that occur will be written to two files in the
same directory from which you submitted your job. The files will be
named with the same name as your job script, with .eNNNN
and .oNNNN appended, where the Ns are replaced by the job
identifier.
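To make the naming rule concrete, here is a small sketch in POSIX shell. The helper name job_output_files is hypothetical; it assumes the identifier printed by qsub has the form NNNN.hostname, as in the example above, and that the default file names have not been overridden.

```shell
# Derive the default output and error file names for a job.
# $1 = job script name, $2 = identifier printed by qsub
job_output_files() {
    num=${2%%.*}    # keep only the numeric part before the first "."
    echo "$1.o$num $1.e$num"
}

job_output_files test.sh 4178.deepthought.umd.edu   # prints: test.sh.o4178 test.sh.e4178
```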
Note that by default when you log in to the cluster, you are sitting in your home directory, and all output and submissions will be transferred to and from your home directory. For best performance, you should consider running your jobs from a space set aside for them. See Files and Storage and the qsub example on Running Your Job in a Different Directory for more information.
Here's what you should see when your job completes:
deepthought:~: cat test.sh.o4178
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
compute-2-39.deepthought.umd.edu
Mon Jan 22 11:13:09 EST 2007
deepthought:~: cat test.sh.e4178
term: Undefined variable.
As you can see in the output files above, the script ran and printed the hostname and date as specified by the job script. The few error messages that you see above are expected and can be ignored.
In addition to queue priorities, users of the cluster with paid
allocations (users that contribute money or resources to the cluster)
get priority over non-paying users. All users are provided with a
certain number of service units (SUs) as determined by the
HPCC Allocations and
Advisory Committee. In addition, "free" usage of the cluster is
provided to users with paid or non-paid allocations, assuming cycles
are available. "Free" jobs run at low priority and will be preempted
(evicted) if a higher-priority job comes along. To specify a queue,
use the -q option to qsub. Note:
paid users will also need to specify their high-priority account in
order to take advantage of their elevated priority. If no account is
specified, the default priorities will be used. See the section Job Accounting for information on how to
specify an alternate account.
The queues are as follows:
| queue | nodes (min) | nodes (max) | wallclock (min) | wallclock (max) | priority | notes |
|---|---|---|---|---|---|---|
| debug | | 2 | | 15 min | high | always available; use for interactive jobs |
| wide-debug | 5 | 100% | | 15 min | high | |
| narrow-med | | 20% | | 8 hr | med | |
| wide-short | 5 | 100% | | 2 hr | med | |
| narrow-long | | 20% | | 3 days | low | |
| narrow-extended | | 25% | | 2 weeks | low | paid allocations only |
| med-extended | 5 | 50% | | 1 week | low | paid allocations only |
| wide-med | 5 | 100% | | 8 hr | low | |
| ib | 2 | 100% | | 1 week | high | InfiniBand connected hosts |
| serial | | 100% | | unlimited | very low | free; preemptable |
When your job runs, the scheduler sets the environment variable
$PBS_NODEFILE, which contains the name of a file that
lists all of the nodes that you've been assigned.
OpenMPI reads $PBS_NODEFILE automatically, and as such you don't need to include it on the
command line. OpenMPI is also compiled to support all of the various
interconnect hardware, so for nodes with fast transport (InfiniBand/Myrinet),
the fastest interface will be selected automatically.
The following example will run the MPI executable alltoall on
each of four processors on ten different nodes. Note that you will
need to add the command tap -q openmpi-gnu to your
.cshrc.mine file to set up your
environment properly to run OpenMPI. For further information on the
tap command check out the section Setting
Up Your Environment.
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:00:60
mpirun -np 40 alltoall
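The process count passed to mpirun -np in the example above is the product of the nodes and ppn values (10 x 4 = 40). The sketch below, in POSIX shell, shows that arithmetic for an arbitrary request string. The function name total_procs is hypothetical, and treating a missing nodes or ppn value as 1 is an assumption made for this illustration.

```shell
# Compute the total process count for a request such as "nodes=10:ppn=4".
# ASSUMPTION: a missing nodes or ppn value is treated as 1.
total_procs() {
    nodes=$(echo "$1" | sed -n 's/.*nodes=\([0-9][0-9]*\).*/\1/p')
    ppn=$(echo "$1" | sed -n 's/.*ppn=\([0-9][0-9]*\).*/\1/p')
    echo $(( ${nodes:-1} * ${ppn:-1} ))
}

total_procs nodes=10:ppn=4   # prints 40
```

Keeping the request string and the -np value in step this way avoids starting more (or fewer) processes than the processors you were allocated.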
The following example will run the MPI executable alltoall on
each of four processors on ten different nodes. Note that you will
need to add the command tap -q lam-gnu (or one of the other
MPI flavors) to your .cshrc.mine file to set up your
environment properly to run LAM. For further information on the
tap command check out the section Setting
Up Your Environment.
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:00:60
lamboot $PBS_NODEFILE
mpirun C alltoall
lamhalt
If you see errors in your output of the form "LAM failed to execute a
LAM binary on the remote node X", it is most likely because you failed
to add the appropriate tap command to your
.cshrc.mine file.
The following example will run the MPI executable alltoall on
each of four processors on ten different nodes. Note that you will
need to add the command tap -q mpich-gnu (or one of the other
MPI flavors) to your .cshrc.mine file to set up your
environment properly to run MPICH. For further information on the
tap command check out the section Setting
Up Your Environment.
Note also that if you've never run MPICH before, you'll need to create the file .mpd.conf in your home directory. This file should contain at least a line of the form MPD_SECRETWORD=we23jfn82933. (DO NOT use the example provided; make up your own secret word.)
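One way to create this file with a secret word of your own is sketched below in POSIX shell. The function name make_mpd_conf and its optional path argument are hypothetical. The sketch also restricts the file to owner-only access, on the assumption that mpd expects its configuration file to be private.

```shell
# Create an mpd configuration file containing a random secret word.
# $1 = target path (hypothetical parameter; defaults to ~/.mpd.conf)
make_mpd_conf() {
    conf=${1:-$HOME/.mpd.conf}
    # 16 random bytes rendered as 32 hexadecimal characters
    secret=$(dd if=/dev/urandom bs=16 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # Write the file inside a subshell so the stricter umask does not leak out
    ( umask 077; echo "MPD_SECRETWORD=$secret" > "$conf" )
    chmod 600 "$conf"    # ASSUMPTION: mpd wants the file readable by the owner only
}

# Example (overwrites any existing file at that path):
#   make_mpd_conf ~/.mpd.conf
```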
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:00:60
mpdboot -n 10 -f $PBS_NODEFILE
mpiexec -n 40 alltoall
mpdallexit
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:00:60
foreach node (`cat $PBS_NODEFILE`)
  ssh $node hostname
end
And if your shell is sh/ksh/bash, use this:
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:00:60
for node in `cat $PBS_NODEFILE`; do
  ssh $node hostname
done
#PBS -l nodes=2:ppn=4:mhz3000
#PBS -l walltime=00:00:60
myjob
nodes=2:ppn=2, both of these jobs can be scheduled
simultaneously onto the same 4-processor machine.
If you want to request a specific amount of memory for your job, try something like the following:
#PBS -l nodes=1:ppn=4
#PBS -l mem=1024mb
myjob
This example requests a single 4 processor node with 1GB (1024MB) of memory.
Most of the nodes currently have at least 30GB of scratch space, and some have as much as 250GB available. Scratch space is currently mounted as /tmp. Scratch space will be cleared once your job completes.
The following example specifies a scratch space requirement of 5GB. Note however that if you do this, the scheduler will set a filesize limit of 5GB. If you then try to create a file larger than that, your job will automatically be killed, so be sure to specify a size large enough for your needs.
#PBS -l nodes=1:ppn=4
#PBS -l file=5gb
myjob
If you want to be notified via email when your job completes, you can
add the -mXX option to your description file. If you want
to receive mail when the job starts, replace the Xs with the
letter b. If you want to receive mail when your job
completes, replace the Xs with the letter e. You
may add both letters if you like, and you'll get two email messages. By
default, you will always be sent email if your job is aborted by the
scheduler for any reason. The completion email will tell you the exit
status of your job as well as the amount of resources the job
consumed. Note that the CPU time and memory usage numbers provided in
this email are unreliable at best. The email messages by default will
be sent to your Glue account. If you'd like them to go elsewhere, you
can add the -M option followed by a comma-separated list
of usernames.
#PBS -l walltime=00:00:60
#PBS -mbe -Mbob@myhost.com,jane@yourhost.com
date
To run your job in a different shell, add the
-S option to your description file. Also note that when
using the bash shell, you must explicitly run your
.profile script, as it is not run for you automatically.
If you have tap commands in your submit script, this is
especially important because tap is defined in
.profile. If you're using tcsh you don't
need to worry about this.
The following example changes to using /bin/bash as the
execution shell.
#PBS -lwalltime=00:01:00
#PBS -S /bin/bash
. ~/.profile   # only needed for bash shell
date
hostname
Even if your current directory is
/data/dt-raid5/bob/my_program when you submit your
job, the job will look in your home directory for any
files that don't have a full pathname specified. To change this
behavior, you'll need to add the -d argument to your job
description file.
Also note that if you are using MPI, you may also need to add either
the -wd option for LAM (mpirun) or the
-wdir option for MPICH (mpiexec) to specify the
working directory.
The following example (using LAM) switches the working directory to
/data/dt-raid5/bob/my_program.
#PBS -lwalltime=00:01:00
#PBS -d /data/dt-raid5/bob/my_program
lamboot $PBS_NODEFILE
mpirun -wd /data/dt-raid5/bob/my_program C alltoall
lamhalt
To specify your estimated runtime, use the walltime
parameter. This value should be specified in the form
HHH:MM:SS. Note that if your job is expected to run over
multiple days, simply convert the number of days into hours; for
example, a 3 day job would have a walltime value of 72:00:00.
You may leave off the leading digits if you like, so a walltime of
15:00 will represent 15 minutes. Note also that while the
scheduler may show walltimes in the form DD:HH:MM:SS when you
view the queue status, this format will not be accepted when you
submit a job.
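Because the DD:HH:MM:SS form is not accepted at submission, multi-day estimates have to be folded into hours first. The following POSIX shell sketch does that conversion. The function name to_walltime and its argument order (days, hours, minutes) are hypothetical choices for this illustration.

```shell
# Build an HHH:MM:SS walltime from days, hours, and minutes.
# $1 = days, $2 = hours (optional), $3 = minutes (optional)
to_walltime() {
    days=${1:-0}; hours=${2:-0}; minutes=${3:-0}
    # Fold days into the hours field; seconds are always written as 00
    printf '%02d:%02d:00\n' $(( days * 24 + hours )) "$minutes"
}

to_walltime 3        # prints 72:00:00
to_walltime 0 0 15   # prints 00:15:00
```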
If you do not specify a walltime, the default (maximum) permitted walltime for the queue will be used. See the section entitled Choosing a Queue for more information on queues and their assigned limits.
The following example specifies a walltime of 60 seconds, which should be more than enough for the job to complete.
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:00:60
hostname
To see whether your job is running, use the command showq. For example:
deepthought:~: showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
4178 kevin Running 4 00:01:00 Mon Jan 22 11:13:09
1 Active Job 4 of 236 Processors Active (1.69%)
1 of 59 Nodes Active (1.69%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 1 Active Jobs: 1 Idle Jobs: 0 Blocked Jobs: 0
If your job shows up in the ACTIVE JOBS section as shown above,
your job should be off and running.
If your job shows up in the
IDLE JOBS section, that means that there currently are
insufficient resources available to run your job. Check to make sure
you haven't requested more processors than you need, and that you've
specified a reasonable walltime. If you see lots of jobs in the
ACTIVE JOBS section, it's probable that you'll just need to
wait for someone else's job to finish before yours can start.
If your job shows up in the BLOCKED JOBS section, it most
likely means that you did not have a sufficient amount of time
remaining in your CPU allocation to run the job. Either specify a
smaller walltime, or obtain an additional allocation. See the
section Diagnosing Job Problems for further
information.
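To find out which of the three sections a particular job is in without scanning the listing by eye, a filter along the following lines may help. This is a sketch in POSIX shell and awk that assumes the showq layout shown above (section headers beginning with ACTIVE JOBS, IDLE JOBS, or BLOCKED JOBS, and the job identifier in the first column). The function name job_section is hypothetical.

```shell
# Report which showq section (ACTIVE, IDLE, or BLOCKED) lists a given job ID.
# Reads showq-style text on standard input; $1 = job ID.
job_section() {
    awk -v id="$1" '
        /^(ACTIVE|IDLE|BLOCKED) JOBS/   { section = $1; next }   # section header
        $3 ~ /^Jobs?$/ || $2 == "of"    { next }                 # skip summary lines
        $1 == id                        { print section; found = 1; exit }
        END                             { if (!found) print "NOT FOUND" }
    '
}

# Example, on the cluster:
#   showq | job_section 4178
```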
To find out more detailed information about your job, use the
checkjob command. This command will show you which
specific nodes were allocated to your job, and it will also show you
the job requirements you specified when you submitted the job.
deepthought:~: checkjob 4209
checking job 4209
State: Running
Creds: user:kevin group:wheel account:kevin class:serial qos:serial
WallTime: 00:00:00 of 00:01:00
SubmitTime: Tue Jan 23 10:33:55
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Tue Jan 23 10:33:56
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [prod]
Allocated Nodes:
[compute-2-39.deeptho:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: RESTARTABLE PREEMPTEE PREEMPTOR
Attr: PREEMPTEE
Reservation '4209' (00:00:00 -> 00:01:00 Duration: 00:01:00)
PE: 1.00 StartPriority: 200
If you want to view the output of your job while it is running, you
can use the command qpeek. This command can be used to view
both the standard output and standard error streams from your job, and
can also be used to follow the output as it occurs.
deepthought:~: qpeek
qpeek: Peek into a job's output spool files
Usage: qpeek [options] JOBID
Options:
-c Show all of the output file ("cat", default)
-h Show only the beginning of the output file ("head")
-t Show only the end of the output file ("tail")
-f Show only the end of the file and keep listening ("tail -f")
- |
To cancel a job, use the canceljob
command.
deepthought:~: canceljob 7274
job '7274' cancelled
To check your remaining allocation, use the command mybalance.
deepthought:~: mybalance
Project  Machines Balance
-------- -------- --------
test     ANY      12093571
test-hi  ANY      71999976
This shows you the balance remaining (in seconds) in all of the accounts to which you are authorized to charge. The account with the -hi suffix is your high-priority account.
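Balances in seconds are hard to read at a glance, so the sketch below (POSIX shell) converts one to whole hours and estimates what a job might cost. Both function names are hypothetical. The charge formula, processors multiplied by walltime in seconds, is an assumption made for illustration; the text above does not state how jobs are charged.

```shell
# Convert an allocation balance in seconds to whole hours (rounded down).
balance_hours() {
    echo $(( $1 / 3600 ))
}

# Estimate the charge for a job.
# $1 = number of processors, $2 = walltime in seconds
# ASSUMPTION: charge = processors x walltime seconds; not confirmed by the text above.
estimated_charge() {
    echo $(( $1 * $2 ))
}

balance_hours 12093571    # the "test" account above; prints 3359
balance_hours 71999976    # the "test-hi" account above; prints 19999
estimated_charge 4 60     # 4 processors for 60 seconds; prints 240
```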
To submit jobs to an account other than your default
(standard-priority) account, use the -A option to
qsub.
deepthought:~: qsub -A test-hi test.sh
4194.deepthought.umd.edu
Use the checkjob command to get more
information about why your job isn't running.
deepthought:~: checkjob 4195
[ ... deleted for brevity ... ]
job is deferred. Reason: NoResources (cannot create reservation for job '4195' (intital reservation attempt))
Holds: Defer (hold reason: NoResources)
PE: 232.00 StartPriority: 200
cannot select job 4195 for partition DEFAULT (job hold active)
In this example, we see that the job was deferred because there are insufficient resources available to run the job. Once sufficient resources become available, the job will run automatically.
If instead you see the following as part of the checkjob output, it means that the job you are trying to run will exceed the allocation you have remaining. This may simply be because you did not specify a walltime as part of your job specification. If your specifications are correct, you can resubmit your job to your standard-priority account or to the free serial queue, or you can request an additional allocation from the committee.
deepthought:~: checkjob 4204
[ ... deleted for brevity ... ]
job is deferred. Reason: BankFailure (cannot debit job account)
Holds: Defer (hold reason: BankFailure)
PE: 32.00 StartPriority: 200
cannot select job 4204 for partition DEFAULT (job hold active)
If none of the above conditions apply, and your job is listed in the IDLE JOBS section, keep the following in mind:
The showq command lists jobs in
priority order, with the highest-priority jobs listed first.
If you need shell access to additional nodes, provided some are
available you can ask the scheduler to assign them to you with
qsub -I. Assuming your requirements are met, you will be
given a shell on the first node, and on that node,
$PBS_NODEFILE will be set to the name of a file
containing the list of nodes to which you now have access. You can
then ssh to and between any of the nodes in that list, and you can
also ssh to all of your assigned nodes from the head node.
For example, if you want to request two separate nodes, try this:
deepthought:~: qsub -lnodes=2:ppn=4 -lwalltime=00:15:00 -I
qsub: waiting for job 4216.deepthought.umd.edu to start
qsub: job 4216.deepthought.umd.edu ready
DISPLAY not set.
compute-2-39:~: cat $PBS_NODEFILE
compute-2-39.deepthought.umd.edu
compute-2-39.deepthought.umd.edu
compute-2-39.deepthought.umd.edu
compute-2-39.deepthought.umd.edu
compute-2-38.deepthought.umd.edu
compute-2-38.deepthought.umd.edu
compute-2-38.deepthought.umd.edu
compute-2-38.deepthought.umd.edu
compute-2-39:~: ssh compute-2-38 date
Tue Jan 23 11:22:48 EST 2007
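As the listing above shows, the node file repeats each hostname once per allocated processor. The following POSIX shell sketch collapses it to one line per node with its processor count, which is a quick way to confirm that you really received separate machines. The function name node_slots is hypothetical.

```shell
# Summarize a PBS node file: one line per node followed by its processor count.
# $1 = path to the node file (defaults to $PBS_NODEFILE)
node_slots() {
    sort "${1:-$PBS_NODEFILE}" | uniq -c | awk '{ print $2, $1 }'
}

# Example, inside a job or an interactive session:
#   node_slots
# For the allocation above this would list compute-2-38 and compute-2-39
# with 4 processors each.
```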
PATH.
Your account as provided gives you access to the basic tools needed to
submit and monitor jobs, access basic Gnu compilers, etc. It is
HIGHLY suggested that you DO NOT remove or modify the dot files
(.cshrc, .profile, etc) that are provided
for you. Instead, add any customizations you need to the alternate
set of files described here.
If you choose to modify the system default files, you run the risk of
losing any systemwide changes that are necessary to keep your account
running smoothly.
For packages that are not included in your default environment, the
tap command is provided. When run, this command will
modify your current session by adding the appropriate entries to your
PATH, MANPATH, LD_LIBRARY_PATH
and will set any other variables necessary to ensure the proper
functioning of the package in question. Note that these changes are
temporary and only exist until you log out. If you want to have
tap run for you automatically, add the command tap
-q <package> to your .cshrc.mine file. (The
-q argument prevents tap from displaying any
text output when it runs, which can confuse some shells.) If
you run the tap command without any arguments, it will
provide a list of available packages. Note that many of these
packages are not accessible on the cluster by default. If you want
access to them, let us know and, if possible, we'll make them available.
For example, if you want to run Matlab, you'll want to do the
following. Notice that Matlab is not available until after the
tap command has been run.
deepthought:~: matlab
matlab: Command not found.
deepthought:~: tap matlab
----------------------------------------------------------------------
This is a shortcut to the default version of Matlab available
on your platform.
Run command "matlab" to start up the program,
or "matlab -h" to see various command-line options.
There may be other versions of Matlab available. Please check
the Dash/KDE menu for specific versions of Matlab.
----------------------------------------------------------------------
deepthought:~: matlab
< M A T L A B >
Copyright 1984-2005 The MathWorks, Inc.
Version 7.0.4.352 (R14) Service Pack 2
January 29, 2005
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>>
| Package Name | Description |
|---|---|
| ansys100 | Ansys 10.0 |
| blast | BLAST 2.2.18 |
| cap3 | CAP3 compiled with Intel compilers |
| clustalw | ClustalW 1.83 compiled with Intel compilers |
| cns | CNS 1.2 compiled with Intel compilers |
| fftw | FFTW 2.1.5 (with MPI extensions, built with lam-gnu) |
| garli | GARLI 0.951 compiled with Intel compilers (single process version) |
| garli-mpi | GARLI 0.942 compiled with Intel compilers and OpenMPI (MPI version) |
| gromacs | GROMACS version 3.3.3 compiled with Intel compilers |
| gsl | GNU Scientific Library version 1.8 |
| hdf | HDF 4.2r1 |
| hdf5 | HDF5 1.6.5 |
| intel | Intel Compilers 10.1.008, MKL 10.0.011 |
| intel-mpi | Intel MPI 3.1 - note that this does NOT work with Infiniband |
| java | Java 1.5.0_11 |
| java6 | Java 1.6.0_04 |
| lam-gnu | LAM 7.1.2 compiled with Gnu compilers |
| lam-intel | LAM 7.1.2 compiled with Intel compilers |
| lapack | LAPACK 3.1.0 (This is the reference implementation - non-optimised) It includes the BLAS library as well. |
| lucy | lucy 1.19 compiled with Intel compilers |
| mathematica60 | Mathematica 6.0 |
| matlab | Matlab 7.0.4 |
| matlab2007b | Matlab 7.5.0 |
| modeltest | modeltest 3.7 compiled with Intel compilers also includes MrModeltest 2.2 |
| mpich-gnu | MPICH 1.0.4p1 compiled with Gnu compilers |
| mpich-intel | MPICH 1.0.4p1 compiled with Intel compilers |
| mrbayes | MrBayes 3.1.2 compiled with Intel compilers (single process version) |
| mrbayes-mpi | MrBayes 3.1.2 compiled with Intel compilers and OpenMPI (MPI version) |
| muscle | MUSCLE 3.6 compiled with Intel compilers |
| namd | NAMD 2.6 |
| openmpi-gnu | OpenMPI 1.2.5 compiled with Gnu compilers |
| openmpi-intel | OpenMPI 1.2.5 compiled with Intel compilers |
| openmpi-pgi | OpenMPI 1.2.5 compiled with PGI compilers |
| netcdf | NetCDF 3.6.1 |
| paml | PAML 4b compiled with Intel compilers |
| povray | POV-Ray 3.6 |
| R | R 2.4.1 |
| xplor-nih | xplor-nih 2.19 |
Because much of the data generated on the cluster is transient and large, data stored in the /data partitions is not backed up. This data resides on RAID-protected filesystems, but there is always a small chance of loss or corruption. If you have critical data that must be saved, be sure to copy it elsewhere.
There are several general purpose areas that are intended for storage of computational data. These areas are accessible to all users of the cluster and as such you should be sure to protect any files or directories you create there. See Securing Your Data for more information.
The areas are:
| Path | Description |
|---|---|
| /data/dt-vol0 | This is a RAID5 filesystem, currently approximately 2TB |
| /data/dt-vol1 | This is a RAID5 filesystem, currently approximately 2TB |
| /data/dt-raid5 | This is a RAID5 filesystem, currently approximately 500GB |
| /data/dt-raid10 | This is a RAID10 filesystem, currently approximately 500GB |
| The remaining filesystems are for members of the CLFS group only | |
| /data/dt-vol2 | This is a RAID5 filesystem, currently approximately 1.7TB |
| /data/dt-vol3 | This is a RAID5 filesystem, currently approximately 1.7TB |
| /data/dt-vol4 | This is a RAID5 filesystem, currently approximately 1.7TB |
| /data/dt-vol5 | This is a RAID5 filesystem, currently approximately 1.7TB |
The filesystems will perform differently depending on their usage, so you may want to try both types (RAID5 and RAID10) to see which works best for your application.
Please remember that you are sharing these filesystems with other
researchers and other groups. If you have data residing there that
you don't need, please remove it promptly. If you know you are going
to create large files, make sure there is sufficient space available
in the filesystem you are using. You can check this yourself with
the df command:
deepthought:~: df -h /data/dt-vol0
Filesystem Size Used Avail Use% Mounted on
g20-fs1.deepthought.umd.edu:/export/data/vol0
1.8T 850G 888G 49% /a/g20-fs1/data/dt-vol0
This output shows that there are currently 888 GB of free space
available on /data/dt-vol0.
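If you want to make this check from a script before writing large files, something like the following POSIX shell sketch could be used. The function names avail_kb and has_space are hypothetical. The -P option asks df for its portable one-line-per-filesystem format, so a long device name does not wrap onto a second line as it does in the output above.

```shell
# Print the space available (in kilobytes) on the filesystem holding a path.
avail_kb() {
    # Line 1 is the header; line 2 holds the figures, with "Available" in column 4
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 kilobytes are free on the filesystem holding $1.
has_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example: require roughly 5 GB free before starting
#   has_space /data/dt-vol0 5242880 || echo "not enough space on /data/dt-vol0"
```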
If you have a Glue account and you want to share your data back and
forth with that account, you can access it at
/glue_homes/<username>. Note that you cannot have
jobs read or write directly from your Glue directory; you'll need to
copy data back and forth by hand as needed.
Your home directory as configured is private and only you have access to it. Any directories you create outside your home directory are your responsibility to secure appropriately. If you are unsure of how to do so, please contact hpcc-help@umd.edu for additional assistance.
If you're a member of a group, you'll want to make sure that you give
your group access to these directories, and you may want to consider
setting your umask so that any files you create automatically have
group read and write access. To do so, add the line umask
002 to your .cshrc.mine file.
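To see what a given umask does to newly created files, you can try a small experiment such as the POSIX shell sketch below. The function name perms_with_umask is hypothetical; it creates a throwaway file in a temporary directory and prints its permission string. The umask command itself takes the same argument in tcsh and in sh-style shells.

```shell
# Show the permissions a newly created file receives under a given umask.
# $1 = umask value, for example 002
perms_with_umask() {
    (
        # Run in a subshell so the caller's umask is left unchanged
        umask "$1"
        dir=$(mktemp -d) || exit 1
        : > "$dir/example"                    # create an empty file
        ls -l "$dir/example" | cut -c1-10     # permission string only
        rm -rf "$dir"
    )
}

perms_with_umask 002   # prints -rw-rw-r-- (group can read and write)
perms_with_umask 022   # prints -rw-r--r-- (group can only read)
```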