Information/Instructions/TACC/Ranger
From C-SAFE Wiki
Ranger (UT Austin)
Machine Name
- ranger.tacc.utexas.edu (see also the TACC System Resource Monitor)
Help
- Help Desk: https://portal.tacc.utexas.edu/gridsphere/gridsphere?cid=consulting
- Main operator: 512-475-9411 (<- TRY THIS NUMBER FIRST)
- Consulting Director: Evan Turner 512-475-9465
- Consultant: John Lockman 512-471-9373
- David Anderson (at least for passwords): david@tacc.utexas.edu, 512-475-9466
Compilation Notes
- See the configure line (and notes) here.
- Compilers tried with the 04/17/08 code:
- pgcc: Will not configure without some help from Dav.
- gcc-3.4: Will configure and compile but jobs with more than 1024 processors hang.
- gcc-4.2: Configures, but is not fully supported, and you have to link against Intel mpi/hypre/petsc libraries.
- icc-10.1: Will configure, compile, and run with mvapich_0.9.9 and mvapich_devel/1.0.
We currently recommend icc-10.1 with mvapich_devel/1.0.
- With the Intel compilers, use the optimization flags
-O2 -fp-model precise -xW
Other
- Uintah ThirdParty software and commonly used scripts are located at
/scratch/projects/tg/uintah
- For TAU, download the tarball from http://www.cs.uoregon.edu/research/tau/downloads.php
configure it with this line:
./configure -c++=icpc -cc=icc -fortran=intel -mpi \
  -mpiinc=/opt/apps/intel10_1/mvapich-devel/1.0/include/ \
  -mpilib=/opt/apps/intel10_1/mvapich-devel/1.0/lib/shared -PROFILECALLPATH
To enable TAU in Uintah, use: --with-tau=/your_tau_path/x86-64/lib/Makefile.tau-callpath-icpc-mpi
- To run with a number of processors not divisible by 16 (from the Ranger user manual):
If the total number of tasks that you need is less than "Number of Tasks per Node x Number of Nodes", then set the MY_NSLOTS environment variable to the total number of tasks. In a job script, use the following -pe option and environment variable statement:
#$ -pe <TpN>way <NoN x 16>
...
setenv MY_NSLOTS <Total Number of Tasks> { C-type shells }
or
export MY_NSLOTS=<Total Number of Tasks> { Bourne-type shells }
e.g.
#$ -pe 8way 64 {use 8 Tasks per Node, 4 Nodes requested}
...
setenv MY_NSLOTS 31 {31 tasks are launched}
where
TpN is a number in the set {1, 2, 4, 8, 12, 15} and NoN is the number of nodes.
Note that this still accrues wall-clock hours for all 16 cores on every requested node, whether or not they are used.
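The node and slot arithmetic above can be scripted. The following is a minimal bash sketch, assuming 16 cores per node (as implied by the <NoN x 16> term); pe_slots is a hypothetical helper name, not part of SGE or any TACC tool. It rounds the task count up to whole nodes at TpN tasks each, then multiplies by 16 to get the slot count for the -pe line.

```shell
#!/bin/bash
# Sketch: compute the "#$ -pe <TpN>way <slots>" value and MY_NSLOTS for a
# job whose task count is not a multiple of the tasks per node (TpN).
# Assumption: 16 cores per Ranger node. pe_slots is a hypothetical helper.

CORES_PER_NODE=16

pe_slots() {
  local tasks=$1 tpn=$2
  # Nodes needed: ceiling(tasks / TpN) using integer arithmetic
  local nodes=$(( (tasks + tpn - 1) / tpn ))
  # SGE is always asked for whole nodes: nodes x 16 slots
  echo $(( nodes * CORES_PER_NODE ))
}

# Usage: 31 tasks at 8 tasks per node
tasks=31
tpn=8
echo "#\$ -pe ${tpn}way $(pe_slots "$tasks" "$tpn")"
echo "setenv MY_NSLOTS $tasks"
```

For the example above, 31 tasks at 8 tasks per node need 4 nodes, so the script prints `#$ -pe 8way 64` followed by `setenv MY_NSLOTS 31`.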
BLAS/ATLAS/LAPACK
Currently the MKL library appears to be the best option:
- --with-mkl=/scratch/projects/tg/uintah/SystemLibLinks/mkl
Other options might include the ACML or GotoBLAS modules; however, we have not yet been able to use them successfully.
% module load acml
or
% module load gotoblas
login3% env | grep GOTOB
TACC_INTEL_GOTOBLAS_LIB=/opt/apps/intel10_1/gotoblas/1.23
TACC_PGI_GOTOBLAS_LIB=/opt/apps/pgi7_1/gotoblas/1.23
Batch Queue
For more information on our allocation, go to the TACC portal, and click on Allocations (then on Usage).
For more information on the queue system, go to the TACC user guide, then click on Development and then Running Code.
- Use the 'qsub' command to submit a batch script.
- Use the 'qstat' or 'showq' commands to get information about jobs in the queue.
> qstat -g c        # will show you a summary of the queues
> qstat             # will show you all YOUR jobs
> qstat -j jobid    # will show you detailed information on the job specified with jobid
> qstat -f          # will spam your screen for 5 minutes
- Use the 'normal' or 'development' queues:
Table 5. SGE Batch Environment Queues
Queue Name    Max Runtime  Max Procs  SU Rate  Purpose
normal        24 hrs       4096       1        Normal priority
large         24 hrs       12288      1        Large core count
development   2 hrs        256        1        Development
serial        2 hrs        16         1        Large jobs
Request       24 hrs       16384      1        Special requests
systest       --           --         -        System testing
- An example batch script:
#!/bin/tcsh
#$ -V # Inherit the submission environment
#$ -cwd # Start job in submission directory
#$ -N S.1024 # Job Name
#$ -j y # Combine stderr & stdout into stdout
#$ -o $JOB_NAME.o$JOB_ID # Name of the output file (eg. myMPI.oJobID)
#$ -pe 16way 1024 # Requests 16 cores/node, 1024 cores total
#$ -q normal # Queue name
#$ -l h_rt=12:00:00 # Run time (hh:mm:ss)
#$ -M blah@utah.edu # Email notification address (UNCOMMENT)
#$ -m be # Email at Begin/End of job (UNCOMMENT)
set echo # {echo cmds, use "set echo" in csh}
set OUT = "out.1024"
ibrun ./sus -mpi scalingRun.ups >& $OUT
The maximum number of cores that can be requested in the normal queue is 4096. Note that the proper modules must be loaded or unloaded (as described here) before the job runs; this is best done in your ~/.login_user file.
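For scaling studies like the S.1024 example, it can help to generate one batch script per core count from a single template. The following bash sketch rests on stated assumptions: make_job is a hypothetical helper, it copies the directives of the example script minus the e-mail lines, and it rejects core counts above the 4096-core normal-queue limit from Table 5.

```shell
#!/bin/bash
# Sketch: write one tcsh batch script per core count, following the
# example script above. make_job is a hypothetical helper; the e-mail
# directives are omitted. sus and scalingRun.ups come from the example.

MAX_NORMAL_CORES=4096   # "normal" queue limit from Table 5

make_job() {
  local ncores=$1
  if (( ncores > MAX_NORMAL_CORES )); then
    echo "error: $ncores cores exceeds the normal-queue limit" >&2
    return 1
  fi
  # Unquoted heredoc: ${ncores} expands now; \$ stays literal for SGE/tcsh
  cat > "job.${ncores}" <<EOF
#!/bin/tcsh
#\$ -V
#\$ -cwd
#\$ -N S.${ncores}
#\$ -j y
#\$ -o \$JOB_NAME.o\$JOB_ID
#\$ -pe 16way ${ncores}
#\$ -q normal
#\$ -l h_rt=12:00:00
set echo
set OUT = "out.${ncores}"
ibrun ./sus -mpi scalingRun.ups >& \$OUT
EOF
}

# Usage: generate job.256, job.512 and job.1024, then submit with qsub
for n in 256 512 1024; do
  make_job "$n"
done
# qsub job.1024
```

Each generated file is an ordinary batch script, so it still needs the right modules loaded before submission.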
Interactive Node
Although we don't have an interactive queue yet, there is a hack to get interactive access to a job.
You can submit a job that does nothing but sleep, and then log in to the node assigned to the job. However, you still need to set up the correct environment, so you have to capture the job environment from the batch script. You can use the following script as an example:
#!/bin/tcsh
#$ -N inter_16p
#$ -cwd
#$ -o $JOB_NAME.o$JOB_ID
#$ -j y
#$ -q development
#$ -pe 16way 16
#$ -V
#$ -l h_rt=01:00:00
env | sed -e"s/=\(.*\)/='\1'/" -e's/=/ /' -e's/^/setenv /' >& job_envs.$JOB_ID
sleep 3600
This batch script starts a 16-core job on 1 node in the development queue and sleeps for 1 hour. It also writes your environment to a file named job_envs.$JOB_ID.
Here are the steps to use this script:
1. Submit the batch script and wait for it to start.
2. After the script starts running, determine which node it is running on using the qstat command. Some examples:
(In the state column, 'qw' means queued and waiting, while 'r' means running.)
login3% qstat
job-ID  prior    name        user      state  submit/start at      queue                           slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
812562  0.00002  inter_16p   tg802225  qw     06/29/2009 12:12:28                                  16
126621  0.00072  b4.815k15N  tg802119  r      06/12/2008 09:54:22  normal@i101-104.ranger.tacc.ut  16
Here inter_16p is still waiting; a running job shows its node in the queue column, e.g. job 126621 is running on i101-104.
3. ssh into the node listed in qstat.
4. If you're using csh or tcsh, you can source the file created by your job to set up the proper environment: source job_envs.<JOBID>
If you're using bash as your shell, you need to change the env line in the batch script so that it writes the environment variables in a format bash accepts, or simply start a csh or tcsh shell and source the file.
Once your environment is set up, you should be able to run your MPI executable using:
$MPICH_HOME/bin/mpirun_rsh -np 1 <node names> <executable>
(Note: you'll need to type your password several times.)
If you're just debugging a serial code, you can just run normally.
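The node lookup in step 2 and the bash variant of the environment dump from step 4 can both be scripted. The following bash sketch is illustrative only: job_node and dump_env_bash are hypothetical helper names, job_node assumes the plain qstat column layout shown above (queue printed as queue@host), and dump_env_bash assumes the batch script itself runs under bash (#!/bin/bash) with the function defined in it.

```shell
#!/bin/bash
# Sketch of two helpers for the interactive-node steps above.
# job_node and dump_env_bash are hypothetical names, not TACC tools.

# job_node JOBID: read plain `qstat` output on stdin and print the short
# host name of JOBID if it is running (state "r"). Assumes the columns
# job-ID prior name user state date time queue slots, with the queue
# printed as queue@host.domain.
job_node() {
  awk -v id="$1" '
    $1 == id && $5 == "r" {
      split($8, q, "@")     # "normal@i101-104.ranger.tacc.ut"
      split(q[2], h, ".")   # -> "i101-104"
      print h[1]
    }'
}

# dump_env_bash: bash counterpart of the setenv dump in the batch script.
# Turns each NAME=value line from env into: export NAME='value'
# Single quotes inside values become '\''. Multi-line values and
# exported shell functions are not handled.
dump_env_bash() {
  env | sed -e "s/'/'\\\\''/g" -e "s/^\([^=]*\)=\(.*\)/export \1='\2'/"
}

# Usage:
#   in the batch script:   dump_env_bash > job_envs.$JOB_ID
#   on the login node:     ssh "$(qstat | job_node 812562)"
#   on the compute node:   source job_envs.812562
```

dump_env_bash produces export NAME='value' lines, the bash analogue of the setenv NAME 'value' lines written by the tcsh version.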
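Troubleshooting
If you experience deadlocks on Ranger when using large numbers of processors, try disabling shared memory in your job launch script:
export MV_USE_SHARED_MEM=0
export MV_USE_SHMEM_COLL=0
(In a tcsh script such as the example batch script above, use setenv MV_USE_SHARED_MEM 0 and setenv MV_USE_SHMEM_COLL 0 instead.)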
