Information/Instructions/TACC/Ranger

Ranger (UT Austin)

Machine Name

Help

Compilation Notes

  • See configure line (and notes) here.
  • Compilers tried with 04/17/08 code.
    • pgcc: Will not configure without some help from Dav.
    • gcc-3.4: Will configure and compile but jobs with more than 1024 processors hang.
    • gcc-4.2: Configures but is not fully supported and you have to link against Intel mpi/hypre/petsc libraries.
    • icc-10.1: Will configure, compile, and run with mvapich_0.9.9 and mvapich_devel/1.0.

We currently recommend icc-10.1 with mvapich_devel/1.0.

  • With the Intel compilers, use the optimization flags:
-O2 -fp-model precise -xW



Other

  • Uintah ThirdParty software and commonly used scripts are located at
/scratch/projects/tg/uintah

Configure TAU with this line:

./configure -c++=icpc -cc=icc -fortran=intel -mpi \
 -mpiinc=/opt/apps/intel10_1/mvapich-devel/1.0/include/ \
 -mpilib=/opt/apps/intel10_1/mvapich-devel/1.0/lib/shared  -PROFILECALLPATH 

To enable TAU in Uintah, use: --with-tau=/your_tau_path/x86-64/lib/Makefile.tau-callpath-icpc-mpi

  • To run with a number of processors not divisible by 16 (from the Ranger user manual):

If the total number of tasks that you need is less than "Number of Tasks per Node x Number of Nodes", then set the MY_NSLOTS environment variable to the total number of tasks. In a job script, use the following -pe option and environment variable statement:

    #$ -pe <TpN>way <NoN x 16>
     ...
     setenv MY_NSLOTS <Total Number of Tasks>          { C-type shells }
     or
     export MY_NSLOTS=<Total Number of Tasks>          { Bourne-type shells }
     e.g.
     #$ -pe 8way 64        {use 8 Tasks per Node, 4 Nodes requested}
     ...
     setenv MY_NSLOTS 31 {31 tasks are launched}
     where
     TpN (Tasks per Node) is a number in the set {1, 2, 4, 8, 12, 15}
     and NoN is the Number of Nodes

Note that the job is still charged wall-clock hours for all cores on all requested nodes, whether or not they are used.
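The arithmetic behind the -pe request can be sketched as a small shell function. This is a minimal sketch, not part of the Ranger manual: the function name pe_request is hypothetical, and the 16 cores per node figure is taken from the text above. It rounds the task count up to whole nodes and reports how many cores the job will be charged for.

```shell
#!/bin/sh
# Sketch: derive the SGE "-pe" request and the MY_NSLOTS setting for a task
# count that does not fill whole 16-core nodes.
# pe_request is a hypothetical helper name, not a TACC tool.
CORES_PER_NODE=16

pe_request() (
    tasks=$1   # total MPI tasks wanted
    tpn=$2     # tasks per node (the <TpN> in "<TpN>way")
    # Round up to whole nodes: ceil(tasks / tpn)
    nodes=$(( (tasks + tpn - 1) / tpn ))
    # <NoN x 16>: every core on every requested node is charged
    slots=$(( nodes * CORES_PER_NODE ))
    echo "#\$ -pe ${tpn}way ${slots}"
    echo "export MY_NSLOTS=${tasks}"
    echo "# nodes=${nodes} cores_charged=${slots}"
)

# Reproduces the example above: 31 tasks at 8 tasks per node
pe_request 31 8
```

For 31 tasks at 8 per node this yields 4 nodes and a request of 64 slots, matching the example from the user manual. In a tcsh job script the second line would use setenv instead of export.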

Blas/Atlas/Lapack

Currently it appears that the MKL library is the best bet:

--with-mkl=/scratch/projects/tg/uintah/SystemLibLinks/mkl

Other options might include the ACML module or the GotoBLAS module; however, we have not yet managed to use either successfully.

% module load acml

or

% module load gotoblas

login3% env | grep GOTOB
TACC_INTEL_GOTOBLAS_LIB=/opt/apps/intel10_1/gotoblas/1.23
TACC_PGI_GOTOBLAS_LIB=/opt/apps/pgi7_1/gotoblas/1.23

Batch Queue

For more information on our allocation, go to the TACC portal, and click on Allocations (then on Usage).

For more information on the queue system, go to the TACC user guide, then click on Development and then Running Code.

  • Use the 'qsub' command to submit a batch script.
  • Use the 'qstat' or 'showq' commands to get information about jobs in the queue.
> qstat -g c     # will show you a summary of the queues
> qstat          # will show you all YOUR jobs
> qstat -j jobid # will show you detailed information on the job specified with jobid
> qstat -f       # will spam your screen for 5 minutes
  • Use the 'normal' or 'development' queues:
** Table 5. SGE Batch Environment Queues **

Queue Name     MaxRuntime  MaxProcs  SU-Rate  Purpose
*normal*       24 hrs       4096     1        Normal Priority
*large*        24 hrs      12288     1        Large Core Count
*development*   2 hrs        256     1        Development
*serial*        2 hrs         16     1        Large Jobs
*Request*      24 hrs      16384     1        Special Requests
*systest*      --             --     -        System Testing
  • An example batch script:

#!/bin/tcsh
#$ -V                           # Inherit the submission environment
#$ -cwd                         # Start job in submission directory
#$ -N S.1024                    # Job Name
#$ -j y                         # Combine stderr & stdout into stdout
#$ -o $JOB_NAME.o$JOB_ID        # Name of the output file (eg. myMPI.oJobID)
#$ -pe 16way 1024               # Requests 16 cores/node, 1024 cores total
#$ -q normal                    # Queue name
#$ -l h_rt=12:00:00             # Run time (hh:mm:ss)
#$ -M blah@utah.edu             # Email notification address (UNCOMMENT)
#$ -m be                        # Email at Begin/End of job (UNCOMMENT)


set echo                     # Echo commands as they run (csh/tcsh syntax)
set OUT = "out.1024"
ibrun ./sus -mpi scalingRun.ups >& $OUT


The maximum number of cores that can be requested in the normal queue is 4096. Note that the proper modules must be loaded/unloaded (as described here) before the job is run. These modules are best handled in your ~/.login_user file.
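Before submitting, a core request can be checked against the MaxProcs column of the queue table. The following is a minimal sketch under that assumption: the function names queue_max_procs and check_request are hypothetical, and the limits are simply copied from the table on this page, so they may not reflect the current queue configuration.

```shell
#!/bin/sh
# Sketch: validate a core count against the per-queue MaxProcs limits listed
# in the queue table. Function names are hypothetical helpers.
queue_max_procs() {
    case "$1" in
        normal)          echo 4096 ;;
        large)           echo 12288 ;;
        development)     echo 256 ;;
        serial)          echo 16 ;;
        request|Request) echo 16384 ;;
        *)               return 1 ;;   # unknown queue (systest lists no limit)
    esac
}

check_request() (
    queue=$1
    cores=$2
    max=$(queue_max_procs "$queue") || { echo "unknown queue: $queue"; exit 2; }
    if [ "$cores" -le "$max" ]; then
        echo "ok: $cores cores fits in $queue (max $max)"
    else
        echo "too large: $cores cores exceeds $queue limit of $max"
        exit 1
    fi
)

# The example batch script requests 1024 cores in the normal queue
check_request normal 1024
```

The function returns a non-zero status when the request does not fit, so it can guard a qsub call in a submission wrapper.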

Interactive Node

Although we don't have an interactive queue yet, there is a hack to get interactive access to a job.

You can submit a job that does nothing but sleep, and then log in to the node assigned to the job. However, you still need to set up the correct environment, so you have to capture the job environment from the batch script. You can use the following script as an example:

#!/bin/tcsh
#$ -N inter_16p
#$ -cwd
#$ -o $JOB_NAME.o$JOB_ID
#$ -j y
#$ -q development
#$ -pe 16way 16
#$ -V
#$ -l h_rt=01:00:00


env | sed -e"s/=\(.*\)/='\1'/" -e's/=/ /' -e's/^/setenv /' >& job_envs.$JOB_ID

sleep 3600

This batch script starts a 16-core job on 1 node in the development queue and sleeps for 1 hour. It also dumps your environment to a file named job_envs.$JOB_ID.

Here are the steps to use this script:

1. Submit the batch script and wait for it to start.

2. After the script starts running, determine which node it is running on using the qstat command. Some examples:

(Note: in the state column, 'qw' means the job is waiting in the queue, while 'r' means it is running.)

login3% qstat 
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
812562 0.00002 inter_16p  tg802225     qw    06/29/2009 12:12:28                                   16

126621 0.00072 b4.815k15N tg802119     r     06/12/2008 09:54:22  normal@i101-104.ranger.tacc.ut    16

The second job is running on node i101-104; the first is still waiting in the queue.
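Reading the node name off the qstat output can be automated. The sketch below assumes the column layout shown in the listing above (state in the fifth column, queue@host in the eighth) and a job name without spaces; the function name running_nodes is hypothetical.

```shell
#!/bin/sh
# Sketch: print "<job-ID> <node>" for every running job in qstat output.
# Assumes the qstat column layout shown above; running_nodes is a
# hypothetical helper name.
running_nodes() {
    # Keep lines whose state column is "r", then take the host part of the
    # queue column: the text between "@" and the first ".".
    awk '$5 == "r" {
        split($8, q, "@")        # q[2] = "i101-104.ranger.tacc.ut"
        split(q[2], h, ".")      # h[1] = "i101-104"
        print $1, h[1]
    }'
}

# Demonstration on the two sample lines from the listing above
running_nodes <<'EOF'
812562 0.00002 inter_16p  tg802225     qw    06/29/2009 12:12:28                                   16
126621 0.00072 b4.815k15N tg802119     r     06/12/2008 09:54:22  normal@i101-104.ranger.tacc.ut    16
EOF
```

On the login node the same function would be used as: qstat | running_nodes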

3. ssh into the node listed in qstat.

4. If you're using csh or tcsh, you can source the file created by your job to set up the proper environment: source job_envs.<JOBID>

If you're using bash as your shell, you have to change the line in the batch script so that it prints the environment variables in a format that bash accepts. Alternatively, start a csh or tcsh shell and source the file.
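For bash users, one possible form of the environment dump is sketched here. It mirrors the sed approach of the tcsh script, but emits export lines instead of setenv lines. The function name env_to_exports is hypothetical, and, like the original line, the sketch assumes that values contain no single quotes or embedded newlines.

```shell
#!/bin/sh
# Sketch: Bourne-shell counterpart of the tcsh env-dump line in the batch
# script. Turns each NAME=value line from `env` into: export NAME='value'
# env_to_exports is a hypothetical helper name.
env_to_exports() {
    # First expression quotes everything after the first "=";
    # second expression prefixes the line with "export ".
    sed -e "s/=\(.*\)/='\1'/" -e 's/^/export /'
}

# In a bash batch script this would replace the "env | sed ..." line:
#   env | env_to_exports > job_envs.$JOB_ID
# and on the compute node the file is read with:
#   . ./job_envs.<JOBID>
printf 'FOO=bar baz\n' | env_to_exports
```

In bash specifically, export -p > job_envs.$JOB_ID is another option, since its output can also be sourced.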

Once your environment is set up, you should be able to run your MPI executable using:

$MPICH_HOME/bin/mpirun_rsh -np 1 <node names> <executable>

(Note: you will need to type your password several times.)
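To start more than one rank on the node, the command line can be assembled as sketched below. This rests on an assumption not stated on this page: that mpirun_rsh expects one host name per rank, so the node name is repeated once for each process. The function name mpirun_cmd is hypothetical, and the command is printed rather than executed so it can be checked first.

```shell
#!/bin/sh
# Sketch: build an mpirun_rsh command line that places N ranks on one node.
# Assumption: mpirun_rsh takes one host name per rank, so the node name is
# repeated N times. mpirun_cmd is a hypothetical helper name.
mpirun_cmd() (
    np=$1      # number of MPI ranks
    node=$2    # node name from qstat, e.g. i101-104
    exe=$3     # executable to launch
    hosts=""
    i=0
    while [ "$i" -lt "$np" ]; do
        hosts="$hosts $node"
        i=$((i + 1))
    done
    # $MPICH_HOME is left unexpanded so the printed line can be pasted
    # into the shell on the compute node.
    echo "\$MPICH_HOME/bin/mpirun_rsh -np $np$hosts $exe"
)

mpirun_cmd 4 i101-104 ./sus
```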

If you're just debugging a serial code, you can run it normally.

Troubleshooting

If you are experiencing deadlocks on Ranger when using large numbers of processors, try disabling shared memory in your job launch script:

export MV_USE_SHARED_MEM=0
export MV_USE_SHMEM_COLL=0
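The two lines above use Bourne-shell syntax, while the example batch script on this page is written in tcsh, which uses setenv instead. The sketch below prints the settings in either syntax. The function name shmem_off is hypothetical, and the variable names are copied unchanged from the lines above.

```shell
#!/bin/sh
# Sketch: print the shared-memory settings in the syntax of the job
# script's shell. shmem_off is a hypothetical helper name; the variable
# names are taken as-is from the text above.
shmem_off() {
    case "$1" in
        csh|tcsh) fmt='setenv %s %s\n' ;;
        sh|bash)  fmt='export %s=%s\n' ;;
        *)        echo "unsupported shell: $1" >&2; return 1 ;;
    esac
    # printf reuses the format for each NAME VALUE pair
    printf "$fmt" MV_USE_SHARED_MEM 0 MV_USE_SHMEM_COLL 0
}

# Lines to paste into a tcsh batch script, before the ibrun command
shmem_off tcsh
```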
