Information/Instructions/UpDraft

From C-SAFE Wiki

UpDraft

  • UpDraft is the new 2048-core (256 nodes x 8 cores) Linux cluster shared by C-SAFE, CRSIM, and general U of U computing.
  • CRSIM/C-SAFE share ~1360 cores, while general U of U computing shares ~680 cores. Full machine use is also available.
  • UpDraft is administered by CHPC.

Before You Start Using UpDraft

  • You must have a CHPC account (Sign up here for one.)
    • When requesting a CHPC account, you will need to fill in the form, print it out, and have it signed by your PI; then fax it to 585-5366.
    • The CHPC help-desk can be reached at x1-6440.

Disk Space

WARNING: Read the following info about disk space!

Local (Fast) Disk - For Developers

Updraft2 (the head node) has been turned into a file server for Uintah developers/users. It currently uses a small (140 GB) local disk that is exported to the Updraft compute nodes. This should be enough disk space for each Uintah developer/user to keep a couple of Uintah builds. (Please don't store data here.) The disk is located at:

/uufs/updraft.arches/common/uintah/homebrew

I suggest doing something like the following from your home directory (make sure you are on updraft2; see the notes below):

mkdir /uufs/updraft.arches/common/uintah/homebrew/<your_user_id>
ln -s /uufs/updraft.arches/common/uintah/homebrew/<your_user_id> LocalDisk
cd LocalDisk
svn co ...

Note: Make sure to log into updraft2 to get to this disk space. If you "ssh updraft", you will most likely end up on updraft1 and will be asking yourself, "Where did my local disk go?" This space is not mounted generally at CHPC. It is reachable only from updraft2 and the updraft compute nodes.

Note 2: This local disk space is NOT BACKED UP. So make sure to commit changes to your local SVN code to the repository on a regular basis.
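
The setup steps above are easy to run on the wrong head node, so here is a minimal sketch of a guarded version. It is written in POSIX sh rather than csh, the function names (is_updraft2, setup_local_disk) are made up for illustration, and it assumes that $USER matches your user id and that 'hostname' reports a name beginning with "updraft2" on that node.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the given host name is updraft2
# (optionally followed by a domain suffix).
is_updraft2() {
    case "$1" in
        updraft2|updraft2.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: create the per-user directory on the local disk and
# link it into the home directory as LocalDisk.
# Assumption: $USER is the same as <your_user_id> in the commands above.
setup_local_disk() {
    base=/uufs/updraft.arches/common/uintah/homebrew
    if ! is_updraft2 "$(hostname)"; then
        echo "Not on updraft2; log in there first." >&2
        return 1
    fi
    mkdir -p "$base/$USER" && ln -s "$base/$USER" "$HOME/LocalDisk"
}
```

After saving this to a file and sourcing it from sh or bash, running 'setup_local_disk' on updraft1 should print the warning and do nothing, which is the point of the check.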

Disk Space for Data Files

There are three other file systems to know about (NONE OF WHICH ARE BACKED UP!):

  • HOME (your home directory) - VERY SLOW
  • /scratch/serial - Not terribly slow - gets bogged down when many CHPC users are using it
    • May be purged at given (30 day?) intervals... and all files you haven't recently touched will be deleted.
  • /scratch/uintah - Not terribly slow - but gets bogged down when large Updraft simulations are run.

You will find that the HOME and /scratch file systems "hang" fairly frequently for 10-20 seconds at a time (for example, when you type 'ls' or try file tab completion from the command line) when large jobs are running and dumping data (from any cluster at CHPC). If you are using the local Uintah disk space for development work, hopefully this won't happen. Please let me (dav@sci.utah.edu) know if you do run into hangs.
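
If you want a number to include when reporting a hang, something like the following sketch may help. It is POSIX sh, the function name ls_seconds is invented here, and it assumes your 'date' supports the +%s format (GNU and BSD versions do).

```shell
#!/bin/sh
# Hypothetical helper: print how many whole seconds 'ls' takes on a directory.
# Assumption: 'date +%s' (seconds since the epoch) is available.
ls_seconds() {
    start=$(date +%s)
    ls "$1" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

For example, 'ls_seconds /scratch/serial' on a busy afternoon versus 'ls_seconds' on your LocalDisk directory gives a rough comparison of the two file systems.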

Configure

GCC 4

#! /bin/tcsh

../src/configure \
        --enable-optimize="-O3" \
        --enable-64bit \
        \
        --with-fortran=/uufs/updraft.arches/sys/pkg/scirun_tp/LibraryLinks/g2c \
        \
        --with-mkl=/uufs/updraft.arches/sys/pkg/scirun_tp/LibraryLinks/mkl \
        \
        --with-hypre=/uufs/updraft.arches/sys/pkg/scirun_tp/Uintah/hypre-2.0.0-install/gcc4 \
        --with-petsc=/uufs/updraft.arches/sys/pkg/petsc/2.3.3-p15-gnu4 \
        \
        \
        PETSC_ARCH=linux-gnu \
        CC=gcc4 \
        CXX=g++4 \
        F77=g77

GCC 3

#! /bin/tcsh

../src/configure \
        --enable-optimize="-O3" \
        --enable-64bit \
        \
        --with-fortran=/uufs/updraft.arches/sys/pkg/scirun_tp/LibraryLinks/g2c \
        \
        --with-mkl=/uufs/updraft.arches/sys/pkg/scirun_tp/LibraryLinks/mkl \
        \
        \
        --with-hypre=/uufs/updraft.arches/sys/pkg/scirun_tp/Uintah/hypre-2.0.0-install/gcc3 \
        --with-petsc=/uufs/updraft.arches/sys/pkg/petsc/2.3.3-p15-gnu \
        \
        \
        PETSC_ARCH=linux-gnu \
        CC=gcc \
        CXX=g++ \
        F77=g77

ICC

#! /bin/tcsh

# Update path to include icc/icpc/ifort:
source /uufs/arches/sys/pkg/intel/ifort/std/bin/ifortvars.csh
source /uufs/arches/sys/pkg/intel/icc/std/bin/iccvars.csh

../src/configure \
        --enable-optimize="-O3 -xT" \
        --enable-64bit \
        \
        --with-fortran=/uufs/arches/sys/pkg/intel/ifort/10.1.011 \
        \
        --with-mkl=/uufs/updraft.arches/sys/pkg/scirun_tp/LibraryLinks/mkl \
        \
        --with-hypre=/uufs/updraft.arches/sys/pkg/hypre/2.0.0_intel \
        --with-petsc=/uufs/updraft.arches/sys/pkg/petsc/2.3.3-p15-intel \
        \
        \
        PETSC_ARCH=linux-gnu \
        CC=icc \
        CXX=icpc \
        F77=ifort
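
The three configure scripts above differ mainly in the compiler names and in the hypre/PETSc install directories. As a rough summary, here is a POSIX sh sketch that maps a compiler family to the directories used above. The function names and the labels gcc4, gcc3, and icc are chosen here for illustration only; the paths are copied from the scripts.

```shell
#!/bin/sh
# Print the PETSc directory used in the configure scripts above.
# The labels gcc4 / gcc3 / icc are illustrative, not official names.
petsc_dir() {
    root=/uufs/updraft.arches/sys/pkg/petsc
    case "$1" in
        gcc4) echo "$root/2.3.3-p15-gnu4" ;;
        gcc3) echo "$root/2.3.3-p15-gnu" ;;
        icc)  echo "$root/2.3.3-p15-intel" ;;
        *)    echo "unknown compiler: $1" >&2; return 1 ;;
    esac
}

# Print the hypre directory used in the configure scripts above.
hypre_dir() {
    case "$1" in
        gcc4) echo /uufs/updraft.arches/sys/pkg/scirun_tp/Uintah/hypre-2.0.0-install/gcc4 ;;
        gcc3) echo /uufs/updraft.arches/sys/pkg/scirun_tp/Uintah/hypre-2.0.0-install/gcc3 ;;
        icc)  echo /uufs/updraft.arches/sys/pkg/hypre/2.0.0_intel ;;
        *)    echo "unknown compiler: $1" >&2; return 1 ;;
    esac
}
```

A call such as 'petsc_dir icc' prints the value to pass to --with-petsc for the Intel build.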

Compiling

  • A few updraft nodes (not yet determined which) will have local OSes (instead of net-booting) which should allow for much faster compilations. Stay tuned for more info on this.

Submitting the Job (PBS)

To submit a batch job to UpDraft, use the 'qsub' command. If 'qsub' is not already in your path, then it can be found in /uufs/updraft.arches/sys/bin/.

Below is a sample batch script file:

#! /bin/csh
#PBS -S /bin/csh
#PBS -N "advect"
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -V
#PBS -r n
#PBS -M <add your email address>
#PBS -m abe
#PBS -A uintah
#PBS -l qos=uintah

# Name of the output file
set OUT = "out.8"

# Change to the working directory
cd /scratch/serial/<user>/test

# 8 MPI processes = 1 node x 8 cores per node (matches nodes=1:ppn=8 above)
mpirun -q 0 -t 3600 -m $PBS_NODEFILE -np 8 sus -mpi -ice advect.ups >& $OUT

There's a sample batch script at:

 /uufs/updraft.arches/sys/pkg/scirun/sampleBatchScript.ups

Note: Comments are not allowed on lines that contain #PBS directives.
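
In the sample script, the -np value (8) equals nodes x ppn (1 x 8). For larger runs, the arithmetic can be sketched as below. This is POSIX sh, the function names are invented for this example, and it assumes you always request whole nodes with ppn=8 (8 cores per UpDraft node).

```shell
#!/bin/sh
# Cores per UpDraft node (256 nodes x 8 cores).
PPN=8

# Hypothetical helper: number of whole nodes needed for a given core count,
# rounded up.
nodes_for_cores() {
    echo $(( ($1 + PPN - 1) / PPN ))
}

# Hypothetical helper: print the matching '#PBS -l nodes=...' directive.
pbs_nodes_line() {
    echo "#PBS -l nodes=$(nodes_for_cores "$1"):ppn=$PPN"
}
```

For a 32-process run, 'pbs_nodes_line 32' prints '#PBS -l nodes=4:ppn=8', and the mpirun line would then use '-np 32'.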

Scripts

Some useful Uintah scripts are located here:

/uufs/updraft.arches/sys/pkg/scirun/scripts

Please add this directory to the path in your dot file (for example, .cshrc).

llogin

  • llogin (a script located in /uufs/updraft.arches/sys/pkg/scirun/scripts) allows the user to request N interactive nodes for X amount of time. (FYI, llogin is just a wrapper around a qsub call.)
  • Example usage:
    • llogin 4 48:00:00 - requests 4 nodes for 48 hours
    • llogin 8 - requests 8 nodes for 6 hours
    • llogin - requests 1 node for 6 hours
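
Since llogin is a wrapper around qsub, the idea can be sketched roughly as follows. This is NOT the actual llogin script. It is a guess at the kind of command it builds, written in POSIX sh with an invented function name (llogin_cmd), and it assumes an interactive PBS job ('qsub -I') with ppn=8 on every node. The defaults of 1 node and 6 hours come from the examples above.

```shell
#!/bin/sh
# Hypothetical sketch: print (not run) a qsub command for an interactive job.
#   $1 = number of nodes (default 1)
#   $2 = walltime as HH:MM:SS (default 6:00:00)
# Assumption: ppn=8 is requested on each node.
llogin_cmd() {
    nodes=${1:-1}
    walltime=${2:-6:00:00}
    echo "qsub -I -l nodes=${nodes}:ppn=8,walltime=${walltime}"
}
```

Printing the command instead of running it makes it easy to check what would be submitted. On UpDraft itself, the real llogin is still the better choice.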

myqstat

Runs 'qstat', but substitutes the user's actual name for their uuid.

myw

Runs 'w', but substitutes the user's actual name for their uuid.

usage

To get a general idea of the number of processors in use/available, run 'usage'.

/uufs/updraft.arches/sys/pkg/scirun/scripts/usage

Other Commands

  • To determine if a DAT (Dedicated Application Time) is occurring:
> showres | grep DAT
  • For more info on the DAT (where Mar9_DAT.926 was returned from the 'showres' command):
> mdiag -v -r Mar9_DAT.926
  • Change a job's parameters while the job waits in the queue:
> qalter -l {attribute to change} {jobid}
  • Get an estimate of when a job might start:
> showstart {jobid}
  • See statistics for the jobs running on the uintah qos:
> showstats -q uintah
  • See a list of job priorities:
> mdiag -p
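
The two DAT commands above can be chained. The sketch below (POSIX sh, with an invented function name) pulls the reservation names out of 'showres' output so they can be passed to 'mdiag -v -r'. It assumes the reservation name is the first whitespace-separated column of each 'showres' line, which may not hold for every scheduler version.

```shell
#!/bin/sh
# Hypothetical helper: read 'showres' output on stdin and print the first
# column of every line that mentions DAT.
# Assumption: the reservation name is the first column.
dat_names() {
    awk '/DAT/ { print $1 }'
}
```

Usage would be something like 'showres | dat_names', followed by 'mdiag -v -r <name>' for each name printed.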

Contact:

If you have questions about updraft, contact homebrew@cs.utah.edu, or CHPC Ops <operations@chpc.utah.edu>.

Notes:

  • When compiling, if you see strange characters in the error messages on your terminal, you may be able to fix it by doing:
unsetenv LANG
  • It can take 5 minutes or so for canceled jobs to be removed from the queue.
  • To execute an MPI run, use:
mpirun -m $PBS_NODEFILE -np ............

It is _highly_ recommended that you limit your core files to 0 bytes. Add the 'limit' line below to your .cshrc file, above the check that exits for non-interactive shells:

#Uintah users
# Add the following line:
limit coredumpsize 0


# Get out if terminal is not interactive or we're a root
if ($?USER == 0 || $?TERM == 0) then
    exit
endif
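
The snippet above is for csh/tcsh. If your login shell is sh or bash instead (an assumption, since the examples on this page all use csh), the equivalent setting is 'ulimit -c 0', as in this small sketch:

```shell
#!/bin/sh
# sh/bash equivalent of csh's 'limit coredumpsize 0':
# set the maximum core file size to 0 so no core files are written.
ulimit -c 0

# Show the current core file size limit to confirm the setting.
ulimit -c
```

As with the csh version, the line needs to sit in a startup file that is also read by batch jobs, not only by interactive logins.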

Misc:

  • Updraft hardware problem log: [1]

Stuff:

  • showq -r
  • checkjob -v <job#>
  • mdiag -q
    • Lists the QOS options
  • qsub -l nodes=4:ppn=8,qos=preemptable
    • qos => preemptable, uintah, bigrun
  • diagnose -f


