Migrating from Pacman/Fish

If you are an existing user of either Pacman or Fish, it is important that you know what to expect when you begin using Chinook. Below is a comparison of some characteristics across the different HPC clusters currently operated by RCS:

| Attribute | Chinook | Fish | Pacman |
| --- | --- | --- | --- |
| Operating System | CentOS 6 (CentOS 7 upgrade planned) | Cray Linux Environment (CLE) 4 | RHEL 6 |
| Workload manager | Slurm | PBS/Torque (Cray) | PBS/Torque |
| Usernames | UA usernames | Legacy ARSC usernames | Legacy ARSC usernames |
| Login nodes | 2 (with more coming) | 2 | 12 + 1 high memory |
| Compute nodes | Intel Xeon, 24/28 cores per node | AMD Istanbul/Interlagos, 12/16 cores per node, NVIDIA Tesla GPUs | AMD Opteron, 16/32 cores per node |
| Interconnect | QLogic QDR InfiniBand (EDR upgrade planned) | Cray Gemini | QLogic QDR and Voltaire SDR InfiniBand |
| Default compiler suite | Intel | PGI | PGI |
| $CENTER | Yes | Yes | Yes |
| $ARCHIVE | Yes | Yes | Yes |
| /projects | Yes | Yes | Yes |
| $HOME | Yes (only available on cluster) | Yes (only available on cluster) | Yes |
| /usr/local/unsupported | Yes (only available on cluster) | Yes (only available on cluster) | Yes |

User-compiled software

All software previously compiled on Pacman or Fish will need to be recompiled for Chinook, because Chinook's hardware and Linux kernel differ from those on Pacman/Fish.

Software stack differences

Compiler toolchain modules

On Pacman and Fish, the environment modules responsible for loading a compiler and related set of core HPC libraries follow the "PrgEnv" naming convention common to many HPC facilities. Chinook's equivalent modules are called compiler toolchains, or just toolchains. See Compiler Toolchains for more information on what is available.

At this time, the "PrgEnv" modules on Chinook are deprecated and have been replaced by toolchain modules. Depending on feedback, we may provide "PrgEnv"-style symlinks to the toolchain modules in the future.

Dependency loading behavior

The modules on Chinook now each specify and load a (mostly) complete module dependency tree. To illustrate, consider loading an Intel-compiled netCDF library. Here is what happens on Pacman:

$ module purge
$ module load netcdf/4.3.0.intel-2013_sp1
$ module list --terse
Currently Loaded Modulefiles:

And an equivalent action on Chinook:

$ module purge
$ module load data/netCDF/4.4.1-pic-intel-2016b
$ module list --terse
Currently Loaded Modulefiles:

On Pacman and Fish, you get exactly the module you requested and no more (with a few exceptions). This has advantages and disadvantages:

  • advantage: It is easy to experiment with different software builds by swapping library modules
  • disadvantage: It is not immediately obvious which libraries were used during any given software build when multiple versions of those libraries exist
  • disadvantage: It is trivial to introduce a fatal error in an application by inadvertently loading an incompatible library module or omitting a needed one

On Chinook, standardizing and loading all module dependencies results in consistency and reproducibility. When you load the Intel-compiled netCDF module on Chinook, for example, you get modules loaded for the following:

  • The netCDF library
  • Its immediate dependencies (HDF5, zlib, curl)
  • The dependencies for the dependencies (and so on, recursively)
  • The exact Intel compiler, MPI library, and Intel Math Kernel Library (MKL) used to build netCDF
  • A newer version of GCC that takes precedence over the default system GCC

This takes the guesswork out of manually piecing together a software stack module by module. Each successive dependency modifies LD_LIBRARY_PATH and other variables appropriately, so that the desired application or library dynamically links to the proper supporting libraries instead of accidentally picking up an incompatible library with the same name.
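To make this loading behavior concrete, here is a simplified bash model (bash 4 or later, for associative arrays). Everything in it is hypothetical: the dependency graph only loosely mirrors the netCDF example, and the `/opt/<module>/lib` prefixes and the `DEMO_LIBRARY_PATH` variable (standing in for LD_LIBRARY_PATH) are invented for illustration. It is not how Chinook's module system is implemented; it only shows dependencies being loaded depth-first, shared dependencies being loaded once, and each load prepending a library directory to the search path.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL model of full-dependency-tree loading. Module names, the
# dependency graph, and /opt/... prefixes are illustrative only.
declare -A DEPS=(
  [netCDF]="HDF5 zlib curl"
  [HDF5]="zlib MPI"
  [curl]="zlib"
  [zlib]="compiler"
  [MPI]="compiler"
  [compiler]=""
)
declare -A LOADED=()
LOAD_ORDER=()
DEMO_LIBRARY_PATH=""   # stands in for LD_LIBRARY_PATH in this sketch

# Load a module's dependencies first (depth-first), then the module itself.
# Modules already loaded are skipped, so shared dependencies load only once.
# Each load prepends the module's library directory, much as a modulefile's
# prepend-path (Tcl) or prepend_path (Lua/Lmod) command would.
load_module() {
  local mod="$1" dep
  if [[ -n "${LOADED[$mod]:-}" ]]; then
    return 0
  fi
  LOADED[$mod]=1
  for dep in ${DEPS[$mod]:-}; do
    load_module "$dep"
  done
  LOAD_ORDER+=("$mod")
  DEMO_LIBRARY_PATH="/opt/$mod/lib${DEMO_LIBRARY_PATH:+:$DEMO_LIBRARY_PATH}"
}

load_module netCDF
printf '%s\n' "${LOAD_ORDER[*]}"     # compiler zlib MPI HDF5 curl netCDF
printf '%s\n' "$DEMO_LIBRARY_PATH"   # /opt/netCDF/lib:/opt/curl/lib:...:/opt/compiler/lib
```

Because each load prepends, the directory of the module you asked for ends up first in the search path, followed by its dependencies.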

One ramification of loading a full dependency tree is that loading software built with different compiler toolchains will likely produce module conflicts, even if the tools you are trying to load provide only binaries. Combining two or more dependency trees would load two different builds of the same core compiler or library, which can cause unintended and harmful dynamic linking. The dynamic linker searches the directories in LD_LIBRARY_PATH in order and uses the first matching library it finds to satisfy every dependency on that library. That is fine for the packages that expect that particular build, but it can break the packages that expect a different one.
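The first-match rule can be illustrated with a small bash sketch. `find_first_lib` is a hypothetical helper that models only the LD_LIBRARY_PATH step of library lookup; the real dynamic linker also consults RPATH/RUNPATH entries, /etc/ld.so.cache, and the default system directories, none of which are modeled here. The `gcc` and `intel` directories and `libhdf5.so.10` are invented for the demo.

```bash
#!/usr/bin/env bash
# find_first_lib: HYPOTHETICAL helper modeling only the LD_LIBRARY_PATH step
# of dynamic library lookup: directories are tried left to right and the
# first one containing the file wins. RPATH/RUNPATH, /etc/ld.so.cache and
# the default system directories are ignored in this sketch.
# Usage: find_first_lib LIBNAME [SEARCH_PATH]  (defaults to $LD_LIBRARY_PATH)
find_first_lib() {
  local name="$1" search="${2-${LD_LIBRARY_PATH:-}}" dir
  local IFS=':'
  for dir in $search; do
    # Empty entries are simply skipped in this sketch
    if [[ -n "$dir" && -e "$dir/$name" ]]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Two hypothetical toolchain trees that both provide libhdf5.so.10
demo=$(mktemp -d)
mkdir -p "$demo/gcc/lib" "$demo/intel/lib"
touch "$demo/gcc/lib/libhdf5.so.10" "$demo/intel/lib/libhdf5.so.10"

# Whichever tree comes first wins, even for a program built against the other
find_first_lib libhdf5.so.10 "$demo/gcc/lib:$demo/intel/lib"
find_first_lib libhdf5.so.10 "$demo/intel/lib:$demo/gcc/lib"
rm -rf "$demo"
```

Swapping the order of the two directories changes which build is picked up, which is exactly why mixing dependency trees from different toolchains is risky.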

Slurm Translation Guide

One of the most immediately evident changes with Chinook is that it uses Slurm for job scheduling rather than PBS/Torque. The workflow for submitting jobs has not changed significantly, but the syntax and commands have. Below is an excerpt from SchedMD's "Rosetta Stone of Workload Managers" relevant to PBS/Torque.

For more information on Slurm, please see Using the Batch System.

Source: http://slurm.schedmd.com/rosetta.pdf, 28-Apr-2013

| User Commands | PBS/Torque | Slurm |
| --- | --- | --- |
| Job submission | qsub [script_file] | sbatch [script_file] |
| Job deletion | qdel [job_id] | scancel [job_id] |
| Job status (by job) | qstat [job_id] | squeue -j [job_id] |
| Job status (by user) | qstat -u [user_name] | squeue -u [user_name] |
| Job hold | qhold [job_id] | scontrol hold [job_id] |
| Job release | qrls [job_id] | scontrol release [job_id] |
| Queue list | qstat -Q | sinfo -s OR scontrol show partition |
| Node list | pbsnodes -l | sinfo -N OR scontrol show nodes |
| Cluster status | qstat -a | sinfo |
| GUI | xpbsmon | sview |

| Job Specification | PBS/Torque | Slurm |
| --- | --- | --- |
| Script directive | #PBS | #SBATCH |
| Queue | -q [queue] | -p [queue] |
| Node Count | -l nodes=[count] | -N [min[-max]] |
| CPU Count | -l ppn=[count] OR -l mppwidth=[PE_count] | -n [count] |
| Wall Clock Limit | -l walltime=[hh:mm:ss] | -t [min] OR -t [days-hh:mm:ss] |
| Standard Output File | -o [file_name] | -o [file_name] |
| Standard Error File | -e [file_name] | -e [file_name] |
| Combine stdout/err | -j oe (both to stdout) OR -j eo (both to stderr) | (use -o without -e) |
| Copy Environment | -V | --export=ALL, --export=NONE, OR --export=[variables] |
| Event Notification | -m abe | --mail-type=[events] |
| Email Address | -M [address] | --mail-user=[address] |
| Job Name | -N [name] | --job-name=[name] |
| Job Restart | -r y OR -r n | --requeue OR --no-requeue (NOTE: configurable default) |
| Working Directory | -d [dir_name] | --workdir=[dir_name] |
| Resource Sharing | -l naccesspolicy=singlejob | --exclusive OR --shared |
| Memory Size | -l mem=[MB] | --mem=[mem][M\|G\|T] OR --mem-per-cpu=[mem][M\|G\|T] |
| Account to Charge | -W group_list=[account] | --account=[account] |
| Tasks Per Node | -l mppnppn=[PEs_per_node] | --ntasks-per-node=[count] |
| CPUs Per Task | N/A | --cpus-per-task=[count] |
| Job Dependency | -W depend=[state:job_id] | --dependency=[state:job_id] |
| Job Project | N/A | --wckey=[name] |
| Job host preference | N/A | --nodelist=[nodes] AND/OR --exclude=[nodes] |
| Quality of Service | -l qos=[name] | --qos=[name] |
| Job Arrays | -t [array_spec] | --array=[array_spec] (Slurm version 2.6+) |
| Generic Resources | -l other=[resource_spec] | --gres=[resource_spec] |
| Licenses | N/A | --licenses=[license_spec] |
| Begin Time | -a [[[[CC]YY]MM]DD]hhmm[.SS] | --begin=YYYY-MM-DD[THH:MM[:SS]] |
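Several of the directive mappings in the table above are mechanical enough to script. The following is a minimal sketch, assuming bash 4 or later: `pbs2slurm` is a hypothetical helper (not an RCS or SchedMD tool) that rewrites only a handful of common `#PBS` directives and flags every other `#PBS` line for manual porting. Because `ppn` is a per-node count, the sketch maps `-l nodes=N:ppn=M` to `-N N` plus `--ntasks-per-node=M` rather than to `-n`. The queue name `standard` in the example is made up; Chinook partition names must be checked separately.

```bash
#!/usr/bin/env bash
# pbs2slurm: HYPOTHETICAL, deliberately partial PBS -> Slurm translator.
# Converts a few common #PBS directives and flags the rest for a human.
# Usage: pbs2slurm < pbs_script > slurm_script
pbs2slurm() {
  local line
  local re_nodes='^#PBS[[:space:]]+-l[[:space:]]+nodes=([0-9]+):ppn=([0-9]+)[[:space:]]*$'
  local re_wall='^#PBS[[:space:]]+-l[[:space:]]+walltime=([0-9:]+)[[:space:]]*$'
  local re_opt='^#PBS[[:space:]]+-([qNoeM])[[:space:]]+([^[:space:]]+)[[:space:]]*$'
  local re_pbs='^#PBS'

  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ $line =~ $re_nodes ]]; then
      # ppn is per node, so it maps to --ntasks-per-node rather than -n
      printf '#SBATCH -N %s\n#SBATCH --ntasks-per-node=%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    elif [[ $line =~ $re_wall ]]; then
      # Slurm's -t also accepts hours:minutes:seconds
      printf '#SBATCH -t %s\n' "${BASH_REMATCH[1]}"
    elif [[ $line =~ $re_opt ]]; then
      case "${BASH_REMATCH[1]}" in
        q) printf '#SBATCH -p %s\n' "${BASH_REMATCH[2]}" ;;  # queue name copied verbatim
        N) printf '#SBATCH --job-name=%s\n' "${BASH_REMATCH[2]}" ;;
        o) printf '#SBATCH -o %s\n' "${BASH_REMATCH[2]}" ;;
        e) printf '#SBATCH -e %s\n' "${BASH_REMATCH[2]}" ;;
        M) printf '#SBATCH --mail-user=%s\n' "${BASH_REMATCH[2]}" ;;
      esac
    elif [[ $line =~ $re_pbs ]]; then
      # Anything else (e.g. -j oe, -l mem=...) needs a manual decision
      printf '# FIXME(port): %s\n' "$line"
    else
      printf '%s\n' "$line"
    fi
  done
}

# Example input; "standard" is a made-up queue name
pbs2slurm <<'EOF'
#!/bin/bash
#PBS -q standard
#PBS -l nodes=2:ppn=16
#PBS -l walltime=04:00:00
#PBS -N my_job
#PBS -j oe
mpirun ./my_app
EOF
```

Non-directive lines, such as the `mpirun` launch line, are passed through unchanged; reviewing them is still up to you.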

Frequently Asked Questions

Will I need to copy my files from Pacman/Fish to Chinook?

$ARCHIVE and $CENTER are mounted on Chinook, so your existing files in those locations are already accessible there. However, your Chinook home directory is new, and we will not automatically copy any Pacman/Fish home directory contents to it. If you would like to transfer files from your Pacman/Fish home directory, you may do so with scp or sftp between Pacman/Fish and Chinook.

I used the PGI compiler on Pacman/Fish. What are my options on Chinook?

Support for the PGI compiler suite will expire in FY17. If possible, please look into compiling your code using the Intel or GNU compiler suites. If not, the latest version of the PGI compilers available when support lapses will remain installed on Chinook.

I have a PBS/Torque batch script. Can I use it on Chinook?

Possibly. Slurm provides compatibility scripts for various PBS/Torque commands, including qsub. The compatibility is not perfect, however, and you will likely need to debug why your batch script does not behave as expected. That time is better spent porting the PBS script to Slurm syntax and submitting it with sbatch.
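Whether you use the compatibility scripts or port by hand, it helps to know which PBS-specific pieces a script still contains. A minimal sketch, assuming bash and grep: the hypothetical `check_pbs_leftovers` function lists any remaining `#PBS` directives and `$PBS_*` variable references with their line numbers. Variables such as `$PBS_O_WORKDIR` are set by PBS/Torque, not by Slurm (Slurm's closest counterpart is `$SLURM_SUBMIT_DIR`), so a script that still relies on them will likely not behave as intended under sbatch.

```bash
#!/usr/bin/env bash
# check_pbs_leftovers: HYPOTHETICAL helper that prints (with line numbers)
# every #PBS directive or $PBS_*/${PBS_*} reference left in a batch script.
# Return status: 0 = nothing found, 1 = leftovers printed, 2 = grep error.
check_pbs_leftovers() {
  local script="$1" rc=0
  grep -n -E '^#PBS|\$\{?PBS_[[:upper:][:digit:]_]+' -- "$script" || rc=$?
  case "$rc" in
    0) return 1 ;;  # grep matched: PBS-specific lines remain
    1) return 0 ;;  # no matches: script looks clean
    *) return 2 ;;  # unreadable or missing file
  esac
}
```

For example, `check_pbs_leftovers job.slurm && sbatch job.slurm` submits the (hypothetically named) script only when no PBS-specific lines remain.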