Research Computing Systems

Research Computing Systems at the Geophysical Institute provides advanced computing, storage, and data-sharing solutions, as well as research IT support, to University of Alaska research communities, collaborators, and supporters.

Services

Information for new users

RCS computing and storage resources are available upon request to University of Alaska faculty, staff, and sponsored individuals. Please see Getting Access for more information.

If your recent work was made possible by RCS resources, please see our
Citation and Acknowledgement
page.

High Performance Computing - old

  • Penguin Computing Cluster

    chinook

    In 2016 the Geophysical Institute launched Chinook, an energy-efficient Linux cluster purchased from Penguin Computing, Inc. Chinook is the foundation for a new community HPC condo-style cluster for UA researchers and local, state, and federal collaborators. The computing environment hosted on Chinook includes:

    • Relion 1900 Compute Nodes with dual Intel Xeon 12-core processors (24 cores per node) and 128 GB memory
    • CentOS Operating System, Slurm open source batch scheduling software, and Scyld ClusterWare HPC Management software
    • Direct access to the center-wide 275 TB Lustre scratch file system
    • Direct access to the center-wide 7 PB Solaris long term tape storage file system
  • Beowulf Compute Cluster

    pacman

    Pacman was procured for the Pacific Area Climate Monitoring and Analysis Network in 2011. Hosting several thousand cores and running Red Hat Enterprise Linux and Scyld ClusterWare, Pacman is a core resource for computational research at the University of Alaska.
    Pacman is composed of:

    • AMD Opteron processors on compute nodes ranging from 4 to 128 cores per node
    • QLogic QDR and Voltaire SDR Infiniband Interconnect
    • Direct access to the center-wide 275 TB Lustre scratch file system
    • Direct access to the center-wide 7 PB Solaris long term tape storage file system
  • Cray XK6m-200

    fish

    Fish began operating at UAF in 2012 as a grant funded resource for the Pacific Area Climate Monitoring and Analysis Network. Running the Cray Linux Environment and GPU enabled compute nodes, Fish offers a fast and reliable computing environment to UA researchers.
    Fish is composed of:

    • Twelve-core nodes with AMD Istanbul processors
    • Sixteen-core nodes with AMD Interlagos processors and one nVIDIA Tesla X2090 accelerator per node
    • Cray proprietary Gemini Interconnect
    • Direct access to the center-wide 275 TB Lustre scratch file system
    • Direct access to the center-wide 7 PB Solaris long term tape storage file system
  • Interested in using these resources?
    Visit the RCS User Access page to learn how.

Chinook System Documentation

Index

Getting Access

Are you interested in using the Chinook HPC cluster in your computational work? Please read our directions on how to obtain RCS project and user accounts.

Logging In

To log into Chinook, you will need a user account and Secure Shell (SSH) client program.

Linux / Mac:

Linux and Mac users should use OpenSSH, which is already installed on your computer. Open a terminal session and run the following command to connect to Chinook:


ssh uausername@chinook.alaska.edu

where uausername is your UA username (e.g. jsmith2). When prompted for a password, enter your UA password.
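
If you connect frequently, you can optionally save the hostname and username in your OpenSSH client configuration. The snippet below is a minimal sketch; the host alias "chinook" and the username jsmith2 are examples, so substitute your own.

# Add to ~/.ssh/config on your local machine
Host chinook
    HostName chinook.alaska.edu
    User jsmith2

With this in place, running "ssh chinook" is equivalent to the full command above.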

Windows:

Windows users will need to download and install a third-party SSH client in order to log into Chinook. Here are a few options:

  • PuTTY (open source, MIT license)
  • Secure Shell (proprietary, UA credentials required for download)

For reference, here is an ARSC guide to installing and using PuTTY, originally written for PuTTY beta 0.63.

Use the SSH client you have chosen and installed to connect to chinook.alaska.edu. When prompted for a username, either interactively or in configuration, use your UA username. When prompted for a password, use your UA password.

System Architecture

Community Condo Model

Chinook is a community, condo-model high performance computing (HPC) cluster. Common infrastructure elements (the environmentally regulated data center, network connectivity, equipment racks, management and technical staff, and a small pool of CPUs) provide subsidized resources that PIs might not be able to procure individually, allowing them to focus time and energy on research rather than on owning and operating individual clusters.

Participants in the condo service share unused portions of the computational resources they add to Chinook with each other and with non-invested users, such as students or occasional users, who may or may not pay a fee for access. A queue management system gives vested PIs top priority on the shares they have purchased whenever they need the resources. RCS also reserves the option to use manual or automated job preemption to interrupt community user jobs as needed to give vested PIs access to their shares.

Tier 1: Community Nodes

This level of service is open to the UA research community using nodes procured for the community and unused portions of shareholder nodes. Users in this tier are anticipated to receive:

  • Unlimited CPU
  • Limited Lustre storage (pending)
  • Limited wall time queue limits (pending)

Tier 2: Shareholder Shared Nodes

This level of service is for the PI or project that requires CPUs beyond what can be offered by Tier 1 or requires priority access to HPC resources. Users in this tier are shareholders that procure equipment and receive:

  • Unlimited CPU
  • Limited (greater than Tier 1) Lustre & DataDir storage (pending)
  • Limited wall time queue limits (greater than Tier 1) (pending)
  • Higher initial job priority weighted by the share procured
  • Preemption over Tier 1 users
  • Short term reservations (pending)

Tier 3: Shareholder Dedicated Nodes

This level of service is for the PI or project that requires dedicated resources. RCS will manage and operate procured nodes for an additional service fee. Users interested in this level of service should contact RCS.

  • CPU limited to the procured nodes and infrastructure components
  • Limited Lustre (equal to Tier 1 unless additional capacity procured by PI/project) + DataDir storage (pending)
  • No priority or preemption rights to Tier 1 or Tier 2
  • Dedicated queue(s) with unlimited wall times

Node Type       Description                                                                             Approximate Cost
Standard Node   Relion 1900: dual 14-core Intel E5-2690v4 processors with 128 GB RAM, 3-year warranty   $8,200
Standard Node   Relion 1900: dual 14-core Intel E5-2690v4 processors with 128 GB RAM, 4-year warranty   $8,500
Standard Node   Relion 1900: dual 14-core Intel E5-2690v4 processors with 128 GB RAM, 5-year warranty   $8,800
BigMem Node     Relion 1900: dual 14-core Intel E5-2690v4 processors with 750 GB RAM                    $26,000

Note: Please contact RCS if you are interested in Chinook membership. All node types include licenses for Scyld ClusterWare, Mellanox UFM, and Linux with a minimum 3-year service contract. The purchase price provides Chinook community membership that aligns with the warranty of the equipment purchased. When membership expires, the resources must be upgraded or the warranty must be renewed; otherwise, the resources revert to the community pool and the project is given Tier 1 status.

Lifecycle management

All compute nodes include factory support for the duration of the warranty period. During this time any hardware problems will be corrected as soon as possible. After the warranty expires, compute nodes will be supported on a best-effort basis until they suffer complete failure, are replaced, or reach a service age of 5 years. Once a node has reached end-of-life due to failure or obsolescence, it will be removed from service.

Policies

Login Shells

The login shells supported on RCS HPC systems are bash, csh, ksh, and tcsh. If you would like your default login shell changed, please contact uaf-rcs@alaska.edu.

Security Policies

Users of RCS HPC systems agree to abide by published UAF policies and standards: http://www.alaska.edu/oit/services/policies-and-standards. Every user of RCS HPC systems may rightfully expect their programs, data, and documents stored on RCS systems to be inaccessible by others, secure against arbitrary loss or alteration, and available for use at all times. To help protect system security and achieve this goal, RCS staff reserve the right to routinely examine user accounts. In the event of a suspected security incident, RCS staff may inactivate and examine the contents of user accounts without prior notification.

Account Sharing

Users of RCS HPC systems may not share their account with anyone under any circumstances. This policy ensures every user is solely responsible for all actions from within their account. To allow group members to read project data, grant group access using Unix file and directory permissions, as shown in the example below. Contact uaf-rcs@alaska.edu for more information regarding changing group file and directory permissions.
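
For example, the following commands grant a project group read access to a shared directory. This is a sketch: the group name "myproject" and the directory path are hypothetical, so substitute your own.

# Set the owning group of the shared directory tree (group name is hypothetical)
chgrp -R myproject /center/w/jsmith2/shared_data
# Grant the group read access, plus execute on directories so they can be traversed
chmod -R g+rX /center/w/jsmith2/shared_data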

Policy Enforcement

Abuse of RCS resources is a serious matter and is subject to immediate action. A perceived, attempted, or actual violation of standards, procedures, or guidelines pursuant with RCS policies may result in disciplinary action including the loss of system privileges and possibly legal prosecution in the case of criminal activity. RCS employs the following mechanisms to enforce its policies:

  • Contacting the user via phone or email to ask them to correct the problem.
  • Modifying the permissions on a user's files or directories in response to a security violation.
  • Inactivating accounts or disabling access to resources to ensure availability and security of RCS HPC systems.

User Owned Files and Directories

For the $HOME filesystems, RCS recommends that file and directory permissions authorize write access only to the file owner. Group and world write permissions for files and directories in $HOME should be avoided under all circumstances.
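
For example, the following commands remove group and world write permissions from an entire home directory (a minimal sketch using standard Unix tools):

# Remove group and world write permission from $HOME and everything beneath it
chmod -R go-w $HOME
# Confirm that only the owner retains write permission on the top-level directory
ls -ld $HOME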

For the $CENTER and $ARCHIVE filesystems, RCS recommends using extreme caution when opening group and world read, write, and execute permissions on user-owned files and directories.

Setuid and setgid permissions are prohibited in all user owned files and directories.

User files may be scanned for system security or maintenance purposes.

Non-Printing Characters in File Names

Non-printing characters, such as ASCII codes for RETURN or DELETE, are occasionally introduced by accident into file names. These characters present a low-level risk to system security and integrity and are prohibited.

Techniques for renaming, deleting, or accessing files containing non-printing characters in the filename are described at www.arsc.edu/support/howtos/nonprintingchars/index.xml

Passwords

RCS uses University of Alaska (UA) systems for user authentication. Therefore, passwords used on RCS systems are subject to UA password guidelines. University of Alaska passwords may be changed using the ELMO webpage (https://elmo.alaska.edu). If you suspect your password has been compromised, contact the UAF OIT Helpdesk (helpdesk@alaska.edu, 907-450-8300) immediately.

SSH Public Keys

Sharing of private SSH keys to allow another user to access an RCS account is considered account sharing and is prohibited on RCS systems.

Users of SSH public keys are responsible for their private keys and ensuring they are protected against access from other users. Private SSH keys should be generated and stored only on trusted systems.

Tampering

Do not attempt to break passwords, tamper with system files, access files in other users' directories without permission, or otherwise abuse the privileges given to you with your RCS account. Your privileges do not extend beyond the directories, files, and volumes which you rightfully own or to which you have been given permission to access.

System Generated E-mail

RCS provides a ~/.forward file for users on each system. When the system generates an email, the message will be forwarded to email address(es) listed in the .forward file. Users are free to update their ~/.forward file to their preferred email address.
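
For example, the following commands replace the contents of ~/.forward so that system-generated mail is delivered to a preferred address (the address shown is an example):

# Forward system-generated email to a preferred address
echo "jsmith2@alaska.edu" > ~/.forward
# Verify the change
cat ~/.forward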

Third-party Software

Software requests

RCS evaluates third-party software installation requests for widely-used HPC software on a case-by-case basis. Some factors that affect request eligibility are:

  • Applicability to multiple research groups
  • Complexity of the installation process
  • Software licensing

If a third-party software installation request is found to be a viable candidate for installation, RCS may elect to install the software through one of several means:

  • RPM
  • Binary (pre-built) distribution
  • Source build

If an application or library is available through standard RPM repositories (Penguin Computing, CentOS, EPEL, ...) then the RPM may be installed. Users should test the installed software to determine if it meets requirements. If the RPM version does not meet needs, please contact RCS to have alternate installation methods evaluated.

Software that is not installed as an RPM will be installed in a publicly-available location and be accessible via Linux environment modules. If the software is built from source, then RCS will default to using the Intel compiler suite.
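
For example, centrally installed software can usually be located and loaded with environment module commands. The module name below is taken from the Chinook netCDF example later in this document and is only an illustration; run "module avail" to see what is actually installed.

# List available netCDF builds
module avail netCDF
# Load a specific build and confirm what was loaded
module load data/netCDF/4.4.1-pic-intel-2016b
module list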

Installing your own software

Individuals and research groups may install third party applications and libraries for their own use in the following locations:

  • $HOME
  • /usr/local/unsupported

Packages built for personal use only should be installed in $HOME.
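
A typical source build installed under $HOME might look like the following sketch. The package name, version, and installation prefix are hypothetical; adjust them for the software you are building.

# Unpack, configure, and install a package under $HOME (names are hypothetical)
tar xzf mytool-1.0.tar.gz
cd mytool-1.0
./configure --prefix=$HOME/apps/mytool-1.0
make
make install
# Add the installed binaries to your PATH for this session
export PATH=$HOME/apps/mytool-1.0/bin:$PATH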

The /usr/local/unsupported directory is intended to host user-installed and maintained software packages and datasets that are shared with a group of users on the system. Users who add content to /usr/local/unsupported are fully responsible for the maintenance of the files and software versions. Please read the /usr/local/unsupported/README.RCS file for more information on /usr/local/unsupported.

To request a new subdirectory within /usr/local/unsupported, please contact RCS with the following information:

  • The name of the requested subdirectory, which can be your project's name (e.g., UAFCLMIT) or the type of software you intend to install in the directory (e.g., "ClimateModels")
  • A general description of what you intend to install
  • A rough estimate of the amount of storage you will need (e.g., 100 MB)

Computing Environment

Chinook currently has two login nodes:

  • chinook00.rcs.alaska.edu
  • chinook01.rcs.alaska.edu

Additional notes on the computing environment:

  • Chinook runs the CentOS 6 operating system. An upgrade to CentOS 7 is planned for 2016.
  • Chinook hosts Relion 1900 compute nodes, each with 24 Intel Xeon cores and 128 GB of memory.
  • The batch scheduler on Chinook is Slurm.

Available Filesystems and Storage

$HOME

  • The $HOME filesystem is accessible from the Chinook login and compute nodes.
  • Default $HOME quotas are set to 4 GB.
  • The $HOME filesystem is backed up regularly.

$CENTER

  • The $CENTER scratch filesystem is accessible from the Chinook login and compute nodes.
  • Default $CENTER quotas are set to 750 GB.
  • Files older than 30 days are subject to being removed automatically. Copy your files off $CENTER if you intend to keep the data longer than 30 days.
  • If you have a legacy ARSC username, a symbolic link has been created linking your /center/w/ARSCusername directory to your /center/w/UAusername directory.

$ARCHIVE

  • The $ARCHIVE filesystem is accessible from the Chinook login nodes only.
  • Files stored in $ARCHIVE will be written to tape and taken offline over time. Use the "batch_stage" command to bring files back online before viewing their contents or copying the data off $ARCHIVE (see the example after this list).
  • If you have a legacy ARSC username, a symbolic link has been created linking your /archive/u1/uaf/ARSCusername directory to your /archive/u1/uaf/UAusername directory.
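
For example, a file can be staged back from tape and copied to scratch space as follows. This is a sketch: the file path is hypothetical, and batch_stage may accept additional options, so consult its documentation on the system.

# Bring an archived file back online before reading or copying it (path is an example)
batch_stage $ARCHIVE/myproject/run01.tar
# Once staged, copy the data to scratch space for processing
cp $ARCHIVE/myproject/run01.tar $CENTER/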

Batch Scheduler Translation Guide

A translation guide for users transitioning from pacman (PBS/Torque) to Chinook (Slurm).

Source: http://slurm.schedmd.com/rosetta.pdf, 28-Apr-2013
User Commands PBS/Torque Slurm
Job submission qsub [script_file] sbatch [script_file]
Job deletion qdel [job_id] scancel [job_id]
Job status (by job) qstat [job_id] squeue [job_id]
Job status (by user) qstat -u [user_name] squeue -u [user_name]
Job hold qhold [job_id] scontrol hold [job_id]
Job release qrls [job_id] scontrol release [job_id]
Queue list qstat -Q squeue
Node list pbsnodes -l sinfo -N OR scontrol show nodes
Cluster status qstat -a sinfo
GUI xpbsmon sview
 
Environment PBS/Torque Slurm
Job ID $PBS_JOBID $SLURM_JOBID
Submit Directory $PBS_O_WORKDIR $SLURM_SUBMIT_DIR
Submit Host $PBS_O_HOST $SLURM_SUBMIT_HOST
Node List $PBS_NODEFILE $SLURM_JOB_NODELIST
Job Array Index $PBS_ARRAYID $SLURM_ARRAY_TASK_ID
 
Job Specification PBS/Torque Slurm
Script directive #PBS #SBATCH
Queue -q [queue] -p [queue]
Node Count -l nodes=[count] -N [min[-max]]
CPU Count -l ppn=[count] OR -l mppwidth=[PE_count] -n [count]
Wall Clock Limit -l walltime=[hh:mm:ss] -t [min] OR -t [days-hh:mm:ss]
Standard Output File -o [file_name] -o [file_name]
Standard Error File -e [file_name] -e [file_name]
Combine stdout/err -j oe (both to stdout) OR -j eo (both to stderr) (use -o without -e)
Copy Environment -V --export=[ALL | NONE | variables]
Event Notification -m abe --mail-type=[events]
Email Address -M [address] --mail-user=[address]
Job Name -N [name] --job-name=[name]
Job Restart -r [y|n] --requeue OR --no-requeue (NOTE: configurable default)
Working Directory N/A --workdir=[dir_name]
Resource Sharing -l naccesspolicy=singlejob --exclusive OR --shared
Memory Size -l mem=[MB] --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Account to Charge -W group_list=[account] --account=[account]
Tasks Per Node -l mppnppn [PEs_per_node] --tasks-per-node=[count]
CPUs Per Task   --cpus-per-task=[count]
Job Dependency -d [job_id] --depend=[state:job_id]
Job Project   --wckey=[name]
Job host preference   --nodelist=[nodes] AND/OR --exclude=[nodes]
Quality of Service -l qos=[name] --qos=[name]
Job Arrays -t [array_spec] --array=[array_spec] (Slurm version 2.6+)
Generic Resources -l other=[resource_spec] --gres=[resource_spec]
Licenses   --licenses=[license_spec]
Begin Time -A "YYYY-MM-DD HH:MM:SS" --begin=YYYY-MM-DD[THH:MM[:SS]]

Available partitions

Name Node count Max walltime Nodes per job (min-max) Other rules Purpose
debug 1 1 hour 1 For debugging job scripts
t1small 5 1 day 1-2 For short, small jobs with quick turnover
t1standard 28 4 days 3-28 Default General-purpose partition
t2small 5 2 days 1-2 Tier 2 only. Increased priority. Tier 2 version of t1small
t2standard 28 7 days 3-28 Tier 2 only. Increased priority. Tier 2 general-purpose partition
transfer 1 1 day 1 Shared use Copy files between archival storage and scratch space

Selecting a partition is done by adding a directive to the job submission script such as #SBATCH --partition=t1standard, or on the command line: $ sbatch -p t1standard

Anyone interested in gaining access to the higher priority partitions by subscribing to support the cluster or procuring additional compute capacity should contact uaf-rcs@alaska.edu.

Sample batch job submission script

Batch job submission on Chinook is done using the "sbatch" command. For example, a batch script named "mybatch.sh" is submitted to Slurm as a batch job using "sbatch mybatch.sh".

Here is what a batch script for an MPI application might look like:

#!/bin/sh
 
#SBATCH --partition=t1standard
#SBATCH --ntasks=<NUMTASKS>
#SBATCH --tasks-per-node=24
#SBATCH --mail-user=<USERNAME>@alaska.edu
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --output=<APPLICATION>.%j
 
# Load any desired modules, usually the same as loaded to compile
. /etc/profile.d/modules.sh
module purge
module load PrgEnv-intel
module load slurm
 
cd $SLURM_SUBMIT_DIR
# Generate a list of allocated nodes; will serve as a machinefile for mpirun
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes.$SLURM_JOB_ID
# Launch the MPI application
mpirun -np $SLURM_NTASKS -machinefile ./nodes.$SLURM_JOB_ID ./<APPLICATION>
# Clean up the machinefile
rm ./nodes.$SLURM_JOB_ID

The above batch script contains placeholders because every batch script will be different. Here's what each placeholder means (and what you should replace it with):

  • <APPLICATION>: The executable that you want to run in parallel
  • <NUMTASKS>: The number of parallel tasks (cores) that you are requesting from Slurm
  • <USERNAME>: Your Chinook username (same as your UA username)

In addition to the placeholders, there are many environment variables that Slurm defines at runtime for jobs. Here are the ones used in the above script:

  • $SLURM_JOB_ID: The job's numeric id
  • $SLURM_NTASKS: The value supplied as <NUMTASKS>
  • $SLURM_SUBMIT_DIR: The current working directory when "sbatch" was invoked

The example provided above was written with MPI applications in mind. Applications not using an MPI library need not create a machinefile and should not use mpirun.

Sample interactive job submission

srun -p debug --nodes=1 --exclusive --pty /bin/bash

The above srun command will reserve one node in the debug partition and launch an interactive shell job.
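
Other resource options accepted by sbatch can also be supplied to srun. For example, the following sketch requests one full node in the t1small partition for one hour:

srun -p t1small --nodes=1 --exclusive --time=01:00:00 --pty /bin/bash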

Chinook Getting Started Guide

  1. First, review the Chinook online documentation on the RCS website.
  2. Log in
  3. Enter the "show_storage" command to view your default 4GB $HOME quota, 750GB $CENTER quota, and use of the $ARCHIVE filesystem on the chinook login nodes.
  4. The Slurm batch scheduler is installed to manage work on the compute nodes (this is different from pacman and fish using Torque and Moab.)
  5. View the available partitions (or queues) on chinook with the "sinfo" command.
  6. Copy and paste the sample job submission script on this page (see above) or from /usr/local/pkg/samples_home into your directory.
  7. Modify your job submission script to customize your job size (--ntasks and --tasks-per-node), maximum walltime, output filename, module loading, the directory location of your files, and the name of your executable.
  8. Save your changes and view (or "cat") the file to confirm it looks correct, remembering to include the "srun" machinefile-generation command before the "mpirun" call if you are running a multi-node job.
  9. Add your job submission script to the scheduler's partitions by typing the "sbatch" command followed by the name of your job submission script.
  10. Use the "squeue", "sview", and/or "qmap" commands to view queued and running jobs on the cluster.
  11. The system will email you when the job finishes.
  12. Compute node reservations can be made in advance for scheduled downtimes and special user campaigns.
  13. So far, users are reporting faster runs on Chinook than on Pacman, which is great news!
  14. Any questions regarding the Chinook system can be sent to uaf-rcs@alaska.edu.
  15. Have a great day!

Maintenance Periods

The annual maintenance schedule for Chinook is as follows:
  • Monthly: First Wednesday for non-interrupt Operating System security and Scyld ClusterWare updates
  • Quarterly: First Wednesday of Jan, Apr, Jul, and Oct for Operating System and Scyld ClusterWare updates that may require system downtime
  • Twice per year: the month of May (during FAST) and the winter closure, for Operating System and Scyld ClusterWare updates that may require system downtime

High Performance Computing

  • Chinook

    Penguin Computing Community Cluster

    Chinook is the foundation for a new energy-efficient, condo-style HPC cluster for UA researchers. The computing environment hosted on Chinook includes:

    • Relion 1900 Compute Nodes each with dual Intel Xeon 12- or 14-core processors (24 or 28 cores per node) and 128GB memory
    • Multiple login nodes with dual Intel Xeon processors and 48 or more GBs memory
    • CentOS operating system, Slurm open-source workload management software, and Scyld ClusterWare HPC management software
    • Direct access to the center-wide 275 TB Lustre scratch file system
    • Access to the center-wide 7 PB long term tape storage file systems

    Are you interested in using the Chinook HPC cluster in your computational work? Please read our directions on how to obtain RCS project and user accounts.

    In 2016, the Geophysical Institute launched Chinook, an energy-efficient Linux cluster purchased from Penguin Computing, Inc. Chinook is named in honor of long-time GI colleague Kevin Engle's unique, strong, collaborative nature and passion for salmon and Alaska.

  • Fish

    Cray XK6m-200

    Fish began operating at UAF in 2012 as a grant funded resource for the Pacific Area Climate Monitoring and Analysis Network. Running the Cray Linux Environment and GPU-enabled compute nodes, Fish offers a fast and reliable computing environment to UA researchers.

    Fish is composed of:

    • Twelve-core nodes with AMD Istanbul processors
    • Sixteen-core nodes with AMD Interlagos processors and one nVIDIA Tesla X2090 accelerator per node
    • Cray proprietary Gemini Interconnect
    • Direct access to the center-wide 275 TB Lustre scratch file system
    • Direct access to the center-wide 7 PB Solaris long term tape storage file system
  • Interested in using these resources? Please read our directions on how to obtain RCS project and user accounts.

Chinook

Are you interested in using the Chinook HPC cluster in your computational work? Please read our directions on how to obtain RCS project and user accounts.

Logging In

To log into Chinook, you will need a user account and Secure Shell (SSH) client program.

Use the SSH client you have chosen and installed to connect to chinook.alaska.edu. When prompted for a username, either interactively or while configuring the client, you should provide your UA username. You will be prompted for a password upon opening an SSH connection. When this happens, enter your UA password.

Linux

Linux users should use the OpenSSH client, which is already installed on your computer. Open a terminal session and run the following command to connect to Chinook:

ssh uausername@chinook.alaska.edu

replacing uausername with your UA username (e.g. jsmith2).

Mac

Mac users, like Linux users, should use the pre-installed OpenSSH client. See above for directions.

Unlike Linux, Mac operating systems do not come with an X Window server pre-installed. If you want to run any graphical applications on Chinook, we recommend installing XQuartz on your Mac.
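
Once a local X server such as XQuartz is running, X11 forwarding can be requested when the SSH session is opened. For example:

# -Y enables trusted X11 forwarding (-X requests untrusted forwarding)
ssh -Y uausername@chinook.alaska.edu
# After logging in, test the display with a simple X application
xlogo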

Windows

Windows users will need to download and install a third-party SSH client in order to log into Chinook. Here are a few available options:

  • PuTTY (open source, MIT license)
  • Secure Shell (proprietary, UA credentials required for download)

Installing PuTTY

RCS recommends that Windows users download and install PuTTY, a free-and-open-source ssh/rsh/telnet client.

  1. Download PuTTY from the official site.
  2. Run the PuTTY installer, and select "Next".
  3. By default, the installer will install in C:\Program Files (x86)\PuTTY under 64-bit Windows, and C:\Program Files\PuTTY under 32-bit Windows. Select "Next".
  4. The installer will prompt you for a Start Menu folder in which to create shortcuts. Select "Next".
  5. Select "Create a desktop icon for PuTTY", and select "Next".
  6. The installer will allow you to review your choices. Select "Install" after you have done so.
  7. The installer will require only a few seconds to install PuTTY on your computer. Select "Finish". As it closes, the installer will by default open PuTTY's readme file, which contains more information on the additional tools included with PuTTY.

Using PuTTY

Establishing a remote login session over SSH using PuTTY is reasonably straightforward. The following steps describe how to do this, turn on X11 forwarding, and save connection settings.

Remote Login via SSH

  1. Open PuTTY using the icon placed on your desktop by the PuTTY installer.
  2. For "Host Name", enter chinook.alaska.edu.
  3. Select "Open" to initiate an SSH connection request.
  4. You may receive a security alert warning you that your client cannot establish the authenticity of the host you are connecting to. This warning is displayed whenever your SSH client connects to a computer for the first time. If you have never connected to Chinook using this PuTTY installation, select "Yes".
  5. A terminal window should open. You will be prompted for a username. Enter your UA username.
  6. You will be prompted for a password. Enter your UA password and continue.
  7. On successful authentication, a command prompt will appear and allow you to execute commands on Chinook.

Enabling Graphics

Some applications on Chinook, especially visualization applications, require a graphical display. It is possible to tunnel graphics over an SSH connection using X11 graphics forwarding, which is supported by PuTTY.

  1. Install a local X Window server. We recommend installing the last free version of XMing, which became proprietary software in May 2007.
  2. In PuTTY, define a connection to chinook.alaska.edu and navigate to "Connection-SSH-X11". Check the box labeled "Enable X11 forwarding".
  3. Initiate an SSH connection request and log in as outlined in the last section.
  4. Ensure that your local X server is running. Without this, any graphical application will fail to run properly.
  5. Run xlogo, a simple graphical application. If you see a window containing a black X on a white background, you have successfully enabled X11 forwarding.

Saving Connection Settings

  1. Configure your connection settings as desired
  2. Navigate to "Category-Session"
  3. Enter a name for your session in the "Saved Sessions" input box, and select "Save". Your session should now appear as a new line in the text box to the left of "Save".
  4. To load saved settings, select the session you want to load and then select "Load".

Optionally, PuTTY's command-line flags allow you to create shortcuts that load a particular connection.

  1. Copy your PuTTY shortcut icon
  2. Right click on the copy, and select "Properties"
  3. In the "Target" field, append -load followed by the connection name in quotation marks
  4. Select "Apply", and close the window
  5. Rename the modified shortcut appropriately

Troubleshooting

When I try to connect, PuTTY opens an alert box that says "Disconnected: No supported authentication methods available".

This message means that authentication by username failed. This is most likely caused by an incorrect username, or because you do not have access to Chinook. Please ensure that you received an email from RCS User Support (uaf-rcs@alaska.edu) notifying you of your Chinook account creation, and use the username provided in that email.

My application returns the error "X connection to localhost:10.0 broken (explicit kill or server shutdown)" (or similar).

This is an indication that your local X server is not running. Check the icons on the right-hand side of your task bar for the X server icon. If it is not present, ensure that you have installed an X server locally and that it is running. Once the icon is present, try opening your program again.

I received the "Unknown Host Key" popup alert, followed by another popup stating: "Server unexpectedly closed network connection".

This indicates that the server's SSH timeout was triggered. SSH servers are often configured to kill incoming connections that do not send data for a while. While you were responding to the "Unknown Host Key" popup, the remote host's connection timeout expired and it disconnected you. You should be able to reconnect without problem.

Using VNC to Login

To run graphical applications on RCS systems remotely, the Virtual Network Computing (VNC) application is available. It provides some advantages over using X Windows over SSH, such as a detachable session and better performance over slow connections. Basic setup information for this approach follows.

***Important Note: Please follow all of these steps with each new VNC session.***

Step 1: Install VNC on your local system

There are multiple VNC viewer programs available with unique interfaces and features. The application on RCS systems is TigerVNC.

Mac users can use the built-in Apple "Screen Sharing" application as a VNC client and do not have to install an additional client.

After installing the software, make sure ports 5900 and 5901 are open to allow VNC traffic through your host firewall.

Step 2: Setup port forwarding over SSH for the VNC session

On Linux or Mac systems:

  local$ ssh -L 5901:localhost:5901 username@remote.alaska.edu

On a Windows system:

Set up an SSH tunnel with PuTTY on Windows.

  1. On the left side of the dialog box that appears when you open PuTTY, choose Connection->SSH->Tunnels.
  2. In "Source port", enter 5901.
  3. In "Destination", enter remote.alaska.edu:5901.
  4. Click "Add"; you should see the following in the list of forwarded ports:

    L5901 remote.alaska.edu:5901

Step 3: Connect to the remote system and start the VNC server

Log onto the remote system over SSH and specify the appropriate ports for VNC client (your local system) and server (remote system) communication.

Launch a VNC server instance on the remote system. The initial vncserver instance will prompt you for a password to protect your session. Subsequent launches of vncserver will use the same password and will not prompt you again.

  remote$ vncserver -localhost

  You will require a password to access your desktops.
  Password:
  Verify:

  New 'remote:1 (username)' desktop is remote:1

  Creating default startup script /u1/uaf/username/.vnc/xstartup
  Starting applications specified in /u1/uaf/username/.vnc/xstartup
  Log file is /u1/uaf/username/.vnc/remote:1.log

Step 4: Open VNC on your local system

  1. Launch Apple "Screen Sharing" on a MAC.

    The Apple "Screen Sharing" connect to server dialog can be accessed with {apple key} K or Finder - Go - Connect to Server. Use "vnc://localhost:5901" as the "Server Address".

  2. Launch VNC on Windows from the menu or a launcher icon.

    On Windows, the VNC application should have installed a launcher somewhere in the menus and may also have installed an icon on the desktop or taskbar, depending on the options you chose during installation. Use the menu or icon to start VNC.

  3. Launch Linux VNC viewer from the command line

    Launch your VNC viewer program and connect to host "localhost" and port 5901. The example below shows how to launch the client using TigerVNC.

    local$ vncviewer localhost:5901

If you are using the TigerVNC GUI, enter "localhost:5901" into the "VNC server:" box, then click the "Connect" button. You will then be prompted for the password created in Step 3. If your local VNC client connects successfully, you will then see your desktop on the remote system.

Your circumstances might require different ports, either because of firewall restrictions or because you are running more than one VNC server session on the remote system. (Other people on the system might be running their own sessions and occupying ports as well.) If this is the case, you may need to specify port 5902 or 5903 or ... Add 5900 to the display number to determine the correct remote port to use.
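
For example, if vncserver reports display :2 on the remote system, the remote port is 5900 + 2 = 5902, and the tunnel and viewer commands become (a sketch following the same pattern as above):

  local$ ssh -L 5902:localhost:5902 username@remote.alaska.edu
  local$ vncviewer localhost:5902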

To determine whether the VNC viewer has successfully connected, check the log file noted when vncserver was started on the remote system.

After starting the server, you can log out and log back in again using different port forwarding parameters.

Note that some VNC viewer programs can automatically set up the SSH port forwarding through a command-line flag such as "-via" or some option in a graphical configuration menu.

Step 5: When finished, close the VNC session

To close your VNC session, view the open sessions on the remote system, then close the appropriate one.

  remote$ vncserver -list
  TigerVNC server sessions:
  X DISPLAY #     PROCESS ID
  :1                    252550
  remote$ vncserver -kill :1

Troubleshooting

  1. Orphaned Session

    If a previous VNC session remains open on the remote system, that old session will need to be closed prior to establishing a new connection using the same port. To identify and kill the old session, first obtain the process ID of the "Xvnc" process, then issue the kill command.

      remote$ ps -elf | grep username | grep Xvnc
      0 S username    236193      1  0  80   0 - 24842 poll_s Nov09 ?        
            00:00:10 /usr/bin/Xvnc :1 -desktop remote:1 (username) 
            -auth /u1/uaf/username/.Xauthority -geometry 1024x768 
            -rfbwait 30000 -rfbauth /u1/uaf/username/.vnc/passwd 
            -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -pn -localhost
      remote$ kill 236193
    

  2. Locked Session

    Depending on your desktop settings on the remote system, the X screensaver may kick in and lock the session after a period of inactivity. If this happens, you'll be prompted for a password that doesn't exist. The xlock process can be killed from the command line. We recommend disabling screen locking in the desktop settings of the VNC session to prevent this.

  3. Reset Server Password

    To change the VNC server password, use the 'vncpasswd' command on the remote system.

  4. More Information

    Run 'vncserver --help' and 'man vncserver' for more information on how to use the application.

Available Filesystems

On account creation, a new RCS HPC user is given ownership of a subdirectory created on all of the following major filesystems. The paths to each of these subdirectories are recorded in your shell's environment variables, making it easy to use these paths on the command line.

The major filesystems available on Chinook are typically referred to by the Bash syntax used to expand the corresponding environment variable. These names are used below.

The following protocols for transferring files are supported on Chinook:

  • Secure Copy (SCP)
  • SSH File Transfer Protocol (SFTP)
  • rsync (client-to-client only, no daemon)
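
For example, files can be copied to and from Chinook from your local system using commands like the following sketch (paths are examples; replace uausername with your UA username):

# Copy a single file to your Chinook home directory over SCP
scp input.nc uausername@chinook.alaska.edu:~/
# Synchronize a local directory to $CENTER scratch space using rsync over SSH
rsync -av results/ uausername@chinook.alaska.edu:/center/w/uausername/results/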

$HOME

  • The $HOME filesystem is accessible from the Chinook login and compute nodes.
  • Default $HOME quota for Tier 1 users: 10 GB
  • Default $HOME quota for Tier 2 users: 20 GB
  • The $HOME filesystem is backed up regularly.

$CENTER

  • The $CENTER scratch filesystem is accessible from the Chinook login and compute nodes.
  • Default $CENTER quotas are set to 750 GB.
  • Files older than 30 days are subject to being removed automatically. Copy your files off $CENTER if you intend to keep the data longer than 30 days.
  • If you have a legacy ARSC username, a symbolic link has been created linking your /center/w/ARSCusername directory to your /center/w/UAusername directory.

$ARCHIVE

  • The $ARCHIVE filesystem is accessible from the Chinook login nodes only.
  • Files stored in $ARCHIVE will be written to tape and taken offline over time. Use the "batch_stage" command to bring the files back online prior to viewing the contents of the file or copying the data off $ARCHIVE.
  • If you have a legacy ARSC username, a symbolic link has been created linking your /archive/u1/uaf/ARSCusername directory to your /archive/u1/uaf/UAusername directory.

Migrating from Pacman/Fish

If you are an existing user of either Pacman or Fish, it is important that you know what to expect when you begin using Chinook. Below is a comparison of some characteristics across the different HPC clusters currently operated by RCS:

Attribute Chinook Fish Pacman
Operating System CentOS 6 (CentOS 7 upgrade planned) Cray Linux Environment (CLE) 4 RHEL 6
Workload manager Slurm PBS/Torque (Cray) PBS/Torque
Usernames UA usernames Legacy ARSC usernames Legacy ARSC usernames
Login nodes 2 (with more coming) 2 12 + 1 high memory
Compute nodes Intel Xeon, 24/28 cores per node AMD Istanbul/Interlagos, 12/16 cores per node, nVidia Tesla GPUs AMD Opteron, 16/32 cores per node
Interconnect QLogic QDR InfiniBand (EDR upgrade planned) Cray Gemini QLogic QDR and Voltaire SDR InfiniBand
Default compiler suite Intel PGI PGI
$CENTER Yes Yes Yes
$ARCHIVE Yes Yes Yes
/projects Yes Yes Yes
$HOME Yes (only available on cluster) Yes (only available on cluster) Yes
/usr/local/unsupported Yes (only available on cluster) Yes (only available on cluster) Yes

User-compiled software

All software previously compiled on Pacman or Fish will need to be recompiled for Chinook. This is due to differences between the hardware and Linux kernel present on Chinook and those on Pacman / Fish.

Software stack differences

Compiler toolchain modules

On Pacman and Fish, the environment modules responsible for loading a compiler and related set of core HPC libraries follow the "PrgEnv" naming convention common to many HPC facilities. Chinook's equivalent modules are called compiler toolchains, or just toolchains. See Compiler Toolchains for more information on what is available.

At this time, the "PrgEnv" modules on Chinook are deprecated, and have been replaced by toolchain modules instead. Depending on feedback, we may provide "PrgEnv"-style symlinks to the toolchain modules in the future.

Dependency loading behavior

The modules on Chinook now each specify and load a (mostly) complete module dependency tree. To illustrate, consider loading an Intel-compiled netCDF library. Here is what happens on Pacman:

$ module purge
$ module load netcdf/4.3.0.intel-2013_sp1
$ module list --terse
Currently Loaded Modulefiles:
netcdf/4.3.0.intel-2013_sp1

And an equivalent action on Chinook:

$ module purge
$ module load data/netCDF/4.4.1-pic-intel-2016b
$ module list --terse
Currently Loaded Modulefiles:
compiler/GCCcore/5.4.0
tools/binutils/2.26-GCCcore-5.4.0
compiler/icc/2016.3.210-GCC-5.4.0-2.26
compiler/ifort/2016.3.210-GCC-5.4.0-2.26
openmpi/intel/1.10.2
toolchain/pic-iompi/2016b
numlib/imkl/11.3.3.210-pic-iompi-2016b
toolchain/pic-intel/2016b
lib/zlib/1.2.8-pic-intel-2016b
tools/Szip/2.1-pic-intel-2016b
data/HDF5/1.8.17-pic-intel-2016b
tools/cURL/7.49.1-pic-intel-2016b
data/netCDF/4.4.1-pic-intel-2016b

On Pacman and Fish, you get exactly the module you requested and no more (with a few exceptions). This has advantages and disadvantages:

  • advantage: It is easy to experiment with different software builds by swapping library modules
  • disadvantage: It is not immediately obvious which libraries were used during any given software build when multiple versions of those libraries exist
  • disadvantage: It is trivial to introduce a fatal error in an application by inadvertently loading an incompatible library module or omitting a needed one

On Chinook, standardizing and loading all module dependencies results in consistency and reproducibility. When you load the Intel-compiled netCDF module on Chinook, for example, you get modules loaded for the following:

  • The netCDF library
  • Its immediate dependencies (HDF5, zlib, curl)
  • The dependencies for the dependencies (and so on, recursively)
  • The exact Intel compiler, MPI library, and Intel Math Kernel Library (MKL) used to build netCDF
  • An upgraded version of GCC to supersede the ever-present system version

This takes the guesswork out of manually piecing together a software stack module by module. Every successive dependency will modify LD_LIBRARY_PATH and other variables appropriately, so that the desired application or library will dynamically link against the proper supporting libraries instead of accidentally picking up an inappropriate match.

One ramification of loading a full dependency tree is that trying to load software compiled with different compiler toolchains will likely result in module conflicts - even if the tools you are trying to load provide only binaries and nothing else. This is because combining two or more different dependency trees will likely result in unintended and harmful dynamic linking due to two different builds of a core compiler or library being loaded. LD_LIBRARY_PATH ensures that the library version found first will be used to satisfy all dependencies on that particular library, causing no problems for the software packages that expect it and possibly wreaking havoc for the packages that expect a different build.

Slurm Translation Guide

One of the most immediately evident changes with Chinook is that it uses Slurm for job scheduling rather than PBS/Torque. The workflow for submitting jobs has not changed significantly, but the syntax and commands have. Below is an excerpt from SchedMD's "Rosetta Stone of Workload Managers" relevant to PBS/Torque.

For more information on Slurm, please see Using the Batch System.

Source: http://slurm.schedmd.com/rosetta.pdf, 28-Apr-2013

User Commands PBS/Torque Slurm
Job submission qsub [script_file] sbatch [script_file]
Job deletion qdel [job_id] scancel [job_id]
Job status (by job) qstat [job_id] squeue [job_id]
Job status (by user) qstat -u [user_name] squeue -u [user_name]
Job hold qhold [job_id] scontrol hold [job_id]
Job release qrls [job_id] scontrol release [job_id]
Queue list qstat -Q squeue
Node list pbsnodes -l sinfo -N OR scontrol show nodes
Cluster status qstat -a sinfo
GUI xpbsmon sview
Environment PBS/Torque Slurm
Job ID $PBS_JOBID $SLURM_JOBID
Submit Directory $PBS_O_WORKDIR $SLURM_SUBMIT_DIR
Submit Host $PBS_O_HOST $SLURM_SUBMIT_HOST
Node List $PBS_NODEFILE $SLURM_JOB_NODELIST
Job Array Index $PBS_ARRAYID $SLURM_ARRAY_TASK_ID
Job Specification PBS/Torque Slurm
Script directive #PBS #SBATCH
Queue -q [queue] -p [queue]
Node Count -l nodes=[count] -N [min[-max]]
CPU Count -l ppn=[count] OR -l mppwidth=[PE_count] -n [count]
Wall Clock Limit -l walltime=[hh:mm:ss] -t [min] OR -t [days-hh:mm:ss]
Standard Output File -o [file_name] -o [file_name]
Standard Error File -e [file_name] -e [file_name]
Combine stdout/err -j oe (both to stdout) OR -j eo (both to stderr) (use -o without -e)
Copy Environment -V --export=[ALL | NONE | variables]
Event Notification -m abe --mail-type=[events]
Email Address -M [address] --mail-user=[address]
Job Name -N [name] --job-name=[name]
Job Restart -r [y|n] --requeue OR --no-requeue (NOTE: configurable default)
Working Directory N/A --workdir=[dir_name]
Resource Sharing -l naccesspolicy=singlejob --exclusive OR --shared
Memory Size -l mem=[MB] --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Account to Charge -W group_list=[account] --account=[account]
Tasks Per Node -l mppnppn [PEs_per_node] --tasks-per-node=[count]
CPUs Per Task   --cpus-per-task=[count]
Job Dependency -d [job_id] --depend=[state:job_id]
Job Project   --wckey=[name]
Job host preference   --nodelist=[nodes] AND/OR --exclude=[nodes]
Quality of Service -l qos=[name] --qos=[name]
Job Arrays -t [array_spec] --array=[array_spec] (Slurm version 2.6+)
Generic Resources -l other=[resource_spec] --gres=[resource_spec]
Licenses   --licenses=[license_spec]
Begin Time -A "YYYY-MM-DD HH:MM:SS" --begin=YYYY-MM-DD[THH:MM[:SS]]

Frequently Asked Questions

Will I need to copy my files from Pacman/Fish to Chinook?

$ARCHIVE and $CENTER are mounted on Chinook, so you will have access to all your existing files on Chinook. However, your home directory is new and we will not be automatically copying any Pacman/Fish home directory contents to Chinook. If you would like to transfer any files from your Pacman/Fish home directory, you may do so using scp or sftp between Pacman/Fish and Chinook.
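
For example, files can be pulled from a legacy home directory while logged into Chinook using a command like the following sketch. The Pacman login hostname is written as a placeholder here; use the hostname you normally log into, along with your legacy ARSC username.

# Run from a Chinook login node; <pacman-login-host> and ARSCusername are placeholders
scp -r ARSCusername@<pacman-login-host>:~/scripts ~/scripts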

I used the PGI compiler on Pacman/Fish. What are my options on Chinook?

Support for the PGI compiler suite will expire in FY17. If possible, please look into compiling your code using the Intel or GNU compiler suites. If not, the latest version of the PGI compilers available when support lapses will remain installed on Chinook.

I have a PBS/Torque batch script. Can I use it on Chinook?

Possibly. Slurm does provide compatibility scripts for various PBS/Torque commands including qsub. The compatibility is not perfect, and you will likely need to debug why your batch script isn't doing what you expect. It is worth putting that time towards porting the PBS script to Slurm syntax and using sbatch instead.

Using the Batch System

The Slurm (Simple Linux Utility for Resource Management) workload manager is a software package for submitting, scheduling, and monitoring jobs on large compute clusters. Slurm is available on Chinook for submitting and monitoring user jobs.

Similar to PBS/TORQUE, Slurm accepts user jobs specified in batch scripts. More information on Slurm batch scripts may be found below.

Common Slurm commands, Slurm batch scripts, translating from PBS/TORQUE to Slurm, and running interactive jobs are discussed below. SchedMD, the company behind Slurm, has also put together a quick reference for Slurm commands.

Batch overview

The general principle behind batch processing is automating repetitive tasks. A single task is known as a job, while a set of jobs is known as a batch. This distinction is largely academic, since the terms job and batch job are now mostly synonymous, but here we'll use them separately.

There are three basic steps in a batch or job-oriented workflow:

  1. Copy input data from archival storage to scratch space
  2. Run computational tasks over the input data
  3. Copy output to archival storage

On Chinook the first and last steps must occur on login nodes, and the computation step on compute nodes. This is enforced by finite CPU ulimits on the login nodes and by $ARCHIVE not being mounted on the compute nodes.
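
For example, the staging and retrieval steps might look like the following sketch when run from a login node (paths and filenames are hypothetical):

# 1. Stage input data from archival storage to scratch space (login node)
batch_stage $ARCHIVE/myproject/input.tar
cp $ARCHIVE/myproject/input.tar $CENTER/myproject/
# 2. Submit the computational work to the compute nodes
sbatch myjob.sh
# 3. After the job completes, copy results back to archival storage (login node)
cp -r $CENTER/myproject/output $ARCHIVE/myproject/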

Depending on the scale and characteristics of a particular job, different jobs may require different combinations of computational resources. Garnering these resources is a combination of:

  • Choosing which partition to submit the job to
  • Choosing what resources to request from the partition

This is done by writing batch scripts whose directives specify these resources.

Available partitions

Name Node count Max walltime Nodes per job (min-max) Other rules Purpose
debug 1 1 hour 1 For debugging job scripts
t1small 5 1 day 1-2 For short, small jobs with quick turnover
t1standard 32 4 days 3-32 Default General-purpose partition
t2small 5 2 days 1-2 Tier 2 users only. Increased priority and walltime. Tier 2 version of t1small
t2standard 32 7 days 3-32 Tier 2 users only. Increased priority and walltime. Tier 2 general-purpose partition
transfer 1 1 day 1 Shared use Copy files between archival storage and scratch space

Selecting a partition is done by adding a directive to the job submission script such as #SBATCH --partition=t1standard, or on the command line: $ sbatch -p t1standard

Anyone interested in gaining access to the higher-priority Tier 2 partitions (t2small, t2standard) by subscribing to support the cluster or procuring additional compute capacity should contact uaf-rcs@alaska.edu.

Common Slurm Commands

sacct

The sacct command is used for viewing information about submitted jobs. This can be useful for monitoring job progress or diagnosing problems that occurred during job execution. By default, sacct will report the job ID, job name, partition, account, allocated CPU cores, job state, and the exit code for all of the current user's jobs that have been submitted since midnight of the current day.

sacct's output, as with most Slurm informational commands, can be customized in a large number of ways. Here are a few of the more useful options:

Command Result
sacct --starttime 2016-03-01 select jobs since midnight of March 1, 2016
sacct --allusers select jobs from all users (default is only the current user)
sacct --accounts=account_list select jobs whose account appears in a comma-separated list of accounts
sacct --format=field_names print fields specified by a comma-separated list of field names
sacct --helpformat print list of fields that can be specified with --format
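
These options can be combined. For example, the following sketch lists the current user's jobs since March 1, 2016 with a customized set of fields:

sacct --starttime 2016-03-01 --format=JobID,JobName,Partition,State,Elapsed,ExitCode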

For more information on sacct, please visit https://slurm.schedmd.com/sacct.html.

sbatch

The sbatch command is used for submitting jobs to the cluster. Although it is possible to supply command-line arguments to sbatch, it is generally a good idea to put all or most resource requests in the batch script for reproducibility.

Sample usage:

sbatch mybatch.sh

On successful batch submission, sbatch will print out the new job's ID. sbatch may fail if the resources requested cannot be satisfied by the indicated partition.

For more information on sbatch, please visit https://slurm.schedmd.com/sbatch.html.

scontrol

The scontrol command is used for monitoring and modifying queued or running jobs. Although many scontrol subcommands apply only to cluster administration, there are some that may be useful for users:

Command Result
scontrol hold job_id place hold on job specified by job_id
scontrol release job_id release hold on job specified by job_id
scontrol show reservation show details on active or pending reservations
scontrol show nodes show hardware details for compute nodes

For more information on scontrol, please visit https://slurm.schedmd.com/scontrol.html.

sinfo

The sinfo command is used for viewing compute node and partition status. By default, sinfo will report the partition name, availability, time limit, node count, node state, and node list for each partition.

sinfo's output, as with most Slurm informational commands, can be customized in a large number of ways. Here are a few of the more useful options:

Command Result
sinfo --partition=t1standard show node info for the partition named 't1standard'
sinfo --summarize group by partition, aggregate state by A/I/O/T (Available/Idle/Other/Total)
sinfo --reservation show Slurm reservation information
sinfo --format=format_tokens print fields specified by format_tokens
sinfo --Format=field_names print fields specified by comma-separated field_names

There are a large number of fields hidden by default that can be displayed using --format and --Format. Refer to the sinfo's manual page for the complete list of fields.
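
For example, the following sketch prints the partition, availability, time limit, node count, node state, and node list using --format tokens:

sinfo --format="%P %a %l %D %t %N"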

For more information on sinfo, please visit https://slurm.schedmd.com/sinfo.html.

smap

The smap command is an ncurses-based tool useful for viewing the status of jobs, nodes, and node reservations. It aggregates data exposed by other Slurm commands, such as sinfo and squeue.

Command Result
smap -i 15 Run smap, refreshing every 15 seconds

For more information on smap, please visit https://slurm.schedmd.com/smap.html.

squeue

The squeue command is used for viewing job status. By default, squeue will report the ID, partition, job name, user, state, time elapsed, nodes requested, nodes held by running jobs, and reason for being in the queue for queued jobs.

squeue's output, as with most Slurm informational commands, can be customized in a large number of ways. Here are a few of the more useful options:

Command Result
squeue --user=user_list filter by a comma-separated list of usernames
squeue --start print expected start times of pending jobs
squeue --format=format_tokens print fields specified by format_tokens
squeue --Format=field_names print fields specified by comma-separated field_names

The majority of squeue's customization is done using --format or --Format. The lowercase --format allows for controlling which fields are present, their alignments, and other contextual details such as whitespace, but comes at the cost of readability and completeness (not all fields can be specified using the provided tokens). In contrast, the capitalized --Format accepts a complete set of verbose field names, but offers less flexibility with contextual details.

As an example, the following command produces output identical to squeue --start:

squeue --format="%.18i %.9P %.8j %.8u %.2t %.19S %.6D %20Y %R" --sort=S --states=PENDING

--Format can produce equivalent (but not identical) output:

squeue --Format=jobid,partition,name,username,state,starttime,numnodes,schednodes,reasonlist --sort=S --states=PENDING

For more information on squeue, please visit https://slurm.schedmd.com/squeue.html.

sreport

The sreport command is used for generating job and cluster usage reports. Statistics will be shown for jobs run since midnight of the current day by default. Although many of sreport's reports are more useful for cluster administrators, there are some commands that may be useful to users:

Command Result
sreport cluster AccountUtilizationByUser -t Hours start=2016-03-01 report hours used since Mar 1, 2016, grouped by account
sreport cluster UserUtilizationByAccount -t Hours start=2016-03-01 Users=$USER report hours used by the current user since Mar 1, 2016

For more information on sreport, please visit https://slurm.schedmd.com/sreport.html.

srun

The srun command is used to launch a parallel job step. Typically, srun is invoked from a Slurm batch script to perform part (or all) of the job's work. srun may be used multiple times in a batch script, allowing for multiple program runs to occur in one job.

Alternatively, srun can be run directly from the command line on a login node, in which case srun will first create a resource allocation for running the job. Use command-line keyword arguments to specify the parameters normally used in batch scripts, such as --partition, --nodes, --ntasks, and others. For example, srun --partition=debug --nodes=1 --ntasks=8 whoami will obtain an allocation consisting of 8 cores on 1 node and then run the command whoami on all of them.

Please note that srun does not inherently parallelize programs - it simply runs multiple independent instances of the specified program in parallel across the nodes assigned to the job. Put another way, srun will launch a program in parallel, but it makes no guarantee that the program itself is designed to run in parallel.
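
Here is a minimal sketch of a batch script that uses srun for two separate job steps; the debug partition is used for illustration, and ./preprocess and ./analyze are placeholders for your own executables:

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks=24

# Step 1: one instance of the preprocessing program per task
srun ./preprocess
# Step 2: one instance of the analysis program per task
srun ./analyze

Each srun invocation becomes its own job step within the single allocation created by sbatch.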

See Interactive Jobs for an example of how to use srun to allocate and run an interactive job (i.e. a job whose input and output are attached to your terminal).

A note about MPI: srun is designed to run MPI applications without the need for using mpirun or mpiexec, but this ability is currently not available on Chinook. It may be made available in the future. Until then, please refer to the directions on how to run MPI applications on Chinook below.

For more information on srun, please visit https://slurm.schedmd.com/srun.html.

sview

The sview command is a graphical interface useful for viewing the status of jobs, nodes, partitions, and node reservations. It aggregates data exposed by other Slurm commands, such as sinfo, squeue, and smap, and refreshes every few seconds.

For more information on sview, please visit https://slurm.schedmd.com/sview.html.

Batch Scripts

Batch scripts are plain-text files that specify a job to be run. They consist of batch scheduler (Slurm) directives that specify the resources requested for the job, followed by the shell commands needed to run the program.

Here is a simple example of a batch script that will be accepted by Slurm on Chinook:

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=24
#SBATCH --tasks-per-node=24

echo "Hello world"

On submitting the batch script to Slurm using sbatch, the job's ID is printed:

$ ls
hello.slurm
$ sbatch hello.slurm
Submitted batch job 8137

Among other things, Slurm records the working directory from which sbatch was run. Upon job completion (nearly immediate for a trivial job like the one specified by hello.slurm), output is written to a file in that directory.

$ ls
hello.slurm  slurm-8137.out
$ cat slurm-8137.out
Hello world

Running an MPI Application

Here is what a batch script for an MPI application might look like:

#!/bin/sh

#SBATCH --partition=t1standard
#SBATCH --ntasks=<NUMTASKS>
#SBATCH --tasks-per-node=24
#SBATCH --mail-user=<USERNAME>@alaska.edu
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --output=<APPLICATION>.%j

# Load any desired modules, usually the same as loaded to compile
. /etc/profile.d/modules.sh
module purge
module load PrgEnv-intel
module load slurm

cd $SLURM_SUBMIT_DIR
# Generate a list of allocated nodes; will serve as a machinefile for mpirun
srun -l /bin/hostname | sort -n | awk '{print $2}' > ./nodes.$SLURM_JOB_ID
# Launch the MPI application
mpirun -np $SLURM_NTASKS -machinefile ./nodes.$SLURM_JOB_ID ./<APPLICATION>
# Clean up the machinefile
rm ./nodes.$SLURM_JOB_ID

In the script above:

  • <APPLICATION>: The executable to run in parallel
  • <NUMTASKS>: The number of parallel tasks requested from Slurm
  • <USERNAME>: Your Chinook username (same as your UA username)

There are many environment variables that Slurm defines at runtime for jobs. Here are the ones used in the above script:

  • $SLURM_JOB_ID: The job's numeric id
  • $SLURM_NTASKS: The value supplied as <NUMTASKS>
  • $SLURM_SUBMIT_DIR: The current working directory when "sbatch" was invoked

Interactive Jobs

Command Line Interactive Jobs

Interactive jobs are possible on Chinook using srun:


chinook:~$ srun -p debug --nodes=1 --exclusive --pty /bin/bash

The above command will reserve one node in the debug partition and launch an interactive shell job. The --pty option executes task zero in pseudo-terminal mode, implicitly sets --unbuffered, and redirects --error and --output to /dev/null for all tasks except task zero, which may cause those tasks to exit immediately.

Displaying X Windows from Interactive Jobs

A new module named "sintr" is available to create an interactive job that forwards application windows from the first compute node back to the local display. This relies on X11 forwarding over SSH, so make sure to enable graphics when connecting to a Chinook login node. The sintr command accepts the same command line arguments as sbatch. To launch a single-node interactive job in the debug partition, for example, follow these steps:


chinook:~$ module load sintr
chinook:~$ sintr -p debug -N 1
Waiting for JOBID #### to start.
...

The command will wait for a node to be assigned and the job to launch. As soon as that happens, the next prompt should be on the first allocated compute node, and the DISPLAY environment variable will be set to send X windows back across the SSH connection. It is now possible to load and execute a desired windowed application. Here's an example with TotalView.


bash-4.1$ module load totalview
bash-4.1$ totalview

After exiting an application, exit the session too. This will release the allocated node(s) and end the interactive job.


bash-4.1$ exit
exit
[screen is terminating]
chinook:~$

Third-Party Software

Installing Your Own Software

Individuals and research groups may install third-party applications and libraries for their own use in the following locations:

  • $HOME
  • /usr/local/unsupported

Packages built for personal use only should be installed in $HOME.

The /usr/local/unsupported directory is intended to host user-installed and maintained software packages and datasets that are shared with a group of users on the system. Users who add content to /usr/local/unsupported are fully responsible for the maintenance of the files and software versions. Please read the /usr/local/unsupported/README.RCS file for more information.

To request a new subdirectory within /usr/local/unsupported, please contact RCS with the following information:

  • The name of the requested subdirectory, which can be your project's name (e.g., UAFCLMIT) or the type of software you intend to install in the directory (e.g., "ClimateModels")
  • A general description of what you intend to install
  • A rough estimate of the amount of storage you will need (e.g., 100 MB)

Using The Software Stack

Chinook already has builds of many third-party software packages (see below for a listing). There are often multiple builds of a particular software package - different versions, different compilers used to build the software, different compile-time flags, et cetera. To avoid conflicts between these many disparate package builds, Chinook employs an environment module system you can use to load and unload different combinations of software packages into your environment.

What are Environment Modules?

The environment modules found on Chinook (often referred to simply as "modules") are Tcl script files that are used to update shell environment variables such as PATH, MANPATH, and LD_LIBRARY_PATH. These variables allow your shell to discover the particular application or library as specified by the module. Some environment modules set additional variables (such as PYTHONPATH or PERL5LIB), while others simply load a suite of other modules.

Common module commands

Command Result
module avail list all available modules
module avail pkg list all available modules beginning with the string pkg
module load pkg load a module named pkg
module swap old new attempt to replace loaded module named old with one named new
module unload pkg unload a module named pkg
module list list all currently-loaded modules
module purge unload all modules
module show pkg summarize environment changes made by module named pkg (sometimes incomplete)
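
A typical session might look like the following minimal sketch (pic-intel is the Intel-based toolchain module described under Compiler Toolchains below; substitute whatever packages you need):

module purge
module load pic-intel
module list

This unloads everything, loads the toolchain and its submodules, and then prints the modules that are now active.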

Searching for modules

Because module avail will search for the provided string only at the beginning of a module's fully-qualified name, it can be difficult to use module avail to search for modules nested in any kind of hierarchy. This is the case on Chinook - modules are categorized, then named. Here are some examples:

  • compiler/GCC/version
  • devel/CMake/version
  • math/GMP/version

To find modules for GCC using a pure module avail command, you would need to run module avail compiler/GCC. This is difficult, because you must already know that the module is in the compiler category.

To make things more complicated, module avail is also case-sensitive. Running module avail devel/cmake will not find the module named devel/CMake/version.

Better module searching

One workaround for these impediments is to combine module avail output with grep's full-text case-insensitive string matching ability. The example below additionally uses Bash file descriptor redirection syntax to redirect stderr to stdout because module avail outputs to stderr.

module avail --terse 2>&1 | grep -i pkg

replacing pkg with the string you are searching for.
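
If you search for modules frequently, you might wrap the pipeline in a small shell function. This is only a convenience sketch for bash users; modgrep is a made-up name:

modgrep () {
    module avail --terse 2>&1 | grep -i "$1"
}

After adding the function to your ~/.bashrc, running modgrep cmake lists every module whose fully-qualified name contains "cmake", regardless of case or category.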

RCS is currently evaluating Lmod as a replacement for Chinook's current environment modules framework. Lmod has many desirable features, including but not limited to a more user-friendly module avail behavior.

For more information on Chinook's module framework, please visit http://modules.sourceforge.net/index.html.


Compiler Toolchains

Compiler toolchains are modules that bundle together a set of compiler, MPI, and numerical library modules. To use a compiler toolchain, load the toolchain module; all of its submodules will be loaded automatically. This sets variables such as PATH, CPATH, LIBRARY_PATH, LD_LIBRARY_PATH, and others. Conventional compiler variables such as CC and CXX are not automatically defined.

Since Chinook is an Intel-based HPC cluster, RCS defaults to compiling software using Intel-based compiler toolchains.

Toolchain Name Version Comprises
pic-foss 2016b GNU Compiler Collection 5.4.0, Penguin-modified OpenMPI 1.10.2, OpenBLAS 0.2.18, FFTW 3.3.4, ScaLAPACK 2.0.2
pic-intel 2016b Intel Compiler Collection 2016.3.210 (2016 update 3), Penguin-modified OpenMPI 1.10.2, Intel Math Kernel Library (MKL) 11.3.3.210
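
As a minimal sketch of building against the Intel toolchain (hello.c is a placeholder source file; an equivalent workflow applies to pic-foss):

module load pic-intel
icc -O2 -o hello hello.c

Because CC and CXX are not set automatically, Makefile-based builds may need the compilers passed explicitly, for example make CC=icc CXX=icpc FC=ifort.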

MPI Libraries

RCS defaults to compiling software against OpenMPI.

Name Version Compiled by (Intel / GCC) Notes
MPICH2 1.5
MVAPICH2 2.1
MVAPICH2-PSM 2.1
OpenMPI 1.10.2 Included in pic-intel, pic-foss compiler toolchains
OpenMPI 1.6.5
OpenMPI 1.7.5
OpenMPI 1.8.8

Maintained Software Installations

As of 2016-10-19.

Name Version Compiled by (Intel / GCC) Notes
Autoconf 2.69
Automake 1.15
Autotools 20150215
BamTools 2.4.0
BayeScan 2.1
BCFtools 1.3.1
binutils 2.26 Included in pic-foss compiler toolchain
Bison 3.0.4
Boost 1.61.0
BWA 0.7.15
bzip2 1.0.6
cairo 1.14.6
CMake 3.5.2
cURL 7.49.1
Doxygen 1.8.11
ESMF 7.0.0
expat 2.2.0
FASTX-Toolkit 0.0.14
FFTW 3.3.4 Included in pic-foss compiler toolchain
flex 2.6.0
fontconfig 2.12.1
freetype 2.6.5
g2clib 1.4.0
g2lib 1.4.0
GCC 5.4.0 Included in pic-foss compiler toolchain
GDAL 2.1.0
gettext 0.19.8
GLib 2.49.5
GMP 6.1.1
GSL 2.1
gzip 1.6
HDF 4.2.11
HDF5 1.8.17
HTSlib 1.3.1
icc 2016.3.210 Included in pic-intel compiler toolchain
idl 8.4.1
ifort 2016.3.210 Included in pic-intel compiler toolchain
imkl 11.3.3.210 Included in pic-intel compiler toolchain
JasPer 1.900.1
libffi 3.2.1
libgtextutils 0.7
libjpeg-turbo 1.5.0
libpng 1.6.24
libreadline 6.3
libtool 2.4.6
libxml2 2.9.4
M4 1.4.17
makedepend 1.0.5
MATLAB R2015a
MATLAB R2015b
MATLAB R2016a
Mothur 1.38.1.1
NASM 2.12.02
NCL 6.3.0 Binary distribution
ncurses 6.0
netCDF 4.4.1
netCDF-C++4 4.3.0
netCDF-Fortran 4.4.4
OpenBLAS 0.2.18 Included in pic-foss compiler toolchain
PCRE 8.39
Perl 5.22.1
pixman 0.34.0
pkg-config 0.29.1
Python 2.7.12
SAMtools 1.3.1
ScaLAPACK 2.0.2 Included in pic-foss compiler toolchain
Singularity 2.2
SQLite 3.13.0
Szip 2.1
Tcl 8.6.5
Tk 8.6.5
UDUNITS 2.2.20
VCFtools 0.1.14
X11 20160819
XZ 5.2.2
zlib 1.2.8

Software requests

RCS evaluates third-party software installation requests for widely-used HPC software on a case-by-case basis. Some factors that affect request eligibility are:

  • Applicability to multiple research groups
  • Complexity of the installation process
  • Software licensing

If a third-party software installation request is found to be a viable candidate for installation, RCS may elect to install the software through one of several means:

  • RPM
  • Binary (pre-built) distribution
  • Source build

If an application or library is available through standard RPM repositories (Penguin Computing, CentOS, EPEL, ...) then the RPM may be installed. Users should test the installed software to determine whether it meets their requirements. If the RPM version does not meet your needs, please contact RCS to have alternate installation methods evaluated.

Software that is not installed as an RPM will be installed in a publicly-available location and be accessible via Linux environment modules. If the software is built from source, then RCS will default to using the Intel compiler suite.

Compiling from Source Code

Compiling C, C++, and Fortran code on Chinook is reasonably similar to how it is done on regular CentOS systems, with some important differences. This section outlines both the similarities and the differences, focusing mostly on the latter.

Tools

Available Compilers

The default compiler suite on Chinook is the Intel Parallel Studio XE Composer Edition, providing icc, icpc, and ifort. Intel compilers are tuned for best performance on Intel processors, such as those in Chinook's compute nodes.

The GNU Compiler Collection is also available and maintained on Chinook, providing gcc, g++, and gfortran. GNU compiler compatibility is ubiquitous across free and open-source software projects, including much scientific software.

For documentation on each of these compiler suites, please refer to the respective vendor documentation.
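
For example, the same source file can be built with either suite once the corresponding compiler or toolchain module is loaded (a minimal sketch; mycode.f90 is a placeholder):

ifort -O2 -o mycode mycode.f90
gfortran -O2 -o mycode mycode.f90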

Open-source Linear Algebra / FFT

The following free and open-source linear algebra and fast Fourier transform libraries have been built using GCC and are available for use:

  • OpenBLAS (includes LAPACK)
  • ScaLAPACK
  • FFTW
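
When the corresponding modules are loaded, linking is usually just a matter of adding the right flags. A hedged sketch with GCC (myprog.c is a placeholder; ScaLAPACK additionally requires an MPI compiler wrapper such as mpicc):

gcc -O2 myprog.c -o myprog -lopenblas -lfftw3 -lm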

Intel MKL

The Intel Math Kernel Library (MKL) is available for use on Chinook. MKL offers Intel-tuned versions of all of the above open-source libraries, effectively replacing them.

For more information on linking against MKL, see Intel's MKL Linking Quick Start. Of particular note is the online MKL Link Line Advisor, which will generate appropriate link flag strings for your needs.

For more information on MKL itself, see Intel's MKL documentation.
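
As one hedged example of the simplest case, the Intel compilers of this era provide a convenience flag that selects and links MKL automatically (myprog.c is a placeholder):

icc -O2 myprog.c -o myprog -mkl

For anything beyond the defaults (e.g. ILP64 interfaces or cluster libraries), use the MKL Link Line Advisor mentioned above to generate the exact link line.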

System Architecture

Physical Architecture

Chinook currently has two login nodes:

  • chinook00.alaska.edu
  • chinook01.alaska.edu

chinook.alaska.edu will point to one of the above login nodes.

As of December 2016, all Chinook compute nodes are Penguin Computing Relion 1900 nodes with dual Intel Xeon E5-2690 v3 12-core processors (24 cores per node) and 128 GB RAM.

Software Architecture

Chinook currently runs the CentOS 6 operating system (Linux kernel version 2.6). An upgrade to CentOS 7 (Linux kernel version 3.10) is planned for 2017.

Recent versions of the Intel and GNU compiler collections, several different MPI implementations, and core math libraries are available on Chinook. For more details, please refer to the list of third-party software maintained on Chinook.

Community Condo Model

Chinook is a community, condo-model high performance computing (HPC) cluster. Common infrastructure elements such as the environmentally regulated data center, network connectivity, equipment racks, management and technical staff, and a small pool of CPUs provide subsidized resources that PIs may not be able to procure individually, allowing them to focus time and energy on research rather than on owning and operating individual clusters.

Participants in the condo service share unused portions or elements of the computational resources they add to Chinook with each other and with non-invested users - such as students or occasional users - who may or may not pay a fee for access. A queue management system gives vested PIs top priority on the shares they have purchased whenever they need the resources. RCS also reserves the option to use manual or automated job preemption to interrupt community user jobs as needed to give vested PIs access to their shares.

Tier 1: Community Nodes

This level of service is open to the UA research community using nodes procured for the community and unused portions of shareholder nodes. Users in this tier receive:

  • Unlimited total compute node CPU hours
  • Lower initial job priority
  • $HOME quota of 10 GB
  • 750 GB per user Lustre storage quota ($CENTER or $CENTER1)
  • Access to job queues with limited wall time
  • Standard user support (account questions, software requests, short job debugging and diagnosis assistance, ...)

Tier 2: Shareholder Shared Nodes

This level of service is for the PI or project that requires CPUs beyond what can be offered by Tier 1 or requires priority access to HPC resources. Users in this tier are shareholders that procure equipment or support and receive:

  • Unlimited total compute node CPU hours
  • Higher initial job priority, weighted by number of shares procured
  • $HOME quota of 10 GB
  • 1 TB per user Lustre storage quota ($CENTER or $CENTER1)
  • Project shared Lustre storage quota ($CENTER or $CENTER1) weighted by the shares procured (not currently in effect)
  • Access to job queues with limited wall times greater than Tier 1
  • Preemption over Tier 1 users (not currently in effect)
  • Short term reservations (contact uaf-rcs@alaska.edu to arrange)
  • Higher priority given to user support requests

Tier 3: Shareholder Dedicated Nodes

This level of service is for the PI or project that requires dedicated resources. RCS will manage and operate procured nodes for an additional service fee. Users interested in this level of service should contact RCS.

  • CPU use limited to the procured nodes and infrastructure components
  • Limited Lustre (equal to Tier 1 unless additional capacity procured by PI/project) + DataDir storage (pending)
  • No priority or preemption rights to Tier 1 or Tier 2
  • Dedicated queue(s) with unlimited wall times

Purchasing Chinook Shares

Please contact RCS (uaf-rcs@alaska.edu) if you are interested in purchasing compute nodes or support and becoming a Tier 2 or Tier 3 shareholder. All node types include licenses for Scyld ClusterWare, Mellanox UFM, and Linux with a minimum 3-year service contract. The purchase price provides shares in Chinook that align with the warranty of the equipment purchased. When shares expire, the resources must be upgraded or the warranty must be renewed; otherwise, the resources revert to the community pool and the project is given Tier 1 status.

Node Type Description Approximate Cost
Standard Compute Node Relion 1900: 28-core Intel E5-2690v4 processors with 128 GB RAM, 3-year warranty ~$8,200
Standard Compute Node Relion 1900: 28-core Intel E5-2690v4 processors with 128 GB RAM, 4-year warranty ~$8,500
Standard Compute Node Relion 1900: 28-core Intel E5-2690v4 processors with 128 GB RAM, 5-year warranty ~$8,800
BigMem Compute Node Relion 1900: 28-core Intel E5-2690v4 processors with 1.5 TB RAM ~$26,000
Lustre Object Server Node TBD TBD

Lifecycle Management

All compute nodes include factory support for the duration of the warranty period. During this time any hardware problems will be corrected as soon as possible. After the warranty expires, compute nodes will be supported on a best-effort basis until they suffer complete failure, are replaced, or reach a service age of 5 years. Once a node has reached end-of-life due to failure or obsolescence, it will be removed from service.

Storage

  • Digdug

    Lustre File System

    Dig Dug is a non-commodity cluster hosting 275 terabytes of fast-access scratch storage for the RCS HPC clusters. UA researchers use this resource for program compilation, data manipulation, and short-term storage of high-volume file input/output. Dig Dug is connected to the HPC clusters over the Gemini and InfiniBand interconnects and over Ethernet.

  • Bigdipper

    Sun SPARC Enterprise T5440 Server

    Bigdipper hosts the long-term mass file storage for RCS user data. With 214 terabytes of local disk cache and management of file writing and staging requests to tape, Bigdipper works in conjunction with the automated tape library to offer access to over 7 petabytes of file storage.

  • Galaga

    IBM System Storage TS3500 Tape Library

    Galaga is home to over seven Petabytes of enterprise-scale, long-term data storage on the UAF campus. Hosting an automated tape cartridge system with an available tape slot count of over 2,600, this system presents a seemingly unlimited file storage solution to the UA community.

  • Interested in using these resources?
    Visit the RCS User Access page to learn how.

Labs and Classrooms

RCS maintains three labs and classrooms on the UAF campus.

  • Remote Sensing Lab

    WRRB 004. This lab contains 16 student workstations running Windows 7 Pro plus an instructor's station and projector. Installed software supports GIS and Remote Sensing exercises and learning.

    WRRB 004 Schedule
    Schedule Request

  • Mac Classroom

    WRRB 009. RCS is hosting 12 Mac Pro Systems in the 009 West Ridge Research Building. Use of these systems is available to staff, faculty and students affiliated with a UAF class or project. Access to the room is available 24 hours a day, 7 days a week, upon request. Room access is controlled via PolarExpress card entry.

    WRRB 009 Schedule
    Schedule Request

  • Linux Lab

    WRRB 004. RCS maintains an HPC visualization lab with six workstations running Red Hat Enterprise Linux in 004 West Ridge Research Building. These machines are available to all RCS HPC users.

    Request Access
    Remote Login

Remote Login

To log into Linux workstations remotely, you will need a user account and Secure Shell (SSH) client program.

Linux

Linux users should use the OpenSSH client, which is already installed on your computer. Open a terminal session and run the following command to connect to one of the workstations:

ssh uausername@host

replacing uausername with your UA username (e.g. jsmith2) and host with one of the following host names:

  • newton.arsc.edu
  • einstein.arsc.edu
  • planck.arsc.edu
  • tesla.arsc.edu
  • feynman.arsc.edu
  • hawking.arsc.edu

Mac

Mac users, like Linux users, should use the pre-installed OpenSSH client. See above for directions.

Unlike Linux, Mac operating systems do not come with an X Window server pre-installed. If you want to run any graphical applications, we recommend installing XQuartz on your Mac.

Windows

Windows users will need to download and install a third-party SSH client in order to log into the Linux Workstations. Here are a few available options:

  • PuTTY (open source, MIT license)
  • Secure Shell (proprietary, UA credentials required for download)

Installing PuTTY

RCS recommends that Windows users download and install PuTTY, a free-and-open-source ssh/rsh/telnet client.

  1. Download PuTTY from the official site.
  2. Run the PuTTY installer, and select "Next".
  3. By default, the installer will install in C:\Program Files (x86)\PuTTY under 64-bit Windows, and C:\Program Files\PuTTY under 32-bit Windows. Select "Next".
  4. The installer will prompt you for a Start Menu folder in which to create shortcuts. Select "Next".
  5. Select "Create a desktop icon for PuTTY", and select "Next".
  6. The installer will allow you to review your choices. Select "Install" after you have done so.
  7. The installer will require only a few seconds to install PuTTY on your computer. Select "Finish". As it closes, the installer will by default open PuTTY's readme file, which contains additional information on using the additional tools included with PuTTY.

Using PuTTY

Establishing a remote login session over SSH using PuTTY is reasonably straightforward. The following steps describe how to do this, turn on X11 forwarding, and save connection settings.

Remote Login via SSH

  1. Open PuTTY using the icon placed on your desktop by the PuTTY installer.
  2. For "Host Name", enter hostname.arsc.edu where hostname is newton, einstein, planck, tesla, feyman, hawking
  3. Select "Open" to initiate an SSH connection request.
  4. You may receive a security alert warning you that your client cannot establish the authenticity of the host you are connecting to. This warning is always displayed the first time your SSH client connects to a computer it has never connected to before. If you have never connected to one of the Linux workstations using this PuTTY installation, select "Yes".
  5. A terminal window should open. You will be prompted for a username. Enter your UA username.
  6. You will be prompted for a password. Enter your UA password and continue.
  7. On successful authentication, a command prompt will appear and allow you to execute commands on the linux workstation.

Enabling Graphics

Some applications, especially visualization applications, require a graphical display. It is possible to tunnel graphics over an SSH connection using X11 graphics forwarding, which is supported by PuTTY.

  1. Install a local X Window server. We recommend installing the last free version of Xming, which became proprietary software in May 2007.
  2. In PuTTY, define a connection to hostname.arsc.edu, where hostname is one of newton, einstein, planck, tesla, feynman, or hawking, and navigate to "Connection-SSH-X11". Check the box labeled "Enable X11 forwarding".
  3. Initiate an SSH connection request and log in as outlined in the last section.
  4. Ensure that your local X server is running. Without this, any graphical application will fail to run properly.
  5. Run xlogo, a simple graphical application. If you see a window containing a black X on a white background, you have successfully enabled X11 forwarding.

Saving Connection Settings

  1. Configure your connection settings as desired
  2. Navigate to "Category-Session"
  3. Enter a name for your session in the "Saved Sessions" input box, and select "Save". Your session should now appear as a new line in the text box to the left of "Save".
  4. To load saved settings, select the session you want to load and then select "Load".

Optionally, PuTTY's command-line flags allow you to create shortcuts that load a particular connection.

  1. Copy your PuTTY shortcut icon
  2. Right click on the copy, and select "Properties"
  3. In the "Target" field, append -load followed by the connection name in quotation marks
  4. Select "Apply", and close the window
  5. Rename the modified shortcut appropriately

Troubleshooting

When I try to connect, PuTTY opens an alert box that says "Disconnected: No supported authentication methods available".

This message means that authentication by username failed. This is most likely caused by an incorrect username, or because you do not have access to the Linux Workstations. Please ensure that you received an email from RCS User Support (uaf-rcs@alaska.edu) notifying you of your account creation, and use the username provided in that email.

My application returns the error "X connection to localhost:10.0 broken (explicit kill or server shutdown)" (or similar).

This is an indication that your local X server is not running. Check the icons on the right-hand side of your task bar for the X server icon. If it is not present, ensure that you have installed an X server locally and that it is running. Once the icon is present, try opening your program again.

I received the "Unknown Host Key" popup alert, followed by another popup stating: "Server unexpectedly closed network connection".

This indicates that the server's SSH timeout was triggered. SSH servers are often configured to kill incoming connections that do not send data for a while. While you were responding to the "Unknown Host Key" popup, the remote host's connection timeout expired and it disconnected you. You should be able to reconnect without problem.

Getting Access

RCS computing and storage resources are available upon request to University of Alaska faculty, staff, and sponsored individuals. The following steps outline the resource request process.

  • Step 1: Request or Identify a Project

    In order to use RCS resources, RCS users must be affiliated with one or more RCS projects.

    • If you are joining an existing RCS project, please contact your principal investigator for the RCS Project ID.
    • If you are UA staff or faculty, you may request a new RCS project using the project account application form.
  • Step 2: Request A User Account

    • Already have a UA username and password? After applying for an RCS project (UA faculty / staff only) or identifying which existing RCS project you would like to join, please complete the user account application form.
    • Do you need a UA username and password? Ask your UA-affiliated colleague to sponsor a UA Guest account for you by completing the form available from UAF OIT at https://www.alaska.edu/oit/services/account-management/forms/formMemberAffiliateAccountRequestForm.pdf. The following fields should be completed on this form:

      • UA Guest: The UA Guest completes the top section of the form, then signs the bottom.
      • UA Sponsor: In the "ACCESS REQUESTED BY SPONSOR" section, check the "Other" box, then enter "Authserv affiliation for RCS HPC and storage."
      • UA Sponsor: Enter an expiration date in the "Sponsor Specified Expiration Date" field. Leaving the field blank will default to 12 months.
      • UA Sponsor: Sign and date the form as the affiliate's sponsor.
      • UA Sponsor: Return the completed form to OIT.

      After you receive your UA Guest account credentials, complete the user account application form.

Project Account Application

University of Alaska username
Note: Only select Long-Term Data Storage if you are not also requesting HPC access.

Project Account Agreement

By submitting this form, you understand and agree to the following terms:

I understand the information I submitted on this form will be used to evaluate my eligibility for an RCS resource grant. I attest the information submitted on this form is true and correct to the best of my knowledge. If I am granted access to RCS systems I will read, understand, and abide by the rules, policies, and procedures regarding proper use of RCS resources.

Eligibility of project members using RCS resources is contingent on them conducting work consistent with the project description provided herein.

Commercial software availability is contingent on funding provided by projects, academic departments, or the University of Alaska. Some software licenses may have access restrictions which limit the use of and access to the software package.

Access to RCS resources is contingent on my affiliation with the University of Alaska. Should that affiliation end, I understand that access to the resources assigned to this project may be terminated or reassigned at the discretion of RCS.

Should this resource grant proposal be accepted, I also agree to include the acknowledgement found at http://www.gi.alaska.edu/research-computing-systems/citing in any publications which result from research supported by this project grant.

User Account Application

University of Alaska username
If you do not know the project ID, please contact your principal investigator. If this submission is accompanied by a project account submission, please enter "Pending".
If you have multiple UA affiliations, please select the one that best describes your status on the project you are joining.

RCS Account Agreement

The following paragraph applies to the Principal Investigator only:

Annual Report

I understand that as a condition of continued access to RCS resources, I will submit a brief annual report on this project describing research objectives, computational methodology, current results and significance. This report shall include references to all publications (per: Credit paragraph below).

The following paragraphs apply to all users:

RCS Account Policies

I acknowledge personal responsibility for my account credentials and understand that I am responsible for their security. I will protect my account from misuse. I will not share my account or its credentials with anyone for any reason. I hereby attest that I have read, understand, and agree to be bound by the rules, policies, and procedures regarding my access to RCS resources. I agree to report to RCS any problems I encounter while using RCS systems, or any misuse of accounts or passwords by other persons which may occur and come to my knowledge. I understand that RCS will investigate each incident.

Restrictions and Auditing

I will not execute, copy or store copyrighted or proprietary software or information on RCS systems without proper authorization. I understand that I am not allowed to process sensitive or classified information on RCS resources. I understand that as a user of RCS systems, my activities are audited and that misuse of RCS resources may result in disciplinary action and/or revocation of current, or denial of future computing privileges.

Credit

I agree to include the acknowledgement found at http://www.gi.alaska.edu/research-computing-systems/citing in any publications that result from research supported by this grant. I will submit a copy of each of these publications to RCS (the copies can be included with the annual report).

Availability

I understand that RCS makes a reasonable attempt to ensure the availability and integrity of my data and software through regular on-site backups of my home directory. RCS does not maintain off-site backups of user data and I agree to assume all responsibility for the risk of loss of my data and software--regardless of the cause of that loss. I understand that RCS makes a reasonable attempt to ensure the availability of its HPC resources and that periodically, any or all of these systems may go down for scheduled or unscheduled maintenance. I agree that it is my responsibility to ensure the recoverability of my data, wherever feasible, should any of my jobs be unexpectedly restarted or lost due to a downtime.

By submitting this form, you agree to the above terms and conditions.

Contact Us

  • Office Suite

    Geophysical Institute
    508 Elvey

  • Phone & Email

    Phone: 907-450-8602
    Email: uaf-rcs@alaska.edu

  • Shipping

    Research Computing Systems
    Geophysical Institute
    University of Alaska Fairbanks
    PO Box 757320
    903 Koyukuk Drive
    Fairbanks, AK 99775-7320

Network

  • GI Network

    Tell me about the GI network

    The GI provides network service in the Elvey Building and WRRB. Network service is provided in other buildings, including the Akasofu building, by OIT. For Elvey and WRRB network service, contact RCS. Other locations should contact the OIT Help Desk or (907)450-8300.

    Why does the GI have its own network? What is the firewall and why do I care? How does the GI interact with OIT's network? I'm in Akasofu but I'm a member of the GI; why can't RCS help me? I'm in WRRB and I'm not a member of the GI... why do I have to deal with RCS?

  • Telephones

    Telephone service is provided through the network. Your computer can receive its network connection through the telephone. Ensure the left network port (as seen from the rear, labeled "10/100/1000 SW") is plugged into the wall. Your computer should be plugged into the right port ("10/100/1000 PC"). If you change offices, simply take your phone to your new office. Your phone number is tied to your particular telephone, so it will follow you around. Contact RCS if your telephone does not turn on when plugged into the wall in Elvey or WRRB. Contact OIT otherwise.

  • NetReg

    What is NetReg?

    NetReg is a network registration system that manages IP addresses and DNS entries. The vast majority of computers connected to the wired network in the Elvey Building and WRRB must be enrolled in NetReg. Your UA username and password will log you in, but RCS must provision your account before it is useable. Contact uaf-rcs@alaska.edu to set up your account.

    Using NetReg

    NetReg is a web-based application. Simply point your browser to https://netreg.gi.alaska.edu to log in and start using it. It only works for machines on the GI network. Follow the prompts to register your new computer - most people will want to use Simple Registration with a Dynamic IP Address. Printers and other headless network appliances can be registered manually. You will need to know the device's MAC address; it is often printed on the back of the device or, for printers, on a printed status page. Contact uaf-rcs@alaska.edu if you need assistance registering any device or if the device has special network requirements.

  • VPN

    What is the GI VPN?

    Why do I need the GI VPN? Why can't I use the UA VPN?

Citation and Acknowledgement

Has your recent work using RCS resources resulted in publication? Please include the following text in your publication(s):

This work was supported in part by the high-performance computing and data storage resources operated by the Research Computing Systems Group at the University of Alaska Fairbanks, Geophysical Institute.

Licensed / Proprietary Software

Proprietary software licenses hosted by RCS are made available to the UAF campus network. Please see the below links (log in using your UA credentials) for more information.

Policies

Login Shells

The login shells supported on RCS systems are bash, csh, ksh, and tcsh. If you would like your default login shell changed, please contact uaf-rcs@alaska.edu.

Security Policies

Users of RCS systems agree to abide by published UAF policies and standards: http://www.alaska.edu/oit/services/policies-and-standards. Every user of RCS systems may rightfully expect their programs, data, and documents stored on RCS systems to be inaccessible by others, secure against arbitrary loss or alteration, and available for use at all times. To help protect system security and achieve this goal, RCS staff reserve the right to routinely examine user accounts. In the event of a suspected security incident, RCS staff may inactivate and examine the contents of user accounts without prior notification.

Account Sharing

Users of RCS systems may not share their account with anyone under any circumstances. This policy ensures every user is solely responsible for all actions from within their account. When shared access to a particular set of files on a system is desired, UNIX group permissions should be applied. Contact uaf-rcs@alaska.edu for more information regarding changing group permissions on files and directories.
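
For example, to share a directory you own with members of a UNIX group, you might use commands like the following (a minimal sketch; myproject and ~/shared_data are placeholders for a real group and directory):

chgrp -R myproject ~/shared_data
chmod -R g+rX ~/shared_data

The capital X grants directory-traversal permission while leaving non-executable files non-executable.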

Policy Enforcement

Abuse of RCS resources is a serious matter and is subject to immediate action. A perceived, attempted, or actual violation of standards, procedures, or guidelines pursuant with RCS policies may result in disciplinary action including the loss of system privileges and possibly legal prosecution in the case of criminal activity. RCS employs the following mechanisms to enforce its policies:

  • Contacting the user via phone or email to ask them to correct the problem.
  • Modifying the permissions on a user's files or directories in response to a security violation.
  • Inactivating accounts or disabling access to resources to ensure availability and security of RCS systems.

User-owned Files and Directories

For home directories, RCS recommends authorizing write access to only the file/directory owner. Group and world write permissions in a home directory should be avoided under all circumstances.

For the $CENTER and $ARCHIVE filesystems, RCS recommends using caution when opening group and world read / execute permissions, and extreme caution when opening group and world write permissions.
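
To check and tighten the permissions on your own home directory, a minimal sketch:

ls -ld $HOME
chmod go-w $HOME

The first command shows the current permissions; the second removes group and world write access, in line with the recommendation above.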

Setuid and setgid permissions are prohibited on all user-owned files and directories.

User files may be scanned by RCS staff at any time for system security or maintenance purposes.

Non-printing characters, such as ASCII codes for RETURN or DELETE, are occasionally introduced by accident into file names. These characters present a low-level risk to system security and integrity and are prohibited. Techniques for renaming, deleting, or accessing files containing non-printing characters in the filename are described at www.arsc.edu/support/howtos/nonprintingchars/index.xml

Passwords

RCS uses University of Alaska (UA) credentials for user authentication. Therefore, passwords used to access RCS systems are subject to UA password guidelines. UA passwords may be changed using ELMO (https://elmo.alaska.edu). If you suspect your password has been compromised, contact the UAF OIT Helpdesk (helpdesk@alaska.edu, 907-450-8300) immediately.

SSH Public Keys

Sharing of private SSH keys to allow another user to access an RCS account is considered account sharing and is prohibited on RCS systems.

Users of SSH public keys are responsible for their private keys and ensuring they are protected against access from other users. Private SSH keys should be generated and stored only on trusted systems.

Tampering

Do not attempt to break passwords, tamper with system files, access files in other users' directories without permission, or otherwise abuse the privileges given to you with your RCS account. Your privileges do not extend beyond the directories, files, and volumes which you rightfully own or to which you have been given permission to access.

System-generated E-mail

RCS provides a ~/.forward file for users on each system. When the system generates an email, the message will be forwarded to the email address(es) listed in the .forward file. Users are free to update their ~/.forward file to point to their preferred email address.
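
For example, to forward system-generated mail to another address (jsmith2@alaska.edu is a placeholder; substitute your preferred address):

echo "jsmith2@alaska.edu" > ~/.forward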

Maintenance Periods

Annual maintenance schedules are in place for the various RCS systems. They are as follows:

Chinook

  • Monthly: First Wednesday for non-interrupt Operating System security and Scyld ClusterWare updates
  • Quarterly: First Wednesday of Jan, Apr, Jul, and Oct for Operating System and Scyld ClusterWare updates that may require system downtime
  • Twice per year: The month of May during the FAST and over the winter closure for Operating System and Scyld ClusterWare updates that may require system downtime