Chinook will be offline on March 28th from 9 AM to 9 PM to upgrade the high-speed Lustre filesystem on the Chinook compute and login nodes, and to fully reboot Chinook and the InfiniBand switches to address issues that have affected performance on Chinook.
This maintenance window is required to replace essential $ARCHIVE hardware and software that are past end-of-life. Oracle Solaris-based systems will be upgraded to new Linux-based systems.
Following this outage:
Chinook and $CENTER1 will be offline from 9 AM to 5 PM on February 21, 2018, to upgrade the $CENTER1 filesystem version and correct an identified software defect. Jobs that are scheduled to run during the downtime will stay in the queue and run after maintenance is completed.
At the end of November, users will notice a change in automated RCS ticket numbers and will be provided with the option to activate an RCS Service Desk portal account.
Participation in the RCS Service Desk portal is entirely optional.
As always, RCS is available to support users via email, phone, and walk-in.
We are currently investigating user-reported issues with copying and creating files on $CENTER1. Users may see an error stating "No space left on device" while trying to create, copy, or download files to $CENTER1, and may have to wait and retry the operation. RCS expects that fixing this issue will require an outage, and we will update users with further information, including when the outage will be scheduled.
The $ARCHIVE filesystem is experiencing heavy utilization, impacting file transfers to and from $ARCHIVE. Users may experience slow file transfers throughout the day as the process of archiving files to tape finishes.
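For scripted transfers, the wait-and-retry step can be automated. The following is a minimal sketch, not an RCS-provided tool: the function name retry_copy, the default of 5 attempts, and the 60-second pause are all hypothetical choices made for illustration.

```shell
#!/bin/bash
# retry_copy SRC DEST [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: copies SRC to DEST, retrying after a pause if cp fails
# (for example with "No space left on device"). Defaults are assumptions.
retry_copy() {
    local src=$1 dest=$2 attempts=${3:-5} delay=${4:-60}
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as one copy attempt succeeds
        if cp -- "$src" "$dest"; then
            return 0
        fi
        echo "copy attempt $i of $attempts failed" >&2
        # Pause before the next attempt, but not after the final one
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

A call such as retry_copy results.tar "$CENTER1/results.tar" would then try the copy up to five times, one minute apart, and return a non-zero status if every attempt fails.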
Our maintenance on Chinook to address the slowdown in multi-node jobs is now complete. As a result of this change, the following lines should be added to your job submission script:
ulimit -l unlimited
ulimit -s unlimited
If these commands are not added, your job may fail.
On June 13th there will be maintenance to configure the networking on Chinook and to help address an observed slowdown in multi-node jobs running on the cluster.
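As an illustration of where the two ulimit lines might sit, here is a minimal sketch of a job submission script. It assumes a Slurm-style batch script; the #SBATCH values, the report_limits helper, the warning messages, and the application name are hypothetical placeholders, so adapt them to your own job.

```shell
#!/bin/bash
# Hypothetical scheduler directives (assumes Slurm); adjust for your job.
#SBATCH --job-name=multinode_example
#SBATCH --nodes=2

# Raise the locked-memory and stack-size limits before the application starts.
# The "|| echo" fallbacks are an illustrative addition: they only print a
# warning if the shell is not permitted to raise a limit.
ulimit -l unlimited || echo "warning: could not raise the locked-memory limit" >&2
ulimit -s unlimited || echo "warning: could not raise the stack-size limit" >&2

# Hypothetical helper: print the limits in effect so they appear in the job output.
report_limits() {
    echo "max locked memory: $(ulimit -l)"
    echo "stack size: $(ulimit -s)"
}
report_limits

# Launch the application here; the name below is a placeholder.
# srun ./my_mpi_app
```

Placing the ulimit lines before the launch command matters because the limits are inherited by processes started from the script after they are set.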
Any jobs that are scheduled to run during this time will remain in the queue.
Chinook will be taken offline for a short time on March 22, 2017, at 9:00 AM AKDT to allow for the replacement of $CENTER on Chinook with a new Lustre filesystem. The new filesystem will be assigned the environment variable $CENTER1.