On July 27th, the $CENTER1 file system will undergo maintenance to address slowdowns and errors involving file operations. The system will need to be taken offline to apply a patch that we anticipate will resolve these issues. Jobs scheduled to run during this downtime will stay in the queue and run after the downtime reservation ends. The Chinook HPC cluster and the Linux workstations will be affected by this outage.
Chinook00 is being rebooted on July 12, 2017 at 3pm AKDT. This should be a brief outage, and logins during the time chinook00.alaska.edu is offline should be redirected to the other login nodes.
Due to continued system troubles from the June 28 unplanned power outage, some web services may be experiencing glitches. RCS is currently troubleshooting to restore services.
RCS systems are still being restored. Access to RCS systems may be available, but we are still investigating issues that are causing unexpected behavior and impacting work on these systems.
As of 8am on June 29th, RCS systems are steadily coming back online. It is currently unknown when all systems will be fully operational, and restoration may extend past the previous estimate of 9am on June 29th.
We will distribute further notifications as we assess our systems and can give a concrete estimate of when each system will be back online.
There was an unplanned power outage in the UAF Butrovich Data Center this morning. OIT and Facilities Services have replaced the critical equipment.
This was a hard power failure and Research Computing Systems (RCS) is currently assessing the impacts to our hardware and services.
The network on the UAF campus has been restored, and all RCS HPC, storage, and web services are planned to be back online by 9 AM AKDT, June 29, 2017.
We will distribute notifications as more information is available.
The $ARCHIVE filesystem is experiencing heavy utilization, impacting file transfers to and from $ARCHIVE. Users may experience slow file transfers throughout the day as the process of archiving files to tape finishes.
***This outage has been extended.***
The Linux workstations hosted in WRRB 004 will be taken offline from May 20-31, 2017. During this time, RCS staff will reconfigure them to align with chinook.alaska.edu.
Please note the following changes:
On May 20 and 21, 2017, the Butrovich data center will be undergoing the annual University Fire Alarm and Safety Systems Test (FAST).
During this time OIT and RCS will be performing preventative maintenance on systems and services.
The $CENTER1 Lustre filesystem became temporarily unavailable to the Chinook compute nodes on May 11th around 3pm AKDT, causing some submitted jobs to fail immediately. To resolve this issue, the job partitions were taken down, and any submitted jobs were placed into a waiting queue until the partitions were brought back online.
Any jobs that were in the process of running during that timeframe should be unaffected.