Changelog Updates

Memex Login Hung on 11/17/21

by Floyd Fayton
Initial incident and probable cause: A broken pipe on the login node, caused by a file update on that node. The master node and login node were both rebooted, and the “wwsh file sync” commands were automated in memex_routecheck.sh (crontab and cron.hourly).
New
System Failure
Maintenance
Announcement

Master Node Rebooted

by Floyd Fayton, HPC Admin
Incident (5/21/20): While fixing issues with the GPU nodes, the master node became unstable because several mount points were damaged. All of the mounts were runtime filesystems, so a reboot was requested and fulfilled by SRCF...
System Failure
Fix

SLURM Priority Adjustment

by Floyd Fayton, HPC Admin
Since priorities were not working for users who use Memex less frequently and submit jobs in smaller batches, these parameters were adjusted: ... As a result, gres/gpu was added to: ... This functionality changes in SLURM 19+, but...
Announcement
Maintenance
Improvement

Did You Know ... Slack Edition

by Floyd Fayton, HPC Admin
Did you know we have a Slack channel for HPC/Research Computing? Sign up for our Carnegie Institution for Science workspace (click here) and then join the #hpc channel. Please use your Google login, "@carnegiescience.edu", email...
Tips
Announcement
Welcome Guide

Did You Know ... Python Edition

by Floyd Fayton, HPC Admin
Did you know official support for Python 2 is over? That said, we have Python 3 available on Memex by loading the module, "python/3.6.7". This Python version includes conda, R, Jupyter, IntelMPI, and many other packages. Most...
Announcement
Tips
Welcome Guide

Did You Know ... Storage Edition

by Floyd Fayton, HPC Admin
Did you know there's a 256GB quota for all ... This policy will be fully enforced in the coming weeks. This requirement is needed to manage space and load for ... Note: If you are currently over the limit, you will have time to move...
Tips
Announcement
Welcome Guide

Memex unable to accept new user logins

by Floyd Fayton, HPC Admin
Issue 01/14/20: While installing new packages on the login server, the file /etc/resolv.conf was overwritten and caused new user logins to fail. Once the file was replaced with the proper nameserver values, Memex accepted new logins...
Fix
System Failure

Did You Know ... SLURM Edition

by Floyd Fayton, HPC Admin
Did you know there's a GUI to view SLURM jobs? Inside a VNC or "ssh -XY .." session, type ... Did you know you can view the maximum resources of each node with: ... Did you know you can view the maximum memory used for running jobs with (see...
Tips
Announcement
Welcome Guide

User reported that rsync/cp/scp are too slow on /memexnfs/apps

by Floyd Fayton, HPC Admin
Temporary Resolution 01/02/20: Using rclone instead of rsync/cp/scp is 10x faster for large directory syncs, and possibly large file syncs, to /memexnfs/ mountpoints. Although all reads from /memexnfs mounts are performing as...
System Failure
Announcement
Fix