Changelog Updates

Did You Know ... Python Edition

by Floyd Fayton, HPC Admin
Did you know official support for Python 2 is over? That said, Python 3 is available on Memex by loading the "python/3.6.7" module. This Python version includes conda, R, Jupyter, IntelMPI, and many other packages. Most...
Announcement
Tips
Welcome Guide
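After loading the module, you may want your scripts to fail clearly if they are accidentally run under an older system Python 2. Below is a minimal guard for that. The (3, 6) minimum simply mirrors the python/3.6.7 module mentioned above, and `check_python` is an illustrative name, not an existing Memex tool. It deliberately avoids f-strings so that even Python 2 can parse it and print the message.

```python
import sys

# Assumed minimum version; matches the python/3.6.7 module named in the post.
MIN_VERSION = (3, 6)


def check_python(min_version=MIN_VERSION):
    """Exit with a helpful message if the running interpreter is too old.

    Uses str.format instead of f-strings so Python 2 can still parse this file.
    Returns the (major, minor) version tuple when the check passes.
    """
    if sys.version_info < min_version:
        raise SystemExit(
            "Python {0}.{1}+ required, found {2}; try: module load python/3.6.7".format(
                min_version[0], min_version[1], sys.version.split()[0]
            )
        )
    return tuple(sys.version_info[:2])


if __name__ == "__main__":
    print("Running under Python %d.%d" % check_python())
```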

Did You Know ... Storage Edition

by Floyd Fayton, HPC Admin
Did you know there's a 256GB quota for all … This policy will be fully enforced in the coming weeks. This requirement is needed to manage space and load for … Note: If you are currently over the limit, you will have time to move...
Tips
Announcement
Welcome Guide
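To get a rough idea of where you stand against the 256GB quota, here is a minimal Python sketch that totals regular file sizes under a directory. It makes two assumptions. First, it treats "256GB" as 256 × 10^9 bytes. Second, it assumes the quota counts apparent file size, whereas the enforced quota may count allocated blocks (for example after ZFS compression), so treat the result as an estimate. `directory_usage` and `quota_report` are hypothetical helper names.

```python
import os
import stat
from pathlib import Path

# Assumption: decimal GB; use 256 * 2**30 if the quota is actually in GiB.
QUOTA_BYTES = 256 * 10**9


def directory_usage(root):
    """Return the total apparent size in bytes of regular files under root.

    Symlinks are not followed or counted, so linked data is not double-counted.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def quota_report(root, quota=QUOTA_BYTES):
    """Summarize usage of root against quota, ending in 'ok' or 'OVER'."""
    used = directory_usage(root)
    pct = 100.0 * used / quota
    status = "OVER" if used > quota else "ok"
    return "{0}: {1:.2f} GB of {2:.0f} GB ({3:.1f}%) {4}".format(
        root, used / 10**9, quota / 10**9, pct, status
    )


if __name__ == "__main__":
    print(quota_report(str(Path.home())))
```

Walking a large home directory can take a while on NFS, so consider running this on a subdirectory first.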

Did You Know ... SLURM Edition

by Floyd Fayton, HPC Admin
Did you know there's a GUI to view SLURM jobs? Inside a VNC or "ssh -XY .." session, type … Did you know you can view the maximum resources of each node with: … Did you know you can view the maximum memory used for running jobs with (see...
Tips
Announcement
Welcome Guide
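The excerpt above is cut off before the memory command, so the following is an assumption rather than the original tip. SLURM's `sstat` can report a running job's peak memory (MaxRSS), for example with `sstat -n -P -j <jobid> --format=JobID,MaxRSS`, which prints pipe-delimited lines. The Python sketch below converts those values to bytes and keeps the peak per job. It assumes the usual K/M/G/T suffixes in binary (1024-based) units. `parse_rss` and `max_rss_per_job` are hypothetical helpers.

```python
# Assumed suffix multipliers for SLURM memory fields (binary units, 1K = 1024 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_rss(value):
    """Convert a MaxRSS string such as '2048K' or '1.5G' to bytes (empty -> 0)."""
    value = value.strip().upper()
    if not value:
        return 0
    if value[-1].isalpha():
        if value[-1] not in _UNITS:
            raise ValueError("unknown memory suffix in %r" % value)
        return int(round(float(value[:-1]) * _UNITS[value[-1]]))
    return int(round(float(value)))  # no suffix: treat as plain bytes


def max_rss_per_job(lines):
    """Given 'JobID|MaxRSS' lines, return the peak bytes per parent job ID.

    Step IDs such as '1234.batch' or '1234.0' are folded into job '1234'.
    """
    peaks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        step_id, rss = line.split("|", 1)
        job = step_id.split(".", 1)[0]
        peaks[job] = max(peaks.get(job, 0), parse_rss(rss))
    return peaks


if __name__ == "__main__":
    # Hypothetical sample lines; on Memex these would come from sstat's output.
    sample = ["1234.batch|2048K", "1234.0|1.5G", "5678.batch|512M"]
    for job, peak in sorted(max_rss_per_job(sample).items()):
        print("%s: %.2f GiB" % (job, peak / 1024**3))
```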

User reported that rsync/cp/scp is too slow on /memexnfs/apps

by Floyd Fayton, HPC Admin
Temporary Resolution 01/02/20: Using rclone instead of rsync/cp/scp is 10x faster for syncing large directories, and possibly large files, to /memexnfs/ mountpoints. Although all reads from /memexnfs mounts are performing as...
System Failure
Announcement
Fix

Login Node Slowness (module command hanging on memex.carnegiescience.edu)

by Floyd Fayton, HPC Admin
Resolved 12/10/19: After tuning the NFS server and clients, the slowness was resolved. Although there were several adjustments, the RPCNFSDCOUNT variable in /etc/sysconfig/nfs was the change that made the biggest improvement (100x...
Announcement
System Failure
Fix

System Update & Failed disk in SureStoreHD, memexnfs ZFS pool degraded

by Floyd Fayton, HPC Admin
System Update 11/6/19: The system was updated on November 8th. The update includes new versions of SLURM, ZFS (0.6 to 0.8, which improves the time to rebuild a failed disk), and CentOS (7.5 to 7.7). Resolved 11/2/19: Issue resolved. Replacement disk has...
Announcement
System Failure

Memory failing in our SureStore UHD server (replacing DIMMs today)

by Floyd Fayton, HPC Admin
Update: Replacing the failing DIMMs now. Issue: During IOR testing of /work on Memex, it was discovered that performance was lower than usual and that two DIMMs were failing. Once this was discovered, the manufacturer was contacted for replacements...
System Failure
Announcement

Intel Python Issue: conda base corrupted

by Floyd Fayton, HPC Admin
On July 8th, the conda environment for the python/2.7.0 and python/3.6.0 modules was affected when an incomplete install of seaborn and pandas was aborted. Subsequent steps to fall back to a sane state and install those packages failed...
Announcement
System Failure
Fix

Login Stalled - SSH denial/System locked up

by Floyd Fayton, HPC Admin
Memex's login became unresponsive to established and new SSH sessions, and a reboot was initiated shortly after. The initial suspicion is that a newly installed package, abrt-gui, detected a system issue and halted the server. Admin...
Announcement
System Failure
Fix