changelog Updates

Memex Login Hung on 11/17/21

by Floyd Fayton
Initial incident and probable cause: Broken pipe on the login, caused by a file update on the login. The master node and login were both rebooted, and “wwsh file sync” commands were automated in memex_routecheck.sh (crontab and cron.hourly).
New
System Failure
Maintenance
Announcement

Login hangs after kernel message...

by Floyd Fayton, HPC Admin
Update (6/29/20): The login hang was caused by high I/O load, which is returning after a weekend hiatus. Unfortunately, limiting the I/O on the login is not yet feasible due to the design of the system. The issue is not due to a lack of...
New
System Failure

SLURM's Default Memory Per CPU Increased (1GB --> 2GB)

by Floyd Fayton, HPC Admin
Hi All, Based on previous usage, the default allocation of 1GB of memory per CPU is too low. I have now increased this default to 2GB of memory per CPU. If you are not setting memory requirements in your job(s), this change will...
Announcement
Improvement
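
To make the effect of the memory default change concrete, here is a minimal sketch of the arithmetic for a job that sets no memory request. The helper `default_job_mem_mb` is hypothetical (it is not part of SLURM or Memex), and the sketch assumes that SLURM counts 1GB as 1024MB and that the default scales linearly with the number of CPUs requested.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM or Memex): estimates the memory a job
# receives when it sets no memory request, as CPUs x per-CPU default.
default_job_mem_mb() {
    local cpus="$1"          # CPUs requested by the job
    local per_cpu_mb="$2"    # per-CPU default in MB (assumes 1GB = 1024MB)
    echo $(( cpus * per_cpu_mb ))
}

# A 4-CPU job under the old (1GB) and new (2GB) per-CPU defaults.
echo "old default: $(default_job_mem_mb 4 1024) MB"
echo "new default: $(default_job_mem_mb 4 2048) MB"

# To avoid depending on the default, a job script can request memory itself:
#   #SBATCH --cpus-per-task=4
#   #SBATCH --mem-per-cpu=2G
```

Jobs that already pass `--mem` or `--mem-per-cpu` should not be affected by a change to the default, since an explicit request takes precedence over it.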

SLURM Priority Adjustment

by Floyd Fayton, HPC Admin
Since priorities were not working for those users who use Memex less frequently and in smaller batches of submitted jobs, these parameters were adjusted: As a result gres/gpu was added to: This functionality changes in SLURM 19+, but...
Announcement
Maintenance
Improvement

New Nodes Added - memex-c[117-124]

by Floyd Fayton, HPC Admin
Nodes memex-c[117-124] were added to Memex on 2/13/19. These nodes are identical to memex-c[109-116], all of which have 256GB of raw memory and up to 250GB of free/unused memory per node. Users can request any of these nodes by adding...
Improvement
Announcement