changelog Updates
hpc-internal.carnegiescience.edu
Updated: 2021-12-07T18:26:14.900Z
Copyright © changelog

Memex Login Hung on 11/17/21
Published: 2021-11-17T21:32:57.411Z
Updated: 2021-12-07T18:26:14.900Z
<p>Initial incident and probable cause:</p>
<p>A broken pipe on the login node, caused by a file update on the login node. The master node and the login node were both rebooted, and the “wwsh file sync” commands were automated in memex_routecheck.sh (run from crontab and cron.hourly). After the reboot, the routing table and maintenance motd were both updated sooner than before.
A ping check was also added to determine whether the network needs restarting (after the routing table is updated).</p>
Floyd Fayton

SLURM Priority Adjustment
Published: 2020-05-07T15:15:00.001Z
Updated: 2020-06-03T18:03:55.692Z
<p>Priorities were not working for users who use Memex less frequently and submit jobs in smaller batches, so these parameters were adjusted:</p>
<p><code>PriorityWeightFairShare=20000</code><br>
<code>PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000</code></p>
<p>As a result, gres/gpu was added to:</p>
<p><code>AccountingStorageTRES=cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu</code></p>
<p>This functionality changes in SLURM 19+, but our current version is 18.08, which is the latest version packaged with OpenHPC 1.3.</p>
<p><strong>Update:</strong> AcctGatherFilesystemType was also enabled for lustre.</p>
Floyd Fayton
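<p>To illustrate how the SLURM priority weights above interact, here is a minimal sketch. In SLURM's multifactor priority plugin, each configured weight multiplies a normalized factor between 0.0 and 1.0, and the products are summed. The sketch models only the fair-share and TRES terms and assumes every other weight is zero. The function name <code>weighted_priority</code> and all factor values are hypothetical and are not taken from Memex.</p>

```python
# Hypothetical sketch: only the fair-share and TRES terms of the multifactor
# priority sum are modelled; every other weight is assumed to be zero.

# Weights copied from the SLURM Priority Adjustment entry.
PRIORITY_WEIGHT_FAIRSHARE = 20000
PRIORITY_WEIGHT_TRES = {"cpu": 1000, "mem": 2000, "gres/gpu": 3000}


def weighted_priority(fairshare_factor, tres_factors):
    """Return the weighted sum of normalized factors (each in 0.0..1.0)."""
    factors = [fairshare_factor, *tres_factors.values()]
    if any(not 0.0 <= f <= 1.0 for f in factors):
        raise ValueError("factors must lie between 0.0 and 1.0")
    # TRES names missing from tres_factors contribute nothing.
    tres_term = sum(
        weight * tres_factors.get(name, 0.0)
        for name, weight in PRIORITY_WEIGHT_TRES.items()
    )
    return PRIORITY_WEIGHT_FAIRSHARE * fairshare_factor + tres_term


# Made-up factor values for two users; not measurements from Memex.
infrequent_user = weighted_priority(0.9, {"cpu": 0.1, "mem": 0.1, "gres/gpu": 0.0})
heavy_user = weighted_priority(0.2, {"cpu": 0.5, "mem": 0.5, "gres/gpu": 0.5})

print(round(infrequent_user), round(heavy_user))
```

<p>With these illustrative numbers, the fair-share weight of 20000 dominates the sum, so a user with little recent usage ranks ahead of a heavy user even when requesting fewer resources. On a real cluster, <code>sprio</code> reports the actual per-job factors.</p>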