Resolved 12/10/19:

After tuning the NFS server and clients, the slowness was resolved. Of the several adjustments made, changing the RPCNFSDCOUNT variable in /etc/sysconfig/nfs produced the biggest improvement (100x bandwidth). The tuning is partially documented in our ticket system for future reference.
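
For reference, RPCNFSDCOUNT sets how many nfsd server threads are started. The following is a minimal sketch, not the exact procedure we used, for comparing the configured value with the number of threads currently running on the NFS server. The function names are hypothetical, and the value we settled on is not reproduced here.

```shell
# Print the RPCNFSDCOUNT value set in a sysconfig-style file.
# Defaults to the path named above; pass another path to inspect a copy.
nfsd_configured_count() {
    conf="${1:-/etc/sysconfig/nfs}"
    # Use the last uncommented assignment and strip any surrounding quotes.
    sed -n 's/^[[:space:]]*RPCNFSDCOUNT=//p' "$conf" | tail -n 1 | tr -d "\"'"
}

# Print the number of nfsd threads currently running.
# /proc/fs/nfsd/threads only exists on a host running the kernel NFS server.
nfsd_running_count() {
    [ -r /proc/fs/nfsd/threads ] && cat /proc/fs/nfsd/threads
}

if [ -r /etc/sysconfig/nfs ]; then
    echo "configured: $(nfsd_configured_count)"
fi
echo "running: $(nfsd_running_count || echo 'n/a (not an NFS server)')"
```

If the two numbers differ, the NFS service has most likely not been restarted since the file was edited.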

Update 12/04/19:

To see improvements to the module command, please log out and log back in. This step is required to take advantage of Lmod’s caching feature, which improves its responsiveness. In the meantime, I am still investigating ways to improve filesystem performance for the /memexnfs/* mountpoints. If you are running SLURM jobs that read or write large files, or a parallel or multiprocessing job, I suggest using /lustre/scratch/$USER as the working directory. For instance,

mkdir -p /lustre/scratch/$USER
rsync -aWz /home/$USER/workdir/ /lustre/scratch/$USER/workdir/
cd /lustre/scratch/$USER/workdir/

then submit your job as normal.

After the job finishes, you can rsync the directory back to /home/$USER/workdir for safekeeping. Keep in mind that an exact copy of /lustre/scratch/$USER/workdir requires adding "--delete" to the rsync command, which deletes any files/dirs in /home/$USER/workdir that are not in /lustre/scratch/$USER/workdir.

Use with caution:

rsync -aWz --delete /lustre/scratch/$USER/workdir/ /home/$USER/workdir/
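
Because "--delete" removes files on the destination side, it can help to preview the transfer first. The sketch below wraps the same command with rsync's -n (--dry-run) and -i (--itemize-changes) options so that nothing is modified. The function name preview_sync is hypothetical.

```shell
# Preview what "rsync --delete" would change, without modifying anything.
# Usage: preview_sync SRC_DIR DEST_DIR
preview_sync() {
    # Normalize both paths to end in exactly one trailing slash,
    # so the directory contents are compared rather than the directory itself.
    src="${1%/}/"
    dst="${2%/}/"
    # -n only reports what would happen; -i lists each change on its own line.
    rsync -aWz --delete -n -i "$src" "$dst"
}

# Example (prints the planned changes only):
#   preview_sync /lustre/scratch/$USER/workdir /home/$USER/workdir
```

Files that would be removed from the destination are listed with a "*deleting" prefix. If the list looks right, run the real command without -n.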

Please note, the directories under /lustre/scratch are not backed up and are not currently scrubbed.

The use of /lustre/scratch/$USER is recommended because the read/write performance of /memexnfs/* is being hindered by multiple I/O streams, including transfers (by root and users), SLURM jobs, and any other login node activity by users (VNC, shell/interpreter scripts, etc.).

Update 12/03/19:

Lmod’s cache was enabled to improve the performance of the module commands on Memex. However, I/O performance is still a bit sluggish, so more investigation is required to improve performance on the mounted filesystems.

Issue (started week of 11/25/2019):

After logging onto Memex (password/DUO), the user login hangs while trying to load modules. This ongoing issue seems to be caused by the remote and/or locally mounted filesystems responding slowly while Lmod traverses one or more of the module paths (usually those in “$MODULEPATH”).

Screen Shot 2019-12-02 at 3.12.27 PM.png

We are investigating the issue. In the meantime, press “Ctrl+c” if the bash command prompt doesn’t appear promptly after the following banner:

Screen Shot 2019-12-02 at 3.03.54 PM.png