# Changelog Updates
hpc-internal.carnegiescience.edu · posts by Floyd Fayton ([email protected])

## Memex Login Hung on 11/17/21

**Initial incident and probable cause:**

A broken pipe on the login node, caused by a file update on the login. The master node and the login were both rebooted, and the `wwsh file sync` commands were automated in `memex_routecheck.sh` (run from crontab and cron.hourly). After the reboot, the routing table and the maintenance motd were both updated sooner than before. A ping check was also added to determine whether networking needs restarting (after the routing table is updated).
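The script itself is not shown in the entry; the following is a hypothetical sketch of the pattern described (a Warewulf file sync plus a ping-gated network restart). The gateway address is a placeholder, not Memex's actual value:

```bash
#!/bin/bash
# Hypothetical sketch of a memex_routecheck.sh-style cron job.
# The gateway address is a placeholder; the routing-table update
# step described in the entry is omitted here.
GATEWAY="10.0.0.1"                  # placeholder internal gateway

# Re-sync Warewulf-managed files so the login node's copies stay current.
wwsh file sync

# Ping-check the internal network; restart networking only on failure.
if ! ping -c 2 -W 2 "$GATEWAY" >/dev/null 2>&1; then
    systemctl restart network       # CentOS 7 network service
fi
```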
## Login hangs after kernel message...

**Update (6/29/20):**

The **login** hang was caused by high I/O load, which is returning after a weekend hiatus. Unfortunately, limiting I/O on the login is not yet feasible due to the design of the system. The issue is not a lack of memory, CPU, or bandwidth on the login, but the login's limited ability to process heavy user I/O (including transfers and local login processes). The issue is not present when performing the same work on compute nodes (transfers, normal commands, writing files, etc.). Short of an entire system redesign, we are planning a way forward for how storage is used and configured on Memex. The earliest possible change(s) will be made during the July 8th shutdown. Details will be sent once those plans are set.

**Incident (06/24/20 @1540):**

The login stalled after the global messages below:

```
Message from syslogd@memex at Jun 24 12:22:34 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [$IP-m:69361]

Message from syslogd@memex at Jun 24 12:23:02 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [$IP-m:69361]

Message from syslogd@memex at Jun 24 12:23:34 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [$IP-m:69361]

Message from syslogd@memex at Jun 24 12:24:02 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [$IP-m:69361]

Message from syslogd@memex at Jun 24 12:24:30 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [$IP-m:69361]

Message from syslogd@memex at Jun 24 12:24:58 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [$IP-m:69361]
```

## Master Node Rebooted

**Incident (5/21/20):**

While fixing issues with the GPU nodes, the master node became unstable because several mount points were damaged. All of the mounts were runtime filesystems, so a reboot was requested and fulfilled by SRCF personnel the next morning. No SLURM jobs were reported as affected, but new logins were denied until the master node was rebooted. The outage lasted about 8 hours.

## Memex unable to accept new user logins

**Issue 01/14/20:**

While installing new packages on the login server, the file /etc/resolv.conf was overwritten, which caused new user logins to fail. Once the file was restored with the proper nameserver values, Memex accepted new logins again.
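A lightweight guard against this failure mode is to verify DNS after package installs and restore a known-good copy of the file if lookups fail. A minimal sketch; the backup path and test hostname are placeholders:

```bash
#!/bin/bash
# Sketch: if DNS resolution breaks (e.g., /etc/resolv.conf was clobbered
# by a package install), restore a known-good copy. Paths are placeholders.
if ! getent hosts carnegiescience.edu >/dev/null; then
    cp /root/backups/resolv.conf.good /etc/resolv.conf
fi
```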
## User reported that rsync/cp/scp too slow on /memexnfs/apps

**Temporary Resolution 01/02/20:**

Using rclone instead of rsync/cp/scp is 10x faster for large-directory (and possibly large-file) syncs to /memexnfs/ mountpoints. Although all reads from /memexnfs mounts are performing as expected, disk-to-disk writes to /memexnfs are not. As load increases, presumably from cluster jobs and transfers (syncs to /home), write performance suffers. The workaround is to use rclone, and this email was sent out to all users:

> Please use rclone for large local or remote transfers on the /memexnfs/* filesystems: /share/apps/dept, /home/username, /work/DEPT, or /scratch/username. There seems to be an issue with the common Linux commands rsync and cp when transferring large directories (in size and number of files).
>
> The solution is to use rclone instead of rsync, cp, or scp for large directories (in size and number of files):
>
> `rclone sync /home/username/directory/ /scratch/username/directory/ -LP`
>
> This syncing issue currently affects write speeds but not read speeds for large directories on /memexnfs/*. This solution has been tested and should also work fine for small directories and files.
>
> Of course, rclone can be used to sync files to/from GDrive as well.
>
> Rclone Tutorial:
> https://carnegiescience.freshservice.com/support/solutions/articles/3000040389

**Issue 12/19/19:**

A user reported that rsyncs were too slow on the /share/apps mount of /memexnfs/apps. Since all /memexnfs/* mounts share the same disks and setup, it was determined that the issue was not isolated to /share/apps but also affects /work, /scratch, and /home (all /memexnfs mountpoints across the cluster).

## Login Node Slowness (module command hanging on memex.carnegiescience.edu)

**Resolved 12/10/19:**

After tuning the NFS server and clients, the slowness was resolved. Although there were several adjustments, the RPCNFSDCOUNT variable in /etc/sysconfig/nfs was the change that made the biggest improvement (100x bandwidth). This tuning was partially documented in [our ticket system](https://carnegiescience.freshservice.com/support/solutions/articles/3000044399) for future reference.
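For reference, the RPCNFSDCOUNT change amounts to raising the NFS server thread count. A sketch on CentOS 7; the thread count shown is illustrative, not the value used on Memex:

```bash
# Raise the number of NFS server threads (the default is often only 8)
# and restart the server. 64 is an illustrative value only.
sed -i 's/^#\?RPCNFSDCOUNT=.*/RPCNFSDCOUNT=64/' /etc/sysconfig/nfs
systemctl restart nfs-server

# Confirm the running thread count.
cat /proc/fs/nfsd/threads
```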
**Update 12/04/19:**

To see improvements to the module command, please log out and log back in. This step is required to take advantage of Lmod's caching feature, which improves its responsiveness. In the meantime, I am still investigating ways to improve filesystem performance for `/memexnfs/*` mountpoints. If you are running SLURM jobs that write or read large files, I suggest using `/lustre/scratch/$USER` as a working directory. If you are running a parallel or multiprocessing job, also use `/lustre/scratch/$USER` as a working directory. For instance:

```
mkdir /lustre/scratch/$USER
rsync -aWz /home/$USER/workdir/ /lustre/scratch/$USER/workdir/
cd /lustre/scratch/$USER/workdir/
```

then submit your job as normal (a full sketch appears below).

After the job finishes, you can `rsync` the directory back to `/home/$USER/workdir` for safekeeping. Please keep in mind that you'll need to add `--delete` to the `rsync` command for an exact copy of `/lustre/scratch/$USER/workdir` (which deletes any files/dirs in `/home/$USER/workdir` that are not in `/lustre/scratch/$USER/workdir`).

Use cautiously:

```
rsync -aWz --delete /lustre/scratch/$USER/workdir/ /home/$USER/workdir/
```

Please note, the directories under `/lustre/scratch` are **not** backed up and are **not** currently scrubbed.

The use of `/lustre/scratch/$USER` is recommended because the read/write performance of `/memexnfs/*` is being hindered by multiple I/O streams, including transfers (by root and users), SLURM jobs, and any other login-node activity by users (VNC, shell/interpreter scripts, etc.).
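Putting the staging steps above into a single batch script, as a sketch; the job resources and program name are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=lustre_stage     # placeholder job settings
#SBATCH --ntasks=1

# Stage the working directory into Lustre scratch, run there, then
# sync results home. "workdir" and ./my_program are placeholders.
mkdir -p /lustre/scratch/$USER
rsync -aWz /home/$USER/workdir/ /lustre/scratch/$USER/workdir/
cd /lustre/scratch/$USER/workdir/

./my_program

# Sync results back for safekeeping (add --delete for an exact copy).
rsync -aWz /lustre/scratch/$USER/workdir/ /home/$USER/workdir/
```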
**Update 12/03/19:**

Lmod's cache was enabled to improve the performance of the module commands on Memex. However, I/O performance is still a bit sluggish, so more investigation is required to improve performance on the mounted filesystems.

**Issue (started week of 11/25/2019):**

After logging onto Memex (password/DUO), the user login hangs while trying to load modules. This is an ongoing issue which seems to be caused by remote and/or local mounted filesystems,

![Screen Shot 2019-12-02 at 3.12.27 PM.png](https://storage.noticeable.io/projects/3f43Ej0LaTLbXv21eFel/publications/EC7QoW1ZmiKkRDgyNIDr/01h55ta3gsaf68z6xy02yhv34s-image.png)

while Lmod is traversing one or more of the module paths (usually in `$MODULEPATH`). We are investigating the issue, but in the meantime press **Ctrl+c** if the bash command prompt doesn't appear promptly after the following banner:

![Screen Shot 2019-12-02 at 3.03.54 PM.png](https://storage.noticeable.io/projects/3f43Ej0LaTLbXv21eFel/publications/EC7QoW1ZmiKkRDgyNIDr/01h55ta3gswf49314edn913r1t-image.png)

## System Update & Failed disk in SureStoreHD, memexnfs ZFS pool degraded

**System Update 11/6/19:**

*The system was updated on November 8th, including updates to SLURM, ZFS (0.6 to 0.8, which improves the time to rebuild a failed disk), and CentOS (7.5 to 7.7).*

**Resolved 11/2/19:**

Issue resolved. The replacement disk has finished resilvering.

**Update 10/22/19:**

The new drive in our primary filesystem is still rebuilding and will be done in about 8 days. The filesystem is ZFS (version 0.6) in a RAIDZ1 configuration, which means one failed drive puts the filesystem in a degraded state. This degraded state will continue until the drive is "resilvered", that is, until data is copied to the healthy disk and it comes online. This process, which takes entirely too much time, was flagged as a ZFS bug back in November 2017.

The current version of ZFS, 0.8.0, was released in May of this year and addresses the bug. The improvement to the resilvering process is said to be 5-6x better than the performance we're currently seeing. The command-line slowness you are experiencing on Memex's login is far worse (up to 5x worse) than the I/O performance on Memex's compute nodes, but all I/O for /home, /scratch, /work, and /share/apps will be affected. This means you can still submit jobs from the login, but all other activities will be slow in /home, /scratch, and /work.

A way around this is to use your own Lustre scratch directory, /lustre/scratch/username (if it doesn't exist, you can create it with `mkdir -p /lustre/scratch/username`), to edit files, run local commands, etc. Cleanup for /lustre/scratch/username is turned off for now, and you can even submit jobs from there.
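Resilver progress like the status output quoted in the next update can be checked at any time with `zpool status`. A sketch, assuming the pool is named memexnfs after its mountpoints; the actual pool name may differ:

```bash
# Show pool health and resilver progress; "memexnfs" is an assumed
# pool name based on the mountpoints above.
zpool status memexnfs

# One-line capacity and health summary.
zpool list memexnfs
```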
**ANNOUNCEMENT**

We are planning a software update for SLURM and ZFS when the disk replacement is completely done. I will be sending out a notice for a planned reboot, which is necessary to ensure the ZFS filesystem is truly updated. Please keep this in mind as you submit jobs: a job intended to run for a month or so will be killed prior to these updates in a couple of weeks.

**Update 10/14/19:**

Drive still resilvering - 24% done and going:

```
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 9 12:55:27 2019
        18.8T scanned out of 77.8T at 47.3M/s, 363h15m to go
        566G resilvered, 24.15% done
```

**Update 10/9/19:**

The failed drive was replaced, and ZFS resilvering onto the new drive has started.

**Update 10/7/19:**

Increased the quota for /memexnfs/home and decreased the quota for /scratch as well.

**Update 10/7/19:**

Memexnfs has become more responsive due to ZFS resilvering after the drive failure. This has resulted in I/O improvements for /home, /work, /scratch, and /share/apps.

**Update 10/5/19:**

Waiting for a new HDD in order to replace the failed disk.

The main issue is a failed drive behind all /memexnfs/* mounts. I'll let you know when the drive has been replaced. Until then, Memex's directories /memexnfs/scratch (/scratch), /memexnfs/home (/home), /memexnfs/work (/work), and /memexnfs/apps (/share/apps) will all be operating in a degraded state.

## Memory failing in our SureStore UHD server (replacing DIMMs today)

**Update:**

Replacing the failing DIMMs now…

![2019-07-30.jpeg](https://storage.noticeable.io/projects/3f43Ej0LaTLbXv21eFel/publications/KaPTOoCZ6rcqrRGzijD3/01h55ta3gs86dgpfrz5934fsg5-image.jpg)

**Issue:**

During IOR testing of /work on Memex, it was discovered that performance was lower than usual and that two DIMMs were failing (a sketch of such a test appears at the end of this entry). Once discovered, the manufacturer was contacted for replacements, which they sent overnight.

An emergency reboot was scheduled for 7/30/19 at 1pm PST (4pm EST), and notice was sent to users to shut down their jobs before 1pm PST/4pm EST. This shutdown was necessary to avoid issues (i.e., corruption) within the affected Memex directories:

- /home (all but DPB/DGE)
- /work
- /scratch
- /share

Lustre (/lustre and /lscratch) should also be rebooted during this time; there are still lingering issues from the OSS2 failure in May 2019.

Users were warned that any jobs not canceled by shutdown would be killed.

The SureStore is 2+ years old and had been up for 223 days prior to this emergency shutdown.
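For context, IOR is an MPI-based I/O benchmark. A minimal sketch of the kind of throughput spot-check described in this entry; the rank count, sizes, and module name are illustrative assumptions, not the actual test parameters:

```bash
# Illustrative IOR run against /work: 4 MPI ranks, file-per-process (-F),
# write then read (-w -r), 1 MiB transfers (-t), 4 GiB per rank (-b).
module load openmpi                 # hypothetical module name
mpirun -np 4 ior -w -r -F -t 1m -b 4g -o /work/$USER/ior_testfile
```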
## Intel Python Issue: conda base corrupted

On July 8th, the conda environment for the modules python/2.7.0 and python/3.6.0 was affected after an incomplete install of seaborn and pandas was aborted. Subsequent steps to fall back to a sane state and install those packages failed. Although the previous conda environments and Python base were still intact, new package installations using conda were failing altogether. Since Intel has released the 2019 Parallel Studio XE (compilers and Python) and the installed 2018 version is still functional, effort is being made to migrate to the 2019 versions and eventually abandon the 2018 Python versions.

The 2019 Intel Parallel Studio XE toolkit, including Python versions 2.7.16 and 3.6.7, has just been installed on Memex. To use the 2019 Intel compilers, use the module "intel/2019" (which includes icc, ifort, etc.), an upgrade to the "intel/2018" module (the lowercase 'i' matters). I am currently working on modules for Intel's Python 2 and 3, located under /share/apps/intel/2019 as intelpython2/ and intelpython3/.

If you are only interested in the 2019 Intel compilers and don't use Intel's Python versions, you can stop reading here. If you don't know which Python version you're using, type `python --version` from the command line; it will indicate Intel or GNU. For those who are interested in the 2019 Python installation on Memex, please continue reading…

The 2019 Intel Python installation does not inherit the packages and conda environments from the previous 2018 Intel Python installations, which are the modules "python/2.7.0" and "python/3.6.0" on Memex. This means you can continue to use those 2018 installations (including compilers, python, and python conda envs), but updates to those 2018 packages will be abandoned by September 1st, 2019. To save and recreate your current conda environment for the 2019 installation, see [Sharing an environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#sharing-an-environment) and then [Creating an environment from an environment.yml file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file). This file can be saved and used in other Python/Conda setups (on other machines as well).
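As a sketch of that save-and-recreate workflow between the 2018 and 2019 modules; "myenv" is a placeholder environment name:

```bash
# Export an environment built on the 2018 Intel Python module...
module load python/3.6.0
conda activate myenv
conda env export > myenv.yml
conda deactivate
module unload python/3.6.0

# ...and recreate it on top of the 2019 module.
module load python/3.6.7
conda env create -f myenv.yml
```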
I am working to install a few general Python packages for the new 2019 installation, so please feel free to send requests for package installations to [email protected]. The new 2019 Intel Python modules will be "python/2.7.16" and "python/3.6.7". These modules are available now, but I am still working this week to install packages and establish a conda "base". These packages include:

- numpy
- matplotlib
- seaborn
- tensorflow
- pandas
- keras
- sklearn
- r
- rstudio
- jupyter notebook
- r-rgdal
- and more… (some packages are easier to install than others!)

Again, these packages will establish the "base" for each Python module and its downstream conda environments, so if you have a general package you'd like me to install, let me know this week. This is important because if the base for either Python version changes, any conda environment created on top of it will be affected. This work is ongoing…

If a package installation request involves pulling from GitHub or other third-party sources, then a conda environment might become necessary; not all packages and/or package versions are compatible. For this reason, wait until after this week to create your own conda environments. Personal conda environments can be set up without Memex admin privileges (recommended if you want full control of your environment). For example (instructions from the conda documentation):

```
conda create -n myenv
conda activate myenv
```

This will create a conda environment in /home/username/.conda/envs/myenv and prepend your command prompt with "(myenv)", indicating you're now in your newly created conda environment. This environment also allows you to install specific versions of packages, but your initial environment depends on the module you start with (i.e., "python/2.7.16" or "python/3.6.7"). You can specify which version of a package to install with the following command (the package here is "python", the version "3.6.8"):

```
conda install -n myenv python=3.6.8
```

Conda will try to accommodate this request by downgrading, removing, upgrading, superseding, or installing packages for the "python=3.6.8" dependencies, and then ask whether you want to proceed (y/n?). I can tell you from experience that Python 3 is easier for conda to work with than Python 2, but most issues can be worked through in conda environments.

For issues, please email [email protected] to create a ticket.

Other useful conda commands/instructions:

```
conda deactivate     # exit conda environment
conda env list       # list available conda environments
conda list scipy     # list package version in current environment
```

- [Sharing an environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#sharing-an-environment)
- [Deleting an environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#removing-an-environment)

Updates will follow…
## Error via getvnfs networking

**Fixed:** by restarting httpd on the OpenHPC master node.

**Issue:** Down nodes could not be reimaged because the PXE process was hanging at the getvnfs stage (seen in /var/log/messages; boot never makes it past getvnfs). Affected nodes: memex-c[014,042,038,072,075,088].
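As a sketch of the fix plus a quick health check on provisioning; the node name comes from the list above, and the wwsh subcommands are assumed to match the Warewulf 3 tooling used by OpenHPC:

```bash
# Restart the web server that serves VNFS images to PXE-booting nodes.
systemctl restart httpd

# Sanity checks: the VNFS images are listed and the affected node's
# provisioning configuration is intact.
wwsh vnfs list
wwsh provision list memex-c014
```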