Troubleshooting¶
This page collects the most common issues reported by Purdue AF users, with solutions. If your problem is not listed here, please contact support.
Sessions¶
My session fails to start
Most commonly this is caused by issues or outages in the computing infrastructure; in that case, please alert the facility admins.
Another possible cause is an overfilled home directory: the
/home/<username>/ volume has a strict 25 GB quota, and sessions cannot start
if you are over it.
- Start a session with the Minimal JupyterLab interface option; it should work even when a normal session does not.
- Check your home directory usage: du -sh $HOME.
- Move large files (data, Pixi/Conda environments) to /work/ or /depot/ storage (see Storage volumes).
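To see what is actually filling the home directory, a short Python sketch along the following lines may help. It is an illustration only: the helper names (tree_size, entry_sizes, report) are hypothetical, the 25 GB quota is assumed to mean 25 GiB, and the script sums apparent file sizes, so the totals can differ slightly from what du reports.

```python
import os
from pathlib import Path

QUOTA_BYTES = 25 * 1024**3  # home quota from the text above, assumed to be 25 GiB


def tree_size(path):
    """Total apparent size in bytes of a file or directory tree (symlinks are not followed)."""
    path = Path(path)
    if path.is_symlink() or path.is_file():
        return path.lstat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def entry_sizes(directory):
    """Return (size_in_bytes, name) for every top-level entry, largest first."""
    sizes = [(tree_size(entry), entry.name) for entry in Path(directory).iterdir()]
    return sorted(sizes, reverse=True)


def report(directory, top=10):
    """Print the largest entries and the total usage; return the total in bytes."""
    sizes = entry_sizes(directory)
    used = sum(size for size, _name in sizes)
    for size, name in sizes[:top]:
        print(f"{size / 1024**3:8.2f} GiB  {name}")
    print(f"total: {used / 1024**3:.2f} GiB of {QUOTA_BYTES / 1024**3:.0f} GiB")
    return used
```

Calling report(Path.home()) from a Minimal JupyterLab session lists the largest top-level entries, which are the natural candidates for moving to /work/ or /depot/.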
If you selected a full 40 GB A100 GPU, the session may also fail to start simply because all full GPU instances are taken. Try a 5 GB slice instead, or start a CPU-only session.
My session is very slow
- You might be trying to use custom Pixi or Conda environments on a slow filesystem. Try moving them to /work/ storage.
- Check whether you are running out of RAM: the resources selected at session creation are hard limits. Restart the session with more RAM if needed.
- Reading many small files from /depot/ or /eos/ can be slow; see Data access for faster access patterns (XCache).
My session was shut down on its own
Sessions that remain inactive for 14 days are automatically shut down to release resources. Your storage volumes are unaffected — simply start a new session. Sessions may also occasionally get shut down due to unplanned outages, so save your work regularly and keep important code in sync with a Git repository.
I deleted/broke my configuration and want a clean start
Shut down your session (File → Hub Control Panel → Stop My Server), then
start a new one. The session image is recreated from scratch every time; only
the contents of your storage volumes persist.
Storage¶
I can't write to /depot/cms/
Depot is writable only for users with Purdue accounts. CERN and FNAL users
have read-only access — use /work/users/<username>/ or a
/work/projects/ directory instead. See Storage volumes.
Purdue users can write to their own private directories, as well as to group directories to which they have access. If you don't have access to your group's directory, please contact facility admins.
I can't write to /eos/purdue/
Purdue EOS is mounted read-only. To write to it, use xrdcp or gfal
commands — see Writing to EOS.
I don't see my Grid directory under /eos/purdue/store/user/
The Grid directory at Purdue EOS is created only for Purdue-affiliated users, and must be requested when creating a Purdue Tier-2 account. If you believe you should have one, contact support.
The eos-cern symlink shows up as a file, not a directory
Restart the session (File → Hub Control Panel → Stop My Server), then run
the eos-connect command again — see CERNBox access.
Software and kernels¶
pixi shell / pixi install fails in my home directory
This is intentional: Pixi projects must be located outside of /home/ to
avoid overflowing the 25 GB home quota. Move the project to /work/ or
/depot/ — see Pixi storage locations.
My Pixi environment doesn't show up in the project-aware kernel
The environment must have the ipykernel package installed, for example by running pixi add ipykernel in the project directory.
Also make sure that the notebook is located in (a subdirectory of) the Pixi project directory.
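One way to check the notebook location is to walk up from the notebook's directory and look for a Pixi manifest. The sketch below assumes the project is marked by a pixi.toml file; find_pixi_project is a hypothetical helper and is not part of Pixi or of the project-aware kernel, whose actual lookup logic may differ.

```python
from pathlib import Path


def find_pixi_project(notebook_path, manifest_names=("pixi.toml",)):
    """Return the closest parent directory containing a Pixi manifest, or None.

    Assumption: a directory counts as a Pixi project if it holds one of
    `manifest_names`. The notebook file itself does not need to exist.
    """
    start = Path(notebook_path).resolve().parent
    for directory in (start, *start.parents):
        if any((directory / name).is_file() for name in manifest_names):
            return directory
    return None
```

If the function returns None for your notebook, the notebook is not inside any directory that contains a pixi.toml, and moving it into the project directory is the first thing to try.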
My Conda environment doesn't show up as a Jupyter kernel
- The environment must have the ipykernel package installed.
- Kernel discovery takes 1–2 minutes after the package is installed.
- The environment must be stored in a publicly readable directory; private Depot directories will not work. See Storage volumes.
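The first and last conditions can be checked with a few lines of Python. This is a rough sketch under two assumptions: that "publicly readable" means the read and execute bits are set for "other" users on the environment directory, and that the environment uses the usual Linux Conda layout (lib/pythonX.Y/site-packages). Both helper names are hypothetical.

```python
import os
import stat
from pathlib import Path


def is_world_readable(directory):
    """True if `directory` is a directory that other users can list and enter."""
    mode = os.stat(directory).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH
    return stat.S_ISDIR(mode) and (mode & needed) == needed


def has_ipykernel(env_dir):
    """True if an ipykernel package directory exists in the environment.

    Assumes the standard Linux layout <env>/lib/pythonX.Y/site-packages/.
    """
    return any(Path(env_dir).glob("lib/python*/site-packages/ipykernel"))
```

Note that the parent directories of the environment must also be accessible to other users; the sketch checks only the environment directory itself.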
A package is missing from the global environment
Contact support — we regularly update the global Pixi environment. Alternatively, create your own Pixi environment with the packages you need.
Data access¶
XRootD reads fail with authentication errors
Your VOMS proxy is probably missing or expired. Check with voms-proxy-info,
and create a fresh proxy if needed (for CMS, typically with voms-proxy-init --voms cms).
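Inside a notebook, the same check can be scripted before starting a long job. The sketch below assumes the VOMS client tools are on PATH and that voms-proxy-info --timeleft prints the remaining lifetime in seconds; the helper names and the one-hour threshold are illustrative choices, not facility settings.

```python
import shutil
import subprocess


def proxy_seconds_left():
    """Seconds until the VOMS proxy expires; 0 if there is no usable proxy."""
    if shutil.which("voms-proxy-info") is None:
        return 0  # VOMS client tools are not on PATH
    result = subprocess.run(
        ["voms-proxy-info", "--timeleft"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return 0  # typically means that no proxy was found
    lines = result.stdout.strip().splitlines()
    try:
        return max(0, int(lines[-1]))
    except (IndexError, ValueError):
        return 0  # unexpected output; treat as "no proxy"


def needs_renewal(seconds_left, minimum_seconds=3600):
    """True if the proxy has less than `minimum_seconds` of lifetime left."""
    return seconds_left < minimum_seconds
```

For example, if needs_renewal(proxy_seconds_left()) is true, create a fresh proxy in a terminal before retrying the XRootD read.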
A dataset I need is not accessible / only on tape
If no CMS site has the files on disk, a tape recall is necessary: create a Rucio replication rule to subscribe the dataset to Purdue — see Rucio tutorial.
Dask Gateway¶
Cluster creation times out
Cluster creation fails if the scheduler doesn't start within 3 minutes (Kubernetes backend) or 10 minutes (Slurm backend). This sometimes happens due to resource contention — simply try resubmitting the cluster.
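Resubmission can be automated with a small retry wrapper. The following is a generic sketch, not a feature of Dask Gateway: create_with_retries is a hypothetical helper that calls whatever function you pass in, and the number of attempts and the wait time are arbitrary defaults.

```python
import time


def create_with_retries(create_cluster, attempts=3, wait_seconds=30, retry_on=(Exception,)):
    """Call `create_cluster()` until it succeeds or `attempts` runs out.

    `create_cluster` is any zero-argument callable that returns a cluster,
    for example `lambda: gateway.new_cluster(...)`.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return create_cluster()
        except retry_on as error:
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                time.sleep(wait_seconds)  # give the failed cluster time to stop
    raise RuntimeError(f"cluster creation failed after {attempts} attempts") from last_error


# Usage sketch (requires the dask_gateway package and a configured gateway):
# from dask_gateway import Gateway
# gateway = Gateway()
# cluster = create_with_retries(lambda: gateway.new_cluster())
```

The pause between attempts matters because each user may have only one active cluster per gateway, so a retry issued while the failed cluster is still stopping can be rejected.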
I can't create a cluster: "You may only have 1 active Dask Gateway cluster(s)"
Each user can have at most one active cluster per gateway at a time. Shut down your existing cluster (see Shutting down clusters), or wait for it to finish stopping, then try again.
Workers fail to start or crash immediately
- Check that the Pixi/Conda environment passed to new_cluster() is visible to the workers: Slurm workers can only see /depot/; Kubernetes workers can see /depot/ and /work/. See the storage access table.
- CERN/FNAL users: make sure the env dictionary contains NB_UID and NB_GID (passing env = dict(os.environ) is sufficient).
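Both checks can be run in the notebook before calling new_cluster(). The sketch below encodes only the visibility rules stated above; the helper names and the backend labels "slurm" and "kubernetes" are hypothetical, so adapt them to however you select the backend.

```python
import os

# Volumes that workers can see on each backend, as listed above.
VISIBLE_PREFIXES = {
    "slurm": ("/depot/",),
    "kubernetes": ("/depot/", "/work/"),
}


def env_visible_to_workers(env_path, backend):
    """True if `env_path` lies on a volume that workers of `backend` can see."""
    normalized = os.path.normpath(env_path) + "/"
    return normalized.startswith(VISIBLE_PREFIXES[backend])


def missing_worker_vars(env, required=("NB_UID", "NB_GID")):
    """Return the required variable names that are absent from `env`."""
    return [name for name in required if name not in env]


# Usage sketch:
# env = dict(os.environ)
# print(missing_worker_vars(env))  # an empty list means both variables are present
# cluster = gateway.new_cluster(env=env)  # plus your usual cluster options
```

A non-empty result from missing_worker_vars, or False from env_visible_to_workers, points to the likely reason the workers cannot start.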
My cluster disappeared
Idle clusters (no connected clients — e.g. after the notebook that created the cluster is terminated) are shut down automatically: after 1 hour on the Kubernetes backend, and after 24 hours on the Slurm backend. Slurm workers are additionally limited by a 4-hour Slurm job walltime.
Workers can't read my data via XRootD
Pass the VOMS proxy location to the workers, and make sure the proxy file itself is on a volume the workers can read (e.g. Depot for Slurm workers):
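A minimal sketch of this, assuming the proxy location is passed through the standard X509_USER_PROXY environment variable in the env dictionary given to new_cluster(). The helper name and the example path are hypothetical; use the path where your proxy file is actually stored.

```python
import os


def worker_env_with_proxy(proxy_path, base_env=None):
    """Return a copy of the environment with X509_USER_PROXY set to `proxy_path`.

    Raises FileNotFoundError if the proxy file does not exist, so that the
    problem is caught in the notebook rather than on the workers.
    """
    if not os.path.isfile(proxy_path):
        raise FileNotFoundError(f"proxy file not found: {proxy_path}")
    env = dict(os.environ if base_env is None else base_env)
    env["X509_USER_PROXY"] = str(proxy_path)
    return env


# Usage sketch (hypothetical path on Depot, readable by Slurm workers):
# env = worker_env_with_proxy("/depot/cms/users/<username>/x509_proxy")
# cluster = gateway.new_cluster(env=env)  # plus your usual cluster options
```

Starting from a copy of os.environ keeps NB_UID and NB_GID in the dictionary, which CERN and FNAL users need in any case.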
SSH and IDE connections¶
Remote-SSH connection from VSCode/Cursor fails
See the troubleshooting section of the IDE connection guide.
The usual suspects: home directory permissions (chmod 755 ~/), a missing or
not-on-PATH websocat binary, or an expired JupyterHub token.
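The first two suspects can be checked with a short script; the token has to be checked in JupyterHub itself. This is only a sketch with hypothetical helper names, and it should be run on whichever machine the IDE connection guide asks for each item.

```python
import os
import shutil
import stat


def home_permissions_ok(home, expected=0o755):
    """True if the permission bits of `home` equal `expected` (755 by default)."""
    return stat.S_IMODE(os.stat(home).st_mode) == expected


def websocat_path():
    """Full path of the websocat binary if it is on PATH, otherwise None."""
    return shutil.which("websocat")


# Usage sketch:
# print(home_permissions_ok(os.path.expanduser("~")))
# print(websocat_path())
```

If home_permissions_ok returns False, running chmod 755 ~/ restores the expected mode; if websocat_path returns None, the binary is either missing or installed in a directory that is not on PATH.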
Still stuck?¶
Send us a message — see Support. Please include your username, the login method you used (Purdue / CERN / FNAL), and the approximate time when the problem occurred.