CUDA, cuDNN, and PyTorch Compatibility

Hello,

I’m trying to set up a specific environment on my university’s HPC, which restricts sudo access. The HPC has Python >=3.9 and CUDA >=11.7. For my project, I need Python 3.6 and PyTorch 0.4.1, compatible with CUDA 9.2 and cuDNN 7.2.1.

What I’ve done:

  1. Created a conda environment with Python 3.6.
  2. Installed cudatoolkit=9.2 and cudnn=7.2.1.
  3. Installed PyTorch 0.4.1 using conda install pytorch=0.4.1 cuda92 -c pytorch.

Issues:

  • When installing PyTorch 0.4.1 in this env, I got environment conflicts, so I created a Python venv inside the conda env and installed 0.4.1 using pip.
  • Running nvcc --version shows CUDA 9.2, and torch.version.cuda also returns 9.2, which is good.
  • However, torch.backends.cudnn.version() returns 7.1 instead of 7.2.1.
  • During training, I encounter the error: RuntimeError: CuDNN error: CUDNN_STATUS_EXECUTION_FAILED.
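
For reference, the version checks listed above can be collected in one small script. This is only a sketch: the helper names decode_cudnn_version and report_versions are made up for illustration, and the decoding assumes the integer encoding used by cuDNN 7.x and 8.x (major * 1000 + minor * 100 + patch), so newer cuDNN releases would need a different formula.

```python
def decode_cudnn_version(raw):
    """Split an integer cuDNN version such as 7102 into (major, minor, patch).

    Assumption: cuDNN 7.x/8.x encoding, i.e. major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(int(raw), 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def report_versions():
    """Return the versions PyTorch reports, or None if torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    raw = torch.backends.cudnn.version()  # integer, or None if cuDNN is unavailable
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": decode_cudnn_version(raw) if raw is not None else None,
    }


if __name__ == "__main__":
    print(report_versions())
```

Running this inside the activated environment shows which CUDA and cuDNN builds the imported torch actually uses, which may differ from what conda list or nvcc report.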

Questions:

  1. Why is cuDNN version 7.1 instead of 7.2.1?
  2. How can I correctly set up the environment to avoid conflicts?
  3. How can I resolve the cuDNN error?

Thank you for your help!

Hello @Arpan_Gyawali :slight_smile: Thank you for posting!

It looks like the issue isn’t related to conda, so unfortunately, we won’t be able to help with this one. I saw that you already posted your question on the PyTorch forum—good move!

In the future, I suggest creating lock files for your conda environments, especially the more complex ones. You can learn more about the conda-lock tool here.

I’d advise putting all the packages together in the same conda create command so the solver can consider all the possible solutions at once:

conda create -n py36torch --override-channels -c defaults python=3.6 cudnn=7.2.1 pytorch=0.4.1 --platform=linux-64 
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package cudnn-7.2.1-cuda9.2_0 requires cudatoolkit 9.2.*, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ cudnn 7.2.1**  is installable and it requires
│  └─ cudatoolkit 9.2.* , which can be installed;
└─ pytorch 0.4.1**  is not installable because it requires
   └─ cudatoolkit 9.0.* , which conflicts with any installable versions previously reported.

You’ll see that there’s no solution for that because those versions require different cudatoolkit versions. The pytorch channel doesn’t have 0.4.1 anymore either. So I think you are out of luck :grimacing:
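
To make the conflict concrete, here is a toy sketch in Python. It is not how the conda solver works internally; it only checks whether any single cudatoolkit version satisfies both requirements copied from the error message. The candidate version list is hypothetical and chosen purely for illustration.

```python
from fnmatch import fnmatch

# Requirements copied from the solver output above.
REQUIRES = {
    "cudnn 7.2.1": "9.2.*",
    "pytorch 0.4.1": "9.0.*",
}

# Hypothetical cudatoolkit versions, for illustration only.
CANDIDATES = ["9.0.0", "9.2.0", "10.0.0"]


def satisfying_versions(candidates, requires):
    """Return the candidate versions that match every requirement pattern."""
    return [
        version
        for version in candidates
        if all(fnmatch(version, pattern) for pattern in requires.values())
    ]


if __name__ == "__main__":
    # No version can start with both "9.2." and "9.0.", so this prints [].
    print(satisfying_versions(CANDIDATES, REQUIRES))
```

Because the two patterns can never match the same version string, the result is empty no matter which candidates are available, which is what the solver is reporting.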

i created a python venv inside the conda env and installed 0.4.1 using pip.

This is the problem. The PyPI package you installed with pip bundles its own libraries, so it won’t use the conda-installed cudatoolkit and cudnn packages. That is most likely why torch.backends.cudnn.version() reports 7.1 rather than 7.2.1. If you can find the packages you need on PyPI, I’d stick to using only pip. Unfortunately, I couldn’t find wheels for cuDNN that old.
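
One way to check this is to look at where torch is imported from and whether its install directory ships its own cuDNN files. The following is a rough sketch; find_bundled_libs is a made-up helper, not part of PyTorch or conda. An empty result does not prove that the conda cuDNN is in use, since the library could also be statically linked into another shared object.

```python
import glob
import os


def find_bundled_libs(package_dir, pattern="*cudnn*"):
    """Return sorted file names under package_dir (searched recursively) matching pattern."""
    matches = glob.glob(os.path.join(package_dir, "**", pattern), recursive=True)
    return sorted(os.path.basename(path) for path in matches)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
    else:
        # A path inside a venv's site-packages points to the pip wheel.
        print(torch.__file__)
        print(find_bundled_libs(os.path.dirname(torch.__file__)))
```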

If you can afford a mild deviation in the versions, cudnn=7.3 does solve:

conda create -n py36torch --override-channels -c defaults python cudnn=7.3 pytorch=0.4.1 --platform=linux-64

If that’s inadequate, I assume you are trying to reproduce an old environment given these (outdated) versions. I’d try using Docker images to get those versions installed with the system package manager. Best of luck, this is a tricky one!

Thank you so much @jaimergp, I will try these.

What I am actually trying to do is set up an environment for the Medical Detection Toolkit (GitHub: MIC-DKFZ/medicaldetectiontoolkit), which contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, and Retina U-Net, as well as a training and inference framework focused on dealing with medical images. That is where I am running into this problem. If you can help me with this, it would be great.