The pytorch and nvidia channels aren't playing nicely together

If I set up a conda pytorch environment like this:

conda activate pytorch-cuda
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

That works, at least insofar as I can then import torch in Python. If, however, I add cuDNN:

conda install cudnn -c nvidia

Things are no longer warm and fuzzy:

(torch-cuda1) pgoetz@finglas ~$ python --version
Python 3.11.5
(torch-cuda1) pgoetz@finglas ~$ python
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lusr/opt/miniconda/envs/torch-cuda1/lib/python3.11/site-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /lusr/opt/miniconda/envs/torch-cuda1/lib/python3.11/site-packages/torch/lib/libc10_cuda.so: undefined symbol: cudaMemPoolSetAttribute, version libcudart.so.11.0
>>> 
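One way to confirm that the runtime the loader resolves as libcudart.so.11.0 really lacks that symbol is to check whether the symbol name even appears in the library's dynamic string table. nm -D is the proper tool if binutils is installed, but grepping the binary works anywhere. The snippet below is a sketch demonstrated on libc (with printf), since the broken env isn't reproducible here; on the affected machine, substitute the libcudart.so.11.0 path from the traceback and the symbol cudaMemPoolSetAttribute:

```shell
# Exported symbol names live in the library's .dynstr table, so grepping
# the binary (-a treats it as text) is a crude presence check that works
# without binutils.  Demonstrated on libc; swap in libcudart.so.11.0 and
# cudaMemPoolSetAttribute on the affected machine.
lib=$(find /lib /usr/lib -name 'libc.so.6' -print -quit)
grep -ac printf "$lib"    # non-zero: the symbol name is present
```

A count of zero for cudaMemPoolSetAttribute in libcudart.so.11.0 would confirm the import error is coming from the downgraded runtime.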

What’s happening is that the cuDNN conda package installs an older CUDA runtime and re-points the libcudart.so.11.0 symlink at it. Here is what is in /miniconda/envs/pytorch-cuda/lib before cuDNN is installed:

# ls -l libcudart*
-rwxr-xr-x 3 root root 695712 Sep 21  2022 libcudart.so.11.8.89

Here is what it looks like after the cudnn package is installed from the nvidia channel:

# ls -l libcudart*
lrwxrwxrwx 1 root root     20 Sep 25 13:12 libcudart.so -> libcudart.so.11.1.74
lrwxrwxrwx 1 root root     20 Sep 25 13:12 libcudart.so.11.0 -> libcudart.so.11.1.74
-rwxr-xr-x 2 root root 554032 Oct 14  2020 libcudart.so.11.1.74
-rwxr-xr-x 3 root root 695712 Sep 21  2022 libcudart.so.11.8.89
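If you just need the environment importable again, re-pointing the versioned symlink back at the 11.8 runtime torch was built against may be enough, though I haven't verified this as a fix and recreating the env is probably cleaner. The snippet below only recreates the layout from the listing above in a scratch directory to show what the repair would look like:

```shell
# Recreate the post-cudnn layout from the listing above in a scratch dir
libdir=$(mktemp -d)
touch "$libdir/libcudart.so.11.1.74" "$libdir/libcudart.so.11.8.89"
ln -s libcudart.so.11.1.74 "$libdir/libcudart.so.11.0"   # what cudnn left behind
ln -s libcudart.so.11.0 "$libdir/libcudart.so"

# The repair: force the versioned symlink back to the 11.8 runtime
# (-f replaces the existing link, -n avoids following it as a directory)
ln -sfn libcudart.so.11.8.89 "$libdir/libcudart.so.11.0"

readlink "$libdir/libcudart.so.11.0"   # libcudart.so.11.8.89
```

On the real env the same ln -sfn would run against $CONDA_PREFIX/lib, but any later conda operation could silently undo it.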

It looks like something similar is happening with libcusparse.so.11, and possibly other libraries; I didn’t try to track them all down.
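To spot the other libraries the cudnn package re-pointed, listing every shared-library symlink in the env's lib directory together with its target makes downgrades like the one above easy to scan for. This is a sketch that assumes the env is activated and that $CONDA_PREFIX/lib is where conda placed the libraries:

```shell
# Print "link -> target" for every lib*.so* symlink in the active env,
# so entries like libcudart.so.11.0 -> libcudart.so.11.1.74 stand out
libdir="$CONDA_PREFIX/lib"
find "$libdir" -maxdepth 1 -name 'lib*.so*' -type l \
  -exec sh -c 'printf "%s -> %s\n" "${1##*/}" "$(readlink "$1")"' _ {} \;
```

Any versioned symlink whose target carries a lower version number than the link name is a candidate for the same downgrade.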

My question is: whom should I bring this up with? The maintainers of the pytorch and nvidia channels? Possibly just the nvidia channel?

Thanks.

I would reach out directly to nvidia. It looks like they also provide a forum:
