CUDA and cuDNN inside a Conda Env

DO NOT INSTALL cuda through $ sudo apt install cuda, since this will also install the latest NVIDIA driver without asking. The newest NVIDIA driver might not work with your kernel version. Through my ordeals, I figured out that only certain kernel and driver combinations work.

The safest way to install CUDA is inside a conda environment. First, install cuda and cudnn inside your conda environment. All the libraries for a conda environment are located in ~/anaconda3/envs/<env-name>/lib (assuming a default Anaconda install location). To let your environment find the CUDA libraries, that directory needs to be added to LD_LIBRARY_PATH.

Activate the environment first. Assuming the environment name is env-name, the command is:

conda activate env-name

Then run the following commands.

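# Runs on every "conda activate": save the current value, then append the env's lib directory.
# The ${VAR:+...} form avoids a leading ":" when LD_LIBRARY_PATH was empty.
mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"
echo 'export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}' > "$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh"
echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CONDA_PREFIX}/lib/' >> "$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh"

# Runs on every "conda deactivate": restore the saved value and clean up.
mkdir -p "$CONDA_PREFIX/etc/conda/deactivate.d"
echo 'export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}' > "$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh"
echo 'unset OLD_LD_LIBRARY_PATH' >> "$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh"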

Deactivate your environment and activate it again so the new scripts take effect. Then check whether TensorFlow can see your GPUs.

python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'; import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"

You will see Num GPUs Available: #, where # is the number of GPUs TensorFlow detected.

If # is anything other than 0, you are all set.
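
If the count is 0, it is worth confirming first that the activation scripts actually took effect. The following is only a minimal sketch and not part of the setup itself: path_contains is a hypothetical helper name, and it simply checks whether a directory appears as one of the colon-separated entries of a path list, with or without a trailing slash.

```shell
# Hypothetical helper: succeed if directory $2 is one of the
# colon-separated entries in path list $1 (trailing "/" is ignored).
path_contains() {
  case ":$1:" in
    *":${2%/}:"* | *":${2%/}/:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run this inside the re-activated environment.
if path_contains "${LD_LIBRARY_PATH:-}" "${CONDA_PREFIX:-}/lib"; then
  echo "conda lib directory is on LD_LIBRARY_PATH"
else
  echo "conda lib directory is missing from LD_LIBRARY_PATH"
fi
```

If the directory is reported missing, the env_vars.sh files were probably written while a different environment was active.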

Ubuntu Installation on Alienware x15 R2

Over the last few days, I spent hours and hours getting the NVIDIA GPU (GeForce RTX 3070 Ti 8GB) to work on my Alienware x15 R2.

I would like to share the lessons that I learned from the ordeal.

The most important point is: NEVER EVER INSTALL the cuda meta package via apt install. If you do, it will replace your current NVIDIA driver with the latest release, which may not work with your current kernel.
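
One way to see this coming is to let apt simulate the install first. The sketch below assumes the usual apt-get simulation output, where planned installs and removals are printed as lines starting with Inst and Remv. nvidia_changes is a hypothetical helper name of mine, not something that ships with apt.

```shell
# Hypothetical helper: read simulated apt-get output on stdin and keep
# only the lines that would install or remove an NVIDIA package.
nvidia_changes() {
  grep -E '^(Inst|Remv) [^ ]*nvidia'
}

# Usage (nothing is installed, --dry-run only prints the plan):
#   apt-get install --dry-run cuda | nvidia_changes
```

If a driver package other than the one you are currently running shows up in that list, the real install would swap your driver.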

Here are the combinations that did and did not work, for your reference.

TL;DR

  • Install Ubuntu 20.04 LTS.
  • Update software. Make sure your kernel is 5.15.0-53.
  • Upgrade the OS to Ubuntu 22.04 LTS. This upgrade installed NVIDIA driver 515 by default.
  • Install the Ubuntu driver for Killer 1690/1675/1650 Wi-Fi.
    • $ sudo apt install backport-iwlwifi-dkms

Ubuntu 20.04 LTS

  • kernel version: 5.8.0-43
  • NVIDIA driver: 471

This fresh install works with the GPU, but Wi-Fi, speaker, and microphone do not work. After installing one of the recent Linux firmware versions from http://mirrors.edge.kernel.org/ubuntu/pool/main/l/linux-firmware/, Wi-Fi and speaker work.

When I update the software, which includes a kernel upgrade, the kernel version becomes:

  • kernel version: 5.15.0-53

This kernel does not work with NVIDIA driver 471.

When I install the latest kernel (6.0.9 as of 11/22/2022), Wi-Fi and speaker work, but there is still no luck with the GPU.
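
Because these results hinge on the exact kernel build, it helps to confirm which kernel is running before touching the driver. This is a minimal sketch, assuming the release string printed by uname -r has the form 5.15.0-53-generic. kernel_matches is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if release string $2 (default: the
# running kernel from `uname -r`) is exactly version $1, or is $1
# followed by a flavor suffix such as "-generic".
kernel_matches() {
  release="${2:-$(uname -r)}"
  case "$release" in
    "$1" | "$1"-*) return 0 ;;
    *) return 1 ;;
  esac
}

if kernel_matches 5.15.0-53; then
  echo "kernel is 5.15.0-53"
else
  echo "kernel is $(uname -r), not 5.15.0-53"
fi
```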

Ubuntu 22.04 LTS

Here is what I did to make Wi-Fi, speaker, mic, and even suspend work.

  • Install Ubuntu 20.04 LTS. (I couldn’t install 22.04 LTS directly since, for some reason, my Alienware wouldn’t let me install it from a bootable thumb drive.)
  • Update software. This upgraded my kernel from 5.8.0-43 to 5.15.0-53.
  • Upgrade the OS to Ubuntu 22.04 LTS. (When I upgraded from Ubuntu 20.04, the kernel version was 5.15.0-53. I haven’t tried upgrading to 22.04 LTS from the original kernel.)
    • This upgrade installed NVIDIA driver 515 by default.
  • With the upgraded Ubuntu 22.04 LTS (kernel 5.15.0-53) and NVIDIA driver 515, the GPU, speaker, and microphone work, but Wi-Fi does not.
  • The Alienware x15 R2 specifications show that the machine uses a Killer Wi-Fi AX1675.
  • Install the Ubuntu driver for Killer 1690/1675/1650 Wi-Fi.
    • $ sudo apt install backport-iwlwifi-dkms
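
To confirm the final combination, the driver series can be read back from nvidia-smi and printed next to the kernel release. This is a minimal sketch: driver_major is a hypothetical helper name, and the snippet assumes nvidia-smi reports a dotted version string when asked for driver_version.

```shell
# Hypothetical helper: print the major series of a driver version
# string, i.e. the part before the first dot.
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  echo "kernel $(uname -r), NVIDIA driver series $(driver_major "$version")"
fi
```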