Install TensorFlow with CUDA, cuDNN in Anaconda Environment

Introduction

Last update: 11/11/2023 – TensorFlow 2.14

Notice:

Prerequisites

NVIDIA driver (a.k.a. the graphics card driver): 535 is the latest version at the time of writing.

sudo apt install nvidia-driver-535

Then reboot your system and check that the driver is properly installed with the following command.

nvidia-smi
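If you prefer to check the driver from a script, here is a minimal sketch that asks nvidia-smi for the driver version only. The function names are made up for illustration, and the script assumes nvidia-smi is on your PATH; it returns an empty list otherwise.

```python
import shutil
import subprocess


def parse_driver_versions(smi_output: str) -> list[str]:
    """Return one driver version string per non-empty output line."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def query_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version of each GPU; [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_driver_versions(result.stdout)


if __name__ == "__main__":
    versions = query_driver_versions()
    print(versions if versions else "nvidia-smi not found or no driver loaded")
```

An empty result after the reboot usually means the driver did not load, so it is worth re-checking the installation before moving on.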

Install CUDA (a.k.a. CUDA Toolkit) and cuDNN

The commands below assume your Anaconda environment is named jaerock. Python 3.10 is chosen to be safe, since TensorFlow 2.14 requires Python 3.9 - 3.11. Note also that cuDNN 8.6 is recommended according to the TensorFlow 2.14 software requirements, but no conda package of cuDNN 8.6 was found in the Anaconda default channel, so 8.9 is used instead.

conda create --name jaerock python=3.10
conda activate jaerock
conda install cudatoolkit=11.8 cudnn=8.9
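To double-check that the new environment's Python falls inside the range mentioned above, a small sketch like the following can be run inside the environment. The 3.9 - 3.11 bounds are copied from the text; the function name is an assumption of mine.

```python
import sys

# Supported range taken from the text above (TensorFlow 2.14: Python 3.9 - 3.11).
MIN_PY = (3, 9)
MAX_PY = (3, 11)


def python_supported(version: tuple[int, int]) -> bool:
    """True if (major, minor) lies within the supported range, inclusive."""
    return MIN_PY <= version <= MAX_PY


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if python_supported(current) else "NOT supported"
    print(f"Python {current[0]}.{current[1]} is {status} for TensorFlow 2.14")
```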

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
echo 'export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}' >  $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh
echo 'unset OLD_LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh

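Deactivate the conda environment, or just close the current terminal and re-open it, so that the new activation script takes effect. Then activate the environment again and install TensorFlow.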

conda activate jaerock
pip install "tensorflow[and-cuda]"
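Once the environment has been re-activated, it can be worth confirming that the activation script really put $CONDA_PREFIX/lib on LD_LIBRARY_PATH. The following is a minimal sketch; the helper name conda_lib_on_path is hypothetical.

```python
import os


def conda_lib_on_path(ld_library_path: str, conda_prefix: str) -> bool:
    """True if <conda_prefix>/lib appears as an entry of ld_library_path."""
    target = os.path.normpath(os.path.join(conda_prefix, "lib"))
    # LD_LIBRARY_PATH is colon-separated; skip empty entries such as a leading ':'.
    entries = [entry for entry in ld_library_path.split(":") if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX", "")
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    if not prefix:
        print("No conda environment is active.")
    else:
        print("conda lib on LD_LIBRARY_PATH:", conda_lib_on_path(ld_path, prefix))
```

If this prints False, the activate.d script was probably not picked up, and re-opening the terminal before activating again is the first thing to try.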

After installing, you can test whether TensorFlow can see your GPUs.

python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'; import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"

You will see Num GPUs Available: #, where # is the number of GPUs that TensorFlow detected.

If # is other than 0, you are all set.
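The one-liner above is handy but hard to read. Here is the same check as a small script, as a sketch only; the gpu_count helper is my own naming, and it returns None when TensorFlow is not installed instead of crashing.

```python
import os

# Silence TensorFlow's C++ log output; this must be set before importing tensorflow.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"


def gpu_count():
    """Return the number of GPUs TensorFlow can see, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    count = gpu_count()
    if count is None:
        print("TensorFlow is not installed in this environment.")
    else:
        print("Num GPUs Available:", count)
```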


Gazebo9 Model Server Error

Problem

The version of the default Gazebo9 with ROS Melodic on Ubuntu 18.04 is 9.0.0. This version has an issue connecting to the API server for Gazebo models. The error messages below indicate that Gazebo cannot connect to the server.

[Err] [REST.cc:205] Error in REST request

libcurl: (51) SSL: no alternative certificate subject name matches target host name 'api.ignitionfuel.org'

Solution

Gazebo Install

NOTE: Some people say that the server name inside ~/.ignition/fuel/config.yaml must be changed, but this is not true. You don’t need to change this file; just keep the original.

Upgrade your Gazebo 9.0.0 to the latest Gazebo 9. At the time of writing, 9.19.0 is the latest version of Gazebo 9.

sudo apt update
sudo apt install gazebo9

Check your Gazebo version.

gazebo --verbose
Gazebo multi-robot simulator, version 9.19.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

Gazebo multi-robot simulator, version 9.19.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

[Msg] Waiting for master.
[Msg] Waiting for master.
[Msg] Connected to gazebo master @ http://127.0.0.1:11345
[Msg] Connected to gazebo master @ http://127.0.0.1:11345
[Msg] Publicized address: 10.0.2.15
[Msg] Publicized address: 10.0.2.15
[Msg] Loading world file [/usr/share/gazebo-9/worlds/empty.world]
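If you would rather not start the simulator just to read the banner, the version can be pulled out with a short script. This is a sketch under assumptions: it calls gazebo --version (which I expect to print the same banner line) and only flags 9.0.0, the one version the text above reports as broken. The function names are made up for illustration.

```python
import re
import shutil
import subprocess

BROKEN_VERSION = (9, 0, 0)  # the version reported above as failing to reach the model server


def parse_gazebo_version(text: str):
    """Extract (major, minor, patch) from a Gazebo banner, or None if absent."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def needs_upgrade(version) -> bool:
    """True only for the exact version the article identifies as problematic."""
    return version == BROKEN_VERSION


if __name__ == "__main__":
    if shutil.which("gazebo") is None:
        print("gazebo is not installed or not on PATH.")
    else:
        result = subprocess.run(
            ["gazebo", "--version"], capture_output=True, text=True, check=False
        )
        version = parse_gazebo_version(result.stdout + result.stderr)
        print("Detected version:", version, "| needs upgrade:", needs_upgrade(version))
```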

If you still see 9.0.0, please follow the steps below.

  • Set up your computer to accept software from packages.osrfoundation.org.
sudo sh -c 'echo "deb http://packages.osrfoundation.org/gazebo/ubuntu-stable `lsb_release -cs` main" > /etc/apt/sources.list.d/gazebo-stable.list'
  • Set up keys
wget https://packages.osrfoundation.org/gazebo.key -O - | sudo apt-key add -
  • Install Gazebo9
sudo apt update
sudo apt install gazebo9
sudo apt install libgazebo9-dev

Upgrade libignition-math2

After upgrading Gazebo 9.0.0 to 9.19.0, you may see error messages when you start Gazebo. If so, upgrade libignition-math2.

  • Upgrade libignition-math2
sudo apt upgrade libignition-math2

VirtualBox sudo error

When you use sudo, you may see the following error message.

user-name is not in the sudoers file. This incident will be reported.

The default user is not in the sudo group. Let’s assume that the user name is jaerock. We need to add jaerock to the sudo group with the usermod command, which only the super user can run. So switch to the super user with su; when asked for a password, enter your user password. Then use usermod to add the sudo group to your user account, and type exit to return to your own account. Log out and back in for the new group membership to take effect.

su

usermod -a -G sudo jaerock

exit
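To confirm that the change was recorded, you can look the user up in the group database. Below is a minimal sketch using Python's standard grp module; the in_group helper is hypothetical, and it only checks supplementary group membership, which is what usermod -a -G changes.

```python
import getpass
import grp


def in_group(user: str, group: str = "sudo") -> bool:
    """True if user is listed as a member of group in the group database."""
    try:
        members = grp.getgrnam(group).gr_mem
    except KeyError:
        # The group does not exist on this system.
        return False
    return user in members


if __name__ == "__main__":
    user = getpass.getuser()
    print(f"{user} in sudo group:", in_group(user))
```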

CUDA and cuDNN inside a Conda Env

DO NOT INSTALL cuda through $ sudo apt install cuda, since this will also install the latest NVIDIA driver without asking. The newest NVIDIA driver might not work with a particular kernel version. Through my ordeals, I figured out that only some particular combinations work.

The safest way to install CUDA is to use a conda environment. First, install CUDA and cuDNN inside your conda environment. All the conda-related libraries are located in ~/anaconda3/envs/<env-name>/lib. To let your environment know the location of the CUDA libraries, add this directory to LD_LIBRARY_PATH.

Activate the environment first. Assuming the environment name is env-name, the command is as follows.

conda activate env-name

Then run the following commands.

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
echo 'export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}' >  $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh
echo 'unset OLD_LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh

Deactivate your environment and activate it again. Check if your TensorFlow properly works with GPUs.

python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'; import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"

You will see Num GPUs Available: #

If # is other than 0, you are all set.


Ubuntu Installation on Alienware x15 R2

Over the last few days, I spent hours and hours making the NVIDIA GPU (GeForce RTX 3070 Ti 8GB) work on my Alienware x15 R2.

I would like to share the lessons that I learned from the ordeal.

The most important point: “NEVER EVER INSTALL the cuda meta package via apt install.” If you do, it will replace your current NVIDIA driver with the latest release, which does not work with your current kernel.

For your reference, here are the combinations that worked and those that did not.

TL;DR

  • Install Ubuntu 20.04 LTS.
  • Update software. Make sure your kernel is 5.15.0-53.
  • Upgrade the OS to Ubuntu 22.04 LTS. This upgrade got NVIDIA driver 515 installed by default.
  • Install the Ubuntu driver for Killer 1690/1675/1650 Wi-Fi.
    • $ sudo apt install backport-iwlwifi-dkms

Ubuntu 20.04 LTS

  • kernel version: 5.8.0-43
  • NVIDIA driver: 471

This fresh install works with the GPU, but Wi-Fi, speaker, and microphone do not work. After installing Linux firmware from one of the recent versions at http://mirrors.edge.kernel.org/ubuntu/pool/main/l/linux-firmware/, Wi-Fi and speaker work.

When I updated the software, which included a kernel upgrade, the kernel version became the following.

  • kernel version: 5.15.0-53

This kernel does not work with NVIDIA driver 471.

When I installed the latest kernel, 6.0.9 as of 11/22/2022, Wi-Fi and speaker worked, but I had no luck with the GPU.

Ubuntu 22.04 LTS

Here is what I did to make Wi-Fi, speaker, mic, and even suspend work.

  • Install Ubuntu 20.04 LTS. (I couldn’t install 22.04 LTS directly since, somehow, my Alienware didn’t allow me to install 22.04 LTS from a bootable thumb drive)
  • Update software. This upgraded my kernel from 5.8.0-43 to 5.15.0-53.
  • Upgrade the OS to Ubuntu 22.04 LTS. (When I upgraded my Ubuntu 20.04, the kernel version was 5.15.0-53. I haven’t tried to upgrade to 22.04 LTS from the original kernel)
    • This upgrade got NVIDIA driver 515 installed by default.
  • With this upgraded Ubuntu 22.04 LTS (kernel 5.15.0-53) and NVIDIA driver 515, the GPU, speaker, and microphone work, but Wi-Fi does not.
  • After checking the Alienware x15 R2 specifications, I found that the machine uses Killer Wi-Fi AX1675.
  • Install the Ubuntu driver for Killer 1690/1675/1650 Wi-Fi.
    • $ sudo apt install backport-iwlwifi-dkms
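The kernel and driver combinations reported in this post can be kept in a small lookup table, which makes it easy to compare against the kernel you are currently running. This is only a sketch: the table contains nothing beyond the three combinations stated above (the 6.0.9 result is left out because the driver version was not recorded), and the names REPORTED and reported_result are made up for illustration.

```python
import platform

# (kernel version prefix, NVIDIA driver branch) -> did the GPU work, per this post.
REPORTED = {
    ("5.8.0-43", "471"): True,
    ("5.15.0-53", "471"): False,
    ("5.15.0-53", "515"): True,
}


def reported_result(kernel_release: str, driver_branch: str):
    """Return True/False for a combination reported in the post, else None."""
    for (kernel, driver), worked in REPORTED.items():
        # platform.release() adds a suffix such as '-generic', so match by prefix.
        if kernel_release.startswith(kernel) and driver_branch == driver:
            return worked
    return None


if __name__ == "__main__":
    kernel = platform.release()
    for driver in ("471", "515"):
        print(f"kernel {kernel}, driver {driver}:", reported_result(kernel, driver))
```

A result of None simply means the combination was not tested here, not that it fails.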