Docker

Docker is a great tool, but you have to understand it first. Otherwise, you may get into a lot of trouble!

Generally speaking, Docker is a virtualisation tool. For you it means the possibility to train with different versions of packages, in a clean environment, and with you in charge of everything!

Image vs Container

In the beginning, you may get confused about which is which and what each of them is for.

Image

An image is like a clean installation of your OS. There may be some other packages and tools installed, but it is the starting point for your container. You can also think of it as a template.

Let's look at an example... Imagine you have a clean installation of Ubuntu LTS 18.04. You want to train a new deep super-resolution GAN. What do you need?

  • CUDA
  • Python (probably 3.x)
  • tensorflow-gpu

What would you do? I guess you would use apt-get install -y python3 to install Python. Then you would open the NVIDIA website to find the install guide for CUDA. After that, use pip install tensorflow-gpu to install tensorflow-gpu. It is a long way. A few months later you need another version of CUDA, for instance a downgrade to 9.0. Install CUDA again? Pain! Let's meet Docker!
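
(Just to spell out the pain: the manual route looks roughly like this. This is only a sketch, assuming Ubuntu 18.04 with sudo access; the CUDA part still means following NVIDIA's install guide by hand.)

sudo apt-get update
sudo apt-get install -y python3 python3-pip   ### Python 3.x
### ...install the CUDA toolkit following the guide from the NVIDIA website...
pip3 install tensorflow-gpu                   ### TensorFlow with GPU support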

With Docker, you can use other people's work as a starting point! There are two ways to do it: one is to use the official tensorflow-gpu image, and the other is to build your own.
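
If you go the first way, you do not even need to build anything, just pull the official TensorFlow GPU image from Docker Hub and run it (the tag below is only an example, check the tensorflow/tensorflow page on Docker Hub for the current ones):

docker pull tensorflow/tensorflow:latest-gpu
docker run --gpus all --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.__version__)"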

build

Create a Dockerfile in an empty directory.

# see https://hub.docker.com/r/nvidia/cuda/ for available CUDA base images
FROM nvidia/cuda:10.1-base-ubuntu18.04

# the base image has no Python, so install pip first (and let "python" point to python3)
RUN apt-get update && apt-get install -y python3-pip && ln -s /usr/bin/python3 /usr/bin/python
RUN pip3 install tensorflow-gpu

Now you need to build your image. It is super easy! Just run the docker build command with a few parameters.

docker build -t dobromi1:cuda-tensorflow .

Let's explain it.

docker build \                        ### base command
    -t dobromi1:cuda-tensorflow \     ### your own imagename:tag
    .                                 ### build directory (context)

That's all! For more options see the official build documentation.
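
If you want to check that the image really exists, list your local images (optional, just a quick sanity check):

docker images dobromi1            ### shows local images named dobromi1 with their tags and sizes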

Container

For now, no matter which image you decide to use, you can run your first container. Running a container means starting your virtual machine. Let's do it!

docker run -u $(id -u):$(id -g) --gpus all --rm nvidia/cuda nvidia-smi

This command will start a container with CUDA installed and show you information about GPU usage, as you know it from Jupyter examples.
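
You can run the same check with the image you built above (assuming you built dobromi1:cuda-tensorflow as described; nvidia-smi is injected into the container by the NVIDIA runtime):

docker run -u $(id -u):$(id -g) --gpus all --rm dobromi1:cuda-tensorflow nvidia-smi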

Let’s start some training!

docker run \
    -u $(id -u):$(id -g) \
    --gpus all \
    -it \
    -v /home/dobromi1/Experiments/device-list/:/workingdir \
    -w /workingdir \
    dobromi1:cuda-tensorflow \
    python ./tf_device_list.py

PERSISTENCE

Always use Docker with persistent storage (the -v parameter). If you store anything in a docker container in a folder that is not mounted, you will LOSE it.
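
A quick way to see the difference (a sketch, assuming the dobromi1:cuda-tensorflow image from above and a writable /home/dobromi1/Experiments/out/ directory on the host):

docker run --rm -v /home/dobromi1/Experiments/out/:/out dobromi1:cuda-tensorflow \
    bash -c "echo kept > /out/result.txt"     ### result.txt stays on the host in Experiments/out/
docker run --rm dobromi1:cuda-tensorflow \
    bash -c "echo lost > /tmp/result.txt"     ### gone as soon as the container exits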

Below are some useful tips for your training.

run

You may get confused by the many parameters used in the run command. They are explained below.

parameters

docker run                                  ### base command
    -u $(id -u):$(id -g)                    ### map your user and group IDs into the container
    --gpus all                              ### make GPUs visible inside the container
    --rm                                    ### remove the container after training (execution) is done, use it for cleaning space!
    -v /path/on_alisa/:/path/in/container   ### map a directory or file from alisa to a path inside the container
    -w /wd                                  ### working dir of the container
    --name dobromi1-class-tumor-rmi         ### container name (must come before the image name)
    dobromi1:cuda-tensorflow                ### image:tag
    python -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()" ### command to run inside the container
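
The listing above shows one parameter per line just for explanation. A copy-pasteable form of the same call (with line continuations) looks like this:

docker run \
    -u $(id -u):$(id -g) \
    --gpus all \
    --rm \
    -v /path/on_alisa/:/path/in/container \
    -w /wd \
    --name dobromi1-class-tumor-rmi \
    dobromi1:cuda-tensorflow \
    python -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"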

interactive mode

docker run                        ### base command
    -u $(id -u):$(id -g)          ### map your user and group IDs into the container
    --gpus all                    ### make GPUs visible inside the container
    --rm                          ### remove the container after execution is done, use it for cleaning space!
    -it                           ### -i keeps STDIN open, -t allocates a tty
    dobromi1:cuda-tensorflow      ### image:tag
    bash                          ### run a bash console inside the container
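
Once you are inside the container's bash prompt, you can poke around as usual, for example (assuming the dobromi1:cuda-tensorflow image built above):

nvidia-smi                                                   ### check that the GPUs are visible
python -c "import tensorflow as tf; print(tf.__version__)"   ### check that tensorflow imports
exit                                                         ### leave the container (it is removed because of --rm)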

log

If you are running a container in the background (with the -d parameter) and you need to see its console output, there is an easy way with docker logs.

docker logs -f container_name
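
For example, you can start a long training in the background and then follow its output (a sketch reusing the training command from above; the container name is only an example):

docker run -d --name sr-gan-training \
    -u $(id -u):$(id -g) --gpus all \
    -v /home/dobromi1/Experiments/device-list/:/workingdir -w /workingdir \
    dobromi1:cuda-tensorflow python ./tf_device_list.py

docker logs -f sr-gan-training     ### Ctrl+C stops following, the container keeps running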

connecting to a running container

Sometimes you want to connect to a running container and execute some command.

docker exec -it container_name_or_id bash
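
You can also execute a single command without opening a shell, for example to check GPU usage of a running training:

docker exec container_name_or_id nvidia-smi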