Working on Tribble/Trouble
Trouble and Tribble are IBM Power9 machines with four V100 GPUs for use with neural networks. They share the same file system, so once you set up one, you'll be able to use either. We'll be using TensorFlow, Google's neural network library. To get TensorFlow working on Trouble/Tribble with GPU support, follow these steps:
- Log on to tribble.academy.usna.edu (or trouble.academy.usna.edu) using your academy credentials
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
conda create -n <some environment name>
conda activate <your environment name>
conda install tensorflow-gpu
- Every time you log in, you'll activate the virtual environment with
conda activate <your environment name>
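Once the environment is active, a quick check that TensorFlow can see the GPUs may save some debugging later. The sketch below is not part of the original instructions. The helper name `visible_gpus` is made up for illustration, and the call to `tf.config.list_physical_devices` assumes TensorFlow 2.1 or newer, so the version installed from the IBM channel may need a different call.

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in the currently active environment.
        return None
    # Assumption: TensorFlow 2.1+, where tf.config.list_physical_devices exists.
    # Listing devices only enumerates them; it does not reserve GPU memory.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```

An empty list would suggest TensorFlow was installed without GPU support or that no GPU is visible to it. `None` means the environment is probably not activated.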
Sharing GPUs
Neural network libraries try to be fast by allocating all of the memory on every visible GPU in advance. That's not conducive to sharing a machine! You only need one of those GPUs! So when you're about to run a job, run the command nvidia-smi to see which GPUs are currently in use, then request one of the others. The following is an example of some code that makes only the one requested GPU visible, using a command-line argument:
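A minimal sketch of that idea follows. The function name `select_gpu` and the `--gpu` flag are made up for illustration, and the code assumes the GPUs are numbered 0 through 3 as nvidia-smi lists them. It relies on the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA reads when it starts up, so it has to be set before TensorFlow is imported.

```python
import argparse
import os


def select_gpu(argv=None):
    """Parse --gpu and make only that GPU visible to CUDA-based libraries."""
    parser = argparse.ArgumentParser(description="Run a job on a single GPU.")
    parser.add_argument(
        "--gpu",
        type=int,
        required=True,
        choices=range(4),  # assumption: four GPUs, indexed 0-3
        help="index of the GPU to use, as listed by nvidia-smi",
    )
    args = parser.parse_args(argv)

    # Number the devices in PCI bus order so the index matches nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every GPU except the requested one from this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)
    return args.gpu
```

To use it, call `select_gpu()` at the very top of your script, before the line `import tensorflow`, and run the script with something like `python train.py --gpu 2` (the script name here is hypothetical). Inside the process the chosen GPU then shows up as device 0, since it is the only one visible.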