This guide illustrates how to use Apptainer (formerly Singularity) containers to run a job on a GPU compute node on Spiedie.
Be Aware: as of 2025, access to GPU nodes is managed carefully to prevent misuse. Please contact Spiedie’s administrator, Phil Valenta, for access to GPU nodes.
If you'd like to run the same job using Conda instead of Apptainer, click here!
Click here for more information on when to use Conda vs. Apptainer vs. Modules
Things covered in this guide:
- Creating or pulling Apptainer containers
- Binding module directories to container environment
- Running a GPU-enabled workload
Requirements to complete this guide:
- Familiarity with Spiedie (try the quick start if you haven’t)
- Local installation of Apptainer
- Familiarity with shell commands and python
Creating or Pulling Apptainer Containers
For this tutorial, we will be using a custom Singularity Image Format (SIF) file to run GPU-enabled TensorFlow on a GPU compute node.
We have a few options for creating our container image: we can either build the image on a local machine and transfer it to Spiedie, or generate the image remotely on the Singularity Hub container repository.
Creating Apptainer Containers
If you would like to create the image locally first, you must install Apptainer on a machine on which you have root access.
We will use the verified TensorFlow-GPU recipe available on the Apptainer Recipe Hub.
Once you have downloaded the recipe, build the image by simply running:
sudo apptainer build spiedie_tf_gpu.simg spiedie_tensorflow_gpu.def
on your local machine.
** Note: This may take a few minutes depending on the speed of your machine, and the resulting .simg file may be larger than 3 GB. **
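If the recipe link is unavailable, a minimal definition file along these lines would produce a comparable image. The base image tag here is an assumption, chosen to match the TensorFlow 1.x output shown at the end of this guide; the verified Spiedie recipe may pin different versions:

Bootstrap: docker
From: tensorflow/tensorflow:1.14.0-gpu-py3

%post
    # Hypothetical extras; the verified recipe may install more packages
    pip3 install --upgrade pip

%runscript
    # Pass any arguments straight through, e.g. "python3.6 tf_gpu.py"
    exec "$@"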
Once the container image is created, we will need to transfer the image to Spiedie.
Log in to Spiedie and create a new directory for the project:
mkdir GPU_Compute_Example
Transfer the image using scp from your local machine:
scp spiedie_tf_gpu.simg username@spiedie.binghamton.edu:./GPU_Compute_Example/
For more data transfer instructions, click here
We will be running a simple 5000-element dot product on the P100 GPUs and logging the device placement. You can download the source code here.
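If the download link is unavailable, the script is small enough to sketch. Something along these lines reproduces the device-placement logging shown at the end of this guide (the exact constants and tensor names are assumptions inferred from that output):

import tensorflow as tf

# Two 5000-element vectors: 'a' is a row vector and 'b' a column vector,
# so matmul(a, b) computes their dot product as a 1x1 matrix
a = tf.constant([1.0] * 5000, shape=[1, 5000], name='a')
b = tf.constant([2.0] * 5000, shape=[5000, 1], name='b')
c = tf.matmul(a, b, name='MatMul')

# log_device_placement=True makes TensorFlow 1.x report which device
# (CPU or GPU) each operation is assigned to
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))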
Download the source code and transfer it to the project directory from your local machine with:
scp tf_gpu.py username@spiedie.binghamton.edu:./GPU_Compute_Example/
Once the source code is uploaded, we can write the batch script to submit our job request.
Create a new file in the same directory on Spiedie called apptainer_gpu_test.sh:
touch apptainer_gpu_test.sh
Using your preferred editor such as nano, emacs, or vim, edit the new file directly on Spiedie.
You can also locally write the script and transfer it once you are done.
The first line of the batch script must be the shebang, so we must have:
#!/bin/bash
Next, we will name our job so we can monitor it in the Slurm queue if we wish. To assign a job name, add:
#SBATCH --job-name=APPTAINER_GPU_TEST
This will name the job APPTAINER_GPU_TEST.
Next, we will assign an output file to log all the standard output from our program:
#SBATCH --output=tf_gpu_output.log
This will direct the output of the program to the tf_gpu_output.log file.
Next, we must request the correct partition for our program to properly run and have access to the P100 GPUs available on Spiedie. We therefore request the HarpurGPU partition with:
#SBATCH --partition=HarpurGPU
We can use the default number of nodes (1) and default memory for this program.
Finally, we should also let SLURM know how many tasks we will require for our program. Since we will not be using any parallel CPU computation, we will only request one.
#SBATCH --ntasks=1
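Depending on how Slurm is configured, GPU jobs sometimes also need an explicit device request such as #SBATCH --gres=gpu:1. This guide assumes the HarpurGPU partition grants GPU access without it; if your job cannot see a GPU, check with the Spiedie administrator.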
We’ve finished defining our resource allocation parameters for our job.
Binding Module Directories
Apptainer is already available in Spiedie’s default configuration, so no additional module needs to be loaded.
We do need the CUDA drivers available on Spiedie to successfully access the GPUs, so we must load the CUDA toolkit and associated driver with:
module load cuda/11.8
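You can confirm the module loaded correctly with:

module list

which should list cuda/11.8 among the currently loaded modules.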
The loaded libraries will not be visible inside the container automatically, so before running it we must expose the module directory to the container.
We can make additional directories visible inside the container by simply binding them. One way to add paths is to update the APPTAINER_BINDPATH environment variable:
We can also bind paths per invocation using the -B/--bind flag (see the example after the export below); click here for more details
export APPTAINER_BINDPATH="/opt/ohpc/pub/apps/cuda/11.8/lib64/"
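For example, the equivalent one-off bind at run time, using the same path passed directly on the command line, would be:

apptainer run -B /opt/ohpc/pub/apps/cuda/11.8/lib64 spiedie_tf_gpu.simg python3.6 tf_gpu.py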
Running a GPU-enabled Workload
Finally, our environment is set up and we can run our test file. Add the line:
apptainer run spiedie_tf_gpu.simg python3.6 tf_gpu.py
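As an aside, recent Apptainer releases can also bind the host's NVIDIA driver libraries automatically with the --nv flag (apptainer run --nv ...); this guide keeps the explicit bind path above so the container uses the CUDA module we loaded.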
The complete apptainer_gpu_test.sh file:
#!/bin/bash
#SBATCH --job-name=APPTAINER_GPU_TEST
#SBATCH --output=tf_gpu_output.log
#
#SBATCH --partition=HarpurGPU
#SBATCH --ntasks=1
export APPTAINER_BINDPATH="/opt/ohpc/pub/apps/cuda/11.8/lib64/"
apptainer run spiedie_tf_gpu.simg python3.6 tf_gpu.py
Click here to download the complete batch file.
We can now queue the job to SLURM with:
sbatch apptainer_gpu_test.sh
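While the job is pending or running, you can check its status (by the job name we set earlier) with:

squeue -u $USER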
The job should be queued and the output logged in tf_gpu_output.log.
Once the job completes, you can view the log with:
cat tf_gpu_output.log
You should see the following output:
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla P100-PCIE-12GB, pci bus id: 0000:83:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
2019-07-15 13:42:49.767029: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla P100-PCIE-12GB, pci bus id: 0000:83:00.0, compute capability: 6.0
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-15 13:42:49.768241: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-15 13:42:49.768275: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-07-15 13:42:49.768300: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0