This tutorial is designed to get you up and running on Spiedie as quickly as possible. You should be able to run simple programs on the Spiedie cluster by the end of this guide.
Things covered in this guide:
- Logging in to the Spiedie cluster
- Transferring files to the cluster
- Running a program interactively with srun
- Checking job status and viewing output
Requirements to complete the guide:
- A Spiedie user account and password
- A computer connected to the internet
- SSL VPN (Pulse) (if not connected to the school internet)
- Familiarity with programming and basic command-line experience
Log in
After acquiring your username and password, you should be able to log in to the Spiedie cluster following the steps listed here.
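If you are connecting from a terminal over SSH, the login step typically looks like the following (the hostname matches the one used for file transfers later in this guide; replace username with your own):
ssh username@spiedie.binghamton.edu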
You will be logged in at your home directory. You can load modules and quickly prototype the code you wish to run here. You should not run any lengthy programs on the login node, as doing so may cause disruptions for other users. You should only run programs using srun and sbatch, which are explained in these tutorials.
Once logged in, you can create a new directory by running
mkdir quick_start
This will create a new directory called quick_start in your home directory. You can quickly verify this by typing ls.
Transfer files to the cluster
Before we go further, download the python script we will be running. There are various ways to transfer data to and from the cluster.
In this example we will be using SCP to transfer the data from our local machine to the quick_start directory located on the Spiedie server.
Exit the Spiedie cluster or open a new terminal on your local machine, cd into the directory containing the downloaded python script, and run
scp quick_start.py username@spiedie.binghamton.edu:quick_start/
Replace username with your username and enter your password when prompted. This should place quick_start.py in the quick_start directory on Spiedie.
To verify the transfer, go back to your logged-in Spiedie session and run
[spiedie81 ~]$ cd quick_start/
[spiedie81 quick_start]$ ls
quick_start.py
You should see the python file listed on your screen.
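Later, the same approach works in reverse for copying results back. For example, from your local machine (assuming the log file name used later in this guide):
scp username@spiedie.binghamton.edu:quick_start/quick_start.log .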
Run the program
Since this is a small prototype program that we expect to finish quickly, we will run it interactively using srun. For larger programs, we usually need to write a batch script and submit it with sbatch. Running more complex jobs with srun and sbatch is covered here and here; a minimal batch script is sketched below for reference.
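A batch script is a short shell script with #SBATCH directives that sbatch hands to the scheduler. A minimal sketch (the job name and output file names here are illustrative, not required):
#!/bin/bash
#SBATCH --job-name=quick_start          # illustrative job name
#SBATCH --partition=quick               # the partition used later in this guide
#SBATCH --output=quick_start.log        # file for standard output
#SBATCH --error=quick_start_error.log   # file for standard error

python3 quick_start.py
You would submit it with sbatch followed by the script name; we will not need a batch script for this tutorial.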
Go back to your logged-in Spiedie session and make sure you are in the directory containing quick_start.py.
We can take a quick look at the python file using the cat command.
[watson@spiedie81 quick_start]$ cat quick_start.py
import random
import time

def rand(start, end, length):
    # build a list of `length` random integers between start and end
    ret = []
    for i in range(length):
        ret.append(random.randint(start, end))
    return ret

def main():
    start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
    for i in range(100):
        unsorted_list = rand(0, 100000, 1000)
        sorted_list = sorted(unsorted_list)  # sorted() returns a new list; list.sort() returns None
    end_time = time.perf_counter()
    print("Welcome to Spiedie!")
    print("You are running the quick start python tutorial.")
    print("Sorting done")
    print("I ran for " + str(end_time - start_time) + " s")

main()
[watson@spiedie81 quick_start]$
A full breakdown of this script is outside the scope of this documentation, but the gist is that it sorts an array of 1000 elements 100 times, measures the time necessary to complete the task, and prints a message to the specified output.
We will run quick_start.py interactively on Spiedie using the srun command. srun submits our job to the SLURM queue for allocation, and the program's output is printed to the terminal.
You should familiarize yourself with the different partitions and compute capabilities of Spiedie, as some partitions may be better suited to a given task than others.
You can get a quick overview of the cluster by running the sinfo command.
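The exact partitions and node names will differ, but the default sinfo output looks something like this (illustrative values only):
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
quick        up      10:00      4   idle spiedie[01-04]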
We'll be using the quick partition, as it is intended mostly for rapid prototyping. You'll notice quick has a time limit of 10 minutes, so jobs are automatically cleared after 10 minutes; this ensures we don't have to wait too long for an allocation. Before we ask for an allocation on the cluster, we can check how busy the system is by running squeue.
This will list all jobs currently running and waiting to be allocated. You can learn more about how SLURM priorities work here.
Let’s run the quick_start.py program. Run
srun --partition=quick python3 quick_start.py 1>quick_start.log 2>quick_start_error.log &
This will send your job to the SLURM daemon to be allocated and then run on a quick partition node. We have chosen to use the default parameters for srun, such as the number of nodes (1), the number of tasks (1), and the number of CPUs (1).
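If you prefer to spell those defaults out, the same request can be made with explicit flags (a sketch using standard srun options):
srun --partition=quick --nodes=1 --ntasks=1 --cpus-per-task=1 python3 quick_start.py 1>quick_start.log 2>quick_start_error.log &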
It isn't necessary to include anything after the python file name. We included 1>quick_start.log 2>quick_start_error.log here for convenience, as it redirects the program's standard output and standard error to quick_start.log and quick_start_error.log, respectively.
The & at the end of the command runs the process in the background, allowing us to continue using the terminal.
You can see a full explanation of the shell command here.
Checking job status
Since we retained control of the terminal, we can check the allocation status of the job by running
squeue -u username
Replace username with your username, and you should see your job listed as either pending or running, along with the node it has been allocated to.
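The filtered output looks something like this (illustrative values; the ST column shows PD while a job is pending and R once it is running):
JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
12345     quick  python3   watson  R  0:04     1 spiedie81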
You can also view accounting information for your jobs using
sacct
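sacct accepts a --format option to choose which columns are shown; for example (standard sacct fields):
sacct --format=JobID,JobName,Partition,State,Elapsed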
Click here for a list of other basic SLURM commands.
Once the program has finished running, you can view the output by running
cat quick_start.log
And check for any error messages using
cat quick_start_error.log