Slurm is a workload manager which is used in a cluster server. It has their own command to run jobs.
https://slurm.schedmd.com/documentation.html
sinfo
$ sinfo -lN
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
g-11 1 all* mixed 6 1:6:1 120000 0 1 (null) none
g-12 1 all* mixed 6 1:6:1 120000 0 1 (null) none
srun
$ for i in {30..50};do srun -c2 --mem=5000 -J cal_${i} ldtable.py -th 0.01 -rh 101,100 -n ${i} --approx -c 2 & done
srun
: send a command to queue.-c2
: number of cpus required per task--mem=5000
: minimum amount of real memory (--mem=MB)-J cal_${i}
: task name on the list of queue
Note that if you do not assign --mem
, only one task will be allocated to one node even the node has available cpus.
squeue
$ squeue -S LIST -l
Wed Aug 5 16:18:30 2020
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
6632 all java user1 RUNNING 38:42 UNLIMITED 1 g-11
6634 all java user1 RUNNING 6:40 UNLIMITED 1 g-12
6635 all java user1 PENDING 0:00 UNLIMITED 1 (Resources)
6636 all java user1 PENDING 0:00 UNLIMITED 1 (Priority)
6637 all java user1 PENDING 0:00 UNLIMITED 1 (Priority)
scancel
$ scancel -uycho
$ scancel --help
Usage: scancel [OPTIONS] [job_id[_array_id][.step_id]]
-A, --account=account act only on jobs charging this account
-b, --batch signal batch shell for specified job
-f, --full signal batch shell and all steps for specified job
-H, --hurry avoid burst buffer stage out
-i, --interactive require response from user for each job
-M, --clusters clusters to issue commands to.
NOTE: SlurmDBD must be up.
-n, --name=job_name act only on jobs with this name
-p, --partition=partition act only on jobs in this partition
-Q, --quiet disable warnings
-q, --qos=qos act only on jobs with this quality of service
-R, --reservation=reservation act only on jobs with this reservation
--sibling=cluster_name remove an active sibling job from a federated job
-s, --signal=name | integer signal to send to job, default is SIGKILL
-t, --state=states act only on jobs in this state. Valid job
states are PENDING, RUNNING and SUSPENDED
-u, --user=user_name act only on jobs of this user
-V, --version output version information and exit
-v, --verbose verbosity level
-w, --nodelist act only on jobs on these nodes
--wckey=wckey act only on jobs with this workload
charactization key
Help options:
--help show this help message
--usage display brief usage message