3.2.2 Slurm Cluster User Manual
3.2.2.1 Usage of the Slurm CPU Cluster
Introduction to CPU Resources
- Users of the Slurm CPU cluster are from the following groups:
Group | Application | Contact person |
---|---|---|
mbh | Black hole | Yanrong Li |
bio | Biology | Lina Zhao |
cac | Chemistry | Lina Zhao |
nano | Nanophysics | Lina Zhao |
heps | Accelerator design for HEPS | Yi Jiao / Zhe Duan |
cepcmpi | Accelerator design for CEPC | Yuan Zhang / Yiwei Wang |
alicpt | Ali experiment | Hong Li |
bldesign | Beamline studies for HEPS | Haifeng Zhao |
raq | Quantum Chemistry, Molecular dynamics | Jianhui Lan |
- Each group consumes separate resources. The following table lists the computing resources, partitions and QOS (job queue) for each group.
Partition | QOS | Account / Group | Worker nodes |
---|---|---|---|
mbh, mbh16 | regular | mbh | 16 nodes, 256 CPU cores |
cac | regular | cac | 8 nodes, 384 CPU cores |
nano | regular | nano | 7 nodes, 336 CPU cores |
biofastq | regular | bio | 12 nodes, 288 CPU cores |
heps | regular, advanced | heps | 34 nodes, 1224 CPU cores |
hepsdebug | hepsdebug | heps | 1 node, 36 CPU cores |
cepcmpi | regular | cepcmpi | 36 nodes, 1696 CPU cores |
ali | regular | alicpt | 16 nodes, 576 CPU cores |
bldesign | blregular | bldesign | 3 nodes, 108 CPU cores |
raq | regular | raq | 12 nodes, 672 CPU cores |
- Resource limits for each QOS are shown in the following table.
QOS | Max running time per job | Priority | Max submitted jobs (per user / per group) |
---|---|---|---|
regular | 60 days | low | 4000 / 8000 |
advanced | 60 days | high | - / - |
hepsdebug | 30 minutes | medium | 100 / - |
blregular | 30 days | low | 200 / 1000 |
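As an illustration of how the tables above map to job options, the lines below sketch the mandatory #SBATCH header for a heps user submitting to the advanced QOS; the values are examples taken from the tables and should be replaced with your own group's partition, account and QOS.
# Mandatory options; values are an example taken from the tables above
#SBATCH --partition=heps
#SBATCH --account=heps
#SBATCH --qos=advanced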
Step 0 AFS account application and cluster grant
Users who already have an AFS account and the cluster grant can skip this step.
For new users
- Apply for the AFS account: application web page
- The cluster grant will be done automatically
For ungranted users
- Send an email to the group administrator for user grant.
- A job submission error may be encountered if the user is not granted:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
- After being granted by the group administrator and the Slurm cluster administrator, jobs can be submitted and run in the cluster.
Step1 Get job script ready
Edit the script with your preferred editor, for example, vim.
- The sample job script can be found under the following directory.
/cvmfs/slurm.ihep.ac.cn/slurm_sample_script
Note: The sample job script file is stored in the CVMFS filesystem. CVMFS usage is shown with the following commands.
# ssh to the login nodes with your AFS account.
# Replace <AFS_user_name> with your AFS account.
$ ssh <AFS_user_name>@lxlogin.ihep.ac.cn
# Change to the sample script directory.
$ cd /cvmfs/slurm.ihep.ac.cn/slurm_sample_script
# Check the sample script.
$ ls -lht
-rw-rw-r-- 1 cvmfs cvmfs 2.2K Aug  2 10:30 slurm_sample_script_cpu.sh
- Get your own job script according to the sample job script.
# Content and comments of the sample job script.
$ cat slurm_sample_script_1.sh
#! /bin/bash

#=====================================================
#===== Modify the following options for your job =====
#=====  DON'T remove the #! /bin/bash line       =====
#=====  DON'T comment out the #SBATCH lines of   =====
#=====  partition, account and qos               =====
#=====================================================

# Specify the partition name from which resources will be allocated
#SBATCH --partition=mbh

# Specify which experiment group you belong to.
# This is for the accounting, so if you belong to many experiments,
# write the experiment which will pay for your resource consumption
#SBATCH --account=mbh

# Specify which qos (job queue) the job is submitted to.
#SBATCH --qos=regular

#=====================================================
#===== Modify the following options for your job =====
#===== The following options are not mandatory   =====
#=====================================================

# Specify your job name, e.g.: "test_job" is my job name
#SBATCH --job-name=test_job

# Specify how many cores you will need, e.g.: 16
#SBATCH --ntasks=16

# Specify how much memory per CPU is requested
# The resolution is MB, e.g.: the following line asks for 1GB per CPU
#SBATCH --mem-per-cpu=1GB

# Specify the output file path of your job
# Attention!! It is an output file name, not a directory
# You must also have write access to this file
# An output file name example is: job-%j.out, where %j will be replaced with the job id
#SBATCH --output=/path/to/your/output/file

#=================================
#===== Run your job commands =====
#=================================

#=== You can define your variables following the normal bash style
VAR1="value1"
VAR2="value2"

#=== Run your job commands in the following lines
# srun is necessary; with srun, Slurm will allocate the resources you are asking for
# You can run an executable script with srun
# modify /path/to/your/script to a real file path
srun /path/to/your/script

# Or if your program is written with MPI, you can run it with mpiexec
# First, run a simple command with srun
srun -l hostname
# Later, you can run your MPI program with mpiexec
# The output will be written under the path specified by the --output option
# modify /path/to/your/mpi_program to your real program file path
mpiexec /path/to/your/mpi_program
Some explanations of the sample job script:
- Normally, a job script consists of two parts:
  - Job parameters: lines starting with #SBATCH that specify parameter values.
  - Job workload: normally executable files with options and arguments, e.g. executable scripts, MPI programs, etc.
- The job parameters partition, account and qos are mandatory, otherwise job submission will fail.
Step2 Job submission
When your job script is ready, it's time to submit the job with the following command:
# log into lxlogin.ihep.ac.cn
$ ssh <AFS_user_name>@lxlogin.ihep.ac.cn
# Type the command sbatch to submit a job
$ sbatch slurm_sample_script_1.sh
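If the submission succeeds, sbatch prints the id of the newly created job; the output below is an illustration (the actual job id will differ):
# Example output of a successful submission
Submitted batch job 1234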
Step3 Job Query
To query a single job.
Once a job is submitted, the sbatch command returns a job id. Users can query the job with this job id. The query command is:
# Use the command sacct to check the status of a single job
# where <job_id> stands for the job id returned by the sbatch command
$ sacct -j <job_id>
To query jobs submitted by a user.
To query all jobs submitted by a user since 00:00 of the current day, type the following command:
# <AFS_user_name> should be replaced with the user name
$ sacct -u <AFS_user_name>
To query all jobs submitted by a user after a specified date, type the following command:
# <AFS_user_name> should be replaced with the user name
# --starttime specifies the query start time with the format 'YYYY-MM-DD'
$ sacct -u <AFS_user_name> --starttime='YYYY-MM-DD'
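Beyond the basic queries above, the following commands can also be helpful; the field list given to --format is only an illustrative selection of the fields sacct supports:
# Show selected fields for a user's jobs (the field list is illustrative)
$ sacct -u <AFS_user_name> --format=JobID,JobName,Partition,State,Elapsed,ExitCode
# Show only the pending and running jobs of a user
$ squeue -u <AFS_user_name>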
Step4 Job result
Once the submitted job is done, one can get the output results.
- If the output file is not specified, the default output file is saved under the working directory from which the job was submitted, and the default output file name is <job_id>.out, where <job_id> is the job id. For example, if the job id is 1234, the output file name is 1234.out.
- If the output file is specified, the output results can be found in the specified file.
- If the job workload redirects its output, please check the redirected output files to get the job results.
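As a quick check after the job finishes, the output file can be viewed directly on the login node; the file name below assumes the default naming and the example job id 1234 used above:
# View the job output (default naming, example job id 1234)
$ cat 1234.out
# Page through a long output file
$ less 1234.out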
Step5 Job cancellation
To cancel a submitted job, one can type the following command.
# Use scancel command to cancel a job, where <job_id> is the job id returned by sbatch
$ scancel <job_id>
Step6 Cluster status query
To check partition names of the Slurm cluster, or to query resource status of partitions, one can type the following command:
# Use sinfo command to query resource status
$ sinfo
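sinfo can also be restricted to a single partition; the partition name below is just one example from the resource table above:
# Query the resource status of one partition only (partition name is an example)
$ sinfo -p heps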
3.2.2.2 Usage of the Slurm GPU Cluster
Introduction to GPU Resources
- Authorized groups that can access the GPU cluster are listed in the following table.
Group | Applications | Contact person |
---|---|---|
lqcd | Lattice QCD | Ying Chen / Ming Gong |
gpupwa | Partial Wave Analysis | Beijiang Liu / Liaoyuan Dong |
junogpu | Neutrino Analysis | Wuming Luo |
mlgpu | Machine Learning apps of BESIII | Yao Zhang |
higgsgpu | GPU acceleration for CEPC software | Gang Li |
bldesign | Beamline applications for HEPS experiment | Haifeng Zhao |
ucasgpu | Machine Learning for UCAS | Xiaorui Lv |
pqcd | Perturbative QCD calculation | Zhao Li |
cmsgpu | Machine Learning apps of CMS | Huaqiao Zhang, Mingshui Chen |
neuph | Theory of Neutrino and Phenomenology | Yufeng Li |
atlasgpu | Machine Learning apps of ATLAS | Contact of ATLAS |
lhaasogpu | Machine Learning apps of LHAASO | Contact of LHAASO |
herdgpu | Machine Learning apps of HERD | Contact of HERD |
qc | Quantum Computing | Contact of CC |
- The GPU cluster is divided into several resource partitions; each partition has its own QOS (queues) and groups, as listed in the following table.
Partition | QOS | Group | Resource limitation | Num. of Nodes |
---|---|---|---|---|
lgpu | long | lqcd | QOS long: job run time <= 30 days; submitted jobs (running + queued) <= 64; memory requested per CPU per job <= 40 GB | 1 worker node, 384 GB memory per node; 8 NVIDIA V100 NVLink GPU cards, 36 CPU cores in total |
gpu | normal, debug | lqcd, gpupwa, junogpu, mlgpu, higgsgpu | QOS normal: job run time <= 48 hours; submitted jobs (running + queued) per group <= 512, GPU cards per group <= 128; submitted jobs (running + queued) per user <= 96, GPU cards per user <= 64; memory requested per CPU per job <= 40 GB. QOS debug: job run time <= 15 minutes; jobs (running + queued) per group <= 256, GPU cards per group <= 64; jobs (running + queued) per user <= 24, GPU cards per user <= 16; memory requested per CPU per job <= 40 GB; QOS debug has a higher priority than QOS normal | 23 worker nodes, 384 GB memory per node; 182 NVIDIA V100 NVLink GPU cards, 840 CPU cores in total |
ucasgpu | ucasnormal | ucasgpu | QOS ucasnormal: job run time <= 48 hours; submitted jobs (running + queued) per group <= 200, GPU cards per group <= 40; submitted jobs (running + queued) per user <= 18, GPU cards per user <= 6; memory requested per CPU per job <= 40 GB | 1 worker node, 384 GB memory per node; 8 NVIDIA V100 NVLink GPU cards, 36 CPU cores in total |
pqcdgpu | pqcdnormal | pqcd | QOS pqcdnormal: job run time <= 72 hours; submitted jobs (running + queued) per group <= 100, GPU cards per group <= 100; submitted jobs (running + queued) per user <= 20, GPU cards per user <= 20; memory requested per CPU per job <= 32 GB | 1 worker node, 192 GB memory per node; 5 NVIDIA V100 PCI-e GPU cards, 20 CPU cores in total |
Explanations about QOS debug:
- debug is suitable for the following types of jobs:
  - jobs that test code under development
  - jobs with a short run time
- For example, it is recommended that test jobs from the mlgpu and higgsgpu groups be submitted to the QOS debug.
- For other groups, e.g. the gpupwa group, statistics show that 75% of its jobs finish within one hour; it is recommended to submit such short jobs to the QOS debug as well (a minimal header sketch is shown after this list).
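A minimal sketch of the #SBATCH header for a debug job, assuming a user from the mlgpu group; the partition, account, time limit and GPU count below are illustrative and should be adapted to your own group:
#! /bin/bash
# Minimal header for a short test job in QOS debug (values are illustrative)
#SBATCH --partition=gpu
#SBATCH --account=mlgpu
#SBATCH --qos=debug
# Keep within the 15-minute limit of QOS debug
#SBATCH --time=0:10:00
#SBATCH --gres=gpu:v100:1
srun -l hostname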
Step1 Apply for your computing account
Users whose accounts are already granted can skip this step.
For the new users
- Apply for the account on the application web page.
- New users will be granted automatically
For ungranted users:
Users should send an email to the group administrator for cluster grant.
After being granted by the group administrator and the Slurm cluster administrator, jobs can be submitted and run in the cluster.
- Ungranted users will encounter the following error:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Step2 Prepare your executable programs
- Software of the lqcd group can be stored in the dedicated AFS directory /afs/ihep.ac.cn/soft/lqcd/; currently the upper storage limit of this directory is 100 GB.
- Users from higgsgpu, junogpu, gpupwa, mlgpu and bldesign can install their software under /hpcfs; the directory paths for each group can be found in Step3.
- Users from other groups can install their software under /scratchfs, or another dedicated data directory of your experiment.
- If there are any special software requirements, please contact the cluster administrators.
Step3 Prepare your storage I/O directory
There are dedicated I/O directories for GPU cluster users from the above-mentioned groups.
- Directory path for the lqcd group: /hpcfs/lqcd/qcd/
- Directory path for the gpupwa group: /hpcfs/bes/gpupwa/
- Directory path for the junogpu group: /hpcfs/juno/junogpu/
- Directory path for the mlgpu group: /hpcfs/bes/mlgpu/
- Directory path for the higgsgpu group: /hpcfs/cepc/higgs/
- Directory path for the bldesign group: /hpcfs/heps/bldesign/
- Directory path for the ucasgpu group: /hpcfs/cepc/ucas/

Input/output files can be stored under the user's private sub-directory. Taking the lqcd group as an example: if a user zhangsan has input/output files, these files can be put under the directory /hpcfs/lqcd/qcd/zhangsan/ (the sketch below shows how such a sub-directory can be created).

For users who did not find their data directory listed above, /scratchfs can be used as the data directory.
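A minimal sketch of creating a private sub-directory, using the lqcd path and the example user name zhangsan from above; replace them with your own group path and user name:
# Create your private sub-directory (path and user name are the example values above)
$ mkdir -p /hpcfs/lqcd/qcd/zhangsan
# Verify the directory and its permissions
$ ls -ld /hpcfs/lqcd/qcd/zhangsan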
Step4 Prepare your job script
A job script is a bash script and consists of two parts:
- Part 1: Job parameters. Lines of this part start with #SBATCH and are used to specify the resource partition, QOS (job queue), number of required resources (CPU/GPU/memory), job name, output file path, etc.
- Part 2: Job workload. For example, executable scripts, programs, etc.

Attention!!
- No commands may lie between the #! line and the #SBATCH lines, otherwise the job parameters will be parsed incorrectly, which can cause the wrong resources to be allocated and the job to fail in the end.
- Blank or comment lines may be placed between the #SBATCH lines and the #! line.
A job script sample is shown below.
#! /bin/bash
######## Part 1 #########
# Script parameters #
#########################
# Specify the partition name from which resources will be allocated, mandatory option
#SBATCH --partition=gpu
# Specify the QOS, mandatory option
#SBATCH --qos=normal
# Specify which group you belong to, mandatory option
# This is for the accounting, so if you belong to many groups,
# write the experiment which will pay for your resource consumption
#SBATCH --account=lqcd
# Specify your job name; optional, but it is strongly recommended to set one
#SBATCH --job-name=gres_test
# Specify how many cores you will need, default is one if not specified
#SBATCH --ntasks=2
# Specify the output file path of your job
# Attention!! Your AFS account must have write access to the path
# or the job will FAIL!
#SBATCH --output=/home/cc/duran/job_output/gpujob-%j.out
# Specify the memory to use per CPU (in MB); otherwise Slurm will allocate all available memory
#SBATCH --mem-per-cpu=2048
# Specify how many GPU cards to use
#SBATCH --gres=gpu:v100:2
######## Part 2 ######
# Script workload #
######################
# Replace the following lines with your real workload
# For example, list the allocated hosts and sleep for 3 minutes
srun -l hostname
sleep 180
More Information
- Specifications of the --partition, --account and --qos options for each group
Group | Job type | --partition | --account (normally same as the group) | --qos |
---|---|---|---|---|
lqcd | long jobs | lgpu | lqcd | long |
lqcd, gpupwa, higgsgpu, mlgpu, junogpu | normal jobs | gpu | lqcd, gpupwa, higgsgpu, mlgpu, junogpu | normal |
lqcd, gpupwa, higgsgpu, mlgpu, junogpu | debug jobs | gpu | lqcd, gpupwa, higgsgpu, mlgpu, junogpu | debug |
bldesign | normal jobs | gpu | bldesign | blnormal |
bldesign | debug jobs | gpu | bldesign | bldebug |
ucasgpu | normal jobs | ucasgpu | ucasgpu | ucasnormal |
pqcd | normal jobs | pqcdgpu | pqcd | pqcdnormal |
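As an illustration of how a row of this table maps to #SBATCH options, the following lines sketch the mandatory header for a normal job from the gpupwa group; replace the values with those from your own row:
# Mandatory options taken from the table row for gpupwa normal jobs
#SBATCH --partition=gpu
#SBATCH --account=gpupwa
#SBATCH --qos=normal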
- The #SBATCH --mem-per-cpu option is used to specify the required memory size. If this option is not given, the default size is 4 GB per CPU core and the maximum is 32 GB per CPU core. Please specify the memory size according to your practical requirements.

Explanation of the option #SBATCH --time
- Jobs may spend less time in the queue if the --time option is specified.
  - This is especially true for jobs from the gpupwa group, whose job count is quite large.
- To use the --time option, the following lines can be modified and added to the job script:

# Tell Slurm how long it takes to finish the job, e.g. 2 hours in the following line
#SBATCH --time=2:00:00
# For jobs that will run for more than 24 hours, use the following time format
# e.g.: this job will run for 1 day and 8 hours
#SBATCH --time=1-8:00:00

- For inexperienced users, the run time statistics of historical jobs can be used as a reference:

Group | Run time | Probability |
---|---|---|
gpupwa | <= 1 hour | 90.43% |
lqcd | <= 32 hours | 90.37% |
junogpu | <= 12 hours | 91.24% |

- Jobs from the mlgpu and higgsgpu groups are small; it is recommended to use QOS debug, and the --time option can be omitted for now.
- If a job runs longer than the specified --time value, the scheduling system will clean up the overtime job by itself.
- Sample job scripts can be found under the following path
/cvmfs/slurm.ihep.ac.cn/slurm_sample_script
Some comments
- Sample job scripts are stored in the CVMFS filesystem; access CVMFS with the following commands:
# log into the lxlogin nodes with your AFS account
$ ssh <AFS_user_name>@lxlogin.ihep.ac.cn
# Go to the directory where sample job scripts can be found
$ cd /cvmfs/slurm.ihep.ac.cn/slurm_sample_script
# List the sample job scripts
$ ls -lht
-rw-rw-r-- 1 cvmfs cvmfs 1.4K Aug 12 18:31 slurm_sample_script_gpu.sh
Step5 Submit your job
- ssh login nodes.
# Issue ssh command to log in.
# Replace <AFS_user_name> with your user name.
$ ssh <AFS_user_name>@lxlogin.ihep.ac.cn
- The command to submit a job:
# command to submit a job
$ sbatch <job_script.sh>
# <job_script.sh> is the name of the script, e.g. v100_test.sh, then the command is:
$ sbatch v100_test.sh
# There will be a jobid returned as a message if the job is submitted successfully
Step6 Check job status
- The command to show job status is shown below.
# command to check the job queue
$ squeue
# command to check the jobs submitted by a user
$ sacct -u <AFS_user_name>
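For more detailed information about a job that is still queued or running, scontrol can be used as well; <jobid> is the id returned by sbatch:
# command to show the full scheduling details of a single job
$ scontrol show job <jobid>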
Step7 Cancel your job
- The command to cancel a job is listed below.
# command to cancel the job
# <jobid> can be found using the command sacct
$ scancel <jobid>
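scancel can also cancel all jobs of a user at once; use this carefully, since it removes every pending and running job of that user:
# command to cancel all jobs submitted by a user
$ scancel -u <AFS_user_name>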