3.2.2 Slurm Cluster User Manual
3.2.2.1 Usage of the Slurm CPU Cluster
Introduction of CPU Resources
- Users of the Slurm CPU cluster are from the following groups:
| Group | Application | Contact person |
|---|---|---|
| mbh | Black hole | Yanrong Li |
| bio | Biology | Lina Zhao |
| apg | Accelerator design | Haisheng Xu |
| pwfa | Plasma acceleration | Ming Zeng |
| heps | Accelerator design for HEPS | Yi Jiao / Zhe Duan |
| cepcmpi | Accelerator design for CEPC | Yuan Zhang / Yiwei Wang |
| alicpt | Ali experiment | Hong Li |
| bldesign | Beamline studies for HEPS | Haifeng Zhao |
| raq | Quantum Chemistry, Molecular dynamics | Jianhui Lan |
- Each group consumes separate resources. The following table lists the computing resources, partitions, and QOS (job queue) for each group.
| Partition | QOS | Account / Group | Worker nodes |
|---|---|---|---|
| mbh,mbh16 | regular | mbh | 16 nodes,256 CPU cores |
| apg | apgregular | apg | 12 nodes,768 CPU cores |
| pwfadebug | spubpwfa | pwfa | 1 node, 24 CPU cores |
| heps | regular,advanced | heps | 34 nodes,1224 CPU cores |
| hepsdebug | hepsdebug | heps | 1 node, 36 CPU cores |
| cepcmpi | regular | cepcmpi | 36 nodes,1696 CPU cores |
| ali | regular | alicpt | 16 nodes,576 CPU cores |
| bldesign | blregular | bldesign | 3 nodes,108 CPU cores |
| raq | regular | raq | 12 nodes, 672 CPU cores |
- Resource limits for each QOS are shown in the following table.
| QOS | Max Running Time for each job | Priority | Maximum number of submitted jobs per user / account | Maximum Resource limit |
|---|---|---|---|---|
| regular | 60 days | Low | 4000 jobs per user, 8000 jobs per group | - |
| advanced | 60 days | High | -, - | - |
| hepsdebug | 30 minutes | Medium | 100 jobs per user, - | - |
| blregular | 30 days | Low | 200 jobs per user, 1000 jobs per group | - |
| apgregular | - | Low | - | - |
| raqregular | - | High | 100 jobs per user, - | 280 CPU cores |
| raqacc | - | Low | 50 jobs per user, - | 112 CPU cores |
| spubpwfa | - | High | - | - |
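To show how the partition, account, and QOS names from the tables above fit together, here is a sketch of a job script header for a user in the heps group (the resource values are illustrative placeholders; pick the partition/account/QOS of your own group from the tables):

```shell
#! /bin/bash
# Illustrative job header for the heps group; adjust values to your needs
#SBATCH --partition=heps
#SBATCH --account=heps
#SBATCH --qos=regular
#SBATCH --ntasks=36          # number of CPU cores requested
#SBATCH --mem-per-cpu=2GB    # memory per CPU core
#SBATCH --job-name=heps_test
```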
Step 0 Cluster ID application and grant
Users who already have a cluster ID and have been granted access may skip this step.
For new users
- Apply for the cluster ID: application web page
- For users whose group is supported by the Slurm cluster, cluster access will be granted automatically.
- For users whose group is NOT supported by the Slurm cluster, there are two ways to obtain authorization:
  - Users can apply to join a second linux group supported by the Slurm CPU cluster to gain access. For details, please refer to the next section: For the ungranted users.
  - If users cannot join a group supported by the Slurm CPU cluster, they may choose to rent the public computing platform. For details, please refer to the Introduction to Using the Public Computing Platform.
For the ungranted users
- A job submission error may be encountered if not granted:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Currently, a self-service application method is provided for users to get granted. The self-service application process is as follows:
- Log in via SSO unified authentication at http://ccsinfo.ihep.ac.cn.

Click on "Apply to second linux group" -> "Secondary group apply", and choose the group you would like to join.
- Slurm-supported groups can be found here: Introduction to the Slurm CPU Cluster
- If target groups are not listed, one can consider renting resources from the Public Computing Platform

After filling in the application reason and selecting the appropriate group, click "Confirm" to submit the application.

After submission, the application will be reviewed and authorized by the designated computing contact person of the group.
- For the list of computing contacts for each experiment, please refer to: Experiments and Contacts
Step 1 Get the job script ready
Do not use directories under /afs or /workfs2 as data read/write directories. Since compute nodes cannot write to these two directories, submitted jobs will immediately fail and exit.
- Each cluster account is created with three default storage directories: /afs, /workfs2, and /scratchfs

| Storage Path | Default Quota | Usage | Backup or not |
|---|---|---|---|
| /afs | 500MB | Default Home directory | Yes |
| /workfs2 | 5GB, 50,000 files | Directory for code backup | Yes |
| /scratchfs | 500GB, 200,000 files | Data Directory | No |
- In addition to the three directories mentioned above, each experiment has dedicated data directories. For details, please refer to:
- It is recommended to use the data directories as the job submission directory to avoid errors caused by compute nodes being unable to write to /afs or /workfs2.
- If you need to change the default home directory (/afs) to another non-/scratchfs data directory, please contact HelpDesk to request the modification.
Choose your preferred editor (such as vim) to edit the job script to be submitted.
Sample job scripts can be found in the following directories:
/cvmfs/slurm.ihep.ac.cn/slurm_sample_script
Note: The sample job scripts are stored in the CVMFS file system. The usage of CVMFS is as follows:
# Log in to the login node, where <user_name> is your cluster ID
> ssh <user_name>@lxlogin.ihep.ac.cn
# Locate the directory containing the sample scripts
> cd /cvmfs/slurm.ihep.ac.cn/slurm_sample_script
# View the sample scripts
> ls slurm_sample_script*.sh
slurm_sample_script_cpu.sh  slurm_sample_script_gpu.sh
- Edit the sample script to obtain a runnable job script.
> cat slurm_sample_script_cpu.sh
#! /bin/bash
#================= Part 1 : job parameters ============
#SBATCH --partition=mbh
#SBATCH --account=mbh
#SBATCH --qos=regular
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=2GB
#SBATCH --job-name=test_job
#============== Part 2 : job workload ===================
NP=$SLURM_NTASKS
# source your environment file if exists
# replace /path/to/your/env_file with your real env file path
source /path/to/your/env_file
# replace /path/to/your/mpi_program with your real MPI program path
mpirun -np $NP /path/to/your/mpi_program
Some explanation about the job script:
- Job runtime parameters begin with #SBATCH and specify the parameters for job execution.
- The job workload typically includes executable programs, such as executable scripts, MPI programs, etc.
- The job runtime parameters --partition, --account, and --qos are mandatory and must be specified; otherwise, the job submission will fail.
- --mem-per-cpu is used to request the amount of memory available per CPU core.
- --ntasks is used to request the number of CPU cores.
- --job-name is used to specify the job name, which can be customized by the user.
Step 2 Job Submission
When your job script is ready, it's time to submit the job with the following command:
# log into lxlogin.ihep.ac.cn
$ ssh <user_name>@lxlogin.ihep.ac.cn
# submit jobs with sbatch cmd
$ sbatch slurm_sample_script_1.sh
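If you want to reuse the returned job id in further commands, sbatch's --parsable option prints the bare job id; a small sketch (the script name is the sample from above):

```shell
# submit and capture the job id in one step
jobid=$(sbatch --parsable slurm_sample_script_1.sh)
echo "submitted job ${jobid}"
# the captured id can then be passed to sacct or scancel, e.g.:
# sacct -j ${jobid}
```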
Step 3 Job Query
To query a single job.
Once a job is submitted, the sbatch command returns a job id. Users can query the job with this job id. The query command is:

```bash
# Use the command sacct to check the status of a single job
# <job_id> stands for the job id returned by the sbatch command
$ sacct -j <job_id>
```
To query jobs submitted by a user.
To query all jobs submitted by a user after 0:00 today, type the following command:
# <user_name> is the user name
$ sacct -u <user_name>
To query all jobs submitted by a user after a specified day, type the following command:
# --starttime specifies the query start time point in the format 'YYYY-MM-DD'
$ sacct -u <user_name> --starttime='YYYY-MM-DD'
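The sacct output columns can also be customized; the --format option selects which fields to print (these are standard sacct fields, though the exact set available may depend on the cluster's Slurm version):

```shell
# show job id, name, partition, state, elapsed time and exit code
$ sacct -u <user_name> --format=JobID,JobName,Partition,State,Elapsed,ExitCode
# -X restricts the output to one line per job (allocations only, no job steps)
$ sacct -u <user_name> -X --starttime='YYYY-MM-DD'
```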
Step 4 Job Results
Once the submitted job is done, one can get the output results.
- If the output file is not specified, the default output file is saved in the working directory from which the job was submitted, and the default output file name is <job_id>.out, where <job_id> is the job id.
  - For example, if the job id is 1234, the output file name is 1234.out.
- If an output file is specified, the output results can be found in the specified file.
- If the job workload redirects the output, please check the redirected output files to get the job results.
Step 5 Job Cancellation
To cancel a submitted job, one can type the following command.
# Use scancel command to cancel a job, where <job_id> is the job id returned by sbatch
$ scancel <job_id>
Step 6 Cluster Status Query
To check partition names of the Slurm cluster, or to query resource status of partitions, one can type the following command:
# Use sinfo command to query resource status
$ sinfo
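sinfo also accepts filters, which are handy on a cluster with many partitions; for example (standard sinfo options, shown as a sketch):

```shell
# show only one partition, e.g. mbh
$ sinfo -p mbh
# node-oriented long listing: one line per node with state and resources
$ sinfo -N -l
```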
3.2.2.2 Usage of the Slurm GPU Cluster
Introduction of GPU Resources
- Authorized groups that can access the GPU cluster are listed in the following table.
| group | Applications | Contact person |
|---|---|---|
| lqcd | Lattice QCD | Ying Chen / Ming Gong |
| gpupwa | Partial Wave Analysis | Beijiang Liu / Liaoyuan Dong |
| junogpu | Neutrino Analysis | Wuming Luo |
| mlgpu | Machine Learning apps of BESIII | Yao Zhang |
| higgsgpu | GPU acceleration for CEPC software | Gang Li |
| bldesign | Beamline applications for HEPS experiment | Haifeng Zhao |
| ucasgpu | Machine Learning for UCAS | Xiaorui Lv |
| pqcd | Perturbative QCD calculation | Zhao Li |
| cmsgpu | Machine Learning apps of CMS | Huaqiao Zhang, Mingshui Chen |
| neuph | Theory of Neutrino and Phenomenology | Yufeng Li |
| atlasgpu | Machine Learning apps of ATLAS | Contact of ATLAS |
| lhaasogpu | Machine Learning apps of LHAASO | Contact of LHAASO |
| herdgpu | Machine Learning apps of HERD | Contact of HERD |
| qc | Quantum Computing | Contact of CC |
| lhcbgpu | LHCb Deep Learning | Yiming Li |
- The GPU cluster is divided into several resource partitions; each partition has different QOS (queue) and group access, as shown in the following table.
| Partition | QOS | Group | Resource limitation | Num. of Nodes |
|---|---|---|---|---|
| lgpu | long | lqcd | QOS long: run time of jobs <= 30 days; total number of submitted jobs (running + queued) <= 64; memory requested per CPU per job <= 40GB | 1 worker node, 360GB memory per node; 8 NVIDIA V100 NVLink GPU cards, 36 CPU cores in total |
| gpu | normal, debug | lqcd, junogpu, mlgpu, higgsgpu | QOS normal: run time of jobs <= 48 hours; jobs (running + queued) per group <= 512; GPU cards per group <= 128; CPU cores per group <= 432; memory per group <= 5TB; jobs (running + queued) per user <= 96; GPU cards per user <= 64; CPU cores per user <= 216; memory requested per CPU per job <= 40GB. QOS debug: run time of jobs <= 15 minutes; jobs (running + queued) per group <= 256; GPU cards per group <= 64; CPU cores per user <= 216; memory per group <= 2TB; jobs (running + queued) per user <= 24; GPU cards per user <= 16; memory requested per CPU per job <= 40GB; QOS debug has higher priority than QOS normal | 24 nodes in total: 23 nodes with 360GB of available memory each, 1 node with 240GB; 183 GPU cards in total: 182 NVIDIA V100 NVLink, 1 NVIDIA A100 PCIe; 892 CPU cores in total |
| ucasgpu | ucasnormal | ucasgpu | QOS ucasnormal: run time of jobs <= 48 hours; jobs (running + queued) per group <= 200; GPU cards per group <= 40; jobs (running + queued) per user <= 18; GPU cards per user <= 6; memory requested per CPU per job <= 40GB | 1 worker node, 384GB memory per node; 8 NVIDIA V100 NVLink GPU cards, 36 CPU cores in total |
| pqcdgpu | pqcdnormal | pqcd | QOS pqcdnormal: run time of jobs <= 72 hours; jobs (running + queued) per group <= 100; GPU cards per group <= 100; jobs (running + queued) per user <= 20; GPU cards per user <= 20; memory requested per CPU per job <= 32GB | 1 worker node, 192GB memory per node; 5 NVIDIA V100 PCIe GPU cards, 20 CPU cores in total |
| gpu | pwanormal, debug | gpupwa | QOS pwanormal: run time of jobs <= 48 hours; jobs (running + queued) per group <= 512; GPU cards per group <= 128; CPU cores per group <= 432; memory per group <= 5TB; jobs (running + queued) per user <= 38; GPU cards per user <= 10; CPU cores per user <= 32; memory per user <= 600GB; memory requested per CPU per job <= 40GB. QOS debug: run time of jobs <= 15 minutes; jobs (running + queued) per group <= 256; GPU cards per group <= 64; CPU cores per user <= 216; memory per group <= 2TB; jobs (running + queued) per user <= 24; GPU cards per user <= 16; memory requested per CPU per job <= 40GB; QOS debug has higher priority than QOS pwanormal | Same gpu partition pool as above (24 nodes, 183 GPU cards, 892 CPU cores) |
| gpu | cmsnormal, debug | cmsgpu | QOS cmsnormal: run time of jobs <= 48 hours; jobs (running + queued) per group <= 128; GPU cards per group <= 17; CPU cores per group <= 72; memory per group <= 720GB; jobs (running + queued) per user <= 36; GPU cards per user <= 9; CPU cores per user <= 36; memory per user <= 360GB; memory requested per CPU per job <= 40GB. QOS debug: run time of jobs <= 15 minutes; jobs (running + queued) per group <= 256; GPU cards per group <= 64; CPU cores per user <= 216; memory per group <= 2TB; jobs (running + queued) per user <= 24; GPU cards per user <= 16; memory requested per CPU per job <= 40GB; QOS debug has higher priority than QOS cmsnormal | Same gpu partition pool as above (24 nodes, 183 GPU cards, 892 CPU cores) |
| gpu | debug | atlasgpu, lhaasogpu, herdgpu, qc, lhcbgpu | QOS debug: run time of jobs <= 15 minutes; jobs (running + queued) per group <= 256; GPU cards per group <= 64; CPU cores per user <= 216; memory per group <= 2TB; jobs (running + queued) per user <= 24; GPU cards per user <= 16; memory requested per CPU per job <= 40GB | Same gpu partition pool as above (24 nodes, 183 GPU cards, 892 CPU cores) |
| ucasgpu | ucasnormal | ucasgpu | QOS ucasnormal: run time of jobs <= 48 hours; jobs (running + queued) per group <= 100; GPU cards per group <= 16; CPU cores per group <= 36; memory per group <= 720GB; jobs (running + queued) per user <= 10; GPU cards per user <= 6; CPU cores per user <= 18; memory per user <= 480GB; memory requested per CPU per job <= 40GB | 1 node, 360GB of available memory; 8 NVIDIA V100 NVLink 32GB GPU cards; 36 CPU cores |
| neuph | neuphnormal | neuph | - | 2 nodes, 360GB of available memory; 5 NVIDIA A100 PCIe 40GB GPU cards; 96 CPU cores |
| pqcdgpu | pqcdnormal | pqcd | QOS pqcdnormal: run time of jobs <= 72 hours; jobs (running + queued) per group <= 100; GPU cards per group <= 16; CPU cores per group <= 20; memory per group <= 180GB; jobs (running + queued) per user <= 20; GPU cards per user <= 8; CPU cores per user <= 20; memory per user <= 180GB; memory requested per CPU per job <= 32GB | 1 node, 360GB of available memory; 5 NVIDIA V100 PCIe 32GB GPU cards; 20 CPU cores |
| gpupwa | pwadedicate, pwadebug | gpupwa | QOS pwadedicate: run time of jobs <= 48 hours; jobs (running + queued) per user <= 12; GPU cards per user <= 8; CPU cores per user <= 16; memory per user <= 640GB; memory requested per CPU per job <= 40GB. QOS pwadebug: run time of jobs <= 15 minutes; jobs (running + queued) per user <= 16; GPU cards per user <= 8; CPU cores per user <= 32; memory per user <= 640GB; memory requested per CPU per job <= 40GB; QOS pwadebug has higher priority than QOS pwadedicate | 5 nodes, 740GB of available memory each; 40 NVIDIA A100 PCIe 40GB GPU cards; 320 CPU cores |
Explanations about QOS debug :
- debug is suitable for the following types of jobs:
- to test codes under development
- short run time
- For example, test jobs from groups mlgpu and higgsgpu are recommended to be submitted to QOS debug.
- For the gpupwa group, statistics show that 75% of jobs finish within one hour; it is recommended to submit these short jobs to QOS debug as well.
Step 1 Apply for your cluster ID
Users who already have a cluster ID and have been granted access may skip this step.
For new users
- Apply for the cluster ID: application web page
- For users whose group is supported by the Slurm cluster, cluster access will be granted automatically.
- For users whose group is NOT supported by the Slurm cluster, there are two ways to obtain authorization:
  - Users can apply to join a second linux group supported by the Slurm CPU cluster to gain access. For details, please refer to the next section: For the ungranted users.
  - If users cannot join a group supported by the Slurm CPU cluster, they may choose to rent the public computing platform. For details, please refer to the Introduction to Using the Public Computing Platform.
For the ungranted users
- A job submission error may be encountered if not granted:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Currently, a self-service application method is provided for users to get granted. The self-service application process is as follows:
- Log in via SSO unified authentication at http://ccsinfo.ihep.ac.cn.

Click on "Apply to second linux group" -> "Secondary group apply", and choose the group you would like to join.
- Slurm-supported groups can be found here: Introduction to the Slurm GPU Cluster
- If target groups are not listed, one can consider renting resources from the Public Computing Platform

After filling in the application reason and selecting the appropriate group, click "Confirm" to submit the application.

After submission, the application will be reviewed and authorized by the designated computing contact person of the group.
- For the list of computing contacts for each experiment, please refer to: Experiments and Contacts
Step 2 Prepare your executable programs
- Software of group lqcd can be stored in the dedicated AFS directory /afs/ihep.ac.cn/soft/lqcd/; currently the upper storage limit of this directory is 100GB.
- Users from higgsgpu, junogpu, gpupwa, mlgpu, and bldesign can install software under /hpcfs; directory paths for each group can be found in Step 3.
- Users from other groups can install software under /scratchfs, or other dedicated data directories of their experiment.
- If there are any special software requirements, please contact the cluster admin.
Step 3 Prepare your storage I/O directory
Do not use directories under /afs or /workfs2 as data read/write directories. Since compute nodes cannot write to these two directories, submitted jobs will immediately fail and exit.
Each cluster account is created with three default storage directories: /afs, /workfs2, and /scratchfs
| Storage Path | Default Quota | Usage | Backup or not |
|---|---|---|---|
| /afs | 500MB | Default Home directory | Yes |
| /workfs2 | 5GB,50,000 files | Directory for code backup | Yes |
| /scratchfs | 500GB,200,000 files | Data Directory | No |
- In addition to the three directories mentioned above, each experiment has dedicated data directories. For details, please refer to:
- It is recommended to use the data directories as the job submission directory to avoid errors caused by compute nodes being unable to write to /afs or /workfs2.
If you need to change the default home directory (/afs) to another non-/scratchfs data directory, please contact HelpDesk to request the modification.
There is a dedicated I/O directory for GPU cluster users from the above mentioned groups.
- Directory path for the group lqcd users: /hpcfs/lqcd/qcd/
- Directory path for the group gpupwa users: /hpcfs/bes/gpupwa/
- Directory path for the group junogpu: /hpcfs/juno/junogpu/
- Directory path for the group mlgpu: /hpcfs/bes/mlgpu/
- Directory path for the group higgsgpu: /hpcfs/cepc/higgs/
- Directory path for the group bldesign: /hpcfs/heps/bldesign/
Step 4 Prepare your job script
A job script is a bash script consisting of two parts:
- Part 1: Job parameters. Lines in this part start with #SBATCH and are used to specify the resource partition, QOS (job queue), amount of required resources (CPU/GPU/memory), job name, output file path, etc.
- Part 2: Job workload. For example, executable scripts, programs, etc.

Attention!!
- No commands may lie between the line starting with #! and the lines starting with #SBATCH; otherwise the job running parameters will be parsed by mistake, the job could get wrong resources allocated, and the job will fail.
- Blank or comment lines may be placed between the #! line and the #SBATCH lines.
A job script sample is shown below.
#! /bin/bash
######## Part 1 #########
#SBATCH --partition=gpu
#SBATCH --qos=normal
#SBATCH --account=lqcd
#SBATCH --ntasks=2
#SBATCH --mem-per-cpu=4096
#SBATCH --gpus=v100:1
#SBATCH --job-name=test
######## Part 2 ######
# Replace the following lines with your real workload
# list the allocated hosts
srun -l hostname
sleep 180
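Inside Part 2 of a job script, Slurm exports environment variables describing the allocation; a sketch that logs them at the start of the workload for easier debugging (these variables are set by Slurm only while the job runs, so unset defaults are printed otherwise):

```shell
# print the allocation details into the job output
echo "job id       : ${SLURM_JOB_ID:-unset}"
echo "tasks        : ${SLURM_NTASKS:-unset}"
echo "node list    : ${SLURM_JOB_NODELIST:-unset}"
echo "visible GPUs : ${CUDA_VISIBLE_DEVICES:-unset}"
```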
More Information
- #SBATCH is a job script directive used to specify job runtime parameters, request resource quantities, etc. It is not a comment and should not be deleted.
Specifications of the --partition, --account, and --qos options for each group

| Group | Job types | --partition | --account (normally same as the group) | --qos |
|---|---|---|---|---|
| lqcd | long jobs | lgpu | lqcd | long |
| lqcd, higgsgpu, mlgpu, junogpu | normal jobs | gpu | lqcd, higgsgpu, mlgpu, junogpu | normal |
| gpupwa | normal jobs | gpu | gpupwa | pwanormal |
| gpupwa | normal jobs | gpupwa | gpupwa | pwadedicate |
| gpupwa | debug jobs | gpupwa | gpupwa | pwadebug |
| lqcd, gpupwa, higgsgpu, mlgpu, junogpu, cmsgpu, atlasgpu, lhcbgpu, lhaasogpu, qc | debug jobs | gpu | lqcd, gpupwa, higgsgpu, mlgpu, junogpu, cmsgpu, atlasgpu, lhcbgpu, lhaasogpu, qc | debug |
| bldesign | normal jobs | gpu | bldesign | blnormal |
| bldesign | debug jobs | gpu | bldesign | bldebug |
| ucasgpu | normal jobs | ucasgpu | ucasgpu | ucasnormal |
| pqcd | normal jobs | pqcdgpu | pqcd | pqcdnormal |
| cmsgpu | normal jobs | gpu | cmsgpu | cmsnormal |
| neuph | normal jobs | neuph | neuph | neuphnormal |
- The --mem-per-cpu option is used to specify the amount of memory required per CPU core.
  - If not specified, the default allocation is 4GB of memory per CPU core.
  - The maximum allocatable memory per CPU core can be found in the Cluster Resources, Queues, and Groups Description. Please specify the memory usage according to the actual application requirements.
- The --ntasks option is used to specify the number of CPU cores required.
  - For example: #SBATCH --ntasks=20 requests 20 CPU cores.
- The --gpus option is used to specify the type and number of GPU cards required. For example:
  - #SBATCH --gpus=v100:1 requests 1 V100 GPU card.
  - #SBATCH --gpus=a100:1 requests 1 A100 GPU card.
  - #SBATCH --gpus=1 requests 1 GPU card of any available model.
- The --job-name option is used to specify the job name, which can be customized by the user.
Explanation of the option #SBATCH --time
- Jobs may spend less time queued if the --time option is specified, especially jobs from group gpupwa, whose job count is quite large.
- To use the --time option, lines like the following can be added to the job script:
# Tell how long it takes to finish the job, e.g. 2 hours:
#SBATCH --time=2:00:00
# For jobs that will run more than 24 hours, use the day-hour format,
# e.g. for a job that will run for 1 day and 8 hours:
#SBATCH --time=1-8:00:00
- For inexperienced users, run time statistics of historical jobs can be used as reference:

| Group | Run time | Probability |
|---|---|---|
| gpupwa | <= 1 hour | 90.43% |
| lqcd | <= 32 hours | 90.37% |
| junogpu | <= 12 hours | 91.24% |

- Jobs from groups mlgpu and higgsgpu are small; it is recommended to use QOS debug, and the --time option can be omitted for now.
- If a job runs longer than the specified --time, the scheduling system will cancel the overtime job automatically.
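To avoid mistakes with the day-hour format, here is a tiny helper (illustrative Python, not part of the cluster tooling) that converts a run time given in whole hours into a Slurm --time string:

```python
def slurm_time(hours: int) -> str:
    """Convert a run time in whole hours to Slurm's --time format.

    Slurm accepts H:MM:SS for runs under one day and D-H:MM:SS otherwise.
    """
    days, rem = divmod(hours, 24)
    return f"{days}-{rem}:00:00" if days else f"{rem}:00:00"

# e.g. slurm_time(2)  -> "2:00:00"
# e.g. slurm_time(32) -> "1-8:00:00"
```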
- Sample job scripts could be found with the following path
/cvmfs/slurm.ihep.ac.cn/slurm_sample_script
Some comments
- Sample job scripts are stored in the CVMFS filesystem, access CVMFS with the following commands:
# log into the lxlogin nodes with your cluster ID
$ ssh <user_name>@lxlogin.ihep.ac.cn
# Go to the directory where sample job scripts can be found
$ cd /cvmfs/slurm.ihep.ac.cn/slurm_sample_script
# list the sample job scripts
$ ls -lht
-rw-rw-r-- 1 cvmfs cvmfs 1.4K Aug 12 18:31 slurm_sample_script_gpu.sh
Step 5 Submit your job
- ssh to the login nodes.
# Issue ssh command to log in.
# Replace <user_name> with your user name.
$ ssh <user_name>@lxlogin.ihep.ac.cn
- The command to submit a job:
# command to submit a job
$ sbatch <job_script.sh>
# <job_script.sh> is the name of the script, e.g: v100_test.sh, then the command is:
$ sbatch v100_test.sh
# There will be a jobid returned as a message if the job is submitted successfully
Step 6 Check job status
- The command to show job status is shown below.
# command to check the job queue
$ squeue
# command to check the jobs submitted by user
$ sacct -u <user_name>
Step 7 Cancel your job
- The Command to cancel a job is listed below.
# command to cancel the job
# <jobid> can be found using the command sacct
$ scancel <jobid>
3.2.2.3 Usage of the Public Computing Platform
Introduction to the Public Computing Platform
The public computing platform is a parallel computing service platform developed by the Computing Center, designed to provide rentable resources for users within the institute. This platform utilizes the Slurm scheduling system and offers the following resources.
Computing resources
| Partition | Resources |
|---|---|
| spub | 20 nodes in total, including: - 16 CPU nodes: each with 56 CPU cores and 240GB of memory - 3 CPU nodes: each with 36 CPU cores and 110GB of memory - 1 GPU node: with 48 CPU cores, 360GB of memory, and 4 NVIDIA A100 PCI-e 40GB GPU cards |
- Storage resources
| Path | Capacity | Quota |
|---|---|---|
| /ihepfs | 400TB | Default quota is 500GB,300,000 files, which can be adjusted based on actual needs |
- Matlab resources
| Software | License number |
|---|---|
| matlab parallel toolbox | 512 |
- --account, --partition and --qos used in the job script are listed below:
| Account | Partition | QoS |
|---|---|---|
| Same with group | spub | Created based on user requests |
Step 1: Prepare Your Cluster Account
- If you do not have a cluster ID, please refer to Applying for a Cluster Account.
- If you already have a cluster ID, you may skip this step.
Step 2: Apply for Computing Time on the Public Computing Platform
- Please visit the Public Computing Platform Introduction page to download the "Public Computing Platform Usage Application Form" and send it to yanran@ihep.ac.cn. The Computing Center will assign a dedicated representative to discuss your computing needs.
- After the requirements are finalized, the platform administrator will initialize your cluster environment based on the application, including:
- Setting up job queues.
- Creating storage directories.
- Installing the software environment.
- Once the cluster environment is initialized according to your needs, you can proceed to submit jobs for execution. For job submission and execution methods, please refer to Step 3 through Step 7.
Step 3: Prepare Your Data
- After your application is approved, you will have access to a dedicated directory for storing input/output data. The directory structure is /ihepfs/<experiment_name>/<user_name>.
- For example: if user zhangsan belongs to the SPUB experiment, the storage directory will be /ihepfs/SPUB/zhangsan.
Step 4: Prepare Your Job Script
- The approval email you receive will specify your account and qos.
- You can use the sample script below and modify the relevant options:
  - Replace <account> and <qos> with the values provided in the email.
  - Replace <experiment_name> and <user_name> with your experiment name and username.
  - %j represents the job ID, which is automatically generated by the system during job execution.
$ cat spub_slurm_sample.sh
#! /bin/bash
#================= Part 1 : job parameters ========================
#SBATCH --partition=spub
#SBATCH --account=<account>
#SBATCH --qos=<qos>
#SBATCH --job-name=sample
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=2GB
#SBATCH --output=/ihepfs/<experiment_name>/<user_name>/job-%j.out
#============ Part 2 : Job workload =====================
#===== Run your programs =====
# source your ENV settings first
NP=8
mpirun -n $NP my_mpi_program
Step 5 Submit your job
$ ssh <user_name>@lxlogin.ihep.ac.cn
# When you submit a job using the sbatch command, the system will return the job ID.
$ sbatch spub_slurm_sample.sh
Step 6 Query jobs
# View jobs in the queue for user zhangsan
$ squeue -u zhangsan
# View job details using job ID 1008163
$ sacct -j 1008163
# View all jobs submitted by user zhangsan today
$ sacct -u zhangsan
# View all jobs submitted by user zhangsan since 2021-07-18
$ sacct -u zhangsan --starttime=2021-07-18
Step 7 Delete jobs
# Delete the job with id 1008163
$ scancel 1008163
# Delete all the jobs of user zhangsan
$ scancel -u zhangsan
Frequently Asked Questions (Q&A)
1. Can I apply to use only the software or storage resources of the public computing platform?
Yes, resources can be applied for based on actual needs.
2. What should I do if the provided sample scripts do not meet my requirements?
Please contact the platform administrator. Sample scripts can be customized according to user needs.
3. How are the fees calculated?
For inquiries about fees, please contact the computing needs liaison specialist: Yan Ran (yanran@ihep.ac.cn).