5 Distributed computing
5.1 BOSS user guide
5.1.1 Preparation
Apply for a grid certificate and join the BES VO before using distributed resources to run BOSS jobs.
5.1.2 Environment set-up
Copy the Ganga configuration file to your directory:
$ cp /afs/.ihep.ac.cn/bes3/offline/ExternalLib/gangadist/.gangarc ~/.gangarc
Then modify the parameter gangadir (line 66) so that it points to a directory you have defined.
Next, set up the software environment, including both BOSS and the Grid. The Grid environment is set up as follows:
source /cvmfs/dcomputing.ihep.ac.cn/frontend/gangadist/env_grid.sh
5.1.3 Job management
This section uses Bhabha production as an example to introduce job submission and monitoring.
5.1.3.1 Job submission
1. Use the environment scripts above to initialize your environment, and enter your certificate password when prompted.
2. Edit a Ganga job option file in Python. Here is an example (sim_dirac_by_event.py):
# ganga job file for BOSS simulation only
bossVersion = '6.6.4.p03'
optionsFile = 'jobOptions_sim_bhabha.txt'
jobGroup = '150420_bhabha_01'
metadata = {'resonance': '4360',
'eventType': 'bhabha',
}
splitter = UserSplitterByEvent(evtMaxPerJob = 1000, evtTotal = 1000*200)
app = Boss(version=bossVersion, optsfile=optionsFile, metadata=metadata)
j = Job(application=app, splitter=splitter)
j.backend = Dirac()
j.backend.settings['JobGroup'] = jobGroup
j.backend.settings['Destination'] = ['BES.UMN.us']
#j.inputsandbox.append("mypdt.table")
j.submit()
print '\nThe DFC path for output data is:'
print app.get_output_dir()
print '\nThe dataset name for output data is:'
print app.get_dataset_name()
Note: the parameter "evtMaxPerJob" should not exceed 20,000, to avoid a high failure rate. "jobGroup" can be used to filter jobs on the monitoring page. Extra input files besides the job options and decay cards need to be added to the inputsandbox.
Copy the BOSS job files, including job options and decay cards, to the same directory as the Ganga script, and submit with the following command:
$ ganga sim_dirac_by_event.py
After the jobs complete, the output will be stored in the DFC as a dataset, for example:
User_tyan_4360_664p01_bhabha_round06_31001_31100_stream001_rtraw
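The dataset name encodes the submission metadata. As an illustration only (the field layout below is inferred from the example above and is an assumption, not an official naming API), a small helper can split such a name into its parts:

```python
# Illustrative sketch: decode the underscore-separated fields of a user
# dataset name of the form shown above. The field meanings are inferred
# from the example, not taken from an official specification.
def parse_dataset_name(name):
    parts = name.split('_')
    return {
        'prefix': parts[0],        # "User" for user-produced datasets
        'username': parts[1],
        'resonance': parts[2],
        'boss_version': parts[3],  # e.g. "664p01" for BOSS 6.6.4.p01
        'event_type': parts[4],
        'round': parts[5],
        'run_from': parts[6],
        'run_to': parts[7],
        'stream': parts[8],
        'file_type': parts[9],     # rtraw / dst / rec / root
    }

info = parse_dataset_name(
    'User_tyan_4360_664p01_bhabha_round06_31001_31100_stream001_rtraw')
print(info['event_type'], info['file_type'])
```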
5.1.3.2 Job monitoring
The job status can be checked on the "Job monitor" page of the DIRAC web portal.
5.1.3.3 Get output
The output data files can be found under /scratchfs in your own directory. /scratchfs can be accessed through lxlogin.ihep.ac.cn at IHEP.
5.1.4 Job types supported
The BESIII distributed computing system currently supports three kinds of BOSS jobs:
(1) simulation (output: rtraw files)
(2) simulation + reconstruction (output: rtraw/dst/rec files)
(3) simulation + reconstruction + analysis (output: rtraw/dst/rec/root files)
5.1.4.1 Simulation
The example ganga script is as follows:
bossVersion = '6.6.4.p02'
optionsFile = 'jobOptions_sim_rhopi.txt'
jobGroup = 'sim_rhopi_141102'
metadata = {'resonance': 'jpsi', 'eventType': 'rhopi'}
splitter = UserSplitterByRun(evtMaxPerJob = 100, evtTotal = 100*10)
app = Boss(version=bossVersion, optsfile=optionsFile, metadata=metadata)
j = Job(application=app, splitter=splitter)
j.backend = Dirac()
j.backend.settings['JobGroup'] = jobGroup
j.submit()
5.1.4.2 Simulation+reconstruction
The example ganga script is as follows:
bossVersion = '6.6.4.p02'
optionsFile = 'jobOptions_sim_rhopi.txt'
recOptionsFile = 'jobOptions_rec_rhopi.txt'
jobGroup = 'simrec_rhopi_141102'
metadata = {'resonance': 'jpsi', 'eventType': 'rhopi'}
splitter = UserSplitterByRun(evtMaxPerJob = 100, evtTotal = 100*10)
app = Boss(version=bossVersion, optsfile=optionsFile,
recoptsfile=recOptionsFile, metadata=metadata)
j = Job(application=app, splitter=splitter)
j.backend = Dirac()
j.backend.settings['JobGroup'] = jobGroup
j.submit()
5.1.4.3 Simulation+Reconstruction+Analysis
The example ganga script is as follows:
bossVersion = '6.6.4.p02'
optionsFile = 'jobOptions_sim_rhopi.txt'
recOptionsFile = 'jobOptions_rec_rhopi.txt'
anaOptionsFile = 'jobOptions_ana_rhopi.txt'
jobGroup = 'simrecana_rhopi_141102'
metadata = {'resonance': 'jpsi', 'eventType': 'rhopi'}
splitter = UserSplitterByRun(evtMaxPerJob = 100, evtTotal = 100*10)
app = Boss(version=bossVersion, optsfile=optionsFile, recoptsfile=recOptionsFile,
anaoptsfile=anaOptionsFile, use_custom_package=True, metadata=metadata)
j = Job(application=app, splitter=splitter)
j.backend = Dirac()
j.backend.settings['JobGroup'] = jobGroup
j.submit()
If users need to upload their own code to the Grid, they need to set "use_custom_package=True".
5.1.5 Job Splitting
Jobs can be split in two ways:
(1) Split by run: jobs are split according to the luminosity of each run; each job contains exactly one run, and its number of events is proportional to the luminosity of that run.
(2) Split by event: every job covers the whole run range specified in the job option file, and each job contains the same number of events.
5.1.5.1 Split by Run
Use the splitter to define the total number of events and the maximum number of events per job:
splitter = UserSplitterByRun(evtMaxPerJob = 500, evtTotal = 500*100)
In the job option file, set the run range with the smaller run (in absolute value) first:
RealizationSvc.RunIdList = {-11414, 0, -13988, -14395, 0, -14604};
or
RealizationSvc.RunIdList = {11414, 0, 13988, 14395, 0, 14604};
Note: the minus sign is optional. Make sure the run range does not span more than one round.
If users want to know the number of events in each job, define an output file name in the splitter:
splitter = UserSplitterByRun(evtMaxPerJob = 500, evtTotal = 500*100, outputEvtNum = 'run_event.log')
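The split-by-run idea can be sketched in plain Python. This is an illustration only, not the actual splitter code: the run numbers and luminosities below are made up, whereas the real splitter obtains them from the offline database.

```python
# Illustrative sketch of split-by-run: distribute evt_total events over runs
# in proportion to each run's luminosity, capping each job at evt_max_per_job.
# Run numbers and luminosities are invented for this example.
def split_by_run(run_lumi, evt_total, evt_max_per_job):
    total_lumi = sum(run_lumi.values())
    jobs = []
    for run, lumi in run_lumi.items():
        # events for this run, proportional to its luminosity
        n_evt = int(round(evt_total * lumi / total_lumi))
        # one run per job; a large run is split into several jobs
        while n_evt > 0:
            n = min(n_evt, evt_max_per_job)
            jobs.append((run, n))
            n_evt -= n
    return jobs

jobs = split_by_run({11414: 2.0, 11415: 1.0, 11416: 1.0},
                    evt_total=400, evt_max_per_job=150)
```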
5.1.5.2 Split by Event
Use the splitter to define the total number of events and the maximum number of events per job:
splitter = UserSplitterByEvent(evtMaxPerJob = 500, evtTotal = 500*100)
Note: only use this method in special cases, since it puts heavy pressure on the local file system when many jobs run in parallel.
5.1.5.3 Random seed
A random seed is assigned by default, but users can also define their own, for example:
splitter = UserSplitterByRun(evtMaxPerJob = 100, evtTotal = 100*10, seed = 10001)
The seeds will then be 10001, 10002, 10003, ...
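The seed behaviour described above can be sketched as follows (an illustration of the incrementing pattern only, not the actual Ganga splitter code):

```python
# Illustrative sketch: the first subjob takes the user-defined seed and each
# following subjob increments it by one, mirroring "10001, 10002, 10003, ...".
def subjob_seeds(seed, n_subjobs):
    return [seed + i for i in range(n_subjobs)]

print(subjob_seeds(10001, 3))  # [10001, 10002, 10003]
```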
5.1.6 Others
5.1.6.1 Select output files
By default, only the final output of the jobs is kept. If users want to keep the intermediate results as well, they can set output_step:
app = Boss(version=bossVersion, optsfile=optionsFile, recoptsfile=recOptionsFile, anaoptsfile=anaOptionsFile, metadata=metadata, use_custom_package=True, output_step=['sim','rec','ana'])
5.1.6.2 User-defined generators
If users need to run simulation with their own generator, such as the BesEvtGen DIY mode, they need to set "use_custom_package=True", which allows the packages under their work directory to be uploaded:
app = Boss(version=bossVersion, optsfile=optionsFile, use_custom_package=True, metadata=metadata)
5.1.6.3 Other input files
If users want to upload extra individual files of their own, they can use auto_upload.append:
bossVersion = '6.6.4.p02'
optionsFile = 'jobOptions_sim_rhopi.txt'
recOptionsFile = 'jobOptions_rec_rhopi.txt'
anaOptionsFile = 'jobOptions_ana_rhopi.txt'
jobGroup = 'simrecana_rhopi_141102'
metadata = {'resonance': 'jpsi', 'eventType': 'rhopi'}
splitter = UserSplitterByRun(evtMaxPerJob = 100, evtTotal = 100*10)
app = Boss(version=bossVersion, optsfile=optionsFile, recoptsfile=recOptionsFile,
anaoptsfile=anaOptionsFile, use_custom_package=True, metadata=metadata)
app.auto_upload.append('/your_extra_file_path_1/extra_input_1.txt')
app.auto_upload.append('/your_extra_file_path_2/extra_input_2.txt')
j = Job(application=app, splitter=splitter)
j.backend = Dirac()
j.backend.settings['JobGroup'] = jobGroup
j.submit()
Note: make sure that the paths for these files in the code are relative to the current directory, not absolute paths. A good approach is to set these file paths in the job option files.
5.2 CEPC User Guide
5.2.1 Preparation
Apply for a grid certificate and join the VO before using distributed resources to run CEPC jobs.
5.2.2 Environment set-up
source /cvmfs/dcomputing.ihep.ac.cn/frontend/dsub/setup/env_dsub.sh
5.2.3 Prepare job configuration files
Prepare job.cfg. Here is an example; you can also refer to the examples under the directory /cvmfs/dcomputing.ihep.ac.cn/frontend/dsub:
job_type = cepc_sr
repo_dir = ./repo
work_dir = ./work
input_filelist = ./stdhep.list
output_dir = test_001
evtmax = 10
evtstart = 0
batch = 1
job_group = CEPC_test_001
5.2.4 Job submission
Submit the jobs with the following command:
dsub job.cfg
5.2.5 Job status monitoring
Users can go to the DIRAC web portal (http://dirac.ihep.ac.cn) and choose the "Job monitor" page. The JobGroup parameter can be used to query your jobs.
Here are the exit codes for common problems:
10 error in preparation
11 CVMFS is not ready
20 error in simulation
21 error in database connection
22 too many substeps
23 no input events
30 error in reconstruction
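For scripting around job monitoring, the table above can be turned into a small lookup. This is a convenience sketch, not part of the DIRAC tools:

```python
# Map the exit codes listed above to their meanings so that a monitoring
# script can print a readable reason for a failed job.
EXIT_CODES = {
    10: 'error in preparation',
    11: 'CVMFS is not ready',
    20: 'error in simulation',
    21: 'error in database connection',
    22: 'too many substeps',
    23: 'no input events',
    30: 'error in reconstruction',
}

def explain(code):
    return EXIT_CODES.get(code, 'unknown exit code %d' % code)

print(explain(21))  # error in database connection
```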
5.2.6 Get outputs
When the jobs complete, you can find your outputs under /gridfs:
/gridfs/cepc/user/<initial>/<username>/<output_dir>/sim
/gridfs/cepc/user/<initial>/<username>/<output_dir>/rec