# Quickstart

Run locally on lxplus.

Set up the work area on lxplus with EL8 (used for Run 3 CMSSW releases):

~~~
ssh -X username@lxplus8.cern.ch
export SCRAM_ARCH=el8_amd64_gcc10
cmsrel CMSSW_13_3_0_pre3
cd CMSSW_13_3_0_pre3
cmsenv
~~~

Get the code and compile:

~~~
git cms-addpkg Validation/RecoParticleFlow
scram b -j4
cd $CMSSW_BASE/src/Validation/RecoParticleFlow
~~~

Set up a grid proxy, which enables reading files from remote locations and
lets dasgoclient create the file lists in the next step:

~~~
voms-proxy-init -voms cms
~~~

Create the input file lists under test/tmp/das_cache.

(You can modify which datasets are used at the end of the test/datasets.py script.)

~~~
cd test; python3 datasets.py; cd ..
~~~
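
To confirm that the lists were created, you can list the cache directory:

~~~
ls test/tmp/das_cache
~~~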

Proceed to the RECO step, which takes about 30 minutes. This is necessary if you
need to re-reconstruct events to test changes introduced to PF reco.

Note 1: the default era and conditions are now set to Run 3 2022. Change CONDITIONS
and ERA in test/run_relval.sh before running the commands below when targeting
another era.
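
The relevant lines in test/run_relval.sh look like the following (illustrative
values, matching the Run 3 defaults shown later in this README; the exact
syntax in the script may differ):

~~~
CONDITIONS=auto:phase1_2022_realistic
ERA=Run3
~~~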

Note 2: the execution will fail if the destination directory (test/tmp/QCD etc.)
already exists. Rename or remove conflicting directories from test/tmp.

~~~
make QCD_reco
~~~

Now run the DQM step, which takes a few minutes:

~~~
make QCD_dqm
~~~

Repeat for QCDPU and NuGunPU (see the example just below), or use CRAB for the
reco step and run the DQM steps as indicated below.
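
The pile-up samples follow the same pattern of Makefile targets:

~~~
make QCDPU_reco
make QCDPU_dqm
make NuGunPU_reco
make NuGunPU_dqm
~~~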

Next, produce the final HTML plots (by default this just plots two identical
results in tmp/{QCD,QCDPU,NuGunPU}).

You can (and probably want to) edit the 'make plots' part of the Makefile so
that 'make plots' runs successfully without all the data samples produced, or
use the selective targets 'make QCD_plots', 'make QCDPU_plots' and
'make NuGunPU_plots', combined as needed (see the example below). You can also
select which validation plots are produced through the implemented handles
"--doResponsePlots", "--doOffsetPlots", "--doMETPlots" and "--doPFCandPlots".
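
For example, the selective targets can be invoked individually or together:

~~~
make QCD_plots              # if only QCD was produced
make QCD_plots QCDPU_plots  # if QCD and QCDPU were produced
~~~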

Start by increasing the limit on open files, to avoid crashes due to too many
simultaneously open files:

~~~
ulimit -n 4096
~~~

Note: each of the provided plotting commands will first empty and remove the
plots/ directory, so save any plots you want to keep somewhere else first.

~~~
make plots # If you processed QCD, QCDPU and NuGunPU
make QCD_plots # If you produced only QCD
~~~

If you get an error saying "ImportError: No module named ROOT", execute the
following commands to adjust the environment variables and try again:

~~~
export LD_LIBRARY_PATH=$ROOTSYS/lib:$PYTHONDIR/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$ROOTSYS/lib:$PYTHONPATH
~~~

If you have reference DQM results (i.e. results from a baseline reconstruction)
in tmp/QCD_ref, tmp/QCDPU_ref, tmp/NuGunPU_ref etc. under test/tmp/, you can
also do the following; this is how actual comparisons between different
reconstruction versions are done:

~~~
make plots_with_ref
~~~
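
A sketch of staging a reference, with illustrative paths (the reference is
simply the DQM output of an earlier validation run):

~~~
mkdir -p tmp/QCD_ref
cp /path/to/previous/validation/tmp/QCD/*.root tmp/QCD_ref/
~~~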

The 'plots' directory can be viewed in a web browser once it is moved to e.g.
/afs/cern.ch/user/f/foo/www/. In this case the URL of the directory is
'http://cern.ch/foo/plots', where 'foo' is the username (this requires that
your personal CERN web page cern.ch/username is enabled).
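
For example, with 'foo' again standing in for your username:

~~~
cp -r plots /afs/cern.ch/user/f/foo/www/
~~~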

# Running via condor

Make sure datasets.py has already been run as described above and that the
input file lists exist under
${CMSSW_BASE}/src/Validation/RecoParticleFlow/test/tmp/das_cache. The following
assumes you are running condor jobs on CERN lxplus, although with some
modifications the setup can be used with the condor installations of other
clusters.

~~~
cd ${CMSSW_BASE}/src/Validation/RecoParticleFlow/test
voms-proxy-init -voms cms
cmsenv
mkdir -p log
condor_submit condor_QCD.jdl
~~~

The output files will appear in /eos/cms/store/group/phys_pf/PFVal/QCD. Make
sure you are subscribed to cms-eos-phys-pf so that you have EOS write access.
There are .jdl files for the other datasets as well.
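
To follow the jobs and inspect the finished output, for example:

~~~
condor_q                                   # monitor your condor jobs
ls /eos/cms/store/group/phys_pf/PFVal/QCD  # check the produced files
~~~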

# Running via crab

The reco step can also be run via CRAB. Prepare the CRAB scripts:

~~~
make conf
make dumpconf
cd test/crab
~~~

Initialize the CRAB environment if not done already:

~~~
source /cvmfs/cms.cern.ch/crab3/crab.sh
voms-proxy-init -voms cms
cmsenv
~~~

Submit the jobs. Note that the datasets to run over are defined in the script
below; modify its "samples" list to change which datasets are processed.

~~~
python3 multicrab.py
~~~
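
Task status can then be checked with the standard CRAB client; the task
directory name below is hypothetical and depends on the multicrab.py
configuration:

~~~
crab status -d crab_projects/crab_QCD  # hypothetical task directory
~~~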

Once the jobs are done, move the step3_inMINIAODSIM ROOT files from your GRID
destination directory to the test/tmp/QCD (etc.) directory and proceed with
QCD_dqm etc. Please note that any file matching 'step3\*MINIAODSIM\*.root' will
be included in the DQM step, so delete files you don't want to study.
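
A sketch of fetching the outputs over xrootd; the redirector, user and task
paths are placeholders to adapt to your own storage area:

~~~
mkdir -p tmp/QCD
xrdcp 'root://eoscms.cern.ch//eos/cms/store/user/<user>/<task>/step3_inMINIAODSIM_1.root' tmp/QCD/
~~~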

Note that the default era, conditions, and samples are now set to Run 3 2022,
as noted above. Change CONDITIONS and ERA in test/run_relval.sh when trying
another era, before trying the above commands. Also check (and if necessary,
update) the input samples and conf.Site.storageSite specified in
$CMSSW_BASE/src/Validation/RecoParticleFlow/crab/multicrab.py (the default
storage site is T2_US_Caltech; change it to a site you have write access to,
and use `crab checkwrite --site=<site>` to check your permission). Take note
that the CMSSW python3 configuration for running the RECO sequence is dumped
into `crab/step3_dump.py`.
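
Before editing multicrab.py, it can be useful to verify that you can write to
the site you intend to use; the site name below is just an example:

~~~
crab checkwrite --site=T2_CH_CERN  # replace with your chosen storage site
~~~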

# Running DQM steps from existing MINIAOD samples

~~~
# For example (Run 3 defaults):
#CONDITIONS=auto:phase1_2018_realistic ERA=Run2_2018 # for 2018 scenarios
CONDITIONS=auto:phase1_2022_realistic ERA=Run3 # for Run 3
#CONDITIONS=auto:phase2_realistic ERA=Phase2C9 # for Phase 2
# Running with 2 threads allows using more memory on the grid
NTHREADS=2 TMPDIR=tmp

cd $CMSSW_BASE/src/Validation/RecoParticleFlow
mkdir -p tmp/QCD; cd tmp/QCD
#(or
# mkdir -p tmp/QCDPU; cd tmp/QCDPU
# mkdir -p tmp/NuGunPU; cd tmp/NuGunPU
#)
~~~

# Make a text file for input files. For example:

~~~
dasgoclient --query="file dataset=/RelValQCD_FlatPt_15_3000HS_14/CMSSW_11_0_0_patch1-110X_mcRun3_2021_realistic_v6-v1/MINIAODSIM" > step3_filelist.txt
#(or
#dasgoclient --query="file dataset=/RelValQCD_Pt15To7000_Flat_14TeV/CMSSW_11_0_0-110X_mcRun4_realistic_v2_2026D49noPU-v1/MINIAODSIM" > step3_filelist.txt
#or use the list of files from your crab output areas.
#)
cat step3_filelist.txt

cmsDriver.py step5 --conditions $CONDITIONS -s DQM:@pfDQM --datatier DQMIO --nThreads $NTHREADS --era $ERA --eventcontent DQM --filein filelist:step3_filelist.txt --fileout file:step5.root -n -1 >& step5.log &
~~~

# After step5 is completed:

~~~
cmsDriver.py step6 --conditions $CONDITIONS -s HARVESTING:@pfDQM --era $ERA --filetype DQM --filein file:step5.root --fileout file:step6.root >& step6.log &
~~~
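
The harvesting step saves its histograms into a DQM_V0001_\*.root file in the
working directory (the exact file name follows the usual DQM naming convention
and depends on the release), which you can then place under test/tmp/QCD (etc.)
for the plotting steps described above. To check:

~~~
ls DQM_V0001_*.root
~~~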