# Examples

## Full JetHT analysis example using semi-automatic CRAB implementation

The analysis can be run using CRAB. The current implementation is semi-automatic: the All-In-One tool provides all the necessary configuration, but you need to submit the jobs manually and make sure that all of them finish successfully.

Before starting, make sure you have a VOMS proxy available:

```
voms-proxy-init --voms cms
```
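
If you are unsure whether a valid proxy already exists, the VOMS client tools can report its remaining lifetime. This check is an optional addition, not part of the original workflow:

```
# Optional sanity check: print how many seconds the current proxy remains valid
voms-proxy-info --timeleft
```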

To begin, create the configuration using the All-In-One tool. It is important to do a dry run here (the -d flag):

```
validateAlignments.py -d jetHtAnalysis_fullExampleConfiguration.json
```

Move to the created directory with the configuration files:

```
cd $CMSSW_BASE/src/Alignment/OfflineValidation/test/examples/example_json_jetHT/JetHT/single/fullExample/prompt
```

Check that you have write access to the default output directory used by CRAB. By default, the shared EOS space of the tracker alignment group at CERN is used.

```
crab checkwrite --site=T2_CH_CERN --lfn=/store/group/alca_trackeralign/`whoami`
```
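
If the check fails because you do not have access to the shared group space, you can test a personal storage area instead. The sketch below assumes you have a user area at the same site; substitute a site and path you can actually write to, and adjust outLFNDirBase in the CRAB configuration to match:

```
# Hypothetical alternative: check write access to a personal /store/user area
crab checkwrite --site=T2_CH_CERN --lfn=/store/user/`whoami`
```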

At this point, you should also check the running time and memory limits. Running over one file can take up to two hours and requires about 1000 MB of RAM, so set the corresponding variables to values slightly above these. If you increase unitsPerJob, scale maxJobRuntimeMin accordingly.

```
vim crabConfiguration.py
...
config.Data.outLFNDirBase = '/store/group/alca_trackeralign/username/' + config.General.requestName
...
config.JobType.maxMemoryMB = 1200
config.JobType.maxJobRuntimeMin = 200
...
config.Data.unitsPerJob = 1
...
config.Site.storageSite = 'T2_CH_CERN'
```

After checking the configuration, submit the jobs.

```
crab submit -c crabConfiguration.py
```
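
You can follow the task with crab status. The project directory name below is a placeholder: CRAB creates the actual directory under the configured work area (crab_projects by default) when you submit, so use the path it prints:

```
# Monitor the submitted task; use the project directory CRAB printed at submission
crab status -d crab_projects/crab_<requestName>
```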

Do the same for the ReReco and UltraLegacy folders:

```
cd $CMSSW_BASE/src/Alignment/OfflineValidation/test/examples/example_json_jetHT/JetHT/single/fullExample/rereco
crab submit -c crabConfiguration.py
...
cd $CMSSW_BASE/src/Alignment/OfflineValidation/test/examples/example_json_jetHT/JetHT/single/fullExample/ultralegacy
crab submit -c crabConfiguration.py
```

Now wait for the CRAB jobs to finish. For these example runs, this should take around two hours times the value you set for the unitsPerJob variable. After the jobs have finished, you will need to merge the output files and transfer the merged files to the correct output folders. One way to do this is as follows:

```
cd $CMSSW_BASE/src/JetHtExample/example_json_jetHT/JetHT/merge/fullExample/prompt
hadd -ff JetHTAnalysis_merged.root `xrdfs root://eoscms.cern.ch ls -u /store/group/alca_trackeralign/username/path/to/prompt/files | grep '\.root'`
cd $CMSSW_BASE/src/JetHtExample/example_json_jetHT/JetHT/merge/fullExample/rereco
hadd -ff JetHTAnalysis_merged.root `xrdfs root://eoscms.cern.ch ls -u /store/group/alca_trackeralign/username/path/to/rereco/files | grep '\.root'`
cd $CMSSW_BASE/src/JetHtExample/example_json_jetHT/JetHT/merge/fullExample/ultralegacy
hadd -ff JetHTAnalysis_merged.root `xrdfs root://eoscms.cern.ch ls -u /store/group/alca_trackeralign/username/path/to/ultralegacy/files | grep '\.root'`
```
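
Since the three merge commands differ only in the reconstruction name, they can also be written as a single loop. This is just a compact rewrite of the commands above; the EOS paths are still placeholders that you need to replace with your actual CRAB output locations:

```
# Merge the CRAB outputs for all three reconstructions in one loop
for era in prompt rereco ultralegacy; do
  cd $CMSSW_BASE/src/JetHtExample/example_json_jetHT/JetHT/merge/fullExample/$era
  hadd -ff JetHTAnalysis_merged.root `xrdfs root://eoscms.cern.ch ls -u /store/group/alca_trackeralign/username/path/to/$era/files | grep '\.root'`
done
```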

For 100 files, the merging should finish in one to two minutes. Now that all the files are merged, the only thing that remains is to plot the results. To do this, navigate to the folder where the plotting configuration is located and run it:

```
cd $CMSSW_BASE/src/Alignment/OfflineValidation/test/examples/example_json_jetHT/JetHT/plot/fullExample
jetHtPlotter validation.json
```

The final validation plots appear in the output folder. If you want to change the style of the plots or the histograms that are drawn, you can edit the validation.json file here and rerun the plotter; there is no need to redo the time-consuming analysis part.

## Full example using condor

Running with CRAB is recommended for large datasets, but smaller tests can also readily be done with condor. There are two different modes for running with condor, and the mode is selected automatically based on the input file list. If the same file list as for CRAB is given, the file-based job splitting method is used. However, file-based splitting has a pitfall when combined with the maxevents parameter: if the value is not chosen carefully, some of the files might be skipped altogether, which can leave certain runs unanalyzed. Run-number-based job splitting is therefore recommended, since it ensures that each run gets the statistics defined by the maxevents parameter. To use run-number-based splitting, the run numbers found in the input files need to be included in the file list.

There is an automatic procedure to do this. You can generate a file list with run numbers from a regular file list using the tool makeListRunsInFiles.py. To do this for the file list used in the CRAB example, run the command:

```
makeListRunsInFiles.py --input=jetHtFilesForRun2018A_first100files.txt --output=jetHtFilesForRun2018A_first100files_withRuns.txt
```
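
A quick way to confirm that the new list was written, and to see how the run information is attached, is to inspect the first few lines. This check is an optional addition, not part of the original instructions:

```
# Count the entries and peek at the format of the generated list
wc -l jetHtFilesForRun2018A_first100files_withRuns.txt
head -3 jetHtFilesForRun2018A_first100files_withRuns.txt
```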

Since the run information needs to be read from the DAS database, this command takes a while to execute. For this example with 100 files, the script should finish within two minutes; if it takes significantly longer than that, there might be some network issues. After generating the file list with the run information included, change the dataset name in the JSON configuration file:

```
vim jetHtAnalysis_fullExampleConfiguration.json
...
                    "dataset": "$CMSSW_BASE/src/Alignment/OfflineValidation/test/examples/jetHtFilesForRun2018A_first100files_withRuns.txt",
```
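
If you prefer not to edit the file by hand, a search-and-replace does the same job. The sketch below assumes the dataset entry in the configuration still points to the original jetHtFilesForRun2018A_first100files.txt list:

```
# Point the dataset at the list that includes run numbers (edits the file in place)
sed -i 's/first100files\.txt/first100files_withRuns.txt/' jetHtAnalysis_fullExampleConfiguration.json
```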

Now that the run numbers are included with the files, the validateAlignments.py script will automatically set up run-number-based splitting. The configuration has been set up to run over 1000 events from each run. Note that if you use the original setup, the validation will still work, but 1000 events from each file will be analyzed instead. You can run everything with the command:

```
validateAlignments.py -j espresso jetHtAnalysis_fullExampleConfiguration.json
```
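
The progress of the submitted jobs can be followed with the standard HTCondor tools, for example:

```
# List your condor jobs that are still idle or running
condor_q
```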

Then just wait for your jobs to be submitted; soon afterwards the plots will appear in the folder

```
cd $CMSSW_BASE/src/Alignment/OfflineValidation/test/examples/example_json_jetHT/JetHT/plot/fullExample/output
```

## jetHt_multiYearTrendPlot.json

This configuration shows how to draw multi-year trend plots using previously merged jetHT validation files. It uses the jetHT plotting macro standalone. You can run this example using:

```
jetHtPlotter jetHt_multiYearTrendPlot.json
```

For example purposes, this configuration redefines many variables to their default values. It shows the available configuration options, even though they can be omitted if you do not want to change the default values.

## jetHt_ptHatWeightForMCPlot.json

This configuration shows how to apply a ptHat weight to MC files produced with different ptHat cuts. What you need to do is collect the file names and the lower boundaries of the ptHat bins into a file, which in this case is ptHatFiles_MC2018_PFJet320.txt. For a file list like this, the ptHat weight is applied automatically by the code. The weights are correct for Run 2. The plotting can be done using the jetHT plotter standalone:

```
jetHtPlotter jetHt_ptHatWeightForMCPlot.json
```
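
The exact layout of the ptHat file list is defined by the example file ptHatFiles_MC2018_PFJet320.txt shipped with these examples; the fragment below is only an illustration of the idea, with hypothetical file names, pairing each input file with the lower ptHat boundary of its bin:

```
# Hypothetical sketch of a ptHat file list: one input file per line,
# followed by the lower ptHat bin boundary for that sample
/store/user/username/QCD_ptHat170.root 170
/store/user/username/QCD_ptHat300.root 300
```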