# mkFit: a repository for vectorized, parallelized charged particle track reconstruction

**Intro**: Below is a short README on setup steps, code change procedures, and some helpful pointers. Please read this thoroughly before checking out the code! As this is a markdown file, it is best viewed via a web browser.

### Outline
1) Test platforms
2) How to checkout the code
3) How to run the code
4) How to make changes to the main development branch
5) The benchmark and validation suite
   1) Running the main script
   2) Some (must read) advice on benchmarking
   3) (Optional) Using additional scripts to display plots on the web
   4) Interpreting the results
      1) Benchmark results
      2) Validation results
      3) Other plots
6) Submit an issue
7) Condensed description of code
8) Other helpful README's in the repository
9) CMSSW integration
   1) Considerations for `mkFit` code
   2) Building and setting up `mkFit` for CMSSW
      1) Build `mkFit`
         1) Lxplus
         2) Phi3
      2) Set up `mkFit` as an external
      3) Pull CMSSW code and build
   3) Recipes for the impatient on phi3
      1) Offline tracking
      2) HLT tracking (iter0)
   4) More thorough running instructions
      1) Offline tracking
         1) Customize functions
         2) Timing measurements
         3) Producing MultiTrackValidator plots
      2) HLT tracking (iter0)
   5) Interpretation of results
      1) MultiTrackValidator plots
      2) Timing
10) Other useful information
   1) Important Links
   2) Tips and Tricks
      1) Missing Libraries and Debugging
      2) SSH passwordless login for benchmarking and web scripts
   3) Acronyms/Abbreviations

## Section 1: Test platforms

- **phi1.t2.ucsd.edu**: [Intel Xeon Processor E5-2620](https://ark.intel.com/products/64594/Intel-Xeon-Processor-E5-2620-15M-Cache-2_00-GHz-7_20-GTs-Intel-QPI) _Sandy Bridge_ (referred to as SNB, phiphi, phi1)
- **phi2.t2.ucsd.edu**: [Intel Xeon Phi Processor 7210](https://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core) _Knights Landing_ (referred to as KNL, phi2)
- **phi3.t2.ucsd.edu**: [Intel Xeon Gold 6130 Processor](https://ark.intel.com/products/120492/Intel-Xeon-Gold-6130-Processor-22M-Cache-2_10-GHz) _Skylake Scalable Performance_ (referred to as SKL-Au, SKL-SP, phi3)
- **lnx4108.classe.cornell.edu**: [Intel Xeon Silver 4116 Processor](https://ark.intel.com/products/120481/Intel-Xeon-Silver-4116-Processor-16_5M-Cache-2_10-GHz) _Skylake Scalable Performance_ (referred to as SKL-Ag, SKL-SP, lnx4108, LNX-S)
- **lnx7188.classe.cornell.edu**: [Intel Xeon Gold 6142 Processor](https://ark.intel.com/content/www/us/en/ark/products/120487/intel-xeon-gold-6142-processor-22m-cache-2-60-ghz.html) _Skylake Scalable Performance_ (referred to as lnx7188, LNX-G)

phi1, phi2, and phi3 are all managed across a virtual login server, and therefore the home user spaces are shared. phi1, phi2, phi3, lnx7188, and lnx4108 also have /cvmfs mounted, so you can source the environment needed to run the code.

The main development platform is phi3. This is the recommended machine for beginning development and testing. Logging into any of the machines is achieved through ```ssh -X -Y <phi username>@phi<N>.t2.ucsd.edu```. It is recommended that you set up SSH key forwarding on your local machine so as to avoid typing in your password with every login and, more importantly, to avoid typing your password during the benchmarking (see Section 10.ii.b).

**Extra platform configuration information**
- phi1, phi3, and lnx4108 are dual-socket machines with two identical Xeons on each board
- phi1, phi2, and phi3 all have TurboBoost disabled to disentangle the effects of dynamic frequency scaling from those of higher vectorization

For further info on the configuration of each machine, use your favorite text file viewer to peruse the files ```/proc/cpuinfo``` and ```/proc/meminfo``` on each machine.

## Section 2: How to checkout the code

The master development branch is ```devel```, hosted on a [public GH repo](https://github.com/trackreco/mkFit) (referred to as ```trackreco/devel``` for the remainder of the README). This is a public repository, as are all forks of this repository. Development for mkFit is done on separate branches within a forked repository. Make sure to fork the repository to your own account first (using the "Fork" option at the top of the webpage), and push any development branches to your own forked repo first.

Once forked, checkout a local copy by simply doing a git clone:

```
git clone git@github.com:<user>/mkFit
```

where ```<user>``` is your GH username if you renamed your remote to your username. Otherwise ```<user>``` will be ```origin```.

If you wish to add another user's repo to your local clone, do:

```
git remote add <user> git@github.com:<user>/mkFit
```

This is useful if you want to submit changes to another user's branches. To checkout a remote branch, do:

```
git fetch <user>
git fetch <user> <branch>
git checkout -b <branch> <user>/<branch>
```

## Section 3: How to run the code

As already mentioned, the recommended test platform to run the code is phi3. Checkout a local repo on phi3 from your forked repo. To run the code out-of-the-box from the main ```devel``` branch, you will first need to source the environment:

```
source xeon_scripts/init-env.sh
```

You are free to put the lines from this script in your login scripts (.bashrc, .bash_profile, etc.). However, encapsulate them within a function and then call that function upon logging into phi3. We want clean shells before launching any tests. Therefore, if you have any setup that sources something, disable it and do a fresh login before running any tests!
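
As a minimal sketch of the encapsulation suggested above (the function name and the clone location are placeholders, not part of the repository):

```
# hypothetical ~/.bashrc snippet: nothing is sourced automatically at login,
# the environment is only set up when the function is called explicitly
mkfit-env() {
    source ~/mkFit/xeon_scripts/init-env.sh
}
```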

Now compile the code:

```
make -j 32 AVX2:=1
```

To run the code with some generic options, do:

```
./mkFit/mkFit --cmssw-n2seeds --input-file /data2/slava77/samples/2017/pass-c93773a/initialStep/PU70HS/10224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFullINPUT+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU/memoryFile.fv3.clean.writeAll.CCC1620.recT.082418-25daeda.bin --build-ce --num-thr 64 --num-events 20
```

Consult Sections 7 and 8 for more information on the code and for resources that list the full set of options for running it.

There are ways to run this code locally on macOS. Instructions for how to do this will be provided later. You will need to have XCode installed (through the App Store), the XCode command line tools, a ROOT6 binary (downloaded from the ROOT webpage), as well as TBB (through homebrew).
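
Until full macOS instructions are written, a rough sketch of installing the prerequisites mentioned above (assuming Homebrew is already installed; the ROOT6 binary is still downloaded separately from the ROOT webpage):

```
xcode-select --install   # XCode command line tools
brew install tbb         # TBB via Homebrew
```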

## Section 4: How to make changes to the main development branch

Below are some rules and procedures on how to submit changes to the main development branch. Although not strictly enforced through settings on the main repo, please follow the rules below. This ensures we have a full history of the project, as we can trace any changes to compute or physics performance that are introduced (whether intentional or unintentional).

**Special note**: Do not commit directly to ```trackreco/devel```! This has caused issues in the past that made it difficult to track down changes in compute and physics performance. Please always submit a Pull Request first, ensuring it is reviewed and given the green light before hitting "Merge pull request".

1. Checkout a new branch on your local repo: ```git checkout -b <branch>```
2. Make some changes on your local repo, and commit them to your branch: ```git commit -m "some meaningful text describing the changes"```
3. If you have made multiple commits, see if you can squash them together to make the git history legible for review (see the sketch after this list). If you do not know what you are doing with this, make sure to save a copy of the local branch as backup by simply checking out a new branch from the branch you are on with something like: ```git checkout -b <branch_copy>```. Git provides a [tutorial on squashing commits](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
4. Ensure the latest changes from the main development branch are merged into your local development branch. ```git merge trackreco/devel``` can make a mess, so the preferred option is ```git rebase --onto <new_base_hash> <old_base_hash> <branch>```. CMSSW provides a nice explanation of [this rebase option](https://cms-sw.github.io/tutorial-resolve-conflicts.html).
5. Test locally!
   1. If you have not done so, clone your forked repo onto phi3, checking out your new branch.
   2. Source the environment for phi3 as explained in Section 3.
   3. Compile test: ```make -j 32 AVX2:=1```. Fix compilation errors if they are your fault or email the group / person responsible to fix their errors!
   4. Run benchmark test: ```./mkFit/mkFit --cmssw-n2seeds --input-file /data2/slava77/samples/2017/pass-4874f28/initialStep/PU70HS/10224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFullINPUT+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU/a/memoryFile.fv3.clean.writeAll.recT.072617.bin --build-ce --num-thr 64 --num-events 20```. Ensure the test did not crash, and fix any segfaults / run-time errors!
   5. Compile with ROOT test: ```make -j 32 AVX2:=1 WITH_ROOT:=1```. Before compiling, make sure to do a ```make distclean```, as we do not want conflicting object definitions. Fix errors if compilation fails.
   6. Run validation test: ```./mkFit/mkFit --cmssw-n2seeds --input-file /data2/slava77/samples/2017/pass-4874f28/initialStep/PU70HS/10224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFullINPUT+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU/a/memoryFile.fv3.clean.writeAll.recT.072617.bin --build-ce --num-thr 64 --num-events 20 --backward-fit-pca --cmssw-val-fhit-bprm```. Ensure the test did not crash!
6. Run the full benchmarking + validation suite on all platforms: follow the procedure in Section 5 (below)! If you notice changes to compute or physics performance, make sure to understand why! Even if you are proposing a technical two-line change, please follow this step as it ensures we have a full history of changes.
7. Prepare a Pull Request (PR)
   1. Push your branch to your forked repo on GitHub: ```git push <forked_repo_name> <branch>```
   2. [Navigate to the main GH](https://github.com/trackreco/mkFit)
   3. Click on "New Pull Request"
   4. Click on "Compare across forks", and navigate to your fork + branch you wish to merge as the "head fork + compare"
   5. Provide a decent title and give a brief description of the proposed commits. Include a link to the benchmarking and validation plots in the description. If there are changes to the compute or physics performance, provide an explanation for why! If no changes are expected and none are seen, make sure to mention it.
   6. (Optional) Nominate reviewers to check over the proposed changes.
   7. Follow up on review comments! After pushing new commits to your branch, repeat big steps 5 and 6 (i.e. test locally and re-run the validation). Post a comment to the PR with the new plots.
   8. Once given the green light, you can hit "Merge Pull Request", or ask someone else to do it.
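
As a sketch of the squashing mentioned in step 3 (an interactive rebase over the last N commits; the count of 3 below is just an example):

```
git checkout -b <branch_copy>   # optional backup copy of the branch, just in case
git checkout <branch>
git rebase -i HEAD~3            # in the editor, mark all but the first commit as "squash" or "fixup"
```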

## Section 5: The benchmark and validation suite

**Notes on nomenclature**
- "benchmark": these are the compute performance tests (i.e. time and speedup)
- "validation": these are the physics performance tests (i.e. track-finding efficiency, fake rate, etc.)

We often use these words interchangeably to refer to the set of benchmark and validation tests as a single suite. So if you are asked to "run the benchmarking" or "run the validation": please run the full suite (unless specifically stated to run one or the other). In fact, the main scripts that run the full suite use "benchmark" in their name, even though they may refer to both the running of the compute and physics performance tests and plot comparisons.

**Notes on samples**

Currently, the full benchmark and validation suite uses simulated event data from CMSSW for ttbar events with an average of 70 pileup collisions per event. The binary file has over 5000 events, to be used for high-statistics testing of time performance. There are also samples with a lower number of events for plain ttbar with no pileup and ttbar + 30 pileup, used to measure the effects on physics performance when adding more complexity. Lastly, there is also a sample of muon-gun events: 10 muons per event with no pileup. The muon-gun sample is used to show physics performance in a very clean detector environment. All of these samples are replicated on disk on all three platforms to make time measurements as repeatable and representative as possible.

### Section 5.i: Running the main script

The main script for running the full suite can be launched from the top-level directory with:

```
./xeon_scripts/runBenchmark.sh ${suite} ${useARCH} ${lnxuser}
```

There are three options for running the full suite, chosen by passing one of the three strings to the parameter ```${suite}```:
- ```full``` : runs compute and physics tests for all track-finding routines (BH, STD, CE, FV)
- ```forPR``` : runs compute and physics tests for track-finding routines used for comparisons in pull requests (default setting: BH and CE for benchmarks, STD and CE for validation)
- ```forConf``` : runs compute and physics tests for track-finding routines used for conferences only (currently only CE)

The ```full``` option currently takes a little more than half an hour, while the other tests take about 25 minutes.

Additionally, the ```${useARCH}``` option allows the benchmarks to be run on different computer clusters:
- ```${useARCH} = 0```: (default) runs on phi3 computers only. This option should be run from phi3.
- ```${useARCH} = 1```: runs on lnx7188 and lnx4108 only. This option should be run from lnx7188.
- ```${useARCH} = 2```: runs on both phi3 and lnx. This option should be run from phi3.
- ```${useARCH} = 3```: runs on all phi computers (phi1, phi2, and phi3). This option should be run from phi3.
- ```${useARCH} = 4```: runs on all phi computers (phi1, phi2, and phi3) as well as lnx7188 and lnx4108. This option should be run from phi3.

- ```${lnxuser}``` denotes the username on the lnx computers. This is only needed if running on the lnx computers when the lnx username is different from the phi3 username. An example invocation is given below.
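
For example, a typical pull-request run launched from phi3 might look like one of the following (the lnx username shown is a placeholder):

```
./xeon_scripts/runBenchmark.sh forPR                   # default: phi3 only
./xeon_scripts/runBenchmark.sh forPR 4 <lnxusername>   # all phi machines plus the lnx machines
```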

Inside the main script, tests are submitted for phi1, phi2, and phi3 concurrently by: tarring up the local repo, sending the tarball to a disk space on the remote platform, compiling the untarred directory natively on the remote platform, and then sending back the log files to be analyzed on phi3. It should be noted that the tests for phi3 are simply run in the user's home directory when logged into phi3 (although we could in principle ship the code to the work space disk on phi3). Because we run the tests for phi3 in the home directory, which is shared by all three machines, we pack and send the code to a remote _disk_ space _before_ launching the tests on phi3 from the home directory. The scripts that handle the remote testing are:

```
./xeon_scripts/tarAndSendToRemote.sh ${remote_arch} ${suite}
./xeon_scripts/benchmark-cmssw-ttbar-fulldet-build-remote.sh ${ben_arch} ${suite}
```

When these scripts are called separately to run a test on a particular platform, one of three options must be specified for ```${remote_arch}``` or ```${ben_arch}```: ```SNB```, ```KNL```, or ```SKL-SP```. The main script ```xeon_scripts/runBenchmark.sh``` will do this automatically for all three platforms. If the code is already resident on a given machine, it is sufficient to run:

```
./xeon_scripts/benchmark-cmssw-ttbar-fulldet-build.sh ${ben_arch} ${suite}
```

The appropriate strings should appear in place of ```${ben_arch}``` and ```${suite}```. In fact, this is the script called by ```xeon_scripts/runBenchmark.sh``` to launch tests on each of the platforms once the code is sent and unpacked.
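
For example, to run just the pull-request set of tests directly on phi3 (where the code is already checked out), one could do something like:

```
./xeon_scripts/benchmark-cmssw-ttbar-fulldet-build.sh SKL-SP forPR
```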

Within the main ```xeon_scripts/runBenchmark.sh``` script, there are two other scripts that make performance plots from the log files of the compute performance tests:

```
./plotting/benchmarkPlots.sh ${suite}
./plotting/textDumpPlots.sh ${suite}
```

The first will produce the time and speedup plots, while the second produces distributions of basic kinematic quantities of the candidate track collections, comparing the results across the different platforms and different numbers of vector units and threads.

The main physics performance script that is run is:

```
./val_scripts/validation-cmssw-benchmarks.sh ${suite}
```

The physics validation script also supports an option to produce results compatible with the standard tracking validation in CMSSW, the MultiTrackValidator (MTV). This can be run as:
```
./val_scripts/validation-cmssw-benchmarks.sh ${suite} --mtv-like-val
```

This script will run the validation on the building tests specified by the ```${suite}``` option. It will also produce the full set of physics performance plots and text files detailing the various physics rates.

It should be mentioned that each of these scripts within ```./xeon_scripts/runBenchmark.sh``` can be launched on its own, as each of them sets the environment and runs its tests and/or plot making. However, for simplicity's sake, it is easiest when prepping for a PR to just run the master ```./xeon_scripts/runBenchmark.sh```. If you want to test locally, it is of course possible to launch the scripts one at a time.

### Section 5.ii: Some (must read) advice on benchmarking

1. Since the repo tarball and log files are sent back and forth via ```scp``` in various subscripts, it is highly recommended that you have SSH-forwarding set up to avoid having to type your password every time ```scp``` is called. This is particularly important because the time at which the log files are returned is largely indeterminate: they are sent back whenever the scripts finish running on the remote platform. If the main script is launched with ```nohup``` (see the example launch after this list) without passwordless login set up, the ```scp``` password prompt will never appear, and the log files will then be lost, as the final step in remote testing removes the copy of the repo on the remote platform at the end of ```xeon_scripts/benchmark-cmssw-ttbar-fulldet-build-remote.sh```. See Section 10.ii.b for more information on how to set up SSH-forwarding and passwordless login.

2. Before launching any tests, make sure the machines are quiet: we don't want to disturb someone who is already testing! Tests from different users at the same time will also skew the results of your own tests, as the scripts make use of the full resources available on each platform at various points.

3. Please run the full suite from phi3 with a clean login: make sure nothing has been sourced to set up the environment. The main script (as well as the called subscripts) will set the environment and some common variables shared between all subscripts by sourcing two scripts:

   ```
   source xeon_scripts/init-env.sh
   source xeon_scripts/common-variables.sh ${suite}
   ```

4. Check the logs! A log with standard out and error is generated for each test launched. If a plot is empty, check the log corresponding to the test point that failed, as this will be the first place to say where and how the test died (hopefully with a somewhat useful stack trace). If you are sure you are not responsible for the crash, email the group listserv to see if anyone else has experienced the issue (attaching the log file(s) for reference). If it cannot be resolved via email, it will be promoted to a GH Issue.
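
A typical detached launch from a clean login on phi3 would then look something like the following (the output file name is arbitrary):

```
nohup ./xeon_scripts/runBenchmark.sh forPR >& benchmark_forPR.log &
```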

### Section 5.iii: (Optional) Using additional scripts to display plots on the web

After running the full suite, there is an additional set of scripts within the ```web/``` directory for organizing the output plots and text files and viewing them on the web. Make sure to read ```web/README_WEBPLOTS.md``` first to set up an /afs or /eos web directory on LXPLUS. If you have your own website where you would rather post the results, just use ```web/collectBenchmarks.sh``` to tidy up the plots into neat directories before sending them somewhere else. More info on this script is below.

The main script for collecting plots and sending them to LXPLUS can be called by:

```
./web/move-benchmarks.sh ${outdir_name} ${suite} ${afs_or_eos}
```

where again, ```${suite}``` defaults to ```forPR```. ```${outdir_name}``` will be the top-level directory where the output is collected and eventually shipped to LXPLUS. This script first calls ```./web/collectBenchmarks.sh ${outdir_name} ${suite}```, which sorts the files; then calls ```./web/copyphp.sh```, which copies ```web/index.php``` into ```${outdir_name}``` to have a nice GUI on the web; and finally calls ```./web/tarAndSendToLXPLUS.sh ${outdir_name} ${suite} ${afs_or_eos}```, which packs up the top-level output dir and copies it to either an /afs or /eos userspace on LXPLUS.

The option ```${afs_or_eos}``` takes either of the following arguments: ```afs``` or ```eos```, and defaults to ```eos```. The mapping of the username to the remote directories is in ```web/tarAndSendToLXPLUS.sh```. If an incorrect string is passed, the script will exit.
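
For example, a call that collects the pull-request plots and ships them to an EOS web area might look like the following (the output directory name is just an example):

```
./web/move-benchmarks.sh my_pr_plots forPR eos
```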

**IMPORTANT NOTES**
1) AFS is being phased out at CERN, so the preferred option is ```eos```.

2) There are some assumptions on the remote directory structure, naming, and files present in order for ```web/tarAndSendToLXPLUS.sh``` to work. Please consult ```web/README_WEBPLOTS.md``` for setting this up properly!

**IMPORTANT DISCLAIMERS**

1. There is a script, ```./xeon_scripts/trashSKL-SP.sh```, run at the very end of the ```./web/move-benchmarks.sh``` script, that will delete log files, pngs, validation directories, root files, and the neat directory created to house all the validation plots. This means that if the scp fails, the plots will still be deleted locally, and you will be forced to re-run the whole suite!! You can of course comment this script out if this bothers you.

2. ```web/tarAndSendToLXPLUS.sh``` executes a script remotely on LXPLUS when using AFS, which makes the directory readable to the outside world. If you are uncomfortable with this, you can comment it out. If your website is on EOS, then please ignore this disclaimer.

### Section 5.iv: Interpreting the results

This section provides a brief overview of how to interpret the plots and logs from the tests that produced them. This section assumes the plots were organized with the ```web/collectBenchmarks.sh``` script.

#### Section 5.iv.a: Benchmark results

The "main" benchmark plots are organized into two folders:
- Benchmarks: will contain plots of the form ```${ben_arch}_CMSSW_TTbar_PU70_${ben_test}_${ben_trend}```
- MultEvInFlight: will contain plots of the form ```${ben_arch}_CMSSW_TTbar_PU70_MEIF_${ben_trend}```

where the variables in the plot names are:
- ```${ben_arch}```: SNB (phi1 results), KNL (phi2 results), or SKL (phi3 results)
- ```${ben_test}```: VU (vector units) or TH (threads)
- ```${ben_trend}```: time or speedup, i.e. the y-axis points

The plots in "Benchmarks" measure the time of the building sections only. These tests run over 20 events total, taking the average to measure the per-event time for each building section. We discard the first event's time when computing the timing. The logs used for extracting the time information into plots are of the form ```log_${ben_arch}_CMSSW_TTbar_PU70_${build}_NVU${nVU}_NTH${nTH}.txt```, where ```${build}``` is the building routine tested.

The plots in "MultEvInFlight" measure the performance of the full event loop, whose time includes I/O, seed cleaning, etc. These tests run over 20 events times the number of events in flight. The time plotted is the total time for all events divided by the number of events.

The points in the speedup plots are simply produced by dividing the first point by each point in the trend. The ideal scaling line assumes that with an N-fold increase in resources, the speedup is then N, i.e. the code is fully vectorized and parallelized with no stalls from memory bandwidth, latency, cache misses, etc. Ideal scaling also assumes no penalty from [dynamic frequency scaling](https://en.wikichip.org/wiki/intel/frequency_behavior). Intel lowers the base and turbo frequency as a function of the number of occupied cores, which can make speedup plots look much worse than they really are. In addition, different instruction sets have different base and turbo frequency settings. Namely, SSE has the highest settings, AVX2 is at the midpoint, while AVX512 has the lowest.

The "VU" tests measure the performance of the building sections as a function of the vector width. In hardware, of course, vector width is a fixed property equal to the maximum number of floats that can be processed by a VPU. Call this number N_max. One can force the hardware to underutilize its VPUs by compiling the code with an older instruction set, e.g., SSE instead of AVX; however, this would have effects beyond just shrinking the vectors. Therefore, for our "VU" tests, we mimic the effect of reducing the vector width by setting the width of Matriplex types to various nVU values up to and including N_max. At nVU=1, the code is effectively serial: the compiler might choose not to vectorize Matriplex operations at all. At the maximum size, e.g. nVU=16 on SKL, Matriplex operations are fully vectorized and the VPU can be fully loaded with 16 floats to process these operations. For intermediate values of nVU, full-vector instructions probably will be used, but they may be masked so that the VPU is in reality only partially utilized.

The vectorization tests only use a single thread. There is an additional point at VU=N_max (SNB: 8; KNL and SKL: 16) with an open dot: this is a measure of the vectorization using intrinsics.

The "TH" tests measure the performance of the building sections as a function of the number of threads launched. These tests have vectorization fully enabled with intrinsics. It should be noted that we do not account for frequency scaling in the speedup plots.

The building stage has sections of code that are inherently serial (hit chi2 comparisons, copying tracks, etc.), so the vectorization and parallelization are not perfect. It is therefore important to consider the effect of [Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law). Amdahl's law can be rewritten as:
```
    1 - 1/S
p = -------
    1 - 1/R
```

where ```p``` is the fraction of the code that is vectorized/parallelized, ```S``` is the measured speedup, and ```R``` is the amount of speedup from increased resources. For example, we have seen that SKL clocks in at about a factor of three in speedup (S=3) for vectorization when fully vectorized (i.e. nVU=R=16), which suggests the code is about 70% vectorized. Of course, this assumes no issues with memory bandwidth, cache misses, etc.
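
As a quick sketch of the arithmetic behind that example:

```
p = (1 - 1/3) / (1 - 1/16) = (2/3) / (15/16) = 32/45 ≈ 0.71
```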

We have seen that moving from nVU=1 to nVU=2 the improvement is minimal (and sometimes there is a loss in performance). One hypothetical (as yet unconfirmed) reason for this is that the compiler is using an instruction set other than expected: either finding a way to use vector instructions with nVU=1, or choosing not to vectorize at nVU=2. Furthermore, at run time, the CPU will adjust its frequency depending on the instruction set being used (it runs slower for wider vectors). At present, the exact reasons for the detailed shape of the speedup-vs.-nVU curve are unknown.

Lastly, it is important to consider the effects of hyperthreading in the "TH" tests. At nTH = number of cores, we typically see a clear break in the slope. The main hypothesis is that this is likely due to resource contention, as two threads now share the same cache.

#### Section 5.iv.b: Validation results

The physics validation results are organized into two directories:
- SimVal: sim tracks are used as the reference set of tracks
- CMSSWVal: CMSSW tracks are used as the reference set of tracks

Three different matching criteria are used for making associations between reconstructed tracks and reference tracks. Many of the details are enumerated in the validation manifesto; however, for simplicity, the main points are listed here.

- CMSSW tracks = "initial step" CMSSW tracks, after fitting in CMSSW (i.e. includes outlier rejection)
- Reference tracks must satisfy:
  - "Findable": require at least 12 layers (includes 4 seed layers, so only 8 outside of the seed are required)
  - Sim tracks are required to have four hits that match a seed
- To-be-validated reconstructed tracks must satisfy:
  - "Good" tracks: require at least 10 layers
    - If a mkFit track, the 4 hits from the seed are included in this 10. So, 6 additional hits must be found during building for it to be considered a "good" track.
    - If a CMSSW track, up to 4 seed hits are included, as a seed hit may be missing due to outlier rejection. So, a CMSSW track may have to find more than 6 layers during building to be considered a "good" track, as some hits from the seed may have been removed.
- Matching criteria:
  - SimVal: a reco track is matched to a sim track if >= 50% of hits on the reco track match hits from a single sim track, excluding hits from the seed
  - CMSSWVal + Build Tracks: a reco track is matched to a CMSSW track if >= 50% of hits on the reco track match hits from a single CMSSW track, excluding hits from the seed. Given that CMSSW can produce duplicates (although at a very low rate), if a reco track matches more than one CMSSW track, the CMSSW track with the highest match percentage is chosen.
  - CMSSWVal + Fit Tracks: a reco track is matched to a CMSSW track via a set of binned helix chi2 (track eta and track pT) and delta phi cuts
  - Fake = a reco track NOT matching a ref. track, excluding matches to non-findable tracks
- Figures of merit:
  - Efficiency = fraction of findable ref. tracks matched to a reco track
  - Duplicate rate = fraction of matched ref. tracks with more than one match to a reco track
  - Fake rate = fraction of "good" reco tracks without a match to a ref. track

In case the MTV-like validation is selected with the option ```--mtv-like-val```, the above requirements are replaced with the following:
- Reference tracks:
  - Sim tracks are required to come from the hard-scatter interaction, originate from R<3.5 cm and |z|<30 cm, and have pseudorapidity |eta|<2.5 (no requirement to have four hits that match a seed)
- All reconstructed tracks are considered "to-be-validated"
- Matching criteria:
  - A reco track is matched to a sim track if > 75% of hits on the reco track match hits from a single sim track (including hits from the seed)

There are text files within these directories that contain the average numbers for each of the figures of merit; their names start with "totals\_\*.txt". In addition, these directories contain nHit plots, as well as kinematic difference plots for matched tracks. "Best matched" plots show the differences using the matched reco track with the best track score when more than one reco track matches a ref. track.

#### Section 5.iv.c: Other plots

The last set of plots to consider are those that produce some kinematic distributions from the text file logs, in the directory "PlotsFromDump". For each building routine run during the benchmarking, the distributions compare the differences across platforms and vector + thread setups. Ideally, all points should lie on top of each other: there should be no dependency on platform or parallelization/vectorization setting for a specific track-finding routine. The text files that produce these plots have nearly the same form as those for benchmarking, except they also have "DumpForPlots" at the very end.

The "Diffs" subdirectory in "PlotsFromDump" contains kinematic difference plots between mkFit and CMSSW. The matching is simple: we compare mkFit tracks to CMSSW tracks that share the exact same CMSSW seed (since we clean out some seeds, and CMSSW likewise does not produce a track for every seed). The printouts that produce the dump have info to compare to sim tracks using the standard 50% hit matching as done in the SimVal. However, we do not produce these plots, as they would be redundant with the difference plots already in the validation directories.

## Section 6: Submit an issue

It may so happen that you discover a bug or that there is a known problem that needs further discussion outside of private emails/the main list-serv. If so, make sure to open an issue on the main repo by clicking on "Issues" on GH, then "Open an issue". Provide a descriptive title and a description of the issue. Provide reference numbers to relevant PRs and other Issues with "#<number>". Include a minimal working example to reproduce the problem, attaching log files of error messages and/or plots demonstrating the problem.

Assign who you think is responsible for the code (which could be yourself!). If you have an idea that could solve the problem: propose it! If it requires a large change to the code, or may hamper performance in either physics or computing, make sure to detail the pros and cons of different approaches.

Close an issue after it has been resolved, providing a meaningful message + reference to where/how it was resolved.

## Section 7: Condensed description of code

### mkFit/mkFit.cc

This file is where the ```main()``` function is defined for running the executable ```./mkFit/mkFit```. The ```main()``` call simply sets up the command line options (and lists them), while the meat of the code is called via ```test_standard()```. Some of the command line options set global variables within mkFit.cc, while others set the value of variables in the ```Config``` namespace. Options that take strings are mapped to enums in the code, with the mapping specified via global functions at the top of mkFit.cc.

```test_standard()``` does the majority of the work: running the toy simulation, reading or writing binary files, and running the various tests. The outer loop is a TBB parallel-for over the number of threads used for running multiple-events-in-flight (MEIF). The default is one event in flight. The inner loop is over the number of events specified for that thread. The total number of events to run over can be specified as a command line option. When running multiple-events-in-flight, in order to have reasonable statistics given the variable load from different events, it is advised to have at least 20 events per thread. When we refer to the "total loop time" of the code, we are timing the inner loop section for each event, which includes I/O. However, for the sake of the plots, we simply sum the time for all events and all threads, and divide by the number of events run to obtain an average per-event time.

Within the inner loop, a file is read in, then the various building and fitting tests are run. At the end of each event there is optional printout, as well as at the end of all the events for a thread. If the validation is run with multiple-events-in-flight enabled, you will have to ```hadd``` these files into one file before making plots. This is handled automatically within the scripts.

### mkFit/buildtestMPlex.[h,cc]

This code calls the various building routines, setting up the event, etc. The functions defined here are called in mkFit.cc. Functions called within this file are from MkBuilder.

### mkFit/MkBase.h + mkFit/MkFitter.[h,cc] + mkFit/MkFinder.[h,cc]

MkFinder and MkFitter derive from MkBase. These are the high-level objects used by the building and fitting routines in mkFit. They specify I/O operations from the standard format to the Matriplex format for the different templated Matriplex objects (see Matrix[.h,.cc] for template definitions).

### mkFit/MkBuilder.[h,cc]

Specifies building routines, seed prepping, validation prepping, etc. The building and backward fit routines use MkFinders, while seed fitting uses MkFitters. Objects from the Event object are converted to their Matriplex-ready formats. Uses the layer plan to navigate which layer to go to for each track. Functions for the navigation are defined in SteeringParams.h.

### Math/ directory

Contains SMatrix headers, used for some operations on track objects (mostly validation and deprecated SMatrix building code -- see below).

### Matriplex/ directory

Contains the low-level Matriplex library code for reading/writing into Matriplex objects as well as elementary math operations (add, multiply). Includes perl scripts to autogenerate code based on the matrix dimension size.

### Geoms/ dir + TrackerInfo.[h,cc]

Geometry plugin info. TrackerInfo sets up classes for layer objects. The Geoms/ dir contains the actual layout (numbering scheme, layer attributes, etc.) for each of the different geometries.

### mkFit/PropagationMPlex.[h,cc,icc] + mkFit/KalmanUtilsMPlex.[h,cc,icc]

Underlying code for the propagation and Kalman update (gain) calculations in Matriplex form. The .icc files contain the low-level computations. Chi2 computations are specified in KalmanUtilsMPlex.

### mkFit/CandCloner.[h,cc]

Code used in the Clone Engine for bookkeeping + copying candidates after each layer during building.

### mkFit/HitStructures.[h,cc]

Specifies MkBuilder + Matriplex friendly data formats for hits. Hits are placed in these containers before building.

### Event.[h,cc]

Most of the code is vestigial (see below). However, the Event object is a container for the different track collections and the hit collection. There is code for seed processing, namely cleaning. There is also code relevant for validation and validation prep for the different track collections.

### Hit.[h,cc] + Track.[h,cc]

Contain the Hit, Track, and TrackExtra classes. These are the "native" formats read from the binary file (read in from the Tracking NTuple). In principle, since we are planning to migrate to CMSSW eventually, these classes (as well as Event) may be trimmed to just read straight from CMSSW native formats.

- The Hit object contains the hit parameters, covariance, and a global ID. The global ID is used for gaining more information on the MC generation of that hit.
- The Track object is simply the track parameters, covariance, charge, track ID, and hit indices + layers.
- TrackExtra contains additional information about each track, e.g. associated MC info, seed hits, etc. A Track's TrackExtra is accessed through the track label, which is the index inside the vector of tracks.

### Config.[h,cc]

Contains the Config namespace. Specifies configurable parameters for the code. For example: the number of candidates to create for each track, the chi2 cut, the number of seeds to process per thread, etc. Also contains functions used for dynamically setting other parameters based on the options selected.

The tracker geometry plugin is also initialized here.

### Validation code

Described in the validation manifesto. See Section 8 for more info on the manifesto.

### TO DO

- flesh out sections as needed
- GPU specific code?

### Vestigial code

There are some sections of code that are not in use anymore and/or are not regularly updated. A short list is here:
- main.cc : old SMatrix implementation of the code, which is sometimes referred to as the "serial" version of the code
- USolids/ : directory for implementing the USolids geometry package. Originally implemented in the SMatrix code.
- seedtest[.h,.cc] : SMatrix seeding
- buildtest[.h,.cc] : SMatrix building
- fittest[.h,.cc] : SMatrix fitting
- ConformalUtils[.h,.cc] : SMatrix conformal fitter for seeding/fitting
- (possibly) Propagation[.h,.cc] : currently used only by the now-defunct Simulation[.h,.cc]. In reality, the simulation code will probably move to MPlex format, which will deprecate this code.
- KalmanUtils[.h,.cc] : SMatrix Kalman update
- mkFit/seedtestMPlex[.h,.cc] and all code in MkBuilder[.h,.cc] related to finding seeds with our own algorithm
- mkFit/ConformalUtils[.h,.cc] : used by the seeding, although it could be revived for fitting
- additional val_scripts/ and web/ scripts not automatically updated outside of the main benchmarking code
- mtorture test/ code

## Section 8: Other helpful README's in the repository

Given that this is a living repository, the comments in the code may not always be enough. Here are some other useful README's within this repo:
- after compiling the code, do ```./mkFit/mkFit --help``` : describes the full list of command line options, inputs, and defaults when running mkFit. The list can also be seen in the code in mkFit/mkFit.cc, although the defaults are hidden behind Config.[h,cc] as well as mkFit.cc.
- cmssw-trackerinfo-desc.txt : describes the structure of the CMS Phase-I geometry as represented within this repo.
- index-desc.txt : describes the various hit and track indices used by different sets of tracks throughout the different stages of the read-in, seeding, building, fitting, and validation.
- validation-desc.txt : the validation manifesto: a (somewhat) up-to-date description of the full physics validation suite. It is complemented by a somewhat out-of-date [code flow diagram](https://indico.cern.ch/event/656884/contributions/2676532/attachments/1513662/2363067/validation_flow_diagram-v4.pdf).
- web/README_WEBPLOTS.md : a short markdown file on how to set up a website with an AFS or EOS directory on LXPLUS (best when viewed from a web browser, like this README).

## Section 9: CMSSW integration

The supported CMSSW version is currently `11_2_0`. The
integration of `mkFit` in CMSSW is based on setting it up as a CMSSW
external.

### Section 9.i: Considerations for `mkFit` code

The multi-threaded CMSSW framework and the iterative nature of CMS
tracking impose some constraints on `mkFit` code (that are not all met
yet). Note that not all are mandatory per se, but they would make
life easier for everybody.

* A single instance of `mkFit` should correspond to a single track building iteration
* There should be no global non-const variables
  - Currently there are non-const global variables, e.g. in the `Config` namespace
* All iteration-specific parameters should be passed from CMSSW to `mkFit` at run time

### Section 9.ii: Building and setting up `mkFit` for CMSSW

#### Section 9.ii.a: Build `mkFit`

To be used from CMSSW, `mkFit` must be built with the CMSSW
toolchain. Assuming you are in an empty directory, the following
recipe will set up a CMSSW developer area and a `mkFit` area there,
and compile `mkFit` using the CMSSW toolchain.

**Note:** The recipes have been tested on `lxplus` and on `phi3`.
Currently there is no working recipe to compile with `icc` on LPC.

##### Section 9.ii.a.a: Lxplus

```bash
cmsrel CMSSW_11_2_0
pushd CMSSW_11_2_0/src
cmsenv
git cms-init
popd
git clone git@github.com:trackreco/mkFit
pushd mkFit
make -j 12 TBB_PREFIX=$(dirname $(cd $CMSSW_BASE && scram tool tag tbb INCLUDE)) CXX=g++ WITH_ROOT=1 VEC_GCC="-march=core2"
popd
```

##### Section 9.ii.a.b: Phi3

```bash
source /cvmfs/cms.cern.ch/cmsset_default.sh
source /opt/intel/bin/compilervars.sh intel64
export SCRAM_ARCH=slc7_amd64_gcc900
cmsrel CMSSW_11_2_0
pushd CMSSW_11_2_0/src
cmsenv
git cms-init
popd
git clone git@github.com:trackreco/mkFit
pushd mkFit
# for the gcc CMSSW "default" build:
# 1) call "unset INTEL_LICENSE_FILE", or do not source compilervars.sh above
# 2) replace AVX* with VEC_GCC="-msse3"
make -j 12 TBB_PREFIX=$(dirname $(cd $CMSSW_BASE && scram tool tag tbb INCLUDE)) WITH_ROOT=1 AVX2:=1
popd
```

#### Section 9.ii.b: Set up `mkFit` as an external

Assuming you are in the aforementioned parent directory, the following
recipe will create a scram tool file and set up scram to use it.

```bash
pushd CMSSW_11_2_0/src
cat <<EOF >mkfit.xml
<tool name="mkfit" version="1.0">
  <client>
    <environment name="MKFITBASE" default="$PWD/../../mkFit"/>
    <environment name="LIBDIR" default="\$MKFITBASE/lib"/>
    <environment name="INCLUDE" default="\$MKFITBASE"/>
  </client>
  <runtime name="MKFIT_BASE" value="\$MKFITBASE"/>
  <lib name="MicCore"/>
  <lib name="MkFit"/>
</tool>
EOF
scram setup mkfit.xml
cmsenv
```

#### Section 9.ii.c: Pull CMSSW code and build

The following recipe will pull the necessary CMSSW-side code and build it.

```bash
# in CMSSW_11_2_0/src
git cms-remote add trackreco
git fetch trackreco
git checkout -b CMSSW_11_2_0_mkFit_X trackreco/CMSSW_11_2_0_mkFit_X
git cms-addpkg $(git diff $CMSSW_VERSION --name-only | cut -d/ -f-2 | uniq)
git cms-checkdeps -a
scram b -j 12
```

### Section 9.iii: Recipes for the impatient on phi3

#### Section 9.iii.a: Offline tracking

`trackingOnly` reconstruction, DQM, and VALIDATION.

```bash
# in CMSSW_11_2_0/src

# sample = 10mu, ttbarnopu, ttbarpu35, ttbarpu50, ttbarpu70
# mkfit = 'all', 'InitialStep', ..., 'InitialStep,LowPtQuadStep', ..., ''
# timing = '', 'framework', 'FastTimerService'
# (maxEvents = 0, <N>, -1)
# nthreads = 1, <N>
# nstreams = 0, <N>
# trackingNtuple = '', 'generalTracks', 'InitialStep', ...
# jsonPatch = '', <path-to-JSON-file>
# for core pinning prepend e.g. for nthreads=8 "taskset -c 0,32,1,33,2,34,3,35"
# 0,32 will correspond to the same physical core with 2-way hyperthreading
# the step is 32 for phi3; check /proc/cpuinfo for same physical id
cmsRun RecoTracker/MkFit/test/reco_cfg.py sample=ttbarpu50 timing=1
```
* The default values for the command line parameters are the first ones listed above.
* `mkfit=1` runs MkFit, `0` runs CMSSW tracking
* The job produces `step3_inDQM.root` that needs to be "harvested" to
  get a "normal" ROOT file with the histograms.
* If `maxEvents` is set to `0`, the number of events to be processed
  is set to a relatively small value depending on the sample, for short
  testing purposes.
* Setting `maxEvents=-1` means to process all events.
* `nthreads` sets the number of threads (default 1), and `nstreams`
  the number of EDM streams (or events in flight; default 0, meaning
  the same value as the number of threads)
* [TrackingNtuple](https://github.com/cms-sw/cmssw/blob/master/Validation/RecoTrack/README.md#ntuple)
  can be enabled either for general tracks (`generalTracks`) or for
  individual iterations (e.g. `InitialStep`). See
  [here](https://github.com/cms-sw/cmssw/blob/master/Validation/RecoTrack/README.md#using-tracks-from-a-single-iteration-as-an-input)
  for how the track selection MVA and vertex collection are set
  differently between the two modes.
* The iteration configuration can be patched with a JSON file via the
  `jsonPatch` parameter (corresponds to `--json-patch` in the
  standalone program)

DQM harvesting
```bash
cmsRun RecoTracker/MkFit/test/reco_harvest_cfg.py
```
* Produces `DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root`

Producing plots
```bash
makeTrackValidationPlots.py --extended --ptcut <DQM file> [<another DQM file>]
```
* Produces a `plots` directory with PDF files and HTML pages for
  navigation. Copy the directory to your web area of choice.
* See `makeTrackValidationPlots.py --help` for more options

#### Section 9.iii.b: HLT tracking (iter0)

**Note: this subsection has not yet been updated to 11_2_0**

HLT reconstruction

```bash
# in CMSSW_10_4_0_patch1/src

# in addition to the offline tracking options
# hltOnDemand = 0, 1
# hltIncludeFourthHit = 0, 1
cmsRun RecoTracker/MkFit/test/hlt_cfg.py sample=ttbarpu50 timing=1
```
* The default values for the command line parameters are the first ones listed above.
* For options that are the same as in offline tracking, see above
* Setting `hltOnDemand=1` makes the strip local reconstruction run
  in the "on-demand" mode (which is the default in real HLT but
  not here). Note that `hltOnDemand=1` works only with `mkfit=0`.
* Setting `hltIncludeFourthHit=1` changes the (HLT-default) behavior
  of the EDProducer that converts pixel tracks to `TrajectorySeed`
  objects so that it also includes the fourth, outermost hit of the
  pixel track in the seed.

DQM harvesting (unless running timing)
```bash
cmsRun RecoTracker/MkFit/test/hlt_harvest.py
```

Producing plots (unless running timing)
```bash
makeTrackValidationPlots.py --extended <DQM file> [<another DQM file>]
```

### Section 9.iv: More thorough running instructions

#### Section 9.iv.a: Offline tracking

**Note: this subsection has not yet been updated to 11_2_0**

The example below uses the 2018 tracking-only workflow

```bash
# Generate configuration
runTheMatrix.py -l 10824.1 --apply 2 --command "--customise RecoTracker/MkFit/customizeInitialStepToMkFit.customizeInitialStepToMkFit --customise RecoTracker/MkFit/customizeInitialStepOnly.customizeInitialStepOnly" -j 0
cd 10824.1*
# edit step3*RECO*.py to contain your desired (2018 RelVal MC) input files
cmsRun step3*RECO*.py
```

The customize function replaces the initialStep track building module
with `mkFit`. In principle the customize function should work with any
reconstruction configuration file.

By default `mkFit` is configured to use the Clone Engine with N^2 seed
cleaning, and to do the backward fit (to the innermost hit) within `mkFit`.

For profiling it is suggested to replace the
`customizeInitialStepOnly` customize function with
`customizeInitialStepOnlyNoMTV`. See below for more details.

##### Section 9.iv.a.a: Customize functions

* `RecoTracker/MkFit/customizeInitialStepOnly.customizeInitialStepOnly`
  * Run only the initialStep tracking. In practice this configuration
    runs the initialStepPreSplitting iteration, but named as
    initialStep. MultiTrackValidator is included, and configured to
    monitor initialStep. Intended to provide the minimal configuration
    for CMSSW tests.
* `RecoTracker/MkFit/customizeInitialStepOnly.customizeInitialStepOnlyNoMTV`
  * Otherwise the same as `customizeInitialStepOnly`, except it drops the
    MultiTrackValidator. Intended for profiling.

##### Section 9.iv.a.b: Timing measurements

There are several options for the CMSSW module timing measurements:

- [FastTimerService](https://twiki.cern.ch/twiki/bin/viewauth/CMS/FastTimerService)
  * Produces timing measurements as histograms in the DQM root file
  * `makeTrackValidationPlots.py` (see next subsection) produces plots of those
    - "Timing" -> "iterationsCPU.pdf", look for the "initialStep" histogram and the "Building" bin
- Framework report `process.options = cms.untracked.PSet(wantSummary = cms.untracked.bool(True))`
  * Prints module timings to the standard output
  * Look for the timing of `initialStepTrackCandidates`
- [Timing module](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMTimingAndMemory)
  * Prints module timings to the standard output
  * Look for the timing of `initialStepTrackCandidates`

##### Section 9.iv.a.c: Producing MultiTrackValidator plots

The `step3` above also runs the [MultiTrackValidator](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideMultiTrackValidator).

To produce the plots, first run the DQM harvesting step

```bash
cmsRun step4_HARVESTING.py
```

which produces a `DQM_V0001_R000000001__Global__CMSSW_X_Y_Z__RECO.root` file that contains all the histograms. Rename the file to something reflecting the contents, and run

```bash
makeTrackValidationPlots.py --extended --limit-tracking-algo initialStep <DQM file> [<another DQM file> ...]
```

The script produces a directory `plots` that can be copied to any web
area. Note that the script produces an `index.html` to ease the
navigation.

### Section 9.v: Interpretation of results

#### Section 9.v.a: MultiTrackValidator plots

As the recipe above replaces the initialStep track building, we are
interested in the plots of "initialStep" (in the main page), and in
the iteration-specific page the plots in the column "Built tracks".
Technically these are the output of the final fit of the initialStep,
but the difference wrt. the `TrackCandidate`s of `MkFitProducer` should be
negligible.

In short, the relevant plots are
- `effandfake*` show efficiency and fake+duplicate rate vs. various quantities
- `dupandfake*` show fake, duplicate, and pileup rates vs. various quantities (pileup rate is not that interesting for our case)
- `distsim*` show distributions for all and reconstructed TrackingParticles (numerators and denominators of efficiencies)
- `dist*` show distributions for all, true, fake, and duplicate tracks (numerators and denominators of fake and duplicate rates)
- `hitsAndPt` and `hitsLayers` show various information on hits and layers
- `resolutions*` show track parameter resolutions vs. eta and pT
- `residual*` show track parameter residuals (bias) vs. eta and pT
- `pulls` shows track parameter pulls
- `tuning` shows chi2/ndof, chi2 probability, and chi2/ndof vs. eta and pT residual
- `mva1*` show various information on the BDT track selection

The tracking MC truth matching criteria are different from the mkFit
SimVal. In MTV a track is classified as a "true track" (and a matched
SimTrack as "reconstructed") if more than 75% of the clusters of the
track are linked to a single SimTrack. A cluster is linked to a
SimTrack if the SimTrack has induced any amount of charge to any of
the digis (= pixel or strip) of the cluster.

#### Section 9.v.b: Timing

When looking at the per-module timing numbers, please see the following
table for the relevant modules to look for and their purpose.

| **Module in offline** | **Module in HLT** | **Description** |
|-----------------------|-------------------|-----------------|
| `initialStepTrackCandidatesMkFitInput` | `hltIter0PFlowCkfTrackCandidatesMkFitInput` | Input data conversion |
| `initialStepTrackCandidatesMkFit` | `hltIter0PFlowCkfTrackCandidatesMkFit` | MkFit itself |
| `initialStepTrackCandidates` | `hltIter0PFlowCkfTrackCandidates` | Output data conversion |

The MTV timing plot of initialStep "Building" includes the
contributions of all three modules.

## Section 10: Other useful information

### Section 10.i: Important Links

Project Links
- [Main development GitHub](https://github.com/trackreco/mkFit)
- [Our project website](https://trackreco.github.io) and the [GH repo](https://github.com/trackreco/trackreco.github.io-source) hosting the web files. Feel free to edit the website repo if you have contributed a presentation, poster, or paper.
- Out-of-date and no longer used [project twiki](https://twiki.cern.ch/twiki/bin/viewauth/CMS/MicTrkRnD)
- [Indico meeting page](https://indico.cern.ch/category/8433)
- Vidyo room: Parallel_Kalman_Filter_Tracking
- Email list-serv: mic-trk-rd@cern.ch

Other Useful References
- [CMS Run1 Tracking Paper](https://arxiv.org/abs/1405.6569)
- [CMS Public Tracking Results](https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsTRK)
- [Kalman Filter in Particle Physics, paper by Rudi Fruhwirth](https://inspirehep.net/record/259509?ln=en)
- [Kalman Filter explained simply](https://128.232.0.20/~rmf25/papers/Understanding%20the%20Basis%20of%20the%20Kalman%20Filter.pdf)

### Section 10.ii: Tips and Tricks

#### Section 10.ii.a: Missing Libraries and Debugging

When sourcing the environment on phi3 via ```source xeon_scripts/init-env.sh```, some paths will be unset and access to local binaries may be lost. For example, since we source ROOT (and its many dependencies) over CVMFS, there may be some conflicts in loading some applications. In fact, the shell may complain about missing environment variables (emacs loves to complain about TIFF). The best way around this is to simply use CVMFS as a crutch to load in what you need.

This is particularly noticeable when trying to run a debugger. To compile the code, at a minimum, we must source icc + toolkits that give us libraries for C++14. We achieve this through the dependency loading of ROOT through CVMFS (previously, we sourced devtoolset-N to grab the C++14 libraries).

After sourcing and compiling, if you then run only to find a crash and try to load ```mkFit``` into ```gdb``` via ```gdb ./mkFit/mkFit```, gdb gives rather opaque error messages about missing Python paths.

This can be overcome by loading ```gdb``` over CVMFS: ```source /cvmfs/cms.cern.ch/slc7_amd64_gcc630/external/gdb/7.12.1-omkpbe2/etc/profile.d/init.sh```. At this point, the application will run normally and debugging can commence.

#### Section 10.ii.b: SSH passwordless login for benchmarking and web scripts

When running the benchmarks, a tarball of the working directory will be ```scp```'ed to phi2 and phi1 before running the tests on phi3. After the tests complete on each platform, the log files will be ```scp```'ed back to phi3 concurrently. If you do not forward your ssh keys upon login to phi3, you will have to enter your password when first shipping the code over to phi2 and phi1, and also, at some undetermined point, enter it again to receive the logs.

With your favorite text editor, enter the text below into ```~/.ssh/config``` on your local machine to avoid having to type in your password for login to any phi machine (N.B. some lines are optional):

```
Host phi*.t2.ucsd.edu
    User <phi* username>
    ForwardAgent yes
    # lines below are for using X11 on phi* to look at plots, open new windows for emacs, etc.
    ForwardX11 yes
    XAuthLocation /opt/X11/bin/xauth
    # lines below are specific to macOS
    AddKeysToAgent yes
    UseKeychain yes
```

After the benchmarks run, you may elect to use the ```web/``` scripts to transfer plots to a CERN website hosted on either LXPLUS EOS or AFS. The plots will be put into a tarball, ```scp```'ed over, and then untarred remotely via ```ssh```. To avoid typing in your password for the ```web/``` scripts, you will need to use a Kerberos ticket and also modify the ```.ssh/config``` file in your home directory on the _phi_ machines with the text below:

```
Host lxplus*.cern.ch
    User <lxplus username>
    ForwardAgent yes
    ForwardX11 yes
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
```

The last two lines are specific to Kerberos's handling of ssh, which is installed on all of the _phi_ machines. In order to open a Kerberos ticket, you will need to do:

```
kinit -f <lxplus username>@CERN.CH
```

and then enter your LXPLUS password. Kerberos will keep your ticket open for a few days to allow passwordless ```ssh``` into LXPLUS. After the ticket expires, you will need to enter that same command again. So, even if you only send plots once every month to LXPLUS, this reduces the number of times you type in your LXPLUS password from two to one :).

### Section 10.iii: Acronyms/Abbreviations

[Glossary of acronyms from CMS](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookGlossary)

- AVX: Advanced Vector Extensions [flavors of AVX: AVX, AVX2, AVX512]
- BH: Best Hit (building routine that selects only the best hit per layer when performing track building)
- BkFit: (B)ac(k)wards Fit, i.e. perform a KF fit backwards from the last layer on the track to the first layer / PCA
- BS: Beamspot (i.e. the luminous region of interactions)
- CCC: Charge Cluster Cut, used to remove hits that come from out-of-time pileup
- CE: Clone Engine (building routine that keeps N candidates per seed, performing the KF update after hits have been saved)
- CMS: Compact Muon Solenoid
- CMSSW: CMS Software
- CMSSWVal: CMSSW Track Validation, uses CMSSW tracks as the reference set of tracks for association
- FV: Full Vector (building routine that uses a clever way of filling Matriplexes of tracks during track building to boost vectorization; current status: deprecated)
- GH: GitHub
- GPU: Graphical Processing Unit
- GUI: Graphical User Interface
- KF: Kalman Filter
- KNL: Knights Landing
- MEIF: Multiple-Events-In-Flight (method for splitting events into different tasks)
- mkFit: (m)atriplex (k)alman filter (Fit)
- MP: Multi-Processing
- MTV: MultiTrackValidator
- N^2: Local seed cleaning algorithm developed by Mario and Slava
- PCA: Point of closest approach to either the origin or the BS
- PR: Pull Request
- Reco: Reconstruction
- SimVal: SimTrack Validation, uses sim tracks as the reference set of tracks for association
- SKL-SP: Skylake Scalable Performance
- SNB: Sandy Bridge
- SSE: Streaming SIMD Extensions
- STD: Standard (building routine, like Clone Engine, but performs the KF update before hits are saved to a track)
- TBB: (Intel) Threaded Building Blocks, an open-source library from Intel to perform tasks in a multithreaded environment
- TH: Threads
- VU: (loosely) Vector Units
0863 - VU: (loosely) Vector Units