1) Manually running the BeamSpotWorkflow.py script

Typing BeamSpotWorkflow.py -h will show the available options of the script.

The 3 most common options are:
-z -> changes the sigmaZ from the calculated value to 10 cm
-u -> uploads the values into the DB
-c -> lets you specify a custom cfg file; otherwise the default BeamSpotWorkflow.cfg is used

Example:
./BeamSpotWorkflow.py -c BeamSpotWorkflow_run.cfg -z -u

2) Cfg file structure (extra lines can be commented out with a # at the beginning)

a) SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/361_patch4/express_T0_v11/
   Any directory (castor or hard disk) containing the txt files produced by the CMSSW beamspot workflow.

b) ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
   Any directory where you want to store the beamspot files. The files from SOURCE_DIR will be copied to the ARCHIVE_DIR.

c) WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_tmp
   After the files are copied to the ARCHIVE_DIR, they are copied to the WORKING_DIR. Every time you run the script, the
   WORKING_DIR is WIPED OUT first. If you run MORE SCRIPTS AT THE SAME TIME you can keep the same ARCHIVE_DIR, but
   you MUST use a different WORKING_DIR for each script to avoid conflicts.

d) DBTAG       = BeamSpotObjects_2009_v14_offline
   Database tag you want to update. Currently we have BeamSpotObjects_2009_v14_offline, BeamSpotObjects_2009_SigmaZ_v14_offline,
   BeamSpotObjects_2009_lumi_v14_offline and BeamSpotObjects_2009_lumi_SigmaZ_v14_offline (I use BeamSpotObjects_2009_v13_offline for testing).

e) DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
   Dataset from which your txt files were produced. You can specify multiple DATASETs, separated by commas (,), like:
  /StreamExpress/Commissioning10-StreamTkAlMinBias-v7/ALCARECO,
  /StreamExpress/Commissioning10-StreamTkAlMinBias-v8/ALCARECO,
  /StreamExpress/Commissioning10-StreamTkAlMinBias-v9/ALCARECO,
  /StreamExpress/Run2010A-StreamTkAlMinBias-v1/ALCARECO,
  /StreamExpress/Run2010A-TkAlMinBias-v2/ALCARECO,
  /StreamExpress/Run2010A-TkAlMinBias-v3/ALCARECO,
  /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
  This is a very nice feature when you are reprocessing the whole dataset.

f) FILE_IOV_BASE = lumibase
   The IOV base of the txt files. Recently we have been producing files fitting every lumisection, so it has been lumibase for a long time (it can also be runbase).

g) DB_IOV_BASE   = runnumber
   IOV base in the database for the tag you want to upload. Right now the official tag has runnumber IOVs. The other possibility is lumiid.

h) DBS_TOLERANCE_PERCENT = 10
   Percentage of missing lumisections that can be tolerated between the lumisections processed and the ones that DBS says should have been processed.
   When querying DBS, the script asks how many lumisections were present in the files that the workflow processed. The number of lumis processed and the
   number in DBS should always match, but unfortunately that is not the case. 10% should let you pass all the files that have been processed so far.

i) DBS_TOLERANCE = 20
   Number of missing lumisections that can be tolerated between the lumisections processed and the ones that DBS says should have been processed.
   Sometimes a run has few lumisections, so if the workflow doesn't process a few of them, the fraction of unprocessed lumis doesn't pass the
   previous tolerance.
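
   Items h) and i) complement each other: the percentage cut handles long runs, and the absolute cut rescues short ones. A minimal
   Python sketch of how the two tolerances could combine (an illustration only; dbs_check_passes is a hypothetical helper, not the
   actual BeamSpotWorkflow.py code):

```python
def dbs_check_passes(processed_lumis, expected_lumis,
                     tolerance_percent=10, tolerance=20):
    """Illustrative sketch of the combined DBS tolerance check."""
    missing = expected_lumis - processed_lumis
    if missing <= 0:
        return True
    # h) percentage check: up to tolerance_percent% of the expected
    # lumisections may be missing.
    if 100.0 * missing / expected_lumis <= tolerance_percent:
        return True
    # i) absolute check: rescues short runs, where even a handful of
    # missing lumisections exceeds the percentage threshold.
    return missing <= tolerance
```

   For example, a 30-lumisection run with 25 lumis processed fails the 10% cut (16.7% missing) but passes the absolute cut (5 <= 20).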

l) RR_TOLERANCE = 10
   Percentage of missing lumisections that can be tolerated between the lumisections processed and the ones that are considered good in the run registry.
   If too many lumis are unprocessed compared to DBS, the script checks whether the ones that have been processed at least cover the
   ones that are considered good in the run registry.

m) MISSING_FILES_TOLERANCE = 2
   Number of missing files that can be tolerated before the script continues. It is important to keep this number low (2, at most 3), especially
   when running in a cron job. In fact, the script can be triggered while a few files are still being processed, and you don't want that to happen
   while the number of missing files is still big.

n) MISSING_LUMIS_TIMEOUT = 14400
   There are a few timeouts in the script (for example when many files are still missing); after this number of seconds (MISSING_LUMIS_TIMEOUT)
   the script keeps running anyway. With MISSING_LUMIS_TIMEOUT = 0 there is no timeout and the script just continues!

o) EMAIL       = uplegger@cern.ch,yumiceva@fnal.gov
   Comma-separated list of people who will receive an e-mail in case of big trouble. Some conditions must be validated by
   a person, so typically the script stops working and sends an e-mail to the people in this list, who will have to take action.
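
   The cfg files shown in this README are INI-style with a single [Common] section, so a quick way to inspect one is Python's
   standard configparser module (a sketch with a trimmed fragment; the actual parsing done by BeamSpotWorkflow.py may differ):

```python
import configparser
from io import StringIO

# A trimmed cfg fragment in the format described above; lines starting
# with '#' are treated as comments by configparser's defaults.
cfg_text = """\
[Common]
DBTAG       = BeamSpotObjects_2009_v14_offline
FILE_IOV_BASE = lumibase
#DB_IOV_BASE   = lumiid
DB_IOV_BASE   = runnumber
DBS_TOLERANCE_PERCENT = 10
"""

parser = configparser.ConfigParser()
parser.read_file(StringIO(cfg_text))  # for a real file: parser.read("BeamSpotWorkflow.cfg")

common = parser["Common"]
print(common["DBTAG"])                         # BeamSpotObjects_2009_v14_offline
print(common.getint("DBS_TOLERANCE_PERCENT"))  # 10
```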

3) Cron job shell script.
   In python/tools there is the beamspotWorkflow_cron.sh shell script, which runs the workflow automatically.

//--------------------------------------------------------------------------------------------------
   export STAGE_HOST=castorcms.cern.ch
   source /afs/cern.ch/cms/sw/cmsset_default.sh
   cd /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/
   logFileName="/afs/cern.ch/user/u/uplegger/www/Logs/MegaScriptLog.txt"
   echo >> $logFileName
   echo "Begin running the script on " `date` >> $logFileName
   if [ ! -e .lock ]
   then
     touch .lock
     eval `scramv1 runtime -sh`
     python $CMSSW_BASE/src/RecoVertex/BeamSpotProducer/scripts/BeamSpotWorkflow_T0.py -u -c BeamSpotWorkflow_T0.cfg >> $logFileName
     rm .lock
   else
     echo "There is already a megascript running...exiting" >> $logFileName
   fi
   echo "Done on " `date` >> $logFileName
//--------------------------------------------------------------------------------------------------

   REMEMBER:
   a) cd /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/
      is the CMSSW area where your script is!
   b) logFileName="/afs/cern.ch/user/u/uplegger/www/Logs/MegaScriptLog.txt"
      is my area, which is web accessible, so I can check the output of the script once in a while.
   c) python $CMSSW_BASE/src/RecoVertex/BeamSpotProducer/scripts/BeamSpotWorkflow_T0.py -u -c BeamSpotWorkflow_T0.cfg >> $logFileName
      runs the script WITH the BeamSpotWorkflow_T0.cfg cfg file and saves the output in the log file, which I can check online.
   d) if [ ! -e .lock ] then touch .lock
      creates a .lock file in /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/
      This lock file prevents 2 megascripts from running at the same time. Since the shell script itself removes it, it should be cleaned up
      99.9% of the time, but it has already happened to me once that it was not removed.
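
   If a stale .lock file ever becomes a recurring problem, a sketch of an alternative using flock (assuming the util-linux
   flock command is available on the machine) avoids it entirely, because the kernel releases the lock when the process exits:

```shell
# Sketch: run the workflow under an exclusive, automatically released lock.
# The lock is tied to the open file descriptor, so it disappears even if
# the script is killed -- no stale lock file to clean up by hand.
exec 9> /tmp/megascript.lock
if ! flock -n 9; then
    echo "There is already a megascript running...exiting"
    exit 1
fi
echo "lock acquired, running workflow"
# ... source cmsset_default.sh, eval scramv1 runtime, run BeamSpotWorkflow_T0.py here ...
```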
4) Running the cron job:
   acrontab -e
   lets you edit your cron jobs, while
   acrontab -l
   shows your current cron job file.
//--------------------------------------------------------------------------------------------------
   5 * * * * lxplus258 /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/RecoVertex/BeamSpotProducer/python/tools/beamspotWorkflow_cron.sh >& /afs/cern.ch/user/u/uplegger/www/Logs/CronJob.log
   25 * * * * lxplus301 /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/RecoVertex/BeamSpotProducer/python/tools/beamspotWorkflow_cron.sh >& /afs/cern.ch/user/u/uplegger/www/Logs/CronJob.log
   45 * * * * lxplus256 /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/RecoVertex/BeamSpotProducer/python/tools/beamspotWorkflow_cron.sh >& /afs/cern.ch/user/u/uplegger/www/Logs/CronJob.log
   3 0,13 * * * lxplus301 /afs/cern.ch/user/u/uplegger/scratch0/CMSSW/CMSSW_3_6_1_patch4/src/RecoVertex/BeamSpotProducer/python/tools/mvLogFile_cron.sh
//--------------------------------------------------------------------------------------------------
   Right now I am running the megascript cron job from 3 different machines, every 20 minutes.
   I am also running, twice a day, another script that moves the log files away to keep the one on the web small.


5) The way I run everything.
   a) Every few days I run the workflow at T0. This is my crab cfg:
//--------------------------------------------------------------------------------------------------
[CRAB]
jobtype              = cmssw
scheduler            = caf
server_name          = caf_test

[CAF]
queue                = cmscaf1nd


[CMSSW]

#datasetpath          = /MinimumBias/BeamCommissioning09-StreamTkAlMinBias-Dec19thReReco_341_v1/ALCARECO
#datasetpath          = /MinimumBias/BeamCommissioning09-StreamTkAlMinBias-Dec19thReReco_341_v1/ALCARECO-TEST-1102
#datasetpath = /MinimumBias/BeamCommissioning09-StreamTkAlMinBias-Dec19thReReco_341_v1/ALCARECO-TEST-Run[0-9]*-1503
#datasetpath = /MinimumBias/BeamCommissioning09-StreamTkAlMinBias-Mar3rdReReco_v2/ALCARECO
#datasetpath = /StreamExpress/Commissioning10-StreamTkAlMinBias-v9/ALCARECO
#datasetpath = /StreamExpress/Run2010A-StreamTkAlMinBias-v1/ALCARECO
#datasetpath = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
#datasetpath = /StreamExpress/Run2010B-TkAlMinBias-v1/ALCARECO
datasetpath = /StreamExpress/Run2010B-TkAlMinBias-v2/ALCARECO

pset                 = BeamFit_LumiBased_Workflow.py

get_edm_output       = 1
output_file          = BeamFit_LumiBased_Workflow.txt,BeamFit_LumiBased_Workflow.root

[USER]
ui_working_dir       = crab_LumiBased_express_T0_v3
# to return data to local disk, change to 1
return_data          = 0
#user_remote_dir      = ShortWorkflow
# to copy data to an SE, change to 1
copy_data            = 1
storage_element      = T2_CH_CAF
# area /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/RunBased
user_remote_dir      = Workflows/381_patch3/express_T0_v3

[WMBS]

automation           = 1
feeder               = T0AST
#feeder               = DBS
startrun             = 149415
splitting_algorithm  = RunBased
split_per_job        = files_per_job
split_value          = 1
processing           = express

//--------------------------------------------------------------------------------------------------
   b) I start the cron jobs:
     acrontab -e
     I uncomment the lines I care about and save with ctrl-O,
     using the following cfg file (BeamSpotWorkflow_T0.cfg):

//--------------------------------------------------------------------------------------------------
     [Common]
     SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/381_patch3/express_T0_v3/
     ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
     WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_tmp
     DBTAG       = BeamSpotObjects_2009_v13_offline
     DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
     FILE_IOV_BASE = lumibase
     #DB_IOV_BASE   = lumiid
     DB_IOV_BASE   = runnumber
     DBS_TOLERANCE_PERCENT = 10
     DBS_TOLERANCE = 20
     RR_TOLERANCE = 10
     MISSING_FILES_TOLERANCE = 6
     MISSING_LUMIS_TIMEOUT = 14400
     EMAIL       = uplegger@cern.ch
//--------------------------------------------------------------------------------------------------


   c) Either I receive some unwanted e-mails :( or in the morning I check what happened to the v13 tag using this script, which is in cvs:
     checkPayloads.py 13
     with 13 as the argument.
     This script compares the IOVs uploaded in the tag with the run registry. If there is a run registry entry and no corresponding IOV,
     it prints out:
     Run: 133509 is missing for DB tag BeamSpotObjects_2009_v14_offline
     Run: 139363 is missing for DB tag BeamSpotObjects_2009_v14_offline

     These are the only 2 runs that should have an entry in the DB but that, for some reason, we didn't upload.
     Inside the script I keep a list of the runs that are missing in the DB, and if the megascript skips some of them I go to the run
     registry to see why the run is missing. If the strips were bad, for example, I write that run down and add it to the knownMissingRunList so it won't be printed out.

     #132573 Beam lost immediately
     #132958 Bad strips
     #133081 Bad pixels, bad strips
     #133242 Bad strips
     #133472 Bad strips
     #133473 Only 20 lumisections, run duration 00:00:03:00
     #133509 Should be good!!!!!!!!!!
     #136290 Bad pixels, bad strips
     #138560 Bad pixels, bad strips
     #138562 Bad HLT, bad L1T, need to rescale the Jet Triggers
     #139363 NOT in the bad list, but only 15 lumis and stopped for DAQ problems
     #139455 Bad pixels and strips, and stopped because the HCAL trigger rate was too high
     #140133 Beams dumped
     #140182 No pixels, and strips with few entries
     knownMissingRunList = [132573,132958,133081,133242,133472,133473,136290,138560,138562,139455,140133,140182]
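
     The set arithmetic behind that filtering can be sketched like this (the run numbers are sample values taken from this
     README; the real checkPayloads.py queries the run registry and the DB instead of using hard-coded sets):

```python
# Hypothetical sample data illustrating the checkPayloads.py comparison.
run_registry_runs = {132573, 133509, 139363, 140401}  # runs marked good in the run registry
db_runs = {140401}                                    # runs that already have an IOV in the tag
knownMissingRunList = [132573]                        # legitimately missing (bad strips, beam lost, ...)

# Runs that should have a payload but don't, minus the known exceptions.
missing = sorted(run_registry_runs - db_runs - set(knownMissingRunList))
for run in missing:
    print("Run: %d is missing for DB tag BeamSpotObjects_2009_v14_offline" % run)
# Prints the two runs shown above: 133509 and 139363.
```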

   d) I check the v14 tag with the same script:
     checkPayloads.py
     If the two match, there were no new runs; otherwise, if I think the v13 was correctly updated with all runs, it means
     that I have to update the v14.
     So I just cut and paste the commands that are in this txt file:

     more uploadTags.txt
     ./BeamSpotWorkflow.py -c BeamSpotWorkflow_run.cfg -z -u
     ./BeamSpotWorkflow.py -c BeamSpotWorkflow_run_sigmaz.cfg -u
     ./BeamSpotWorkflow.py -c BeamSpotWorkflow_lumi.cfg -z -u
     ./BeamSpotWorkflow.py -c BeamSpotWorkflow_lumi_sigmaz.cfg -u

     #For the prompt and express tags
     ./createPayload.py -d PayloadFile.txt -t BeamSpotObjects_2009_v1_prompt -z -u
     ./createPayload.py -d PayloadFile.txt -t BeamSpotObjects_2009_v1_express -z -u

     I have 4 cfg files:
//-------------------BeamSpotWorkflow_run.cfg
[Common]
SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/381_patch3/express_T0_v3/
ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_run
DBTAG       = BeamSpotObjects_2009_v14_offline
DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
FILE_IOV_BASE = lumibase
#DB_IOV_BASE   = lumiid
DB_IOV_BASE   = runnumber
DBS_TOLERANCE_PERCENT = 10
DBS_TOLERANCE = 25
RR_TOLERANCE = 10
MISSING_FILES_TOLERANCE = 2
MISSING_LUMIS_TIMEOUT = 0
EMAIL       = uplegger@cern.ch
//--------------------------------------------------------------------------------------------------

//-------------------BeamSpotWorkflow_run_sigmaz.cfg
[Common]
SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/381_patch3/express_T0_v3/
ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_run_sigmaz
DBTAG       = BeamSpotObjects_2009_SigmaZ_v14_offline
DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
FILE_IOV_BASE = lumibase
#DB_IOV_BASE   = lumiid
DB_IOV_BASE   = runnumber
DBS_TOLERANCE_PERCENT = 10
DBS_TOLERANCE = 25
RR_TOLERANCE = 10
MISSING_FILES_TOLERANCE = 2
MISSING_LUMIS_TIMEOUT = 0
EMAIL       = uplegger@cern.ch
//--------------------------------------------------------------------------------------------------

//------------------BeamSpotWorkflow_lumi.cfg
[Common]
SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/381_patch3/express_T0_v3/
ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_lumi
DBTAG       = BeamSpotObjects_2009_LumiBased_v14_offline
DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
FILE_IOV_BASE = lumibase
DB_IOV_BASE   = lumiid
#DB_IOV_BASE   = runnumber
DBS_TOLERANCE_PERCENT = 10
DBS_TOLERANCE = 25
RR_TOLERANCE = 10
MISSING_FILES_TOLERANCE = 2
MISSING_LUMIS_TIMEOUT = 0
EMAIL       = uplegger@cern.ch
//--------------------------------------------------------------------------------------------------

//----------------BeamSpotWorkflow_lumi_sigmaz.cfg
[Common]
SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/381_patch3/express_T0_v3/
ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_lumi_sigmaz
DBTAG       = BeamSpotObjects_2009_LumiBased_SigmaZ_v14_offline
DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
FILE_IOV_BASE = lumibase
DB_IOV_BASE   = lumiid
#DB_IOV_BASE   = runnumber
DBS_TOLERANCE_PERCENT = 10
DBS_TOLERANCE = 25
RR_TOLERANCE = 10
MISSING_FILES_TOLERANCE = 2
MISSING_LUMIS_TIMEOUT = 0
EMAIL       = uplegger@cern.ch
//--------------------------------------------------------------------------------------------------

     As you can see, the ARCHIVE_DIRs are all the same; what changes is just the DBTAG, the DB_IOV_BASE and the WORKING_DIR.
     The MISSING_LUMIS_TIMEOUT is set to 0 because I already know that everything went well with the v13, so I don't want to time out!
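
     Since only those three parameters differ between the four cfg files, they could be generated from one template; a sketch
     (the file names come from uploadTags.txt above; this generator script is not part of the package):

```python
# Template for the common part of the four cfg files described above.
template = """[Common]
SOURCE_DIR  = /castor/cern.ch/cms/store/caf/user/uplegger/Workflows/381_patch3/express_T0_v3/
ARCHIVE_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/good_archive/
WORKING_DIR = /afs/cern.ch/cms/CAF/CMSCOMM/COMM_BSPOT/automated_workflow/{workdir}
DBTAG       = {dbtag}
DATASET     = /StreamExpress/Run2010A-TkAlMinBias-v4/ALCARECO
FILE_IOV_BASE = lumibase
DB_IOV_BASE   = {iov}
DBS_TOLERANCE_PERCENT = 10
DBS_TOLERANCE = 25
RR_TOLERANCE = 10
MISSING_FILES_TOLERANCE = 2
MISSING_LUMIS_TIMEOUT = 0
EMAIL       = uplegger@cern.ch
"""

# The only fields that differ: WORKING_DIR, DBTAG, DB_IOV_BASE.
variants = {
    "BeamSpotWorkflow_run.cfg":         ("good_run",         "BeamSpotObjects_2009_v14_offline",                  "runnumber"),
    "BeamSpotWorkflow_run_sigmaz.cfg":  ("good_run_sigmaz",  "BeamSpotObjects_2009_SigmaZ_v14_offline",           "runnumber"),
    "BeamSpotWorkflow_lumi.cfg":        ("good_lumi",        "BeamSpotObjects_2009_LumiBased_v14_offline",        "lumiid"),
    "BeamSpotWorkflow_lumi_sigmaz.cfg": ("good_lumi_sigmaz", "BeamSpotObjects_2009_LumiBased_SigmaZ_v14_offline", "lumiid"),
}

for name, (workdir, dbtag, iov) in variants.items():
    with open(name, "w") as f:
        f.write(template.format(workdir=workdir, dbtag=dbtag, iov=iov))
```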