WW and WZ Analysis Based on Boosted Decision Trees
Hai-Jun YangUniversity of Michigan(contributed from Tiesheng Dai, Alan Wilson, Zhengguo Zhao, Bing Zhou)
ATLAS Trigger and Physics Meeting, CERN, June 4-7, 2007
- Boosted Decision Trees (BDT)
- WW → eμX analysis based on BDT
- WZ → lν ll analysis based on BDT
- BDT applications and free software
- Summary and future plan
Boosted Decision Trees
Ref: B.P. Roe, H.J. Yang, J. Zhu, Y. Liu, I. Stancu, G. McGregor, "Boosted decision trees as an alternative to artificial neural networks for particle identification", physics/0408124, NIM A543 (2005) 577-584.
- The advantage of boosted decision trees is that they combine many decision trees, "weak" classifiers, into a single powerful classifier. The performance of boosted decision trees is stable after a few hundred tree iterations.
- Boosted decision trees focus on the misclassified events, which usually carry high weights after hundreds of tree iterations. An individual tree has very weak discriminating power; its weighted misclassified event rate err_m is about 0.4-0.45.
Ref1: H.J. Yang, B.P. Roe, J. Zhu, "Studies of Boosted Decision Trees for MiniBooNE Particle Identification", physics/0508045, Nucl. Instrum. & Meth. A555 (2005) 370-385.
Ref2: H.J. Yang, B.P. Roe, J. Zhu, "Studies of Stability and Robustness for Artificial Neural Networks and Boosted Decision Trees", physics/0610276, Nucl. Instrum. & Meth. A574 (2007) 342-349.
Diboson Analysis – Physics Motivation
WW → eμX (CSC 11 dataset)
WW analysis – datasets after precuts
Breakdown of MC samples for WW analysis after precuts
Event Selection for WW → eμX
Event pre-selection:
- At least one electron plus one muon with pT > 10 GeV
- Missing ET > 15 GeV
- Signal efficiency is 39%
Two selection methods:
- Simple cuts based on Rome sample studies
- Boosted Decision Trees with 15 input variables
Selection of WW → eμ/μe + missing ET
Simple cuts used in Rome studies:
- Two isolated leptons with pT > 20 GeV; at least one with pT > 25 GeV
- Missing ET > 30 GeV
- M(eμ) > 30 GeV; veto M(Z) (ee, μμ)
- ET(had) = |Σ pT(lepton) + missing ET| < 60 GeV; Σ ET(jet) < 120 GeV
- Number of jets < 2
- pT(l+l−) > 20 GeV
- Vertex match between the two leptons: ΔZ < 1 mm, ΔA < 0.1 mm
For 1 fb⁻¹: 189 signal and 168 background events.
BDT Training Procedure
- 1st step: use all 48 variables for BDT training; rank the variables by their Gini index contributions or by how often they are used as tree splitters.
- 2nd step: select the 15 most powerful variables (see the sketch below).
- 3rd step: re-train the BDT with the 15 selected variables.
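A minimal sketch of this rank-and-select procedure, assuming a recent scikit-learn; the toy arrays X and y stand in for the pre-selected MC samples, and feature_importances_ plays the role of the Gini index contribution:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for the 48 input variables and the signal/background labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(2000, 48))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Step 1: train on all 48 variables (20 leaves/tree, 1000 tree iterations).
bdt = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_leaf_nodes=20), n_estimators=1000)
bdt.fit(X, y)

# Step 2: rank the variables and keep the 15 most powerful.
top15 = np.argsort(bdt.feature_importances_)[::-1][:15]

# Step 3: re-train using only the selected variables.
bdt15 = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_leaf_nodes=20), n_estimators=1000)
bdt15.fit(X[:, top15], y)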
Variables after pre-selection used in BDT
Variable distributions after pre-selection
[Figure slides: input-variable distributions for signal and background after pre-selection.]
BDT Training Tips
- ε-boost with ε = 0.01: the weight of a misclassified event is multiplied by exp(2ε·I) = exp(2×0.01) ≈ 1.0202, where I = 1 if a training event is misclassified and I = 0 otherwise.
- 1000 tree iterations, 20 leaves/tree.
- The MC samples are split into two halves, one for training and the other for testing; the roles of the two samples are then swapped (see the sketch below). The average of the two testing results is taken as the final result.
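A minimal sketch of this two-fold train/test swap, again assuming a recent scikit-learn with toy stand-in arrays:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(1)
X = rng.normal(size=(2000, 15))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

half = len(X) // 2
halves = [slice(0, half), slice(half, None)]
scores = []
for train, test in [(halves[0], halves[1]), (halves[1], halves[0])]:
    bdt = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_leaf_nodes=20), n_estimators=1000)
    bdt.fit(X[train], y[train])
    scores.append(bdt.score(X[test], y[test]))

# The average of the two testing results is taken as the final result.
print("final test accuracy:", np.mean(scores))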
Boosted Decision Trees output
[Figure slides: BDT output distributions.]
Signal (WW) and Backgrounds for 1 fb⁻¹
MC breakdown with all cuts for 1 fb⁻¹
Summary for WW Analysis
- The background event sample has increased by a factor of ~10 compared to the Rome sample, and by a factor of ~2 compared to the post-Rome sample.
- Simple cuts: S/B ~ 1.1
- Boosted Decision Trees with 15 variables: S/B = 5.9
- The major backgrounds are W → μν (~50%), ttbar, and WZ.
- W → μν (event weight = 11.02) needs more statistics (×5) if possible.
WZ Analysis – Physics Motivation
- Test of SM couplings
- Search for anomalous triple gauge boson couplings (TGCs) that could indicate new physics
- The WZ final state would be a background to SUSY and technicolor signals
WZ event selection by two methods:
- Simple cuts
- Boosted Decision Trees
WZ selection – Major backgrounds
- pp → ttbar: a pair of leptons falls in the Z mass window; a jet produces a lepton signal
- pp → Z+jets: a jet produces the third lepton signal; fake missing ET
- pp → Z/γ* → ee, μμ: fake missing ET and third lepton
- pp → ZZ → 4 leptons: lose a lepton
Pre-selection for WZ analysis
- Identify leptons and require pT > 5 GeV, at least one with pT > 20 GeV
- Require missing ET > 15 GeV
- Find the e+e− or μ+μ− pair with invariant mass closest to the Z peak; it must be within 91.18 ± 20 GeV (see the pairing sketch below)
- Third lepton with pT > 15 GeV and 10 < MT < 400 GeV
- Eff(W+Z) = 25.8%, Eff(W−Z) = 29.3%
- Compute additional variables (invariant masses, sums of jets, track isolations, ...), 67 variables in total
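A minimal sketch of the Z-candidate pairing step described above; the M_Z value and the 20 GeV window follow the pre-selection, while the Lepton record and data layout are illustrative assumptions, not the analysis code:

import math
from dataclasses import dataclass
from itertools import combinations

M_Z = 91.18  # GeV

@dataclass
class Lepton:
    flavor: str   # "e" or "mu"
    charge: int   # +1 or -1
    e: float      # energy and momentum components in GeV
    px: float
    py: float
    pz: float

def invariant_mass(a, b):
    # m^2 = (E1 + E2)^2 - |p1 + p2|^2
    e = a.e + b.e
    px, py, pz = a.px + b.px, a.py + b.py, a.pz + b.pz
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def z_candidate(leptons, window=20.0):
    """Return the same-flavor, opposite-sign pair with invariant mass
    closest to the Z peak, or None if no pair is within the window."""
    best, best_dm = None, window
    for a, b in combinations(leptons, 2):
        if a.flavor != b.flavor or a.charge + b.charge != 0:
            continue
        dm = abs(invariant_mass(a, b) - M_Z)
        if dm < best_dm:
            best, best_dm = (a, b), dm
    return best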
WZ analysis – datasets after precuts
Final selection – simple cuts
Based on pre-selected events, make further cuts:
- Lepton selection:
  - Isolation: leptons have tracks totaling < 8 GeV within ΔR < 0.4
  - Z leptons have pT > 6 GeV
  - W lepton pT > 25 GeV
  - E ~ p for electrons: 0.7 < E/p < 1.3
  - Hollow cone around leptons has little energy: [ET(ΔR<0.4) − ET(ΔR<0.2)] / ET < 0.1
  - Leptons separated by ΔR > 0.2
- Exactly 3 leptons
- Missing ET > 25 GeV
- Few jets:
  - No more than one jet with ET > 30 GeV in |η| < 3
  - Scalar sum of jet ET < 200 GeV
- Leptonic energy balance: |vector sum of leptons and missing ET| < 100 GeV
- Z mass window: ±9 GeV for electrons and ±12 GeV for muons
- W mass window: 40 GeV < MT(W) < 120 GeV
Simple cuts – results (Alan)
WZ – Boosted Decision Trees
- Select 22 powerful variables out of the 67 available variables for BDT training.
- Rank the 22 variables by their Gini index contributions and the number of times they are used as tree splitters.
- M(Z) and MT(W) are ranked highest.
BDT Training Tips
- In the original BDT training program, all training events are given the same weight at the start (the first tree). This works fine if all MC processes are produced according to their production rates.
- Our MC samples are produced separately, so the event weights vary between backgrounds, e.g. assuming 1 fb⁻¹: wt(ZZ_llll) = 0.0024, wt(ttbar) = 0.7, wt(DY) = 1.8.
- We made two BDT trainings: one with equal event weights for all training MC, the other with the correct event weights for the first tree of the training (see the sketch below).
- BDT performance with correct event weights for training is better than with equal weights.
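A minimal sketch of the event-weight training, assuming scikit-learn's sample_weight interface; the background weights are the 1 fb⁻¹ values quoted above, while the signal weight of 1.0 and the toy data are illustrative assumptions:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(2)
X = rng.normal(size=(3000, 22))
y = (X[:, 0] > 0).astype(int)                  # 1 = signal, 0 = background
bkgd = rng.choice(["ZZ_llll", "ttbar", "DY"], size=len(X))

# Per-event weights for 1 fb^-1 (background values from the slide above;
# the signal weight of 1.0 is an illustrative assumption).
wt = {"ZZ_llll": 0.0024, "ttbar": 0.7, "DY": 1.8}
w = np.where(y == 1, 1.0, np.array([wt[p] for p in bkgd]))

bdt = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_leaf_nodes=20), n_estimators=1000)
bdt.fit(X, y, sample_weight=w)  # correct event weights seed the first tree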
- BDTs trained with 22 and with all 67 variables have comparable performance.
- ANN and BDT training with correct event weights works significantly better than with equal event weights.
- The event-weight training technique works better than equal-weight training for both ANN (×5-7) and BDT (×6-10).
- BDT is better than ANN, rejecting ×1.5-2 more background.
- A note describing the event-weight training technique in detail will be available shortly.
Eff_bkgd/RMS vs. Training Events
WZ – Boosted Decision Trees
BDT results for 1 fb⁻¹:
- Nsignal = 150 to 60
- Significance (Nsignal/√Nbkgd) ~ 40 (see the check below)
- BDT: S/B ~ 10 to 24
- Simple cuts for comparison: S/B ~ 2-2.5
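As a consistency check (my arithmetic, not from the slides): with S/B ~ 10 at Nsignal = 150, Nbkgd ≈ 15 and Nsignal/√Nbkgd = 150/√15 ≈ 38.7; with S/B ~ 24 at Nsignal = 60, Nbkgd ≈ 2.5 and 60/√2.5 ≈ 37.9. Both agree with the quoted significance of ~40.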
ZW → eee, eeμ, μμe, μμμ
MC breakdown with all cuts for 1 fb⁻¹
Summary for WZ Analysis
- Simple cuts: S/B = 2 ~ 2.5
- Boosted Decision Trees with 22 variables: S/B = 10 ~ 24
- The major backgrounds are (BDT ≥ 200): Z+jet → 2l (47.8%), ttbar (17.4%), ZZ → 4l (15.5%), Drell-Yan → 2l (12.4%)
Applications of BDT in HEP
Boosted Decision Trees (BDT) have been applied in several major HEP experiments over the past few years:
- MiniBooNE data analysis (BDT rejects 20-80% more background than ANN):
  physics/0408124 (NIM A543, p577), physics/0508045 (NIM A555, p370), physics/0610276 (NIM A574, p342), physics/0611267;
  "A search for electron neutrino appearance at the Δm² ~ 1 eV² scale", hep-ex/0704150 (submitted to PRL)
- ATLAS diboson analysis: WW, WZ, Wγ, Zγ
- ATLAS SUSY analysis: hep-ph/0605106 (JHEP 0607:040)
- LHC b-tagging, physics/0702041: for 60% b-tagging efficiency, BDT gives 35% more light-jet rejection than ANN
- BaBar data analysis:
  "Measurement of CP-violating asymmetries in the B0 → K+K−K0 Dalitz plot", hep-ex/0607112; physics/0507143, physics/0507157
- D0 data analysis:
  hep-ph/0606257, Fermilab-thesis-2006-15;
  "Evidence of single top quarks and first direct measurement of |Vtb|", hep-ex/0612052 (to appear in PRL); BDT performed better than ANN and matrix-element likelihood
- More are underway ...
BDT Free Software
- TMVA toolkit, CERN ROOT V5.14/00
Summary and Future Plan
- WW and WZ analysis results with simple cuts and BDT are presented.
- BDT works better than ANN; it is a very powerful and promising data analysis tool.
- Redo the WW/WZ analysis with CSC12 MC.
- BDT will be applied to WW → 2μX, H → ZZ, H → WW, H → ττ, etc.
BACKUP SLIDES: Boosted Decision Trees
Decision Trees & Boosting Algorithms
- Decision trees have been available for about two decades; they are known to be powerful but unstable, i.e., a small change in the training sample can give a large change in the tree and the results.
Ref: L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, "Classification and Regression Trees", Wadsworth, 1984.
- The boosting algorithm (AdaBoost) is a procedure that combines many "weak" classifiers to achieve a final powerful classifier.
Ref: Y. Freund, R.E. Schapire, “Experiments with a new boosting algorithm”, Proceedings of COLT, ACM Press, New York, 1996, pp. 209-217.
- Boosting algorithms can be applied to any classification method. Here they are applied to decision trees, the so-called "Boosted Decision Trees", for MiniBooNE particle identification.
* Hai-Jun Yang, Byron P. Roe, Ji Zhu, "Studies of boosted decision trees for MiniBooNE particle identification", physics/0508045, NIM A555 (2005) 370.
* Byron P. Roe, Hai-Jun Yang, Ji Zhu, Yong Liu, Ion Stancu, Gordon McGregor, "Boosted decision trees as an alternative to artificial neural networks for particle identification", NIM A543 (2005) 577.
* Hai-Jun Yang, Byron P. Roe, Ji Zhu, "Studies of Stability and Robustness of Artificial Neural Networks and Boosted Decision Trees", NIM A574 (2007) 342.
Criterion for "Best" Tree Split
- Purity, P, is the fraction of the weight of a node (leaf) due to signal events.
- Gini index: Gini = (Σ W_i) · P · (1 − P), where the W_i are the event weights on the node. Note that the Gini index is 0 for a node with all signal or all background.
- The criterion is to minimize Gini_left_node + Gini_right_node (see the sketch below).
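A minimal sketch of this split criterion (my own illustration of the formulas above, not the analysis code):

import numpy as np

def gini(weights, is_signal):
    """Gini = (sum of weights) * P * (1 - P), with P the signal purity."""
    total = weights.sum()
    if total <= 0:
        return 0.0
    p = weights[is_signal].sum() / total
    return total * p * (1.0 - p)

def best_split(x, weights, is_signal):
    """Scan cut values on variable x, minimizing Gini_left + Gini_right."""
    best_cut, best_score = None, gini(weights, is_signal)  # parent Gini
    for cut in np.unique(x):
        left = x < cut
        score = (gini(weights[left], is_signal[left])
                 + gini(weights[~left], is_signal[~left]))
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

# Toy usage: four weighted events, two signal and two background.
x = np.array([0.1, 0.4, 0.35, 0.8])
w = np.ones(4)
s = np.array([False, True, False, True])
print(best_split(x, w, s))  # -> (0.4, 0.0): a perfect separation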
Criterion for Next Node to Split
- Pick the node that maximizes the change in Gini index:
  Criterion = Gini_parent − Gini_left_child − Gini_right_child
- The Gini index contributions of the tree-split variables can be used to rank the importance of the input variables.
- The importance of the input variables can also be ranked by how often they are used as tree splitters.
Signal and Background Leaves
- Assume an equal weight of signal and background training events.
- If the signal weight on a leaf is larger than 1/2 of the leaf's total weight, it is a signal leaf; otherwise it is a background leaf.
- Signal events on a background leaf or background events on a signal leaf are misclassified events.
How to Boost Decision Trees?
- For each tree iteration, the same set of training events is used, but the weights of the events misclassified in the previous iteration are increased (boosted). Events with higher weights have a larger impact on the Gini index and criterion values. Boosting the weights of misclassified events makes it possible for them to be correctly classified by succeeding trees.
- Typically, one generates several hundred to a thousand trees, until the performance is optimal.
- The score of a testing event is assigned as follows: if it lands on a signal leaf, it is given a score of 1; otherwise −1. The weighted sum of scores from all trees is the final score of the event (see the sketch below).
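A minimal sketch of this scoring rule; the stub trees and their boosting weights are illustrative placeholders:

# Each "tree" is reduced to a function x -> +1 (signal leaf) or -1
# (background leaf); alpha is the tree's boosting weight.
def bdt_score(trees, x):
    """trees: list of (score_fn, alpha) pairs; returns the final score."""
    return sum(alpha * score_fn(x) for score_fn, alpha in trees)

# Toy usage: two stub trees cutting on a single variable.
trees = [
    (lambda x: 1 if x > 0.0 else -1, 1.1),  # stronger tree, larger alpha
    (lambda x: 1 if x > 0.5 else -1, 0.2),  # weaker tree, smaller alpha
]
print(bdt_score(trees, 0.7))  # 1.3 -> signal-like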
Two Boosting Algorithms
- AdaBoost: the weight of a misclassified event is multiplied by exp(α_m), where α_m = β · ln[(1 − err_m)/err_m]:
  - error rate = 0.1 and β = 0.5: α_m = 1.1, exp(1.1) ≈ 3
  - error rate = 0.4 and β = 0.5: α_m = 0.203, exp(0.203) ≈ 1.225
  - The weight of a misclassified event is multiplied by a large factor that depends on the error rate.
- ε-boost: the weight of a misclassified event is multiplied by exp(2ε):
  - If ε = 0.01, exp(2 × 0.01) ≈ 1.02
  - If ε = 0.04, exp(2 × 0.04) ≈ 1.083
  - It changes the event weights a little at a time.
- AdaBoost converges faster than ε-boost, but with sufficient tree iterations the performances of AdaBoost and ε-boost are comparable (both updates are sketched below).
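A minimal sketch of the two weight-update factors; it reproduces the numbers quoted above:

import math

def adaboost_factor(err_m, beta=0.5):
    """Misclassified-event weights are multiplied by exp(alpha_m),
    with alpha_m = beta * ln((1 - err_m) / err_m)."""
    alpha_m = beta * math.log((1.0 - err_m) / err_m)
    return math.exp(alpha_m)

def eboost_factor(eps):
    """Misclassified-event weights are multiplied by exp(2 * eps)."""
    return math.exp(2.0 * eps)

print(adaboost_factor(0.1))  # ~3.0
print(adaboost_factor(0.4))  # ~1.225
print(eboost_factor(0.01))   # ~1.02
print(eboost_factor(0.04))   # ~1.083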