Statistical Methods in Particle Physics
|
|
- Arleen Ferguson
- 5 years ago
- Views:
Transcription
1 Statistical Methods in Particle Physics 8. Multivariate Analysis Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 2017/18
2 Multi-Variate Classification Consider events which can be either signal or background events. Each event is characterized by n observables: ~x =(x 1,...,x n ) "feature vector" Goal: classify events as signal or background in an optimal way. This is usually done by mapping the feature vector to a single variable, i.e., to scalar test statistic: R n! R : y(~x) A cut y > c to classify events as signal corresponds to selecting a potentially complicated hyper-surface in feature space. In general superior to classical "rectangular" cuts on the xi. Problem closely related to machine learning (pattern recognition, data mining, ) 2
3 Classification: Different Approaches rectangular cuts non linear linear H 0 k-nearest-neighbor, Boosted Decision Trees, Multi-Layer Perceptrons, Support Vector Machines G. Cowan': 3
4 Signal Probability Instead of Hard Decisions Example: test statistic y for signal and background from a Multi-Layer Perceptron (MLP): Normalized TMVA output for classifier: MLP Signal Background Instead of a hard yes/no decision one can also define the probability of an event to be a signal event: P s (y) P(S y) = p(y B) TMVA manual p(y S) MLP U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)% Normalized TMVA output for classifier: BDT Signal Background Figure 4: Example plots for classifier output distributions for signal and background event test sample. Shown are likelihood (upper left), PDE range search (upper right), Multilay lower left) and boosted decision trees. p(y S) f s p(y S) f s + p(y B) (1 f s ), f s = n s n s + n b TMVA tutorial: An up-to-date reference of all configuration options for the TMVA Factory, 4 the MVA methods:
5 ROC Curve Quality of the classification can be characterized by the receiver operating characteristic (ROC curve) Background rejection versus Signal efficiency 1 εb εb: background efficiency Background rejection good MVA Method: Fisher MLP BDT PDERS Likelihood better Figure 5: Example for the background rejection versus signal e on the classifier outputs for the events of the test sample. Signal efficiency ciency ( ROC curve ) obtained by cutting 5
6 Different Approaches to Classification Neyman-Pearson lemma states that likelihood ratio provides an optimal test statistic for classification: y(~x) = p(~x S) p(~x B) Problem: the underlying pdf's are almost never known explicitly. Two approaches: 1. Estimate signal and background pdf's and construct test statistic based on Neyman-Pearson lemma, e.g. Naïve Bayes classifier (= Likelihood classifier) 2. Decision boundaries determined directly without approximating the pdf's (linear discriminants, decision trees, neural networks, ) 6
7 Supervised Machine Learning (I) Supervised Machine Learning requires labeled training data, i.e., a training sample where for each event it is known whether it is a signal or background event Decision boundary defined by minimizing a loss function ("training") Bias-variance tradeoff Classifiers with a small number of degrees of freedom are less prone to statistical fluctuations: different training samples would result in a similar classification boundaries ("small variance") However, if the data contain features that a model with few degrees of freedom cannot describe, a bias is introduced. In this case a classifier with more degrees of freedom would be better. User has to find a good balance 7
8 Supervised Machine Learning (II) Training, validation, and test sample Decision boundary fixed with training sample Performance on training sample becomes better with more iterations Danger of overtraining: Statistical fluctuations of the training sample will be learnt Validation sample = independent labeled data set not used for training check for overtraining Sign of overtraining: performance on validation sample becomes worse Stop training when signs of overtraining are observed ("early stopping") Performance: apply classifier to independent test sample Often: test sample = validation sample (only small bias) 8
9 Supervised Machine Learning (III) Rule of thumb if training data not expensive Training sample: 50% Validation sample: 25% Test sample: 25% often test sample = validation sample, i.e., training : validation/test = 50:50 Cross validation (efficient use of scarce training data) Split training sample in k independent subset Tk of the full sample T Train on T \ Tk resulting in k different classifiers For each training event there is one classifier that didn't use this event for training Validation results are then combined 9
10 Estimating PDFs from Histograms? Consider 2d example: G. Cowan': signal background Approximate approximate pdfs PDF using byn(x,y s), y S) N(x,y b) andn(x, in y B) corresponding cells. M bins per variable in d dimensions: M d cells hard to generate enough training data (often not practical for d > 1) In general in machine learning, problems related to a large number of dimensions of the feature space are referred to as the "curse of dimensionality" 10
11 k-nearest Neighbor Method (I) k-nn classifier Estimates probability density around the input vector p(~x S) and p(~x B) are approximated by the number of signal and background events in the training sample that lie in a small volume around the point ~x Algorithms finds k nearest neighbors: k = k s + k b Probability for the event to be of signal type: p s (~x) = k s (~x) k s (~x)+k b (~x) 11
12 k-nearest Neighbor Method (II) Simplest choice for distance measure in feature space is the Euclidean distance: 1.5 TMVA manual R = ~x ~y 1 Better: take correlations between variables into account: R = q (~x ~y) T V 1 (~x ~y) x V = covariance matrix "Mahalanobis distance" x 0 Figure 14: Example for the k-nearest The k-nn classifier has best performance when the boundary that separates signal and background events has irregular discriminating features that input cannot variables). be easily The thre approximated by parametric learning methods. The full (open) circles are the signal (bac in the nearest neighborhood (circle) of th signal and 7 background points so that 12 q
13 Kernel Density Estimator (KDE) [Just mentioned briefly here] Idea: Smear training data to get an estimate of the PDFs ˆf h (~x) = 1 n nx i=1 K h (~x ~x i )= 1 nh nx i=1 K( ~x ~x i h ) K = Kernel h = bandwidth (smoothing parameter) Use, e.g., Gaussian kernel: K(~x) = 1 ~x e 2 /2 d = dimension of feature space (2 ) d/2 x 2 Lecture Niklas Berger x 1 13
14 Gaussian KDE in 1 Dimension G. Cowan': Adaptive KDE: adjust width of kernel according to PDF (wide where pdf is low) Advantage of KDE: no training! Disadvantage of KDE: need to sum of all training events to evaluate PDF, i.e., method can be slow 14
15 Fisher Linear Discriminant Linear discriminant is simple. Can still be optimal if amount of training data is limited. n nx Ansatz for test statistic: yx= y(~x) w= i x i = ~w T i x i = w T x ~x i=1 i=1 f y s, f y b Choose parameters wi so that separation between signal and background distribution is maximum. Need to define "separation". f (y) t s t b Fisher: maximize J(~w) = ( s b ) 2 2 s + 2 b s b J w= s b 2 G. Cowan': s 2 b 2 y 15
16 Fisher Linear Discriminant: Variable Definitions Mean and covariance for signal and background: µ s,b i = V s,b ij = Z Z x i f (~x H s,b )d~x (x i µ s,b i )(x j µ s,b j ) f (~x H s,b )d~x Mean and covariance of y(~x) for signal and background: Z s,b = y(~x)f (~x H s,b )d~x = ~w T ~µ s,b Z 2 s,b = (y(~x) s,b ) 2 f (~x H s,b )d~x = ~w T V s,b ~w G. Cowan': 16
17 Fisher Linear Discriminant: Determining the Coefficients wi Numerator of J(~w) : ( s b ) 2 = J(~w) Denominator of : nx i=1 2 s + 2 b = w i (µ s i µ b i )! 2 = nx i,j=1 nx i,j=1 nx i,j=1 w i w j (µ s i µ b i )(µ s j µ b j ) w i w j B ij = ~w T B ~w w i w j (V s + V b ) ij ~w T W ~w Maximize: J(~w) = ~w T B ~w ~w T W ~w = separation between classes separation within classes G. Cowan': 17
18 Fisher Linear Discriminant: Determining the Coefficients i =0 gives: y(~x) =~w T ~x with ~w / W 1 (~µ s ~µ b ) We obtain linear decision boundaries. linear decision boundary Weight vector ~w can be interpreted as a direction in feature space on which the events are projected. G. Cowan': 18
19 Fisher Linear Discriminant: Remarks In case the signal and background pdfs f (~x H s ) and f (~x H b ) are both multivariate Gaussian with the same covariance but different means, the Fisher discriminant is y(~x) / ln f (~x H s) f (~x H b ) That is, in this case the Fisher discriminant is an optimal classifier according to the Neyman-Pearson lemma (as y(~x) is a monotonic function of the likelihood ratio) Test statistic can be written as y(~x) =w 0 + nx i=1 w i x i where events with y > 0 are classified as signal. Same functional form as for the perceptron (prototype of neural networks). 19
20 Perceptron Discriminant: y(~x) =h w 0 + nx i=1 w i x i! The nonlinear, monotonic function h is called activation function. Typical choices for h: 1 1+e x ( sigmoid ), tanhx x 1 h(x) y(~x) x n x 20
21 The Biological Inspiration: the Neuron Human brain: neurons, each with on average 7000 synaptic connections y l 1 1 y l 1 2. y l 1 n Input w l 1 1j w l 1 2j w l 1 nj ρ Output Σ y l j 21
22 Feedforward Neural Network with One Hidden Layer superscripts indicates layer number 1(~x) y(~x) i(~x) =h (1) i0 + nx j=1 1 w (1) ij x j A m(~x) y(~x) =h (2) 10 + m X j=1 w (2) 1j 1 j(~x) A Straightforward to generalize to multiple hidden layers 22
23 Network Training ~x a : training event, a = 1,..., N t a : correct label for training event a e.g., ta = 1, 0 for signal and background, respectively ~w : vector containing all weights Error function: E(~w) = 1 2 NX (y(~x a, ~w) t a ) 2 = NX E a (~w) a=1 a=1 Weights are determined by minimizing the error function. 23
24 Backpropagation ~w (0) Start with an initial guess for the weights an then update weights after each training event: ~w ( +1) = ~w ( ) re a (~w ( ) ) learning rate Let's write network output as follows: mx y(~x) =h(u(~x)) with u(~x) = w (2) 1j j(~x), j(~x) =h Here we defined φ0 = x0 = 1 and the sums start from 0 to include the offsets. Weights from hidden layer to output: E a = 1 2 (y a t a ) (2) 1j j=0 =(y a t a )h 0 (u(~x (2) 1j =(y a t a )h 0 (u(~x a )) j (~x a ) Weights from input layer to hidden layer ( further application of chain rule): nx k=0 w (1) jk x k! h (v j (1) jk =(y a t a )h 0 (u(~x a ))w (2) 1j h 0 (v j (~x a )) x a,k ~x a (x a,1,...,x a,n ) 24
25 Neural Network Output and Decision Boundaries P. Bhat, Multivariate Analysis Methods in Particle Physics, inspirehep.net/record/ a Input layer Hidden layer Output layer 800 b x 1 x 2 x 3 Number of events Signal Background output of neural network 200 x d x i h j 0(x) = f(x,w) NN output 25 c d 20 Signal Background NN contours decision boundaries for different cuts on NN output Variable x Variable x p(s x 1, x 2 ) Variable x Variable x signal probability p(s x1, x2) 25
26 Example of Overtraining Too many neurons/layers make a neural network too flexible overtraining training sample G. Cowan: test sample Network "learns" features that are merely statistical fluctuations in the training sample 26
27 Monitoring Overtraining G. Cowan: Monitor fraction of misclassified events (or error function:) error rate optimum = minimum of error rate for test sample overtraining = increase of error rate test sample training sample flexibility (e.g., number of nodes/layers) 27
28 Example: Identification of Events with Top-Quarks a t t! W + bw b! l bq q b Shaded = signal b likelihood method Background Signal Events (arbitrary units) x 1 = missing E T x 2 = aplanarity Events (arbitrary units) D LB Background Signal x 3 x 4 D0 experiment, plot from inspirehep.net/record/ neural net D NN 28
29 Deep Neural Networks Deep networks: many hidden layers with large number of neurons Challenges Hard too train ("vanishing gradient problem") Training slow Risk of overtraining Big progress in recent years Interest in NN waned before ca Milestone: paper by G. Hinton (2006): "learning for deep belief nets" Image recognition, AlphaGo, Soon: self-driving cars, 29
30 Fun with Neural Nets in the Browser 30
31 Decision Trees (I) arxiv:physics/ v1 root node B 4/37 S 7/1 S/B 52/48 < PMT Hits? S/B 9/10 S/B 48/11 < 0.2 GeV 0.2 GeV Energy? < 500 cm 500 cm Radius? B 2/9 branch node (node with further branching) S 39/1 MiniBooNE: 1520 photomultiplier signals, goal: separation of νe from νμ events leaf node (no further branching) Leaf nodes classify events as either signal or background 31
32 Decision Trees (II) Ann.Rev.Nucl.Part.Sci. 61 (2011) a Root node {x 1 5, x 2 } b x 1 θ 1 x 1 > θ 1 x 2 R 3 θ 3 x 2 θ 2 x 2 > θ 2 x 2 θ 3 x 2 > θ 3 R 2 R 1 R 2 R 3 θ 2 R 4 R 5 R 1 x 1 θ 4 x 1 > θ 4 θ 1 θ 4 x 1 R 4 R 5 Easy to interpret and visualize: Space of feature vectors split up into rectangular volumes (attributed to either signal or background) How to build a decision tree in an optimal way? 32
33 Finding Optimal Cuts Separation btw. signal and background is often measured with the Gini index: G = p(1 p) Here p is the purity: p = P signal w i P signal w i + P background w i w i = weight of event i [usefulness of weights will become apparent soon] Improvement in signal/background separation after splitting a set A into two sets B and C: = W A G A W B G B W C G C where W X = X X w i 33
34 Separation Measures arbitrary unit Split criterion Misclas. error Entropy Gini signal purity Cross entropy: p ln p (1 p)ln(1 p) p Gini index: p(1 p) [after Corrado Gini, used to measure income and wealth inequalities, 1912] Misclassification rate: 1 max(p,1 p) 34
35 Decision Tree Pruning When to stop growing a tree? When all nodes are essentially pure? Well, that's overfitting! Pruning Cut back fully grown tree to avoid overtraining 35
36 Boosted Decision Trees: Idea Drawback of decisions trees: very sensitive to statistical fluctuations in training sample Solution: boosting One tree several trees ("forrest") Trees are derived from the same training ensemble by reweighting events Individual trees are then combined: weighted average of individual trees Boosting is a general method of combining a set of classifiers (not necessarily decisions trees) into a new, more stable classifier with smaller error. Popular example: AdaBoost (Freund, Schapire, 1997) 36
37 AdaBoost (short for Adaptive Boosting) Initial training sample with equal weights normalized as nx Train first classifier f1: ~x 1,...,~x n : multivariate event data y 1,...,y n : true class labels, +1 or 1 w (1) (1) 1,...,w n event weights i=1 w (1) i =1 f 1 (~x i ) > 0 f 1 (~x i ) < 0 classify as signal classify as background 37
38 Updating Events Weights Define training sample k+1 from training sample k by updating weights: w (k+1) i = w (k) i e k f k (~x i )y i /2 Z k i = event index normalization factor so that nx i=1 w (k) i =1 Weight is increased if event was misclassified by the previous classifier "Next classifier should pay more attention to misclassified events" At each step the classifier fk minimizes error rate " k = nx i=1 w (k) i I (y i f k (~x i ) apple 0), I (X )=1ifX is true, 0 otherwise 38
39 Assigning the Classifier Score Assign score to each classifier according to its error rate: k =ln 1 " k " k Combined classifier (weighted average): f (~x) = KX k=1 k f k (~x) It can be shown that the error rate of the combined classifier satisfies " apple KY k=1 2 p " k (1 " k ) 39
40 General Remarks on Multi-Variate Analyses MVA Methods More effective than classic cut-based analyses Take correlations of input variables into account Important: find good input variables for MVA methods Good separation power between S and B Little correlations among variables No correlation with the parameters you try to measure in your signal sample! Pre-processing Apply obvious variable transformations and let MVA method do the rest Make use of obvious symmetries: if e.g. a particle production process is symmetric in polar angle θ use cos θ and not cos θ as input variable It is generally useful to bring all input variables to a similar numerical range H. Voss, Multivariate Data Analysis and Machine Learning in High Energy Physics 40
41 Classifiers and Their Properties H. Voss, Multivariate Data Analysis and Machine Learning in High Energy Physics Classifiers Criteria Cuts Likelihood PDERS / k-nn H-Matrix Fisher MLP BDT RuleFit SVM Performance no / linear correlations nonlinear correlations Speed Training Response / Robust -ness Overtraining Weak input variables Curse of dimensionality Transparency 41
Advanced statistical methods for data analysis Lecture 2
Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationMultivariate statistical methods and data mining in particle physics Lecture 4 (19 June, 2008)
Multivariate statistical methods and data mining in particle physics Lecture 4 (19 June, 2008) RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationStatistical Methods for Particle Physics Lecture 2: multivariate methods
Statistical Methods for Particle Physics Lecture 2: multivariate methods http://indico.ihep.ac.cn/event/4902/ istep 2015 Shandong University, Jinan August 11-19, 2015 Glen Cowan ( 谷林 科恩 ) Physics Department
More informationStatistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods
Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods www.pp.rhul.ac.uk/~cowan/stat_aachen.html Graduierten-Kolleg RWTH Aachen 10-14 February 2014 Glen Cowan Physics
More informationApplied Statistics. Multivariate Analysis - part II. Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1
Applied Statistics Multivariate Analysis - part II Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1 Fisher Discriminant You want to separate two types/classes (A and B) of
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMultivariate Methods in Statistical Data Analysis
Multivariate Methods in Statistical Data Analysis Web-Site: http://tmva.sourceforge.net/ See also: "TMVA - Toolkit for Multivariate Data Analysis, A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E.
More informationAdvanced statistical methods for data analysis Lecture 1
Advanced statistical methods for data analysis Lecture 1 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationMultivariate Analysis Techniques in HEP
Multivariate Analysis Techniques in HEP Jan Therhaag IKTP Institutsseminar, Dresden, January 31 st 2013 Multivariate analysis in a nutshell Neural networks: Defeating the black box Boosted Decision Trees:
More informationStatistical Tools in Collider Experiments. Multivariate analysis in high energy physics
Statistical Tools in Collider Experiments Multivariate analysis in high energy physics Pauli Lectures - 06/0/01 Nicolas Chanon - ETH Zürich 1 Main goals of these lessons - Have an understanding of what
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationMachine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationStatistical Data Analysis Stat 2: Monte Carlo Method, Statistical Tests
Statistical Data Analysis Stat 2: Monte Carlo Method, Statistical Tests London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationMultivariate Data Analysis and Machine Learning in High Energy Physics (III)
Multivariate Data Analysis and Machine Learning in High Energy Physics (III) Helge Voss (MPI K, Heidelberg) Graduierten-Kolleg, Freiburg, 11.5-15.5, 2009 Outline Summary of last lecture 1-dimensional Likelihood:
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationLecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1
Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationHarrison B. Prosper. Bari Lectures
Harrison B. Prosper Florida State University Bari Lectures 30, 31 May, 1 June 2016 Lectures on Multivariate Methods Harrison B. Prosper Bari, 2016 1 h Lecture 1 h Introduction h Classification h Grid Searches
More informationBrief Introduction to Machine Learning
Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationBoosted Decision Trees and Applications
EPJ Web of Conferences, 24 (23) DOI:./ epjconf/ 2324 C Owned by the authors, published by EDP Sciences, 23 Boosted Decision Trees and Applications Yann COADOU CPPM, Aix-Marseille UniversitNRS/IN2P3, Marseille,
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationMultilayer Neural Networks
Multilayer Neural Networks Multilayer Neural Networks Discriminant function flexibility NON-Linear But with sets of linear parameters at each layer Provably general function approximators for sufficient
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationAnalysis Techniques Multivariate Methods
Analysis Techniques Multivariate Methods Harrison B. Prosper NEPPSR 007 Outline hintroduction hsignal/background Discrimination hfisher Discriminant hsupport Vector Machines hnaïve Bayes hbayesian Neural
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationMultivariate Data Analysis Techniques
Multivariate Data Analysis Techniques Andreas Hoecker (CERN) Workshop on Statistics, Karlsruhe, Germany, Oct 12 14, 2009 O u t l i n e Introduction to multivariate classification and regression Multivariate
More informationNumerical Learning Algorithms
Numerical Learning Algorithms Example SVM for Separable Examples.......................... Example SVM for Nonseparable Examples....................... 4 Example Gaussian Kernel SVM...............................
More informationNeural Networks and Ensemble Methods for Classification
Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification
More informationIntroduction to Machine Learning
Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationSupport Vector Machines
Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization
More informationNeural Networks DWML, /25
DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationHypothesis testing:power, test statistic CMS:
Hypothesis testing:power, test statistic The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis, quantitatively, maximal power In order to achieve this
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationArtificial Intelligence Roman Barták
Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Introduction We will describe agents that can improve their behavior through diligent study of their
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationMLPR: Logistic Regression and Neural Networks
MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationOutline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.
Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More informationThe exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.
CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationReconnaissance d objetsd et vision artificielle
Reconnaissance d objetsd et vision artificielle http://www.di.ens.fr/willow/teaching/recvis09 Lecture 6 Face recognition Face detection Neural nets Attention! Troisième exercice de programmation du le
More informationMulti-layer Neural Networks
Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification
More information1 Machine Learning Concepts (16 points)
CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationNeural networks (not in book)
(not in book) Another approach to classification is neural networks. were developed in the 1980s as a way to model how learning occurs in the brain. There was therefore wide interest in neural networks
More informationMachine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler
+ Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationMultilayer Neural Networks
Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationStatPatternRecognition: A C++ Package for Multivariate Classification of HEP Data. Ilya Narsky, Caltech
StatPatternRecognition: A C++ Package for Multivariate Classification of HEP Data Ilya Narsky, Caltech Motivation Introduction advanced classification tools in a convenient C++ package for HEP researchers
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More information