Statistical Methods in Particle Physics


1 Statistical Methods in Particle Physics 8. Multivariate Analysis Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 2017/18

2 Multi-Variate Classification
Consider events which can be either signal or background events. Each event is characterized by n observables: $\vec{x} = (x_1, \dots, x_n)$ ("feature vector").
Goal: classify events as signal or background in an optimal way. This is usually done by mapping the feature vector to a single variable, i.e., to a scalar test statistic: $\mathbb{R}^n \to \mathbb{R}: \; y(\vec{x})$.
A cut $y > c$ to classify events as signal corresponds to selecting a potentially complicated hyper-surface in feature space. In general this is superior to classical "rectangular" cuts on the $x_i$.
The problem is closely related to machine learning (pattern recognition, data mining, ...).

3 Classification: Different Approaches
[Figure (G. Cowan): examples of decision boundaries for different approaches — rectangular cuts, a linear boundary, and a non-linear boundary as produced by k-nearest-neighbor, Boosted Decision Trees, Multi-Layer Perceptrons, or Support Vector Machines.]

4 Signal Probability Instead of Hard Decisions
Example: test statistic y for signal and background from a Multi-Layer Perceptron (MLP).
[Figure (TMVA manual): normalized classifier output distributions for signal and background test samples, shown for an MLP and a boosted decision tree.]
Instead of a hard yes/no decision one can also define the probability of an event to be a signal event:
$$P_s(y) \equiv P(S|y) = \frac{p(y|S)\, f_s}{p(y|S)\, f_s + p(y|B)\,(1 - f_s)}, \qquad f_s = \frac{n_s}{n_s + n_b}$$
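A minimal numpy sketch of how such a signal probability could be estimated in practice, assuming that $p(y|S)$ and $p(y|B)$ are approximated by histograms of the classifier output; the toy Gaussian outputs, sample sizes, and binning are illustrative assumptions, not part of the lecture:

```python
import numpy as np

# Toy classifier outputs (assumed Gaussian here purely for illustration)
rng = np.random.default_rng(42)
y_sig = rng.normal(0.7, 0.15, size=20000)   # sample of p(y|S)
y_bkg = rng.normal(0.3, 0.15, size=80000)   # sample of p(y|B)

n_s, n_b = len(y_sig), len(y_bkg)
f_s = n_s / (n_s + n_b)                      # signal fraction

# Histogram estimates of p(y|S) and p(y|B)
bins = np.linspace(-0.5, 1.5, 51)
p_y_S, _ = np.histogram(y_sig, bins=bins, density=True)
p_y_B, _ = np.histogram(y_bkg, bins=bins, density=True)

# P(S|y) per bin, guarding against empty bins
denom = p_y_S * f_s + p_y_B * (1 - f_s)
with np.errstate(invalid="ignore", divide="ignore"):
    P_s = np.where(denom > 0, p_y_S * f_s / denom, 0.0)

centres = 0.5 * (bins[:-1] + bins[1:])
for c, p in zip(centres[::10], P_s[::10]):
    print(f"y = {c:5.2f}  ->  P(S|y) = {p:.3f}")
```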

5 ROC Curve
The quality of the classification can be characterized by the receiver operating characteristic (ROC curve): background rejection $1 - \varepsilon_b$ (with $\varepsilon_b$ the background efficiency) plotted versus the signal efficiency $\varepsilon_s$. The further the curve extends towards the upper right corner, the better the classifier.
[Figure (TMVA manual, Fig. 5): background rejection versus signal efficiency, obtained by cutting on the classifier outputs for the events of the test sample, for several MVA methods: Fisher, MLP, BDT, PDERS, Likelihood.]
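A short sketch of how a ROC curve could be computed from labeled test events with scikit-learn; the Gaussian toy scores and the chosen working point (90% signal efficiency) are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy classifier outputs: label 1 = signal, 0 = background (illustrative)
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.7, 0.15, 5000),   # signal
                         rng.normal(0.3, 0.15, 5000)])  # background
labels = np.concatenate([np.ones(5000), np.zeros(5000)])

# eps_s = true-positive rate, eps_b = false-positive rate
eps_b, eps_s, thresholds = roc_curve(labels, scores)
bkg_rejection = 1.0 - eps_b

print("area under ROC curve:", auc(eps_b, eps_s))

# e.g. background rejection at ~90% signal efficiency
i = np.searchsorted(eps_s, 0.9)
print("background rejection at eps_s ~ 0.9:", bkg_rejection[i])
```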

6 Different Approaches to Classification
The Neyman-Pearson lemma states that the likelihood ratio provides an optimal test statistic for classification:
$$y(\vec{x}) = \frac{p(\vec{x}|S)}{p(\vec{x}|B)}$$
Problem: the underlying pdfs are almost never known explicitly. Two approaches:
1. Estimate the signal and background pdfs and construct a test statistic based on the Neyman-Pearson lemma, e.g. the Naïve Bayes classifier (= Likelihood classifier)
2. Determine the decision boundaries directly without approximating the pdfs (linear discriminants, decision trees, neural networks, ...)
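To make the lemma concrete, here is a small sketch of the likelihood-ratio test statistic for the (purely illustrative) assumption that the pdfs are known and are two-dimensional Gaussians; in practice the pdfs would have to be estimated, which is exactly the problem described above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assume (for illustration only) that the pdfs are known multivariate Gaussians
pdf_s = multivariate_normal(mean=[1.0, 1.0], cov=[[1.0, 0.3], [0.3, 1.0]])
pdf_b = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])

def likelihood_ratio(x):
    """Neyman-Pearson test statistic y(x) = p(x|S) / p(x|B)."""
    return pdf_s.pdf(x) / pdf_b.pdf(x)

x = np.array([[0.5, 0.5], [2.0, 1.5], [-1.0, 0.0]])
print(likelihood_ratio(x))   # larger values -> more signal-like
```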

7 Supervised Machine Learning (I)
Supervised machine learning requires labeled training data, i.e., a training sample where for each event it is known whether it is a signal or background event.
The decision boundary is defined by minimizing a loss function ("training").
Bias-variance tradeoff:
Classifiers with a small number of degrees of freedom are less prone to statistical fluctuations: different training samples would result in similar classification boundaries ("small variance").
However, if the data contain features that a model with few degrees of freedom cannot describe, a bias is introduced. In this case a classifier with more degrees of freedom would be better.
The user has to find a good balance.

8 Supervised Machine Learning (II)
Training, validation, and test sample:
The decision boundary is fixed with the training sample. Performance on the training sample becomes better with more iterations.
Danger of overtraining: statistical fluctuations of the training sample will be learnt.
Validation sample = independent labeled data set not used for training, used to check for overtraining. Sign of overtraining: performance on the validation sample becomes worse. Stop training when signs of overtraining are observed ("early stopping").
Performance: apply the classifier to an independent test sample. Often: test sample = validation sample (only a small bias).

9 Supervised Machine Learning (III)
Rule of thumb if training data are not expensive: training sample 50%, validation sample 25%, test sample 25%. Often test sample = validation sample, i.e., training : validation/test = 50 : 50.
Cross validation (efficient use of scarce training data):
Split the training sample into k independent subsets T_k of the full sample T.
Train on T \ T_k, resulting in k different classifiers.
For each training event there is one classifier that didn't use this event for training.
The validation results are then combined.
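A minimal scikit-learn sketch of k-fold cross validation as described above; the toy data set, the MLP configuration, and k = 5 are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Toy labeled sample (illustrative stand-in for signal/background events)
X, y = make_classification(n_samples=2000, n_features=5, random_state=1)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=1)

# k = 5: train on T \ T_k, evaluate on the held-out subset T_k
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(clf, X, y, cv=cv)
print("per-fold accuracy:", np.round(scores, 3))
print("combined estimate: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```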

10 Estimating PDFs from Histograms?
Consider a 2d example [figure from G. Cowan: histograms of signal and background in the (x, y) plane]: approximate the pdfs by the bin contents N(x, y|S) and N(x, y|B) in the corresponding cells.
M bins per variable in d dimensions: $M^d$ cells — hard to generate enough training data (often not practical for d > 1).
In general in machine learning, problems related to a large number of dimensions of the feature space are referred to as the "curse of dimensionality".

11 k-nearest Neighbor Method (I) k-nn classifier Estimates probability density around the input vector p(~x S) and p(~x B) are approximated by the number of signal and background events in the training sample that lie in a small volume around the point ~x Algorithms finds k nearest neighbors: k = k s + k b Probability for the event to be of signal type: p s (~x) = k s (~x) k s (~x)+k b (~x) 11

12 k-nearest Neighbor Method (II) Simplest choice for distance measure in feature space is the Euclidean distance: 1.5 TMVA manual R = ~x ~y 1 Better: take correlations between variables into account: R = q (~x ~y) T V 1 (~x ~y) x V = covariance matrix "Mahalanobis distance" x 0 Figure 14: Example for the k-nearest The k-nn classifier has best performance when the boundary that separates signal and background events has irregular discriminating features that input cannot variables). be easily The thre approximated by parametric learning methods. The full (open) circles are the signal (bac in the nearest neighborhood (circle) of th signal and 7 background points so that 12 q

13 Kernel Density Estimator (KDE)
[Just mentioned briefly here]
Idea: smear the training data to get an estimate of the PDFs:
$$\hat{f}_h(\vec{x}) = \frac{1}{n} \sum_{i=1}^{n} K_h(\vec{x} - \vec{x}_i) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left(\frac{\vec{x} - \vec{x}_i}{h}\right)$$
K = kernel, h = bandwidth (smoothing parameter), d = dimension of the feature space.
Use, e.g., a Gaussian kernel:
$$K(\vec{x}) = \frac{1}{(2\pi)^{d/2}}\, e^{-|\vec{x}|^2/2}$$
[Figure (lecture by Niklas Berger): kernel density estimate in the $(x_1, x_2)$ plane.]
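A direct numpy sketch of the Gaussian KDE formula above; the 2d toy sample and the fixed bandwidth h = 0.3 are illustrative assumptions (adaptive bandwidths, as mentioned on the next slide, are not implemented here):

```python
import numpy as np

def gaussian_kde(x, sample, h):
    """KDE f_hat(x) = 1/(n h^d) * sum_i K((x - x_i)/h) with a Gaussian kernel K.
    x has shape (d,), sample has shape (n, d)."""
    n, d = sample.shape
    u = (x - sample) / h                                      # shape (n, d)
    K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return K.sum() / (n * h**d)

# Toy 2d training sample (illustrative assumption)
rng = np.random.default_rng(4)
sample = rng.normal(0.0, 1.0, size=(5000, 2))

print(gaussian_kde(np.array([0.0, 0.0]), sample, h=0.3))
print(gaussian_kde(np.array([2.0, 2.0]), sample, h=0.3))
```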

14 Gaussian KDE in 1 Dimension
[Figure (G. Cowan): Gaussian KDE of a 1d sample.]
Adaptive KDE: adjust the width of the kernel according to the PDF (wide where the pdf is low).
Advantage of KDE: no training!
Disadvantage of KDE: one needs to sum over all training events to evaluate the PDF, i.e., the method can be slow.

15 Fisher Linear Discriminant
A linear discriminant is simple. It can still be optimal if the amount of training data is limited.
Ansatz for the test statistic:
$$y(\vec{x}) = \sum_{i=1}^{n} w_i x_i = \vec{w}^T \vec{x}$$
Choose the parameters $w_i$ so that the separation between the signal and background distributions of y is maximal. This requires a definition of "separation".
Fisher: maximize
$$J(\vec{w}) = \frac{(\tau_s - \tau_b)^2}{\Sigma_s^2 + \Sigma_b^2}$$
[Figure (G. Cowan): distributions $f(y|s)$ and $f(y|b)$ of the test statistic, with means $\tau_s$, $\tau_b$ and widths $\Sigma_s$, $\Sigma_b$.]

16 Fisher Linear Discriminant: Variable Definitions
Mean and covariance for signal and background:
$$\mu_i^{s,b} = \int x_i\, f(\vec{x}|H_{s,b})\, d\vec{x}$$
$$V_{ij}^{s,b} = \int (x_i - \mu_i^{s,b})(x_j - \mu_j^{s,b})\, f(\vec{x}|H_{s,b})\, d\vec{x}$$
Mean and variance of $y(\vec{x})$ for signal and background:
$$\tau_{s,b} = \int y(\vec{x})\, f(\vec{x}|H_{s,b})\, d\vec{x} = \vec{w}^T \vec{\mu}_{s,b}$$
$$\Sigma_{s,b}^2 = \int \left(y(\vec{x}) - \tau_{s,b}\right)^2 f(\vec{x}|H_{s,b})\, d\vec{x} = \vec{w}^T V^{s,b}\, \vec{w}$$
(G. Cowan)

17 Fisher Linear Discriminant: Determining the Coefficients $w_i$
Numerator of $J(\vec{w})$:
$$(\tau_s - \tau_b)^2 = \left(\sum_{i=1}^{n} w_i (\mu_i^s - \mu_i^b)\right)^2 = \sum_{i,j=1}^{n} w_i w_j (\mu_i^s - \mu_i^b)(\mu_j^s - \mu_j^b) \equiv \sum_{i,j=1}^{n} w_i w_j B_{ij} = \vec{w}^T B\, \vec{w}$$
Denominator of $J(\vec{w})$:
$$\Sigma_s^2 + \Sigma_b^2 = \sum_{i,j=1}^{n} w_i w_j (V^s + V^b)_{ij} \equiv \vec{w}^T W \vec{w}$$
Maximize:
$$J(\vec{w}) = \frac{\vec{w}^T B\, \vec{w}}{\vec{w}^T W \vec{w}} = \frac{\text{separation between classes}}{\text{separation within classes}}$$
(G. Cowan)

18 Fisher Linear Discriminant: Determining the Coefficients $w_i$
Setting $\partial J / \partial w_i = 0$ gives:
$$y(\vec{x}) = \vec{w}^T \vec{x} \quad \text{with} \quad \vec{w} \propto W^{-1}(\vec{\mu}_s - \vec{\mu}_b)$$
We obtain linear decision boundaries.
[Figure (G. Cowan): linear decision boundary in the $(x_1, x_2)$ plane.]
The weight vector $\vec{w}$ can be interpreted as a direction in feature space onto which the events are projected.
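A compact numpy sketch of the Fisher coefficients $\vec{w} \propto W^{-1}(\vec{\mu}_s - \vec{\mu}_b)$, estimated from toy samples; the two Gaussian samples with equal covariance are an illustrative assumption:

```python
import numpy as np

# Toy signal and background samples with equal covariance (illustrative)
rng = np.random.default_rng(5)
cov = [[1.0, 0.5], [0.5, 1.0]]
X_s = rng.multivariate_normal([1.0, 1.0], cov, size=5000)
X_b = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

mu_s, mu_b = X_s.mean(axis=0), X_b.mean(axis=0)
V_s = np.cov(X_s, rowvar=False)
V_b = np.cov(X_b, rowvar=False)

# Fisher coefficients: w ∝ W^-1 (mu_s - mu_b), with W = V_s + V_b
W = V_s + V_b
w = np.linalg.solve(W, mu_s - mu_b)

# Test statistic y(x) = w^T x for signal and background, and the separation J
y_s, y_b = X_s @ w, X_b @ w
J = (y_s.mean() - y_b.mean())**2 / (y_s.var() + y_b.var())
print("w =", w, " separation J(w) =", J)
```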

19 Fisher Linear Discriminant: Remarks
In case the signal and background pdfs $f(\vec{x}|H_s)$ and $f(\vec{x}|H_b)$ are both multivariate Gaussian with the same covariance but different means, the Fisher discriminant is
$$y(\vec{x}) \propto \ln \frac{f(\vec{x}|H_s)}{f(\vec{x}|H_b)}$$
That is, in this case the Fisher discriminant is an optimal classifier according to the Neyman-Pearson lemma (as $y(\vec{x})$ is a monotonic function of the likelihood ratio).
The test statistic can be written as
$$y(\vec{x}) = w_0 + \sum_{i=1}^{n} w_i x_i$$
where events with $y > 0$ are classified as signal. This has the same functional form as the perceptron (the prototype of neural networks).

20 Perceptron
Discriminant:
$$y(\vec{x}) = h\!\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)$$
The nonlinear, monotonic function h is called the activation function. Typical choices for h: $\frac{1}{1+e^{-x}}$ ("sigmoid"), $\tanh x$.
[Figure: perceptron diagram with inputs $x_1, \dots, x_n$, activation $h(x)$, and output $y(\vec{x})$.]

21 The Biological Inspiration: the Neuron
Human brain: neurons with on average 7000 synaptic connections each.
[Figure: schematic neuron model — inputs $y_1^{l-1}, \dots, y_n^{l-1}$ are multiplied by weights $w_{1j}^{l-1}, \dots, w_{nj}^{l-1}$, summed, and passed through an activation function $\rho$ to give the output $y_j^{l}$.]

22 Feedforward Neural Network with One Hidden Layer
(superscripts indicate the layer number)
Hidden-layer activations:
$$\phi_i(\vec{x}) = h\!\left(w_{i0}^{(1)} + \sum_{j=1}^{n} w_{ij}^{(1)} x_j\right)$$
Network output:
$$y(\vec{x}) = h\!\left(w_{10}^{(2)} + \sum_{j=1}^{m} w_{1j}^{(2)} \phi_j(\vec{x})\right)$$
Straightforward to generalize to multiple hidden layers.
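A minimal sketch of the forward pass of such a one-hidden-layer network with a sigmoid activation; the network size (n = 3 inputs, m = 4 hidden neurons) and the random weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """One hidden layer: phi_i = h(b1_i + sum_j W1_ij x_j),
    y = h(b2 + sum_j W2_j phi_j)."""
    phi = sigmoid(b1 + W1 @ x)      # hidden-layer activations, shape (m,)
    return sigmoid(b2 + W2 @ phi)   # scalar network output

# Illustrative dimensions: n = 3 inputs, m = 4 hidden neurons
rng = np.random.default_rng(6)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=4), rng.normal()

print(forward(np.array([0.2, -1.0, 0.5]), W1, b1, W2, b2))
```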

23 Network Training
$\vec{x}_a$: training event, $a = 1, \dots, N$; $t_a$: correct label for training event a (e.g., $t_a = 1, 0$ for signal and background, respectively); $\vec{w}$: vector containing all weights.
Error function:
$$E(\vec{w}) = \frac{1}{2} \sum_{a=1}^{N} \left(y(\vec{x}_a, \vec{w}) - t_a\right)^2 = \sum_{a=1}^{N} E_a(\vec{w})$$
The weights are determined by minimizing the error function.

24 Backpropagation
Start with an initial guess $\vec{w}^{(0)}$ for the weights and then update the weights after each training event:
$$\vec{w}^{(\tau+1)} = \vec{w}^{(\tau)} - \eta\, \nabla E_a(\vec{w}^{(\tau)}), \qquad \eta = \text{learning rate}$$
Let's write the network output as follows:
$$y(\vec{x}) = h(u(\vec{x})) \quad \text{with} \quad u(\vec{x}) = \sum_{j=0}^{m} w_{1j}^{(2)} \phi_j(\vec{x}), \qquad \phi_j(\vec{x}) = h(v_j(\vec{x})) \quad \text{with} \quad v_j(\vec{x}) = \sum_{k=0}^{n} w_{jk}^{(1)} x_k$$
Here we defined $\phi_0 = x_0 = 1$, and the sums start from 0 to include the offsets.
Weights from the hidden layer to the output:
$$\frac{\partial E_a}{\partial w_{1j}^{(2)}} = (y_a - t_a)\, h'(u(\vec{x}_a))\, \phi_j(\vec{x}_a)$$
Weights from the input layer to the hidden layer (a further application of the chain rule):
$$\frac{\partial E_a}{\partial w_{jk}^{(1)}} = (y_a - t_a)\, h'(u(\vec{x}_a))\, w_{1j}^{(2)}\, h'(v_j(\vec{x}_a))\, x_{a,k}, \qquad \vec{x}_a = (x_{a,1}, \dots, x_{a,n})$$
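A minimal numpy sketch of one such stochastic-gradient update for a single training event, using a sigmoid activation (so $h'(u) = h(u)(1-h(u))$); the layer sizes and random initial weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, W1, W2, eta=0.1):
    """One stochastic-gradient update for a single training event (x, t).
    W1 has shape (m, n+1), W2 has shape (m+1,); column/entry 0 is the offset."""
    x_aug = np.concatenate(([1.0], x))          # x_0 = 1
    v = W1 @ x_aug                              # hidden pre-activations v_j
    phi = np.concatenate(([1.0], sigmoid(v)))   # phi_0 = 1
    u = W2 @ phi
    y = sigmoid(u)

    delta = (y - t) * y * (1 - y)               # (y_a - t_a) h'(u)
    grad_W2 = delta * phi                       # dE_a / dw^(2)_1j
    grad_W1 = np.outer(delta * W2[1:] * sigmoid(v) * (1 - sigmoid(v)), x_aug)

    return W1 - eta * grad_W1, W2 - eta * grad_W2

rng = np.random.default_rng(7)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=5)   # n = 2 inputs, m = 4
W1, W2 = backprop_step(np.array([0.3, -0.7]), 1.0, W1, W2)
print(W2)
```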

25 Neural Network Output and Decision Boundaries
P. Bhat, Multivariate Analysis Methods in Particle Physics, inspirehep.net/record/...
[Figure: (a) network architecture with input layer $x_1, \dots, x_d$, hidden layer $h_j$, and output $o(x) = f(x, w)$; (b) distributions of the NN output for signal and background; (c) NN contours, i.e., decision boundaries in the $(x_1, x_2)$ plane for different cuts on the NN output; (d) signal probability $p(S|x_1, x_2)$.]

26 Example of Overtraining
Too many neurons/layers make a neural network too flexible → overtraining.
[Figure (G. Cowan): the same decision boundary applied to the training sample and to an independent test sample.]
The network "learns" features that are merely statistical fluctuations in the training sample.

27 Monitoring Overtraining
Monitor the fraction of misclassified events (or the error function) as a function of the classifier flexibility (e.g., number of nodes/layers), for both the training and the test sample.
Optimum = minimum of the error rate for the test sample; overtraining = increase of the error rate on the test sample.
[Figure (G. Cowan): error rate versus flexibility for training and test sample.]

28 Example: Identification of Events with Top Quarks
$t\bar{t} \to W^+ b\, W^- \bar{b} \to \ell\nu\, b\, q\bar{q}\, \bar{b}$
Input variables include $x_1$ = missing $E_T$ and $x_2$ = aplanarity (plus further variables $x_3$, $x_4$).
[Figure (D0 experiment, plot from inspirehep.net/record/...): distributions of the input variables and of the discriminants $D_{LB}$ (likelihood method) and $D_{NN}$ (neural net) for signal (shaded) and background.]

29 Deep Neural Networks
Deep networks: many hidden layers with a large number of neurons.
Challenges: hard to train ("vanishing gradient problem"), training is slow, risk of overtraining.
Big progress in recent years; interest in neural networks had waned before that. Milestone: the paper by G. Hinton et al. (2006), "A fast learning algorithm for deep belief nets".
Applications: image recognition, AlphaGo, ... Soon: self-driving cars, ...

30 Fun with Neural Nets in the Browser

31 Decision Trees (I)
(arXiv:physics/...)
[Figure: example decision tree from MiniBooNE. Starting from the root node (S/B = 52/48), events are split by successive cuts — on the number of PMT hits, the energy (< 0.2 GeV / ≥ 0.2 GeV), and the radius (< 500 cm / ≥ 500 cm) — into branch nodes (nodes with further branching) and leaf nodes (no further branching).]
MiniBooNE: 1520 photomultiplier signals; goal: separation of ν_e from ν_μ events.
Leaf nodes classify events as either signal or background.

32 Decision Trees (II)
Ann. Rev. Nucl. Part. Sci. 61 (2011)
[Figure: (a) decision tree with successive cuts $x_1 \lessgtr \theta_1$, $x_2 \lessgtr \theta_2$, $x_2 \lessgtr \theta_3$, $x_1 \lessgtr \theta_4$ leading to leaf regions $R_1, \dots, R_5$; (b) the corresponding rectangular partition of the $(x_1, x_2)$ plane.]
Decision trees are easy to interpret and visualize: the space of feature vectors is split up into rectangular volumes (attributed to either signal or background).
How to build a decision tree in an optimal way?

33 Finding Optimal Cuts
The separation between signal and background is often measured with the Gini index:
$$G = p(1 - p)$$
Here p is the purity:
$$p = \frac{\sum_{\text{signal}} w_i}{\sum_{\text{signal}} w_i + \sum_{\text{background}} w_i}, \qquad w_i = \text{weight of event } i$$
[The usefulness of weights will become apparent soon.]
Improvement in signal/background separation after splitting a set A into two sets B and C:
$$\Delta = W_A G_A - W_B G_B - W_C G_C, \qquad W_X = \sum_{i \in X} w_i$$
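A small numpy sketch of scanning a one-dimensional cut for the largest improvement Δ as defined above; the toy node (Gaussian signal and background with unit weights) and the scanned cut range are illustrative assumptions:

```python
import numpy as np

def gini(weights, is_signal):
    """Gini index G = p(1-p), with purity p computed from event weights."""
    W = weights.sum()
    if W == 0:
        return 0.0
    p = weights[is_signal].sum() / W
    return p * (1.0 - p)

def split_gain(x, is_signal, weights, cut):
    """Improvement Delta = W_A G_A - W_B G_B - W_C G_C for the split x < cut."""
    left = x < cut
    gain = weights.sum() * gini(weights, is_signal)
    for mask in (left, ~left):
        gain -= weights[mask].sum() * gini(weights[mask], is_signal[mask])
    return gain

# Toy node: signal tends to larger x (illustrative assumption)
rng = np.random.default_rng(8)
x = np.concatenate([rng.normal(1.0, 0.5, 500), rng.normal(0.0, 0.5, 500)])
is_signal = np.concatenate([np.ones(500, bool), np.zeros(500, bool)])
weights = np.ones(1000)

cuts = np.linspace(-1.0, 2.0, 61)
gains = [split_gain(x, is_signal, weights, c) for c in cuts]
print("best cut at x =", cuts[int(np.argmax(gains))])
```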

34 Separation Measures
[Figure: split criteria (arbitrary units) as a function of the signal purity p: misclassification error, entropy, and Gini index.]
Cross entropy: $-p \ln p - (1 - p) \ln(1 - p)$
Gini index: $p(1 - p)$ [after Corrado Gini, who used it to measure income and wealth inequalities, 1912]
Misclassification rate: $1 - \max(p, 1 - p)$

35 Decision Tree Pruning
When to stop growing a tree? When all nodes are essentially pure? Well, that's overfitting!
Pruning: cut back the fully grown tree to avoid overtraining.

36 Boosted Decision Trees: Idea
Drawback of decision trees: they are very sensitive to statistical fluctuations in the training sample.
Solution: boosting. One tree → several trees ("forest"). The trees are derived from the same training ensemble by reweighting events; the individual trees are then combined into a weighted average.
Boosting is a general method of combining a set of classifiers (not necessarily decision trees) into a new, more stable classifier with a smaller error. Popular example: AdaBoost (Freund, Schapire, 1997).

37 AdaBoost (short for Adaptive Boosting)
Training data: $\vec{x}_1, \dots, \vec{x}_n$ (multivariate event data), $y_1, \dots, y_n$ (true class labels, +1 or −1), $w_1^{(1)}, \dots, w_n^{(1)}$ (event weights).
Initial training sample with equal weights, normalized as
$$\sum_{i=1}^{n} w_i^{(1)} = 1$$
Train the first classifier $f_1$: $f_1(\vec{x}_i) > 0$ → classify as signal; $f_1(\vec{x}_i) < 0$ → classify as background.

38 Updating Event Weights
Define the training sample k+1 from the training sample k by updating the weights:
$$w_i^{(k+1)} = w_i^{(k)}\, \frac{e^{-\alpha_k f_k(\vec{x}_i) y_i / 2}}{Z_k}$$
(i = event index, $Z_k$ = normalization factor so that $\sum_{i=1}^{n} w_i^{(k+1)} = 1$)
The weight is increased if the event was misclassified by the previous classifier: "the next classifier should pay more attention to misclassified events".
At each step the classifier $f_k$ minimizes the error rate
$$\varepsilon_k = \sum_{i=1}^{n} w_i^{(k)}\, I\!\left(y_i f_k(\vec{x}_i) \le 0\right), \qquad I(X) = 1 \text{ if } X \text{ is true, } 0 \text{ otherwise}$$

39 Assigning the Classifier Score
Assign a score to each classifier according to its error rate:
$$\alpha_k = \ln \frac{1 - \varepsilon_k}{\varepsilon_k}$$
Combined classifier (weighted average):
$$f(\vec{x}) = \sum_{k=1}^{K} \alpha_k f_k(\vec{x})$$
It can be shown that the error rate of the combined classifier satisfies
$$\varepsilon \le \prod_{k=1}^{K} 2\sqrt{\varepsilon_k (1 - \varepsilon_k)}$$
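A compact sketch of the AdaBoost steps above, using depth-1 decision trees ("stumps") from scikit-learn as weak classifiers; the toy data set, the number of boosting iterations K = 50, and the numerical clipping of ε are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Toy sample; labels y_i = +/-1 as in the slides (illustrative)
X, y01 = make_classification(n_samples=2000, n_features=5, random_state=9)
y = 2 * y01 - 1

n = len(y)
w = np.full(n, 1.0 / n)          # initial weights, sum w_i^(1) = 1
classifiers, alphas = [], []

for k in range(50):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)

    eps = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error rate
    alpha = np.log((1.0 - eps) / eps)                     # classifier score
    w *= np.exp(-alpha * pred * y / 2.0)                  # update event weights
    w /= w.sum()                                          # normalization Z_k

    classifiers.append(stump)
    alphas.append(alpha)

# Combined classifier: weighted average of the individual trees
F = sum(a * c.predict(X) for a, c in zip(alphas, classifiers))
print("training accuracy:", np.mean(np.sign(F) == y))
```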

40 General Remarks on Multi-Variate Analyses
MVA methods are more effective than classic cut-based analyses because they take correlations of the input variables into account.
Important: find good input variables for the MVA methods — good separation power between S and B, little correlation among the variables, and no correlation with the parameters you try to measure in your signal sample!
Pre-processing: apply obvious variable transformations and let the MVA method do the rest. Make use of obvious symmetries: if, e.g., a particle production process is symmetric in the polar angle θ, use |cos θ| and not cos θ as input variable. It is generally useful to bring all input variables to a similar numerical range.
(H. Voss, Multivariate Data Analysis and Machine Learning in High Energy Physics)

41 Classifiers and Their Properties
(H. Voss, Multivariate Data Analysis and Machine Learning in High Energy Physics)
[Table: the classifiers (Cuts, Likelihood, PDERS / k-NN, H-Matrix, Fisher, MLP, BDT, RuleFit, SVM) rated according to the criteria performance (no / linear correlations, nonlinear correlations), speed (training, response), robustness (overtraining, weak input variables), curse of dimensionality, and transparency.]
