
Curve learning
Gérard Biau, Université Montpellier II

Summary
- The problem
- The mathematical model
- Functional classification
  1. Fourier filtering
  2. Wavelet filtering
- Applications

The problem
In many experiments, practitioners collect samples of curves and other functional observations. We wish to investigate how classical learning rules can be extended to handle functions.

A typical problem
(Figure.)

A speech recognition problem
(Figure: sample curves labelled YES, NO, BOAT, GOAT, SH, AO.)

Classification
Given a random pair $(X, Y) \in \mathbb{R}^d \times \{0, 1\}$, the problem of classification is to predict the unknown nature $Y$ of a new observation $X$. The statistician creates a classifier $g : \mathbb{R}^d \to \{0, 1\}$ which represents her guess of the label of $X$. An error occurs if $g(X) \ne Y$, and the probability of error for a particular classifier $g$ is $L(g) = P\{g(X) \ne Y\}$.

Classification
The Bayes rule
$$g^*(x) = \begin{cases} 0 & \text{if } P\{Y = 0 \mid X = x\} \ge P\{Y = 1 \mid X = x\}, \\ 1 & \text{otherwise,} \end{cases}$$
is the optimal decision. That is, for any decision function $g : \mathbb{R}^d \to \{0, 1\}$,
$$P\{g^*(X) \ne Y\} \le P\{g(X) \ne Y\}.$$
The problem is thus to construct a reasonable classifier $g_n$ based on i.i.d. observations $(X_1, Y_1), \ldots, (X_n, Y_n)$.
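
As a minimal sketch of the decision rule above, assuming the regression function $\eta(x) = P\{Y = 1 \mid X = x\}$ is known (in practice it is exactly what has to be learned from the data):

```python
def bayes_rule(eta_x):
    """Bayes rule: vote for the more probable class given X = x.

    eta_x is eta(x) = P{Y = 1 | X = x}; it is assumed known here for
    illustration only, whereas the talk is about learning it from data.
    """
    return 0 if 1.0 - eta_x >= eta_x else 1   # class 0 whenever eta(x) <= 1/2
```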

Error criteria
The goal is to make the error probability $P\{g_n(X) \ne Y\}$ close to $L^*$.
Definition. A decision rule $g_n$ is called consistent if
$$P\{g_n(X) \ne Y\} \to L^* \quad \text{as } n \to \infty.$$
Definition. A decision rule is called universally consistent if it is consistent for any distribution of the pair $(X, Y)$.

Two simple and popular rules
The moving window rule:
$$g_n(x) = \begin{cases} 0 & \text{if } \sum_{i=1}^{n} \mathbf{1}_{\{Y_i = 0,\, X_i \in B_{x,h_n}\}} \ge \sum_{i=1}^{n} \mathbf{1}_{\{Y_i = 1,\, X_i \in B_{x,h_n}\}}, \\ 1 & \text{otherwise.} \end{cases}$$
The nearest neighbor rule:
$$g_n(x) = \begin{cases} 0 & \text{if } \sum_{i=1}^{k_n} \mathbf{1}_{\{Y_{(i)}(x) = 0\}} \ge \sum_{i=1}^{k_n} \mathbf{1}_{\{Y_{(i)}(x) = 1\}}, \\ 1 & \text{otherwise.} \end{cases}$$
Devroye, Györfi and Lugosi (1996). A Probabilistic Theory of Pattern Recognition.
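
A minimal sketch of both rules for finite-dimensional observations, assuming the training data are given as NumPy arrays; the bandwidth h and the number of neighbors k are tuning parameters chosen by the user.

```python
import numpy as np

def moving_window_rule(x, X, Y, h):
    """Majority vote over the training points falling in the ball B_{x,h}."""
    dist = np.linalg.norm(X - x, axis=1)
    in_ball = dist <= h
    votes0 = np.sum(Y[in_ball] == 0)
    votes1 = np.sum(Y[in_ball] == 1)
    return 0 if votes0 >= votes1 else 1        # ties go to class 0, as on the slide

def nearest_neighbor_rule(x, X, Y, k):
    """Majority vote among the k nearest training points."""
    dist = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dist)[:k]
    votes1 = np.sum(Y[nearest] == 1)
    return 0 if k - votes1 >= votes1 else 1
```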

Notation
The model. The random variable $X$ takes values in a metric space $(\mathcal{F}, \rho)$. The distribution of the pair $(X, Y)$ is completely specified by $\mu$ and $\eta$:
$$\mu(A) = P\{X \in A\} \quad \text{and} \quad \eta(x) = P\{Y = 1 \mid X = x\}.$$
And all the definitions remain the same...

Assumptions
(H1) The space $\mathcal{F}$ is complete.
(H2) The following differentiation result holds:
$$\lim_{h \to 0} \frac{1}{\mu(B_{x,h})} \int_{B_{x,h}} \eta \, d\mu = \eta(x) \quad \text{in } \mu\text{-probability}.$$
Aïe aïe aïe...

Results
Theorem [consistency]. Assume that (H1) and (H2) hold. If $h_n \to 0$ and, for every $k \ge 1$, $N_k(h_n/2)/n \to 0$ as $n \to \infty$, then
$$\lim_{n \to \infty} L(g_n) = L^*.$$
$h_n \approx 1/\log\log n$: curse of dimensionality.

Curse of dimensionality?
Consider the hypercube $[-a, a]^d$ and an inscribed hypersphere with radius $r = a$. Then
$$f_d = \frac{\text{volume of the sphere}}{\text{volume of the cube}} = \frac{\pi^{d/2}}{d\, 2^{d-1}\, \Gamma(d/2)}.$$
(Table of $f_d$ against $d$: the ratio collapses as $d$ grows.)
As the dimension increases, most spherical neighborhoods will be empty!
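
A quick numerical check of the ratio $f_d$ (the table on the original slide did not survive the transcription); math.gamma is the standard-library gamma function.

```python
import math

def sphere_to_cube_ratio(d):
    """f_d = volume of the inscribed d-sphere over the volume of the cube [-a, a]^d."""
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

for d in (1, 2, 5, 10, 20):
    print(d, sphere_to_cube_ratio(d))
# f_d equals 1 for d = 1, is about 0.785 for d = 2, and drops below 0.003 by d = 10.
```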

Classification in Hilbert spaces
Here, the observations $X_i$ take values in the infinite-dimensional space $L^2([0, 1])$.
The trigonometric basis, formed by the functions $\psi_1(t) = 1$, $\psi_{2j}(t) = \sqrt{2}\cos(2\pi j t)$, and $\psi_{2j+1}(t) = \sqrt{2}\sin(2\pi j t)$, $j = 1, 2, \ldots$, is a complete orthonormal system in $L^2([0, 1])$.
We let the Fourier representation of $X_i$ be
$$X_i(t) = \sum_{j=1}^{\infty} X_{ij}\, \psi_j(t).$$

Dimension reduction
Select the effective dimension $d$ to approximate each $X_i$ by
$$X_i(t) \approx \sum_{j=1}^{d} X_{ij}\, \psi_j(t).$$
Perform nearest neighbor classification based on the coefficient vector $X_i^{(d)} = (X_{i1}, \ldots, X_{id})$ for the selected dimension $d$.
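
A minimal sketch of this Fourier filtering step, assuming each curve is sampled on a regular grid of [0, 1]; the Riemann-sum approximation of the inner products and the choice of d and k are illustrative, not taken from the slides.

```python
import numpy as np

def trig_basis(n_basis, t):
    """First n_basis functions of the trigonometric basis of L^2([0, 1]) on the grid t."""
    psi = [np.ones_like(t)]
    j = 1
    while len(psi) < n_basis:
        psi.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
        psi.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
        j += 1
    return np.array(psi[:n_basis])                     # shape (n_basis, len(t))

def fourier_coefficients(curves, t, n_basis):
    """Approximate X_ij = <X_i, psi_j> by a Riemann sum on the regular grid t.

    curves: array of shape (n_curves, len(t)) holding the sampled curves.
    """
    psi = trig_basis(n_basis, t)
    return curves @ psi.T / len(t)                     # shape (n_curves, n_basis)

def knn_on_coefficients(x_coef, train_coef, train_y, k, d):
    """Nearest neighbor rule applied to the first d Fourier coefficients."""
    dist = np.linalg.norm(train_coef[:, :d] - x_coef[:d], axis=1)
    nearest = np.argsort(dist)[:k]
    return int(np.sum(train_y[nearest] == 1) > k / 2)
```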

Data-splitting
We split the data into
- a training sequence $(X_1, Y_1), \ldots, (X_l, Y_l)$,
- a testing sequence $(X_{l+1}, Y_{l+1}), \ldots, (X_{m+l}, Y_{m+l})$.
The training sequence defines a class of nearest neighbor classifiers with members $g_{l,k,d}$.

Empirical risk minimization
The testing sequence is used to select a particular classifier by minimizing the penalized empirical risk
$$\frac{1}{m} \sum_{i \in J_m} \mathbf{1}_{\{g_{l,k,d}(X_i^{(d)}) \ne Y_i\}} + \frac{\lambda_d}{\sqrt{m}}.$$
In practice, the regularization penalty may be formally replaced by $\lambda_d = 0$ for $d \le d_n$ and $\lambda_d = +\infty$ for $d \ge d_n + 1$.
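
A minimal sketch of this selection step, assuming nearest neighbor classifiers built from the first d coefficients; the candidate grids k_grid and d_grid and the penalty function are placeholders for whatever was used on the slides.

```python
import numpy as np

def select_by_penalized_risk(train_coef, train_y, test_coef, test_y,
                             k_grid, d_grid, penalty):
    """Choose (k, d) minimizing empirical error on the testing split plus penalty(d).

    penalty(d) plays the role of lambda_d / sqrt(m); passing lambda d: 0.0
    recovers plain empirical risk minimization over the testing sequence.
    """
    m = len(test_y)
    best_pair, best_risk = None, np.inf
    for d in d_grid:
        for k in k_grid:
            errors = 0
            for x, y in zip(test_coef, test_y):
                dist = np.linalg.norm(train_coef[:, :d] - x[:d], axis=1)
                nearest = np.argsort(dist)[:k]
                pred = int(np.sum(train_y[nearest] == 1) > k / 2)
                errors += int(pred != y)
            risk = errors / m + penalty(d)
            if risk < best_risk:
                best_pair, best_risk = (k, d), risk
    return best_pair
```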

Result
Theorem [oracle inequality]. Assume that
$$\sum_{d=1}^{\infty} e^{-2\lambda_d^2} < \infty. \qquad (1)$$
Then there exists a constant $C > 0$ such that
$$L(\hat{g}_n) - L^* \le \inf_{d \ge 1}\Big[ (L_d^* - L^*) + \inf_{1 \le k \le l}\big(L(g_{l,k,d}) - L_d^*\big) + \frac{\lambda_d}{\sqrt{m}} \Big] + C\sqrt{\frac{\log l}{m}}.$$

Results
Theorem [consistency]. Under the assumption (1) and
$$\lim_{n \to \infty} l = \infty, \qquad \lim_{n \to \infty} m = \infty, \qquad \lim_{n \to \infty} \frac{\log l}{m} = 0,$$
the classifier $\hat{g}_n$ is universally consistent.
Under stronger assumptions, the classifier $\hat{g}_n$ is strongly consistent, that is,
$$P\big\{\hat{g}_n(X) \ne Y \mid (X_1, Y_1), \ldots, (X_n, Y_n)\big\} \to L^* \quad \text{with probability one}.$$

Illustration
(Figure: sample curves labelled YES, NO, BOAT, GOAT, SH, AO.)

Results
(Table of error rates for the methods FOURIER, NN-DIRECT, and QDA; the numerical values were lost in transcription.)

Multiresolution analysis
$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2([0, 1]).$$
$\varphi$: the father wavelet, such that $\{\varphi_{j,k} : k = 0, \ldots, 2^j - 1\}$ is an orthonormal basis of $V_j$.
For each $j$, one constructs an orthonormal basis $\{\psi_{j,k} : k = 0, \ldots, 2^j - 1\}$ of $W_j$ such that $V_{j+1} = V_j \oplus W_j$. $\psi$ is the mother wavelet.
Any function $X_i$ of $L^2([0, 1])$ reads
$$X_i(t) = \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \xi^i_{j,k}\, \psi_{j,k}(t) + \eta^i\, \varphi_{0,0}(t).$$
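
A minimal sketch of such a decomposition using the Haar wavelet (the slides do not name a wavelet family), for a curve sampled at 2^J equally spaced points:

```python
import numpy as np

def haar_decompose(x):
    """Haar multiresolution analysis of a signal of length 2**J.

    Returns the single scaling coefficient (the V_0 part, eta) and a list
    of wavelet-coefficient arrays xi[j], one per resolution level j.
    """
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        even, odd = x[0::2], x[1::2]
        approx = (even + odd) / np.sqrt(2)     # coarser approximation (V_{j-1})
        detail = (even - odd) / np.sqrt(2)     # detail coefficients (W_{j-1})
        details.append(detail)
        x = approx
    details.reverse()                          # order the levels from coarse to fine
    return x[0], details

# Example: a curve sampled at 2**4 = 16 points of [0, 1).
t = np.linspace(0.0, 1.0, 16, endpoint=False)
eta, xi = haar_decompose(np.sin(2 * np.pi * t))
```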

(Figure: the mother wavelet $\psi$ and some dilated/translated versions, among them $\psi_{0,0}$, $\psi_{4,4}$, $\psi_{4,8}$, $\psi_{4,12}$.)

Fix a maximum resolution level $J$ and expand each observation as
$$X_i(t) \approx \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \xi^i_{j,k}\, \psi_{j,k}(t) + \eta^i\, \varphi_{0,0}(t).$$
Rewrite each $X_i$ as a linear combination of $2^J$ basis functions $\psi_j$:
$$X_i(t) \approx \sum_{j=1}^{2^J} X_{ij}\, \psi_j(t).$$
How to select the most pertinent basis functions?

Thresholding
Reorder the first $2^J$ basis functions $\{\psi_1, \ldots, \psi_{2^J}\}$ into $\{\psi_{j_1}, \ldots, \psi_{j_{2^J}}\}$ via the scheme
$$\sum_{i=1}^{l} X_{ij_1}^2 \ge \sum_{i=1}^{l} X_{ij_2}^2 \ge \cdots \ge \sum_{i=1}^{l} X_{ij_{2^J}}^2.$$
Wavelet coefficients are thus ranked using a global thresholding on the mean of the squared coefficients.
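
A minimal sketch of this ranking step; the coefficient matrix layout (one row per training curve) is an assumption of the sketch.

```python
import numpy as np

def rank_basis_functions(coefs):
    """Return basis indices ordered by decreasing sum of squared coefficients.

    coefs has shape (l, 2**J): one row of coefficients X_i1, ..., X_i,2^J per
    training curve, so summing the squares over axis 0 matches the slide.
    """
    energy = np.sum(coefs ** 2, axis=0)
    return np.argsort(energy)[::-1]

# Keeping the d most energetic basis functions then amounts to
# coefs[:, rank_basis_functions(coefs)[:d]].
```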

Classification
For each $d = 1, \ldots, 2^J$, let $\mathcal{D}^{(d)}_l$ be a collection of rules $g^{(d)} : \mathbb{R}^d \to \{0, 1\}$.
Let $S_{\mathcal{C}^{(d)}_l}(m)$ denote the corresponding shatter coefficients, and let $S_{\mathcal{C}^{(J)}_l}(m)$ be the shatter coefficient of all rules $\{g^{(d)} : d = 1, \ldots, 2^J\}$.

Selection step
We select both $d$ and $g^{(d)}$ optimally by minimizing the empirical probability of error:
$$\big(\hat{d}, \hat{g}^{(\hat{d})}\big) = \arg\min_{1 \le d \le 2^J,\; g^{(d)} \in \mathcal{D}^{(d)}_l} \Big[ \frac{1}{m} \sum_{i \in J_m} \mathbf{1}_{[g^{(d)}(X_i^{(d)}) \ne Y_i]} \Big].$$
Theorem.
$$E\,L(\hat{g}) - L^* \le \big(L^*_{2^J} - L^*\big) + E\Big\{ \inf_{1 \le d \le 2^J,\; g^{(d)} \in \mathcal{D}^{(d)}_l} L_l(g^{(d)}) - L^*_{2^J} \Big\} + C\, E\sqrt{\frac{\log\big(S_{\mathcal{C}^{(J)}_l}(m)\big)}{m}}.$$

Illustration
Classes sh, iy, dcl, ao, aa; $l = 250$, $m = 250$.

Methods
W-QDA: $\mathcal{D}^{(d)}_l$ contains quadratic discriminant rules.
W-NN: $\mathcal{D}^{(d)}_l$ contains nearest-neighbor rules.
W-T: $\mathcal{D}^{(d)}_l$ contains binary trees.
F-NN: Fourier filtering approach.
MPLSR: Multivariate Partial Least Squares Regression.

Results
(Table of error rates and selected dimension $\hat{d}$ for W-QDA, W-NN, W-T, F-NN, and MPLSR; the numerical values were lost in transcription.)

A simulated example

Pairs $(X_i(t), Y_i)$ are generated by
$$X_i(t) = \frac{1}{50}\Big( \sin(F^1_i \pi t)\, f_{\mu_i, \sigma_i}(t) + \sin(F^2_i \pi t)\, f_{1-\mu_i, \sigma_i}(t) \Big) + \varepsilon_i,$$
where
- $f_{\mu,\sigma}$ is the $\mathcal{N}(\mu, \sigma^2)$ density, with $\mu \sim \mathcal{U}(0.1, 0.4)$ and $\sigma \sim \mathcal{U}(0, 0.005)$,
- $F^1_i$ and $F^2_i \sim \mathcal{U}(50, 150)$,
- $\varepsilon_i \sim \mathcal{N}(0, 0.5)$.
For $i = 1, \ldots, n$,
$$Y_i = \begin{cases} 0 & \text{if } \mu_i \le \ldots, \\ 1 & \text{otherwise} \end{cases}$$
(the threshold on $\mu_i$ is not recoverable from the transcription).
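
A minimal sketch generating such curves, assuming $\mathcal{N}(0, 0.5)$ means variance 0.5 and using a hypothetical cut-off of 0.25 on $\mu_i$ for the label, since the value on the slide did not survive the transcription; the sampling grid is also illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def simulate_pair(t, label_threshold=0.25):
    """Generate one pair (X_i evaluated on the grid t, Y_i) following the slide.

    label_threshold is hypothetical: the cut-off on mu_i defining Y_i did not
    survive the transcription. N(0, 0.5) is read as variance 0.5.
    """
    mu = rng.uniform(0.1, 0.4)
    sigma = rng.uniform(0.0, 0.005)
    f1, f2 = rng.uniform(50.0, 150.0, size=2)
    eps = rng.normal(0.0, np.sqrt(0.5))
    x = (np.sin(f1 * np.pi * t) * norm.pdf(t, mu, sigma)
         + np.sin(f2 * np.pi * t) * norm.pdf(t, 1.0 - mu, sigma)) / 50.0 + eps
    y = 0 if mu <= label_threshold else 1
    return x, y

t = np.linspace(0.0, 1.0, 256)
data = [simulate_pair(t) for _ in range(100)]   # 100 simulated (curve, label) pairs
```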

Results
(Table of error rates and selected dimension $\hat{d}$ for W-QDA, W-NN, W-T, F-NN, and MPLSR on the simulated example; the numerical values were lost in transcription.)

Sampling
In practice, the curves are always observed at discrete time points, say $t_p$, $p = 1, \ldots, P$. Only an estimate of the wavelet coefficients is therefore available:
$$X_i(t) \approx \sum_{j=1}^{2^J} \hat{X}_{ij}\, \psi_j(t), \quad \text{where} \quad \hat{X}_{ij} = \frac{1}{P} \sum_{p=1}^{P} X_i(t_p)\, \psi_j(t_p).$$
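
A minimal sketch of this estimation step, assuming the curves and the basis functions are evaluated on the same sampling grid:

```python
import numpy as np

def estimated_coefficients(samples, basis_values):
    """Estimate hat{X}_ij = (1/P) * sum_p X_i(t_p) * psi_j(t_p).

    samples:      array of shape (n, P), the observed values X_i(t_p)
    basis_values: array of shape (2**J, P), the basis functions psi_j(t_p)
    Returns an array of shape (n, 2**J) of estimated coefficients.
    """
    P = samples.shape[1]
    return samples @ basis_values.T / P
```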

Sampling consistency
Theorem. Let $\mathcal{D}^{(d)}_l$ be the family of all $k$-nearest neighbor rules, and suppose that
$$\lim_{n \to \infty} l = \infty, \qquad \lim_{n \to \infty} m = \infty, \qquad \lim_{n \to \infty} \frac{\log l}{m} = 0.$$
Then the selected rule $\hat{g}$ is universally consistent, that is,
$$\lim_{J \to \infty} \lim_{n \to \infty} \lim_{P \to \infty} L(\hat{g}) = L^*.$$
