arxiv: v1 [math.st] 23 May 2014

Size: px
Start display at page:

Download "arxiv: v1 [math.st] 23 May 2014"

Transcription

1 Abstract On oracle efficiency of the ROAD classification rule Britta Anker Bak and Jens Ledet Jensen Department of Mathematics, Aarhus University, Denmark arxiv: v [math.st] 3 May 04 For high-dimensional classification Fishers rule performs poorly due to noise from estimation of the covariance matrix. Fan, Feng and Tong (0 introduced the ROAD classifier that puts an L -constraint on the classification vector. In their Theorem Fan, Feng and Tong (0 show that the ROAD classifier asymptotically has the same misclassification rate as the corresponding oracle based classifier. Unfortunately, the proof contains an error. Here we restate the theorem and provide a new proof. Introduction We consider classification among two groups based on a p-dimensional normally distributed variable. Let the means in the two groups be µ and µ, and let the common variance be Σ. Also, let the probability of belonging to either of the two groups be. Defining µ a = (µ +µ / and µ d = (µ µ /, the Bayes discriminant rule becomes δ w (x = +(w T (x µ a < 0, with w = w F = Σ µ d, where x is classified to group or according to the value of δ w (x. The misclassification rate of the rule δ w is W(δ w = ( wt µ d /(w T Σw /, where (z = Φ(z is the upper tail probability of a standard normal distribution. The interpretation of the Bayes rule is that w F is the vector that minimizes the misclassification rate. Fan, Feng and Tong (0 suggest to use a L regularized version of w F, that is, w c = argmin w c, w T µ d = w T Σw. Its sample version yields the ROAD classifier ŵ c = argmin w T ˆΣw w c, w T ˆµ d = ˆδ = +(ŵ T c (x ˆµ a < 0. Theorem of Fan, Feng and Tong (0 states that the misclassification rate W(ˆδ of the ROAD classifier approaches the misclassification rate of the oracle classifier W(δ wc. Unfortunately, an essential step in the proof use an inequality which is not valid, see Appendix A for details. We reformulate the theorem and give a new proof.

2 Theorem. Let ǫ be a positive constant such that max j { µ dj } > ǫ, and c > ǫ + /max j { µ dj }. Let a n be a sequence tending to zero such that ˆΣ Σ = O p (a n, and ˆµ i µ i = O p (a n, i =,. Then, as n : with d n = c a n (+c Σ. W(ˆδ W(δ wc = O p (d n Prior to proving the theorem we comment on the differences compared to Theorem of Fan, Feng and Tong (0. Contrary to us, Fan, Feng and Tong (0 requires that the smallest eigenvalue ofσis bounded from below. The upper bound onw(ˆδ W(δ wc in Fan, Feng and Tong (0 depends on the sparsity of w c and of w c (, where w c ( is given by w c ( = arg min w T Σw, w c, w Tˆµ d = whereas our bound depends on the regularizing parameter c only. In the formulation of the theorem c is allowed to depend on n. We require a lower bound on max j { µ dj }, which is not part of the theorem in Fan, Feng and Tong (0. However, it enters indirectly in that we must have c > /max j { µ dj } in order for w c to exist. Thus, if max j { µ dj } 0, we have c, and c enters the upper bound of Fan, Feng and Tong (0. The reason for our more restrictive condition c > ǫ + /max j { µ dj } is that the theorem only makes sense if ŵ c exists with probability tending to one. Similarly, whereas Fan, Feng and Tong (0 have the condition ˆµ d µ d = O p (a n, we have ˆµ i µ i = O p (a n, i =,, in order to handle a term in the misclassification rate that has been neglected in Fan, Feng and Tong (0. Finally, Σ appears in our bound. However, requiring that the variances Σ ii, i =,...,p, are bounded is often encountered in high dimensional settings. Proof of Theorem In the proof we use the following inequalities: (a(+ǫ (a ǫ for a > 0 and ǫ <, ( ((a+ǫ / (a / ǫ for a > 0 and a+ǫ > 0. ( The misclassification rate consists of two terms corresponding to an observation from each of the two groups. The proofs for the two terms are identical, so to simplify we consider the misclassification rate of an observation from group only. Using ( the misclassification rate of ˆδ becomes W(ˆδ = ( ŵc T ˆµ d +ŵc T (ˆµ µ ŵt = ( ŵt +O( ŵc T c Σŵ c c Σŵ (ˆµ µ c ( ŵt +O(c ˆµ µ. (3 c Σŵ c

3 Next, and from ( we get ( ŵt = ( c Σŵ c ŵ T c Σŵ c ŵ T c ˆΣŵ c c ˆΣ Σ, ŵc T ˆΣŵ c From the proof in Fan, Feng and Tong (0 we see that and thus ( ŵc TˆΣŵ c ŵ T c ˆΣŵ c w (T c Σw ( c c ˆΣ Σ, = ( Combining (3 5 we have W(ˆδ = ( w c (T Σŵ c ( w c (T Σŵ c ( +O(c ˆΣ Σ. (4 +O(c ˆΣ Σ. (5 +O(c ˆΣ Σ +c ˆµ µ. (6 Since the oracle misclassification rate is W(δ wc = ( /( wc TΣw c we need to compare wc TΣw c with w c (T Σŵ c (. To this end let A = {w : w T µ d =, w c}, A = {w : w Tˆµ d =, w c}. We want to show that for any w A there exists w A such that w T Σw is close to w T Σ w and vice versa. This means that the minimum of w T Σw over the set A is close to the minimum over the set A. Let w A, and define w = w/(w Tˆµ d. If w c, we have w A, and w T Σw = (w Tˆµ d w T Σ w = (+O(c ˆµ d µ d w T Σ w. If instead w > c, we first define w A and then w = w/( w Tˆµ d A. To define w assume without loss of generality that = max j { µ dj. Write w = (w,w ( where w ( is (p -dimensional, and define w = ( w,rw ( with 0 < r <, and w chosen such that w T µ d =. The latter requirement implies w = rw T ( µ d( = r( w. We will show that with r = c ˆµ d µ d /(c / = O(c ˆµ d µ d we have w c. From the definition of w we have w = w +r w ( = r( w +r( w w. 3

4 If r( w > 0 we get w = +r ( w +w w +r ( c. This shows that w A and w A since w = w w Tˆµ d +r ( c c ˆµ d µ d c, when r c ˆµ d µ d /(c /. If instead r( w < 0 we find w = r +r w rc r rc, and w rc/( c ˆµ d µ d c for r c ˆµ d µ d. The latter condition is satisfied with r c ˆµ d µ d /(c /. Comparing w and w we get and also w T Σw w T Σ w c w w Σ c[( r w +( r ] Σ = O(c 4 ˆµ d µ d Σ, w T Σ w w T Σw (w T Σw O(c ˆµ d µ d. We have now shown that any value of w T Σw for w A is close to the corresponding value for some w A. The other way around, starting with w A, is treated in the same way. The only difference is that instead of using c / > ǫ, we use that when ˆ < min{ǫ,ǫ 3 /( + ǫ }, which happens with probability tending to (exponentially fast, we have ˆ > 0 and c /ˆ > ǫ/. Therefore, the minimum wc TΣw c of w T Σw over the set A is close to the minimum w c (T Σw c ( over the set A : w T c Σw c = w (T c Σw ( c +O(c 4 ˆµ d µ d Σ +O(c ˆµ d µ d w T c Σw c = w (T c Σw ( c +O(c 4 ˆµ d µ d Σ. Combining the latter with (6 we conclude Acknowledgement W(ˆδ W(δ wc = O ( c a n (+c Σ. We thank Xin Tong for reading this note and refer to the arxiv version of Fan, Feng and Tong (0 for updated versions. 4

5 Appendix A An essential step in the proof in Fan, Feng and Tong (0 is the inequality (used in equation ( of that paper w T c ˆµ d w T c Σw c. w c (T Σw c ( Unfortunately, this inequality is not correct. We illustrate this by a concrete example. We consider the two-dimensional case with ( µ d = (,0 T, Σ =, c = +ǫ with ǫ < /σ. σ In this case we have w c = (, ǫ T and w T c Σw c = ǫ+σǫ. Consider next ˆµ d = (+a,b with a and b small. For a and b sufficiently small we obtain and c = ( +b[a+ǫ(+a]/(+a+b, a+ǫ(+a T, (7 +a +a+b w ( w (T c Σw ( c = (w ( c +w ( c w( c +σ(w( c. For a and b small and including O(a and O(b terms only we get = w c (T Σw c ( which must be compared to { +ǫ ǫσ(+ǫ } +a bǫ+(a bǫ, (8 ǫ+σǫ ǫ+σǫ w T c ˆµ d w T c Σw c = +a bǫ ǫ+σǫ. (9 We thus see that (8 is less that (9 when a bǫ has the opposite sign of +ǫ ǫσ(+ǫ. Since (a bǫ N(0,( ǫ+σǫc 0 for some constant c 0, the probability of a particular sign of a bǫ is one half. References Fan, J., Feng, Y. and Tong, X. (0. A road to classification in high dimensional space: the regularized optimal affine discriminant J. R. Statist. Soc. B, 74, The paper with revised versions can also be found at arxiv.org: arxiv:0.6095v [stat.ml]. 5

arxiv: v2 [stat.ml] 9 Nov 2011

arxiv: v2 [stat.ml] 9 Nov 2011 A ROAD to Classification in High Dimensional Space Jianqing Fan [1], Yang Feng [2] and Xin Tong [1] [1] Department of Operations Research & Financial Engineering, Princeton University, Princeton, New Jersey

More information

Portfolio Allocation using High Frequency Data. Jianqing Fan

Portfolio Allocation using High Frequency Data. Jianqing Fan Portfolio Allocation using High Frequency Data Princeton University With Yingying Li and Ke Yu http://www.princeton.edu/ jqfan September 10, 2010 About this talk How to select sparsely optimal portfolio?

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

Delta Theorem in the Age of High Dimensions

Delta Theorem in the Age of High Dimensions Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high

More information

Classification 2: Linear discriminant analysis (continued); logistic regression

Classification 2: Linear discriminant analysis (continued); logistic regression Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

12 Discriminant Analysis

12 Discriminant Analysis 12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of

More information

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

More information

Does Modeling Lead to More Accurate Classification?

Does Modeling Lead to More Accurate Classification? Does Modeling Lead to More Accurate Classification? A Comparison of the Efficiency of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang

More information

SPARSE QUADRATIC DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA

SPARSE QUADRATIC DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA Statistica Sinica 25 (2015), 457-473 doi:http://dx.doi.org/10.5705/ss.2013.150 SPARSE QUADRATIC DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA Quefeng Li and Jun Shao University of Wisconsin-Madison and

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

MSA220 Statistical Learning for Big Data

MSA220 Statistical Learning for Big Data MSA220 Statistical Learning for Big Data Lecture 4 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology More on Discriminant analysis More on Discriminant

More information

A Study of Relative Efficiency and Robustness of Classification Methods

A Study of Relative Efficiency and Robustness of Classification Methods A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics

More information

Novel Quantization Strategies for Linear Prediction with Guarantees

Novel Quantization Strategies for Linear Prediction with Guarantees Novel Novel for Linear Simon Du 1/10 Yichong Xu Yuan Li Aarti Zhang Singh Pulkit Background Motivation: Brain Computer Interface (BCI). Predict whether an individual is trying to move his hand towards

More information

Classification 1: Linear regression of indicators, linear discriminant analysis

Classification 1: Linear regression of indicators, linear discriminant analysis Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

The Third International Workshop in Sequential Methodologies

The Third International Workshop in Sequential Methodologies Area C.6.1: Wednesday, June 15, 4:00pm Kazuyoshi Yata Institute of Mathematics, University of Tsukuba, Japan Effective PCA for large p, small n context with sample size determination In recent years, substantial

More information

Classification Methods II: Linear and Quadratic Discrimminant Analysis

Classification Methods II: Linear and Quadratic Discrimminant Analysis Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear

More information

Accepted author version posted online: 28 Nov 2012.

Accepted author version posted online: 28 Nov 2012. This article was downloaded by: [University of Minnesota Libraries, Twin Cities] On: 15 May 2013, At: 12:23 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954

More information

Multiclass Sparse Discriminant Analysis

Multiclass Sparse Discriminant Analysis Multiclass Sparse Discriminant Analysis Qing Mai, Yi Yang, Hui Zou Abstract In recent years many sparse linear discriminant analysis methods have been proposed for high-dimensional classification and variable

More information

Pattern Recognition 2

Pattern Recognition 2 Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

Supervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing

Supervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing Supervised Learning Unsupervised learning: To extract structure and postulate hypotheses about data generating process from observations x 1,...,x n. Visualize, summarize and compress data. We have seen

More information

Bayesian Decision and Bayesian Learning

Bayesian Decision and Bayesian Learning Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i

More information

Linear Classifiers as Pattern Detectors

Linear Classifiers as Pattern Detectors Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Long Zhao, Deepayan Chakrabarti, and Kumar Muthuraman McCombs School of Business, University of Texas,

More information

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:

More information

Machine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1

Machine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1 Machine Learning 1 Linear Classifiers Marius Kloft Humboldt University of Berlin Summer Term 2014 Machine Learning 1 Linear Classifiers 1 Recap Past lectures: Machine Learning 1 Linear Classifiers 2 Recap

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data 1 Part III. Hypothesis Testing III.1. Log-rank Test for Right-censored Failure Time Data Consider a survival study consisting of n independent subjects from p different populations with survival functions

More information

High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis High Dimensional Discriminant Analysis Charles Bouveyron LMC-IMAG & INRIA Rhône-Alpes Joint work with S. Girard and C. Schmid High Dimensional Discriminant Analysis - Lear seminar p.1/43 Introduction High

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP

More information

INFORMATION THEORY AND STATISTICS

INFORMATION THEORY AND STATISTICS INFORMATION THEORY AND STATISTICS Solomon Kullback DOVER PUBLICATIONS, INC. Mineola, New York Contents 1 DEFINITION OF INFORMATION 1 Introduction 1 2 Definition 3 3 Divergence 6 4 Examples 7 5 Problems...''.

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

Regularized Discriminant Analysis and Reduced-Rank LDA

Regularized Discriminant Analysis and Reduced-Rank LDA Regularized Discriminant Analysis and Reduced-Rank LDA Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Regularized Discriminant Analysis A compromise between LDA and

More information

A Statistical Analysis of Fukunaga Koontz Transform

A Statistical Analysis of Fukunaga Koontz Transform 1 A Statistical Analysis of Fukunaga Koontz Transform Xiaoming Huo Dr. Xiaoming Huo is an assistant professor at the School of Industrial and System Engineering of the Georgia Institute of Technology,

More information

MSA200/TMS041 Multivariate Analysis

MSA200/TMS041 Multivariate Analysis MSA200/TMS041 Multivariate Analysis Lecture 8 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Back to Discriminant analysis As mentioned in the previous

More information

LDA, QDA, Naive Bayes

LDA, QDA, Naive Bayes LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

Introduction to Machine Learning Spring 2018 Note 18

Introduction to Machine Learning Spring 2018 Note 18 CS 189 Introduction to Machine Learning Spring 2018 Note 18 1 Gaussian Discriminant Analysis Recall the idea of generative models: we classify an arbitrary datapoint x with the class label that maximizes

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2016-0117.R2 Title Multiclass Sparse Discriminant Analysis Manuscript ID SS-2016-0117.R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202016.0117 Complete

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Algorithms: COMP3121/3821/9101/9801

Algorithms: COMP3121/3821/9101/9801 NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales LECTURE 9: INTRACTABILITY COMP3121/3821/9101/9801 1 / 29 Feasibility

More information

Classification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization

Classification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization Classification Volker Blobel University of Hamburg March 2005 Given objects (e.g. particle tracks), which have certain features (e.g. momentum p, specific energy loss de/ dx) and which belong to one of

More information

Lecture 5: Classification

Lecture 5: Classification Lecture 5: Classification Advanced Applied Multivariate Analysis STAT 2221, Spring 2015 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department of Mathematical Sciences Binghamton

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis .. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make

More information

MTH 241: Business and Social Sciences Calculus

MTH 241: Business and Social Sciences Calculus MTH 241: Business and Social Sciences Calculus F. Patricia Medina Department of Mathematics. Oregon State University January 28, 2015 Section 2.1 Increasing and decreasing Definition 1 A function is increasing

More information

Linear Classification: Probabilistic Generative Models

Linear Classification: Probabilistic Generative Models Linear Classification: Probabilistic Generative Models Sargur N. University at Buffalo, State University of New York USA 1 Linear Classification using Probabilistic Generative Models Topics 1. Overview

More information

5. Discriminant analysis

5. Discriminant analysis 5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density

More information

CSC 411: Lecture 09: Naive Bayes

CSC 411: Lecture 09: Naive Bayes CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Part I. High-Dimensional Classification

Part I. High-Dimensional Classification Part I High-Dimensional Classification Chapter 1 High-Dimensional Classification Jianqing Fan, Yingying Fan and Yichao Wu Abstract In this chapter, we give a comprehensive overview on high-dimensional

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

A732: Exercise #7 Maximum Likelihood

A732: Exercise #7 Maximum Likelihood A732: Exercise #7 Maximum Likelihood Due: 29 Novemer 2007 Analytic computation of some one-dimensional maximum likelihood estimators (a) Including the normalization, the exponential distriution function

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability

More information

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs

Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

Introduction to Machine Learning

Introduction to Machine Learning 1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer

More information

L 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011.

L 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011. L methods H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands December 5, 2 Bert Kappen Outline George McCullochs model The Variational Garrote Bert Kappen L methods

More information

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology FE670 Algorithmic Trading Strategies Lecture 8. Robust Portfolio Optimization Steve Yang Stevens Institute of Technology 10/17/2013 Outline 1 Robust Mean-Variance Formulations 2 Uncertain in Expected Return

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary

More information

Some theory for Fisher s Linear Discriminant function, naive Bayes, and some alternatives when there are many more variables than observations.

Some theory for Fisher s Linear Discriminant function, naive Bayes, and some alternatives when there are many more variables than observations. Some theory for Fisher s Linear Discriminant function, naive Bayes, and some alternatives when there are many more variables than observations. Peter J. Bickel 1 and Elizaveta Levina 1 Department of Statistics,

More information

Proof In the CR proof. and

Proof In the CR proof. and Question Under what conditions will we be able to attain the Cramér-Rao bound and find a MVUE? Lecture 4 - Consequences of the Cramér-Rao Lower Bound. Searching for a MVUE. Rao-Blackwell Theorem, Lehmann-Scheffé

More information

Spectral Analysis - Introduction

Spectral Analysis - Introduction Spectral Analysis - Introduction Initial Code Introduction Summer Semester 011/01 Lukas Vacha In previous lectures we studied stationary time series in terms of quantities that are functions of time. The

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

1 h 9 e $ s i n t h e o r y, a p p l i c a t i a n

1 h 9 e $ s i n t h e o r y, a p p l i c a t i a n T : 99 9 \ E \ : \ 4 7 8 \ \ \ \ - \ \ T \ \ \ : \ 99 9 T : 99-9 9 E : 4 7 8 / T V 9 \ E \ \ : 4 \ 7 8 / T \ V \ 9 T - w - - V w w - T w w \ T \ \ \ w \ w \ - \ w \ \ w \ \ \ T \ w \ w \ w \ w \ \ w \

More information

Short Note: Naive Bayes Classifiers and Permanence of Ratios

Short Note: Naive Bayes Classifiers and Permanence of Ratios Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

A tight bound on the performance of Fisher s linear discriminant in randomly projected data spaces

A tight bound on the performance of Fisher s linear discriminant in randomly projected data spaces A tight bound on the performance of Fisher s linear discriminant in randomly projected data spaces Robert John Durrant, Ata Kabán School of Computer Science, University of Birmingham, Edgbaston, UK, B15

More information

A Direct Approach for Sparse Quadratic Discriminant Analysis

A Direct Approach for Sparse Quadratic Discriminant Analysis A Direct Approach for Sparse Quadratic Discriminant Analysis arxiv:1510.00084v3 [stat.me] 20 Sep 2016 Binyan Jiang, Xiangyu Wang, and Chenlei Leng September 21, 2016 Abstract Quadratic discriminant analysis

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30 Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain

More information

Exercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis

Exercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis Exercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis Younesse Kaddar. Covariance and Correlation Assume that we have recorded two neurons in the two-alternative-forced

More information

6-1. Canonical Correlation Analysis

6-1. Canonical Correlation Analysis 6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

Journal of Multivariate Analysis

Journal of Multivariate Analysis Journal of Multivariate Analysis 0 (00 6 637 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: wwwelseviercom/locate/jmva The efficiency of logistic regression

More information

Boosting Methods: Why They Can Be Useful for High-Dimensional Data

Boosting Methods: Why They Can Be Useful for High-Dimensional Data New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,

More information

Week 5: Logistic Regression & Neural Networks

Week 5: Logistic Regression & Neural Networks Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and

More information

Variational sampling approaches to word confusability

Variational sampling approaches to word confusability Variational sampling approaches to word confusability John R. Hershey, Peder A. Olsen and Ramesh A. Gopinath {jrhershe,pederao,rameshg}@us.ibm.com IBM, T. J. Watson Research Center Information Theory and

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information