arxiv: v1 [math.st] 23 May 2014
|
|
- Francine Porter
- 6 years ago
- Views:
Transcription
1 Abstract On oracle efficiency of the ROAD classification rule Britta Anker Bak and Jens Ledet Jensen Department of Mathematics, Aarhus University, Denmark arxiv: v [math.st] 3 May 04 For high-dimensional classification Fishers rule performs poorly due to noise from estimation of the covariance matrix. Fan, Feng and Tong (0 introduced the ROAD classifier that puts an L -constraint on the classification vector. In their Theorem Fan, Feng and Tong (0 show that the ROAD classifier asymptotically has the same misclassification rate as the corresponding oracle based classifier. Unfortunately, the proof contains an error. Here we restate the theorem and provide a new proof. Introduction We consider classification among two groups based on a p-dimensional normally distributed variable. Let the means in the two groups be µ and µ, and let the common variance be Σ. Also, let the probability of belonging to either of the two groups be. Defining µ a = (µ +µ / and µ d = (µ µ /, the Bayes discriminant rule becomes δ w (x = +(w T (x µ a < 0, with w = w F = Σ µ d, where x is classified to group or according to the value of δ w (x. The misclassification rate of the rule δ w is W(δ w = ( wt µ d /(w T Σw /, where (z = Φ(z is the upper tail probability of a standard normal distribution. The interpretation of the Bayes rule is that w F is the vector that minimizes the misclassification rate. Fan, Feng and Tong (0 suggest to use a L regularized version of w F, that is, w c = argmin w c, w T µ d = w T Σw. Its sample version yields the ROAD classifier ŵ c = argmin w T ˆΣw w c, w T ˆµ d = ˆδ = +(ŵ T c (x ˆµ a < 0. Theorem of Fan, Feng and Tong (0 states that the misclassification rate W(ˆδ of the ROAD classifier approaches the misclassification rate of the oracle classifier W(δ wc. Unfortunately, an essential step in the proof use an inequality which is not valid, see Appendix A for details. We reformulate the theorem and give a new proof.
2 Theorem. Let ǫ be a positive constant such that max j { µ dj } > ǫ, and c > ǫ + /max j { µ dj }. Let a n be a sequence tending to zero such that ˆΣ Σ = O p (a n, and ˆµ i µ i = O p (a n, i =,. Then, as n : with d n = c a n (+c Σ. W(ˆδ W(δ wc = O p (d n Prior to proving the theorem we comment on the differences compared to Theorem of Fan, Feng and Tong (0. Contrary to us, Fan, Feng and Tong (0 requires that the smallest eigenvalue ofσis bounded from below. The upper bound onw(ˆδ W(δ wc in Fan, Feng and Tong (0 depends on the sparsity of w c and of w c (, where w c ( is given by w c ( = arg min w T Σw, w c, w Tˆµ d = whereas our bound depends on the regularizing parameter c only. In the formulation of the theorem c is allowed to depend on n. We require a lower bound on max j { µ dj }, which is not part of the theorem in Fan, Feng and Tong (0. However, it enters indirectly in that we must have c > /max j { µ dj } in order for w c to exist. Thus, if max j { µ dj } 0, we have c, and c enters the upper bound of Fan, Feng and Tong (0. The reason for our more restrictive condition c > ǫ + /max j { µ dj } is that the theorem only makes sense if ŵ c exists with probability tending to one. Similarly, whereas Fan, Feng and Tong (0 have the condition ˆµ d µ d = O p (a n, we have ˆµ i µ i = O p (a n, i =,, in order to handle a term in the misclassification rate that has been neglected in Fan, Feng and Tong (0. Finally, Σ appears in our bound. However, requiring that the variances Σ ii, i =,...,p, are bounded is often encountered in high dimensional settings. Proof of Theorem In the proof we use the following inequalities: (a(+ǫ (a ǫ for a > 0 and ǫ <, ( ((a+ǫ / (a / ǫ for a > 0 and a+ǫ > 0. ( The misclassification rate consists of two terms corresponding to an observation from each of the two groups. The proofs for the two terms are identical, so to simplify we consider the misclassification rate of an observation from group only. Using ( the misclassification rate of ˆδ becomes W(ˆδ = ( ŵc T ˆµ d +ŵc T (ˆµ µ ŵt = ( ŵt +O( ŵc T c Σŵ c c Σŵ (ˆµ µ c ( ŵt +O(c ˆµ µ. (3 c Σŵ c
3 Next, and from ( we get ( ŵt = ( c Σŵ c ŵ T c Σŵ c ŵ T c ˆΣŵ c c ˆΣ Σ, ŵc T ˆΣŵ c From the proof in Fan, Feng and Tong (0 we see that and thus ( ŵc TˆΣŵ c ŵ T c ˆΣŵ c w (T c Σw ( c c ˆΣ Σ, = ( Combining (3 5 we have W(ˆδ = ( w c (T Σŵ c ( w c (T Σŵ c ( +O(c ˆΣ Σ. (4 +O(c ˆΣ Σ. (5 +O(c ˆΣ Σ +c ˆµ µ. (6 Since the oracle misclassification rate is W(δ wc = ( /( wc TΣw c we need to compare wc TΣw c with w c (T Σŵ c (. To this end let A = {w : w T µ d =, w c}, A = {w : w Tˆµ d =, w c}. We want to show that for any w A there exists w A such that w T Σw is close to w T Σ w and vice versa. This means that the minimum of w T Σw over the set A is close to the minimum over the set A. Let w A, and define w = w/(w Tˆµ d. If w c, we have w A, and w T Σw = (w Tˆµ d w T Σ w = (+O(c ˆµ d µ d w T Σ w. If instead w > c, we first define w A and then w = w/( w Tˆµ d A. To define w assume without loss of generality that = max j { µ dj. Write w = (w,w ( where w ( is (p -dimensional, and define w = ( w,rw ( with 0 < r <, and w chosen such that w T µ d =. The latter requirement implies w = rw T ( µ d( = r( w. We will show that with r = c ˆµ d µ d /(c / = O(c ˆµ d µ d we have w c. From the definition of w we have w = w +r w ( = r( w +r( w w. 3
4 If r( w > 0 we get w = +r ( w +w w +r ( c. This shows that w A and w A since w = w w Tˆµ d +r ( c c ˆµ d µ d c, when r c ˆµ d µ d /(c /. If instead r( w < 0 we find w = r +r w rc r rc, and w rc/( c ˆµ d µ d c for r c ˆµ d µ d. The latter condition is satisfied with r c ˆµ d µ d /(c /. Comparing w and w we get and also w T Σw w T Σ w c w w Σ c[( r w +( r ] Σ = O(c 4 ˆµ d µ d Σ, w T Σ w w T Σw (w T Σw O(c ˆµ d µ d. We have now shown that any value of w T Σw for w A is close to the corresponding value for some w A. The other way around, starting with w A, is treated in the same way. The only difference is that instead of using c / > ǫ, we use that when ˆ < min{ǫ,ǫ 3 /( + ǫ }, which happens with probability tending to (exponentially fast, we have ˆ > 0 and c /ˆ > ǫ/. Therefore, the minimum wc TΣw c of w T Σw over the set A is close to the minimum w c (T Σw c ( over the set A : w T c Σw c = w (T c Σw ( c +O(c 4 ˆµ d µ d Σ +O(c ˆµ d µ d w T c Σw c = w (T c Σw ( c +O(c 4 ˆµ d µ d Σ. Combining the latter with (6 we conclude Acknowledgement W(ˆδ W(δ wc = O ( c a n (+c Σ. We thank Xin Tong for reading this note and refer to the arxiv version of Fan, Feng and Tong (0 for updated versions. 4
5 Appendix A An essential step in the proof in Fan, Feng and Tong (0 is the inequality (used in equation ( of that paper w T c ˆµ d w T c Σw c. w c (T Σw c ( Unfortunately, this inequality is not correct. We illustrate this by a concrete example. We consider the two-dimensional case with ( µ d = (,0 T, Σ =, c = +ǫ with ǫ < /σ. σ In this case we have w c = (, ǫ T and w T c Σw c = ǫ+σǫ. Consider next ˆµ d = (+a,b with a and b small. For a and b sufficiently small we obtain and c = ( +b[a+ǫ(+a]/(+a+b, a+ǫ(+a T, (7 +a +a+b w ( w (T c Σw ( c = (w ( c +w ( c w( c +σ(w( c. For a and b small and including O(a and O(b terms only we get = w c (T Σw c ( which must be compared to { +ǫ ǫσ(+ǫ } +a bǫ+(a bǫ, (8 ǫ+σǫ ǫ+σǫ w T c ˆµ d w T c Σw c = +a bǫ ǫ+σǫ. (9 We thus see that (8 is less that (9 when a bǫ has the opposite sign of +ǫ ǫσ(+ǫ. Since (a bǫ N(0,( ǫ+σǫc 0 for some constant c 0, the probability of a particular sign of a bǫ is one half. References Fan, J., Feng, Y. and Tong, X. (0. A road to classification in high dimensional space: the regularized optimal affine discriminant J. R. Statist. Soc. B, 74, The paper with revised versions can also be found at arxiv.org: arxiv:0.6095v [stat.ml]. 5
arxiv: v2 [stat.ml] 9 Nov 2011
A ROAD to Classification in High Dimensional Space Jianqing Fan [1], Yang Feng [2] and Xin Tong [1] [1] Department of Operations Research & Financial Engineering, Princeton University, Princeton, New Jersey
More informationPortfolio Allocation using High Frequency Data. Jianqing Fan
Portfolio Allocation using High Frequency Data Princeton University With Yingying Li and Ke Yu http://www.princeton.edu/ jqfan September 10, 2010 About this talk How to select sparsely optimal portfolio?
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationDelta Theorem in the Age of High Dimensions
Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high
More informationClassification 2: Linear discriminant analysis (continued); logistic regression
Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More information12 Discriminant Analysis
12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification
ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology
More informationDoes Modeling Lead to More Accurate Classification?
Does Modeling Lead to More Accurate Classification? A Comparison of the Efficiency of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang
More informationSPARSE QUADRATIC DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA
Statistica Sinica 25 (2015), 457-473 doi:http://dx.doi.org/10.5705/ss.2013.150 SPARSE QUADRATIC DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA Quefeng Li and Jun Shao University of Wisconsin-Madison and
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationMSA220 Statistical Learning for Big Data
MSA220 Statistical Learning for Big Data Lecture 4 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology More on Discriminant analysis More on Discriminant
More informationA Study of Relative Efficiency and Robustness of Classification Methods
A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics
More informationNovel Quantization Strategies for Linear Prediction with Guarantees
Novel Novel for Linear Simon Du 1/10 Yichong Xu Yuan Li Aarti Zhang Singh Pulkit Background Motivation: Brain Computer Interface (BCI). Predict whether an individual is trying to move his hand towards
More informationClassification 1: Linear regression of indicators, linear discriminant analysis
Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationThe Third International Workshop in Sequential Methodologies
Area C.6.1: Wednesday, June 15, 4:00pm Kazuyoshi Yata Institute of Mathematics, University of Tsukuba, Japan Effective PCA for large p, small n context with sample size determination In recent years, substantial
More informationClassification Methods II: Linear and Quadratic Discrimminant Analysis
Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear
More informationAccepted author version posted online: 28 Nov 2012.
This article was downloaded by: [University of Minnesota Libraries, Twin Cities] On: 15 May 2013, At: 12:23 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954
More informationMulticlass Sparse Discriminant Analysis
Multiclass Sparse Discriminant Analysis Qing Mai, Yi Yang, Hui Zou Abstract In recent years many sparse linear discriminant analysis methods have been proposed for high-dimensional classification and variable
More informationPattern Recognition 2
Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationSupervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing
Supervised Learning Unsupervised learning: To extract structure and postulate hypotheses about data generating process from observations x 1,...,x n. Visualize, summarize and compress data. We have seen
More informationBayesian Decision and Bayesian Learning
Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More informationAppendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio
Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Long Zhao, Deepayan Chakrabarti, and Kumar Muthuraman McCombs School of Business, University of Texas,
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationMachine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1
Machine Learning 1 Linear Classifiers Marius Kloft Humboldt University of Berlin Summer Term 2014 Machine Learning 1 Linear Classifiers 1 Recap Past lectures: Machine Learning 1 Linear Classifiers 2 Recap
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationPart III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data
1 Part III. Hypothesis Testing III.1. Log-rank Test for Right-censored Failure Time Data Consider a survival study consisting of n independent subjects from p different populations with survival functions
More informationHigh Dimensional Discriminant Analysis
High Dimensional Discriminant Analysis Charles Bouveyron LMC-IMAG & INRIA Rhône-Alpes Joint work with S. Girard and C. Schmid High Dimensional Discriminant Analysis - Lear seminar p.1/43 Introduction High
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP
More informationINFORMATION THEORY AND STATISTICS
INFORMATION THEORY AND STATISTICS Solomon Kullback DOVER PUBLICATIONS, INC. Mineola, New York Contents 1 DEFINITION OF INFORMATION 1 Introduction 1 2 Definition 3 3 Divergence 6 4 Examples 7 5 Problems...''.
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationRegularized Discriminant Analysis and Reduced-Rank LDA
Regularized Discriminant Analysis and Reduced-Rank LDA Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Regularized Discriminant Analysis A compromise between LDA and
More informationA Statistical Analysis of Fukunaga Koontz Transform
1 A Statistical Analysis of Fukunaga Koontz Transform Xiaoming Huo Dr. Xiaoming Huo is an assistant professor at the School of Industrial and System Engineering of the Georgia Institute of Technology,
More informationMSA200/TMS041 Multivariate Analysis
MSA200/TMS041 Multivariate Analysis Lecture 8 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Back to Discriminant analysis As mentioned in the previous
More informationLDA, QDA, Naive Bayes
LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:
More informationMotivating the Covariance Matrix
Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role
More informationSTATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationDiscriminant analysis and supervised classification
Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical
More informationIntroduction to Machine Learning Spring 2018 Note 18
CS 189 Introduction to Machine Learning Spring 2018 Note 18 1 Gaussian Discriminant Analysis Recall the idea of generative models: we classify an arbitrary datapoint x with the class label that maximizes
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-2016-0117.R2 Title Multiclass Sparse Discriminant Analysis Manuscript ID SS-2016-0117.R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202016.0117 Complete
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationAlgorithms: COMP3121/3821/9101/9801
NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales LECTURE 9: INTRACTABILITY COMP3121/3821/9101/9801 1 / 29 Feasibility
More informationClassification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization
Classification Volker Blobel University of Hamburg March 2005 Given objects (e.g. particle tracks), which have certain features (e.g. momentum p, specific energy loss de/ dx) and which belong to one of
More informationLecture 5: Classification
Lecture 5: Classification Advanced Applied Multivariate Analysis STAT 2221, Spring 2015 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department of Mathematical Sciences Binghamton
More informationCOM336: Neural Computing
COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationDecember 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis
.. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make
More informationMTH 241: Business and Social Sciences Calculus
MTH 241: Business and Social Sciences Calculus F. Patricia Medina Department of Mathematics. Oregon State University January 28, 2015 Section 2.1 Increasing and decreasing Definition 1 A function is increasing
More informationLinear Classification: Probabilistic Generative Models
Linear Classification: Probabilistic Generative Models Sargur N. University at Buffalo, State University of New York USA 1 Linear Classification using Probabilistic Generative Models Topics 1. Overview
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationCSC 411: Lecture 09: Naive Bayes
CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationPart I. High-Dimensional Classification
Part I High-Dimensional Classification Chapter 1 High-Dimensional Classification Jianqing Fan, Yingying Fan and Yichao Wu Abstract In this chapter, we give a comprehensive overview on high-dimensional
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationA732: Exercise #7 Maximum Likelihood
A732: Exercise #7 Maximum Likelihood Due: 29 Novemer 2007 Analytic computation of some one-dimensional maximum likelihood estimators (a) Including the normalization, the exponential distriution function
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationSecond Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs
Second Order Cone Programming, Missing or Uncertain Data, and Sparse SVMs Ammon Washburn University of Arizona September 25, 2015 1 / 28 Introduction We will begin with basic Support Vector Machines (SVMs)
More informationAnalysis of Spectral Kernel Design based Semi-supervised Learning
Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationL 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011.
L methods H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands December 5, 2 Bert Kappen Outline George McCullochs model The Variational Garrote Bert Kappen L methods
More informationFE670 Algorithmic Trading Strategies. Stevens Institute of Technology
FE670 Algorithmic Trading Strategies Lecture 8. Robust Portfolio Optimization Steve Yang Stevens Institute of Technology 10/17/2013 Outline 1 Robust Mean-Variance Formulations 2 Uncertain in Expected Return
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationSome theory for Fisher s Linear Discriminant function, naive Bayes, and some alternatives when there are many more variables than observations.
Some theory for Fisher s Linear Discriminant function, naive Bayes, and some alternatives when there are many more variables than observations. Peter J. Bickel 1 and Elizaveta Levina 1 Department of Statistics,
More informationProof In the CR proof. and
Question Under what conditions will we be able to attain the Cramér-Rao bound and find a MVUE? Lecture 4 - Consequences of the Cramér-Rao Lower Bound. Searching for a MVUE. Rao-Blackwell Theorem, Lehmann-Scheffé
More informationSpectral Analysis - Introduction
Spectral Analysis - Introduction Initial Code Introduction Summer Semester 011/01 Lukas Vacha In previous lectures we studied stationary time series in terms of quantities that are functions of time. The
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationAdministration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books
STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,
More informationLog Covariance Matrix Estimation
Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More information1 h 9 e $ s i n t h e o r y, a p p l i c a t i a n
T : 99 9 \ E \ : \ 4 7 8 \ \ \ \ - \ \ T \ \ \ : \ 99 9 T : 99-9 9 E : 4 7 8 / T V 9 \ E \ \ : 4 \ 7 8 / T \ V \ 9 T - w - - V w w - T w w \ T \ \ \ w \ w \ - \ w \ \ w \ \ \ T \ w \ w \ w \ w \ \ w \
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationA tight bound on the performance of Fisher s linear discriminant in randomly projected data spaces
A tight bound on the performance of Fisher s linear discriminant in randomly projected data spaces Robert John Durrant, Ata Kabán School of Computer Science, University of Birmingham, Edgbaston, UK, B15
More informationA Direct Approach for Sparse Quadratic Discriminant Analysis
A Direct Approach for Sparse Quadratic Discriminant Analysis arxiv:1510.00084v3 [stat.me] 20 Sep 2016 Binyan Jiang, Xiangyu Wang, and Chenlei Leng September 21, 2016 Abstract Quadratic discriminant analysis
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationProblem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30
Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain
More informationExercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis
Exercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis Younesse Kaddar. Covariance and Correlation Assume that we have recorded two neurons in the two-alternative-forced
More information6-1. Canonical Correlation Analysis
6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationClassification: Linear Discriminant Analysis
Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based
More informationJournal of Multivariate Analysis
Journal of Multivariate Analysis 0 (00 6 637 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: wwwelseviercom/locate/jmva The efficiency of logistic regression
More informationBoosting Methods: Why They Can Be Useful for High-Dimensional Data
New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationVariational sampling approaches to word confusability
Variational sampling approaches to word confusability John R. Hershey, Peder A. Olsen and Ramesh A. Gopinath {jrhershe,pederao,rameshg}@us.ibm.com IBM, T. J. Watson Research Center Information Theory and
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More information