A Robust Approach to Regularized Discriminant Analysis

1 A Robust Approach to Regularized Discriminant Analysis. Moritz Gschwandtner, Department of Statistics and Probability Theory, Vienna University of Technology, Austria. Österreichische Statistiktage, Graz, Austria, September 08, 2011.

2 Joint work with Peter Filzmoser (Vienna University of Technology, Austria), Christophe Croux (ORSTAT and University Center of Statistics, K.U. Leuven, Belgium) and Gentiane Haesbroeck (University of Liège, Liège, Belgium).

3 Contents
1. Overview of discriminant analysis
2. Introduction of the proposed method
3. Choice of parameters
4. Real and simulated data examples

4 Discriminant Analysis (DA): Example. Haemophilia data: 30 normal persons and 22 obligatory carriers of haemophilia A. (Figure: scatter plot of AHFactivity and AHFantigen, points marked as normal or carrier.)

5 Notation. We are given n training observations measured on p variables. The observations originate from k different populations $G_1, \dots, G_k$ with prior probabilities $\pi_1, \dots, \pi_k$, where $\sum_{j=1}^{k} \pi_j = 1$, and sample sizes $n_1, \dots, n_k$, where $\sum_{j=1}^{k} n_j = n$. Usual assumption: the observations of group $G_j$ follow a normal distribution $N(\mu_j, \Sigma_j)$ with mean $\mu_j$ and covariance matrix $\Sigma_j$, $j = 1, \dots, k$.

6 DA: Classification and Prediction. Find a classification function f, based on the training data, that assigns a new, unlabelled observation to one (and only one) of the k groups: $f: \Omega^p \to \{1, \dots, k\}$. Bayes rule: given an observation x, the posterior probability of group $G_j$ equals
$$P(G_j \mid x) = \frac{p(x \mid G_j)\,\pi_j}{\sum_{i=1}^{k} p(x \mid G_i)\,\pi_i}.$$
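The Bayes rule above can be evaluated directly once the group densities are fixed. The following is a minimal numpy sketch, assuming Gaussian densities $N(\mu_j, \Sigma_j)$ as on the notation slide; the function names `gaussian_logpdf` and `posterior_probabilities` are hypothetical, and the computation is done on the log scale for numerical stability.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at a single point x."""
    p = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # (x - mu)' sigma^{-1} (x - mu)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)


def posterior_probabilities(x, mus, sigmas, priors):
    """Bayes rule: P(G_j | x) = p(x | G_j) pi_j / sum_i p(x | G_i) pi_i."""
    log_num = np.array([gaussian_logpdf(x, mu, sigma) + np.log(pi)
                        for mu, sigma, pi in zip(mus, sigmas, priors)])
    # Subtracting the maximum before exponentiating avoids underflow;
    # the common factor cancels in the normalization.
    weights = np.exp(log_num - log_num.max())
    return weights / weights.sum()


mus = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
sigmas = [np.eye(2), np.eye(2)]
print(posterior_probabilities(np.array([1.0, 0.0]), mus, sigmas, [0.25, 0.75]))
```

At the midpoint x = (1, 0) both class densities are equal, so the posterior probabilities reduce to the priors.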

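7 Bayesian DA Rule. A test observation x is assigned to the population $G_j$ for which $\ln P(G_j \mid x)$ is maximal over all groups $j = 1, \dots, k$. Since the denominator of the Bayes rule does not depend on j,
$$f(x) = \arg\max_j \ln P(G_j \mid x) = \arg\max_j \ln\big(p(x \mid G_j)\,\pi_j\big).$$
Quadratic Discriminant Analysis:
$$f_{QDA}(x) = \arg\max_j \Big( -\tfrac{1}{2}\ln\det\Sigma_j - \tfrac{1}{2}(x-\mu_j)^\top \Sigma_j^{-1}(x-\mu_j) + \ln\pi_j \Big).$$
Linear Discriminant Analysis: assume $\Sigma_1 = \dots = \Sigma_k = \Sigma$; the terms $-\tfrac{1}{2}\ln\det\Sigma$ and $-\tfrac{1}{2}x^\top\Sigma^{-1}x$ are then common to all groups and drop out, which gives
$$f_{LDA}(x) = \arg\max_j \Big( \mu_j^\top \Sigma^{-1} x - \tfrac{1}{2}\mu_j^\top \Sigma^{-1}\mu_j + \ln\pi_j \Big).$$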
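As a sketch of the two rules, the scores inside the arg max can be coded one-to-one; the helper names `qda_scores`, `lda_scores` and `assign` are hypothetical, and groups are returned as 1-based labels to match $f: \Omega^p \to \{1, \dots, k\}$. With equal covariance matrices both rules should pick the same group, since their scores differ only by terms common to all groups.

```python
import numpy as np


def qda_scores(x, mus, sigmas, priors):
    """-1/2 ln det Sigma_j - 1/2 (x - mu_j)' Sigma_j^{-1} (x - mu_j) + ln pi_j per group."""
    scores = []
    for mu, sigma, pi in zip(mus, sigmas, priors):
        diff = x - mu
        _, logdet = np.linalg.slogdet(sigma)
        scores.append(-0.5 * logdet - 0.5 * diff @ np.linalg.solve(sigma, diff) + np.log(pi))
    return np.array(scores)


def lda_scores(x, mus, theta, priors):
    """mu_j' Theta x - 1/2 mu_j' Theta mu_j + ln pi_j, with common Theta = Sigma^{-1}."""
    return np.array([mu @ theta @ x - 0.5 * mu @ theta @ mu + np.log(pi)
                     for mu, pi in zip(mus, priors)])


def assign(scores):
    """f(x) = argmax_j score_j, returned as a 1-based group label."""
    return int(np.argmax(scores)) + 1


mus = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
sigma = np.eye(2)
x = np.array([1.5, 0.3])
print(assign(lda_scores(x, mus, np.linalg.inv(sigma), [0.5, 0.5])))
print(assign(qda_scores(x, mus, [sigma, sigma], [0.5, 0.5])))
```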

8 Parameter Estimation. The essential ingredients of the LDA rule are the group centers and the common group covariance matrix. They can be estimated by (a) the sample means and the pooled sample covariance matrix, (b) robust estimators of location and covariance, such as the MCD estimators, or (c) regularized (sparse) estimators of location and covariance. Robust estimators lead to robust DA rules!

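9 Maximum Likelihood. Given a data sample X of size n, the Gaussian log-likelihood of the joint location $\mu$ and inverse scatter matrix $\Theta := \Sigma^{-1}$ is, up to an additive constant and the factor $n/2$,
$$L(\mu, \Theta) = \log\det\Theta - \frac{1}{n}\sum_{x \in X}(x-\mu)^\top\Theta(x-\mu).$$
Maximization leads to the classical estimators
$$\hat\mu = \frac{1}{n}\sum_{x\in X} x, \qquad \hat\Theta = \hat\Sigma^{-1} = \Big(\frac{1}{n}\sum_{x\in X}(x-\hat\mu)(x-\hat\mu)^\top\Big)^{-1}.$$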
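A small numpy check of these formulas (the names `ml_estimates` and `scaled_loglik` are hypothetical). It also illustrates a useful identity: at the maximum, the average quadratic form equals $\operatorname{tr}(\hat\Theta\hat\Sigma) = p$, so $L(\hat\mu, \hat\Theta) = \log\det\hat\Theta - p$.

```python
import numpy as np


def ml_estimates(X):
    """Classical ML estimates: sample mean and inverse of the 1/n covariance."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    sigma_hat = centered.T @ centered / n  # 1/n, not 1/(n - 1)
    return mu_hat, np.linalg.inv(sigma_hat)


def scaled_loglik(X, mu, theta):
    """L(mu, Theta) = log det Theta - (1/n) sum_x (x - mu)' Theta (x - mu)."""
    _, logdet = np.linalg.slogdet(theta)
    centered = X - mu
    # Row-wise quadratic forms (x_i - mu)' Theta (x_i - mu).
    quad = np.einsum("ij,jk,ik->i", centered, theta, centered)
    return logdet - quad.mean()


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
mu_hat, theta_hat = ml_estimates(X)
print(scaled_loglik(X, mu_hat, theta_hat), np.linalg.slogdet(theta_hat)[1] - 3)
```

Perturbing either estimate (shifting $\hat\mu$ or rescaling $\hat\Theta$) can only lower the value.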

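10 Regularization. Problem: if $n < p$, $\hat\Sigma$ is singular and the maximum likelihood estimator of $\Theta$ does not exist! Solution: penalize the log-likelihood with an L1 term weighted by a penalty parameter $\lambda > 0$, where $\|\Theta\|_1 = \sum_{l,m}|\theta_{lm}|$:
$$L(\mu, \Theta) = \log\det\Theta - \frac{1}{n}\sum_{x\in X}(x-\mu)^\top\Theta(x-\mu) - \lambda\|\Theta\|_1.$$
The maximization problem can be solved by an algorithm called the graphical lasso, available in the R package glasso (Friedman, Hastie, Tibshirani, 2007). The parameter $\lambda$ governs the sparseness of $\hat\Theta$: larger values generally yield more entries that are exactly zero.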
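The graphical lasso solves this problem by block coordinate descent on $W \approx \hat\Theta^{-1}$, with one lasso problem per column. Below is a minimal, unoptimized numpy sketch of that scheme, assuming the objective above with $S$ the 1/n sample covariance (so $\frac{1}{n}\sum (x-\hat\mu)^\top\Theta(x-\hat\mu) = \operatorname{tr}(S\Theta)$) and the diagonal penalized as in $\|\Theta\|_1$. It is not the glasso package's code; the names `glasso` and `soft_threshold` and the tolerances are hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def glasso(S, lam, max_iter=100, tol=1e-8, inner_iter=500):
    """Sketch of graphical lasso block coordinate descent for
    max_Theta  log det Theta - tr(S Theta) - lam * sum_{l,m} |theta_lm|.
    Returns (W, Theta) with W approximately Theta^{-1}."""
    p = S.shape[0]
    W = S + lam * np.eye(p)   # the diagonal of W stays at S_jj + lam
    B = np.zeros((p, p))      # column j: lasso coefficients for variable j
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = B[idx, j].copy()
            # Coordinate descent for min_b 1/2 b' W11 b - b' s12 + lam * ||b||_1.
            for _ in range(inner_iter):
                beta_old = beta.copy()
                for l in range(p - 1):
                    r = s12[l] - W11[l] @ beta + W11[l, l] * beta[l]
                    beta[l] = soft_threshold(r, lam) / W11[l, l]
                if np.max(np.abs(beta - beta_old)) < tol:
                    break
            B[idx, j] = beta
            w12 = W11 @ beta
            W[idx, j] = w12
            W[j, idx] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover Theta from W and the lasso coefficients, column by column.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        theta_jj = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[j, j] = theta_jj
        Theta[idx, j] = -B[idx, j] * theta_jj
    return W, (Theta + Theta.T) / 2


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))          # n = 5 < p = 8
S = np.cov(X, rowvar=False, bias=True)   # 1/n covariance: singular here
W, Theta = glasso(S, lam=0.3)
print(np.linalg.matrix_rank(S), np.count_nonzero(np.abs(Theta) < 1e-10))
```

Two sanity checks follow from the objective: with $\lambda = 0$ and a nonsingular S the sketch returns $S^{-1}$, and once $\lambda$ exceeds every off-diagonal $|s_{lm}|$ the estimate becomes diagonal with entries $1/(s_{jj} + \lambda)$.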

11 Example. Simulated three-dimensional data, $X \sim N(0, I_3)$, $n = 100$. The classical estimate of the concentration matrix, solve(cov(x)), is dense, whereas glasso(cov(x), rho=0.2)$wi yields a sparse estimate: $\lambda = 0.2$ leads to a sparse estimate of the concentration matrix!

12 Example. Problem: glasso is not robust! Adding 10 outliers distributed according to $N(10, I_3)$ strongly distorts the estimate returned by glasso(cov(x), rho=0.2)$wi. Idea: combine the regularization of glasso with robust techniques!
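The reason is that glasso only sees the sample covariance matrix, which a few outliers can dominate. A numpy sketch of the slide's setting, assuming $N(10, I_3)$ means a mean vector with all three coordinates equal to 10:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 3))            # X ~ N(0, I_3), n = 100
outliers = 10.0 + rng.standard_normal((10, 3))   # 10 points from N((10, 10, 10), I_3)
contaminated = np.vstack([clean, outliers])

S_clean = np.cov(clean, rowvar=False)
S_cont = np.cov(contaminated, rowvar=False)
off = ~np.eye(3, dtype=bool)                     # mask selecting off-diagonal entries
print(np.round(S_clean, 2))
print(np.round(S_cont, 2))
```

Ten outliers (about 9% of the sample) are enough to make every covariance large and positive, so any glasso fit on S_cont inherits the distortion.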

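13 Regularized MCD estimator. Croux and Haesbroeck (2010) adapt the MCD idea and integrate it into the log-likelihood function. For an index subset $H \subset \{1, \dots, n\}$ with $|H| = h < n$, define
$$L(H, (\mu,\Theta)) = \log\det\Theta - \frac{1}{h}\sum_{i\in H}(x_i-\mu)^\top\Theta(x_i-\mu) - \lambda\|\Theta\|_1.$$
Maximizing $L(H,(\mu,\Theta))$ means finding an index subset $H_{opt}$ for which
$$\max_{(\mu,\Theta)} L(H_{opt},(\mu,\Theta)) \ge \max_{(\mu,\Theta)} L(H,(\mu,\Theta)) \quad \text{for all } H \subset \{1,\dots,n\} \text{ with } |H| = h.$$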

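14 Regularized MCD estimator. Problem: there are $\binom{n}{h}$ subsets to check, which is not feasible for large n. Improvement: the C-step algorithm. Let $H_k$ be the subset obtained at iteration k and $(\hat\mu_{H_k}, \hat\Theta_{H_k})$ the corresponding estimates maximizing $L(H_k,(\mu,\Theta))$. Compute the (squared) Mahalanobis distances with respect to $(\hat\mu_{H_k}, \hat\Theta_{H_k})$:
$$d_i^{(k)} = d\big(x_i, \hat\mu_{H_k}, \hat\Theta_{H_k}\big) = (x_i-\hat\mu_{H_k})^\top\hat\Theta_{H_k}(x_i-\hat\mu_{H_k}), \qquad i = 1,\dots,n.$$
Define the next subset as
$$H_{k+1} = \big\{ i \in \{1,\dots,n\} : d_i^{(k)} \in \{d_{(1)}^{(k)}, \dots, d_{(h)}^{(k)}\} \big\},$$
where $d_{(1)}^{(k)} \le \dots \le d_{(n)}^{(k)}$ are the ordered distances. Then
$$L\big(H_k,(\hat\mu_{H_k},\hat\Theta_{H_k})\big) \le L\big(H_{k+1},(\hat\mu_{H_{k+1}},\hat\Theta_{H_{k+1}})\big),$$
so the penalized likelihood never decreases from one C-step to the next.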
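The inequality follows from the definitions on this slide in two steps. The first holds because $H_{k+1}$ collects the h smallest distances $d_i^{(k)}$, so their sum cannot exceed the sum over $H_k$ (with the estimates held fixed); the second holds because $(\hat\mu_{H_{k+1}},\hat\Theta_{H_{k+1}})$ maximizes $L(H_{k+1},\cdot)$.

```latex
\begin{aligned}
L\big(H_k,(\hat\mu_{H_k},\hat\Theta_{H_k})\big)
  &= \log\det\hat\Theta_{H_k} - \frac{1}{h}\sum_{i\in H_k} d_i^{(k)} - \lambda\|\hat\Theta_{H_k}\|_1 \\
  &\le \log\det\hat\Theta_{H_k} - \frac{1}{h}\sum_{i\in H_{k+1}} d_i^{(k)} - \lambda\|\hat\Theta_{H_k}\|_1
   = L\big(H_{k+1},(\hat\mu_{H_k},\hat\Theta_{H_k})\big) \\
  &\le \max_{(\mu,\Theta)} L\big(H_{k+1},(\mu,\Theta)\big)
   = L\big(H_{k+1},(\hat\mu_{H_{k+1}},\hat\Theta_{H_{k+1}})\big).
\end{aligned}
```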

15 Algorithm. The regularized MCD estimator is computed using the following algorithm:
1. Draw an initial subset $H_0$.
2. Maximize the penalized likelihood function (glasso) for the current subset, starting with $H_0$, to obtain $(\hat\mu_{H_0}, \hat\Theta_{H_0})$.
3. Compute the ordered Mahalanobis distances with respect to these estimates.
4. Choose as next subset the h observations with the smallest distances.
5. Repeat steps 2-4 until convergence to obtain $(\hat\mu, \hat\Theta)$.
A local maximum of the likelihood is reached. The algorithm can be repeated several times with different initial subsets.
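A compact sketch of steps 1-5 with random restarts. To keep it short, step 2 uses the unpenalized ML estimates on the subset ($\lambda = 0$, i.e. the classical MCD C-step) instead of a glasso fit, so it only applies when $h > p$; in RegMCD a penalized solver would be called at that point. All function names are hypothetical.

```python
import numpy as np


def sq_mahalanobis(X, mu, theta):
    """d_i = (x_i - mu)' Theta (x_i - mu) for every row of X."""
    centered = X - mu
    return np.einsum("ij,jk,ik->i", centered, theta, centered)


def fit_subset(X, H):
    """Step 2 stand-in: unpenalized ML estimates on subset H (lambda = 0).
    RegMCD would maximize the penalized likelihood with glasso here instead."""
    XH = X[H]
    mu = XH.mean(axis=0)
    centered = XH - mu
    return mu, np.linalg.inv(centered.T @ centered / len(H))


def objective(X, H, mu, theta):
    """L(H, (mu, Theta)) with lambda = 0: log det Theta - mean of d_i over H."""
    return np.linalg.slogdet(theta)[1] - sq_mahalanobis(X[H], mu, theta).mean()


def c_step_fit(X, h, n_starts=10, max_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best = None
    for _ in range(n_starts):
        H = rng.choice(n, size=h, replace=False)       # step 1: random initial subset
        mu, theta = fit_subset(X, H)                    # step 2
        obj = objective(X, H, mu, theta)
        for _ in range(max_steps):
            d = sq_mahalanobis(X, mu, theta)            # step 3: distances of all n points
            H_new = np.argsort(d)[:h]                   # step 4: h smallest distances
            mu_new, theta_new = fit_subset(X, H_new)
            obj_new = objective(X, H_new, mu_new, theta_new)
            if obj_new <= obj + 1e-12:                  # step 5: no further increase
                break
            H, mu, theta, obj = H_new, mu_new, theta_new, obj_new
        if best is None or obj > best[0]:               # keep the best local maximum
            best = (obj, np.sort(H), mu, theta)
    return best


rng = np.random.default_rng(42)
X = np.vstack([rng.standard_normal((90, 2)),
               rng.standard_normal((10, 2)) + 8.0])     # rows 90-99 are outliers
obj, H, mu, theta = c_step_fit(X, h=75)
print(H.max() < 90)  # True when no outlier is left in the final subset
```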

16 LDA and RegMCD. How to apply LDA with the regularized MCD estimator in the multi-group setting $X = \{x_{ij} : i = 1,\dots,n_j;\ j = 1,\dots,k\}$:
1. Compute robust location estimates $t_j$ for $j = 1,\dots,k$.
2. Compute the centered observations $Z = \{z_{ij}\}$ with $z_{ij} = x_{ij} - t_j$.
3. Apply the regularized MCD algorithm to Z to obtain common estimates $(\hat\mu, \hat\Theta)$.
4. Correct the location estimates: $\hat\mu_j = t_j + \hat\mu$.
5. Apply LDA using the parameters $\hat\mu_1,\dots,\hat\mu_k$ and $\hat\Theta$.
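The five steps can be sketched as follows. Several choices here are assumptions, not taken from the slides: the coordinatewise median serves as robust location $t_j$, a plain ML fit (`classical_fit`, hypothetical) stands in for the RegMCD step on Z, and the priors are estimated by $n_j/n$.

```python
import numpy as np


def classical_fit(Z):
    """Hypothetical stand-in for the RegMCD step: plain ML mean and precision of Z."""
    mu = Z.mean(axis=0)
    centered = Z - mu
    return mu, np.linalg.inv(centered.T @ centered / len(Z))


def pooled_lda_fit(groups, fit=classical_fit):
    """groups: list of (n_j x p) arrays. Returns group centers, common Theta, priors."""
    # Step 1: robust location per group (coordinatewise median, an assumption here).
    t = [np.median(Xj, axis=0) for Xj in groups]
    # Step 2: center each group at its own location estimate and pool.
    Z = np.vstack([Xj - tj for Xj, tj in zip(groups, t)])
    # Step 3: common location/scatter estimate of the pooled centered data.
    mu_hat, theta_hat = fit(Z)
    # Step 4: correct the group locations by the common shift.
    mus = [tj + mu_hat for tj in t]
    n = sum(len(Xj) for Xj in groups)
    priors = [len(Xj) / n for Xj in groups]
    return mus, theta_hat, priors


def lda_predict(x, mus, theta, priors):
    """Step 5: LDA rule with the pooled estimates; returns a 1-based group label."""
    scores = [mu @ theta @ x - 0.5 * mu @ theta @ mu + np.log(pi)
              for mu, pi in zip(mus, priors)]
    return int(np.argmax(scores)) + 1


rng = np.random.default_rng(3)
groups = [rng.standard_normal((100, 2)),
          rng.standard_normal((100, 2)) + np.array([4.0, 0.0])]
mus, theta, priors = pooled_lda_fit(groups)
print(lda_predict(np.array([3.5, 0.2]), mus, theta, priors))
```

Centering each group before estimating the scatter keeps the between-group differences out of the common covariance estimate.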

17 Centering (figure: two scatter plots with axes x and y)

18 The penalty parameter λ. The penalty parameter λ can be chosen based on test error rates (cross-validation) or based on a model selection criterion (AIC, BIC). BIC criterion:
$$\mathrm{BIC}(\Gamma) = -2\log L(\Gamma) + \kappa(\Gamma)\log n,$$
where $L(\Gamma)$ is the likelihood function of the model $\Gamma$ and $\kappa(\Gamma)$ is the number of parameters in the model.

19 The penalty parameter λ. BIC(λ) is small if the value of the likelihood function $L(H_{opt}, \hat\theta)$ is high and the number of parameters in the model is small. Choose λ according to
$$\lambda_{opt} = \arg\min_\lambda \mathrm{BIC}(\lambda).$$
Best compromise between likelihood and sparseness!
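A sketch of BIC-based selection for a plain Gaussian graphical model. The slides use an adapted criterion based on $L(H_{opt}, \hat\theta)$ whose details are not given here, so the following choices are assumptions: $-2\log L = n\,(\operatorname{tr}(S\hat\Theta) - \log\det\hat\Theta)$ up to a constant, and $\kappa$ counts the nonzero entries of $\hat\Theta$ on and above the diagonal. The names `bic` and `select_lambda` are hypothetical.

```python
import numpy as np


def bic(S, theta, n, tol=1e-10):
    """BIC = -2 log L + kappa * log n for a Gaussian model with precision theta.
    -2 log L is used up to an additive constant: n * (tr(S Theta) - log det Theta).
    kappa (assumption): nonzero entries of Theta on and above the diagonal."""
    _, logdet = np.linalg.slogdet(theta)
    neg2loglik = n * (np.trace(S @ theta) - logdet)
    kappa = np.count_nonzero(np.abs(np.triu(theta)) > tol)
    return neg2loglik + kappa * np.log(n)


def select_lambda(S, fits, n):
    """fits: dict mapping each candidate lambda to its estimated Theta.
    Returns lambda_opt = argmin_lambda BIC(lambda) and all BIC values."""
    scores = {lam: bic(S, theta, n) for lam, theta in fits.items()}
    return min(scores, key=scores.get), scores


S = np.eye(3)
dense = np.array([[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]])
lam_opt, scores = select_lambda(S, {0.0: dense, 0.5: np.eye(3)}, n=100)
print(lam_opt, scores)
```

Here the dense estimate barely improves the fit but pays for three extra parameters, so the sparser fit wins.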

20 Example: Fruit Data. Three different sorts of the same fruit (Cucumis melo); 256 different spectra measured; outliers due to different illumination systems. The data are partitioned into a 60% training and a 40% test set, and test errors are measured for each group separately.

21 Example: Fruit Data, BIC. BIC and AER suggest a small λ value! (Figure: BIC and AER plotted against λ.)

22 Example: Fruit Data, Results. Outliers in the third group lead to poor results for LDA. RRLDA remains stable! (Table: test errors for groups 1, 2 and 3 for RRLDA with λ = 0.001, GLASSO with λ = 0.001, and LDA.)

23 Example: Golub Data, Results. 38 training samples and 34 test samples from two cancer classes. Absolute test errors were measured for various variable subsets; variables were selected by the nearest shrunken centroids method. (Table: test errors of LDA, GLASSO and RRLDA for different numbers of variables p.)

24 Simulated Example. Two groups (k = 2), each consisting of 100 observations on p variables, with $p \in \{30, 100, 300, 500, 1000\}$. Discrimination occurs in variables 1 and 2, which have group means $\mu_1$, $\mu_2$ and covariance matrix $\Sigma$. Variables 3 to p are uncorrelated noise following standard normal distributions.

25 Simulated Example (figure: scatter plot of x[, 1] against x[, 2])

26 Simulated Example, BIC, p = 30 (figure: BIC against λ; optimal value λ = 0.19)

27 Simulated Example, Outliers. Contamination is simulated by adding 10% shift outliers to the data. For the outliers, variables 3 to p are distributed like the non-outliers, the mean of variable 1 is shifted, and the means of variable 2 are swapped between the groups, giving shifted group means $\mu_1$ and $\mu_2$.

28 Simulated Example, Outliers (figure: scatter plot of x[, 1] against x[, 2])

29 Results without contamination (figure: error rates AER and TER of LDA, GLASSO and RRLDA against p)

30 Results with contamination (figure: error rates AER and TER of LDA, GLASSO and RRLDA against p)

31 Computation Times (figure: computation time in seconds of RegMCD and GLASSO against p)

32 Conclusions. RRLDA combines regularization with robust methods. RRLDA is a good choice if the data contain outliers, many noisy variables, or both. The penalty parameter λ is chosen according to an adapted BIC criterion.

33 Some references
C. Croux and G. Haesbroeck. Robust scatter regularization. Compstat 2010, Book of Abstracts. Paris: Conservatoire National des Arts et Métiers (CNAM) and the French National Institute for Research in Computer Science and Control (INRIA), 2010.
J.H. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9.
J.H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84.
P. Filzmoser, R. Maronna, and M. Werner. Outlier identification in high dimensions. Computational Statistics and Data Analysis, 52, 2008.

More information

Covariance-regularized regression and classification for high-dimensional problems

Covariance-regularized regression and classification for high-dimensional problems Covariance-regularized regression and classification for high-dimensional problems Daniela M. Witten Department of Statistics, Stanford University, 390 Serra Mall, Stanford CA 94305, USA. E-mail: dwitten@stanford.edu

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Hastie, Tibshirani & Friedman: Elements of Statistical Learning Chapter Model Assessment and Selection. CN700/March 4, 2008.

Hastie, Tibshirani & Friedman: Elements of Statistical Learning Chapter Model Assessment and Selection. CN700/March 4, 2008. Hastie, Tibshirani & Friedman: Elements of Statistical Learning Chapter 7.1-7.9 Model Assessment and Selection CN700/March 4, 2008 Satyavarta sat@cns.bu.edu Auditory Neuroscience Laboratory, Department

More information

application in microarrays

application in microarrays Biostatistics Advance Access published April 7, 2006 Regularized linear discriminant analysis and its application in microarrays Yaqian Guo, Trevor Hastie and Robert Tibshirani Abstract In this paper,

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis High Dimensional Discriminant Analysis Charles Bouveyron 1,2, Stéphane Girard 1, and Cordelia Schmid 2 1 LMC IMAG, BP 53, Université Grenoble 1, 38041 Grenoble cedex 9 France (e-mail: charles.bouveyron@imag.fr,

More information

Variable Selection and Weighting by Nearest Neighbor Ensembles

Variable Selection and Weighting by Nearest Neighbor Ensembles Variable Selection and Weighting by Nearest Neighbor Ensembles Jan Gertheiss (joint work with Gerhard Tutz) Department of Statistics University of Munich WNI 2008 Nearest Neighbor Methods Introduction

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians University of Cambridge Engineering Part IIB Module 4F: Statistical Pattern Processing Handout 2: Multivariate Gaussians.2.5..5 8 6 4 2 2 4 6 8 Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2 2 Engineering

More information

Regularized Discriminant Analysis and Its Application in Microarray

Regularized Discriminant Analysis and Its Application in Microarray Regularized Discriminant Analysis and Its Application in Microarray Yaqian Guo, Trevor Hastie and Robert Tibshirani May 5, 2004 Abstract In this paper, we introduce a family of some modified versions of

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Accepted author version posted online: 28 Nov 2012.

Accepted author version posted online: 28 Nov 2012. This article was downloaded by: [University of Minnesota Libraries, Twin Cities] On: 15 May 2013, At: 12:23 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954

More information

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Machine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1

Machine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1 Machine Learning 1 Linear Classifiers Marius Kloft Humboldt University of Berlin Summer Term 2014 Machine Learning 1 Linear Classifiers 1 Recap Past lectures: Machine Learning 1 Linear Classifiers 2 Recap

More information

What does Bayes theorem give us? Lets revisit the ball in the box example.

What does Bayes theorem give us? Lets revisit the ball in the box example. ECE 6430 Pattern Recognition and Analysis Fall 2011 Lecture Notes - 2 What does Bayes theorem give us? Lets revisit the ball in the box example. Figure 1: Boxes with colored balls Last class we answered

More information

Sparse PCA for high-dimensional data with outliers

Sparse PCA for high-dimensional data with outliers Sparse PCA for high-dimensional data with outliers Mia Hubert Tom Reynkens Eric Schmitt Tim Verdonck Department of Mathematics, KU Leuven Leuven, Belgium June 25, 2015 Abstract A new sparse PCA algorithm

More information

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per

More information

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

Regularized Discriminant Analysis and Its Application in Microarrays

Regularized Discriminant Analysis and Its Application in Microarrays Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,

More information