Log Covariance Matrix Estimation


1 Log Covariance Matrix Estimation
Xinwei Deng, Department of Statistics, University of Wisconsin-Madison
Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madison)

2 Outline
Background and Motivation
The Proposed Log-ME Method
Simulation and Real Example
Summary and Discussion

3 Background
Covariance matrix estimation is important in multivariate analysis and many statistical applications.
Suppose x_1, ..., x_n are i.i.d. p-dimensional random vectors from N(0, Σ). Let S = (1/n) Σ_{i=1}^n x_i x_i' be the sample covariance matrix. The negative log-likelihood function is proportional to
L_n(Σ) = -log|Σ^{-1}| + tr(Σ^{-1} S).   (1)
Recent interest focuses on settings where p is large or p ≥ n. There S is not a stable estimate:
The largest eigenvalues of S overestimate the true eigenvalues.
When p > n, S is singular and its smallest eigenvalue is zero.
How to estimate Σ^{-1}?

4 Recent Estimation Methods for Σ or Σ^{-1}
Sparse estimates: reduce the number of nonzero entries in the estimate of Σ or Σ^{-1}.
Σ: Bickel and Levina (2008), thresholding.
Σ^{-1}: Yuan and Lin (2007), l_1 penalty on Σ^{-1}; Friedman et al. (2008), graphical lasso; Meinshausen and Buhlmann (2006), reformulation as regression.
Shrinkage estimates of the covariance matrix:
Ledoit and Wolf (2006), ρS + (1-ρ)µI, shrinking S toward a scaled identity.
Won et al. (2009), control the condition number (largest eigenvalue / smallest eigenvalue).

5 Motivation
Any estimate of Σ or Σ^{-1} needs to be positive definite. This mathematical restriction makes the covariance matrix estimation problem challenging.
Any positive definite Σ can be expressed as the matrix exponential of a real symmetric matrix A:
Σ = exp(A) = I + A + A²/2! + ...
Expressing the likelihood function in terms of A = log(Σ) removes the positive-definiteness restriction.
Consider the spectral decomposition Σ = T D T' with D = diag(d_1, ..., d_p). Then A = T M T' with M = diag(log(d_1), ..., log(d_p)).
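To make the transformation concrete, here is a minimal numpy sketch; the 3x3 matrix is an arbitrary positive definite example, not from the talk.

    import numpy as np

    # Hypothetical 3x3 positive definite covariance matrix, for illustration only.
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])

    # Spectral decomposition Sigma = T D T'.
    d, T = np.linalg.eigh(Sigma)

    # Matrix logarithm A = log(Sigma) = T diag(log d_1, ..., log d_p) T';
    # A is real symmetric and completely unconstrained.
    A = T @ np.diag(np.log(d)) @ T.T

    # Exponentiating any symmetric A gives back a positive definite matrix.
    m, U = np.linalg.eigh(A)
    Sigma_back = U @ np.diag(np.exp(m)) @ U.T
    assert np.allclose(Sigma_back, Sigma)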

6 Idea of the Proposed Method
Leonard and Hsu (1992) used this log-transformation to estimate Σ, approximating the likelihood via the Volterra integral equation. Their approximation is based on S being nonsingular, hence not applicable when p ≥ n.
We extend the likelihood approximation to the case of singular S.
Regularize the largest and smallest eigenvalues of Σ simultaneously.
An efficient iterative quadratic programming algorithm estimates A = log(Σ).
Call the resulting estimate Log-ME: Logarithm-transformed Matrix Estimate.

7 A Simple Example
Experiment: simulate x_i from N(0, I), i = 1, ..., n, where n = 50. For each p varying from 5 to 100, consider the largest and smallest eigenvalues of the covariance matrix estimate.
For each p, repeat the experiment 100 times and compute the average of the largest eigenvalues and the average of the smallest eigenvalues for:
the sample covariance matrix;
the Log-ME covariance matrix estimate.

8 [Figure: the averages of the largest and smallest eigenvalues of the covariance matrix estimates over the dimension p. The true eigenvalues are all equal to 1.]
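A short sketch reproducing the sample-covariance half of this experiment (the Log-ME column requires the estimator developed later in the talk):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    for p in (5, 25, 50, 100):
        largest, smallest = [], []
        for _ in range(100):
            X = rng.standard_normal((n, p))   # rows x_i ~ N(0, I_p)
            S = X.T @ X / n                   # sample covariance matrix
            eig = np.linalg.eigvalsh(S)
            largest.append(eig[-1])
            smallest.append(eig[0])
        # True eigenvalues are all 1; the largest inflates and, once p
        # approaches n, the smallest collapses toward 0.
        print(p, round(np.mean(largest), 2), round(np.mean(smallest), 4))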

9 The Transformed Log-Likelihood
In terms of the covariance matrix logarithm A, the negative log-likelihood function in (1) becomes
L_n(A) = tr(A) + tr[exp(-A) S].   (2)
The problem of estimating a positive definite matrix Σ now becomes one of estimating a real symmetric matrix A.
Because of the matrix exponential term exp(-A)S, estimating A by directly minimizing L_n(A) is nontrivial.
Our approach: approximate exp(-A)S using the Volterra integral equation (valid even when S is singular).
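Evaluating (2) numerically is straightforward with scipy's matrix exponential; a small sketch with a hypothetical 2x2 sample covariance:

    import numpy as np
    from scipy.linalg import expm

    def neg_loglik_logscale(A, S):
        # L_n(A) = tr(A) + tr(exp(-A) S), with A = log(Sigma).
        return np.trace(A) + np.trace(expm(-A) @ S)

    S = np.array([[1.2, 0.4],
                  [0.4, 0.8]])
    A = np.zeros((2, 2))                  # corresponds to Sigma = I
    print(neg_loglik_logscale(A, S))      # equals tr(S) = 2.0 since exp(0) = I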

10 The Volterra Integral Equation
The Volterra integral equation (Bellman, 1970, page 175) is
exp(At) = exp(A_0 t) + ∫_0^t exp(A_0(t-s)) (A - A_0) exp(As) ds.   (3)
Repeatedly applying (3) leads to
exp(At) = exp(A_0 t) + ∫_0^t exp(A_0(t-s)) (A - A_0) exp(A_0 s) ds
        + ∫_0^t ∫_0^s exp(A_0(t-s)) (A - A_0) exp(A_0(s-u)) (A - A_0) exp(A_0 u) du ds
        + cubic and higher order terms,   (4)
where A_0 = log(Σ_0) and Σ_0 is an initial estimate of Σ.
The expression for exp(-A) is obtained by letting t = 1 in (4) and replacing A, A_0 with -A, -A_0.
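A quick numerical check of identity (3) at t = 1, using a trapezoidal rule on an arbitrary symmetric pair (A, A_0); the matrices here are random illustrations:

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3));  A = (A + A.T) / 2     # arbitrary symmetric A
    A0 = rng.standard_normal((3, 3)); A0 = (A0 + A0.T) / 2  # arbitrary symmetric A_0

    t = 1.0
    s = np.linspace(0.0, t, 2001)
    vals = np.array([expm(A0 * (t - si)) @ (A - A0) @ expm(A * si) for si in s])
    integral = (vals[:-1] + vals[1:]).sum(axis=0) * ((s[1] - s[0]) / 2)  # trapezoid

    lhs = expm(A * t)
    rhs = expm(A0 * t) + integral
    print(np.max(np.abs(lhs - rhs)))   # ~0, up to quadrature error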

11 Approximation to the Log-Likelihood
The term tr[exp(-A)S] can be written as
tr[exp(-A)S] = tr(S Σ_0^{-1}) - ∫_0^1 tr[(A - A_0) Σ_0^{-s} S Σ_0^{s-1}] ds
             + ∫_0^1 ∫_0^s tr[(A - A_0) Σ_0^{u-s} (A - A_0) Σ_0^{-u} S Σ_0^{s-1}] du ds
             + cubic and higher order terms.   (5)
Leaving out the higher order terms in (5), we approximate L_n(A) by l_n(A):
l_n(A) = tr(S Σ_0^{-1}) + tr(A) - ∫_0^1 tr[(A - A_0) Σ_0^{-s} S Σ_0^{s-1}] ds
       + ∫_0^1 ∫_0^s tr[(A - A_0) Σ_0^{u-s} (A - A_0) Σ_0^{-u} S Σ_0^{s-1}] du ds.   (6)

12 Explicit Form of l_n(A)
The integrals in l_n(A) can be solved analytically through the spectral decomposition Σ_0 = T_0 D_0 T_0'.
Notation: D_0 = diag(d_1^(0), ..., d_p^(0)), with the d_i^(0) the eigenvalues of Σ_0; T_0 = (t_1^(0), ..., t_p^(0)), with t_i^(0) the eigenvector corresponding to d_i^(0). Let B = T_0'(A - A_0)T_0 = (b_ij)_{p×p} and S̃ = T_0' S T_0 = (s̃_ij)_{p×p}.
Then l_n(A) can be written as a function of the b_ij:
l_n(A) = Σ_{i=1}^p (1/2) ξ_ii b_ii² + Σ_{i<j} ξ_ij b_ij² + 2 Σ_{i=1}^p Σ_{j≠i} τ_ij b_ii b_ij + Σ_{k=1}^p Σ_{i<j, i≠k, j≠k} η_kij b_ik b_kj - [Σ_{i=1}^p β_ii b_ii + 2 Σ_{i<j} β_ij b_ij],   (7)
up to a constant. Getting B gives A, via A = T_0 B T_0' + A_0.

13 Some Details
For the linear term,
β_ii = s̃_ii/d_i^(0) - 1,
β_ij = s̃_ij (d_i^(0) - d_j^(0)) / [d_i^(0) d_j^(0) (log d_i^(0) - log d_j^(0))].
For the quadratic term,
ξ_ii = s̃_ii/d_i^(0),
ξ_ij = (s̃_ii/d_i^(0) - s̃_jj/d_j^(0)) / (log d_j^(0) - log d_i^(0))
     + [(d_i^(0)/d_j^(0) - 1) s̃_ii/d_i^(0) + (d_j^(0)/d_i^(0) - 1) s̃_jj/d_j^(0)] / (log d_j^(0) - log d_i^(0))²,
τ_ij = [(1/d_j^(0) - 1/d_i^(0)) / (log d_j^(0) - log d_i^(0))² + (1/d_i^(0)) / (log d_j^(0) - log d_i^(0))] s̃_ij,
η_kij = [(1/d_j^(0) - 1/d_i^(0)) / (log(d_k^(0)/d_j^(0)) log(d_j^(0)/d_i^(0)))
       + (1/d_i^(0) - 1/d_j^(0)) / (log(d_k^(0)/d_i^(0)) log(d_i^(0)/d_j^(0)))
       + (2/d_k^(0) - 1/d_i^(0) - 1/d_j^(0)) / (log(d_k^(0)/d_i^(0)) log(d_k^(0)/d_j^(0)))] s̃_ij.

14 The Log-ME Method
We propose a regularized method to estimate Σ using the approximate log-likelihood function l_n(A).
Consider the penalty function ||A||_F² = tr(A²) = Σ_{i=1}^p (log(d_i))², where the d_i are the eigenvalues of the covariance matrix Σ.
If d_i goes to zero or diverges to infinity, log(d_i) goes to infinity in absolute value in both cases. Such a penalty can therefore simultaneously regularize the largest and smallest eigenvalues of the covariance matrix estimate.
Estimate Σ, or equivalently A, by minimizing
l_{n,λ}(A) = l_n(A) + λ tr(A²),   (8)
where λ is a tuning parameter.
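A tiny numerical illustration of why this penalty controls both extremes of the spectrum (the diagonal matrices are arbitrary examples):

    import numpy as np

    def log_penalty(Sigma):
        # tr(A^2) = sum_i (log d_i)^2 for A = log(Sigma).
        d = np.linalg.eigvalsh(Sigma)
        return float(np.sum(np.log(d) ** 2))

    # The penalty is 0 at the identity and blows up symmetrically as an
    # eigenvalue shrinks toward 0 or grows toward infinity:
    for scale in (1e-3, 1.0, 1e3):
        print(scale, log_penalty(np.diag([scale, 1.0])))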

15 An Iterative Algorithm
The objective l_{n,λ}(B) depends on an initial estimate Σ_0, or equivalently A_0. We propose to apply l_{n,λ}(B) iteratively, obtaining its minimizer ˆB at each step:
Algorithm:
Step 1: Set an initial covariance matrix estimate Σ_0, a positive definite matrix (e.g., Σ_0 = S + εI).
Step 2: Compute the spectral decomposition Σ_0 = T_0 D_0 T_0' and set A_0 = log(Σ_0).
Step 3: Compute ˆB by minimizing l_{n,λ} in (10). Then obtain Â = T_0 ˆB T_0' + A_0, and update the estimate of Σ by ˆΣ = exp(Â) = exp(T_0 ˆB T_0' + A_0).
Step 4: If ||ˆΣ - Σ_0||_F² is less than a pre-specified positive tolerance, stop. Otherwise, set Σ_0 = ˆΣ and go back to Step 2.
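An illustrative sketch of the estimator: for brevity it minimizes the exact penalized objective L_n(A) + λ tr(A²) over symmetric A with a generic optimizer, instead of solving the quadratic program in (10), so it is a stand-in for, not an implementation of, the authors' algorithm.

    import numpy as np
    from scipy.linalg import expm
    from scipy.optimize import minimize

    def log_me_sketch(S, lam=0.1):
        # Minimize tr(A) + tr(exp(-A) S) + lam * tr(A^2) over symmetric A,
        # then return Sigma_hat = exp(A_hat).
        p = S.shape[0]
        iu = np.triu_indices(p)            # free parameters: upper triangle of A

        def unpack(theta):
            A = np.zeros((p, p))
            A[iu] = theta
            return A + np.triu(A, 1).T     # symmetrize

        def objective(theta):
            A = unpack(theta)
            return (np.trace(A) + np.trace(expm(-A) @ S)
                    + lam * np.sum(A * A))  # tr(A^2) = sum of squared entries

        theta0 = np.zeros(len(iu[0]))       # start from A = 0, i.e. Sigma_0 = I
        res = minimize(objective, theta0, method="Nelder-Mead",
                       options={"maxiter": 50000, "maxfev": 50000,
                                "xatol": 1e-8, "fatol": 1e-10})
        return expm(unpack(res.x))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 3))
    S = X.T @ X / 20
    print(log_me_sketch(S, lam=0.5))

For realistic p, the closed-form quadratic minimization over B on the slides is what makes the method efficient; the generic optimizer above only scales to toy sizes.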

16 Simulation Study
Six covariance models Σ = (σ_ij)_{p×p} are used for comparison, including:
Model 1: homogeneous model with Σ = I.
Model 2: MA(1) model with σ_ii = 1 and nonzero σ_{i,i-1} = σ_{i-1,i}.
Model 3: circle model with σ_ii = 1, σ_{i,i-1} = σ_{i-1,i} = 0.3, and σ_{1,p} = σ_{p,1} = 0.3.
Compare Log-ME with four estimation methods: the banding estimate (Bickel and Levina, 2008), the LW estimate (Ledoit and Wolf, 2006), the Glasso estimate (Yuan and Lin, 2007), and the CN estimate (Won et al., 2009).
Two loss functions evaluate the performance of each method:
KL = -log|ˆΣ^{-1}| + tr(ˆΣ^{-1} Σ) - (-log|Σ^{-1}| + p),
Δ_1 = ˆd_1/ˆd_p - d_1/d_p,
where d_1 and d_p are the largest and smallest eigenvalues of Σ, and ˆd_1 and ˆd_p are their estimates.
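Both losses are easy to compute; a sketch (using log|ˆΣ| = -log|ˆΣ^{-1}| to avoid an explicit inverse):

    import numpy as np

    def kl_loss(Sigma_hat, Sigma):
        # KL = log|Sigma_hat| + tr(Sigma_hat^{-1} Sigma) - log|Sigma| - p.
        p = Sigma.shape[0]
        _, logdet_hat = np.linalg.slogdet(Sigma_hat)
        _, logdet_true = np.linalg.slogdet(Sigma)
        return logdet_hat + np.trace(np.linalg.solve(Sigma_hat, Sigma)) - logdet_true - p

    def cond_loss(Sigma_hat, Sigma):
        # Delta_1 = d1_hat/dp_hat - d1/dp, a difference of condition numbers.
        dh = np.linalg.eigvalsh(Sigma_hat)
        d = np.linalg.eigvalsh(Sigma)
        return dh[-1] / dh[0] - d[-1] / d[0]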

17 Simulation Results
[Table: averages and standard errors of the KL and Δ_1 losses from 100 runs (n = 50, p = 50) for Log-ME, Banding, LW, Glasso, and CN under Models 1-3. A value marked with * is affected by matrix singularity.]

18 Portfolio Optimization of Stock Data
Apply the Log-ME method to an application in portfolio optimization.
In mean-variance optimization, the risk of a portfolio w = (w_1, ..., w_p)' is measured by the standard deviation sqrt(w' Σ w), where w_i ≥ 0 and Σ_{i=1}^p w_i = 1.
The estimated minimum variance portfolio optimization problem is
min_w w' ˆΣ w   s.t. Σ_{i=1}^p w_i = 1,   (9)
where ˆΣ is an estimate of the true covariance matrix Σ.
An accurate covariance matrix estimate ˆΣ can lead to a better portfolio strategy.
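With only the budget constraint (dropping w_i ≥ 0 for simplicity), problem (9) has the familiar closed form w = ˆΣ^{-1}1 / (1' ˆΣ^{-1} 1); a sketch:

    import numpy as np

    def min_variance_weights(Sigma_hat):
        # argmin_w w' Sigma_hat w  s.t.  sum_i w_i = 1  (nonnegativity ignored).
        ones = np.ones(Sigma_hat.shape[0])
        z = np.linalg.solve(Sigma_hat, ones)   # Sigma_hat^{-1} 1
        return z / (ones @ z)                  # w = Sigma_hat^{-1} 1 / (1' Sigma_hat^{-1} 1)

Keeping the nonnegativity constraints w_i ≥ 0 turns (9) into a quadratic program that needs a QP solver rather than a closed form.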

19 The Set-up
Consider the weekly returns of the p = 30 components of the Dow Jones Industrial Index from January 8th, 2007 to June 28th. Use the first n = 50 observations as the training set, the next 50 observations as the validation set, and the remaining 83 observations as the test set.
Let X_ts be the test set and S_ts the sample covariance matrix of X_ts. The performance of a portfolio w is measured by the realized return and the realized risk:
R(w) = Σ_{x ∈ X_ts} w' x,   σ(w) = sqrt(w' S_ts w).
The optimal portfolio ˆw is computed with ˆΣ estimated by the Log-ME method, the CN method (Won et al., 2009), and the sample covariance S, separately.
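These two metrics in code; the layout of X_test (test-set return vectors as rows) is an assumption of this sketch:

    import numpy as np

    def realized_metrics(w, X_test):
        # X_test: test-set return vectors as rows (assumed layout).
        S_ts = np.cov(X_test, rowvar=False)    # sample covariance of the test set
        R = float(np.sum(X_test @ w))          # realized return: sum over x of w'x
        sigma = float(np.sqrt(w @ S_ts @ w))   # realized risk: sqrt(w' S_ts w)
        return R, sigma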

20 The Comparison Results
[Table 1: the realized return R(ˆw) and the realized risk σ(ˆw) for Log-ME, CN, and S.]
The Log-ME method produced a portfolio with a larger realized return but smaller realized risk.

21 Comparison in Different Periods
Compare the portfolio strategies based on the different covariance matrix estimation methods over different periods.
Given a starting week, use the first 50 observations as the training set, the next 50 observations as a validation set, and the third 50 observations as a test set.
Shift the starting week one week ahead each time, and evaluate the portfolio strategies over 33 different consecutive test periods.
The optimal portfolio ˆw is computed with ˆΣ estimated by the Log-ME method, the CN method, and the sample covariance matrix, separately.

22 The Realized Returns
[Figure: realized returns over the 33 test periods for Log-ME, CN, and S.]
The proposed Log-ME covariance matrix estimate can lead to higher returns.

23 The Realized Risks
[Figure: realized risks over the 33 test periods for Log-ME, CN, and S.]
The Log-ME method has relatively higher risks than the CN method, but it provides much larger realized returns.

24 Summary
We estimate the covariance matrix through its matrix logarithm, based on a penalized likelihood function.
The Log-ME method regularizes the largest and smallest eigenvalues simultaneously by imposing a convex penalty.
Other penalty functions can be considered to improve the estimation from different perspectives.
Future work: extend to Bayesian covariance matrix estimation for the large-p-small-n problem.

25 Thank you!

26 The Log-ME Method (Cont'd)
Note that tr(A²) = tr((T_0 B T_0' + A_0)²) equals tr(B²) + 2 tr(BΓ) up to a constant, where Γ = (γ_ij)_{p×p} = T_0' A_0 T_0.
In terms of B, the function l_{n,λ}(A) becomes
l_{n,λ}(B) = Σ_{i=1}^p (1/2) ξ_ii b_ii² + Σ_{i<j} ξ_ij b_ij² + 2 Σ_{i=1}^p Σ_{j≠i} τ_ij b_ii b_ij + Σ_{k=1}^p Σ_{i<j, i≠k, j≠k} η_kij b_ik b_kj - [Σ_{i=1}^p β_ii b_ii + 2 Σ_{i<j} β_ij b_ij] + λ [Σ_{i=1}^p b_ii² + 2 Σ_{i<j} b_ij² + 2 Σ_{i=1}^p γ_ii b_ii + 4 Σ_{i<j} γ_ij b_ij].   (10)
The objective l_{n,λ}(B) is still a quadratic function of B = (b_ij).

27 The CN Method
The CN method estimates Σ with a constraint on its condition number (Won et al., 2009). It takes
ˆΣ = T diag(û_1^{-1}, ..., û_p^{-1}) T',
where T comes from the spectral decomposition S = T diag(l_1, ..., l_p) T'. The û_1, ..., û_p are obtained by solving the constrained optimization
min_{u, u_1, ..., u_p} Σ_{i=1}^p (l_i u_i - log u_i)   s.t. u ≤ u_i ≤ κ_max u, i = 1, ..., p,
where κ_max is a tuning parameter, computed through an independent validation set.
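For fixed u, each u_i minimizing the convex term l_i u_i - log u_i over [u, κ_max u] is the unconstrained minimizer 1/l_i clipped to that interval, so a crude grid search over u gives a workable sketch; this is an illustrative reimplementation, not Won et al.'s algorithm.

    import numpy as np

    def cn_estimate(S, kappa_max=10.0, n_grid=2000):
        l, T = np.linalg.eigh(S)                       # S = T diag(l_1, ..., l_p) T'
        l = np.maximum(l, 1e-12)                       # guard against exact zeros

        def obj(u):
            u_i = np.clip(1.0 / l, u, kappa_max * u)   # clipped unconstrained minimizers
            return np.sum(l * u_i - np.log(u_i))

        grid = np.geomspace(1.0 / (kappa_max * l.max()), 1.0 / l.min(), n_grid)
        u_best = min(grid, key=obj)
        u_i = np.clip(1.0 / l, u_best, kappa_max * u_best)
        return T @ np.diag(1.0 / u_i) @ T.T            # Sigma_hat = T diag(u_i^{-1}) T'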
