Log Covariance Matrix Estimation
1 Log Covariance Matrix Estimation
Xinwei Deng, Department of Statistics, University of Wisconsin-Madison
Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madison)
2 Outline
- Background and Motivation
- The Proposed Log-ME Method
- Simulation and Real Example
- Summary and Discussion
3 Background
Covariance matrix estimation is important in multivariate analysis and many statistical applications.
Suppose x_1, ..., x_n are i.i.d. p-dimensional random vectors from N(0, Σ). Let S = (1/n) Σ_{i=1}^n x_i x_i' be the sample covariance matrix. The negative log-likelihood function is proportional to
L_n(Σ) = -log|Σ^{-1}| + tr(Σ^{-1} S).   (1)
Recent interest centers on the case where p is large or p ≫ n:
- S is not a stable estimate.
- The largest eigenvalues of S overestimate the true eigenvalues.
- When p > n, S is singular and the smallest eigenvalue is zero.
How to estimate Σ^{-1}?
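The instability described above is easy to see numerically. A small NumPy illustration (not from the slides): with more dimensions than observations, S is singular and its top eigenvalue overshoots the truth.

```python
# When p > n, the sample covariance matrix S is singular, and its
# extreme eigenvalues are badly biased even when the true covariance
# is the identity (all true eigenvalues equal to 1).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                      # more dimensions than observations
X = rng.standard_normal((n, p))     # rows x_i ~ N(0, I_p)
S = X.T @ X / n                     # sample covariance (mean known to be 0)

eigvals = np.linalg.eigvalsh(S)
print(eigvals[-1])   # largest eigenvalue: well above the true value 1
print(eigvals[0])    # smallest eigenvalue: numerically 0, so S is singular
```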
4 Recent Estimation Methods on Σ or Σ^{-1}
Reduce the number of nonzero estimates in Σ or Σ^{-1}:
- Σ: Bickel and Levina (2008), using thresholding.
- Σ^{-1}: Yuan and Lin (2007), l_1 penalty on Σ^{-1}; Friedman et al. (2008), Graphical Lasso; Meinshausen and Bühlmann (2006), reformulated as regression.
Shrinkage estimates of the covariance matrix:
- Ledoit and Wolf (2006), ρS + (1-ρ)μI.
- Won et al. (2009), control the condition number (largest eigenvalue / smallest eigenvalue).
5 Motivation
An estimate of Σ or Σ^{-1} needs to be positive definite. This mathematical restriction makes the covariance matrix estimation problem challenging.
Any positive definite Σ can be expressed as the matrix exponential of a real symmetric matrix A:
Σ = exp(A) = I + A + A²/2! + ...
Expressing the likelihood function in terms of A = log(Σ) removes the mathematical restriction.
Consider the spectral decomposition Σ = T D T' with D = diag(d_1, ..., d_p). Then A = T M T' with M = diag(log(d_1), ..., log(d_p)).
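This parametrization can be sketched in a few lines of NumPy/SciPy (the matrices here are arbitrary examples): the matrix logarithm is computed through the spectral decomposition, and exponentiating any symmetric matrix lands back in the positive definite cone.

```python
# Sketch of the parametrization Sigma = exp(A): for positive definite
# Sigma = T D T', the matrix logarithm is A = T diag(log d_i) T'.
# A is symmetric but otherwise unconstrained -- exp(A) is positive
# definite for *any* real symmetric A.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
Sigma = G @ G.T + 4 * np.eye(4)     # an arbitrary positive definite matrix

d, T = np.linalg.eigh(Sigma)        # spectral decomposition Sigma = T D T'
A = T @ np.diag(np.log(d)) @ T.T    # matrix logarithm A = log(Sigma)

print(np.allclose(expm(A), Sigma))  # round trip: exp(A) recovers Sigma

# Any symmetric matrix maps to a valid covariance matrix.
M = rng.standard_normal((4, 4)); M = (M + M.T) / 2
print(np.all(np.linalg.eigvalsh(expm(M)) > 0))
```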
6 Idea of the Proposed Method
Leonard and Hsu (1992) used this log-transformation to estimate Σ by approximating the likelihood via the Volterra integral equation. Their approximation is based on S being nonsingular, so it is not applicable when p > n.
We extend the likelihood approximation to the case of singular S.
Regularize the largest and smallest eigenvalues of Σ simultaneously.
An efficient iterative quadratic programming algorithm estimates A = log(Σ).
Call the resulting estimate Log-ME: Logarithm-transformed Matrix Estimate.
7 A Simple Example
Experiment: simulate x_i's from N(0, I), i = 1, ..., n, where n = 50. For each p varying from 5 to 100, consider the largest and smallest eigenvalues of the covariance matrix estimate.
For each p, repeat the experiment 100 times and compute the average of the largest eigenvalues and the average of the smallest eigenvalues for
- the sample covariance matrix;
- the Log-ME covariance matrix estimate.
8 [Figure] The averages of the largest and smallest eigenvalues of covariance matrix estimates over the dimension p. The true eigenvalues are all equal to 1.
9 The Transformed Log-Likelihood
In terms of the covariance matrix logarithm A, the negative log-likelihood function in (1) becomes
L_n(A) = tr(A) + tr[exp(-A) S].   (2)
The problem of estimating a positive definite matrix Σ now becomes a problem of estimating a real symmetric matrix A.
Because of the matrix exponential term exp(-A) S, estimating A by directly minimizing L_n(A) is nontrivial.
Our approach: approximate exp(-A) S using the Volterra integral equation (valid even when S is singular).
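The equivalence of (1) and (2) can be checked numerically (a small SciPy sketch with arbitrary example matrices): for A = log(Σ), tr(A) equals log|Σ| = -log|Σ^{-1}|, and exp(-A) equals Σ^{-1}.

```python
# Numerical check that the transformed likelihood (2) agrees with (1).
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(2)
p, n = 3, 20
X = rng.standard_normal((n, p))
S = X.T @ X / n                      # sample covariance

G = rng.standard_normal((p, p))
Sigma = G @ G.T + np.eye(p)          # an arbitrary positive definite Sigma
A = logm(Sigma).real                 # A = log(Sigma)

L_sigma = np.log(np.linalg.det(Sigma)) + np.trace(np.linalg.inv(Sigma) @ S)
L_A = np.trace(A) + np.trace(expm(-A) @ S)
print(L_sigma, L_A)                  # the two values coincide
```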
10 The Volterra Integral Equation
The Volterra integral equation (Bellman, 1970, page 175) is
exp(At) = exp(A_0 t) + ∫_0^t exp(A_0 (t-s)) (A - A_0) exp(As) ds.   (3)
Repeatedly applying (3) leads to
exp(At) = exp(A_0 t) + ∫_0^t exp(A_0 (t-s)) (A - A_0) exp(A_0 s) ds
+ ∫_0^t ∫_0^s exp(A_0 (t-s)) (A - A_0) exp(A_0 (s-u)) (A - A_0) exp(A_0 u) du ds
+ cubic and higher order terms,   (4)
where A_0 = log(Σ_0) and Σ_0 is an initial estimate of Σ.
The expression for exp(-A) is obtained by letting t = 1 in (4) and replacing A, A_0 in (4) with -A, -A_0.
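Identity (3) can be verified numerically at t = 1 with simple trapezoidal quadrature (a sketch; the symmetric matrices here are arbitrary examples, scaled small so quadrature error is negligible):

```python
# Sanity check of the Volterra identity (3) at t = 1:
# exp(A) = exp(A0) + int_0^1 exp(A0 (1-s)) (A - A0) exp(A s) ds.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
M0 = rng.standard_normal((3, 3)); A0 = 0.2 * (M0 + M0.T)   # initial A0
M1 = rng.standard_normal((3, 3)); A = A0 + 0.2 * (M1 + M1.T)

m = 2001
s, h = np.linspace(0.0, 1.0, m, retstep=True)
vals = np.array([expm(A0 * (1 - si)) @ (A - A0) @ expm(A * si) for si in s])
# Trapezoidal rule applied entrywise to the matrix-valued integrand.
integral = h * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)

print(np.allclose(expm(A0) + integral, expm(A), atol=1e-4))
```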
11 Approximation to the Log-Likelihood
The term tr[exp(-A) S] can be written as
tr[exp(-A) S] = tr(S Σ_0^{-1}) - ∫_0^1 tr[(A - A_0) Σ_0^{-s} S Σ_0^{s-1}] ds
+ ∫_0^1 ∫_0^s tr[(A - A_0) Σ_0^{u-s} (A - A_0) Σ_0^{-u} S Σ_0^{s-1}] du ds
+ cubic and higher order terms.   (5)
By leaving out the higher order terms in (5), we approximate L_n(A) by l_n(A):
l_n(A) = tr(S Σ_0^{-1}) + tr(A) - ∫_0^1 tr[(A - A_0) Σ_0^{-s} S Σ_0^{s-1}] ds
+ ∫_0^1 ∫_0^s tr[(A - A_0) Σ_0^{u-s} (A - A_0) Σ_0^{-u} S Σ_0^{s-1}] du ds.   (6)
12 Explicit Form of l_n(A)
The integrals in l_n(A) can be solved analytically through the spectral decomposition Σ_0 = T_0 D_0 T_0'.
Some notation: here D_0 = diag(d_1^(0), ..., d_p^(0)), with the d_i^(0)'s the eigenvalues of Σ_0, and T_0 = (t_1^(0), ..., t_p^(0)), with t_i^(0) the eigenvector corresponding to d_i^(0).
Let B = T_0'(A - A_0) T_0 = (b_ij)_{p×p} and S̃ = T_0' S T_0 = (s̃_ij)_{p×p}. Then l_n(A) can be written as a function of the b_ij:
l_n(A) = Σ_{i=1}^p (1/2) ξ_ii b_ii² + Σ_{i<j} ξ_ij b_ij² + 2 Σ_{i=1}^p Σ_{j≠i} τ_ij b_ii b_ij + Σ_{k=1}^p Σ_{i<j, i≠k, j≠k} η_kij b_ik b_kj
- [Σ_{i=1}^p β_ii b_ii + 2 Σ_{i<j} β_ij b_ij],   (7)
up to some constant. Getting B ⇒ getting A.
13 Some Details
For the linear term,
β_ii = s̃_ii/d_i^(0) - 1,
β_ij = s̃_ij (d_i^(0) - d_j^(0)) / [d_i^(0) d_j^(0) (log d_i^(0) - log d_j^(0))].
For the quadratic term,
ξ_ii = s̃_ii/d_i^(0),
ξ_ij = (s̃_ii/d_i^(0) - s̃_jj/d_j^(0)) / (log d_j^(0) - log d_i^(0)) + [(d_i^(0)/d_j^(0) - 1) s̃_ii/d_i^(0) + (d_j^(0)/d_i^(0) - 1) s̃_jj/d_j^(0)] / (log d_j^(0) - log d_i^(0))²,
τ_ij = [(1/d_j^(0) - 1/d_i^(0)) / (log d_j^(0) - log d_i^(0))² + (1/d_i^(0)) / (log d_j^(0) - log d_i^(0))] s̃_ij,
η_kij = [(1/d_i^(0) - 1/d_j^(0)) / (log(d_j^(0)/d_i^(0)) log(d_k^(0)/d_j^(0))) + (1/d_j^(0) - 1/d_i^(0)) / (log(d_i^(0)/d_j^(0)) log(d_k^(0)/d_i^(0))) + (2/d_k^(0) - 1/d_i^(0) - 1/d_j^(0)) / (log(d_k^(0)/d_i^(0)) log(d_k^(0)/d_j^(0)))] s̃_ij.
14 The Log-ME Method
We propose a regularized method to estimate Σ using the approximate log-likelihood function l_n(A).
Consider the penalty function ||A||_F² = tr(A²) = Σ_{i=1}^p (log(d_i))², where d_i is an eigenvalue of the covariance matrix Σ. If d_i goes to zero or diverges to infinity, the value of |log(d_i)| goes to infinity in both cases. Such a penalty can simultaneously regularize the largest and smallest eigenvalues of the covariance matrix estimate.
Estimate Σ, or equivalently A, by minimizing
l_{n,λ}(A) = l_n(A) + λ tr(A²),   (8)
where λ is a tuning parameter.
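A quick numeric check of the identity tr(A²) = Σ_i (log d_i)² (plain NumPy; the eigenvalues are chosen to show that very small and very large d_i are penalized equally):

```python
# The Frobenius penalty on A = log(Sigma) equals the sum of squared
# log-eigenvalues of Sigma, so d_i -> 0 and d_i -> infinity both
# drive the penalty to infinity.
import numpy as np

d = np.array([1e-3, 1.0, 1e3])                  # eigenvalues of Sigma
T, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((3, 3)))
A = T @ np.diag(np.log(d)) @ T.T                # A = log(Sigma)

print(np.trace(A @ A), np.sum(np.log(d) ** 2))  # identical values
# d = 1e-3 and d = 1e3 contribute equally: (log 1e-3)^2 = (log 1e3)^2.
```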
15 An Iterative Algorithm
The objective l_{n,λ}(B) depends on an initial estimate Σ_0, or equivalently A_0. We propose to use l_{n,λ}(B) iteratively, obtaining its minimizer B̂ at each step:
Algorithm:
- Step 1: Set an initial covariance matrix estimate Σ_0, a positive definite matrix (in practice, Σ_0 = S + εI).
- Step 2: Use the spectral decomposition Σ_0 = T_0 D_0 T_0', and set A_0 = log(Σ_0).
- Step 3: Compute B̂ by minimizing l_{n,λ} in (10). Then obtain Â = T_0 B̂ T_0' + A_0, and update the estimate of Σ by Σ̂ = exp(Â) = exp(T_0 B̂ T_0' + A_0).
- Step 4: Stop if ||Σ̂ - Σ_0||_F² is less than a pre-specified positive tolerance value. Otherwise, set Σ_0 = Σ̂ and go back to Step 2.
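The inner step above solves a quadratic program built from the coefficients of slide 13. As a simplified reference sketch (not the authors' algorithm), one can instead minimize the exact penalized objective (8) over symmetric A with a generic optimizer, which is feasible for small p and illustrates that the estimate exp(Â) is positive definite by construction:

```python
# Simplified sketch: minimize L_n(A) + lambda * tr(A^2) over symmetric A
# directly, parametrizing A by its upper triangle. This replaces the
# paper's iterative quadratic approximation with a generic optimizer.
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def log_me_direct(S, lam=0.5):
    p = S.shape[0]
    iu = np.triu_indices(p)

    def to_matrix(theta):                 # unpack upper triangle into A
        A = np.zeros((p, p))
        A[iu] = theta
        return A + np.triu(A, 1).T        # symmetrize

    def objective(theta):
        A = to_matrix(theta)
        return np.trace(A) + np.trace(expm(-A) @ S) + lam * np.trace(A @ A)

    theta0 = np.zeros(len(iu[0]))         # start from A = 0, i.e. Sigma = I
    res = minimize(objective, theta0, method="Nelder-Mead",
                   options={"maxiter": 10000, "xatol": 1e-7, "fatol": 1e-9})
    return expm(to_matrix(res.x))         # Sigma-hat = exp(A-hat), always PD

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 3))          # n = 40, p = 3, true Sigma = I
Sigma_hat = log_me_direct(X.T @ X / 40, lam=0.5)
print(np.linalg.eigvalsh(Sigma_hat))      # all eigenvalues strictly positive
```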
16 Simulation Study
Six different covariance models of Σ = (σ_ij)_{p×p} are used for comparison, including:
- Model 1: Homogeneous model with Σ = I.
- Model 2: MA(1) model with σ_ii = 1, σ_{i,i-1} = σ_{i-1,i} =
- Model 3: Circle model with σ_ii = 1, σ_{i,i-1} = σ_{i-1,i} = 0.3, σ_{1,p} = σ_{p,1} = 0.3.
Compare four estimation methods: the banding estimate (Bickel and Levina, 2008), the LW estimate (Ledoit and Wolf, 2006), the Glasso estimate (Yuan and Lin, 2007), and the CN estimate (Won et al., 2009).
Consider two loss functions to evaluate the performance of each method:
KL = -log|Σ̂^{-1}| + tr(Σ̂^{-1} Σ) - (-log|Σ^{-1}| + p),
Δ_1 = d̂_1/d̂_p - d_1/d_p,
where d_1 and d_p are the largest and smallest eigenvalues of Σ, and d̂_1 and d̂_p are their estimates.
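One reading of the two losses in code (a sketch; the assumption that Δ_1 compares the estimated and true condition numbers follows the slide's definition): KL is the Gaussian Kullback-Leibler loss of Σ̂ relative to Σ.

```python
# The two simulation loss functions: Gaussian KL loss and the
# condition-number difference Delta_1. Both are zero at the truth.
import numpy as np

def kl_loss(Sigma_hat, Sigma):
    p = Sigma.shape[0]
    P = np.linalg.inv(Sigma_hat)                  # Sigma-hat^{-1}
    return (-np.log(np.linalg.det(P)) + np.trace(P @ Sigma)
            - (-np.log(np.linalg.det(np.linalg.inv(Sigma))) + p))

def cond_loss(Sigma_hat, Sigma):
    d_hat = np.linalg.eigvalsh(Sigma_hat)         # sorted ascending
    d = np.linalg.eigvalsh(Sigma)
    return d_hat[-1] / d_hat[0] - d[-1] / d[0]    # condition numbers

Sigma = np.eye(3)
print(kl_loss(Sigma, Sigma), cond_loss(Sigma, Sigma))  # both zero at truth
```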
17 Simulation Results
[Table] Averages and standard errors of the KL and Δ_1 losses from 100 runs in the case of n = 50, p = 50, for each model under the Log-ME, Banding, LW, Glasso, and CN methods. (The table values are not recoverable from the transcription.)
Note: A value marked with * is affected by matrix singularity.
18 Portfolio Optimization of Stock Data
Apply the Log-ME method in an application of portfolio optimization.
In mean-variance optimization, the risk of a portfolio w = (w_1, ..., w_p)' is measured by the standard deviation (w' Σ w)^{1/2}, where w_i ≥ 0 and Σ_{i=1}^p w_i = 1.
The estimated minimum variance portfolio optimization problem is
min_w w' Σ̂ w   s.t. Σ_{i=1}^p w_i = 1,   (9)
where Σ̂ is an estimate of the true covariance matrix Σ.
An accurate covariance matrix estimate Σ̂ can lead to a better portfolio strategy.
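With only the sum-to-one constraint (ignoring nonnegativity), the minimum variance problem has the well-known closed form w = Σ̂^{-1}1 / (1' Σ̂^{-1} 1), sketched below with an arbitrary 2-asset example:

```python
# Closed-form minimum variance portfolio under the sum-to-one
# constraint only: w = Sigma-hat^{-1} 1 / (1' Sigma-hat^{-1} 1).
import numpy as np

def min_variance_portfolio(Sigma_hat):
    ones = np.ones(Sigma_hat.shape[0])
    w = np.linalg.solve(Sigma_hat, ones)  # Sigma-hat^{-1} 1
    return w / w.sum()                    # normalize to sum to 1

Sigma_hat = np.array([[0.04, 0.01],       # an arbitrary 2-asset example
                      [0.01, 0.09]])
w = min_variance_portfolio(Sigma_hat)
print(w, w @ Sigma_hat @ w)               # weights and portfolio variance
```

The resulting portfolio variance is below that of either single asset, which is the point of diversification.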
19 The Setting-up
Consider the weekly returns of the p = 30 components of the Dow Jones Industrial Index from January 8th, 2007 to June 28th.
Use the first n = 50 observations as the training set, the next 50 observations as the validation set, and the remaining 83 observations as the test set.
Let X_ts be the test set and S_ts be the sample covariance matrix of X_ts. The performance of a portfolio w is measured by the realized return and the realized risk:
R(w) = Σ_{x ∈ X_ts} w' x,   σ(w) = (w' S_ts w)^{1/2}.
The optimal portfolio ŵ is computed with Σ̂ estimated by the Log-ME method, the CN method (Won et al., 2009), and S, separately.
20 The Comparison Results
Table 1. The comparison of the realized return R(ŵ) and the realized risk σ(ŵ) for the Log-ME, CN, and S estimates. (The table values are not recoverable from the transcription.)
The Log-ME method produced a portfolio with a larger realized return but smaller realized risk.
21 Comparison in Different Periods
Compare the portfolio strategies based on the various covariance matrix estimation methods over different periods.
Given a starting week, use the first 50 observations as the training set, the next 50 observations as a validation set, and the third 50 observations as a test set.
Shift the starting week ahead by one each time, and evaluate the portfolio strategy over 33 different consecutive test periods.
The optimal portfolio ŵ is computed with Σ̂ estimated by the Log-ME method, the CN method, and the sample covariance matrix, separately.
22 The Realized Returns
[Figure: realized returns over the 33 test periods for Log-ME, CN, and S.]
The proposed Log-ME covariance matrix estimate can lead to higher returns.
23 The Realized Risks
[Figure: realized risks over the 33 test periods for Log-ME, CN, and S.]
The Log-ME method has relatively higher risks than the CN method, but it provides much larger realized returns than the CN method.
24 Summary
- Estimate the covariance matrix through its matrix logarithm based on a penalized likelihood function.
- The Log-ME method regularizes the largest and smallest eigenvalues simultaneously by imposing a convex penalty.
- Other penalty functions can be considered to improve the estimation from different perspectives.
- Extend to Bayesian covariance matrix estimation for the large-p-small-n problem.
25 Thank you!
26 The Log-ME Method (Cont'd)
Note that tr(A²) = tr((T_0 B T_0' + A_0)²) is equivalent to tr(B²) + 2 tr(BΓ) up to some constant, where Γ = (γ_ij)_{p×p} = T_0' A_0 T_0.
In terms of B, the function l_{n,λ}(A) becomes
l_{n,λ}(B) = Σ_{i=1}^p (1/2) ξ_ii b_ii² + Σ_{i<j} ξ_ij b_ij² + 2 Σ_{i=1}^p Σ_{j≠i} τ_ij b_ii b_ij + Σ_{k=1}^p Σ_{i<j, i≠k, j≠k} η_kij b_ik b_kj
- (Σ_{i=1}^p β_ii b_ii + 2 Σ_{i<j} β_ij b_ij)
+ λ [Σ_{i=1}^p b_ii² + 2 Σ_{i<j} b_ij² + 2 (Σ_{i=1}^p γ_ii b_ii + 2 Σ_{i<j} γ_ij b_ij)].   (10)
The l_{n,λ}(B) is still a quadratic function of B = (b_ij).
27 The CN Method
The CN method estimates Σ with a constraint on its condition number (Won et al., 2009). They consider Σ̂ = T diag(û_1^{-1}, ..., û_p^{-1}) T', where T is from the spectral decomposition S = T diag(l_1, ..., l_p) T'.
The û_1, ..., û_p are obtained by solving the constrained optimization
min_{u, u_1, ..., u_p} Σ_{i=1}^p (l_i u_i - log u_i)   s.t. u ≤ u_i ≤ κ_max u, i = 1, ..., p,
where κ_max is a tuning parameter.
The tuning parameter is chosen using an independent validation set.
More informationSTA 414/2104, Spring 2014, Practice Problem Set #1
STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,
More informationMatrix Algebra, part 2
Matrix Algebra, part 2 Ming-Ching Luoh 2005.9.12 1 / 38 Diagonalization and Spectral Decomposition of a Matrix Optimization 2 / 38 Diagonalization and Spectral Decomposition of a Matrix Also called Eigenvalues
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationBayesian estimation of a covariance matrix with flexible prior specification
Ann Inst Stat Math 2012 64:319 342 DOI 10.1007/s10463-010-0314-5 Bayesian estimation of a covariance matrix with flexible prior specification Chih-Wen Hsu Marick S. Sinay John S. J. Hsu Received: 14 January
More informationA General Framework for High-Dimensional Inference and Multiple Testing
A General Framework for High-Dimensional Inference and Multiple Testing Yang Ning Department of Statistical Science Joint work with Han Liu 1 Overview Goal: Control false scientific discoveries in high-dimensional
More informationGroup exponential penalties for bi-level variable selection
for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator
More informationarxiv: v2 [math.st] 2 Jul 2017
A Relaxed Approach to Estimating Large Portfolios Mehmet Caner Esra Ulasan Laurent Callot A.Özlem Önder July 4, 2017 arxiv:1611.07347v2 [math.st] 2 Jul 2017 Abstract This paper considers three aspects
More informationAn algorithm for the multivariate group lasso with covariance estimation
An algorithm for the multivariate group lasso with covariance estimation arxiv:1512.05153v1 [stat.co] 16 Dec 2015 Ines Wilms and Christophe Croux Leuven Statistics Research Centre, KU Leuven, Belgium Abstract
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationSTAT 730 Chapter 4: Estimation
STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum
More informationSTATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify
More informationFrontiers in Forecasting, Minneapolis February 21-23, Sparse VAR-Models. Christophe Croux. EDHEC Business School (France)
Frontiers in Forecasting, Minneapolis February 21-23, 2018 Sparse VAR-Models Christophe Croux EDHEC Business School (France) Joint Work with Ines Wilms (Cornell University), Luca Barbaglia (KU leuven),
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationSupplement to A Generalized Least Squares Matrix Decomposition. 1 GPMF & Smoothness: Ω-norm Penalty & Functional Data
Supplement to A Generalized Least Squares Matrix Decomposition Genevera I. Allen 1, Logan Grosenic 2, & Jonathan Taylor 3 1 Department of Statistics and Electrical and Computer Engineering, Rice University
More informationSparse PCA with applications in finance
Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More information