Geodesic Convexity and Regularized Scatter Estimation

1 Geodesic Convexity and Regularized Scatter Estimation
Lutz Duembgen (Bern), David Tyler (Rutgers)
Klaus Nordhausen (Turku/Vienna), Heike Schuhmacher (Bern)
Markus Pauly (Ulm), Thomas Schweizer (Bern)
Düsseldorf, July 22, 2017

2 I. Geometry of Scatter Matrices
II. Geodesic Convexity and Coercivity
III. M-Functionals of Scatter
IV. Regularization

3 I. Geometry of Scatter Matrices
$\mathbb{R}^{q\times q}_{\mathrm{sym}} := \{ A \in \mathbb{R}^{q\times q} : A = A^\top \}$
$\mathbb{R}^{q\times q}_{\mathrm{sym},+} := \{ A \in \mathbb{R}^{q\times q}_{\mathrm{sym}} : A \text{ positive definite} \}$ (open convex cone in $\mathbb{R}^{q\times q}_{\mathrm{sym}}$)
$\langle A, B \rangle := \mathrm{tr}(AB) = \sum_{i,j} A_{ij} B_{ij}$, $\quad \|A\|_F := \langle A, A \rangle^{1/2}$

4 [Figure: axes x, y, z]

5 $A = \begin{bmatrix} z + x & y \\ y & z - x \end{bmatrix} = \begin{bmatrix} x & y \\ y & -x \end{bmatrix} + z\,I_2$
$\|A\|_F^2 = 2\,(x^2 + y^2 + z^2)$
$A$ positive definite $\iff z > \sqrt{x^2 + y^2}$

6 $\mu \in \mathbb{R}^q$, $\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}$, $\hat\Sigma$ = sample covariance matrix of $X_1, X_2, \ldots, X_n$ i.i.d. $\sim N_q(\mu, \Sigma)$.

7 m = 50 samples of size n = 100:

8 m = 50 samples of size n = 500:

9 Suitable Geometry
Write $\hat\Sigma = \Sigma^{1/2}\, W\, \Sigma^{1/2}$, where $W$ has a universal distribution on symmetric matrices depending only on $(q, n)$, and $W \to_p I_q$ as $n \to \infty$.
Local distance measure at $\Sigma$:
$d_\Sigma(\Sigma, \hat\Sigma) := \|W - I_q\|_F$, $\quad d_\Sigma(\Sigma_0, \Sigma_1) := \bigl\|\Sigma^{-1/2}(\Sigma_0 - \Sigma_1)\Sigma^{-1/2}\bigr\|_F$

10 Global distance measure (geodesic distance):
$D_g(\Sigma_0, \Sigma_1) := \min \int_0^1 d_{\Sigma_t}\bigl(\Sigma_t, \Sigma_{t+dt}\bigr) = \min \int_0^1 \bigl\|\Sigma_t^{-1/2}\,\dot\Sigma_t\,\Sigma_t^{-1/2}\bigr\|_F\, dt,$
the minimum being over all smooth paths $[0,1] \ni t \mapsto \Sigma_t$ connecting $\Sigma_0$ and $\Sigma_1$.

11 Explicit solution:
$A = \log\bigl(\Sigma_0^{-1/2}\,\Sigma_1\,\Sigma_0^{-1/2}\bigr)$, $\quad \Sigma_t = \Sigma_0^{1/2}\exp(tA)\,\Sigma_0^{1/2}$, $\quad D_g(\Sigma_0, \Sigma_1) = \|A\|_F$
Note: $\exp(A) = \sum_{k=0}^\infty A^k/k!$, $\quad \exp\bigl(U\,\mathrm{diag}(\lambda)\,U^\top\bigr) = U\,\mathrm{diag}(e^\lambda)\,U^\top$, $\quad \log\bigl(U\,\mathrm{diag}(\lambda)\,U^\top\bigr) = U\,\mathrm{diag}(\log\lambda)\,U^\top$
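These formulae translate directly into a few lines of numerical linear algebra. A minimal NumPy sketch (the function names are mine, not the authors' code), using the spectral-decomposition identities from the note above:

```python
import numpy as np

def sym_fun(S, fun):
    # Apply a scalar function to a symmetric matrix via its spectral decomposition,
    # i.e. fun(U diag(lam) U') = U diag(fun(lam)) U', as in the note above.
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T

def geodesic_distance(S0, S1):
    # D_g(S0, S1) = || log(S0^{-1/2} S1 S0^{-1/2}) ||_F
    S0_isqrt = sym_fun(S0, lambda l: l ** -0.5)
    A = sym_fun(S0_isqrt @ S1 @ S0_isqrt, np.log)
    return np.linalg.norm(A, "fro")

def geodesic_point(S0, S1, t):
    # Sigma_t = S0^{1/2} exp(t A) S0^{1/2}, the point at "time" t on the geodesic.
    S0_sqrt = sym_fun(S0, np.sqrt)
    S0_isqrt = sym_fun(S0, lambda l: l ** -0.5)
    A = sym_fun(S0_isqrt @ S1 @ S0_isqrt, np.log)
    return S0_sqrt @ sym_fun(A, lambda a: np.exp(t * a)) @ S0_sqrt

# Sanity checks: geodesic_distance(S, S) is 0, geodesic_point(S0, S1, 1.0)
# recovers S1 up to rounding error.
```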

12–13 [Figures]

14 Local and global parametrizations of $\mathbb{R}^{q\times q}_{\mathrm{sym},+}$:
$\Sigma = BB^\top$ with nonsingular $B \in \mathbb{R}^{q\times q}$
$\mathbb{R}^{q\times q}_{\mathrm{sym},+} = \bigl\{ B\exp(A)B^\top : A \in \mathbb{R}^{q\times q}_{\mathrm{sym}} \bigr\}$
$\bigl\{ \Gamma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+} : \det(\Gamma) = \det(\Sigma) \bigr\} = \bigl\{ B\exp(A)B^\top : A \in \mathbb{R}^{q\times q}_{\mathrm{sym}},\ \mathrm{tr}(A) = 0 \bigr\}.$

15 Note that for $q \ge 2$, the map
$(\mathbb{R}^{q\times q}_{\mathrm{sym},+}, D_g) \ni \Sigma \mapsto \log(\Sigma) \in (\mathbb{R}^{q\times q}_{\mathrm{sym}}, \|\cdot\|_F)$
is not an isometry. [Figure: axes x, y]

16 II. Geodesic Convexity and Coercivity
Geodesic Convexity: A function $f : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ is (strictly) geodesically convex if for all nonsingular $B \in \mathbb{R}^{q\times q}$ and nonzero $A \in \mathbb{R}^{q\times q}_{\mathrm{sym}}$,
$f\bigl(B\exp(tA)B^\top\bigr)$ is (strictly) convex in $t \in \mathbb{R}$.
Equivalently: for all nonsingular $B \in \mathbb{R}^{q\times q}$,
$f\bigl(B\,\mathrm{diag}(e^x)B^\top\bigr)$ is (strictly) convex in $x \in \mathbb{R}^q$.
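A quick numerical illustration of this definition (a NumPy sketch, assuming a user-supplied function f; it checks convexity along one geodesic and is of course not a proof):

```python
import numpy as np

def along_geodesic(f, B, A, ts):
    # Evaluate t -> f(B exp(tA) B') on a grid of t values.  By the definition on
    # this slide, f is g-convex iff every such curve is convex in t.
    lam, U = np.linalg.eigh(A)
    return np.array([f(B @ ((U * np.exp(t * lam)) @ U.T) @ B.T) for t in ts])

# Illustration: log of the largest eigenvalue (g-convex, see a later slide):
# rng = np.random.default_rng(0); q = 4
# B = rng.standard_normal((q, q))
# A = rng.standard_normal((q, q)); A = (A + A.T) / 2
# y = along_geodesic(lambda S: np.log(np.linalg.eigvalsh(S)[-1]), B, A, np.linspace(-2, 2, 41))
# np.all(np.diff(y, 2) >= -1e-9)   # nonnegative second differences on the grid
```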

17 Example: The function $f(\Sigma) := \log\det(\Sigma)$ is geodesically linear:
$\log\det\bigl(B\exp(A)B^\top\bigr) = \log\det(BB^\top) + \mathrm{tr}(A)$
(since $\det(\exp(A)) = e^{\mathrm{tr}(A)}$).

18 Verifying g-convexity for smooth functions (V2)
For any nonsingular $B \in \mathbb{R}^{q\times q}$ and $x \in \mathbb{R}^q$,
$f\bigl(B\,\mathrm{diag}(e^x)B^\top\bigr) = f(BB^\top) + g_B^\top x + \tfrac12\,x^\top H_B\,x + o(\|x\|^2)$ as $x \to 0$.
$f$ is g-convex iff $H_B \succeq 0$ for all $B$.
$f$ is strictly g-convex iff $H_B \succ 0$ for all $B$.

19 Example: For nonzero $v \in \mathbb{R}^q$, $f(\Sigma) := \log\bigl(v^\top\Sigma v\bigr)$ is g-convex.
For nonsingular $B \in \mathbb{R}^{q\times q}$ and $w := B^\top v$,
$f\bigl(B\,\mathrm{diag}(e^x)B^\top\bigr) = \log\bigl(w^\top\mathrm{diag}(e^x)w\bigr) = f(BB^\top) + g_B^\top x + \tfrac12\,x^\top H_B\,x + o(\|x\|^2)$
with
$g_B := \bigl(w_i^2/\|w\|^2\bigr)_{i=1}^q$, $\quad H_B := \mathrm{diag}(g_B) - g_B g_B^\top.$
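To see where $g_B$ and $H_B$ come from (a short derivation, not on the slide), write $p_i := w_i^2/\|w\|^2$; then
$$\log\bigl(w^\top\mathrm{diag}(e^x)w\bigr) = \log\|w\|^2 + \log\Bigl(\sum_{i=1}^q p_i\, e^{x_i}\Bigr),$$
a log-sum-exp function of $x$ whose gradient at $x = 0$ is $p = g_B$ and whose Hessian at $x = 0$ is $\mathrm{diag}(p) - pp^\top = H_B$; convexity of log-sum-exp gives $H_B \succeq 0$.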

20 Remarks
- $\Sigma \mapsto f(\Sigma)$ g-convex $\iff$ $\Sigma \mapsto f(\Sigma^{-1})$ g-convex.
- Sums and pointwise suprema of g-convex functions are g-convex.
- Both $\log\lambda_{\max}(\Sigma)$ and $\log\lambda_{\max}(\Sigma^{-1}) = -\log\lambda_{\min}(\Sigma)$ are g-convex.
- $f(\Sigma)$ g-convex and $h : \mathbb{R} \to \mathbb{R}$ convex and increasing $\implies$ $h(f(\Sigma))$ is g-convex.
- A local minimizer of a g-convex function is also a global minimizer.
- The only g-affine functions are $f(\Sigma) = c_1 + c_2\log\det(\Sigma)$ with $c_1, c_2 \in \mathbb{R}$.

21 Geodesic Coercivity
Let $f : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ be g-convex / strictly g-convex. Then
$\arg\min_\Sigma f(\Sigma)$ is compact / a singleton
iff $f$ is g-coercive, i.e.
$f(\Sigma) \to \infty$ as $\|\log(\Sigma)\|_F \to \infty$.
Criterion: If $f$ is differentiable, it is g-coercive iff
$\lim_{t\to\infty} \tfrac{d}{dt} f\bigl(\exp(tA)\bigr) > 0$
for any nonzero $A \in \mathbb{R}^{q\times q}_{\mathrm{sym}}$.

22 III. M-Functionals of Scatter
True/empirical distribution: $P$ on $\mathbb{R}^q$ with center $0 \in \mathbb{R}^q$.
Working model/caricature for $P$:
$f_\Sigma(x) = C\,\det(\Sigma)^{-1/2}\exp\Bigl(-\frac{\rho(x^\top\Sigma^{-1}x)}{2}\Bigr)$
with $\rho(s)$ increasing in $s > 0$ and $s\rho'(s)$ increasing in $s > 0$; in other words, $\rho(e^x)$ is increasing and convex in $x \in \mathbb{R}$.

23 Target function (log-likelihood times $-2/n$):
$L(\Sigma, P) := -2\int \log\bigl[f_\Sigma/f_{I}\bigr]\,dP = \int\bigl[\rho(x^\top\Sigma^{-1}x) - \rho(x^\top x)\bigr]\,P(dx) + \log\det(\Sigma)$
M-functional of scatter: $\Sigma(P) := \arg\min_{\Sigma\in\mathbb{R}^{q\times q}_{\mathrm{sym},+}} L(\Sigma, P)$
M-estimator of scatter: $\hat P$ = empirical distribution of $X_1, X_2, \ldots, X_n$ i.i.d. $\sim P$; $\Sigma(\hat P)$ estimates $\Sigma(P)$.
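For the empirical distribution $\hat P$ the target function is a plain average; a minimal NumPy sketch (the function name and interface are mine, for illustration only):

```python
import numpy as np

def scatter_loss(Sigma, X, rho):
    # Empirical target function
    #   L(Sigma, P_hat) = mean_i [ rho(x_i' Sigma^{-1} x_i) - rho(x_i' x_i) ] + log det(Sigma)
    # X: (n, q) data matrix with center 0; rho: vectorized scalar function.
    Sinv = np.linalg.inv(Sigma)
    q_sigma = np.einsum("ij,jk,ik->i", X, Sinv, X)   # x_i' Sigma^{-1} x_i
    q_ident = np.einsum("ij,ij->i", X, X)            # x_i' x_i
    return np.mean(rho(q_sigma) - rho(q_ident)) + np.linalg.slogdet(Sigma)[1]

# With rho(s) = s the minimizer is the matrix of second moments about 0
# (= Var(P_hat) for centered data), cf. the next slide:
# X = np.random.default_rng(1).standard_normal((200, 3))
# scatter_loss(X.T @ X / len(X), X, lambda s: s)
```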

24 $L(\Sigma, P) = \int\bigl[\rho(x^\top\Sigma^{-1}x) - \rho(x^\top x)\bigr]\,P(dx) + \log\det(\Sigma)$, $\quad \Sigma(P) = \arg\min_{\Sigma\in\mathbb{R}^{q\times q}_{\mathrm{sym},+}} L(\Sigma, P)$
- $\rho(s) = s$: $\Sigma(P) = \mathrm{Var}(P)$
- $s\rho'(s)$ bounded in $s \ge 0$: $\Sigma(\cdot)$ is moderately robust
- $P$ elliptically symmetric with center $0$ and scatter $\Sigma$: $\Sigma(P) = c\,\Sigma$

25 Good news: In general, $L(\cdot, P)$ is geodesically convex. Under mild regularity conditions on $P$ and $\rho$, $L(\cdot, P)$ is geodesically strictly convex and coercive.

26 Taylor expansion:
$L\bigl(B\,\mathrm{diag}(e^x)B^\top, P\bigr) \approx L(BB^\top, P) + g_B^\top x + \tfrac12\,x^\top H_B\,x$
with
$g_B := 1_q - \psi_B$, $\quad \psi_B := \int \rho'(\|x\|^2)\,\bigl(x_i^2\bigr)_{i=1}^q\; P_B(dx)$,
$H_B := \mathrm{diag}(\psi_B) + \int \rho''(\|x\|^2)\,\bigl(x_i^2\,x_j^2\bigr)_{i,j=1}^q\; P_B(dx)$,
$P_B := \mathcal{L}(B^{-1}X)$, $X \sim P$.
Existence, continuity and weak differentiability of $\Sigma(\cdot)$ ...
Fast algorithms for computation of $\Sigma(\hat P)$ via a partial Newton method ...
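Setting the gradient of $L(\cdot, \hat P)$ to zero gives the estimating equation $\Sigma = \int \rho'(x^\top\Sigma^{-1}x)\,xx^\top\,\hat P(dx)$. The sketch below is the classical fixed-point iteration for that equation, not the partial Newton method referred to on the slide; it is included only to make the computation concrete:

```python
import numpy as np

def m_scatter_fixed_point(X, rho_prime, n_iter=500, tol=1e-9):
    # Classical fixed-point iteration
    #   Sigma <- (1/n) sum_i rho'(x_i' Sigma^{-1} x_i) x_i x_i'
    # for the estimating equation of Sigma(P_hat).  This is the textbook scheme,
    # NOT the partial Newton method of the slide, and it converges only under the
    # usual existence/uniqueness conditions.
    n, q = X.shape
    Sigma = X.T @ X / n + 1e-8 * np.eye(q)      # crude, strictly p.d. starting value
    for _ in range(n_iter):
        w = rho_prime(np.einsum("ij,jk,ik->i", X, np.linalg.inv(Sigma), X))
        Sigma_new = (X * w[:, None]).T @ X / n
        if np.linalg.norm(Sigma_new - Sigma, "fro") <= tol * np.linalg.norm(Sigma, "fro"):
            return Sigma_new
        Sigma = Sigma_new
    return Sigma

# For Tyler's functional, rho(s) = q log(s) and rho'(s) = q/s; its solution is only
# defined up to scale, so one usually renormalizes (e.g. to trace q) after each step.
```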

27 Symmetrization
Replace $\Sigma(P)$ with $\Sigma^s(P) := \Sigma(P \ominus P)$, where $P \ominus P := \mathcal{L}(X - X')$ with $X, X'$ i.i.d. $\sim P$.
The estimator uses
$\hat P \ominus \hat P := \binom{n}{2}^{-1}\sum_{1\le i<j\le n}\delta_{X_j - X_i}$
or
$\hat P \ominus \hat P := \frac{1}{nk}\sum_{i=1}^{n}\sum_{j=i+1}^{i+k}\delta_{X_j - X_i}$
with $1 \le k < n$.
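A sketch of how the two symmetrized empirical distributions can be materialized (NumPy; the function names are mine, and the modulo-$n$ wrap-around in the incomplete version is an assumption, the slide only shows the double sum):

```python
import numpy as np

def pairwise_differences(X):
    # All differences X_j - X_i for 1 <= i < j <= n  (complete version, n*(n-1)/2 rows).
    i, j = np.triu_indices(X.shape[0], k=1)
    return X[j] - X[i]

def incomplete_differences(X, k):
    # Differences X_{i+j} - X_i for i = 1..n, j = 1..k  (n*k rows).  The wrap-around
    # of the index i+j modulo n is my assumption.
    return np.vstack([np.roll(X, -j, axis=0) - X for j in range(1, k + 1)])

# The symmetrized M-estimator applies the scatter functional to these differences,
# e.g. Sigma_s_hat = m_scatter_fixed_point(pairwise_differences(X), rho_prime).
```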

28 No need to estimate the center of $P$.
- $P$ elliptically symmetric around $\mu$ with scatter $\Sigma$: $\Sigma^s(P) = c\,\Sigma$
- Block independence property: $P = \mathcal{L}\Bigl(B\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Bigr)$ with independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma^s(P) = B\begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix}B^\top.$

29 IV. Regularization
In high-dimensional settings replace $\Sigma(P)$ with
$\arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{\mathrm{sym},+}} \bigl(L(\Sigma, P) + \alpha\,\mathrm{Pen}(\Sigma)\bigr), \quad \alpha > 0,$
where $\mathrm{Pen} : \mathbb{R}^{q\times q}_{\mathrm{sym},+} \to \mathbb{R}$ satisfies
$\mathrm{Pen}(c\Sigma) = \mathrm{Pen}(\Sigma)$ (scale invariance) and
$\mathrm{Pen}(\Sigma) \to \infty$ as $\lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma) \to \infty$.

30 Examples of penalties (with $\lambda_1, \ldots, \lambda_q$ the eigenvalues of $\Sigma$ and $\lambda_{\min}$ the smallest of them):
$\mathrm{Pen}_0(\Sigma) = \log\mathrm{tr}(\Sigma) + \log\mathrm{tr}(\Sigma^{-1}) = \log\Bigl(\sum_{i=1}^q \lambda_i\Bigr) + \log\Bigl(\sum_{i=1}^q \lambda_i^{-1}\Bigr)$
$\mathrm{Pen}_1(\Sigma) = q^{-1}\log\det(\Sigma) + \log\mathrm{tr}(\Sigma^{-1}) = q^{-1}\sum_{i=1}^q \log\lambda_i + \log\Bigl(\sum_{i=1}^q \lambda_i^{-1}\Bigr)$
$\mathrm{Pen}_2(\Sigma) = \log\det(\Sigma) + q\log\lambda_{\max}(\Sigma^{-1}) = \sum_{i=1}^q \log(\lambda_i/\lambda_{\min})$
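All three penalties are functions of the eigenvalues alone; a small NumPy sketch (Pen_2 is coded in the eigenvalue form $\sum_i \log(\lambda_i/\lambda_{\min})$ given above):

```python
import numpy as np

def penalties(Sigma):
    # Pen_0, Pen_1, Pen_2 of the slide, computed from the eigenvalues of Sigma.
    lam = np.linalg.eigvalsh(Sigma)
    pen0 = np.log(lam.sum()) + np.log((1.0 / lam).sum())
    pen1 = np.mean(np.log(lam)) + np.log((1.0 / lam).sum())
    pen2 = np.sum(np.log(lam / lam.min()))
    return pen0, pen1, pen2

# All three are unchanged under Sigma -> c * Sigma and grow without bound as the
# condition number lambda_max / lambda_min does; they are minimal at multiples of I_q.
```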

31 These penalties $\mathrm{Pen}_j(\Sigma)$ are
- scale invariant,
- g-convex,
- g-coercive on $\{\Sigma : \det(\Sigma) = c\}$,
- strictly g-convex on $\{\Sigma : \det(\Sigma) = c\}$ ($\mathrm{Pen}_0$, $\mathrm{Pen}_1$),
with $\arg\min_\Sigma \mathrm{Pen}_j(\Sigma) = \{c\,I_q : c > 0\}$.

32 Example: Regularized version of Tyler's (1987) M-functional:
$f(\Sigma) = L(\Sigma, P) + \alpha\,\mathrm{Pen}(\Sigma)$
with $\rho(s) = q\log s$ and
$\mathrm{Pen}(\Sigma) = \begin{cases} \mathrm{Pen}_1(\Sigma) & \text{(Case 1)} \\ h(\mathrm{Pen}_1(\Sigma)) & \text{(Case 2)} \end{cases}$
On $\{\Sigma : \det(\Sigma) = 1\}$, $f$ is
- strictly g-convex,
- g-coercive in Case 1 if $P(\mathbb{V}) < \bigl(1 + \tfrac{\alpha}{q}\bigr)\tfrac{\dim(\mathbb{V})}{q}$ whenever $1 \le \dim(\mathbb{V}) < q$ (for linear subspaces $\mathbb{V} \subset \mathbb{R}^q$),
- g-coercive in Case 2 if $\lim_{s\to\infty} h(s)/s = \infty$.
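For concreteness, the objective of this example can be written down directly; a self-contained NumPy sketch (evaluation only, with no claim about how the authors minimize it over the slice $\{\det(\Sigma) = 1\}$):

```python
import numpy as np

def regularized_tyler_objective(Sigma, X, alpha, h=lambda s: s):
    # f(Sigma) = L(Sigma, P_hat) + alpha * h(Pen_1(Sigma)) with rho(s) = q log s.
    # h(s) = s is Case 1, a superlinear h gives Case 2.
    n, q = X.shape
    Sinv = np.linalg.inv(Sigma)
    qf = np.einsum("ij,jk,ik->i", X, Sinv, X)
    qf0 = np.einsum("ij,ij->i", X, X)
    loss = np.mean(q * (np.log(qf) - np.log(qf0))) + np.linalg.slogdet(Sigma)[1]
    lam = np.linalg.eigvalsh(Sigma)
    pen1 = np.mean(np.log(lam)) + np.log((1.0 / lam).sum())
    return loss + alpha * h(pen1)
```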

33 Numerical experiment
For $q = 50$ and $n = 30$ consider $X_1, X_2, \ldots, X_n$ i.i.d. $\sim \mathrm{Elliptic}_q(0, \Sigma^*)$ with
$\Sigma^* = \mathrm{diag}(10, 5, 3, 2, 1, \ldots, 1)^2.$
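One way to generate such data (the slide only says $\mathrm{Elliptic}_q(0, \Sigma^*)$; the multivariate t below is just one possible choice of elliptical distribution, and the function is mine):

```python
import numpy as np

def simulate_elliptical_t(n, Sigma, df=3.0, seed=None):
    # Elliptical sample with center 0 and scatter proportional to Sigma, here a
    # multivariate t with df degrees of freedom.
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, Sigma.shape[0])) @ np.linalg.cholesky(Sigma).T
    return Z / np.sqrt(rng.chisquare(df, size=n) / df)[:, None]

# Setup of the experiment: q = 50, n = 30,
# Sigma_star = np.diag([10, 5, 3, 2] + [1] * 46) ** 2
# X = simulate_elliptical_t(30, Sigma_star, seed=0)
```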

34 Compute
$\hat\Sigma_\alpha := \arg\min_\Sigma \bigl(L(\Sigma, \hat P) + \alpha\,h(\mathrm{Pen}_1(\Sigma))\bigr)$
and $\hat\Sigma := \hat\Sigma_{\hat\alpha}$ with $\hat\alpha := \arg\min_{\alpha \in 2^{\mathbb{Z}}} \mathrm{CV}(\alpha)$, where
$\mathrm{CV}(\alpha) := \sum_{i=1}^n \bigl\{\rho\bigl(X_i^\top \hat\Sigma_{\alpha,-i}^{-1} X_i\bigr) + \log\det\bigl(\hat\Sigma_{\alpha,-i}\bigr)\bigr\}$
and $\hat\Sigma_{\alpha,-i}$ is computed from the sample without $X_i$ (leave-one-out).
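A sketch of the cross-validation search over $\alpha \in 2^{\mathbb{Z}}$; `fit_regularized_scatter` is a hypothetical placeholder for the regularized estimator, not a function from the talk or from the fastM package:

```python
import numpy as np

def cv_score(X, alpha, fit_regularized_scatter, rho):
    # CV(alpha) = sum_i { rho(x_i' S_{alpha,-i}^{-1} x_i) + log det S_{alpha,-i} },
    # where S_{alpha,-i} is fitted without observation i (leave-one-out).
    total = 0.0
    for i in range(X.shape[0]):
        S = fit_regularized_scatter(np.delete(X, i, axis=0), alpha)
        total += rho(X[i] @ np.linalg.inv(S) @ X[i]) + np.linalg.slogdet(S)[1]
    return total

def select_alpha(X, fit_regularized_scatter, rho, k_grid=range(-10, 11)):
    # alpha_hat = argmin over the grid alpha = 2^k of CV(alpha), as in the experiment.
    scores = {k: cv_score(X, 2.0 ** k, fit_regularized_scatter, rho) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return 2.0 ** best_k, scores
```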

35 $\log\lambda(\Sigma^*)$ and $\log\lambda(\hat\Sigma)$ $\;(\hat\alpha = 2^{7})$

36 Cross-validation: $\mathrm{CV}(2^k)$ versus $k$

37 First eigenvectors: $\hat u_1^\top u_1$ versus $k$

38 Eigenvalues: $\log\lambda(\hat\Sigma) - \log\lambda(\Sigma^*)$ versus $k$

39 Shape matrices: $D_g(\hat\Sigma, \Sigma^*)$ versus $k$

40 Symmetrization and orthogonally invariant penalties
$f(\Sigma) = L(\Sigma, P \ominus P) + \alpha\,\mathrm{Pen}(\Sigma)$, $\quad \mathrm{Pen}(U^\top\Sigma U) = \mathrm{Pen}(\Sigma)$ for orthogonal $U \in \mathbb{R}^{q\times q}$
Restricted block independence property: $P = \mathcal{L}\Bigl(U\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Bigr)$ with $U \in \mathbb{R}^{q\times q}$ orthogonal and independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma^s(P) = U\begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix}U^\top.$

41 Open questions and ongoing work
- Symmetrized M-estimators: balanced incomplete versus complete U-statistics
- Asymptotics for regularized scatter estimators
- Algorithms for non-smooth g-convex penalties
- Using regularized scatter estimators in other contexts (classification, ICS, ICA, multivariate regression, ...)
- ...

42 References
- Auderset, Mazza & Ruh: Angular Gaussian and Cauchy estimation. (JMVA 2005)
- Bhatia: Positive Definite Matrices. (Princeton University Press 2007)
- Wiesel: Geodesic convexity and covariance estimation. (IEEE Trans. Signal Process. 2012)
- D., Pauly & Schweizer: M-functionals of multivariate scatter. (Statistics Surveys 2015)
- D., Nordhausen & Schuhmacher: New algorithms for M-estimation of multivariate scatter and location. (JMVA 2016); R package fastM. (CRAN 2014/2015)
- D. & Tyler: Geodesic convexity and regularized scatter estimators. (arXiv)
