Geodesic Convexity and Regularized Scatter Estimation
1 Geodesic Convexity and Regularized Scatter Estimation. Lutz Duembgen (Bern), David Tyler (Rutgers), Klaus Nordhausen (Turku/Vienna), Heike Schuhmacher (Bern), Markus Pauly (Ulm), Thomas Schweizer (Bern). Düsseldorf, July 22, 2017
2 Outline
I. Geometry of Scatter Matrices
II. Geodesic Convexity and Coercivity
III. M-Functionals of Scatter
IV. Regularization
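3 I. Geometry of Scatter Matrices
$\mathbb{R}^{q\times q}_{\rm sym} := \{ A \in \mathbb{R}^{q\times q} : A = A^\top \}$
$\mathbb{R}^{q\times q}_{{\rm sym},+} := \{ A \in \mathbb{R}^{q\times q}_{\rm sym} : A \text{ positive definite} \}$ (open convex cone in $\mathbb{R}^{q\times q}_{\rm sym}$)
$\langle A, B\rangle := \mathrm{tr}(AB) = \sum_{i,j} A_{ij} B_{ij}$
$\|A\|_F := \sqrt{\langle A, A\rangle}$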
4 [Figure with axes $x$, $y$, $z$]
5 $A = \begin{bmatrix} z + x & y \\ y & z - x \end{bmatrix} = \begin{bmatrix} x & y \\ y & -x \end{bmatrix} + z I_2$
$\|A\|_F^2 = 2\,(x^2 + y^2 + z^2)$
$A$ positive definite $\iff z > \sqrt{x^2 + y^2}$
6 $\mu \in \mathbb{R}^q$, $\Sigma \in \mathbb{R}^{q\times q}_{{\rm sym},+}$
$\hat\Sigma$ = sample covariance matrix of $X_1, X_2, \ldots, X_n$ i.i.d. $\sim N_q(\mu, \Sigma)$.
7 m = 50 samples of size n = 100:
8 m = 50 samples of size n = 500:
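9 Suitable Geometry
$\hat\Sigma = \Sigma^{1/2} W \Sigma^{1/2}$, where $W$ has a universal symmetric distribution (depending on $q$ and $n$) and $W \to_p I_q$ as $n \to \infty$.
Local distance measure at $\Sigma$:
$d_\Sigma(\Sigma, \hat\Sigma) := \|W - I_q\|_F$
$d_\Sigma(\Sigma_0, \Sigma_1) := \|\Sigma^{-1/2} (\Sigma_0 - \Sigma_1) \Sigma^{-1/2}\|_F$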
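10 Global distance measure (geodesic distance)
$D_g(\Sigma_0, \Sigma_1) := \min \int_0^1 d_{\Sigma_t}(\Sigma_t, \Sigma_{t+dt}) = \min \int_0^1 \|\Sigma_t^{-1/2}\, \dot\Sigma_t\, \Sigma_t^{-1/2}\|_F \, dt$,
where the minimum is over all smooth paths $[0,1] \ni t \mapsto \Sigma_t$ connecting $\Sigma_0$ and $\Sigma_1$.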
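11 Explicit solution
$A = \log(\Sigma_0^{-1/2} \Sigma_1 \Sigma_0^{-1/2})$
$\Sigma_t = \Sigma_0^{1/2} \exp(tA)\, \Sigma_0^{1/2}$
$D_g(\Sigma_0, \Sigma_1) = \|A\|_F$
Note: $\exp(A) = \sum_{k=0}^\infty A^k / k!$, and for orthogonal $U$,
$\exp(U \mathrm{diag}(\lambda) U^\top) = U \mathrm{diag}(e^{\lambda}) U^\top$
$\log(U \mathrm{diag}(\lambda) U^\top) = U \mathrm{diag}(\log \lambda) U^\top$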
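The explicit solution on slide 11 can be evaluated with a symmetric eigendecomposition. The following is a minimal NumPy sketch, not code from the talk; the helper names `sym_fun`, `geodesic_distance` and `geodesic_point` are hypothetical.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via S = U diag(lam) U^T."""
    S = (S + S.T) / 2                      # guard against rounding asymmetry
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T            # U diag(fun(lam)) U^T

def _log_ratio(S0, S1):
    """A = log(S0^{-1/2} S1 S0^{-1/2})."""
    S0_inv_half = sym_fun(S0, lambda lam: lam ** -0.5)
    return sym_fun(S0_inv_half @ S1 @ S0_inv_half, np.log)

def geodesic_distance(S0, S1):
    """D_g(S0, S1) = ||A||_F."""
    return np.linalg.norm(_log_ratio(S0, S1), "fro")

def geodesic_point(S0, S1, t):
    """Sigma_t = S0^{1/2} exp(tA) S0^{1/2}."""
    S0_half = sym_fun(S0, np.sqrt)
    return S0_half @ sym_fun(t * _log_ratio(S0, S1), np.exp) @ S0_half

S0 = np.diag([1.0, 2.0])
S1 = np.diag([4.0, 2.0])
print(geodesic_distance(S0, S1))           # equals log(4) for this diagonal pair
print(geodesic_point(S0, S1, 0.5))         # geodesic midpoint
```

For commuting (here diagonal) matrices the geodesic simply interpolates the log-eigenvalues linearly, which makes this pair easy to check by hand.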
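14 Local/global parametrizations of $\mathbb{R}^{q\times q}_{{\rm sym},+}$
$\Sigma = BB^\top$ with nonsingular $B \in \mathbb{R}^{q\times q}$
$\mathbb{R}^{q\times q}_{{\rm sym},+} = \{ B \exp(A) B^\top : A \in \mathbb{R}^{q\times q}_{\rm sym} \}$
$\{ \Gamma \in \mathbb{R}^{q\times q}_{{\rm sym},+} : \det(\Gamma) = \det(\Sigma) \} = \{ B \exp(A) B^\top : A \in \mathbb{R}^{q\times q}_{\rm sym},\ \mathrm{tr}(A) = 0 \}$.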
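15 Note that for $q \ge 2$, the map $(\mathbb{R}^{q\times q}_{{\rm sym},+}, D_g) \ni \Sigma \mapsto \log(\Sigma) \in (\mathbb{R}^{q\times q}_{\rm sym}, \|\cdot\|_F)$ is not isometric. [Figure with axes $x$, $y$]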
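16 II. Geodesic Convexity and Coercivity
Geodesic Convexity
A function $f : \mathbb{R}^{q\times q}_{{\rm sym},+} \to \mathbb{R}$ is (strictly) geodesically convex if for all nonsingular $B \in \mathbb{R}^{q\times q}$ and nonzero $A \in \mathbb{R}^{q\times q}_{\rm sym}$, $f(B \exp(tA) B^\top)$ is (strictly) convex in $t \in \mathbb{R}$.
Equivalently: for all nonsingular $B \in \mathbb{R}^{q\times q}$, $f(B\, \mathrm{diag}(e^x) B^\top)$ is (strictly) convex in $x \in \mathbb{R}^q$.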
17 Example
The function $f(\Sigma) := \log\det(\Sigma)$ is geodesically linear:
$\log\det(B \exp(A) B^\top) = \log\det(BB^\top) + \mathrm{trace}(A)$.
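The identity on slide 17 is easy to confirm numerically. The sketch below uses a randomly drawn $B$ and symmetric $A$ (illustrative inputs, not from the talk) and compares both sides along the path $t \mapsto B\exp(tA)B^\top$; the function names are hypothetical.

```python
import numpy as np

def exp_sym(A):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * np.exp(lam)) @ U.T

def logdet(S):
    """log det(S) for a positive definite matrix S."""
    sign, value = np.linalg.slogdet(S)
    return value

def logdet_on_geodesic(B, A, t):
    """f(B exp(tA) B^T) for f = log det."""
    return logdet(B @ exp_sym(t * A) @ B.T)

rng = np.random.default_rng(0)
q = 4
B = rng.normal(size=(q, q))                # nonsingular with probability one
A = rng.normal(size=(q, q))
A = (A + A.T) / 2                          # symmetric direction

ts = np.linspace(-2.0, 2.0, 9)
lhs = np.array([logdet_on_geodesic(B, A, t) for t in ts])
rhs = logdet(B @ B.T) + ts * np.trace(A)   # affine in t
print(np.max(np.abs(lhs - rhs)))           # difference is at rounding level
```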
18 Verifying g-convexity for smooth functions (V2)
For any nonsingular $B \in \mathbb{R}^{q\times q}$ and $x \in \mathbb{R}^q$,
$f(B\, \mathrm{diag}(e^x) B^\top) = f(BB^\top) + g_B^\top x + \tfrac{1}{2}\, x^\top H_B x + o(\|x\|^2)$ as $x \to 0$.
$f$ is g-convex iff for all $B$, $H_B \ge 0$.
$f$ is strictly g-convex iff for all $B$, $H_B > 0$.
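19 Example
For nonzero $v \in \mathbb{R}^q$, $f(\Sigma) := \log v^\top \Sigma v$ is g-convex.
For nonsingular $B \in \mathbb{R}^{q\times q}$ and $w := B^\top v$,
$f(B\, \mathrm{diag}(e^x) B^\top) = \log\big(w^\top \mathrm{diag}(e^x) w\big) \approx f(BB^\top) + g_B^\top x + \tfrac{1}{2}\, x^\top H_B x$
with
$g_B := \big(w_i^2 / \|w\|^2\big)_{i=1}^q$
$H_B := \mathrm{diag}(g_B) - g_B g_B^\top$.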
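The expansion in the example on slide 19 can be checked numerically. The sketch below assumes the second-order term carries the usual factor 1/2, so that $H_B$ is the Hessian at $x = 0$; the vector `w` is an arbitrary illustrative choice and the function names are hypothetical.

```python
import numpy as np

def f_local(x, w):
    """x -> log(w^T diag(e^x) w), i.e. f(B diag(e^x) B^T) for f = log v^T Sigma v."""
    return np.log(np.sum(w ** 2 * np.exp(x)))

def gradient_and_hessian(w):
    """g_B = (w_i^2 / ||w||^2)_i and H_B = diag(g_B) - g_B g_B^T."""
    g = w ** 2 / np.sum(w ** 2)
    H = np.diag(g) - np.outer(g, g)
    return g, H

rng = np.random.default_rng(1)
q = 5
w = rng.normal(size=q)
g, H = gradient_and_hessian(w)

# H_B is positive semidefinite, so f is g-convex ...
print(np.linalg.eigvalsh(H).min() >= -1e-12)
# ... but H_B annihilates the all-ones vector, so f is not strictly g-convex.
print(np.allclose(H @ np.ones(q), 0.0))

# Compare f with its quadratic approximation near x = 0.
x = 1e-2 * rng.normal(size=q)
zero = np.zeros(q)
quad = f_local(zero, w) + g @ x + 0.5 * x @ H @ x
print(abs(f_local(x, w) - quad))           # third-order small
```

The flat direction along the all-ones vector reflects the fact that $f(c\Sigma) = f(\Sigma) + \log c$ is affine in $\log c$.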
20 Remarks
- $\Sigma \mapsto f(\Sigma)$ g-convex $\iff$ $\Sigma \mapsto f(\Sigma^{-1})$ g-convex.
- Sums and pointwise suprema of g-convex functions are g-convex.
- Both $\log \lambda_{\max}(\Sigma)$ and $\log \lambda_{\max}(\Sigma^{-1}) = -\log \lambda_{\min}(\Sigma)$ are g-convex.
- $f(\Sigma)$ g-convex, $h : \mathbb{R} \to \mathbb{R}$ convex and increasing $\implies$ $h(f(\Sigma))$ is g-convex.
- A local minimizer of a g-convex function is also a global minimizer.
- The only g-affine functions are $f(\Sigma) = c_1 + c_2 \log\det(\Sigma)$ with $c_1, c_2 \in \mathbb{R}$.
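21 Geodesic Coercivity
Let $f : \mathbb{R}^{q\times q}_{{\rm sym},+} \to \mathbb{R}$ be g-convex / strictly g-convex. Then $\arg\min_\Sigma f(\Sigma)$ is compact / a singleton iff $f$ is g-coercive, that is, $f(\Sigma) \to \infty$ as $\|\log(\Sigma)\|_F \to \infty$.
Criterion: If $f$ is differentiable, it is g-coercive iff
$\lim_{t\to\infty} \frac{d}{dt} f(\exp(tA)) > 0$
for any nonzero $A \in \mathbb{R}^{q\times q}_{\rm sym}$.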
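22 III. M-Functionals of Scatter
True/empirical distribution: $P$ on $\mathbb{R}^q$ with center $0 \in \mathbb{R}^q$.
Working model/caricature for $P$:
$f_\Sigma(x) = C \det(\Sigma)^{-1/2} \exp\big(-\rho(x^\top \Sigma^{-1} x)/2\big)$
$\rho(s) \uparrow$ in $s > 0$
$s\rho'(s) \uparrow$ in $s > 0$
In other words, $\rho(e^x)$ is $\uparrow$ and convex in $x \in \mathbb{R}$.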
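23 Target function (log-likelihood times $-2/n$)
$L(\Sigma, P) := -2 \int \log[f_\Sigma / f_I] \, dP = \int \big[\rho(x^\top \Sigma^{-1} x) - \rho(x^\top x)\big] P(dx) + \log\det(\Sigma)$
M-functional of scatter
$\Sigma(P) := \arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{{\rm sym},+}} L(\Sigma, P)$
M-estimator of scatter
$\hat P$ = empirical distribution of $X_1, X_2, \ldots, X_n$ i.i.d. $\sim P$
$\Sigma(\hat P)$ estimates $\Sigma(P)$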
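24 $L(\Sigma, P) = \int \big[\rho(x^\top \Sigma^{-1} x) - \rho(x^\top x)\big] P(dx) + \log\det(\Sigma)$
$\Sigma(P) = \arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{{\rm sym},+}} L(\Sigma, P)$
- $\rho(s) = s$: $\Sigma(P) = \mathrm{Var}(P)$
- $s\rho'(s)$ bounded in $s \ge 0$: $\Sigma(\cdot)$ is moderately robust
- $P$ elliptically symmetric with center 0 and scatter $\Sigma$: $\Sigma(P) = c\,\Sigma$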
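Setting the derivative of $L(\Sigma, \hat P)$ to zero gives the fixed-point equation $\Sigma = n^{-1} \sum_i \rho'(X_i^\top \Sigma^{-1} X_i)\, X_i X_i^\top$. The sketch below iterates this equation directly. This is the classical fixed-point scheme, used here only for illustration; it is not the partial Newton method mentioned in the talk. The function name `m_scatter`, the simulated data and the Student-type weight $\rho'(s) = (\nu + q)/(\nu + s)$ are assumptions chosen for the example.

```python
import numpy as np

def m_scatter(X, rho_prime, n_iter=500, tol=1e-10):
    """Fixed-point iteration for an M-estimator of scatter with center 0.

    X         : (n, q) data matrix, rows are observations
    rho_prime : vectorized derivative of rho
    """
    n, q = X.shape
    Sigma = np.eye(q)
    for _ in range(n_iter):
        # s_i = X_i^T Sigma^{-1} X_i for every observation
        s = np.einsum("ij,ij->i", X @ np.linalg.inv(Sigma), X)
        w = rho_prime(s)
        new = (X * w[:, None]).T @ X / n   # weighted second-moment matrix
        if np.linalg.norm(new - Sigma, "fro") <= tol * np.linalg.norm(Sigma, "fro"):
            return new
        Sigma = new
    return Sigma

rng = np.random.default_rng(2)
n, q, nu = 200, 3, 3.0
X = rng.normal(size=(n, q)) @ np.diag([3.0, 1.0, 0.5])

# rho(s) = s: the estimator is the second-moment matrix about the center 0.
S_gauss = m_scatter(X, lambda s: np.ones_like(s))

# rho(s) = (nu + q) log(nu + s): s * rho'(s) is increasing and bounded.
S_t = m_scatter(X, lambda s: (nu + q) / (nu + s))
print(S_gauss)
print(S_t)
```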
25 Good news
In general, $L(\cdot, P)$ is geodesically convex.
Under mild regularity conditions on $P$ and $\rho$, $L(\cdot, P)$ is geodesically strictly convex and coercive.
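26 Taylor expansion
$L(B\, \mathrm{diag}(e^x) B^\top, P) \approx L(BB^\top, P) + g_B^\top x + \tfrac{1}{2}\, x^\top H_B x$
with
$g_B := 1_q - \psi_B$
$\psi_B := \int \rho'(\|x\|^2)\, (x_i^2)_{i=1}^q \, P_B(dx)$
$H_B := \mathrm{diag}(\psi_B) + \int \rho''(\|x\|^2)\, (x_i^2 x_j^2)_{i,j=1}^q \, P_B(dx)$
$P_B := \mathcal{L}(B^{-1} X)$, $X \sim P$.
Existence, continuity and weak differentiability of $\Sigma(\cdot)$...
Fast algorithms for computation of $\Sigma(\hat P)$ via partial Newton method...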
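27 Symmetrization
Replace $\Sigma(P)$ with
$\Sigma^s(P) := \Sigma(P \ominus P)$, $P \ominus P := \mathcal{L}(X - X')$, $X, X'$ i.i.d. $\sim P$.
Estimator uses
$\hat P \ominus \hat P := \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \delta_{X_j - X_i}$
or
$\hat P \ominus \hat P := \frac{1}{nk} \sum_{i=1}^{n} \sum_{j=i+1}^{i+k} \delta_{X_j - X_i}$
with $k$ between 1 and $n$.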
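Both symmetrized empirical distributions on slide 27 amount to collecting differences $X_j - X_i$, which can then be passed to any scatter estimator with center 0. The following sketch builds these difference samples. The wrap-around convention for indices $j > n$ in the incomplete version is an assumption (the slide does not state how such indices are handled), and the function names are hypothetical.

```python
import numpy as np

def complete_differences(X):
    """All n(n-1)/2 differences X_j - X_i with i < j."""
    i, j = np.triu_indices(len(X), k=1)
    return X[j] - X[i]

def incomplete_differences(X, k):
    """The n*k differences X_j - X_i for j = i+1, ..., i+k.

    Assumption: indices beyond n wrap around cyclically.
    """
    n = len(X)
    i = np.repeat(np.arange(n), k)
    j = (i + np.tile(np.arange(1, k + 1), n)) % n
    return X[j] - X[i]

X = np.array([[0.0], [1.0], [3.0]])
print(complete_differences(X).ravel())       # [1. 3. 2.]
print(incomplete_differences(X, 1).ravel())  # [ 1.  2. -3.]
```

Because only differences enter, shifting every observation by the same vector leaves both samples unchanged, which is why no center needs to be estimated.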
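28 No need to estimate center of $P$
$P$ elliptically symmetric around $\mu$ with scatter $\Sigma$: $\Sigma^s(P) = c\,\Sigma$
Block independence property:
$P = \mathcal{L}\Big(B \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Big)$ with independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma^s(P) = B \begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix} B^\top$.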
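29 IV. Regularization
In high-dimensional settings replace $\Sigma(P)$ with
$\arg\min_{\Sigma \in \mathbb{R}^{q\times q}_{{\rm sym},+}} \big( L(\Sigma, P) + \alpha\, \mathrm{Pen}(\Sigma) \big)$, $\alpha > 0$,
where $\mathrm{Pen} : \mathbb{R}^{q\times q}_{{\rm sym},+} \to \mathbb{R}$ satisfies
- $\mathrm{Pen}(c\Sigma) = \mathrm{Pen}(\Sigma)$ (scale invariance)
- $\mathrm{Pen}(\Sigma) \to \infty$ as $\lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma) \to \infty$.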
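30 Examples of penalties:
$\mathrm{Pen}_0(\Sigma) = \log \mathrm{tr}(\Sigma) + \log \mathrm{tr}(\Sigma^{-1}) = \log\Big(\sum_{i=1}^q \lambda_i\Big) + \log\Big(\sum_{i=1}^q \lambda_i^{-1}\Big)$
$\mathrm{Pen}_1(\Sigma) = q^{-1} \log\det(\Sigma) + \log \mathrm{tr}(\Sigma^{-1}) = q^{-1} \sum_{i=1}^q \log \lambda_i + \log\Big(\sum_{i=1}^q \lambda_i^{-1}\Big)$
$\mathrm{Pen}_2(\Sigma) = \log\det(\Sigma) + q \log \lambda_{\max}(\Sigma^{-1}) = \sum_{i=1}^q \log(\lambda_i / \lambda_{\min})$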
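31 These penalties $\mathrm{Pen}_j(\Sigma)$ are
- scale invariant
- g-convex
- g-coercive on $\{\Sigma : \det(\Sigma) = c\}$
- strictly g-convex on $\{\Sigma : \det(\Sigma) = c\}$ ($\mathrm{Pen}_0$, $\mathrm{Pen}_1$)
with $\arg\min_\Sigma \mathrm{Pen}_j(\Sigma) = \{ c I_q : c > 0 \}$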
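Scale invariance and the location of the minimum can be checked numerically for $\mathrm{Pen}_0$ and $\mathrm{Pen}_1$, which depend on $\Sigma$ only through its eigenvalues. The sketch below is an illustration under that reading of the definitions; the function names `pen0` and `pen1` and the random test matrix are hypothetical.

```python
import numpy as np

def pen0(S):
    """Pen_0(S) = log tr(S) + log tr(S^{-1}), computed from eigenvalues."""
    lam = np.linalg.eigvalsh(S)
    return np.log(lam.sum()) + np.log((1.0 / lam).sum())

def pen1(S):
    """Pen_1(S) = q^{-1} log det(S) + log tr(S^{-1}), computed from eigenvalues."""
    lam = np.linalg.eigvalsh(S)
    return np.log(lam).mean() + np.log((1.0 / lam).sum())

rng = np.random.default_rng(4)
q = 6
M = rng.normal(size=(q, 2 * q))
S = M @ M.T / (2 * q)                      # random positive definite matrix
I = np.eye(q)

print(pen0(S), pen0(7.5 * S))              # equal: scale invariance
print(pen1(S), pen1(7.5 * S))              # equal: scale invariance
print(pen0(I), 2 * np.log(q))              # minimum value of Pen_0
print(pen1(I), np.log(q))                  # minimum value of Pen_1
```

The lower bounds $\mathrm{Pen}_0 \ge 2\log q$ and $\mathrm{Pen}_1 \ge \log q$ follow from the Cauchy-Schwarz and arithmetic-geometric mean inequalities applied to the eigenvalues, with equality exactly for multiples of $I_q$.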
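32 Example: Regularized version of Tyler's (1987) M-functional
$f(\Sigma) = L(\Sigma, P) + \alpha\, \mathrm{Pen}(\Sigma)$ with $\rho(s) = q \log s$ and
$\mathrm{Pen}(\Sigma) = \mathrm{Pen}_1(\Sigma)$ (Case 1) or $\mathrm{Pen}(\Sigma) = h(\mathrm{Pen}_1(\Sigma))$ (Case 2).
On $\{\Sigma : \det(\Sigma) = 1\}$, $f$ is
- strictly g-convex
- g-coercive in Case 1 if $P(V) < \big(1 + \frac{\alpha}{q}\big) \frac{\dim(V)}{q}$ whenever $1 \le \dim(V) < q$
- g-coercive in Case 2 if $\lim_{s\to\infty} h(s)/s = \infty$.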
33 Numerical experiment
For $q = 50$ and $n = 30$ consider $X_1, X_2, \ldots, X_n$ i.i.d. $\sim \mathrm{Elliptic}_q(0, \Sigma)$ with $\Sigma = \mathrm{diag}(10, 5, 3, 2, 1, \ldots, 1)^2$.
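34 Compute
$\hat\Sigma_\alpha := \arg\min_\Sigma \big( L(\Sigma, \hat P) + \alpha\, h(\mathrm{Pen}_1(\Sigma)) \big)$
and
$\hat\Sigma := \hat\Sigma_{\hat\alpha}$ with $\hat\alpha := \arg\min_{\alpha \in 2^{\mathbb{Z}}} \mathrm{CV}(\alpha)$,
$\mathrm{CV}(\alpha) := \sum_{i=1}^n \big\{ \rho(X_i^\top \hat\Sigma_{\alpha,-i}^{-1} X_i) + \log\det(\hat\Sigma_{\alpha,-i}) \big\}$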
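The cross-validation criterion on slide 34 is a leave-one-out sum, with $\hat\Sigma_{\alpha,-i}$ read as the estimator computed without observation $X_i$ and $\alpha$ ranging over powers of two. The sketch below implements only this outer loop and takes the estimator as a callback. The function names, the callback signature `fit(X, alpha)` and the stand-in `shrinkage_fit` (a simple linear shrinkage toward the identity, not the regularized M-estimator of the talk) are all assumptions made for illustration.

```python
import numpy as np

def cv_score(X, alpha, fit, rho):
    """CV(alpha) = sum_i { rho(X_i^T S_{-i}^{-1} X_i) + log det(S_{-i}) }."""
    total = 0.0
    for i in range(len(X)):
        S = fit(np.delete(X, i, axis=0), alpha)   # estimator without X_i
        s = X[i] @ np.linalg.solve(S, X[i])
        total += rho(s) + np.linalg.slogdet(S)[1]
    return total

def select_alpha(X, fit, rho, exponents):
    """Minimize CV(2^k) over the given integer exponents k."""
    scores = {k: cv_score(X, 2.0 ** k, fit, rho) for k in exponents}
    best_k = min(scores, key=scores.get)
    return 2.0 ** best_k, scores

def shrinkage_fit(X, alpha):
    """Hypothetical stand-in estimator: shrink the second-moment matrix to I."""
    n, q = X.shape
    return (X.T @ X / n + alpha * np.eye(q)) / (1.0 + alpha)

rng = np.random.default_rng(5)
X = rng.normal(size=(12, 5))
alpha_hat, scores = select_alpha(X, shrinkage_fit, lambda s: s, range(-4, 5))
print(alpha_hat)
```

Replacing `shrinkage_fit` by a solver for the penalized target function, and `rho` by the chosen $\rho$, would give the criterion as stated on the slide.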
35 [Figure] $\log \lambda(\Sigma)$ and $\log \lambda(\hat\Sigma)$ ($\hat\alpha = 2^{7}$)
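36 [Figure] Cross validation: $\mathrm{CV}(2^k)$ versus $k$
37 [Figure] First eigenvectors: $\hat u_1$ and $u_1$ versus $k$
38 [Figure] Eigenvalues: $\log \lambda(\hat\Sigma)$ and $\log \lambda(\Sigma)$ versus $k$
39 [Figure] Shape matrices: $D_g(\hat\Sigma, \Sigma)$ versus $k$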
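40 Symmetrization and orthogonally invariant penalties
$f(\Sigma) = L(\Sigma, P \ominus P) + \alpha\, \mathrm{Pen}(\Sigma)$
$\mathrm{Pen}(U^\top \Sigma U) = \mathrm{Pen}(\Sigma)$ for orthogonal $U \in \mathbb{R}^{q\times q}$
Restricted block independence property:
$P = \mathcal{L}\Big(U \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Big)$ with $U \in \mathbb{R}^{q\times q}_{\rm orth}$ and independent $X_1 \in \mathbb{R}^{q(1)}$, $X_2 \in \mathbb{R}^{q(2)}$ implies
$\Sigma^s(P) = U \begin{bmatrix} \Sigma_1(P) & 0 \\ 0 & \Sigma_2(P) \end{bmatrix} U^\top$.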
41 Open questions and ongoing work
- Symmetrized M-estimators: Balanced incomplete versus complete U-statistics
- Asymptotics for regularized scatter estimators
- Algorithms for non-smooth g-convex penalties
- Using regularized scatter estimators in other contexts (classification, ICS, ICA, multivar. regression, ...)
- ...
42 References
- Auderset, Mazza & Ruh: Angular Gaussian and Cauchy estimation. (JMVA 2005)
- Bhatia: Positive definite matrices. (Princeton University Press 2007)
- Wiesel: Geodesic convexity and covariance estimation. (IEEE Trans. Signal Process. 2012)
- D., Pauly & Schweizer: M-functionals of multivariate scatter. (Statistics Surveys 2015)
- D., Nordhausen & Schuhmacher: New algorithms for M-estimation of multivar. scatter and loc. (JMVA 2016); R package fastM. (CRAN 2014/2015)
- D. & Tyler: Geodesic convexity and regularized scatter estimators. (arXiv)
Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional
More information1 Appendix A: Matrix Algebra
Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix
More informationOn Independent Component Analysis
On Independent Component Analysis Université libre de Bruxelles European Centre for Advanced Research in Economics and Statistics (ECARES) Solvay Brussels School of Economics and Management Symmetric Outline
More informationTutorial lecture 2: System identification
Tutorial lecture 2: System identification Data driven modeling: Find a good model from noisy data. Model class: Set of all a priori feasible candidate systems Identification procedure: Attach a system
More informationEFFICIENT MULTIVARIATE ENTROPY ESTIMATION WITH
EFFICIENT MULTIVARIATE ENTROPY ESTIMATION WITH HINTS OF APPLICATIONS TO TESTING SHAPE CONSTRAINTS Richard Samworth, University of Cambridge Joint work with Thomas B. Berrett and Ming Yuan Collaborators
More informationApplications of Information Geometry to Hypothesis Testing and Signal Detection
CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016 Outline 1. Principles of Information Geometry
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationOrdinary Differential Equations II
Ordinary Differential Equations II February 9 217 Linearization of an autonomous system We consider the system (1) x = f(x) near a fixed point x. As usual f C 1. Without loss of generality we assume x
More information