Concentration Ellipsoids


ECE 275A Lecture Supplement, Fall 2008
Kenneth Kreutz-Delgado
Electrical and Computer Engineering, Jacobs School of Engineering
University of California, San Diego
Version LSECE275CE-F08v1.0, Copyright (c) 2003-2008, All Rights Reserved

Let $\tilde{\theta} = \hat{\theta} - \theta$ denote the error associated with a uniformly unbiased estimator [1] $\hat{\theta}$ of an $n$-dimensional real parameter vector $\theta$. Necessarily, then, $E_\theta\{\tilde{\theta}\} = 0$. Let the associated real $n \times n$ error covariance matrix of $\tilde{\theta}$ be given by

$$\Sigma = \Sigma(\hat{\theta}, \theta) = \mathrm{Cov}_\theta\{\tilde{\theta}\} = E_\theta\{\tilde{\theta}\tilde{\theta}^T\} = E_\theta\{(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T\}.$$

Note that $\Sigma(\hat{\theta}, \theta)$ is a function of both the estimator $\hat{\theta}$ and the assumed (unknown) parameter vector $\theta$, even though we often use the simpler notation $\Sigma$ in place of the more informative $\Sigma(\hat{\theta}, \theta)$ and thereby do not show this dependence explicitly. [2] It is usual to further assume that the (necessarily symmetric and positive-semidefinite) error covariance matrix is positive definite (and hence full-rank and invertible), $\Sigma > 0$. In the latter case the random vector $\tilde{\theta}$ is called full rank.

It is well known that a symmetric $n \times n$ positive-definite matrix $\Sigma$ has $n$ nonzero real eigenvalues $\sigma_i > 0$ and associated real orthogonal (which can be assumed to be normalized) eigenvectors $u_i$,

$$\Sigma u_i = \sigma_i u_i \quad \text{with} \quad \langle u_i, u_j \rangle = u_i^T u_j = \delta_{ij}, \qquad i, j = 1, \dots, n. \tag{1}$$

Defining the real $n \times n$ orthogonal matrix $U = [u_1 \ \cdots \ u_n]$ and the real diagonal matrix $\Lambda = \mathrm{diag}[\sigma_1 \ \cdots \ \sigma_n]$, we can rewrite (1) as

$$\Sigma = U \Lambda U^T \quad \text{with} \quad U^{-1} = U^T. \tag{2}$$

[1] Also known as an absolutely unbiased or, in the Bayesian context, a conditionally unbiased estimator.
[2] In lecture, at various times we also use the notations $\Sigma_\theta(\hat{\theta})$ (both dependencies explicit), $\Sigma(\hat{\theta})$ (the $\theta$ dependence suppressed), $\Sigma_\theta$ (the $\hat{\theta}$ dependence suppressed), and $\Sigma_{\hat{\theta}}$ (the $\theta$ dependence suppressed).
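As a quick numerical check of (1)-(2), here is a minimal NumPy sketch that builds an arbitrary symmetric positive-definite matrix (an assumed example, standing in for an error covariance $\Sigma$) and verifies its orthogonal eigendecomposition:

```python
import numpy as np

# Minimal sketch (assumed example): build a symmetric positive-definite
# "error covariance" Sigma and verify the representation (2).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)        # symmetric positive definite by construction

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors as columns
sigma, U = np.linalg.eigh(Sigma)       # Sigma u_i = sigma_i u_i, U^T U = I
Lambda = np.diag(sigma)

print(np.allclose(U @ Lambda @ U.T, Sigma))   # Sigma = U Lambda U^T  -> True
print(np.allclose(U.T @ U, np.eye(4)))        # U^{-1} = U^T          -> True
print(np.all(sigma > 0))                      # all eigenvalues positive -> True
```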

From (2) one can easily show that $\Sigma^k = U \Lambda^k U^T$ for all integer (positive and negative) values of $k$, and meaningfully generalize this fact by defining $\Sigma^r$ for all real values of $r$ to be

$$\Sigma^r = U \Lambda^r U^T, \quad \text{where} \quad \Lambda^r = \mathrm{diag}[(\sigma_1)^r \ \cdots \ (\sigma_n)^r].$$

Note that using this particular definition results in $\Sigma^r$ being symmetric positive definite for all $r$. In particular, we have $\Sigma^{-1} = U \Lambda^{-1} U^T$ and $\Sigma^{1/2} = U \Lambda^{1/2} U^T$.

The representation $\Sigma = U \Lambda U^T$ can be equivalently written as

$$\Sigma = \sigma_1 u_1 u_1^T + \cdots + \sigma_n u_n u_n^T. \tag{3}$$

The representation (3) is a particular example of a so-called rank-one expansion of a matrix, [3] which, because it is an expansion in terms of the eigenpairs $(\sigma_i, u_i)$, $i = 1, \dots, n$, is known as the Spectral Representation of the positive-definite matrix $\Sigma$. It is also the singular value decomposition (SVD) of $\Sigma$. Because $\Sigma$ is symmetric positive definite, its singular values and eigenvalues happen to be identical, but more generally this is not the case. The SVD is a generalization of the Spectral Representation that can be applied to general non-symmetric, non-diagonalizable, and non-square matrices.

Because the symmetric positive-definite matrix $\Sigma$ happens to also be a covariance matrix, the Spectral Representation (3) is also known as a Karhunen-Loève expansion (or K-L expansion) [4] and as a Principal Components Analysis expansion (or PCA expansion). In the latter case the eigenpairs $(\sigma_i, u_i)$ are known as the principal components. The unit vectors $u_i$ are known as the principal directions (in $\tilde{\theta}$ space).

[3] Each individual term $u_i u_i^T$ is a rank-one matrix, as can easily be shown. A more general rank-one expansion of an arbitrary, and even possibly non-square, matrix which we have previously encountered is the SVD. Indeed, equations (2)-(3) give the SVD of the special, positive-definite matrix $\Sigma$, and $\sigma_i$, $i = 1, \dots, n$, are also the singular values of $\Sigma$. It is to emphasize this fact that I use $\sigma_i$ to denote the eigenvalues (which, in this case, are also the singular values) of $\Sigma$ rather than the (perhaps) more conventional notation $\lambda_i$.
[4] The K-L expansion provides a representation of the components of the random vector $\tilde{\theta}$, viewed as a correlated stochastic time series (also known as a random process) $\tilde{\theta}[1], \dots, \tilde{\theta}[n]$, in terms of the components of the nonrandom vectors $u_i$, viewed as nonrandom time series $u_i[1], \dots, u_i[n]$. The $n$ deterministic processes $u_i[k]$, $i = 1, \dots, n$, are orthonormal with respect to the $\ell_2$ inner product $\langle u, v \rangle = \sum_{k=1}^{n} u[k]\, v[k]$. The K-L expansion (or representation) is given by $\tilde{\theta}[k] = \sum_{i=1}^{n} \tilde{\pi}[i]\, u_i[k]$, where $\tilde{\pi}[i] = \langle u_i, \tilde{\theta} \rangle = u_i^T \tilde{\theta}$ is random. If we define the vector $\tilde{\pi} = [\tilde{\pi}[1], \dots, \tilde{\pi}[n]]^T$, we can readily show that $\tilde{\pi} = U^T \tilde{\theta}$, $\tilde{\theta} = U \tilde{\pi}$, and $E_\theta\{\tilde{\pi}\tilde{\pi}^T\} = \Lambda$. This shows that we can transform the correlated discrete-time random process $\tilde{\theta}[k]$ into an equivalent uncorrelated discrete-time random process $\tilde{\pi}[k]$. More generally (and this is the usual situation encountered in communications theory), the K-L expansion allows a continuous-time random process $\tilde{\theta}(t)$ to be represented in terms of a countable set of continuous-time orthogonal (with respect to the $L_2$ inner product) deterministic functions $u_i(t)$, $\tilde{\theta}(t) = \sum_i \tilde{\pi}[i]\, u_i(t)$, where the $\tilde{\pi}[i]$ are uncorrelated. This results in the transformation of the continuous-time correlated random process $\tilde{\theta}(t)$ into an equivalent discrete-time uncorrelated random process $\tilde{\pi}[k]$, $k = 1, 2, \dots$, which can be quite useful in a theoretical analysis. For example, this transformation is commonly done when analyzing intersymbol interference (ISI) in a communication channel.
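The matrix powers $\Sigma^r = U \Lambda^r U^T$ and the rank-one expansion (3) are easy to verify numerically. Below is a minimal sketch along the same lines as the previous one (the particular matrix is again an assumed example):

```python
import numpy as np

# Minimal sketch (assumed example): verify Sigma^r = U Lambda^r U^T for r = -1, 1/2,
# and the rank-one (spectral) expansion Sigma = sum_i sigma_i u_i u_i^T of eq. (3).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
sigma, U = np.linalg.eigh(Sigma)

def mat_power(r):
    # Sigma^r := U diag(sigma_i^r) U^T, symmetric positive definite for every real r
    return U @ np.diag(sigma ** r) @ U.T

print(np.allclose(mat_power(-1), np.linalg.inv(Sigma)))        # Sigma^{-1}
print(np.allclose(mat_power(0.5) @ mat_power(0.5), Sigma))     # (Sigma^{1/2})^2 = Sigma

# Rank-one expansion: sum of sigma_i * u_i u_i^T over the eigenpairs
rank_one_sum = sum(s * np.outer(u, u) for s, u in zip(sigma, U.T))
print(np.allclose(rank_one_sum, Sigma))                        # matches eq. (3)
```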

The principal components are not random. However, projecting the random vector $\tilde{\theta}$ onto the nonrandom principal directions $u_i$, with the projections denoted by $\tilde{\pi}[i] = u_i^T \tilde{\theta}$, $i = 1, \dots, n$, results in a collection of uncorrelated scalar random variables, each with variance $\sigma_i$:

$$E_\theta\{\tilde{\pi}[i]\,\tilde{\pi}[j]\} = \sigma_i\, \delta_{ij}.$$

If we further define the random vector

$$\tilde{\pi} = [\tilde{\pi}[1], \dots, \tilde{\pi}[n]]^T = U^T \tilde{\theta},$$

this corresponds to

$$E_\theta\{\tilde{\pi}\tilde{\pi}^T\} = \Lambda.$$

Note that it is readily shown that

$$\mathrm{mse}_\theta(\hat{\theta}) = E_\theta\{\|\tilde{\theta}\|^2\} = \mathrm{trace}\,\Sigma = \mathrm{trace}\,\Lambda = E_\theta\{\|\tilde{\pi}\|^2\} = \sigma_1 + \cdots + \sigma_n.$$

Assuming that the eigenvalues are ordered from largest to smallest, $\sigma_1 \ge \cdots \ge \sigma_n > 0$ (as is usually done), the corresponding principal directions $u_1, \dots, u_n$ describe directions in $\tilde{\theta}$ space which explain the observed statistical variation of $\tilde{\theta}$ in decreasing importance, as measured by the eigenvalues. The projection $\tilde{\pi}[i]$ of $\tilde{\theta}$ onto $u_i$ is known as the loading of $\tilde{\theta}$ on the principal direction $u_i$, and it explains $100\,\frac{\sigma_i}{\sigma_1 + \cdots + \sigma_n}\,\%$ of the observed statistical variation in $\tilde{\theta}$. Sometimes a principal direction can be physically (or otherwise) interpreted, thereby providing an explanation for the amount of variation seen along that particular direction.

In terms of $\tilde{\pi}$, we can recover $\tilde{\theta}$ as

$$\tilde{\theta} = U U^T \tilde{\theta} = U \tilde{\pi} = \tilde{\pi}[1] u_1 + \cdots + \tilde{\pi}[n] u_n.$$

A so-called rank-$r$ approximation to $\tilde{\theta}$ in terms of its first $r < n$ most important principal components, as measured in order of decreasing importance by $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge \cdots \ge \sigma_n$, is provided by

$$\tilde{\theta}^{(r)} = \tilde{\pi}[1] u_1 + \tilde{\pi}[2] u_2 + \cdots + \tilde{\pi}[r] u_r.$$

Note that

$$\mathrm{trace}\,\mathrm{Cov}_\theta\{\tilde{\theta}^{(r)}\} = \mathrm{trace}\, E_\theta\{\tilde{\theta}^{(r)} \tilde{\theta}^{(r)T}\} = \sigma_1 + \cdots + \sigma_r,$$

and

$$\mathrm{trace}\, E_\theta\{(\tilde{\theta} - \tilde{\theta}^{(r)})(\tilde{\theta} - \tilde{\theta}^{(r)})^T\} = \sigma_{r+1} + \cdots + \sigma_n.$$

The rank-$r$ approximation $\tilde{\theta}^{(r)}$ explains $100\,\frac{\sigma_1 + \cdots + \sigma_r}{\sigma_1 + \cdots + \sigma_r + \cdots + \sigma_n}\,\%$ of the observed variation in $\tilde{\theta}$. Finally, if the components of $\tilde{\theta}$ are interpreted as a correlated random sequence, then the components of $\tilde{\pi}$ can be interpreted as an uncorrelated ("whitened") random sequence which is entirely equivalent to $\tilde{\theta}$ (see also footnote 4).
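A small Monte Carlo sketch can illustrate the statements above about the uncorrelated loadings $\tilde{\pi} = U^T \tilde{\theta}$, the trace identity for the mean-squared error, and the fraction of variation explained by a rank-$r$ approximation. The Gaussian sampling and the particular covariance below are assumptions made purely for illustration:

```python
import numpy as np

# Minimal Monte Carlo sketch (assumed example): draw zero-mean errors with a known
# covariance, form the loadings pi = U^T theta, and check the PCA statements above.
rng = np.random.default_rng(1)
n, N = 4, 200_000
A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)
sigma, U = np.linalg.eigh(Sigma)
sigma, U = sigma[::-1], U[:, ::-1]            # order sigma_1 >= ... >= sigma_n

theta = rng.multivariate_normal(np.zeros(n), Sigma, size=N)   # rows are samples of the error
pi = theta @ U                                                # loadings pi[i] = u_i^T theta

print(np.round(np.cov(pi.T), 2))          # approx diag(sigma): uncorrelated loadings
print(np.mean(np.sum(theta**2, axis=1)),  # empirical mse ...
      np.trace(Sigma))                    # ... approx trace(Sigma) = sigma_1 + ... + sigma_n

r = 2
theta_r = pi[:, :r] @ U[:, :r].T                              # rank-r approximation
explained = sigma[:r].sum() / sigma.sum()
print(np.mean(np.sum((theta - theta_r)**2, axis=1)),          # approx sigma_{r+1}+...+sigma_n
      sigma[r:].sum(), f"explains about {100*explained:.1f}% of the variation")
```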

Homework Problems

The symmetric positive-definite matrices $\Sigma_1$ and $\Sigma_2$ denote the error covariances associated with the unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$, respectively. The respective estimation errors are denoted by $\tilde{\theta}_1$ and $\tilde{\theta}_2$. It is convenient to make the transformation $\lambda_i^2 = \sigma_i$, $i = 1, \dots, n$, so that $\lambda_i = \sqrt{\sigma_i}$ denotes the standard deviation of the statistical variation of $\tilde{\theta}$ along the direction $u_i$.

1. Show that a level surface $\tilde{\theta}^T \Sigma^{-1} \tilde{\theta} = k^2$ is the surface of an $n$-dimensional hyperellipsoid in $\tilde{\theta}$ space (parameter error space) and give the semi-major axes of the hyperellipsoid. Describe the region $\tilde{\theta}^T \Sigma^{-1} \tilde{\theta} \le k^2$. Hint: Transform $\tilde{\theta}$ into the uncorrelated-elements random vector $\tilde{\pi} = U^T \tilde{\theta}$ (see the discussion in footnote 4) and determine the form of the level surfaces in $\tilde{\pi}$ space.

2. Let $0 < \Sigma_1 \le \Sigma_2$. Let $a$ be an arbitrary unit vector in $\tilde{\theta}$ space. (i) Show that the variance of $a^T \tilde{\theta}_1$ is less than or equal to the variance of $a^T \tilde{\theta}_2$ for any direction $a$, and give an interpretation of this fact. (ii) For an arbitrary fixed direction $a$, consider events of the form $\{|a^T \tilde{\theta}_i| \le L_i\}$, $0 \le L_i < \infty$, $i = 1, 2$, each having a probability of occurrence of at least $p$,

$$P\left(|a^T \tilde{\theta}_i| \le L_i\right) \ge p > 0, \qquad i = 1, 2.$$

Use the Chebyshev inequality [5]

$$P(|e| \le L) \ge 1 - \frac{E\{e^2\}}{L^2}$$

to determine values $L_1$ and $L_2$ satisfying the condition $L_1 \le L_2$, and give a mathematical relationship between these two bounds. Interpret this result.

3. Prove that if $0 < \Sigma_1 \le \Sigma_2$, then $\Sigma_1^{-1} \ge \Sigma_2^{-1} > 0$. This result, which is needed to answer the next question below, can be surprisingly hard to prove. If you can't fully prove it, just show some of your attempts and move on to the next problem. [6]

4. Now consider events of the form $\{\tilde{\theta}_i^T \Sigma_i^{-1} \tilde{\theta}_i \le k^2\}$, for $0 \le k < \infty$, $i = 1, 2$. (i) Explain why such a region defines a hyperellipsoidal volume in $\tilde{\theta}$ space (parameter error space) which is associated with the estimation error for the estimator $\hat{\theta}_i$. (ii) For a given probability of occurrence of the events of at least $P$, use the Tchebycheff inequality to determine a value for $k$. (iii) For the same fixed value of $k$, show that the hyperellipsoidal volume associated with $\tilde{\theta}_1$ is completely contained in the hyperellipsoidal volume associated with $\tilde{\theta}_2$. (iv) Explain how the partial ordering $\Sigma_1 \le \Sigma_2$ can be interpreted in terms of nested concentration ellipsoid volumes [7] and, in a planar drawing, symbolically represent the ellipsoidal volumes of two matrices that cannot be ordered with respect to each other.

[5] Because of ambiguous transliteration from the Cyrillic (Russian) alphabet, there are a variety of equivalent spellings, e.g., Tchebycheff, Chebyshev, Tchebychev, etc. The Chebyshev inequality is an important statistical result which can be found in standard upper-division electrical engineering probability and statistics textbooks. The form of the Chebyshev inequality used here is one of several equivalent variant statements.
[6] Hint: Show that the result holds for diagonal covariance matrices. Then show that you can simultaneously diagonalize two arbitrary covariance matrices, and use your result for diagonal matrices.
[7] That is, where is the estimation error, with probability $P$, likely to be concentrated?

Comment on the homework problems. Not surprisingly, more precise statements can be made under a multivariate Gaussian assumption on the error $\tilde{\theta}$. See, for example, the discussion on pages 225-226 of Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, by L. L. Scharf (1991, Addison-Wesley).
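For intuition about problems 3 and 4 (without working them out), the sketch below picks one assumed pair of covariance matrices with $\Sigma_1 \le \Sigma_2$ and checks numerically that $\Sigma_2^{-1} \le \Sigma_1^{-1}$ and that the $k^2$-level concentration ellipsoid of $\Sigma_1$ is nested inside that of $\Sigma_2$:

```python
import numpy as np

# Minimal sketch (assumed example matrices, not a solution to the homework):
# pick Sigma1 <= Sigma2 in the positive-semidefinite (Loewner) ordering and check
# numerically that Sigma2^{-1} <= Sigma1^{-1} and that the k^2-level ellipsoid of
# Sigma1 is nested inside that of Sigma2.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
Sigma1 = A @ A.T + np.eye(3)
Sigma2 = Sigma1 + B @ B.T            # Sigma2 - Sigma1 is PSD, so Sigma1 <= Sigma2

inv1, inv2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
print(np.linalg.eigvalsh(inv1 - inv2).min() >= -1e-12)   # Sigma1^{-1} - Sigma2^{-1} >= 0

# Points on the boundary ellipsoid theta^T Sigma1^{-1} theta = k^2 must satisfy
# theta^T Sigma2^{-1} theta <= k^2 (containment of the concentration ellipsoids).
k2 = 4.0
z = rng.standard_normal((1000, 3))
z /= np.linalg.norm(z, axis=1, keepdims=True)            # random unit directions
L1 = np.linalg.cholesky(Sigma1)
boundary = np.sqrt(k2) * z @ L1.T                        # theta^T Sigma1^{-1} theta = k^2
quad2 = np.einsum('ij,jk,ik->i', boundary, inv2, boundary)
print(quad2.max() <= k2 + 1e-9)                          # all inside the Sigma2 ellipsoid
```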