ECE 830 Statistical Signal Processing
Instructor: R. Nowak

Lecture 3: Review of Linear Algebra

Very often in this course we will represent signals as vectors and operators (e.g., filters, transforms, etc.) as matrices. This lecture reviews basic concepts from linear algebra that will be useful.

1 Linear Vector Space

Definition 1. A linear vector space $\mathcal{X}$ is a collection of elements satisfying the following properties.

Addition: for all $x, y, z \in \mathcal{X}$,
a) $x + y \in \mathcal{X}$
b) $x + y = y + x$
c) $(x + y) + z = x + (y + z)$
d) there exists an element $0 \in \mathcal{X}$ such that $x + 0 = x$
e) for every $x \in \mathcal{X}$ there exists $-x \in \mathcal{X}$ such that $x + (-x) = 0$

Scalar multiplication: for all $x, y \in \mathcal{X}$ and $a, b \in \mathbb{R}$,
a) $a x \in \mathcal{X}$
b) $a(b x) = (ab) x$
c) $1 \cdot x = x$ and $0 \cdot x = 0$
d) $a(x + y) = a x + a y$

Example 1. Here are two examples of linear vector spaces: the familiar $d$-dimensional Euclidean space $\mathbb{R}^d$, and the space of finite-energy signals/functions supported on the interval $[0, T]$,
$$L_2([0, T]) := \Big\{ x : \int_0^T x^2(t)\, dt < \infty \Big\}.$$
It is easy to verify the properties above for both examples.

Definition 2. A subset $\mathcal{M} \subseteq \mathcal{X}$ is a subspace if $x, y \in \mathcal{M}$ implies $a x + b y \in \mathcal{M}$ for all scalars $a, b$.

Definition 3. An inner product is a mapping from $\mathcal{X} \times \mathcal{X}$ to $\mathbb{R}$. The inner product between any $x, y \in \mathcal{X}$ is denoted by $\langle x, y \rangle$, and it satisfies the following properties for all $x, y, z \in \mathcal{X}$:
a) $\langle x, y \rangle = \langle y, x \rangle$
b) $\langle a x, y \rangle = a \langle x, y \rangle$ for all scalars $a$
c) $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$
d) $\langle x, x \rangle \ge 0$

A space $\mathcal{X}$ equipped with an inner product is called an inner product space.
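The inner-product axioms above are easy to check numerically for the standard Euclidean inner product $\langle x, y \rangle = x^T y$. Here is a minimal sketch (not part of the original notes; the dimension and random seed are arbitrary choices) using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 5))  # three random vectors in R^5
a = 2.7                                # an arbitrary scalar

inner = lambda u, v: float(u @ v)      # Euclidean inner product <u, v> = u^T v

assert np.isclose(inner(x, y), inner(y, x))                      # a) symmetry
assert np.isclose(inner(a * x, y), a * inner(x, y))              # b) homogeneity
assert np.isclose(inner(x + y, z), inner(x, z) + inner(y, z))    # c) additivity
assert inner(x, x) >= 0                                          # d) non-negativity
```

The same checks can be applied to any candidate inner product, e.g., a discretized version of the $L_2([0, T])$ integral.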
Example 2. Let $\mathcal{X} = \mathbb{R}^n$. Then $\langle x, y \rangle := x^T y = \sum_{i=1}^n x_i y_i$.

Example 3. Let $\mathcal{X} = L_2([0, 1])$. Then $\langle x, y \rangle := \int_0^1 x(t)\, y(t)\, dt$.

The inner product measures the alignment of the two vectors. The inner product induces a norm, defined as $\|x\| := \sqrt{\langle x, x \rangle}$. The norm measures the length/size of $x$. The inner product satisfies $\langle x, y \rangle = \|x\| \|y\| \cos(\theta)$, where $\theta$ is the angle between $x$ and $y$. Thus, in general, for every $x, y \in \mathcal{X}$ we have $|\langle x, y \rangle| \le \|x\| \|y\|$, with equality if and only if $x$ and $y$ are linearly dependent, or "parallel" (i.e., $\theta = 0$). This is called the Cauchy-Schwarz inequality. Two vectors $x, y$ are orthogonal if $\langle x, y \rangle = 0$.

Example 4. Let $\mathcal{X} = \mathbb{R}^2$. Then $x = [1\ \ 0]^T$ and $y = [0\ \ 1]^T$ are orthogonal, as are $u = [1\ \ 1]^T$ and $v = [1\ \ {-1}]^T$.

Definition 4. An inner product space that contains all its limits is called a Hilbert space, and in this case we often denote the space by $\mathcal{H}$; i.e., if $x_1, x_2, \dots$ are in $\mathcal{H}$ and $\lim_{n \to \infty} x_n$ exists, then the limit is also in $\mathcal{H}$. It is easy to verify that $\mathbb{R}^n$, $L_2([0, T])$, and $\ell_2(\mathbb{Z})$, the set of all finite-energy sequences (e.g., discrete-time signals), are all Hilbert spaces.

2 Bases and Representations

Definition 5. A collection of vectors $\{x_1, \dots, x_k\}$ is said to be linearly independent if none of them can be represented as a linear combination of the others. That is, for every $x_i$ and every set of scalar weights $\{\theta_j\}$ we have $x_i \neq \sum_{j \neq i} \theta_j x_j$.

Definition 6. The set of all vectors that can be generated by taking linear combinations of $\{x_1, \dots, x_k\}$, i.e., all vectors of the form
$$v = \sum_{i=1}^k \theta_i x_i,$$
is called the span of $\{x_1, \dots, x_k\}$, denoted $\mathrm{span}(x_1, \dots, x_k)$.

Definition 7. A set of linearly independent vectors $\{\phi_i\}_i$ is a basis for $\mathcal{H}$ if every $x \in \mathcal{H}$ can be represented as a unique linear combination of the $\{\phi_i\}$. That is, every $x \in \mathcal{H}$ can be expressed as $x = \sum_i \theta_i \phi_i$ for a certain unique set of scalar weights $\{\theta_i\}$.

Example 5. Let $\mathcal{H} = \mathbb{R}^2$. Then $[1\ \ 0]^T$ and $[0\ \ 1]^T$ are a basis (since they are orthogonal). Also, $[1\ \ 0]^T$ and $[1\ \ 1]^T$ are a basis, because they are linearly independent (although not orthogonal).

Definition 8. An orthonormal basis is one satisfying
$$\langle \phi_i, \phi_j \rangle = \delta_{ij} := \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$
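The Cauchy-Schwarz inequality and the orthogonality condition above can be verified numerically. A small sketch (not from the notes; the vectors are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal((2, 4))  # two random vectors in R^4

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)

# Equality holds when x and y are linearly dependent (parallel)
y_par = 2.5 * x
assert np.isclose(abs(x @ y_par), np.linalg.norm(x) * np.linalg.norm(y_par))

# Orthogonality: <u, v> = 0, e.g., for u = [1, 1]^T and v = [1, -1]^T
assert np.array([1.0, 1.0]) @ np.array([1.0, -1.0]) == 0.0
```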
Every $x \in \mathcal{H}$ can be represented in terms of an orthonormal basis $\{\phi_i\}_i$ (or "orthobasis" for short) according to
$$x = \sum_i \langle x, \phi_i \rangle \phi_i.$$
This is easy to see as follows. Suppose $x$ has a representation $\sum_i \theta_i \phi_i$. Then
$$\langle x, \phi_j \rangle = \Big\langle \sum_i \theta_i \phi_i,\ \phi_j \Big\rangle = \sum_i \theta_i \langle \phi_i, \phi_j \rangle = \sum_i \theta_i \delta_{ij} = \theta_j.$$

Example 6. Here is an orthobasis for $L_2([0, 1])$: $\phi_1(t) := 1$ and, for $i = 1, 2, \dots$,
$$\phi_{2i}(t) := \sqrt{2} \cos(2\pi i t), \qquad \phi_{2i+1}(t) := \sqrt{2} \sin(2\pi i t).$$
Doesn't it look familiar? It is the Fourier basis.

Any basis can be converted into an orthonormal basis using Gram-Schmidt orthogonalization. Let $\{\phi_i\}$ be a basis for a vector space $\mathcal{X}$. An orthobasis $\{\psi_i\}$ for $\mathcal{X}$ can be constructed as follows.

Gram-Schmidt Orthogonalization
1. $\psi_1 := \phi_1 / \|\phi_1\|$
2. $\tilde{\psi}_2 := \phi_2 - \langle \psi_1, \phi_2 \rangle \psi_1$; $\quad \psi_2 = \tilde{\psi}_2 / \|\tilde{\psi}_2\|$
$\ \vdots$
k. $\tilde{\psi}_k := \phi_k - \sum_{i=1}^{k-1} \langle \psi_i, \phi_k \rangle \psi_i$; $\quad \psi_k = \tilde{\psi}_k / \|\tilde{\psi}_k\|$

3 Orthogonal Projections and Filters

One of the most important tools that we will use from linear algebra is the notion of an orthogonal projection. Most linear filters used in signal processing (e.g., bandpass filters, averaging filters, etc.) may be interpreted as, or related to, orthogonal projections.

Let $\mathcal{H}$ be a Hilbert space and let $\mathcal{M} \subseteq \mathcal{H}$ be a subspace. Every $x \in \mathcal{H}$ can be written as $x = y + z$, where $y \in \mathcal{M}$ and $z \perp \mathcal{M}$, which is shorthand for "$z$ orthogonal to $\mathcal{M}$"; that is, $\langle v, z \rangle = 0$ for all $v \in \mathcal{M}$. The vector $y$ is the optimal approximation to $x$ in terms of vectors in $\mathcal{M}$ in the following sense:
$$\|x - y\| = \min_{v \in \mathcal{M}} \|x - v\|.$$
The vector $y$ is called the projection of $x$ onto $\mathcal{M}$.

Here is an application of this concept. Let $\mathcal{M} \subseteq \mathcal{H}$ and let $\{\phi_i\}_{i=1}^r$ be an orthobasis for $\mathcal{M}$. We say that the subspace $\mathcal{M}$ is spanned by $\{\phi_i\}_{i=1}^r$. Note that this implies that $\mathcal{M}$ is an $r$-dimensional subspace of $\mathcal{H}$ (and it is isometric to $\mathbb{R}^r$). For any $x \in \mathcal{H}$, the projection of $x$ onto $\mathcal{M}$ is given by
$$y = \sum_{i=1}^r \langle \phi_i, x \rangle \phi_i,$$
and this projection can be viewed as a sort of filter that removes all components of the signal that are orthogonal to $\mathcal{M}$.
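The Gram-Schmidt steps above translate directly into code. Here is a short sketch (not from the notes) that orthonormalizes the columns of a matrix and checks the result on a non-orthogonal basis of $\mathbb{R}^2$:

```python
import numpy as np

def gram_schmidt(phis):
    """Orthonormalize the columns of `phis` (assumed linearly independent)."""
    psis = []
    for phi in phis.T:
        # Subtract the projections onto the previously built orthonormal vectors
        v = phi - sum((psi @ phi) * psi for psi in psis)
        psis.append(v / np.linalg.norm(v))  # normalize
    return np.column_stack(psis)

# Non-orthogonal basis of R^2: columns [1, 0]^T and [1, 1]^T
Phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])
Psi = gram_schmidt(Phi)

# The columns of Psi are orthonormal: Psi^T Psi = I
assert np.allclose(Psi.T @ Psi, np.eye(2))
```

This is the classical (as opposed to the numerically more stable "modified") Gram-Schmidt procedure, which suffices for small, well-conditioned examples like this one.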
Example 7. Let $\mathcal{H} = \mathbb{R}^2$. Consider the canonical coordinate system $\phi_1 = [1\ \ 0]^T$ and $\phi_2 = [0\ \ 1]^T$, and consider the subspace spanned by $\phi_1$. The projection of any $x = [x_1\ \ x_2]^T \in \mathbb{R}^2$ onto this subspace is
$$P x = \langle x, \phi_1 \rangle \phi_1 = x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}.$$
The projection operator $P$ is just a matrix, and it is given by
$$P := \phi_1 \phi_1^T = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
It is also easy to check that $\phi_1 = [1/\sqrt{2}\ \ 1/\sqrt{2}]^T$ and $\phi_2 = [1/\sqrt{2}\ \ {-1/\sqrt{2}}]^T$ is an orthobasis for $\mathbb{R}^2$. What is the projection operator onto the span of $\phi_1$ in this case?

More generally, suppose we are considering $\mathbb{R}^n$ and we have an orthobasis $\{\phi_i\}_{i=1}^r$ for some $r$-dimensional ($r < n$) subspace $\mathcal{M}$ of $\mathbb{R}^n$. Then the projection matrix is given by $P_{\mathcal{M}} = \sum_{i=1}^r \phi_i \phi_i^T$. Moreover, if $\{\phi_i\}_{i=1}^r$ is a basis for $\mathcal{M}$, but not necessarily orthonormal, then $P_{\mathcal{M}} = \Phi (\Phi^T \Phi)^{-1} \Phi^T$, where $\Phi = [\phi_1, \dots, \phi_r]$ is a matrix whose columns are the basis vectors.

Example 8. Let $\mathcal{H} = L_2([0, 1])$ and let $\mathcal{M} = \{\text{linear functions on } [0, 1]\}$. Since all linear functions have the form $a t + b$ for $t \in [0, 1]$, here is a basis for $\mathcal{M}$: $\phi_1(t) = 1$, $\phi_2(t) = t$. Note that this means that $\mathcal{M}$ is two-dimensional. That makes sense, since every line is defined by its slope and intercept (two real numbers). Using the Gram-Schmidt procedure we can construct the orthobasis
$$\psi_1(t) = 1, \qquad \psi_2(t) = \sqrt{12}\, (t - 1/2),$$
where the factor $\sqrt{12}$ normalizes $\psi_2$, since $\int_0^1 (t - 1/2)^2\, dt = 1/12$. Now consider any signal/function $x \in L_2([0, 1])$. The projection of $x$ onto $\mathcal{M}$ is
$$P_{\mathcal{M}} x = \langle x, \psi_1 \rangle \psi_1 + \langle x, \psi_2 \rangle \psi_2 = \int_0^1 x(\tau)\, d\tau + 12\, (t - 1/2) \int_0^1 (\tau - 1/2)\, x(\tau)\, d\tau.$$

4 Eigenanalysis

Suppose $A$ is an $m \times n$ matrix (take $m \le n$ for concreteness) with entries from a field (e.g., $\mathbb{R}$ or $\mathbb{C}$, the latter being the complex numbers). Then there exists a factorization of the form
$$A = U D V^*,$$
where $U = [u_1 \cdots u_m]$ is $m \times m$ with orthonormal columns, $V = [v_1 \cdots v_n]$ is $n \times n$ with orthonormal columns, the superscript $*$ means transposition and conjugation (if complex-valued), and the matrix $D$ is $m \times n$ with $\sigma_1, \sigma_2, \dots, \sigma_m$ on its main diagonal and zeros elsewhere:
$$D = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_m \end{bmatrix}.$$
The values $\sigma_1, \dots, \sigma_m$ are called the singular values of $A$, and the factorization is called the singular value decomposition (SVD). Because of the orthonormality of the columns of $U$ and $V$, we have $A v_i = \sigma_i u_i$
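The projection-matrix formula $P_{\mathcal{M}} = \Phi (\Phi^T \Phi)^{-1} \Phi^T$ is easy to exercise numerically. A small sketch (not from the notes; the subspace and test vector are arbitrary illustrative choices), projecting onto the span of $[1\ \ 1]^T$ in $\mathbb{R}^2$:

```python
import numpy as np

# Basis matrix whose single column spans the subspace M = span([1, 1]^T)
Phi = np.array([[1.0],
                [1.0]])
# Projection matrix P = Phi (Phi^T Phi)^{-1} Phi^T
P = Phi @ np.linalg.inv(Phi.T @ Phi) @ Phi.T

x = np.array([3.0, 1.0])
y = P @ x          # projection of x onto M: y == [2., 2.]
z = x - y          # residual component, orthogonal to M

assert np.allclose(y, [2.0, 2.0])
assert np.allclose(P @ P, P)           # projections are idempotent
assert np.allclose(P.T, P)             # orthogonal projections are symmetric
assert np.isclose(z @ Phi[:, 0], 0.0)  # residual is orthogonal to M
```

The idempotence check ($P^2 = P$) reflects the fact that projecting a vector already in $\mathcal{M}$ leaves it unchanged.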
and $A^* u_i = \sigma_i v_i$, for $i = 1, \dots, m$.

A vector $u$ is called an eigenvector of $A$ if
$$A u = \lambda u$$
for some scalar $\lambda$. The scalar $\lambda$ is called the eigenvalue associated with $u$. Symmetric matrices (which are always square) always have real eigenvalues and have an eigendecomposition of the form
$$A = U D U^*,$$
where the columns of $U$ are the orthonormal eigenvectors of $A$, $D$ is a diagonal matrix, written $D = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, and the diagonal entries are the eigenvalues. When the eigenvalues are non-negative, this coincides with the SVD. A symmetric positive-semidefinite matrix satisfies the property $v^T A v \ge 0$ for all $v$. This implies that the eigenvalues of symmetric positive-semidefinite matrices are non-negative.
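The SVD relations $A v_i = \sigma_i u_i$ and $A^* u_i = \sigma_i v_i$, and the non-negativity of the eigenvalues of a symmetric positive-semidefinite matrix, can be checked directly with NumPy. A sketch (not from the notes; the matrix is a random illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))  # real m x n matrix, m = 3 <= n = 5

U, s, Vt = np.linalg.svd(A)      # A = U @ D @ Vt; s holds sigma_1 >= sigma_2 >= ...
for i in range(len(s)):
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])    # A v_i = sigma_i u_i
    assert np.allclose(A.T @ U[:, i], s[i] * Vt[i])  # A^T u_i = sigma_i v_i

# A A^T is symmetric positive semidefinite, so eigh gives real,
# non-negative eigenvalues and orthonormal eigenvectors
S = A @ A.T
lam, Uevec = np.linalg.eigh(S)
assert np.all(lam >= -1e-10)                          # non-negative (up to roundoff)
assert np.allclose(Uevec @ np.diag(lam) @ Uevec.T, S) # S = U D U^T
```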
Example 9. Let $X$ be a random vector taking values in $\mathbb{R}^n$, and recall the definition of the covariance matrix:
$$\Sigma := E[(X - \mu)(X - \mu)^T].$$
It is easy to see that $v^T \Sigma v \ge 0$ for all $v$, and of course $\Sigma$ is symmetric. Therefore, every covariance matrix has an eigendecomposition of the form $\Sigma = U D U^T$, where $D = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ and $\lambda_i \ge 0$ for $i = 1, \dots, n$.

The Karhunen-Loève transform (KLT), which is also called principal component analysis (PCA), is based on transforming a random vector $X$ into the coordinate system associated with the eigendecomposition of the covariance of $X$. Let $X$ be an $n$-dimensional random vector with covariance matrix $\Sigma = U D U^T$, and let $u_1, \dots, u_n$ be the eigenvectors. Assume that the eigenvectors and eigenvalues are ordered such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The KLT or PCA representation of the random vector $X$ is given by
$$X = \sum_{i=1}^n (u_i^T X)\, u_i.$$
The coefficients in this representation can be arranged in a vector as $\theta = U^T X$, where $U$ is as defined above. The vector $\theta$ is called the KLT or PCA of $X$. Using this representation we can define the approximation to $X$ in the span of the first $r < n$ eigenvectors:
$$X_r = \sum_{i=1}^r (u_i^T X)\, u_i.$$
Note that this approximation involves only the $r$ scalar random variables $\{u_i^T X\}_{i=1}^r$, rather than $n$. In fact, it is easy to show that among all $r$-term linear approximations of $X$ in terms of $r$ random variables, $X_r$ has the smallest mean square error; that is, if we let $S_r$ denote the set of all $r$-term linear approximations to $X$, then
$$E[\|X - X_r\|^2] = \min_{Y_r \in S_r} E[\|X - Y_r\|^2].$$
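The KLT/PCA construction above can be sketched numerically. The following example (not from the notes; the synthetic data, dimensions, and seed are arbitrary illustrative choices) forms the sample covariance of centered data, computes the $r$-term approximation $X_r$, and checks that its mean squared error equals the energy in the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 5, 2000, 2
# Synthetic data whose covariance is dominated by r directions plus small noise
A = 3.0 * rng.standard_normal((n, r))
X = A @ rng.standard_normal((r, m)) + 0.1 * rng.standard_normal((n, m))
X = X - X.mean(axis=1, keepdims=True)   # center the samples

Sigma = np.cov(X)                       # n x n sample covariance
lam, U = np.linalg.eigh(Sigma)          # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]           # reorder so lambda_1 >= ... >= lambda_n
lam, U = lam[order], U[:, order]

theta = U.T @ X                         # KLT/PCA coefficients for every sample
Xr = U[:, :r] @ theta[:r]               # r-term approximation X_r

# Average residual energy equals the sum of the discarded eigenvalues
# (up to the 1/m vs. 1/(m-1) normalization of np.cov)
mse = np.mean(np.sum((X - Xr) ** 2, axis=0))
assert np.isclose(mse, lam[r:].sum(), rtol=0.01)
```

The assertion is the finite-sample analogue of the optimality statement above: discarding the trailing eigen-directions costs exactly their eigenvalue mass, and no other $r$-term linear approximation does better.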