
Notation

S. Puntanen et al., Matrix Tricks for Linear Statistical Models: Our Personal Top Twenty, DOI 10.1007/978-3-642-10473-2, Springer-Verlag Berlin Heidelberg 2011.

$\mathbb{R}$ : real numbers
$\mathbb{R}^{n \times m}$ : set of $n \times m$ real matrices
$\mathbb{R}^{n \times m}_r$ : subset of $\mathbb{R}^{n \times m}$ consisting of matrices with rank $r$
$\mathbb{R}^{n}_{s}$ : subset of $\mathbb{R}^{n \times n}$ consisting of symmetric matrices
$\mathrm{NND}_n$ : subset of $\mathbb{R}^{n}_{s}$ consisting of nonnegative definite (nnd) matrices: $A \in \mathrm{NND}_n \Leftrightarrow A = LL'$ for some $L$; instead of nnd, the term positive semidefinite is often used
$\mathrm{PD}_n$ : subset of $\mathrm{NND}_n$ consisting of positive definite (pd) matrices: $A = LL'$ for some nonsingular $L$
$0$ : null vector, null matrix; denoted also as $0_n$ or $0_{n \times m}$
$1_n$ : column vector of ones, shortened $1$
$I_n$ : identity matrix, shortened $I$
$i_j$ : the $j$th column of $I$; the $j$th standard basis vector
$A = \{a_{ij}\}$ : matrix $A$ with its elements $a_{ij}$
$A_{n \times m}$ : $n \times m$ matrix $A$
$a$ : column vector $a \in \mathbb{R}^n$
$A'$ : transpose of the matrix $A$
$(A : B)$ : partitioned (augmented) matrix
$A = (a_1 : \dots : a_m)$ : $A_{n \times m}$ represented columnwise
$A = \begin{pmatrix} a_{(1)}' \\ \vdots \\ a_{(n)}' \end{pmatrix}$ : $A_{n \times m}$ represented row-wise
$A^{-1}$ : inverse of the matrix $A$
$A^{-}$ : generalized inverse of the matrix $A$: $A A^{-} A = A$; also called $\{1\}$-inverse, or inner inverse
$\{A^{-}\}$ : the set of generalized inverses of $A$
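A minimal NumPy sketch (not from the book) of the defining property $AA^{-}A = A$, using the Moore–Penrose inverse as one member of $\{A^{-}\}$; the example matrix and variable names are arbitrary.

```python
# Check the {1}-inverse property A A^- A = A for a rank-deficient matrix,
# using the Moore-Penrose inverse (one particular generalized inverse).
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])          # rank 1, so no ordinary inverse exists
A_ginv = np.linalg.pinv(A)            # A^+ is in particular a {1}-inverse
assert np.allclose(A @ A_ginv @ A, A)  # AA^-A = A holds
```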

$A^{-}_{12}$ : reflexive generalized inverse of $A$: $A A^{-} A = A$, $A^{-} A A^{-} = A^{-}$; also called $\{12\}$-inverse
$A^{+}$ : the Moore–Penrose inverse of $A$: the unique matrix satisfying the four Moore–Penrose conditions: (mp1) $A A^{+} A = A$, (mp2) $A^{+} A A^{+} = A^{+}$, (mp3) $(A A^{+})' = A A^{+}$, (mp4) $(A^{+} A)' = A^{+} A$
$A^{-}_{ij}$ : generalized inverse of $A$ satisfying the Moore–Penrose conditions (mp$i$) and (mp$j$)
$\mathrm{In}(A) = (\pi, \nu, \delta)$ : inertia of the square matrix $A$: $\pi$, $\nu$, and $\delta$ are the numbers of positive, negative, and zero eigenvalues of $A$, respectively, all counting multiplicities
$A^{1/2}$ : symmetric nnd square root of $A \in \mathrm{NND}_n$: $A^{1/2} = T \Lambda^{1/2} T'$, where $A = T \Lambda T'$ is the eigenvalue decomposition of $A$
$A^{+1/2}$ : $(A^{+})^{1/2}$
$\langle a, b \rangle$ : standard inner product in $\mathbb{R}^n$: $\langle a, b \rangle = a'b$; can denote also a general inner product in a vector space
$\langle a, b \rangle_V$ : inner product $a'Vb$; $V$ is the inner product matrix (ipm)
$a \perp b$ : vectors $a$ and $b$ are orthogonal with respect to a given inner product
$\|a\|$ : Euclidean norm (standard norm, 2-norm) of vector $a$, also denoted $\|a\|_2$: $\|a\|^2 = a'a$; can denote also a general vector norm in a vector space
$\|a\|_V$ : $\|a\|_V^2 = a'Va$, norm when the ipm is $V$ (ellipsoidal norm)
$\langle A, B \rangle$ : standard matrix inner product between $A, B \in \mathbb{R}^{n \times m}$: $\langle A, B \rangle = \mathrm{tr}(A'B) = \sum_{i,j} a_{ij} b_{ij}$
$\|A\|_F$ : Euclidean (Frobenius) norm of the matrix $A$: $\|A\|_F^2 = \mathrm{tr}(A'A) = \sum_{i,j} a_{ij}^2$
$\|A\|_2$ : matrix 2-norm of the matrix $A$ (spectral norm): $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \mathrm{sg}_1(A) = +\sqrt{\mathrm{ch}_1(A'A)}$
$\|A^{-1}\|_2$ : matrix 2-norm of nonsingular $A_{n \times n}$: $\|A^{-1}\|_2 = 1/\mathrm{sg}_n(A)$
$\mathrm{cond}(A)$ : condition number of nonsingular $A_{n \times n}$: $\mathrm{cond}(A) = \|A\|_2 \|A^{-1}\|_2 = \mathrm{sg}_1(A)/\mathrm{sg}_n(A)$
$\cos(a, b)$ : $\cos \angle(a, b)$, the cosine of the angle $\theta$ between the nonzero vectors $a$ and $b$: $\cos(a, b) = \cos\theta = \frac{\langle a, b \rangle}{\|a\| \, \|b\|}$
$\angle(a, b)$ : the angle $\theta$, $0 \le \theta \le \pi$, between the nonzero vectors $a$ and $b$: $\theta = \angle(a, b) = \cos^{-1}[\cos(a, b)]$
$A[\alpha, \beta]$ : submatrix of $A_{n \times n}$, obtained by choosing the elements of $A$ which lie in rows $\alpha$ and columns $\beta$; $\alpha$ and $\beta$ are index sets of the rows and the columns of $A$, respectively
$A[\alpha]$ : $A[\alpha, \alpha]$, principal submatrix; same rows and columns chosen
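A minimal NumPy sketch (assumed example, not from the book) verifying the four Moore–Penrose conditions (mp1)–(mp4) and the relation $\mathrm{cond}(A) = \mathrm{sg}_1(A)/\mathrm{sg}_n(A)$ on random matrices.

```python
# Verify the Moore-Penrose conditions and the condition-number identity numerically.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Ap = np.linalg.pinv(A)
assert np.allclose(A @ Ap @ A, A)        # (mp1)
assert np.allclose(Ap @ A @ Ap, Ap)      # (mp2)
assert np.allclose((A @ Ap).T, A @ Ap)   # (mp3)
assert np.allclose((Ap @ A).T, Ap @ A)   # (mp4)

B = rng.standard_normal((5, 5))
sg = np.linalg.svd(B, compute_uv=False)  # singular values, sg_1 >= ... >= sg_n
assert np.isclose(np.linalg.cond(B), sg[0] / sg[-1])
```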

$A(\alpha, \beta)$ : submatrix of $A$, obtained by choosing the elements of $A$ which do not lie in rows $\alpha$ and columns $\beta$
$A(i, j)$ : submatrix of $A$, obtained by deleting row $i$ and column $j$ from $A$
$\mathrm{minor}(a_{ij})$ : $ij$th minor of $A$ corresponding to $a_{ij}$: $\mathrm{minor}(a_{ij}) = \det(A(i, j))$, $i, j \in \{1, \dots, n\}$
$A^{L}_{i}$ : $i$th leading principal submatrix of $A_{n \times n}$: $A^{L}_{i} = A[\alpha, \alpha]$, where $\alpha = \{1, \dots, i\}$
$\mathrm{cof}(a_{ij})$ : $ij$th cofactor of $A$: $\mathrm{cof}(a_{ij}) = (-1)^{i+j} \mathrm{minor}(a_{ij})$
$\det(A)$ : determinant of the matrix $A_{n \times n}$: $\det(a) = a$ for $a \in \mathbb{R}$; $\det(A) = \sum_{j=1}^{n} a_{ij} \mathrm{cof}(a_{ij})$ for any $i \in \{1, \dots, n\}$: the Laplace expansion by minors along the $i$th row
$\det(A[\alpha])$ : principal minor
$\det(A^{L}_{i})$ : leading principal minor of order $i$
$|A|$ : determinant of the matrix $A_{n \times n}$
$\mathrm{diag}(A)$ : diagonal matrix formed by the diagonal entries of $A_{n \times n}$
$\mathrm{diag}(d_1, \dots, d_n)$ : $n \times n$ diagonal matrix with listed diagonal entries
$\mathrm{diag}(d)$ : $n \times n$ diagonal matrix whose $i$th diagonal element is $d_i$
$A_{\delta}$ : diagonal matrix formed by the diagonal entries of $A_{n \times n}$
$\mathrm{rk}(A)$ : rank of the matrix $A$
$\mathrm{rank}(A)$ : rank of the matrix $A$
$\mathrm{tr}(A)$ : trace of the matrix $A_{n \times n}$: $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$
$\mathrm{trace}(A)$ : trace of the matrix $A_{n \times n}$
$\mathrm{vec}(A)$ : vectoring operation: the vector formed by placing the columns of $A$ under one another successively
$A \otimes B$ : Kronecker product of $A_{n \times m}$ and $B_{p \times q}$: $A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1m}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nm}B \end{pmatrix} \in \mathbb{R}^{np \times mq}$
$A/A_{11}$ : Schur complement of $A_{11}$ in $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$: $A/A_{11} = A_{22} - A_{21} A_{11}^{-} A_{12}$
$A_{22 \cdot 1}$ : $A_{22} - A_{21} A_{11}^{-} A_{12}$
$A \geq_L 0$ : $A$ is nonnegative definite: $A = LL'$ for some $L$; $A \in \mathrm{NND}_n$
$A >_L 0$ : $A$ is positive definite: $A = LL'$ for some invertible $L$; $A \in \mathrm{PD}_n$
$A \leq_L B$ : $B - A$ is nonnegative definite; $B - A \in \mathrm{NND}_n$; $A$ lies below $B$ with respect to the Löwner ordering
$A <_L B$ : $B - A$ is positive definite; $B - A \in \mathrm{PD}_n$
$A \leq_{rs} B$ : $A$ and $B$ are rank-subtractive: $\mathrm{rk}(B - A) = \mathrm{rk}(B) - \mathrm{rk}(A)$; $A$ lies below $B$ with respect to the minus ordering
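A minimal NumPy sketch (assumed example, not from the book) of the Kronecker product and the Schur complement; the Schur complement is checked against the standard block-determinant identity $\det(A) = \det(A_{11})\det(A/A_{11})$, valid when $A_{11}$ is nonsingular.

```python
# Schur complement A/A11 = A22 - A21 A11^{-1} A12 and a Kronecker-product shape check.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]
schur = A22 - A21 @ np.linalg.inv(A11) @ A12
assert np.isclose(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(schur))

B = rng.standard_normal((3, 2))
K = np.kron(A11, B)                  # (2*3) x (2*2) Kronecker product
assert K.shape == (A11.shape[0] * B.shape[0], A11.shape[1] * B.shape[1])
```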

$\mathrm{Sh}(V \mid X)$ : the shorted matrix of $V \in \mathrm{NND}_n$ with respect to $X_{n \times p}$; $\mathrm{Sh}(V \mid X)$ is the maximal element $U$ (in the Löwner ordering) in the set $\mathcal{U} = \{\, U : 0 \leq_L U \leq_L V, \ \mathscr{C}(U) \subset \mathscr{C}(X) \,\}$
$P_A$ : orthogonal projector onto $\mathscr{C}(A)$ (wrt $I$): $P_A = A(A'A)^{-}A' = AA^{+}$
$P_{A;V}$ : orthogonal projector onto $\mathscr{C}(A)$ wrt $V \in \mathrm{PD}_n$: $P_{A;V} = A(A'VA)^{-}A'V$
$P_{A;V}$ : generalized orthogonal projector onto $\mathscr{C}(A)$ wrt $V \in \mathrm{NND}_n$: $P_{A;V} = A(A'VA)^{-}A'V + A[I - (A'VA)^{-}A'VA]U$, where $U$ is arbitrary
$P_{A \mid B}$ : projector onto $\mathscr{C}(A)$ along $\mathscr{C}(B)$: $P_{A \mid B}(A : B) = (A : 0)$
$\{P_{A \mid B}\}$ : set of matrices satisfying $P_{A \mid B}(A : B) = (A : 0)$
$P_{\mathcal{U}}$ : orthogonal projector onto the vector space $\mathcal{U}$ (wrt a given inner product)
$\mathscr{C}(A)$ : column space of the matrix $A_{n \times p}$: $\mathscr{C}(A) = \{\, y \in \mathbb{R}^n : y = Ax \text{ for some } x \in \mathbb{R}^p \,\}$
$\mathscr{N}(A)$ : null space of the matrix $A_{n \times p}$: $\mathscr{N}(A) = \{\, x \in \mathbb{R}^p : Ax = 0 \,\}$
$\mathscr{C}(A)^{\perp}$ : orthocomplement of $\mathscr{C}(A)$ wrt $I$: $\mathscr{C}(A)^{\perp} = \{\, z \in \mathbb{R}^n : z'Ax = 0 \ \forall x \in \mathbb{R}^p \,\} = \mathscr{N}(A')$
$A^{\perp}$ : matrix whose column space is $\mathscr{C}(A^{\perp}) = \mathscr{C}(A)^{\perp}$
$\mathscr{C}(A)^{\perp V}$ : orthocomplement of $\mathscr{C}(A)$ wrt $V$: $\mathscr{C}(A)^{\perp V} = \{\, z \in \mathbb{R}^n : z'VAx = 0 \ \forall x \in \mathbb{R}^p \,\} = \mathscr{N}(A'V)$
$A^{\perp V}$ : matrix whose column space is $\mathscr{C}(A)^{\perp V}$
$p_A(x)$ : the characteristic polynomial of $A$: $p_A(x) = \det(A - xI)$
$\mathcal{U} \subset \mathcal{V}$ : $\mathcal{U}$ is a subset of $\mathcal{V}$; possibly $\mathcal{U} = \mathcal{V}$
$\mathcal{U} + \mathcal{V}$ : sum of the vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathcal{U} \oplus \mathcal{V}$ : direct sum of the vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathcal{U} \boxplus \mathcal{V}$ : direct sum of the orthogonal vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathcal{U} \cap \mathcal{V}$ : intersection of the vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathrm{ch}_i(A) = \lambda_i$ : the $i$th largest eigenvalue of $A_{n \times n}$ (all eigenvalues being real)
$\mathrm{ch}(A)$ : set of all $n$ eigenvalues of $A_{n \times n}$, including multiplicities, called also the spectrum of $A$: $\mathrm{ch}(A) = \{\mathrm{ch}_1(A), \dots, \mathrm{ch}_n(A)\}$
$\mathrm{ch}(A, B)$ : set of proper eigenvalues of symmetric $A_{n \times n}$ with respect to $B \in \mathrm{NND}_n$; $\lambda \in \mathrm{ch}(A, B)$ if $Aw = \lambda Bw$, $Bw \neq 0$
$\mathrm{nzch}(A)$ : set of the nonzero eigenvalues of $A_{n \times n}$: $\mathrm{nzch}(A) = \{\mathrm{ch}_1(A), \dots, \mathrm{ch}_r(A)\}$, $r = \mathrm{rank}(A)$
$\mathrm{chv}_i(A)$ : eigenvector of $A_{n \times n}$ with respect to $\lambda_i = \mathrm{ch}_i(A)$: a nonzero vector $t_i$ satisfying the equation $At_i = \lambda_i t_i$
$\mathrm{sg}_i(A) = \delta_i$ : the $i$th largest singular value of $A_{n \times m}$: $\mathrm{sg}_i(A) = +\sqrt{\mathrm{ch}_i(A'A)} = +\sqrt{\mathrm{ch}_i(AA')}$
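A minimal NumPy sketch (assumed example, not from the book) showing that $P_A = A(A'A)^{-}A'$ is a symmetric idempotent that fixes $\mathscr{C}(A)$, and that the singular values of $A$ are the square roots of the eigenvalues of $A'A$.

```python
# Orthogonal projector onto C(A) and the singular-value / eigenvalue relation.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
P = A @ np.linalg.pinv(A.T @ A) @ A.T
assert np.allclose(P, P.T) and np.allclose(P @ P, P)  # symmetric, idempotent
assert np.allclose(P @ A, A)                          # P_A leaves C(A) fixed

sg = np.linalg.svd(A, compute_uv=False)
ch = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]       # eigenvalues, largest first
assert np.allclose(sg, np.sqrt(ch))                   # sg_i(A) = sqrt(ch_i(A'A))
```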

$\mathrm{sg}(A)$ : set of the singular values of $A_{n \times m}$ ($m \le n$): $\mathrm{sg}(A) = \{\mathrm{sg}_1(A), \dots, \mathrm{sg}_m(A)\}$
$\mathrm{nzsg}(A)$ : set of the nonzero singular values of $A_{n \times m}$: $\mathrm{nzsg}(A) = \{\mathrm{sg}_1(A), \dots, \mathrm{sg}_r(A)\}$, $r = \mathrm{rank}(A)$
$\rho(A)$ : the spectral radius of $A_{n \times n}$: the maximum of the absolute values of the eigenvalues of $A_{n \times n}$
$\mathrm{var}_s(y)$ : sample variance of the variable $y$
$\mathrm{var}_d(y) = s_y^2$ : sample variance: argument is the variable vector $y \in \mathbb{R}^n$: $\mathrm{var}_d(y) = \frac{1}{n-1} y'Cy = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
$\mathrm{cov}_s(x, y)$ : sample covariance between the variables $x$ and $y$
$\mathrm{cov}_d(x, y) = s_{xy}$ : sample covariance: arguments are variable vectors in $\mathbb{R}^n$: $\mathrm{cov}_d(x, y) = \frac{1}{n-1} x'Cy = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$\mathrm{cor}_d(x, y) = r_{xy}$ : sample correlation: $r_{xy} = \frac{x'Cy}{\sqrt{x'Cx}\,\sqrt{y'Cy}} = \cos(Cx, Cy)$
$\bar{\mathbf{x}}$ : projection of $x$ onto $\mathscr{C}(1_n)$: $\bar{\mathbf{x}} = Jx = \bar{x} 1_n$
$\tilde{x}$ : centered $x$: $\tilde{x} = Cx = x - Jx = x - \bar{x} 1_n$
$U$ : $n \times d$ data matrix of the $u$-variables: $U = (u_1 : \dots : u_d) = \begin{pmatrix} u_{(1)}' \\ \vdots \\ u_{(n)}' \end{pmatrix}$
$u_1, \dots, u_d$ : variable vectors in the variable space $\mathbb{R}^n$
$u_{(1)}, \dots, u_{(n)}$ : observation vectors in the observation space $\mathbb{R}^d$
$\bar{u}$ : vector of means of the variables $u_1, \dots, u_d$: $\bar{u} = (\bar{u}_1, \dots, \bar{u}_d)'$
$\tilde{U}$ : centered $U$: $\tilde{U} = CU$, where $C$ is the centering matrix
$\tilde{u}_1, \dots, \tilde{u}_d$ : centered variable vectors
$\tilde{u}_{(1)}, \dots, \tilde{u}_{(n)}$ : centered observation vectors
$\mathrm{var}_d(u_i) = s_i^2$ : sample variance: argument is the variable vector $u_i \in \mathbb{R}^n$: $\mathrm{var}_d(u_i) = \frac{1}{n-1} u_i'Cu_i = \frac{1}{n-1} \sum_{l=1}^{n} (u_{li} - \bar{u}_i)^2$
$\mathrm{cov}_d(u_i, u_j) = s_{ij}$ : sample covariance: arguments are variable vectors in $\mathbb{R}^n$: $s_{ij} = \frac{1}{n-1} u_i'Cu_j = \frac{1}{n-1} \sum_{l=1}^{n} (u_{li} - \bar{u}_i)(u_{lj} - \bar{u}_j)$
$\mathrm{ssp}(U) = \{t_{ij}\}$ : matrix $T$ ($d \times d$) of the sums of squares and products of deviations about the mean: $T = U'CU = \sum_{i=1}^{n} (u_{(i)} - \bar{u})(u_{(i)} - \bar{u})'$
$\mathrm{cov}_d(U) = \{s_{ij}\}$ : sample covariance matrix $S$ ($d \times d$) of the data matrix $U$: $S = \frac{1}{n-1} T = \frac{1}{n-1} \sum_{i=1}^{n} (u_{(i)} - \bar{u})(u_{(i)} - \bar{u})'$
$\mathrm{cor}_d(u_i, u_j) = r_{ij}$ : sample correlation: arguments are variable vectors in $\mathbb{R}^n$
$\mathrm{cor}_d(U) = \{r_{ij}\}$ : sample correlation matrix $R$ ($d \times d$) of the data matrix $U$: $R = \mathrm{cor}_d(U) = (\mathrm{diag}\, S)^{-1/2} S (\mathrm{diag}\, S)^{-1/2}$
$\mathrm{MHLN}^2(u_{(i)}, \bar{u}, S)$ : sample Mahalanobis distance (squared) of the $i$th observation from the mean: $\mathrm{MHLN}^2(u_{(i)}, \bar{u}, S) = (u_{(i)} - \bar{u})' S^{-1} (u_{(i)} - \bar{u})$
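A minimal NumPy sketch (assumed example, not from the book) checking that the centering-matrix formulas $\mathrm{var}_d(y) = y'Cy/(n-1)$, $S = U'CU/(n-1)$ and $R = (\mathrm{diag}\,S)^{-1/2}S(\mathrm{diag}\,S)^{-1/2}$ agree with NumPy's standard estimates.

```python
# Sample variance, covariance and correlation via the centering matrix C = I_n - J.
import numpy as np

rng = np.random.default_rng(3)
n, d = 20, 3
U = rng.standard_normal((n, d))
y = U[:, 0]
C = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
assert np.allclose(y @ C @ y / (n - 1), np.var(y, ddof=1))
S = U.T @ C @ U / (n - 1)
assert np.allclose(S, np.cov(U, rowvar=False))
Dinv = np.diag(1 / np.sqrt(np.diag(S)))
assert np.allclose(Dinv @ S @ Dinv, np.corrcoef(U, rowvar=False))
```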

$\mathrm{MHLN}^2(\bar{u}_i, \bar{u}_j, S_{*})$ : sample Mahalanobis distance (squared) between two mean vectors: $\mathrm{MHLN}^2(\bar{u}_i, \bar{u}_j, S_{*}) = (\bar{u}_i - \bar{u}_j)' S_{*}^{-1} (\bar{u}_i - \bar{u}_j)$, where $S_{*} = \frac{1}{n_1 + n_2 - 2}\left(U_1'C_{n_1}U_1 + U_2'C_{n_2}U_2\right)$
$\mathrm{MHLN}^2(u, \mu, \Sigma)$ : population Mahalanobis distance squared: $\mathrm{MHLN}^2(u, \mu, \Sigma) = (u - \mu)'\Sigma^{-1}(u - \mu)$
$\mathrm{E}(\cdot)$ : expectation of a random argument: $\mathrm{E}(x) = p_1 x_1 + \dots + p_k x_k$ if $x$ is a discrete random variable whose values are $x_1, \dots, x_k$ with corresponding probabilities $p_1, \dots, p_k$
$\mathrm{var}(x) = \sigma_x^2$ : variance of the random variable $x$: $\sigma_x^2 = \mathrm{E}(x - \mu_x)^2$, $\mu_x = \mathrm{E}(x)$
$\mathrm{cov}(x, y) = \sigma_{xy}$ : covariance between the random variables $x$ and $y$: $\sigma_{xy} = \mathrm{E}(x - \mu_x)(y - \mu_y)$, $\mu_x = \mathrm{E}(x)$, $\mu_y = \mathrm{E}(y)$
$\mathrm{cor}(x, y) = \varrho_{xy}$ : correlation between the random variables $x$ and $y$: $\varrho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
$\mathrm{cov}(x)$ : covariance matrix ($d \times d$) of a $d$-dimensional random vector $x$: $\mathrm{cov}(x) = \Sigma = \mathrm{E}(x - \mu_x)(x - \mu_x)'$
$\mathrm{cor}(x)$ : correlation matrix ($d \times d$) of the random vector $x$: $\mathrm{cor}(x) = \rho = (\mathrm{diag}\,\Sigma)^{-1/2}\Sigma(\mathrm{diag}\,\Sigma)^{-1/2}$
$\mathrm{cov}(x, y)$ : (cross-)covariance matrix between the random vectors $x$ and $y$: $\mathrm{cov}(x, y) = \mathrm{E}(x - \mu_x)(y - \mu_y)' = \Sigma_{xy}$
$\mathrm{cov}(x, x)$ : $\mathrm{cov}(x, x) = \mathrm{cov}(x)$
$\mathrm{cor}(x, y)$ : (cross-)correlation matrix between the random vectors $x$ and $y$
$\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix}$ : partitioned covariance matrix of the random vector $\begin{pmatrix} x \\ y \end{pmatrix}$: $\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \Sigma_{xx} & \sigma_{xy} \\ \sigma_{xy}' & \sigma_y^2 \end{pmatrix} = \begin{pmatrix} \mathrm{cov}(x, x) & \mathrm{cov}(x, y) \\ \mathrm{cov}(y, x) & \mathrm{var}(y) \end{pmatrix}$
$x \sim (\mu, \Sigma)$ : $\mathrm{E}(x) = \mu$, $\mathrm{cov}(x) = \Sigma$
$x \sim N_p(\mu, \Sigma)$ : $x$ follows the $p$-dimensional normal distribution $N_p(\mu, \Sigma)$
$n(x; \mu, \Sigma)$ : density for $x \sim N_p(\mu, \Sigma)$, $\Sigma$ pd: $n(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)}$
$\mathrm{cc}_i(x, y)$ : $i$th largest canonical correlation between the random vectors $x$ and $y$
$\mathrm{cc}(x, y)$ : set of the canonical correlations between the random vectors $x$ and $y$
$\mathrm{cc}_{+}(x, y)$ : set of the nonzero (necessarily positive) canonical correlations between the random vectors $x$ and $y$; square roots of the nonzero eigenvalues of $P_A P_B$: $\mathrm{cc}_{+}(x, y) = \mathrm{nzch}^{1/2}(P_A P_B) = \mathrm{nzsg}[(A'A)^{+1/2} A'B (B'B)^{+1/2}]$, where $\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} A'A & A'B \\ B'A & B'B \end{pmatrix}$
$X = (1 : X_0)$ : in the regression context often the model matrix
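A minimal NumPy sketch (assumed example, not from the book) of the population Mahalanobis distance $\mathrm{MHLN}^2(u, \mu, \Sigma) = (u-\mu)'\Sigma^{-1}(u-\mu)$, illustrating its well-known invariance under invertible affine transformations $u \mapsto Bu + c$; the function name and data are hypothetical.

```python
# Population Mahalanobis distance and its affine invariance.
import numpy as np

def mhln2(u, mu, Sigma):
    diff = u - mu
    return diff @ np.linalg.solve(Sigma, diff)   # (u - mu)' Sigma^{-1} (u - mu)

rng = np.random.default_rng(4)
mu = rng.standard_normal(3)
L = rng.standard_normal((3, 3))
Sigma = L @ L.T + 3 * np.eye(3)                  # a positive definite covariance
u = rng.standard_normal(3)
B, c = rng.standard_normal((3, 3)), rng.standard_normal(3)
assert np.isclose(mhln2(u, mu, Sigma),
                  mhln2(B @ u + c, B @ mu + c, B @ Sigma @ B.T))
```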

$X_0$ : $n \times k$ data matrix of the $x$-variables: $X_0 = (x_1 : \dots : x_k) = \begin{pmatrix} x_{(1)}' \\ \vdots \\ x_{(n)}' \end{pmatrix}$
$x_1, \dots, x_k$ : variable vectors in the variable space $\mathbb{R}^n$
$x_{(1)}, \dots, x_{(n)}$ : observation vectors in the observation space $\mathbb{R}^k$
$\mathrm{ssp}(X_0 : y)$ : partitioned matrix of the sums of squares and products of deviations about the mean of the data $(X_0 : y)$: $\mathrm{ssp}(X_0 : y) = \begin{pmatrix} T_{xx} & t_{xy} \\ t_{xy}' & t_{yy} \end{pmatrix} = (X_0 : y)'C(X_0 : y)$
$\mathrm{cov}_d(X_0 : y)$ : partitioned sample covariance matrix of the data $(X_0 : y)$: $\mathrm{cov}_d(X_0 : y) = \begin{pmatrix} S_{xx} & s_{xy} \\ s_{xy}' & s_y^2 \end{pmatrix}$
$\mathrm{cor}_d(X_0 : y)$ : partitioned sample correlation matrix of the data $(X_0 : y)$: $\mathrm{cor}_d(X_0 : y) = \begin{pmatrix} R_{xx} & r_{xy} \\ r_{xy}' & 1 \end{pmatrix}$
$H$ : orthogonal projector onto $\mathscr{C}(X)$, the hat matrix: $H = X(X'X)^{-}X' = XX^{+} = P_X$
$M$ : orthogonal projector onto $\mathscr{C}(X)^{\perp}$: $M = I_n - H$
$J$ : the orthogonal projector onto $\mathscr{C}(1_n)$: $J = \frac{1}{n} 1_n 1_n' = P_{1_n}$
$C$ : centering matrix, the orthogonal projector onto $\mathscr{C}(1_n)^{\perp}$: $C = I_n - J$
$(X_1 : X_2)$ : partitioned model matrix $X$
$M_1$ : orthogonal projector onto $\mathscr{C}(X_1)^{\perp}$: $M_1 = I_n - P_{X_1}$
$\hat{\beta}$ : solution to the normal equation $X'X\beta = X'y$, $\mathrm{OLSE}(\beta)$
$X\hat{\beta} = \hat{y}$ : $\hat{y} = Hy$ = OLS fitted values, $\mathrm{OLSE}(X\beta)$, denoted also $\widehat{X\beta} = \hat{\mu}$, when $\mu = X\beta$
$\tilde{\beta}$ : solution to the generalized normal equation $X'W^{-}X\beta = X'W^{-}y$, where $W = V + XUX'$, $\mathscr{C}(W) = \mathscr{C}(X : V)$
$\tilde{\beta}$ : if $V$ is positive definite and $X$ has full column rank, then $\tilde{\beta} = \mathrm{BLUE}(\beta) = (X'V^{-1}X)^{-1}X'V^{-1}y$
$X\tilde{\beta}$ : $\mathrm{BLUE}(X\beta)$, denoted also $\widetilde{X\beta} = \tilde{\mu}$
$\bar{y}$ : mean of the response variable $y$: $\bar{y} = (y_1 + \dots + y_n)/n$
$\bar{x}$ : vector of the means of the $k$ regressor variables $x_1, \dots, x_k$: $\bar{x} = (\bar{x}_1, \dots, \bar{x}_k)' \in \mathbb{R}^k$
$\bar{\mathbf{y}}$ : projection of $y$ onto $\mathscr{C}(1_n)$: $\bar{\mathbf{y}} = Jy = \bar{y} 1_n$
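A minimal NumPy sketch (assumed example, not from the book) of the hat matrix $H = X(X'X)^{-}X'$ with $X = (1 : X_0)$: it reproduces the OLS fitted values $X\hat{\beta}$, and $M = I_n - H$ annihilates the columns of $X$. The simulated data and coefficients are arbitrary.

```python
# Hat matrix H = P_X, fitted values y_hat = Hy, and M = I - H with MX = 0.
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 2
X0 = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), X0])            # model matrix (1 : X0)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.standard_normal(n)

H = X @ np.linalg.pinv(X.T @ X) @ X.T            # hat matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(H @ y, X @ beta_hat)          # y_hat = Hy = X beta_hat
assert np.allclose((np.eye(n) - H) @ X, 0)       # M X = 0
```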

$\tilde{y}$ : centered $y$: $\tilde{y} = Cy = y - \bar{\mathbf{y}}$
$\hat{\beta}_x$ : $\hat{\beta}_x = T_{xx}^{-1} t_{xy} = S_{xx}^{-1} s_{xy}$: the OLS regression coefficients of the $x$-variables when $X = (1 : X_0)$
$\hat{\beta}_0$ : $\hat{\beta}_0 = \bar{y} - \hat{\beta}_x'\bar{x} = \bar{y} - (\hat{\beta}_1 \bar{x}_1 + \dots + \hat{\beta}_k \bar{x}_k)$: OLSE of the constant term (intercept) when $X = (1 : X_0)$
$\mathrm{BLP}(y; x)$ : the best linear predictor of the random vector $y$ on the basis of the random vector $x$
$\mathrm{BLUE}(K'\beta)$ : the best linear unbiased estimator of the estimable parametric function $K'\beta$, denoted as $\widetilde{K'\beta}$ or $K'\tilde{\beta}$
$\mathrm{BLUP}(y_f; y)$ : the best linear unbiased predictor of a new unobserved $y_f$
$\mathrm{LE}(K'\beta; y)$ : (homogeneous) linear estimator of $K'\beta$, where $K \in \mathbb{R}^{p \times q}$: $\{\mathrm{LE}(K'\beta; y)\} = \{\, Ay : A \in \mathbb{R}^{q \times n} \,\}$
$\mathrm{LP}(y; x)$ : (inhomogeneous) linear predictor of the $p$-dimensional random vector $y$ on the basis of the $q$-dimensional random vector $x$: $\{\mathrm{LP}(y; x)\} = \{\, f(x) : f(x) = Ax + a, \ A \in \mathbb{R}^{p \times q}, \ a \in \mathbb{R}^p \,\}$
$\mathrm{LUE}(K'\beta; y)$ : (homogeneous) linear unbiased estimator of $K'\beta$: $\{\mathrm{LUE}(K'\beta; y)\} = \{\, Ay : \mathrm{E}(Ay) = K'\beta \,\}$
$\mathrm{LUP}(y_f; y)$ : linear unbiased predictor of a new unobserved $y_f$: $\{\mathrm{LUP}(y_f; y)\} = \{\, Ay : \mathrm{E}(Ay - y_f) = 0 \,\}$
$\mathrm{MSEM}(f(x); y)$ : mean squared error matrix of $f(x)$ (a random vector, function of the random vector $x$) with respect to $y$ (a random vector or a given fixed vector): $\mathrm{MSEM}[f(x); y] = \mathrm{E}[y - f(x)][y - f(x)]'$
$\mathrm{MSEM}(Fy; K'\beta)$ : mean squared error matrix of the linear estimator $Fy$ under $\{y, X\beta, \sigma^2 V\}$ with respect to $K'\beta$: $\mathrm{MSEM}(Fy; K'\beta) = \mathrm{E}(Fy - K'\beta)(Fy - K'\beta)'$
$\mathrm{OLSE}(K'\beta)$ : the ordinary least squares estimator of the parametric function $K'\beta$, denoted as $K'\hat{\beta}$ or $\widehat{K'\beta}$; here $\hat{\beta}$ is any solution to the normal equation $X'X\beta = X'y$
$\mathrm{risk}(Fy; K'\beta)$ : quadratic risk of $Fy$ under $\{y, X\beta, \sigma^2 V\}$ with respect to $K'\beta$: $\mathrm{risk}(Fy; K'\beta) = \mathrm{tr}[\mathrm{MSEM}(Fy; K'\beta)] = \mathrm{E}(Fy - K'\beta)'(Fy - K'\beta)$
$\mathscr{M}$ : linear model $\{y, X\beta, \sigma^2 V\}$: $y = X\beta + \varepsilon$, $\mathrm{cov}(y) = \mathrm{cov}(\varepsilon) = \sigma^2 V$, $\mathrm{E}(y) = X\beta$
$\mathscr{M}_{\mathrm{mix}}$ : mixed linear model $\mathscr{M}_{\mathrm{mix}} = \{y, X\beta + Z\gamma, D, R\}$: $y = X\beta + Z\gamma + \varepsilon$; $\gamma$ is the vector of the random effects, $\mathrm{cov}(\gamma) = D$, $\mathrm{cov}(\varepsilon) = R$, $\mathrm{cov}(\gamma, \varepsilon) = 0$, $\mathrm{E}(y) = X\beta$
$\mathscr{M}_f$ : linear model with new future observations $y_f$: $\mathscr{M}_f = \left\{ \begin{pmatrix} y \\ y_f \end{pmatrix}, \begin{pmatrix} X\beta \\ X_f\beta \end{pmatrix}, \sigma^2 \begin{pmatrix} V & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \right\}$
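A minimal NumPy sketch (assumed example, not from the book) of the slope and intercept formulas $\hat{\beta}_x = S_{xx}^{-1}s_{xy}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_x'\bar{x}$, checked against the OLS solution for $X = (1 : X_0)$; the simulated data are arbitrary.

```python
# OLS slopes from the sample covariances and the intercept from the means.
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X0 = rng.standard_normal((n, k))
y = X0 @ np.array([0.5, -1.0, 2.0]) + 1.5 + 0.2 * rng.standard_normal(n)

C = np.eye(n) - np.ones((n, n)) / n              # centering matrix
S_xx = X0.T @ C @ X0 / (n - 1)
s_xy = X0.T @ C @ y / (n - 1)
beta_x = np.linalg.solve(S_xx, s_xy)             # beta_x = S_xx^{-1} s_xy
beta_0 = y.mean() - beta_x @ X0.mean(axis=0)     # beta_0 = ybar - beta_x' xbar

X = np.column_stack([np.ones(n), X0])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(np.concatenate(([beta_0], beta_x)), beta_ols)
```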