Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Size: px

Start display at page:

Download "Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A ="

Spencer Hines
5 years ago
Views:

1 Matrices and vectors A matrix is a rectangular array of numbers Here s an example: A = This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write the n p matrix X abstractly as x 11 x 12 x 13 x 1p x 21 x 22 x 23 x 2p X = x 31 x 32 x 33 x 3p x n1 x n2 x n3 x np 1

2 Another notation that is common is A = [a ij ] n m for an n m matrix A with element a ij in the i th row and j th column The matrix X on the previous page would then be written X = [x ij ] n p If two matrices A = [a ij ] n m and B = [b ij ] n m have the same dimensions, you can add them together, element by element, to get a new matrix C = [c ij ] n m That is, C = A + B is the matrix with elements c ij = a ij + b ij For example, = =

3 Multiplying a matrix A = [a ij ] n m by a number b yields the matrix C = Ab with elements c ij = a ij b For example, 1 2 1( 2) 2( 2) 2 4 ( 2) 5 7 = 5( 2) 7( 2) = ( 2) 20( 2) The transpose of a matrix A takes the matrix A and makes the rows the columns and the columns the rows Precisely, if A = [a ij ] n m then A is the m n matrix with elements a ij = a ji For example: If A = Question: what is (A )? , then A =

4 A vector is a matrix with only one column or row, called a column vector or row vector respectively Here s an example of each: 1 x = 1, y = [ ] 14 Note that for these vectors, x = y and y = x The product of an 1 n row vector and a n 1 column vector is the sum of the pairwise products of elements So if x = [x i ] 1 n and y = [y i ] n 1 then xy = n i=1 x iy i 4

5 For example, if x = [ 1 2 ] and y = xy = [ 1 2 ] then = 1(10) + 2( 5) = 20 The inner product of two n 1 column vectors x and y is the product x y = [ x 1 x 2 x 3 x n ] y 1 y 2 y 3 y n = n x i y i i=1 5

6 Note that if x = x 1 x 2 is a point in the plane R 2, then x x = x x 2 2 is the square of the length of x That is, x = x x We are now ready to define general matrix multiplication The product of an n p matrix A and a p m matrix B is the n m matrix C with elements c ij = p k=1 a ikb kj Let A be comprised of n 1 p row vectors a 1,,a n and let B be comprised of m p 1 column vectors b 1,,b m like A = a 1 a 2 a n n p and B = b 1 b 2 b m p m 6

7 Then c ij = a i b j : C = For example, let A = Then a 1 b 1 a 1 b 2 a 1 b m a 2 b 1 a 2 b 2 a 2 b m a n b 1 a n b 2 a n b m and B = AB = = 1(2) 1(0) 2(1) 1(0) 1( 5) 2(05) 1( 2) 1(7) 2( 4) 1(0) + 5(1) 3(2) 3(0) 1( 5) + 5(05) 3( 2) 1(7) + 5( 4)

8 On the previous slide, does BA make sense? No The rows of the first matrix must be the same length as the columns of the second Note that, in general, AB BA Define I n n as Then I = A n p I p p = A n p and I n n A n p = A n p, for any A n p The matrix I n n is called the n n identity matrix 8

9 For example, = and =

10 The inverse of a square (p p) matrix A is the p p matrix A 1 such that A 1 A = AA 1 = I p p For example, if A = 1 1, then 0 2 and so A 1 = as well = Note that we must have = , 10

11 There is an algorithm for finding the inverse of any size matrix but it is very computationally intensive, except for 2 2 matrices Let A = a 11 a 12 a 21 a 22 Then A 1 = 1 a 11 a 22 a 12 a 21 a 22 a 12 a 21 a 11 That is, switch the diagonal entries, multiply the off-diagonals by 1, and divide the works by a 11 a 22 a 12 a 21 We can show that this is the inverse in class Try it out on A on the previous slide 11

12 Not every square matrix has an inverse For example A = 1 2, 2 4 does not Try the formula on the previous slide out on this matrix What happens? Square matrices that do not have an inverse are said to be singular 12

13 The two sample problem in terms of matrices Recall the two-sample normal model with equal variances: Y 11, Y 12,,Y 1n1 iid N(µ 1, σ 2 ), Y 21, Y 22,,Y 2n2 iid N(µ 2, σ 2 ) We can rewrite this as where Y ij = µ i + e ij, e ij iid N(0, σ 2 ), where i = 1, 2 indexes the group (1 or 2) and j = 1,,n i is the observation within the group 13

14 Each piece of data Y ij follows: Y 11 = µ 1 + e 11 Y 12 = µ 1 + e 12 Y 13 = µ 1 + e 13 Y 1n1 = µ 1 + e 1n1 Y 21 = µ 2 + e 21 Y 22 = µ 2 + e 22 Y 23 = µ 2 + e 23 Y 2n2 = µ 2 + e 2n2 14

15 Define the following vectors and matrices: Y = Y 11 Y 12 Y 1n2 Y 21 Y 22 Y 2n2, X = , µ = µ 1 µ 2, and e = e 11 e 12 e 1n2 e 21 e 22 e 2n2 15

16 Then we can write the model succinctly as Y = Xµ + e We ll show this on the board for n 1 = n 2 = 3 It turns out the the MLE s for µ 1 and µ 2, namely ˆµ 1 = n 1 n1 1 j=1 y 1j = ȳ 1 and ˆµ 2 = n 1 n2 2 j=1 y 2j = ȳ 2 are obtained in matrix terms as ˆµ = ˆµ 1 ˆµ 2 = (X X) 1 X y 16

17 We ll show part of this: X X = = n n 2 17

18 And so Also, X y = (X X) 1 = n n = 1 n n 2 y 11 y 12 y 1n2 y 21 y 22 = n1 j=1 y 1j n2 j=1 y 2j y 2n2 18

19 Then (X X) 1 X y = 1 n n 2 n1 j=1 y 1j n2 j=1 y 2j = ȳ1 ȳ 2 = ˆµ 1 ˆµ 2, as promised The MLE of σ 2 in terms of matrices is ˆσ 2 = (y Xˆµ) (y Xˆµ)/(n 1 + n 2 ) = y Xˆµ 2 /(n 1 + n 2 ) What is the point? Although the two-sample normal model is fairly simple, very complex models with multiple predictors, both categorical and continuous, can be written as Y = Xβ + e, including the simple linear regression model, multiple regression models, oneway and multiway ANOVA models, and ANCOVA models 19

20 In Y = Xβ + e, Y is the n 1 data vector X is the n p design matrix Often the i th row of X is comprised of p 1 measurements taken on the i th subject in a study, eg the i th row of an Excel spreadsheet, and an intercept term β is the p 1 coefficient vector For the two-sample model, p = 2 and β = (µ 1, µ 2 ) e is the n 1 error vector All the elements of e are assumed to be iid N(0, σ 2 ) 20

21 The j th residual in group i is defined to be r ij = y ij Ê(Y ij) = y ij ˆµ i These can be plotted versus the group number i = 1, 2 to assess whether constant variance across groups is reasonable A histogram of all n = n 1 + n 2 residuals can be used to assess the normality assumption There are formal tests for both constant variance and for normality > skull <- c(english,celt) > group <- c(rep("english",length(english)),rep("celt",length(celt))) > group <- factor(group) > ttest(skull~group,varequal=t) Two Sample t-test data: skull by group t = , df = 32, p-value = 9003e-09 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval:

22 sample estimates: mean in group Celt mean in group English > fit1 <- lm(skull~group) #lm() is "linear model" > summary(fit1) Call: lm(formula = skull ~ group) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std Error t value Pr(> t ) (Intercept) <2e-16 *** groupenglish e-09 *** --- F-statistic: 5922 on 1 and 32 DF, p-value: 9003e-09 > r <- fit1$residuals # get residuals > plot(group,r) # gives boxplots instead of scatterplot > hist(r) # looks roughly normal 22

23 Histogram of r Frequency Celt English r Figure 1: Boxplots show roughly the same spread; histogram is roughly normal looking 23

24 The lm() fit to the data fits a different, but equivalent model: Celts : Y 1,1,,Y 1,16 iid N(µ, σ 2 ), English : Y 2,1,,Y 2,18 iid N(µ + δ, σ 2 ), reports estimates of the mean Celtic headbreadth µ, and the mean difference for Englishmen δ Of interest in this reparameterized model is H 0 : δ = 0, which is the same as testing H 0 : µ 1 = µ 2 in the two-sample model 24

25 The simple linear regression model stipulates Y i = β 0 + β 1 x i + e i, where i = 1,,n Here we can write Y = Y 1 Y 2, X = 1 x 1 1 x 2, β = β 0 β 1, and e = e 1 e 2 Y n 1 x n e n The model is succinctly written Y = Xβ + e 25

26 As in the two-sample model, the MLE s ˆβ = (ˆβ 0, ˆβ 1 ) are given by ˆβ = ˆβ 0 ˆβ 1 = (X X) 1 X y The MLE for σ 2 is ˆσ 2 = (y Xˆβ) (y Xˆβ) n = 1 n n (y i [ˆβ 0 + ˆβ 1 x i ]) 2 i=1 For the i th observation, the fitted value ŷ i is the point on the line at x i, namely ˆβ 0 + ˆβ 1 x i So at value x i we see the observed value y i, but estimate the underlying mean ŷ i = Ê(Y i) = ˆβ 0 + ˆβ 1 x i Since the MLE s ˆβ 0 and ˆβ 1 are unbiased, E(ˆβ 0 + ˆβ 1 x i ) = E(ˆβ 0 ) + E(ˆβ 1 )x i = β 0 + β 1 x i 26

27 That is, E(ŷ i ) = β 0 + β 1 x i The i th residual r i estimates the unknown e i = y i [β 0 + β 1 x i ] by r i = y i [ˆβ 0 + ˆβ 1 x i ] = y i Ê(Y i) These are used to check model assumptions: (1) that the e 1,,e n iid N(0, σ 2 ), and (2) that the mean changes linearly with x: E(Y i ) = β 0 + β 1 x i A histogram of r 1,,r n should be roughly symmetric and unimodal, ie normal looking A plot of r i versus x i should show no discernable pattern and roughly constant spread 27

28 > fit2 <- lm(score~ses) > plot(fit2) # get a variety of diagnostic plots automatically > r <- fit2$residuals > plot(ses,r) > hist(r) r ses Figure 2: Residual plot shows no discernable pattern 28

29 Histogram of r Frequency r Figure 3: Histogram of residuals is roughly normal looking 29

. a m1 a mn. a 1 a 2 a = a n

. a m1 a mn. a 1 a 2 a = a n Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by