Solution to Series 11

Prof. Dr. M. Maathuis, Multivariate Statistics, SS 2014

1. a)
> car <- read.table("...", sep=";", header=T, na.strings="")

b) As the first random vector X has only two components, we can obtain only two pairs of canonical variables.

c)
> X <- car[,c(6,5)]
> Y <- car[,c(3,4,7,8,9,10)]
> dat <- cbind(X,Y)
> R <- cor(dat)
> R11 <- R[1:2, 1:2]
> R22 <- R[3:8, 3:8]
> R12 <- R[1:2, 3:8]
> R21 <- R[3:8, 1:2]
> R11.inv <- solve(R11)
> R22.inv <- solve(R22)
> # compute E1 and E2
> E1 <- R11.inv %*% R12 %*% R22.inv %*% R21
> E2 <- R22.inv %*% R21 %*% R11.inv %*% R12
> # compute eigenvalues and eigenvectors of E1 and E2:
> eigen(E1)
> eigen(E2)
> # note nonzero eigenvalues are indeed the same and positive
>
> # compute the canonical correlation vectors:
> a1 <- eigen(E1)$vectors[,1]
> a2 <- eigen(E1)$vectors[,2]
> b1 <- eigen(E2)$vectors[,1]
> b2 <- eigen(E2)$vectors[,2]
> # correct the scaling of the canonical correlation vectors:
> round((a1 <- -1 * a1 / sqrt(t(a1) %*% R11 %*% a1)),2)
> round((a2 <- -1 * a2 / sqrt(t(a2) %*% R11 %*% a2)),2)
> round((b1 <- -1 * b1 / sqrt(t(b1) %*% R22 %*% b1)),2)
> round((b2 <- -1 * b2 / sqrt(t(b2) %*% R22 %*% b2)),2)

The first pair of canonical vectors thus is

a1 = (0.37, 0.67)
b1 = (0.38, +0.17, 0.00, 0.47, 0.24, 0.24)

and the second one

a2 = (1.65, 1.55)
b2 = (0.40, 0.57, 0.05, 0.00, 0.32, 0.61)

(The numeric R output and the minus signs of the coefficients are not preserved in this copy; [...] below marks a lost sign or coefficient.)

d) The canonical variables are computed by

u1 = 0.37 Price [...] Value
v1 = 0.38 Economy [...] Service [...] Design [...] Sport [...] Safety [...] Easy.h.

and

u2 = 1.65 Price [...] Value
v2 = 0.40 Economy [...] Service [...] 0.05 Design [...] Sport [...] 0.32 Safety [...] Easy.h.

e)
> # canonical correlations:
> sqrt(eigen(E1)$values)

The canonical correlations are

g1 = 0.98
g2 = 0.91

The relationship within both pairs of canonical variates thus seems to be quite strong.

f)
> dat.std <- apply(dat, 2, scale)
> # compute canonical correlation variables:
> u1 <- dat.std[,1:2] %*% a1
> u2 <- dat.std[,1:2] %*% a2
> v1 <- dat.std[,3:8] %*% b1
> v2 <- dat.std[,3:8] %*% b2
> # check covariance matrix:
> round(var(cbind(u1,u2,v1,v2)),3)
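The eigenvalue computation used in c) and e) can be cross-checked against R's built-in `cancor()` function from the `stats` package. The following is only a minimal sketch on synthetic data: the matrices `X` and `Y` are simulated here and are an assumption of this example, not the car data from the exercise. It illustrates that the square roots of the eigenvalues of E1 agree with the canonical correlations reported by `cancor()`.

```r
# Synthetic data (assumption: simulated, NOT the car data set of the exercise)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 2), n, 2)
Y <- cbind(X %*% c(1, -1) + rnorm(n), X[, 1] + rnorm(n), rnorm(n))

# Same construction as in part c), based on the correlation matrix
R   <- cor(cbind(X, Y))
R11 <- R[1:2, 1:2]
R22 <- R[3:5, 3:5]
R12 <- R[1:2, 3:5]
R21 <- R[3:5, 1:2]
E1  <- solve(R11) %*% R12 %*% solve(R22) %*% R21

# Canonical correlations as square roots of the eigenvalues of E1
# (Re() guards against a complex return type for the non-symmetric E1)
rho.eigen <- sqrt(Re(eigen(E1)$values))

# Reference values from stats::cancor
rho.cancor <- cancor(X, Y)$cor

print(round(rho.eigen, 4))
print(round(rho.cancor, 4))
```

Both printed vectors should coincide, because canonical correlations do not change when the variables are rescaled, so working with the correlation matrix or with the centered raw data gives the same values.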

g)
u1 = 0.37 Price [...] Value
v1 = 0.38 Economy [...] Service [...] Design [...] Sport [...] Safety [...] Easy.h.

(Minus signs and some coefficients are not preserved in this copy; [...] marks a lost sign or coefficient.)

From the first pair of canonical variables (u1, v1), we see that Price is positively related to Economy and negatively related to the remaining characteristics of a car (service, sportiness, safety and easy handling). The variable Value is negatively related to Economy and positively related to the other characteristics.

The canonical variable u1 can be interpreted as a value index of the car. On the one side, we observe cars with good (low) price and bad (high) appreciation of value such as Trabant and Wartburg, and on the other side, we see cars with high price and good (low) appreciation of value such as BMW, Jaguar, Ferrari and Mercedes. Similarly, v1 can be interpreted as a quality index consisting of variables such as service and safety. The value and quality indices are highly correlated, with canonical correlation coefficient 0.98. This can be seen in the following plot:

> # plot the first canonical correlation variables:
> plot(u1,v1,main="quality vs. value, correlation=0.98", xlab="u1 = 'value' of car", ylab="v1 = 'quality' of car", pch="")
> text(u1,v1,labels=car$type)

[Figure: "quality vs. value, correlation=0.98"; x-axis: u1 = 'value' of car, y-axis: v1 = 'quality' of car; points labelled by car type]

h)
u2 = 1.65 Price [...] Value
v2 = 0.40 Economy [...] Service [...] 0.05 Design [...] Sport [...] 0.32 Safety [...] Easy.h.

The second pair of canonical variables provides more insight into the relationship between the two sets of variables. u2 has low values for cars with good marks both in price and value, e.g., [...] and Opel. On the right-hand side, we should see cars with bad marks in these two variables such as Ferrari and Wartburg. The canonical variable v2 consists mainly of the variables economy and service. The position of the cars is displayed in the plot below.

> plot(u2, v2, xlab="u2", ylab="v2", pch="", main="v2 vs. u2")
> text(u2,v2, labels=car$type)

[Figure: "v2 vs. u2"; x-axis: u2, y-axis: v2; points labelled by car type]

2. a) Read the data in with

> car <- read.table("...", sep=";", header=T, na.strings="")
> X <- car[,6]
> Y <- car[,c(3,10)]
> dat <- cbind(X,Y)
> R <- cor(dat)
> R11 <- R[1, 1]
> R22 <- R[2:3, 2:3]
> R12 <- R[1, 2:3]
> R21 <- R[2:3, 1]
> R11.inv <- solve(R11)
> R22.inv <- solve(R22)
> # compute E1 and E2
> E1 <- R11.inv %*% R12 %*% R22.inv %*% R21
> E2 <- R22.inv %*% R21 %*% R11.inv %*% R12
> # compute eigenvalues and eigenvectors of E1 and E2:
> eigen(E1)
$values
[1] 0.624

$vectors
     [,1]
[1,]    1

> eigen(E2)
$values
[1] 6.24e-01 [...]e-17

> # compute the canonical correlation vectors:
> a1 <- eigen(E1)$vectors[,1]
> b1 <- eigen(E2)$vectors[,1]
> # correct the scaling of the canonical correlation vectors:
> round((a1 <- -1 * a1 / sqrt(t(a1) %*% R11 %*% a1)),2)
     [,1]
[1,]   -1
> round((b1 <- -1 * b1 / sqrt(t(b1) %*% R22 %*% b1)),2)

The pair of canonical vectors thus is

a1 = -1
b1 = (1.17, 0.34)   (signs not preserved in this copy)

and the canonical variables are computed by

u1 = -1 Price
v1 = 1.17 Economy [...] Easy.h.

We observe that the price has a negative influence on the canonical variable v1, which means that price is positively related to economy and negatively related to easy handling.

> # canonical correlation:
> sqrt(eigen(E1)$values)
[1] 0.79

The canonical correlation is g1 = 0.79.

> dat.std <- apply(dat, 2, scale)
> # compute canonical correlation variables:
> u1 <- dat.std[,1] %*% a1
> v1 <- dat.std[,2:3] %*% b1
>
> plot(u1, v1, xlab="u1", ylab="v1", pch="", main="v1 vs. u1")
> text(u1,v1, labels=car$type)

[Figure: "v1 vs. u1"; x-axis: u1, y-axis: v1; points labelled by car type]

We can see that the relationship between the two canonical variables is not as strong as in Exercise 1, where more variables from the same data set are analyzed.

b)
> fit <- lm(X ~ Economy + Easy.h., data=car)
> summary(fit)

Call:
lm(formula = X ~ Economy + Easy.h., data = car)

Residuals:
[...]

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) [...]
Economy     [...]e-05 ***
Easy.h.     [...]

Residual standard error: [...] on 21 degrees of freedom
Multiple R-squared: 0.624, Adjusted R-squared: [...]
F-statistic: 17.4 on 2 and 21 DF, p-value: 3.48e-05

3. a) The observations 110 and 111 have missing values for lchlorophyll.

> fossil <- read.table("...", header=T)
> dat <- fossil[,c("sangle","llength","rwidth","SST.mean","Salinity","lchlorophyll")]
> dat <- dat[-c(110,111),]
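The removal of observations 110 and 111 in 3 a) relies on knowing the affected row numbers in advance. As a hedged illustration, the rows with missing values can also be located programmatically with `complete.cases()`. The data frame `toy` below is a hypothetical stand-in with made-up values, not the fossil data.

```r
# Toy data frame (hypothetical values, NOT the fossil data of the exercise)
toy <- data.frame(a = c(1.2, 3.4, NA, 5.6),
                  b = c(0.1, NA, 0.3, 0.4))

# Row indices that contain at least one missing value
incomplete <- which(!complete.cases(toy))

# Keep only the complete rows, analogous to dat[-c(110,111),]
toy.clean <- toy[complete.cases(toy), ]

print(incomplete)
print(nrow(toy.clean))
```

Applied to the real data, `which(!complete.cases(dat))` would be expected to return the row numbers named in the text, which gives a quick consistency check before the rows are dropped.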

b)
> R <- cor(dat)
> R11 <- R[1:3, 1:3]
> R22 <- R[4:6, 4:6]
> R12 <- R[1:3, 4:6]
> R21 <- R[4:6, 1:3]
> R11.inv <- solve(R11)
> R22.inv <- solve(R22)
> # compute E1 and E2
> E1 <- R11.inv %*% R12 %*% R22.inv %*% R21
> E2 <- R22.inv %*% R21 %*% R11.inv %*% R12
> # compute eigenvalues and eigenvectors of E1 and E2:
> eigen(E1)
> eigen(E2)
> # compute the canonical correlation vectors:
> a1 <- eigen(E1)$vectors[,1]
> a2 <- eigen(E1)$vectors[,2]
> a3 <- eigen(E1)$vectors[,3]
> b1 <- eigen(E2)$vectors[,1]
> b2 <- eigen(E2)$vectors[,2]
> b3 <- eigen(E2)$vectors[,3]
> # correct the scaling of the canonical correlation vectors:
> round((a1 <- -1 * a1 / sqrt(t(a1) %*% R11 %*% a1)),2)
> round((a2 <- -1 * a2 / sqrt(t(a2) %*% R11 %*% a2)),2)
> round((a3 <- -1 * a3 / sqrt(t(a3) %*% R11 %*% a3)),2)
> round((b1 <- -1 * b1 / sqrt(t(b1) %*% R22 %*% b1)),2)
> round((b2 <- -1 * b2 / sqrt(t(b2) %*% R22 %*% b2)),2)
> round((b3 <- -1 * b3 / sqrt(t(b3) %*% R22 %*% b3)),2)

(The numeric R output, the signs and some coefficients are not preserved in this copy; [...] marks a lost sign or coefficient.)

The first pair of canonical variables is computed by

u1 = 1.09 sangle [...] llength [...] rwidth
v1 = 1.04 SST.mean [...] 0.26 Salinity [...] lchlorophyll

The second one by

u2 = 0.88 sangle [...] llength [...] rwidth
v2 = 0.39 SST.mean [...] 0.97 Salinity [...] 0.16 lchlorophyll

and the third one by

u3 = 0.41 sangle [...] llength [...] 1.33 rwidth
v3 = 0.03 SST.mean [...] 0.34 Salinity [...] 1.04 lchlorophyll

c)
> # canonical correlations:
> sqrt(eigen(E1)$values)

The correlations of the first two pairs of canonical variables are quite high (0.88 and 0.56); that of the third pair is not. This means that u3 and v3 are almost uncorrelated, i.e., in this case v3 probably has no influence on u3. On the other hand, v1 seems to have a large influence on u1, and v2 a fairly large influence on u2.

d) The canonical variable u1 mainly seems to be the sangle. It is not easy to find an interpretation of v1, but it seems that SST.mean has quite a large negative influence on sangle. The same holds for lchlorophyll. Salinity seems to have a positive influence on sangle.

The canonical variable u2 could be some kind of shape of the coccolith (big coccoliths, i.e. long and wide with a small angle, vs. small round coccoliths with a large angle). All the environmental variables seem to have a negative influence on the shape of a coccolith, meaning that if the environmental variables take high values, the coccolith will be small with a large angle.

> dat.std <- apply(dat,2,scale)
> # compute canonical correlation variables:
> u1 <- dat.std[,1:3] %*% a1
> u2 <- dat.std[,1:3] %*% a2
> v1 <- dat.std[,4:6] %*% b1
> v2 <- dat.std[,4:6] %*% b2
> # plot canonical correlation variables:
> par(mfrow=c(1,2))
> plot(v1,u1,main="sqrt(angle) vs. v1", xlab="v1", ylab="u1")
> plot(v2,u2, main="shape vs. v2", xlab="v2", ylab="u2")

[Figure: two panels, "sqrt(angle) vs. v1" (x-axis: v1, y-axis: u1) and "shape vs. v2" (x-axis: v2, y-axis: u2)]
