6348 Final, Fall 14. Closed book, closed notes, no electronic devices. Points (out of 200) in parentheses.



1.(5) Give the result of the following matrix multiplication: [matrices missing]
Solution: [missing]

2.(10) Here is a graph of a bivariate data set. Identify the coordinates (x, y) of two points (two distinct circles in the graph) whose Mahalanobis distances from the mean vector (the x in the graph) are about the same but whose Euclidean distances from the mean vector are quite different. FYI, the coordinates of the mean are approximately (7, 70).
Solution: (8.5, 75) and (9.4, 92) is one such pair.
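The contrast in question 2 can be illustrated with a small sketch. The covariance matrix below is hypothetical (it is not the exam's data set); only the mean (7, 70) comes from the question. Two points are placed two standard units from the mean, one along the long axis of the data cloud and one along the short axis, so their Mahalanobis distances agree while their Euclidean distances differ widely.

```r
# Hypothetical covariance matrix (assumption): sd(x) = 1, sd(y) = 10, correlation 0.8
mu <- c(7, 70)
Sigma <- matrix(c(1, 8,
                  8, 100), nrow = 2, byrow = TRUE)

# Principal axes of the data cloud
e <- eigen(Sigma)
p_long  <- mu + 2 * sqrt(e$values[1]) * e$vectors[, 1]  # along the long axis
p_short <- mu + 2 * sqrt(e$values[2]) * e$vectors[, 2]  # along the short axis
pts <- rbind(p_long, p_short)

# Euclidean distance from the mean vector
euclid <- sqrt(rowSums(sweep(pts, 2, mu)^2))

# mahalanobis() returns SQUARED distances, so take the square root
mahal <- sqrt(mahalanobis(pts, center = mu, cov = Sigma))

print(round(cbind(euclid, mahal), 3))
```

Both points have Mahalanobis distance 2 by construction, because each sits two standard deviations out along its own axis.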

3.(5) The multivariate null hypothesis for the errors data was E(Y_1, Y_2) = (0, 0). What is the name of the multivariate testing procedure that was used to test this hypothesis?
Solution: Hotelling's T^2 test.

4.(10) List three distinct benefits of using a single multivariate test rather than several (one for each variable) univariate tests.
Solution:
i. You (possibly) get more power by incorporating correlation information.
ii. You get the most significant linear combination.
iii. You get a single test instead of several (simplicity).

5.A.(5) When would you use canonical correlation analysis?
Solution: When you have two sets of variables and you want a single number to measure the correlation between the two sets.

5.B.(5) In what way are the linear combinations that come from canonical correlation analysis best? Specifically, what is better about these linear combinations than any other linear combinations?
Solution: They maximize the correlation. Any other pair of linear combinations, one for each set of variables, will have a smaller correlation than the correlation between the two canonical linear combinations (possibly equal if proportional, but certainly no larger).

6. Let Y = (F, ε_1, ε_2), a column vector of latent variables, and let X = (X_1, X_2), a column vector of manifest variables, defined as follows: X_1 = F + 2ε_1, X_2 = 2F [remainder of the equation missing].

6.A.(15) Write X = CY for an appropriate C.
Solution: X = [matrix missing] Y.
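The maximizing property described in the solution to 5.B can be checked numerically. The sketch below uses simulated data (an assumption, not the course data) and the base R function cancor(); it compares the first canonical correlation with the correlation between randomly chosen linear combinations of the two sets.

```r
set.seed(42)
n <- 500
z <- rnorm(n)  # shared signal linking the two sets (hypothetical)
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n))

cc <- cancor(X, Y)
rho1 <- cc$cor[1]  # first (largest) canonical correlation

# Correlation between the first pair of canonical variates
u <- X %*% cc$xcoef[, 1]
v <- Y %*% cc$ycoef[, 1]
check <- abs(as.numeric(cor(u, v)))

# Correlations between 200 arbitrary pairs of linear combinations
others <- replicate(200, {
  a <- rnorm(2)
  b <- rnorm(2)
  abs(as.numeric(cor(X %*% a, Y %*% b)))
})

print(c(canonical = rho1, best_random = max(others)))
```

No random pair should exceed the canonical value, although some may come close when their weights are nearly proportional to the canonical weights.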

6.B.(15) Using the CC^T result and your answer to 6.A., find the covariance matrix of X. Assume the covariance matrix of Y is the identity matrix.
Solution: Cov(X) = [matrix missing]

7.(10) Draw the path diagram that is represented by the equations and text of problem 6.
Solution: [path diagram relating F to X_1 and X_2; figure missing]

8.(10) How do you standardize the data in column 3 of a multivariate data set? Not using R, but in words, how do you do it?
Solution: Find the mean and standard deviation of the numbers in column 3. Then subtract the mean from every number in column 3. Then divide all those differences by the standard deviation and you are done!

9.(7) The covariance matrix of a manifest measurement, Y, and the latent true value, T, that Y measures is given as

Cov(Y, T) = [ 100   9 ]
            [   9   1 ]

Find the reliability of Y as a measure of T.
Solution: Reliability is the squared correlation. The correlation here is 9/sqrt(100 × 1) = 0.9, so the reliability is (0.9)^2 = 0.81.
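The arithmetic in questions 8 and 9 can be reproduced in a few lines of R. This is a minimal sketch: the variances (100 and 1) and covariance (9) are the ones that appear in the solution to question 9, while the small data matrix used for standardization is made up for illustration.

```r
# Question 9: variances and covariance taken from the solution text
Sigma <- matrix(c(100, 9,
                    9, 1), nrow = 2, byrow = TRUE,
                dimnames = list(c("Y", "T"), c("Y", "T")))
rho <- cov2cor(Sigma)["Y", "T"]  # 9 / sqrt(100 * 1)
reliability <- rho^2             # squared correlation
print(c(correlation = rho, reliability = reliability))

# Question 8: standardize column 3 of a hypothetical data matrix
dat <- cbind(a = 1:5,
             b = c(2, 4, 6, 8, 10),
             c = c(3, 7, 8, 12, 20))
col3 <- dat[, 3]
z3 <- (col3 - mean(col3)) / sd(col3)  # subtract the mean, divide by the sd
print(round(z3, 3))
```

The hand-computed z3 matches what scale(dat)[, 3] returns, since scale() centers and divides by the sample standard deviation by default.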

10.(10) Parameter estimation in factor analysis, structural equations models, and path analysis is accomplished by choosing parameters that make two matrices as close as possible. What are these two matrices? Give brief descriptions of each. No math needed, but use good grammar.
Solution: The model equations determine an implied form of the covariance matrix, one that depends on unknown parameters such as loadings, variances, and correlations. The parameters' values are chosen to make this model-implied covariance matrix as close as possible to the ordinary covariance matrix of the data.

11.A.(5) When do you use polychoric correlation?
Solution: When the two variables you want to correlate are highly discrete and ordinal.

11.B.(5) What goes wrong when you use the ordinary (Pearson) correlation when the polychoric correlation should have been used?
Solution: The ordinary (Pearson) correlation of the manifest discrete data is biased downward relative to the polychoric correlation, which is the correlation between the underlying continuous (latent) measures.

12.(15) Data on customer preference for the beverages coffee, tea, Coca-Cola, and Pepsi were collected, all on the usual 1-5 preference scale, with 5 denoting strong preference. A linear combination procedure (it does not matter which one) gives output as follows:

COFFEE     -0.56
TEA        -0.51
COCACOLA    0.60
PEPSI       0.49

As discussed many times in class, you can imagine lining the customers up against the wall from lowest to highest (i.e., from left to right) values of their linear combinations. Which customers are at the high end (the right side) of the lineup? Which ones are at the low end? Which ones are right in the middle? Based on this analysis, what does the linear combination measure about each customer?
Solution: Those at the low end hate Coca-Cola and Pepsi but love coffee and tea. Those at the high end hate coffee and tea but love Coca-Cola and Pepsi. Those in the middle have no preference for coffee/tea over Coca-Cola/Pepsi. Thus, this linear combination measures each customer's preference for Coca-Cola and Pepsi over coffee and tea.
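The lineup in question 12 can be made concrete by scoring a few customers. The sketch below assumes that coffee and tea carry negative coefficients and the two colas positive ones, which is what the solution implies; the three customers are hypothetical.

```r
# Coefficients, with signs assumed from the solution (coffee/tea negative)
w <- c(COFFEE = -0.56, TEA = -0.51, COCACOLA = 0.60, PEPSI = 0.49)

# Three hypothetical customers on the 1-5 preference scale
customers <- rbind(
  hot_drinker  = c(5, 5, 1, 1),  # loves coffee and tea, dislikes colas
  indifferent  = c(3, 3, 3, 3),  # no preference either way
  soda_drinker = c(1, 1, 5, 5)   # loves colas, dislikes coffee and tea
)
colnames(customers) <- names(w)

# Linear combination score for each customer
scores <- drop(customers %*% w)

# The lineup from left (lowest) to right (highest)
print(sort(scores))
```

The hot-drink customer lands at the low end (about -4.26), the cola customer at the high end (about 4.38), and the indifferent customer near zero, matching the interpretation given in the solution.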

13.(10) The following graph is a biplot. What does it tell you about observation 6?
Solution: Based on the projections onto the eigenspaces, observation 6 has a below-average value for Y1 but an above-average value for Y2.

14.(10) When testing that the mean of the errors data-generating process was zero, the sample average was 0.011, and it was shown in class that the difference between 0.011 and 0 is explainable by chance alone. What does "explainable by chance alone" mean here? Do not mention the p-value in your answer.
Solution: We have to imagine a data-generating process where the mean is zero. In R, rnorm(43, 0, 2) generates 43 observations from the normal distribution with mean 0 and standard deviation 2, and is an example of such a process. Now, even when data are produced by a process where the mean is zero, the sample average of the data is not exactly zero; this is explained by chance alone. Different samples from the process that has mean 0 give different data averages; some are above zero and some are below. If the difference is within the typical (say 95%) range of such differences that are explained by chance alone, then we say that the difference between 0.011 and 0 is explainable by chance alone.

Multiple Choice questions, 3 points each.

15. What is the purpose of a copula?
A. To reduce the dimensionality of a non-normal multivariate data set.
B. To simulate a non-normal multivariate data set.
C. To find a linear combination that best separates groups.
D. To find a linear combination that maximizes the average R^2 statistic.

16. The exploratory factor analysis (EFA) model is Y = LF + ε. The covariance matrix of the error terms is Ψ. If the model is correct, with all the usual assumptions of EFA, what is the covariance matrix of Y?
A. LL^T + I
B. LL^T + Ψ
C. L^T L + I
D. L^T L + Ψ

17. What is a necessary condition to say that X causes Y?
A. They both must be latent variables.
B. They both must be manifest variables.
C. X must precede Y.
D. Y must precede X.

18. Where does the χ^2 statistic come from in SEM model fit testing?
A. From the contingency table.
B. From the log likelihood ratio.
C. From the Mahalanobis distance.
D. From the RMSR statistic.
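The "explainable by chance alone" argument in question 14 can be sketched as a simulation. The process rnorm(43, 0, 2) is the one named in the solution; the number of simulated samples and the seed are arbitrary choices, and the standard deviation of 2 belongs to that illustrative process rather than to the actual errors data.

```r
set.seed(123)
n_obs  <- 43     # sample size used in the solution's rnorm() call
n_sims <- 10000  # number of simulated samples (arbitrary choice)

# Sample averages from a process whose true mean is exactly zero
sim_means <- replicate(n_sims, mean(rnorm(n_obs, mean = 0, sd = 2)))

# Typical (95%) range of averages produced by chance alone
chance_range <- quantile(sim_means, c(0.025, 0.975))
print(chance_range)

# Is the observed average inside that range?
observed <- 0.011
within_chance <- observed > chance_range[1] && observed < chance_range[2]
print(within_chance)
```

Under this illustrative process the 95% range is roughly plus or minus 0.6, so an average of 0.011 sits comfortably inside it.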

19. When do missing values cause bias?
A. When there are too many missing values.
B. When their missingness is correlated with the data value.
C. When the data are missing at random.
D. When the data are missing completely at random.

20. What is an advantage of PLS path modeling over SEM?
A. There is no need to assume existence of latent variables in PLS.
B. PLS gives consistent estimates.
C. PLS equates implied and observed covariance matrices.
D. PLS uses Mahalanobis distances.

21. What is the most attractive feature of model-based clustering?
A. The method always assumes that a model produces the data.
B. The method always finds the right number of clusters.
C. The method always picks the right cluster to classify a future observation.
D. The method works well with unusually shaped clusters, such as concentric circles.

22. Why is ordinary least squares (OLS) inappropriate with non-recursive path models?
A. Because the OLS estimates are biased.
B. Because the OLS fitted covariance matrix is not equal to the observed covariance matrix.
C. Because the OLS assumption of normality may be violated.
D. Because there may be extreme multicollinearity in the OLS fit.

23. What question is typically addressed with network analysis?
A. Are the actors produced by a multivariate normal distribution?
B. Is the covariance matrix of the actors similar to that implied by the model?
C. Are the mean vectors of the actors different?
D. Which actors are most important?
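The missing-value mechanisms named in question 19 can be compared by simulation. The sketch below uses made-up data and two hypothetical missingness rules: one that ignores the data value entirely, and one where larger values are more likely to be missing. It then compares the average of the values that remain with the true mean of 50.

```r
set.seed(7)
n <- 100000
y <- rnorm(n, mean = 50, sd = 10)  # complete data, true mean 50

# Rule 1: every value has the same 30% chance of being missing
y_random <- ifelse(runif(n) < 0.3, NA, y)

# Rule 2: the chance of being missing increases with the value itself
p_miss <- plogis((y - 50) / 5)
y_dependent <- ifelse(runif(n) < p_miss, NA, y)

mean_random    <- mean(y_random, na.rm = TRUE)
mean_dependent <- mean(y_dependent, na.rm = TRUE)
print(c(random = mean_random, value_dependent = mean_dependent))
```

Under the first rule the average of the remaining values stays near 50; under the second it falls well below 50 because large values are removed preferentially.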

24. You have bivariate nominal data in R in the data frame called biv.n. How do you create a two-way cross-classification table?
A. table(biv.n)
B. biv.n(table)

25. What does the eigen function of R do?
A. returns eigenvalues
B. returns eigenvectors
C. returns both eigenvalues and eigenvectors

26. What kind of a plot does the boxplot function in R give you?
A. A histogram
B. A q-q plot
C. A boxplot

27. What is the function for fitting regression models in R?
A. regress
B. lm
C. aov
D. manova

28. The psych package contains a function to compute Cronbach's alpha. What is it?
A. t.test
B. cor.alpha
C. cronbach
D. alpha

29. What does mvrnorm(n, Mu, Sigma) do?
A. Simulates data from a multivariate normal distribution
B. Performs multivariate regression analysis
C. Tests the data set for multivariate normality
D. Draws a chi-square Mahalanobis distance plot

30. What kind of object does the candisc function require as input?
A. A data frame
B. A fitted model
C. A covariance matrix
D. A correlation matrix
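Two of the base R functions named in questions 24 and 25 can be tried directly. In this small sketch the contents of biv.n and the matrix S are made up for illustration; only the function names come from the questions.

```r
# Hypothetical bivariate nominal data in a data frame called biv.n
biv.n <- data.frame(
  color = c("red", "blue", "red", "green", "blue", "red"),
  size  = c("S", "M", "M", "S", "S", "M")
)

# table() on a two-column data frame cross-classifies the two variables
tab <- table(biv.n)
print(tab)

# eigen() returns a list holding both eigenvalues and eigenvectors
S <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(S)
print(names(e))   # "values" "vectors"
print(e$values)   # 3 1
```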
