6. Let C and D be matrices conformable to multiplication. Then (CD)ᵀ =


Quiz 1. Name: 10 points per correct answer (20 points for attendance).

1. Let A = [3 x]ᵀ and B = [3 y]. When is A equal to B? A. When x = 3 B. When y = 3 C. When x = y D. Never

2. See 1. What is the dimension of A? A. 1 B. 2 C. 1×2 D. 2×1

3. See 1. BA =

4. See 1. What is B⁻¹? A. 1/3 + 1/y B. 1/3 − 1/y C. [1/3 1/y] D. B⁻¹ does not exist

5. See 1. Suppose I = [1 0; 0 1]. Then IA A. does not exist B. = [3 x]ᵀ C. = 3 + x D. = [3 x]

6. Let C and D be matrices conformable to multiplication. Then (CD)ᵀ = A. DᵀCᵀ B. CᵀDᵀ C. C⁻¹D D. D⁻¹C

7. See 5. Trace(I) = A. 0 B. 1 C. 2 D. 4

8. See 5. |I| = A. 0 B. 1 C. 2 D. 4
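The matrix facts behind Quiz 1 can be checked numerically in R; the particular matrices below are toy values chosen only for illustration.

```r
# Numerical check of the transpose rule (CD)' = D'C' with toy matrices
C <- matrix(1:6, nrow = 2)    # a 2x3 matrix
D <- matrix(7:12, nrow = 3)   # a 3x2 matrix, so CD is conformable
lhs <- t(C %*% D)
rhs <- t(D) %*% t(C)
all.equal(lhs, rhs)           # TRUE: transposing reverses the order

I2 <- diag(2)                 # the 2x2 identity matrix
sum(diag(I2))                 # trace(I) = 2
det(I2)                       # |I| = 1
```

Multiplying any conformable matrix by the identity leaves it unchanged, which is the point of question 5.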

Quiz 2. Name: Closed book and notes.

1. Suppose (X, Y) are distributed as bivariate normal. Then the expected value of Y, given that X = x, is A. unrelated to x B. a normal function of x C. a linear function of x D. a quadratic function of x

2. Let Y = blood pressure, X = weight, and Z = age of a generic person. The partial correlation of (Y, X), given Z = 51, is A. The correlation between weight and age among 51-year-olds B. The correlation between blood pressure and age among 51-year-olds C. The correlation between blood pressure and weight among 51-year-olds

3. What information do you need to find the partial correlation in 2.? A. Both the mean vector of (Y, X, Z) and the covariance matrix of (Y, X, Z) are needed B. Only the mean vector of (Y, X, Z) is needed C. Only the covariance matrix of (Y, X, Z) is needed

4. Let Z be an affine transformation of a vector X that has the multivariate normal distribution. Then Z has a ___ distribution. A. multivariate normal B. multivariate Bernoulli C. multivariate chi-squared D. multivariate exponential
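The linearity in question 1 comes from the standard bivariate normal result E(Y | X = x) = µy + ρ(σy/σx)(x − µx); the parameter values below are assumed toy numbers.

```r
# Conditional mean of Y given X = x for a bivariate normal
# (means, sds, and correlation below are assumed toy values)
mu_x <- 1; mu_y <- 2; s_x <- 2; s_y <- 3; rho <- 0.5
cond_mean <- function(x) mu_y + rho * (s_y / s_x) * (x - mu_x)
cond_mean(mu_x)                     # at x = mu_x it is just mu_y = 2
(cond_mean(3) - cond_mean(1)) / 2   # constant slope 0.75: linear in x
```

Because the slope ρσy/σx does not depend on x, the regression of Y on X is exactly linear, not merely approximately so.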

Quiz 3. Name:

1. What do the authors use for the dimension of a typical multivariate data matrix X? A. n×n B. n×q C. q×n D. q×q

2. Inside an R data frame, how is a missing value denoted? A. By NA B. By ? C. By . D. By " " (i.e., by a blank space)

3. What is another term for the nontrivial extraction of implicit, previously unknown, and potentially useful information from data? A. Multivariate analysis B. Multiple regression analysis C. Data mining D. Data wrangling

4. Suppose you have 4 variables in your multivariate data set. How many covariances are there? A. 4 B. 6 C. 9 D. 20
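Two of these facts are easy to verify in R: a missing value is the special constant NA, and q variables give choose(q, 2) distinct covariances between different pairs. The tiny data frame is an assumed example.

```r
# With q variables there are choose(q, 2) covariances between
# distinct pairs of variables
q <- 4
choose(q, 2)                  # 6 covariances for 4 variables

# Missing values in an R data frame are denoted NA
df <- data.frame(x = c(1, NA, 3))
sum(is.na(df$x))              # 1 missing value
```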

Quiz 4. Name: Closed book, notes, and no electronic devices.

1. A multivariate data set has 20 rows and 3 columns. How many normal quantile-quantile plots will you look at? A. 1 B. 3 C. 17 D. 20

2. A multivariate data set has 20 rows and 3 columns. How many chi-square quantile-quantile plots will you look at? A. 1 B. 3 C. 17 D. 20

3. Suppose X ~ N₃(µ, Σ) (a trivariate normal distribution). What is the distribution of (X − µ)ᵀ Σ⁻¹ (X − µ)? A. N(0, 1) B. N(3, 1) C. χ²₁ D. χ²₃

4. What is the expected appearance of the chi-squared quantile-quantile plot when the data come from a multivariate normal distribution? A. A straight line B. A bell curve C. A right-skewed curve D. An S-shaped curve
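The quadratic form in question 3 is the squared Mahalanobis distance, and the chi-square QQ plot in question 4 is built from it. A base-R sketch with simulated data (sample size and seed are arbitrary choices):

```r
# Squared Mahalanobis distances of multivariate normal data behave like
# chi-square(q) draws, so the chi-square QQ plot is roughly straight
set.seed(1)
n <- 200; q <- 3
X <- matrix(rnorm(n * q), ncol = q)          # a toy N3(0, I) sample
d2 <- mahalanobis(X, colMeans(X), cov(X))    # squared distances
chisq_q <- qchisq(ppoints(n), df = q)        # theoretical quantiles
# plot(chisq_q, sort(d2)); abline(0, 1)      # the QQ plot itself
mean(d2)                                     # equals q*(n-1)/n exactly
```

The identity mean(d2) = q(n − 1)/n holds because the distances are computed from the sample mean and covariance; with the true µ and Σ the mean would be exactly q.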

Quiz 5. Name: Closed books, notes, and no electronic devices. Note: The first problem is a select-all-that-apply problem. The rest are pure multiple choice.

1. According to the authors, what are the goals of using graphical displays of data? (Select all that apply; 5 points per correct selection/non-selection) A. To provide an overview B. To tell a story C. To suggest hypotheses D. To criticise a model

2. What kind of graph does the R command plot(x, y) create? A. A bivariate boxplot B. A bivariate histogram C. A contour plot D. A scatterplot

3. How were the marginal distributions displayed on the same axes as a scatterplot in the reading? A. By using density plots B. By using rug plots C. By using histograms D. By using normal distributions

4. What is the point of using bubble and glyph plots? A. To estimate the bivariate distributions B. To check whether the distribution is bivariate normal C. To counter the effect of discreteness D. To visualize data that have more than two dimensions

Quiz 6. Name: Closed book, notes, and no electronic devices.

1. Why do the authors complain about using bivariate histograms? A. The volume under the histogram is not 1.0 B. There is often not enough data for reliable estimates C. They are too smooth D. They can show non-bivariate normal appearances

2. A kernel density estimator is a ___ estimate of the distribution that produced the data. A. Parametric B. Nonparametric C. Univariate normal D. Multivariate normal

3. The density estimate that uses Gaussian kernels is a "sum of bumps." What is an individual bump? A. A single normal density function B. A single data point C. A collection of nearby data points D. A rectangular box centered over the data value

4. What is an example of a perspective plot? A. A two-dimensional scatterplot B. A three-dimensional scatterplot C. A contour plot of a bivariate density function D. A three-dimensional plot of a bivariate density function
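The "sum of bumps" idea in question 3 can be written directly: the Gaussian kernel estimate is the average of normal densities, one centered at each data point. The data values and bandwidth h below are assumptions for illustration.

```r
# A Gaussian kernel density estimate is a sum of normal "bumps", one
# centered over each data point (data and bandwidth h are assumed)
x <- c(1.2, 2.5, 3.1, 4.8)
h <- 0.5
kde <- function(t) mean(dnorm(t, mean = x, sd = h))   # average of bumps
integrate(Vectorize(kde), -Inf, Inf)$value            # ~1: a proper density
```

Because each bump integrates to 1 and the estimate averages n of them, the whole estimate integrates to 1 as well, which a raw bivariate histogram need not.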

Quiz 7. Name: Closed books, notes, and no electronic devices.

1. What is a "principal component"? A. a linear combination B. an eigenvalue C. an eigenvector D. a variance

2. What is tr(R) when R is a correlation matrix calculated from an n×q data matrix? A. n B. q C. n−q D. n+q

3. When there are 10 original variables, what percentage of the variation in the 10 original variables is accounted for by all ten principal components? A. 1% B. 10% C. 90% D. 100%

4. When the principal components are extracted from the correlation matrix, then the average variance of the principal components is A. −1 B. 0 C. 1 D. 100
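Questions 2-4 all follow from one fact: the eigenvalues of a correlation matrix sum to its trace, which is q. A base-R check on simulated data (dimensions and seed are arbitrary):

```r
# For a correlation matrix R of q variables, tr(R) = q, the PC variances
# (eigenvalues) also sum to q, and so their average is 1
set.seed(2)
X <- matrix(rnorm(50 * 4), ncol = 4)   # toy data, q = 4
R <- cor(X)
sum(diag(R))                           # tr(R) = 4
ev <- eigen(R)$values                  # variances of the 4 PCs
sum(ev)                                # also 4: all PCs explain 100%
mean(ev)                               # average PC variance = 1
```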

Quiz 8. Name: Closed books, notes, and no electronic devices.

1. In the first example about head lengths, the second principal component was essentially A. the sum of the head lengths of the two sons B. the mean of the head lengths of the two sons C. the head length of the first-born son D. the difference of the head lengths of the two sons

2. In the heptathlon example, why was the correlation between the first principal component (PC) score and the standard scoring system negative? A. Because the loadings were all negative B. Because the first PC is unrelated to the standard scoring C. Because the first PC explained so little variance in the heptathlon results D. Because the eigenvalue of the first PC was negative

3. Which principal component scores were useful to predict Sulfur Dioxide concentration? A. Potentially, all of them B. Only the first two C. Only the last two D. Only those whose eigenvalue was greater than 1.0

4. How does canonical correlation analysis (CCA) differ from principal components analysis (PCA)? A. CCA is derived from relationships between sets of variables while PCA is derived from relations within a set of variables B. CCA uses unstandardized data while PCA uses standardized data C. CCA finds nonlinear relationships between variables while PCA finds linear relationships between variables D. CCA allows the data to come from non-multivariate normal distributions, while PCA assumes the data come from multivariate normal distributions

Quiz 9. Name: Closed books, notes, and no electronic devices.

5. In the first example about head lengths, the first principal component was essentially E. the sum of the head lengths of the two sons F. the head length of the first-born son G. the head length of the second-born son H. the difference of the head lengths of the two sons

6. In the heptathlon example, what was the relationship between the first principal component (PC) score and the score assigned by official Olympic rules? E. A strong inverse (negative) relationship F. A weak inverse (negative) relationship G. A strong direct (positive) relationship H. A weak direct (positive) relationship

7. Which principal component scores were statistically significant to predict Sulfur Dioxide concentration? E. All of them F. The first two G. Those whose eigenvalue was greater than 1.0 H. Those whose p-value was less than 0.05 in the regression model

8. Suppose the computer gives you an eigenvalue/eigenvector pair for the covariance matrix of a vector Y = (Y₁, Y₂, Y₃)ᵀ as {3.4, (0.53, …)ᵀ}. What is the variance of the linear combination 0.80Y₁ + …Y₂ + …Y₃? A. … B. … C. 3.4 D. …
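Question 8 rests on the defining property of principal components: if a is a unit eigenvector of the covariance matrix S with eigenvalue λ, then Var(aᵀY) = aᵀSa = λ. The covariance matrix below is an assumed toy example, not the one from the quiz.

```r
# If a is a unit-length eigenvector of a covariance matrix S with
# eigenvalue lambda, then Var(a'Y) = a'Sa = lambda (S is a toy matrix)
S <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3)
e <- eigen(S)
a <- e$vectors[, 1]        # first eigenvector (unit length)
drop(t(a) %*% S %*% a)     # equals e$values[1], the largest eigenvalue
```

So when the stated linear combination is exactly the reported eigenvector, its variance is the reported eigenvalue, no matrix arithmetic required.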

Quiz 10. Name:

1. Multidimensional scaling (MDS) is a model for the A. n×q data matrix B. q×q covariance matrix C. q×q correlation matrix D. n×n proximity matrix

2. What did the MDS graph of the airline data purport to show? A. The clustering of airports on either the west or east coast B. The airports that are considered outliers C. The known spatial arrangement of the airports D. The three-dimensional locations of the airplanes

3. In the MDS analysis of the skull data, distance between epochs (time periods) was determined using A. Mahalanobis distance B. Average distance C. Euclidean distance D. Manhattan distance

4. The conclusion from the MDS analysis of water voles was that A. British water voles in different regions are relatively different from one another B. British water voles in different regions are relatively similar to one another C. British water voles are similar to Spanish water voles D. British water voles have longer tails than Yugoslavian water voles
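Classical MDS operates on the n×n proximity matrix of question 1; R's cmdscale reproduces a configuration whose interpoint distances match the input. The three toy points below are an assumption for illustration.

```r
# Classical MDS (cmdscale) works on an n x n distance matrix; with exact
# Euclidean distances it recovers the configuration up to rotation
pts <- matrix(c(0, 0,
                3, 0,
                0, 4), ncol = 2, byrow = TRUE)   # assumed toy points
d <- dist(pts)                  # the proximity (distance) matrix
fit <- cmdscale(d, k = 2)       # 2-dimensional MDS solution
max(abs(dist(fit) - d))         # ~0: interpoint distances preserved
```

This is why the airline-data plot could reproduce the known spatial arrangement of the airports from flight distances alone.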

Quiz 11. Name:

5. What kind of data do you use for a standard correspondence analysis? E. Bivariate categorical data F. Multivariate normal data G. Correlation matrix data H. Covariance matrix data

6. Correspondence analysis is commonly used to supplement which statistical test? E. Contingency table test for independence F. Two-sample t-test G. Pearson product-moment correlation test H. Spearman rank correlation test

7. Like multidimensional scaling, correspondence analysis also uses distance matrices. Which distance is used? E. Mahalanobis distance F. Average distance G. Euclidean distance H. Chi-squared distance

8. The conclusion from the correspondence analysis graph in the real example was that E. British water voles in different regions are relatively different from one another F. The first-born son is more similar to the second-born son than he is to the third-born son G. Historical skulls in the epoch c4000bc are more similar to skulls in c3300bc than they are to skulls in cad150 H. Girls … years old are more likely to have sexual relations than girls … years old

Quiz 12. Name:

1. The factor analysis model states that the A. manifest variables are functions of the latent variables B. latent variables are functions of the manifest variables C. manifest variables are functions of linear combinations of the manifest variables D. latent variables are linear combinations of the manifest variables

2. What is assumed about a latent variable f in factor analysis? A. E(f) = 0 and Var(f) = 0 B. E(f) = 0 and Var(f) = 1 C. E(f) = 1 and Var(f) = 0 D. E(f) = 1 and Var(f) = 1

3. The coefficient that multiplies a latent variable in a factor analysis model is called a A. squared correlation B. standard deviation C. eigenvector component D. loading

4. In the example where the factor analysis model was used to predict the exam scores in Classics, French, and English, what latent variable was assumed for the factor? A. Socioeconomic status B. Intelligence or intellectual ability C. Time spent studying for the exams D. Student-teacher ratio

Quiz 13. Name:

1. What is the conclusion of the Scale Invariance section? A. You should standardize the data before performing factor analysis B. When the variances of the variables differ greatly, you should use the correlation matrix in factor analysis C. The results of factor analysis are essentially equivalent, no matter whether the covariance matrix or correlation matrix is used D. You should use the scale function in R prior to performing factor analysis

2. What are the parameters that you have to estimate in the factor analysis model? A. The values of the latent variables and the loadings B. The loadings and the specific variances C. The values of the latent variable and the specific variances D. The specific means and the specific variances

3. What is the most respectable method of estimating the parameters in the factor analysis model? A. The spectral decomposition method B. The least squares method C. The maximum likelihood method D. The principal factor method

4. What is the main goal of factor rotation? A. To achieve a simple structure of the loading pattern B. To make the data better conform to the assumptions of factor analysis C. To better estimate the values of the latent variables D. To better estimate the values of the specific variances
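R's factanal estimates the loadings and specific variances by maximum likelihood, the method question 3 points to, and applies varimax rotation by default. The one-factor data below are simulated with assumed loadings, purely to show the workflow.

```r
# factanal() fits the factor model by maximum likelihood; rotation
# (varimax by default) is applied to the estimated loadings.
# One-factor data with assumed loadings, simulated for illustration:
set.seed(3)
f <- rnorm(300)                                # latent factor scores
E <- matrix(rnorm(900, sd = 0.5), ncol = 3)    # specific (unique) errors
X <- cbind(0.9 * f, 0.8 * f, 0.7 * f) + E      # manifest variables
fa <- factanal(X, factors = 1)                 # ML estimation
loadings(fa)                                   # all variables load on one factor
```

The fitted object also reports the uniquenesses (specific variances), the other set of parameters named in question 2.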

Quiz 14. Name:

1. In the expectations of life example, the manifest variables were A. four male life expectancy measures and four female life expectancy measures B. different countries C. three factors defining life expectancy D. the columns of the matrix ΛΛᵀ + Ψ

2. In the expectations of life example, the latent variables were life force measures defined for A. different people B. different countries C. different years D. different companies

3. In both the drug use example and the expectations of life example, how did they decide how many factors to choose? A. By using a chi-square test B. By using a t test C. By noting that the percentage of variance explained was more than 70% D. By noting that the percentage of variance explained was more than 90%

4. In the drug use example, the latent variables were drug seeking measures defined for A. different students B. different schools C. different years D. different teachers

Quiz 15. Name:

1. What is another term for clusters? A. Normal groups B. Non-normal groups C. Homogeneous groups D. Heterogeneous groups

2. Consider the following scatterplot. How many clusters are there? A. 2 B. 3 C. 4 D. It is not clear how many clusters there are

3. Before the process of agglomerative clustering can begin, what must first be calculated? A. A dendrogram B. A covariance or correlation matrix C. A mean vector and a covariance matrix D. A similarity or distance matrix

4. What does the two-cluster solution of jet fighters largely correspond to? A. High fuel efficiency jets and low fuel efficiency jets B. High altitude jets and low altitude jets C. Jets that can or cannot land on a carrier D. Old jets and new jets

Quiz 16. Name:

1. In the k-means clustering of entities based on crime rates, which was the outlier? A. Texas B. New York C. Washington, DC D. Honolulu

2. How were the variables standardized in the crime rate example? A. By dividing by their respective ranges B. By multiplying by their respective ranges C. By dividing by their respective standard deviations D. By multiplying by their respective standard deviations

3. When is a k-means clustering solution good? A. When the total number of clusters is large B. When the average number of observations within a cluster is high C. When the correlations between variables within a cluster are larger than 0.7 D. When the distances from within-cluster data values to the cluster mean are small

4. The clusters found in the Greco-Roman pottery example correspond best to A. Region where the pottery was found B. Year of discovery of the pottery C. Degree of solubility (in water) of the pottery D. Heat at which the pottery was fired
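A small k-means sketch ties questions 2 and 3 together: standardization can divide by standard deviations (scale) or by ranges (as in the crime-rate reading), and a good solution has small within-cluster distances. The two-group data are simulated toy values.

```r
# k-means after standardizing; scale() divides by standard deviations,
# while the crime-rate reading divided by ranges (both shown, toy data)
set.seed(4)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))  # two separated groups
Xs <- scale(X)                                     # divide by sd
rng <- apply(X, 2, function(v) diff(range(v)))
Xr <- sweep(X, 2, rng, "/")                        # divide by range
km <- kmeans(Xs, centers = 2, nstart = 10)
km$tot.withinss < km$totss   # TRUE for a good (compact) solution
```

kmeans reports tot.withinss, the summed squared distances from each point to its cluster mean, which is exactly the quantity question 3 says should be small.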


Quiz 17. Name:

1. What is the model in model-based clustering? A. The maximum likelihood method B. The hierarchical agglomeration method C. The Euclidean distance method D. The mixture of normals distribution

2. What do you assume about the shapes of the data in the clusters when you use model-based clustering? A. They are spherical B. They are elongated with the same orientation C. They can have different shapes but have the same volume D. They are ellipsoidal

3. In the diabetes analysis, what do the three chosen clusters refer to? A. three clinical diagnosis groups B. three levels of blood glucose C. young, middle age, and elderly patients D. three drug treatment groups

4. The mclust function gives results most similar to what? A. principal components B. factor analysis C. kernel density estimation D. multidimensional scaling
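The "model" of question 1 is a finite mixture of normal densities, which functions such as mclust::Mclust fit by maximum likelihood. A base-R sketch of a univariate two-component mixture (the weights and means are assumed toy values):

```r
# Model-based clustering assumes the data come from a mixture of
# normal densities; a two-component univariate mixture looks like this
dmix <- function(x) 0.4 * dnorm(x, mean = -2) + 0.6 * dnorm(x, mean = 3)
integrate(dmix, -Inf, Inf)$value                  # ~1: a proper density
optimize(dmix, c(0, 6), maximum = TRUE)$maximum   # mode near 3
```

Each fitted component plays the role of a cluster, and its estimated covariance controls the ellipsoidal shape that question 2 asks about.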


Quiz 18. Name:

1. In the wine example, what are the three known classes (or groups)? A. Wine in either a small, medium, or large bottle B. Wine from three different grape varieties (cultivars) C. Wine aged in either French oak, American oak, or stainless steel containers D. Red, white, or rosé wine

2. In general (not necessarily for the wine data), which model is fit for the within-group data when using MclustDA? A. A multivariate normal model with diagonal covariance matrix B. A multivariate normal model with any covariance matrix C. A mixture of multivariate normals model with diagonal covariance matrices D. A mixture of multivariate normals model with any covariance matrices

3. What kind of data are used to estimate the distributions used in the discriminant analysis model? A. Distance data B. Covariance data C. Training data D. Test data

4. What is test error? A. the proportion of test cases that are misclassified B. the difference between fitted and observed data in the test data set C. the sum of Type I and Type II errors for the likelihood ratio test D. the number of mistakes made on an examination

Quiz 19. Name:

5. How does exploratory factor analysis (EFA) differ from confirmatory factor analysis (CFA)? A. EFA constrains loadings to zero; CFA does not B. CFA constrains loadings to zero; EFA does not C. EFA assumes multivariate normal distributions; CFA does not D. CFA assumes multivariate normal distributions; EFA does not

6. What kind of data can be used to estimate CFA and/or structural equation (SEM) models? A. Covariance data B. Distance data C. Latent data D. Nominal data

7. The CFA and SEM model parameters are chosen to minimize a discrepancy function. How is discrepancy determined? A. By least squared differences B. By weighted least squared differences C. By average relative absolute difference D. By maximum likelihood assuming multivariate normality

8. The parameters θ of the SEM/CFA model are identifiable if A. Σ(θ₁) = Σ(θ₂) implies that θ₁ = θ₂ B. θ₁ = θ₂ implies that Σ(θ₁) = Σ(θ₂) C. The GFI index is greater than 0.90 or 0.95 D. The GFI index is less than 0.90 or 0.95

Quiz 20. Name: PCA = Principal Components Analysis; EFA = Exploratory Factor Analysis; CFA = Confirmatory Factor Analysis.

1. (20) The reading said EFA was usually ___ and PCA was usually ___. A. an exploratory method; a confirmatory method B. a confirmatory method; an exploratory method C. an end in and of itself; a means to an end D. a means to an end; an end in and of itself

2. (40) In which ways are EFA and CFA similar? Select all that apply. Five points per correct selection/non-selection. A. Both allow rotation of the loadings B. Both assume uncorrelated factors C. Both assume uncorrelated errors D. Both assume reflective measurements E. Both are exploratory F. Both allow you to set the loadings to zero a priori G. Both use maximum likelihood assuming multivariate normality to estimate parameters H. Both are models to explain the covariance structure of the observed variables

3. (20) How do you decide how many latent factors to use in your CFA model? (Select the best single answer) A. By using your a priori theory about the measurements B. By using a scree plot C. By using the chi-square test D. By using the discrepancy statistic

Quiz 21. Name:

1. The chi-square tests of model fit (i.e., tests of H0: Σ = Σ(θ), where Σ(θ) is the model-implied covariance matrix) discussed in Section 7.3 gave the following results. Ability and Aspiration study: ___; Drug use study: ___ A. Significant (p < .05); Significant (p < .05) B. Significant (p < .05); Insignificant (p > .05) C. Insignificant (p > .05); Significant (p < .05) D. Insignificant (p > .05); Insignificant (p > .05)

2. What is a normed residual? A. The difference between an observed and model-implied covariance B. The scaled difference between an observed and a model-implied covariance C. The difference between the observed data vector Xi and the predicted data vector D. The scaled difference between the observed data vector Xi and the predicted data vector

3. How is correlation between factors modelled using the R function sem? A. F1 <-> F2 B. F2 ~ F2 C. F1 ~~ F2 D. F1 >-< F2

4. In the video, how did the author initially choose three latent variables in the Confirmatory Factor Analysis? A. By using regression analysis B. By using principal components analysis C. By using exploratory factor analysis D. By using structural equation modeling

5. The usual CFA model fixes the variances of the latent variables at 1.0. The video allowed a second method where those three variances were free parameters, but A. the disturbance terms (error variances) were all fixed at 1.0 B. three of the disturbance terms (error variances) were fixed at 1.0 C. the loadings from the latent variable to manifest variables were all fixed at 1.0 D. three of the loadings from the latent variable to manifest variables were fixed at 1.0

6. In my data analysis file, the negative items were replaced via the code A. Q = −Q B. Q = Q − 1 C. Q = 5 − Q D. Q = 6 − Q

Consider the following plot, from my data analysis.

7. This is an example of what kind of structural model? A. Mediation B. Standard regression C. Multivariate regression D. Hierarchical factor

8. See the path diagram. The 0.56 between Q28 and Q29 refers to A. the variance of the common factor for Q28 and Q29 B. the correlation between Q28 and Q29 C. the covariance between Q28 and Q29 D. the covariance between error terms for Q28 and Q29

Quiz 22. Name:

1. What is the main distinguishing feature of structural equation models? A. Non-identifiability of parameters B. Uncorrelated latent factors C. Direct relationships between the manifest variables D. Direct relationships between the latent variables

2. The variable socioeconomic status was a ___ variable in the study. A. latent B. manifest C. normally distributed D. linear combination

3. The variable Alienation71 was modelled as a function of which variables? A. Alienation67, SES B. Alienation67, Powerlessness71 C. Anomia71, Powerlessness71 D. Anomia67, Powerlessness67

4. What was the result of the test of model fit (i.e., the test of H0: Σ = Σ(θ), where Σ(θ) is the model-implied covariance matrix) after modifying the model by allowing the measurement errors for anomia in 1967 and in 1971 to be correlated? A. Chi-square fit statistic with 6 degrees of freedom = …; model still fits poorly B. Chi-square fit statistic with 5 degrees of freedom = 6.359; model fit is better C. Chi-square fit statistic with 5 degrees of freedom = 6.359; model still fits poorly D. Chi-square fit statistic with 6 degrees of freedom = …; model fit is better

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.



More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Dimension reduction Exploratory (EFA) Background While the motivation in PCA is to replace the original (correlated) variables

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information

Introduction to Matrix Algebra and the Multivariate Normal Distribution

Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,

More information

Principal Component Analysis

Principal Component Analysis I.T. Jolliffe Principal Component Analysis Second Edition With 28 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition Acknowledgments List of Figures List of Tables

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author... From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...

More information

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 1: Introduction, Multivariate Location and Scatter

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 1: Introduction, Multivariate Location and Scatter MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 1:, Multivariate Location Contents , pauliina.ilmonen(a)aalto.fi Lectures on Mondays 12.15-14.00 (2.1. - 6.2., 20.2. - 27.3.), U147 (U5) Exercises

More information

Basic Statistical Tools

Basic Statistical Tools Structural Health Monitoring Using Statistical Pattern Recognition Basic Statistical Tools Presented by Charles R. Farrar, Ph.D., P.E. Los Alamos Dynamics Structural Dynamics and Mechanical Vibration Consultants

More information

Introduction to Structural Equation Modeling

Introduction to Structural Equation Modeling Introduction to Structural Equation Modeling Notes Prepared by: Lisa Lix, PhD Manitoba Centre for Health Policy Topics Section I: Introduction Section II: Review of Statistical Concepts and Regression

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Pharmaceutical Experimental Design and Interpretation

Pharmaceutical Experimental Design and Interpretation Pharmaceutical Experimental Design and Interpretation N. ANTHONY ARMSTRONG, B. Pharm., Ph.D., F.R.Pharm.S., MCPP. KENNETH C. JAMES, M. Pharm., Ph.D., D.Sc, FRSC, F.R.Pharm.S., C.Chem. Welsh School of Pharmacy,

More information

STAT 730 Chapter 1 Background

STAT 730 Chapter 1 Background STAT 730 Chapter 1 Background Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 27 Logistics Course notes hopefully posted evening before lecture,

More information

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3 The required files for all problems can be found in: http://www.stat.uchicago.edu/~lekheng/courses/331/hw3/ The file name indicates which problem

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Readings Howitt & Cramer (2014) Overview

Readings Howitt & Cramer (2014) Overview Readings Howitt & Cramer (4) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch : Statistical significance

More information

Statistícal Methods for Spatial Data Analysis

Statistícal Methods for Spatial Data Analysis Texts in Statistícal Science Statistícal Methods for Spatial Data Analysis V- Oliver Schabenberger Carol A. Gotway PCT CHAPMAN & K Contents Preface xv 1 Introduction 1 1.1 The Need for Spatial Analysis

More information

Multivariate Statistics (I) 2. Principal Component Analysis (PCA)

Multivariate Statistics (I) 2. Principal Component Analysis (PCA) Multivariate Statistics (I) 2. Principal Component Analysis (PCA) 2.1 Comprehension of PCA 2.2 Concepts of PCs 2.3 Algebraic derivation of PCs 2.4 Selection and goodness-of-fit of PCs 2.5 Algebraic derivation

More information

A User's Guide To Principal Components

A User's Guide To Principal Components A User's Guide To Principal Components J. EDWARD JACKSON A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Brisbane Toronto Singapore Contents Preface Introduction 1. Getting

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2014, Mr. Ruey S. Tsay. Solutions to Final Exam

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2014, Mr. Ruey S. Tsay. Solutions to Final Exam THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2014, Mr. Ruey S. Tsay Solutions to Final Exam 1. City crime: The distance matrix is 694 915 1073 528 716 881 972 464

More information

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

More information

STAT 730 Chapter 9: Factor analysis

STAT 730 Chapter 9: Factor analysis STAT 730 Chapter 9: Factor analysis Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Data Analysis 1 / 15 Basic idea Factor analysis attempts to explain the

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Readings Howitt & Cramer (2014)

Readings Howitt & Cramer (2014) Readings Howitt & Cramer (014) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch 11: Statistical significance

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

psychological statistics

psychological statistics psychological statistics B Sc. Counselling Psychology 011 Admission onwards III SEMESTER COMPLEMENTARY COURSE UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION CALICUT UNIVERSITY.P.O., MALAPPURAM, KERALA,

More information

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous

More information

Statistics Toolbox 6. Apply statistical algorithms and probability models

Statistics Toolbox 6. Apply statistical algorithms and probability models Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of

More information

The Common Factor Model. Measurement Methods Lecture 15 Chapter 9

The Common Factor Model. Measurement Methods Lecture 15 Chapter 9 The Common Factor Model Measurement Methods Lecture 15 Chapter 9 Today s Class Common Factor Model Multiple factors with a single test ML Estimation Methods New fit indices because of ML Estimation method

More information

Overview of clustering analysis. Yuehua Cui

Overview of clustering analysis. Yuehua Cui Overview of clustering analysis Yuehua Cui Email: cuiy@msu.edu http://www.stt.msu.edu/~cui A data set with clear cluster structure How would you design an algorithm for finding the three clusters in this

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

A Program for Data Transformations and Kernel Density Estimation

A Program for Data Transformations and Kernel Density Estimation A Program for Data Transformations and Kernel Density Estimation John G. Manchuk and Clayton V. Deutsch Modeling applications in geostatistics often involve multiple variables that are not multivariate

More information

Confirmatory Factor Analysis. Psych 818 DeShon

Confirmatory Factor Analysis. Psych 818 DeShon Confirmatory Factor Analysis Psych 818 DeShon Purpose Takes factor analysis a few steps further. Impose theoretically interesting constraints on the model and examine the resulting fit of the model with

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin

WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin Quantitative methods II WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin The first factor analysis C. Spearman (1904). General intelligence, objectively determined and measured. The American Journal

More information

Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix

Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix Michael Friendly and Matthew Sigal September 18, 2017 Contents Introduction 1 1 Visualizing mean differences: The HE plot framework

More information

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26 Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1

More information

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

More information

26:010:557 / 26:620:557 Social Science Research Methods

26:010:557 / 26:620:557 Social Science Research Methods 26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview

More information

Syllabus: Statistics 2 (7.5 hp)

Syllabus: Statistics 2 (7.5 hp) Department of Psychology, Stockholm University Doctoral program in Psychology, spring semester 2014 Syllabus: Statistics 2 (7.5 hp) Prior knowledge The course assumes prior knowledge corresponding to the

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Textbook Examples of. SPSS Procedure

Textbook Examples of. SPSS Procedure Textbook s of IBM SPSS Procedures Each SPSS procedure listed below has its own section in the textbook. These sections include a purpose statement that describes the statistical test, identification of

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics

More information

Gaussian random variables inr n

Gaussian random variables inr n Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b

More information

Principal Component Analysis. Applied Multivariate Statistics Spring 2012

Principal Component Analysis. Applied Multivariate Statistics Spring 2012 Principal Component Analysis Applied Multivariate Statistics Spring 2012 Overview Intuition Four definitions Practical examples Mathematical example Case study 2 PCA: Goals Goal 1: Dimension reduction

More information

Vector Space Models. wine_spectral.r

Vector Space Models. wine_spectral.r Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Data reduction, similarity & distance, data augmentation

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Dimensionality Reduction and Principal Components

Dimensionality Reduction and Principal Components Dimensionality Reduction and Principal Components Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,..., M} and observations of X

More information

Dimensionality Reduction and Principle Components

Dimensionality Reduction and Principle Components Dimensionality Reduction and Principle Components Ken Kreutz-Delgado (Nuno Vasconcelos) UCSD ECE Department Winter 2012 Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,...,

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information

Psych Jan. 5, 2005

Psych Jan. 5, 2005 Psych 124 1 Wee 1: Introductory Notes on Variables and Probability Distributions (1/5/05) (Reading: Aron & Aron, Chaps. 1, 14, and this Handout.) All handouts are available outside Mija s office. Lecture

More information

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can

More information

DISCOVERING STATISTICS USING R

DISCOVERING STATISTICS USING R DISCOVERING STATISTICS USING R ANDY FIELD I JEREMY MILES I ZOE FIELD Los Angeles London New Delhi Singapore j Washington DC CONTENTS Preface How to use this book Acknowledgements Dedication Symbols used

More information

Principal Components Theory Notes

Principal Components Theory Notes Principal Components Theory Notes Charles J. Geyer August 29, 2007 1 Introduction These are class notes for Stat 5601 (nonparametrics) taught at the University of Minnesota, Spring 2006. This not a theory

More information

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information

Warner, R. M. (2008). Applied Statistics: From bivariate through multivariate techniques. Thousand Oaks: Sage.

Warner, R. M. (2008). Applied Statistics: From bivariate through multivariate techniques. Thousand Oaks: Sage. Errata for Warner, R. M. (2008). Applied Statistics: From bivariate through multivariate techniques. Thousand Oaks: Sage. Most recent update: March 4, 2009 Please send information about any errors in the

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information

5. Discriminant analysis

5. Discriminant analysis 5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information