Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.


ISQS 6348 Final exam solutions. Name:

Open book and notes, but no electronic devices. Answer short-answer questions on separate blank paper. Answer multiple-choice questions on this exam sheet. Put your name on everything you hand in. Points (out of [illegible]) are indicated in parentheses.

Short Answer Questions: Answer on your separate blank paper. Points are given in parentheses.

1. (3) Calculate: [expression illegible]
Solution: [illegible]

2.A. (3) Give the following system of equations in matrix/vector form: [system of two linear equations in x and y; coefficients illegible]
Solution: [illegible]

2.B. (3) Give the matrix form of the solution to 2.A. (Don't calculate the final solution.)
Solution: (x, y)ᵀ = A⁻¹b, where A is the coefficient matrix and b is the right-hand-side vector from 2.A.

3. Here is a covariance matrix of (X₁, X₂): Σ = [entries illegible, except Var(X₁) = 9]
3.A. (?) Find the standard deviation of X₁. Solution: 3 (= √9)
3.B. (?) Find the standard deviation of X₂. Solution: [illegible]
3.C. (3) Find the correlation between X₁ and X₂. Solution: −[illegible]/3
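The matrix form in 2.B, (x, y)ᵀ = A⁻¹b, can be checked numerically. The exam's coefficients are not legible, so the sketch below uses a hypothetical system, x + 2y = 4 and 3x − y = 5. It builds the 2×2 inverse with exact fractions and multiplies it by b.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Return the inverse of a 2x2 matrix [[a, b], [c, d]] using exact fractions."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular; A^-1 does not exist")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

# Hypothetical system (not the exam's):  x + 2y = 4,  3x - y = 5
A = [[1, 2], [3, -1]]
b = [4, 5]
x, y = mat_vec(inverse_2x2(A), b)  # (x, y)^T = A^{-1} b
print(x, y)  # 2 1
```

Multiplying A by the result reproduces b, which is a quick check that the inverse was formed correctly.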

3.D. (3) Assume the means of X₁ and X₂ are zero. Using your answers to 3.A, 3.B and 3.C, draw a scatterplot showing the likely appearance of the (X₁, X₂) data.
Solution: [scatterplot not reproduced]

3.E. (5) Again assume the means of X₁ and X₂ are zero. Find the regression equation to predict X₁ as a function of X₂.
Solution: X₁ = βX₂ + u, where β = Cov(X₁, X₂)/Var(X₂) is negative (its printed value is illegible).
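For zero-mean variables, all of the quantities in 3.A to 3.E follow from Σ:

- the standard deviations are the square roots of the diagonal;
- the correlation is σ₁₂/(σ₁σ₂);
- the slope for predicting X₁ from X₂ is σ₁₂/σ₂².

This is a sketch with a hypothetical Σ. Only Var(X₁) = 9 is taken from the exam; the other entries are assumed.

```python
import math

def summarize_cov(cov):
    """SDs, correlation, and the X1-on-X2 regression slope from a 2x2
    covariance matrix of zero-mean variables (X1, X2)."""
    var1, cov12 = cov[0][0], cov[0][1]
    var2 = cov[1][1]
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    corr = cov12 / (sd1 * sd2)
    slope_x1_on_x2 = cov12 / var2  # X1 = slope * X2 + u
    return sd1, sd2, corr, slope_x1_on_x2

# Hypothetical covariance matrix; only Var(X1) = 9 comes from the exam.
sigma = [[9.0, -1.0], [-1.0, 1.0]]
sd1, sd2, corr, slope = summarize_cov(sigma)
print(sd1, sd2, round(corr, 4), slope)  # 3.0 1.0 -0.3333 -1.0
```

The slope equals corr × sd1/sd2, so a negative correlation always gives a negative regression coefficient.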

3.F. (3) Draw the path diagram that represents the regression equation in 3.E. Assume both variables are manifest (not latent).
Solution: [path diagram not reproduced]

4.A. (3) Draw a scatterplot showing an observation whose Euclidean distance from the centroid is relatively small, but whose Mahalanobis distance from the centroid is relatively large.
Solution: [scatterplot not reproduced] The observation is labelled P and the centroid is labelled M.
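The situation in 4.A arises when the data are strongly correlated and P lies off the main axis of the data cloud. This sketch assumes a covariance matrix with correlation 0.9 and centroid M = (0, 0); both are hypothetical. It compares P with a point Q lying along the main axis.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of 2-D point x from mean, given a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

def euclidean(x, mean):
    """Ordinary straight-line distance."""
    return math.dist(x, mean)

centroid = (0.0, 0.0)          # M
cov = [[1.0, 0.9], [0.9, 1.0]]  # hypothetical: strong positive correlation
P = (1.0, -1.0)  # off the main axis of the data cloud
Q = (2.0, 2.0)   # along the main axis

for name, pt in [("P", P), ("Q", Q)]:
    print(name, round(euclidean(pt, centroid), 2), round(mahalanobis_2d(pt, centroid, cov), 2))
# P 1.41 4.47
# Q 2.83 2.05
```

P is closer to M than Q in Euclidean terms but much farther in Mahalanobis terms, because the Mahalanobis distance measures distance relative to the spread of the data in each direction.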

4.B. (3) Draw a contour plot of a kernel-based bivariate density estimate that corresponds to your scatterplot of 4.A.
Solution: Draw a graph with concentric ellipses that encompass the data. For my scatterplot above, these ellipses will be very narrow.

5. (5) Throughout the course, the effect of sample size on statistical estimates has been emphasized. What usually happens to statistical estimates when there is a larger sample size?
Solution: With more data, the estimated parameters tend to be closer to the true values of the parameters.

6. Here is a contingency table showing job tasks taken up by husbands and wives. For example, 4 of the 1744 couples surveyed jointly do the laundry.

Columns: Wife, Alternating, Husband, Jointly, Total
Rows: Laundry, Main_meal, Dinner, Breakfeast, Tidying, Dishes, Shopping, Official, Driving, Finances, Insurance, Repairs, Holidays, Total
[cell counts illegible]

Here is a correspondence analysis plot from these data: [plot not reproduced]
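A correspondence analysis plot places rows with similar row profiles near each other. A row profile is the conditional distribution of the columns given the row. This sketch computes the profiles for the Finances row (13, 13, 21, 66; total 113) and the Insurance row (8, 1, 53, 77; total 139), in the order Wife, Alternating, Husband, Jointly. CA itself compares profiles with a chi-square distance weighted by column masses; that step is not reproduced here.

```python
from fractions import Fraction

columns = ["Wife", "Alternating", "Husband", "Jointly"]
rows = {
    "Finances": [13, 13, 21, 66],
    "Insurance": [8, 1, 53, 77],
}

def row_profile(counts):
    """Conditional distribution of the column categories given one row."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

profiles = {name: row_profile(c) for name, c in rows.items()}
for name, prof in profiles.items():
    print(name, [f"{col}={float(p):.2f}" for col, p in zip(columns, prof)])
# Finances ['Wife=0.12', 'Alternating=0.12', 'Husband=0.19', 'Jointly=0.58']
# Insurance ['Wife=0.06', 'Alternating=0.01', 'Husband=0.38', 'Jointly=0.55']
```

Both profiles put most of their mass on Jointly and little on Wife, which is why the two points sit near each other.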

6.A. (4) Insurance and Finances points are relatively close. Refer to the data to explain why.
Solution: Their conditional distributions (row profiles) across Wife, Alternating, Husband and Jointly are similar to each other, compared with the other rows' conditional distributions. Here are those distributions:
Finances: 13/113, 13/113, 21/113, 66/113
Insurance: 8/139, 1/139, 53/139, 77/139

6.B. (4) Husband and Repairs have similar directions. Refer to the data to explain why.
Solution: It means that Pr(Repairs | Husband) is higher than Pr(Repairs), or equivalently that Pr(Husband | Repairs) is higher than Pr(Husband). Here, Pr(Husband | Repairs) = 160/165 ≈ 0.97: nearly all repairs are done by the husband, far above the husband's overall share of tasks.

7. (3) A principal component is given as follows: PC = −0.55X₁ − 0.46X₂ + 0.48X₃ + [coefficient illegible]X₄. Here X₁ through X₄ are standardized measurements of a person: X₁ = Height, X₂ = Arm length, X₃ = Weight, X₄ = Percentage body fat. Use the "line the people up against the wall from smallest to largest value of PC" idea. What can you say about people with large values of PC? What can you say about people with small values of PC?
Solution: People at the high end have low X₁ and X₂ and simultaneously high X₃ and X₄. These are short people who are overweight. People at the low end are the opposite: tall people who are underweight.
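The "line the people up" idea for the principal component question can be sketched by computing each person's PC score and sorting. The X₄ weight is illegible, so 0.50 is a hypothetical placeholder; the solution implies it is positive. The three people and their z-scores are also hypothetical.

```python
# PC weights; the body_fat (X4) weight of 0.50 is a hypothetical placeholder.
weights = {"height": -0.55, "arm_length": -0.46, "weight": 0.48, "body_fat": 0.50}

# Hypothetical standardized (z-score) measurements for three people.
people = {
    "short_heavy": {"height": -1.5, "arm_length": -1.2, "weight": 1.4, "body_fat": 1.6},
    "average":     {"height": 0.0, "arm_length": 0.1, "weight": -0.1, "body_fat": 0.0},
    "tall_light":  {"height": 1.6, "arm_length": 1.3, "weight": -1.2, "body_fat": -1.4},
}

def pc_score(z):
    """Linear combination sum_j w_j * z_j that defines the PC score."""
    return sum(weights[k] * z[k] for k in weights)

# Line the people up against the wall from smallest to largest PC score.
lineup = sorted(people, key=lambda name: pc_score(people[name]))
for name in lineup:
    print(f"{name:12s} {pc_score(people[name]):6.2f}")
```

The short, heavy person ends up at the high end of the line and the tall, light person at the low end, matching the solution.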

8. We discussed principal components analysis (PCA) and canonical correlation analysis (CCA).
8.A. (?) Briefly state two things that are similar about PCA and CCA.
Solution: (i) Both derive linear combinations of the variables. (ii) In both, the linear combinations are chosen optimally ("best").
8.B. (?) Briefly state two things that are different about PCA and CCA.
Solution: (i) For PCA, "best" means maximum variance explained; for CCA, it means maximum correlation. (ii) PCA treats all variables as one group; CCA requires two groups of variables.

9. We discussed model-based clustering (mclust) and k-means clustering (kmeans).
9.A. (?) Briefly state two things that are similar about mclust and kmeans.
Solution: (i) Both methods assign data to clusters. (ii) Both methods work well with spherical clusters.
9.B. (?) Briefly state two things that are different about mclust and kmeans.
Solution: (i) kmeans has no model; mclust uses a (Gaussian mixture) model. (ii) kmeans has no objective criterion for choosing the number of clusters; mclust uses the BIC as an objective criterion.

10. We discussed exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).
10.A. (?) Briefly state two things that are similar about EFA and CFA.
Solution: (i) Both are latent variable methods. (ii) Both assume the manifest variables are functions of the latent variables.
10.B. (?) Briefly state two things that are different about EFA and CFA.
Solution: (i) EFA typically assumes the factors are uncorrelated; CFA typically allows correlated factors. (ii) EFA allows all loadings to be non-zero; CFA constrains some loadings to be zero.
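The "kmeans has no model" point can be made concrete with a minimal sketch of Lloyd's k-means algorithm. This is not R's `kmeans`, whose default algorithm is Hartigan-Wong. The algorithm only alternates two steps: nearest-center assignment and center updates. The number of clusters is fixed by the user through the initial centers; nothing in the algorithm chooses it. The six points are hypothetical.

```python
import math

def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates.
    No probability model is involved; k is fixed by len(centers)."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers

points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (5.0, 5.0), (5.4, 4.8), (4.9, 5.5)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```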

11. Here is a factor analysis model with true loadings given:
X₁ = 0.5f + u₁
X₂ = 0.5f + u₂
X₃ = 0.5f + u₃

11.A. (5) Assume the X's all have variance 1.0 (hence, each uᵢ has variance 0.75). Give the covariance matrix of (X₁, X₂, X₃) that is implied by the model.
Solution:
$$
\Sigma_{\text{implied}} = \Lambda\Lambda^{T} + \Psi
= \begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \end{bmatrix}
\begin{bmatrix} 0.5 & 0.5 & 0.5 \end{bmatrix}
+ \begin{bmatrix} 0.75 & 0 & 0 \\ 0 & 0.75 & 0 \\ 0 & 0 & 0.75 \end{bmatrix}
= \begin{bmatrix} 1 & 0.25 & 0.25 \\ 0.25 & 1 & 0.25 \\ 0.25 & 0.25 & 1 \end{bmatrix}
$$

11.B. (3) Using your answer to 11.A, give the null hypothesis that is tested in the usual χ² test for model adequacy.
Solution: H₀: Σ_true = Σ_implied

12. (3) In terms of the data-generating process (not the data), what does it mean for a structural equation model to be misspecified? Do not answer in terms of statistics like χ², SRMR, GFI, etc., because those come from data. Answer only in terms of the true data-generating process and your model for it.
Solution: If the model you fit is not truly Nature's model, then your model is misspecified. Such misspecification can easily occur if there are additional paths or correlations in Nature that you fail to model.
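The implied covariance matrix for the one-factor model above is ΛΛᵀ + Ψ. The sketch below computes it directly, assuming (as the variance-1.0 statement implies) that Var(f) = 1 and that the uᵢ are independent of f and of each other.

```python
# One-factor model X_i = lambda_i * f + u_i with Var(f) = 1 and independent u_i.
loadings = [0.5, 0.5, 0.5]
uniquenesses = [1.0 - lam_i ** 2 for lam_i in loadings]  # 0.75 each, so Var(X_i) = 1

def implied_cov(lam, psi):
    """Sigma_implied = Lambda Lambda^T + Psi for a single-factor model."""
    p = len(lam)
    return [[lam[i] * lam[j] + (psi[i] if i == j else 0.0) for j in range(p)]
            for i in range(p)]

sigma = implied_cov(loadings, uniquenesses)
for row in sigma:
    print(row)
# [1.0, 0.25, 0.25]
# [0.25, 1.0, 0.25]
# [0.25, 0.25, 1.0]
```

The χ² test of model adequacy compares this matrix with the sample covariance matrix.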

Multiple Choice Questions: Circle Answers on Exam Sheet. One point each.

13. An eigenvector of a covariance matrix tells you what?
A. The direction of variability in multivariate space
B. The magnitude of variability in multivariate space
C. Whether the data point is an outlier
D. Whether the data point is normally distributed

14. Suppose (X, Y) has a bivariate normal distribution. Select the true statement.
A. X and Y are independent
B. X and Y are independent if the covariance between X and Y is equal to 0
C. X is a linear function of Y
D. Y is a linear function of X

15. What does the na.omit function do to a data frame?
A. Deletes a column if all observations in it are missing
B. Deletes a column if at least one observation in it is missing
C. Deletes a row if all observations in it are missing
D. Deletes a row if at least one observation in it is missing

16. What is the preferred method of estimating correlations in the presence of missing values?
A. Use listwise deletion on data.frame, then use cor(data.frame)
B. Use pairwise deletion on data.frame, then use cor(data.frame)
C. Use pairwise deletion on data.frame, then use maximum likelihood
D. Do not delete any missing values. Instead, use maximum likelihood on all available data

17. Suppose data.frame has two columns. What does the R command rug(data.frame, side) do?
A. Draws a density plot of the data
B. Draws a histogram of the data
C. Places data values along the horizontal axis
D. Places data values along the vertical axis

18. What does the scale function do?
A. Converts data to percentages
B. Converts data to natural logarithm scale
C. Subtracts the row mean from each observation, and then divides the result by the row standard deviation
D. Subtracts the column mean from each observation, and then divides the result by the column standard deviation
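The data-frame operations in the na.omit and scale questions have straightforward pure-Python analogues. The sketch below is an analogue, not R itself:

- `na_omit` drops any row containing a missing value, with `None` standing in for `NA`;
- `scale_columns` centres each column and divides by its sample standard deviation (n − 1 denominator), which is how R's `scale` behaves with its default arguments.

The data are hypothetical.

```python
import statistics

def na_omit(rows):
    """Drop every row that has at least one missing (None) value (listwise deletion)."""
    return [r for r in rows if all(v is not None for v in r)]

def scale_columns(rows):
    """Standardize each column: subtract the column mean, then divide by the
    column sample standard deviation (n - 1 denominator)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

raw = [[1.0, 10.0], [None, 15.0], [2.0, 20.0], [3.0, 30.0]]  # hypothetical data
data = na_omit(raw)  # the row with a missing value is removed
z = scale_columns(data)
print(z)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```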

19. As used most commonly in this course, what is a latent variable?
A. A column of data in your data frame
B. An unobserved column of data in your data frame
C. A row of data in your data frame
D. An unobserved row of data in your data frame

20. Which plot can display three of the variables in your data frame simultaneously?
A. Ellipsoidal plot
B. Bivariate density plot
C. Bubble plot
D. Contour plot

21. When does the squared Mahalanobis distance (generalized distance) from data X to the mean µ have a chi-squared distribution?
A. When the estimated covariance matrix is used to define distance
B. When the estimated mean vector is used instead of µ
C. When the distribution of X is the multivariate normal distribution
D. When the distribution of X is the chi-squared distribution

22. The kernel density estimate is the sum of the kernel bumps centered at the data points. How should you choose the bandwidth, h?
A. Choose it to be around 0.5
B. Choose it to be around [illegible].0
C. Choose it to be around 0.5 standard deviations
D. Choose it to be around [illegible].0 standard deviations

23. In principal component analysis, a smaller eigenvalue indicates that
A. A given variable in the original data set, say Xj, is more important
B. A given variable in the original data set, say Xj, is less important
C. A given principal component, say Yj, is more important
D. A given principal component, say Yj, is less important

24. Why do we often pick just the first two principal components?
A. Because we can graph them in a scatterplot
B. Because they explain most of the variance
C. Because they are uncorrelated
D. Because of the Kaiser criterion
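The kernel density question describes the estimate as a sum of bumps centered at the data points. This sketch implements that idea with a Gaussian kernel. It averages the bumps (dividing by n·h) so that the estimate integrates to 1. The bandwidth h is left as a parameter so its effect on smoothness can be compared; the sample and the three h values are hypothetical.

```python
import math

def gaussian_kde(data, h):
    """Return f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h) with a standard normal
    kernel K: an average of Gaussian bumps of width h centred at the data points."""
    n = len(data)

    def f_hat(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) / math.sqrt(2 * math.pi)
                   for xi in data) / (n * h)

    return f_hat

data = [1.0, 1.3, 2.1, 4.0, 4.2]  # hypothetical sample
for h in (0.1, 0.5, 2.0):  # small h gives a spiky estimate, large h oversmooths
    f = gaussian_kde(data, h)
    print(h, [round(f(x), 3) for x in (1.0, 3.0, 4.1)])
```

With h = 0.1 the estimate is nearly zero at 3.0, between the clumps of data. With h = 2.0 the two clumps blur into one broad hump.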

25. Which function retrieves the coefficients aᵢ of the PC score Y = a₁x₁ + … + a_q x_q?
A. princomp(data.frame)$loadings
B. princomp(data.frame)$scores
C. princomp(data.frame)$coefs
D. princomp(data.frame)$estimates

26. Convert a correlation matrix Rho into a distance matrix.
A. Rho^2
B. sqrt(rho)
C. sqrt(1-rho^2)
D. exp(rho)

27. Pick the correct form of the exploratory factor analysis model.
A. X = Λf + u
B. Σ = Λf + u
C. X = ΛΛᵀ + ψ
D. Σ = Λf + ψ

28. What does varimax factor rotation do?
A. Gives a simple structure of the loading matrix
B. Makes the distribution of manifest variables closer to multivariate normal
C. Makes the factors have maximum variances
D. Explains a greater proportion of variance of the manifest variables

29. In an exploratory factor analysis, there is a test of H₀: Σ = ΛΛᵀ + Ψ, where Λ has k columns (one for each latent variable). When is the model with k factors acceptable?
A. When the p-value for the test is > .05
B. When the p-value for the test is [illegible] .05
C. When the p-value for the test is < .05
D. When the p-value for the test is [illegible]

30. When are both principal components and factor analysis pointless?
A. When the test of H₀: Σ = ΛΛᵀ + Ψ gives p < .05
B. When the test of H₀: Σ = ΛΛᵀ + Ψ gives p > .05
C. When the test of H₀: Σ is a diagonal matrix gives p < .05
D. When the test of H₀: Σ is a diagonal matrix gives p > .05
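The PCA questions above rely on three facts:

- eigenvectors give directions;
- eigenvalues give the variances of the components;
- the PC coefficients are the eigenvector entries, which R reports in `princomp(...)$loadings`.

The sketch below illustrates these facts with a closed-form eigen-decomposition of a hypothetical 2×2 covariance matrix. Eigenvector signs are arbitrary, so software may report them flipped.

```python
import math

def pca_2x2(cov):
    """Eigen-decomposition of a symmetric 2x2 covariance matrix [[a, b], [b, d]].
    Returns (eigenvalues, unit eigenvectors) in decreasing eigenvalue order; each
    eigenvector holds the coefficients (a_1, a_2) of one PC score."""
    (a, b), (_, d) = cov
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    values = [mid + rad, mid - rad]
    if b == 0:
        # Diagonal matrix: the coordinate axes are the eigenvectors.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
        return values, vectors
    vectors = []
    for lam in values:
        v = (b, lam - a)  # solves (cov - lam * I) v = 0
        norm = math.hypot(*v)
        vectors.append((v[0] / norm, v[1] / norm))
    return values, vectors

cov = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix
values, vectors = pca_2x2(cov)
print(values)  # [3.0, 1.0]
print([tuple(round(x, 4) for x in v) for v in vectors])  # [(0.7071, 0.7071), (0.7071, -0.7071)]
print(values[0] / sum(values))  # 0.75: share of variance explained by PC1
```

The smaller eigenvalue (1.0) belongs to the less important component, which carries only a quarter of the total variance.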

31. How many clusters are there at the end of the agglomerative hierarchical clustering algorithm?
A. [illegible]
B. q
C. n
D. As many as you want

32. What is wrong with the "knee" (or "elbow") criterion for selecting the number of clusters?
A. It usually picks too many clusters
B. It usually picks too few clusters
C. It is hard to find the "knee" (or "elbow") when clusters are well separated
D. It is hard to find the "knee" (or "elbow") when clusters are poorly separated

33. Which statistic measures row-column correspondence when comparing assigned clusters to an external grouping variable?
A. F statistic
B. Chi-square statistic
C. Affinity statistic
D. Root mean squared error

34. In model-based clustering, when do observations come from the same true cluster?
A. When they come from the same distribution
B. When they have the highest posterior probability of belonging to the same cluster
C. When they are close to each other in terms of Mahalanobis distance
D. When they are close to each other in terms of Euclidean distance

35. When you apply the R command plot to an object obtained via hierarchical clustering, as in
h.obj <- hclust(dmat, …)
plot(h.obj)
then you get a
A. dendrogram
B. scree plot
C. scatterplot
D. density plot

36. Suppose you have specified the correct model in your confirmatory factor analysis. What happens to the p-value for the test of model fit as the sample size gets larger?
A. It tends toward 0.0
B. It tends toward 1.0
C. It is random, but usually above 0.05
D. It is random, but usually below 0.05
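Agglomerative hierarchical clustering starts with n singleton clusters and keeps merging the two closest clusters until only one is left. The merge heights are what a dendrogram draws. This is a minimal single-linkage sketch on hypothetical 1-D points; R's `hclust` supports several linkage methods, and only single linkage is shown here.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.
    Starts with n singleton clusters and merges the two closest clusters
    until one cluster remains; returns the final clusters and the merge history."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(x - y) for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        history.append((d, sorted(merged)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history

clusters, history = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0])
for height, members in history:
    print(round(height, 2), members)
print(len(clusters))  # 1
```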

37. How can you improve the fit of your structural equations model?
A. By removing a manifest variable from the model
B. By including an additional manifest variable in the model
C. By forcing certain error terms to be uncorrelated
D. By allowing certain error terms to be correlated

38. Using lavaan, specify the following model.
y₁ = λ₁f + u₁
y₂ = λ₂f + u₂
A. f ~ y1 + y2
B. f ~~ y1 + y2
C. f [operator illegible] y1 + y2
D. f ~ y1 + y2

39. Suppose a data set in wide format has [illegible] columns, four of which are repeated measures on a particular variable. There are [illegible] rows. How many rows are there in long format?
[answer choices illegible]

40. Which kind of missing values are the worst?
A. Non-normal
B. Non-ignorable
C. Missing completely at random
D. Missing at random
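For the wide-to-long question: in long format, each repeated measure becomes its own row, while the non-repeated columns are copied into every row. The long data set therefore has (number of wide rows) × (number of repeated measures) rows. This is a sketch with hypothetical data and a hypothetical helper name, `wide_to_long`.

```python
def wide_to_long(rows, id_cols, measure_cols, time_name="time", value_name="value"):
    """Reshape wide records (one row per subject) into long records
    (one row per subject x repeated measure)."""
    long_rows = []
    for row in rows:
        for t, col in enumerate(measure_cols, start=1):
            rec = {c: row[c] for c in id_cols}  # copy the non-repeated columns
            rec[time_name] = t
            rec[value_name] = row[col]
            long_rows.append(rec)
    return long_rows

# Hypothetical wide data: 3 subjects, an id, a group label, and 4 repeated measures.
wide = [
    {"id": 1, "group": "a", "y1": 5, "y2": 6, "y3": 7, "y4": 8},
    {"id": 2, "group": "b", "y1": 3, "y2": 3, "y3": 4, "y4": 4},
    {"id": 3, "group": "a", "y1": 9, "y2": 8, "y3": 8, "y4": 7},
]
long = wide_to_long(wide, id_cols=["id", "group"], measure_cols=["y1", "y2", "y3", "y4"])
print(len(wide), len(long))  # 3 12
```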


More information

Ordination & PCA. Ordination. Ordination

Ordination & PCA. Ordination. Ordination Ordination & PCA Introduction to Ordination Purpose & types Shepard diagrams Principal Components Analysis (PCA) Properties Computing eigenvalues Computing principal components Biplots Covariance vs. Correlation

More information

Exploratory Factor Analysis and Canonical Correlation

Exploratory Factor Analysis and Canonical Correlation Exploratory Factor Analysis and Canonical Correlation 3 Dec 2010 CPSY 501 Dr. Sean Ho Trinity Western University Please download: SAQ.sav Outline for today Factor analysis Latent variables Correlation

More information

Robotics 2 Data Association. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard

Robotics 2 Data Association. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard Robotics 2 Data Association Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard Data Association Data association is the process of associating uncertain measurements to known tracks. Problem

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

PRINCIPAL COMPONENT ANALYSIS

PRINCIPAL COMPONENT ANALYSIS PRINCIPAL COMPONENT ANALYSIS 1 INTRODUCTION One of the main problems inherent in statistics with more than two variables is the issue of visualising or interpreting data. Fortunately, quite often the problem

More information

Introduction to multivariate analysis Outline

Introduction to multivariate analysis Outline Introduction to multivariate analysis Outline Why do a multivariate analysis Ordination, classification, model fitting Principal component analysis Discriminant analysis, quickly Species presence/absence

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 6: Cluster Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering

More information

Didacticiel - Études de cas

Didacticiel - Études de cas 1 Topic New features for PCA (Principal Component Analysis) in Tanagra 1.4.45 and later: tools for the determination of the number of factors. Principal Component Analysis (PCA) 1 is a very popular dimension

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

Confirmatory Factor Analysis. Psych 818 DeShon

Confirmatory Factor Analysis. Psych 818 DeShon Confirmatory Factor Analysis Psych 818 DeShon Purpose Takes factor analysis a few steps further. Impose theoretically interesting constraints on the model and examine the resulting fit of the model with

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS PRINCIPAL COMPONENTS ANALYSIS Iris Data Let s find Principal Components using the iris dataset. This is a well known dataset, often used to demonstrate the effect of clustering algorithms. It contains

More information

Multivariate Statistics (I) 2. Principal Component Analysis (PCA)

Multivariate Statistics (I) 2. Principal Component Analysis (PCA) Multivariate Statistics (I) 2. Principal Component Analysis (PCA) 2.1 Comprehension of PCA 2.2 Concepts of PCs 2.3 Algebraic derivation of PCs 2.4 Selection and goodness-of-fit of PCs 2.5 Algebraic derivation

More information

The Common Factor Model. Measurement Methods Lecture 15 Chapter 9

The Common Factor Model. Measurement Methods Lecture 15 Chapter 9 The Common Factor Model Measurement Methods Lecture 15 Chapter 9 Today s Class Common Factor Model Multiple factors with a single test ML Estimation Methods New fit indices because of ML Estimation method

More information

Lecture 4: Principal Component Analysis and Linear Dimension Reduction

Lecture 4: Principal Component Analysis and Linear Dimension Reduction Lecture 4: Principal Component Analysis and Linear Dimension Reduction Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail:

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Data Preprocessing Tasks

Data Preprocessing Tasks Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can

More information

EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS

EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS Mario Romanazzi October 29, 2017 1 Introduction An important task in multidimensional data analysis is reduction in complexity. Recalling that

More information

Principal Components Analysis and Exploratory Factor Analysis

Principal Components Analysis and Exploratory Factor Analysis Principal Components Analysis and Exploratory Factor Analysis PRE 905: Multivariate Analysis Lecture 12: May 6, 2014 PRE 905: PCA and EFA (with CFA) Today s Class Advanced matrix operations Principal Components

More information

Data reduction for multivariate analysis

Data reduction for multivariate analysis Data reduction for multivariate analysis Using T 2, m-cusum, m-ewma can help deal with the multivariate detection cases. But when the characteristic vector x of interest is of high dimension, it is difficult

More information

DISCOVERING STATISTICS USING R

DISCOVERING STATISTICS USING R DISCOVERING STATISTICS USING R ANDY FIELD I JEREMY MILES I ZOE FIELD Los Angeles London New Delhi Singapore j Washington DC CONTENTS Preface How to use this book Acknowledgements Dedication Symbols used

More information

UCLA STAT 233 Statistical Methods in Biomedical Imaging

UCLA STAT 233 Statistical Methods in Biomedical Imaging UCLA STAT 233 Statistical Methods in Biomedical Imaging Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology University of California, Los Angeles, Spring 2004 http://www.stat.ucla.edu/~dinov/

More information

Using Structural Equation Modeling to Conduct Confirmatory Factor Analysis

Using Structural Equation Modeling to Conduct Confirmatory Factor Analysis Using Structural Equation Modeling to Conduct Confirmatory Factor Analysis Advanced Statistics for Researchers Session 3 Dr. Chris Rakes Website: http://csrakes.yolasite.com Email: Rakes@umbc.edu Twitter:

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3: Examining Relationships Most statistical studies involve more than one variable. Often in the AP Statistics exam, you will be asked to compare two data sets by using side by side boxplots or

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

3.1. The probabilistic view of the principal component analysis.

3.1. The probabilistic view of the principal component analysis. 301 Chapter 3 Principal Components and Statistical Factor Models This chapter of introduces the principal component analysis (PCA), briefly reviews statistical factor models PCA is among the most popular

More information

Vector Space Models. wine_spectral.r

Vector Space Models. wine_spectral.r Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components

More information

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis

More information

Introduction to Structural Equation Modeling

Introduction to Structural Equation Modeling Introduction to Structural Equation Modeling Notes Prepared by: Lisa Lix, PhD Manitoba Centre for Health Policy Topics Section I: Introduction Section II: Review of Statistical Concepts and Regression

More information

Dimension Reduction. David M. Blei. April 23, 2012

Dimension Reduction. David M. Blei. April 23, 2012 Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do

More information

Revision: Chapter 1-6. Applied Multivariate Statistics Spring 2012

Revision: Chapter 1-6. Applied Multivariate Statistics Spring 2012 Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing

More information

Factor Analysis: An Introduction. What is Factor Analysis? 100+ years of Factor Analysis FACTOR ANALYSIS AN INTRODUCTION NILAM RAM

Factor Analysis: An Introduction. What is Factor Analysis? 100+ years of Factor Analysis FACTOR ANALYSIS AN INTRODUCTION NILAM RAM NILAM RAM 2018 PSYCHOLOGY R BOOTCAMP PENNSYLVANIA STATE UNIVERSITY AUGUST 16, 2018 FACTOR ANALYSIS https://psu-psychology.github.io/r-bootcamp-2018/index.html WITH ADDITIONAL MATERIALS AT https://quantdev.ssri.psu.edu/tutorials

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information

Dimensionality Reduction and Principal Components

Dimensionality Reduction and Principal Components Dimensionality Reduction and Principal Components Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Motivation Recall, in Bayesian decision theory we have: World: States Y in {1,..., M} and observations of X

More information

Introduction to Within-Person Analysis and RM ANOVA

Introduction to Within-Person Analysis and RM ANOVA Introduction to Within-Person Analysis and RM ANOVA Today s Class: From between-person to within-person ANOVAs for longitudinal data Variance model comparisons using 2 LL CLP 944: Lecture 3 1 The Two Sides

More information

Logistic Regression Analysis

Logistic Regression Analysis Logistic Regression Analysis Predicting whether an event will or will not occur, as well as identifying the variables useful in making the prediction, is important in most academic disciplines as well

More information

ISQS 5349 Final Exam, Spring 2017.

ISQS 5349 Final Exam, Spring 2017. ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC

More information

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author... From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, data types 3 Data sources and preparation Project 1 out 4 Data reduction, similarity & distance, data augmentation

More information