Introduction to Matrix Algebra and the Multivariate Normal Distribution


Introduction to Matrix Algebra and the Multivariate Normal Distribution. Introduction to Structural Equation Modeling, Lecture #2, January 18, 2012. ERSH 8750: Lecture 2

Motivation for Learning the Multivariate Normal Distribution. The univariate and multivariate normal distributions serve as the backbone of the most frequently used statistical procedures. They are very robust and very useful, and understanding their form and function will help you learn about a whole host of statistical routines immediately. This course is essentially a course based in normal distributions: most of structural equation modeling happens with continuous variables under assumptions of multivariate normality. To learn about the MVN distribution, one must first learn about matrix algebra.

Introduction and Motivation for Matrix Algebra. Structural equation modeling and nearly all other multivariate statistical techniques are described with matrix algebra. It is the language of modern statistics: when new methods are developed, the first published work typically involves matrices. It also makes technical writing more concise, because formulae are smaller. Have you seen Y = XB + E (from the general linear model) or Sigma = Lambda Phi Lambda' + Psi (from confirmatory factor analysis)? Useful tip: matrix algebra is a great way to get out of conversations and other awkward moments.

Why Learn Matrix Algebra? Matrix algebra can seem very abstract from the purposes of this class (and statistics in general). Learning matrix algebra is important for: understanding how statistical methods work, and when to use them (or not use them); understanding what statistical methods mean; and reading and writing results from new statistical methods. Today's class is the first lecture in learning the language of structural equation modeling.

Definitions. We begin this class with some general definitions (from dictionary.com). Matrix: 1. A rectangular array of numeric or algebraic quantities subject to mathematical operations. 2. The substrate on or within which a fungus grows. Algebra: 1. A branch of mathematics in which symbols, usually letters of the alphabet, represent numbers or members of a specified set and are used to represent quantities and to express general relationships that hold for all members of the set. 2. A set together with a pair of binary operations defined on the set. Usually, the set and the operations include an identity element, and the operations are commutative or associative.

A Guiding Example. To demonstrate matrix algebra, we will make use of data. Imagine that I collected SAT test scores for both the Math (SATM) and Verbal (SATV) sections from 1,000 students. The descriptive statistics of this data set are given below.

The Data, in Excel.

THE BASICS: DEFINITIONS OF MATRICES, VECTORS, AND SCALARS

Matrices. A matrix is a rectangular array of data, used for storing numbers. Matrices can have unlimited dimensions; for our purposes all matrices will have two dimensions: rows x columns. Matrices are symbolized by boldface font in text, typically with capital letters, with size written as (r rows x c columns). For example, a 3 x 2 matrix:
520 580
520 550
540 660

Vectors. A vector is a matrix where one dimension is equal to size 1. Column vector: a matrix of size r x 1, e.g.,
520
520
540
Row vector: a matrix of size 1 x c, e.g., 520 580. Vectors are typically written in boldface font in text, usually with lowercase letters.

Scalars. A scalar is just a single number (as we have known before). The name scalar is important: the number scales a vector; it can make a vector longer or shorter.

Matrix Elements. A matrix (or vector) is composed of a set of elements, each denoted by its position in the matrix (row and column). For our matrix of data (size 1,000 rows and 2 columns), each element is denoted by x_ij. The first subscript is the index for the rows: i = 1, ..., r (= 1,000). The second subscript is the index for the columns: j = 1, ..., c (= 2).

Matrix Transpose. The transpose of a matrix is a reorganization of the matrix made by switching the indices for the rows and columns. An element x_ij in the original matrix is element x_ji in the transposed matrix.
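As a pure-Python sketch (not from the lecture; the helper name and the small data matrix are illustrative), transposing just swaps the row and column indices:

```python
# Transpose: element A[i][j] of A becomes element [j][i] of the transpose.
def transpose(A):
    rows, cols = len(A), len(A[0])
    return [[A[i][j] for i in range(rows)] for j in range(cols)]

A = [[520, 580],
     [520, 550],
     [540, 660]]          # a 3 x 2 data matrix (e.g., SATV, SATM for 3 students)

At = transpose(A)          # now 2 x 3
print(At)                  # [[520, 520, 540], [580, 550, 660]]
```

Note that transposing twice returns the original matrix, consistent with the rule (A')' = A.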

Types of Matrices. Square matrix: a matrix that has the same number of rows and columns; correlation/covariance matrices are square matrices. Diagonal matrix: a square matrix with non-zero diagonal elements and zeros on the off-diagonal elements, for example:
2.759 0 0
0 1.643 0
0 0 0.879
We will use diagonal matrices to form correlation matrices. Symmetric matrix: a square matrix where all elements are reflected across the diagonal (a_ij = a_ji). Correlation and covariance matrices are symmetric matrices.

MATRIX ALGEBRA

Moving from Vectors to Matrices. A matrix can be thought of as a collection of vectors; matrix operations are vector operations on steroids. Matrix algebra defines a set of operations and entities on matrices; I will present a version meant to mirror your previous algebra experiences. Definitions: identity matrix, zero vector, ones vector. Basic operations: addition, subtraction, multiplication, division.

Matrix Addition and Subtraction. Matrix addition and subtraction are much like vector addition/subtraction. Rule: the matrices must be the same size (rows and columns). Method: the new matrix is constructed by element-by-element addition/subtraction of the previous matrices. Order: the order of the matrices (pre- and post-) does not matter.

Matrix Addition/Subtraction

Matrix Multiplication. Matrix multiplication is a bit more complicated; the new matrix may be a different size from either of the two multiplying matrices. Rule: the pre-multiplying matrix must have a number of columns equal to the number of rows of the post-multiplying matrix. Method: the elements of the new matrix consist of the inner (dot) products of the row vectors of the pre-multiplying matrix with the column vectors of the post-multiplying matrix. Order: the order of the matrices (pre- and post-) matters.

Matrix Multiplication

Multiplication in Statistics. Many statistical formulae with summation can be re-expressed with matrices. A common matrix multiplication form is X'X. Diagonal elements: the sum of squares of each column. Off-diagonal elements: the sum of cross-products of pairs of columns. For our SAT example:
251,797,800 251,928,400
251,928,400 254,862,700

Identity Matrix. The identity matrix I is a matrix that, when pre- or post-multiplied by another matrix, results in the original matrix. The identity matrix is a square matrix that has diagonal elements equal to 1 and off-diagonal elements equal to 0.

Zero Vector. The zero vector is a column vector of zeros:
0
0
0
When pre- or post-multiplied, the result is the zero vector.

Ones Vector. A ones vector is a column vector of 1s:
1
1
1
The ones vector is useful for calculating statistical terms, such as the mean vector and the covariance matrix. Next class we will discuss what these matrices are, how we compute them, and what they mean.

Matrix Division: The Inverse Matrix. Division in algebra: first, a/b = a * b^-1; second, a * a^-1 = 1. Division in matrices serves a similar role. For square and symmetric matrices, an inverse matrix A^-1 is a matrix that, when pre- or post-multiplied with the original matrix, produces the identity matrix: A^-1 A = A A^-1 = I. Calculation of the matrix inverse is complicated; even computers have a tough time. Not all matrices can be inverted: non-invertible matrices are called singular matrices. In statistics, singular matrices are commonly caused by linear dependencies.
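For the 2 x 2 case the inverse has a closed form, which makes the singularity condition concrete (a hypothetical pure-Python sketch; general inversion uses elimination routines instead):

```python
# Inverse of a 2 x 2 matrix A = [[a, b], [c, d]]:
# A^-1 = (1/det) * [[d, -b], [-c, a]], where det = a*d - b*c.
# A singular matrix (det == 0) has no inverse, e.g., under a linear dependency.
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (not invertible)")
    return [[d / det, -b / det],
            [-c / det, a / det]]

A = [[4, 2],
     [2, 3]]
Ainv = inverse_2x2(A)
print(Ainv)   # [[0.375, -0.25], [-0.25, 0.5]]
```

Multiplying A by Ainv (in either order) reproduces the identity matrix, matching the definition above.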

The Inverse in Data. The inverse shows up constantly in statistics: models that assume some type of (multivariate) normality need an inverse covariance matrix. Using our SAT example, our data matrix was of size (1000 x 2), which is not invertible. However, X'X was of size (2 x 2), square, and symmetric:
251,797,800 251,928,400
251,928,400 254,862,700
Its inverse is:
3.61e-7 -3.57e-7
-3.57e-7 3.56e-7

Matrix Algebra Operations. For matrices of conformable sizes: (A + B) + C = A + (B + C); A + B = B + A; (AB)C = A(BC); A(B + C) = AB + AC; (A')' = A; (A + B)' = A' + B'; (AB)' = B'A'. For A^-1 such that A^-1 exists: A A^-1 = A^-1 A = I; (A^-1)^-1 = A; (A')^-1 = (A^-1)'.

UNIVARIATE STATISTICS

Today's Second Example. For today, we will use hypothetical data on the heights and weights of 50 students picked at random from the University of Georgia. You can follow along in Mplus using the example syntax and the data file heightweight.csv. [Scatterplot: Weight (in pounds) versus Height (in inches)]

Example Mplus Syntax. PLOT command: requests various plots; PLOT1 is for histograms and scatterplots. OUTPUT command: requests additional output; SAMPSTAT gives descriptive sample statistics and STANDARDIZED gives standardized estimates.

Random Variables. A random variable is a variable whose outcome depends on the result of chance; it can be continuous or discrete. Example of continuous: the height of a person drawn at random. Example of discrete: the gender of a person drawn at random. A random variable has a density function that indicates the relative frequency of occurrence: a mathematical function that gives a rough picture of the distribution from which the random variable is drawn. We will discuss the ways that are used to summarize a random variable.

Mean. The mean is a measure of central tendency. For a single variable x (put into a column vector), the mean is x_bar = (1'x)/N, where 1 is a column vector with all elements equal to 1, of size N x 1. The sampling distribution of x_bar converges to normal as N increases (central limit theorem). In our data, the mean height is 67.28 (in inches).
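The ones-vector formula can be checked directly (a sketch with hypothetical heights, not the lecture's 50-student data set):

```python
# Mean as a matrix expression: x_bar = (1'x) / N, using a ones vector.
x = [62, 66, 67, 70, 71]           # hypothetical heights in inches
ones = [1] * len(x)                # the N x 1 ones vector
x_bar = sum(o * xi for o, xi in zip(ones, x)) / len(x)   # dot product / N
print(x_bar)   # 67.2
```

The dot product with the ones vector is just the sum of the elements, so this reproduces the familiar average.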

Variance. The variance is a measure of variability (or spread/range). For a single variable x (put into a column vector): s^2 = (1/(N-1)) * sum_i (x_i - x_bar)^2. Note: if divided by N instead, the maximum likelihood estimate is used; the MLE is biased, so N - 1 is often used. If the original data are normally distributed, then the sampling distribution of the variance is chi-square: (N-1)s^2 / sigma^2 ~ chi^2 with N - 1 degrees of freedom. For our data: variance of height: 21.8 (in inches squared); standard deviation of height: 4.7 (in inches).
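The two denominators can be compared side by side (a sketch on the same hypothetical heights, not the lecture's data):

```python
# Variance with the N - 1 (unbiased) and N (maximum likelihood) denominators.
x = [62, 66, 67, 70, 71]                   # hypothetical heights in inches
N = len(x)
x_bar = sum(x) / N
ss = sum((xi - x_bar) ** 2 for xi in x)    # sum of squared deviations
var_unbiased = ss / (N - 1)
var_mle = ss / N                           # biased: slightly too small
sd = var_unbiased ** 0.5                   # SD is the square root of the variance
print(round(var_unbiased, 2), round(var_mle, 2))   # 12.7 10.16
```

The MLE is always smaller than the unbiased estimate by the factor (N - 1)/N, which matters little once N is large.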

Linear Combinations of a Variable. If we create a new variable y = a + b*x as a linear combination of our previous variable, the mean of the linear combination is a + b * x_bar, and the variance is b^2 * s^2. In our data, to change height from inches to centimeters we would make the linear combination with b = 2.54. Mean of height in cm: 2.54 * 67.28 = 170.9. Variance of height in cm^2: 2.54^2 * 21.8 (approximately 140.4). Standard deviation of height in cm: 2.54 * 4.7 (approximately 11.8).
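The scaling rules for the mean and variance can be verified numerically (a sketch with hypothetical heights; the conversion factor 2.54 is from the slide):

```python
# Linear combination y = b * x: the mean scales by b, the variance by b**2.
b = 2.54                              # inches -> centimeters
x = [62, 66, 67, 70, 71]              # hypothetical heights in inches
y = [b * xi for xi in x]

def mean(v):
    return sum(v) / len(v)

def var(v):                           # unbiased (N - 1) variance
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

print(round(mean(y), 3) == round(b * mean(x), 3))        # True
print(round(var(y), 3) == round(b ** 2 * var(x), 3))     # True
```

Computing the statistics on the transformed data gives the same answer as transforming the statistics, which is the point of the rule.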

UNIVARIATE STATISTICAL DISTRIBUTIONS

Univariate Normal Distribution. For a continuous random variable x (ranging from negative infinity to infinity), the univariate normal distribution function is: f(x) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2)). The shape of the distribution is governed by two parameters: the mean mu and the variance sigma^2. The skewness (lean) and kurtosis (peakedness) are fixed. Standard notation for normal distributions is x ~ N(mu, sigma^2).

Univariate Normal Distribution. For any value of x, f(x) gives the height of the curve (relative frequency).
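The density can be evaluated directly from the formula (a minimal sketch; the function name is illustrative):

```python
import math

# Univariate normal density:
# f(x) = exp(-(x - mu)**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The curve peaks at the mean and is symmetric around it:
print(round(normal_pdf(0.0, 0.0, 1.0), 4))                        # 0.3989
print(normal_pdf(-1.0, 0.0, 1.0) == normal_pdf(1.0, 0.0, 1.0))    # True
```

The fixed skewness and kurtosis mentioned above follow from the functional form: only mu (location) and sigma^2 (spread) can change the curve.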

Chi-Square Distribution. Another frequently used univariate distribution is the chi-square distribution; previously we mentioned that the sampling distribution of the variance follows a chi-square distribution. For a continuous (i.e., quantitative) random variable x (ranging from 0 to infinity), the chi-square density is: f(x) = (1 / (2^(nu/2) * Gamma(nu/2))) * x^(nu/2 - 1) * exp(-x/2), where Gamma(.) is called the gamma function. The chi-square distribution is governed by one parameter: nu (the degrees of freedom). The mean is equal to nu; the variance is equal to 2*nu.
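The stated mean can be checked numerically (a rough sketch using Python's math.gamma; the step size and integration range are arbitrary choices, not from the lecture):

```python
import math

# Chi-square density with nu degrees of freedom (x > 0):
# f(x) = x**(nu/2 - 1) * exp(-x/2) / (2**(nu/2) * Gamma(nu/2))
def chi2_pdf(x, nu):
    return (x ** (nu / 2 - 1) * math.exp(-x / 2)
            / (2 ** (nu / 2) * math.gamma(nu / 2)))

# Check that the mean equals nu by crude numerical integration of x * f(x):
nu, dx = 4, 0.001
approx_mean = sum(x * dx * chi2_pdf(x, nu)
                  for x in (i * dx for i in range(1, 60000)))
print(round(approx_mean, 2))   # close to nu = 4
```

A finer grid or a wider range would tighten the approximation; the point is only that the density integrates to a mean of nu.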

(Univariate) Chi-Square Distribution

Uses of Distributions. Statistical models make distributional assumptions on various parameters and/or parts of data. These assumptions govern how models are estimated, how inferences are made, and how missing data may be imputed. If data do not follow an assumed distribution, inferences may be inaccurate (sometimes a problem, other times not so much). Therefore, it can be helpful to check distributional assumptions prior to (or while) running statistical analyses.

BIVARIATE STATISTICS AND DISTRIBUTIONS

Bivariate Statistics. Up to this point, we have focused on only one of our variables: height. We looked at its marginal distribution (the distribution of height independent of that of weight); we could have looked at weight marginally, too. Structural equation modeling is about exploring joint distributions of multiple variables: how sets of variables relate to each other. As such, we will now look at the joint distributions of two variables, or in matrix form X (size N x 2), beginning with two variables and then moving to any number more than two.

Multiple Means: The Mean Vector. We can use a vector to describe the set of means for our data: x_bar = (1/N) X'1, where 1 is an N x 1 vector of 1s. The resulting mean vector is a p x 1 vector of means. For our data, the mean vector was obtained from Mplus.

Mean Vector: Graphically. The mean vector is the center of the distribution of both variables. [Scatterplot: Weight (in pounds) versus Height (in inches), with the mean vector at the center]

Covariance of a Pair of Variables. The covariance is a measure of the relatedness of two variables, expressed in the product of the units of the two variables. The covariance between height and weight was 155.4 (in inch-pounds). The denominator N gives the ML version; the unbiased version uses N - 1. Because the units of the covariance are difficult to understand, we more commonly describe the association between two variables with the correlation: the covariance divided by the product of each variable's standard deviation.

Covariance in Mplus. Because Mplus defaults to the multivariate normal distribution, the covariance is the type of parameter natively estimated by the program. Additionally, the MODEL RESULTS are maximum likelihood estimates (MLEs; see next week's lecture). This means the denominator is N, not N - 1.

Correlation of a Pair of Variables. Correlation is covariance divided by the product of the standard deviation of each variable: r_xy = s_xy / (s_x * s_y). The correlation between height and weight was 0.72. Correlation is unitless: it only ranges between -1 and 1. If x and y had variances of 1, the covariance between them would be a correlation; the covariance of standardized variables equals the correlation.
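Both formulas can be sketched together (hypothetical heights and weights, not the lecture's 50-student data):

```python
# Covariance and correlation for a pair of variables (N - 1 denominator).
def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    sx = covariance(x, x) ** 0.5      # SD is the sqrt of a variable's variance
    sy = covariance(y, y) ** 0.5
    return covariance(x, y) / (sx * sy)

height = [62, 66, 67, 70, 71]         # hypothetical, in inches
weight = [120, 140, 155, 175, 190]    # hypothetical, in pounds
r = correlation(height, weight)
print(round(r, 3))   # 0.985 -- unitless, between -1 and 1
```

Note that covariance(x, x) is simply the variance of x, which is why the correlation of a variable with itself is exactly 1.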

Covariance and Correlation in Matrices. The covariance matrix (for any number of variables p) is found by: S = (1/(N-1)) * (X - 1 x_bar')' (X - 1 x_bar'). If we take the SDs (the square roots of the diagonal of the covariance matrix) and put them into a diagonal matrix D, the correlation matrix is found by: R = D^-1 S D^-1.

Example Covariance Matrix. For our data, the covariance matrix was:
S =
21.322 155.4
155.4 2,215.007
The diagonal matrix of standard deviations was:
D =
sqrt(21.322) 0
0 sqrt(2,215.007)
The correlation matrix, R = D^-1 S D^-1, was:
1.000 0.715
0.715 1.000
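The R = D^-1 S D^-1 computation reduces, element by element, to dividing each covariance by the product of the two SDs. A pure-Python sketch using the covariance matrix from the slide:

```python
# Correlation matrix from a covariance matrix: R = D^-1 * S * D^-1,
# where D is diagonal with the standard deviations of each variable.
S = [[21.322, 155.4],
     [155.4, 2215.007]]              # sample covariance matrix (height, weight)

sds = [S[0][0] ** 0.5, S[1][1] ** 0.5]          # diagonal of D
R = [[S[i][j] / (sds[i] * sds[j]) for j in range(2)] for i in range(2)]
print([[round(v, 3) for v in row] for row in R])
# [[1.0, 0.715], [0.715, 1.0]]
```

The diagonal of R is always exactly 1, since each variance is divided by the square of its own SD.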

Correlation and Covariance in Mplus. Mplus does not directly estimate the correlation, but you can obtain it in two ways with the OUTPUT section: SAMPSTAT and STANDARDIZED.

Generalized Variance. The determinant of the sample covariance matrix, |S|, is called the generalized variance. It is a measure of spread across all variables, reflecting how much overlap (covariance) in variables occurs; the amount of overlap reduces the generalized sample variance. The generalized sample variance is largest when variables are uncorrelated and zero when variables form a linear dependency. The generalized variance is seldom used descriptively, but shows up more frequently in maximum likelihood functions.

Total Sample Variance. The total sample variance is the sum of the variances of each variable in the sample: the sum of the diagonal elements of the sample covariance matrix, i.e., the trace, tr(S). The total sample variance does not take the covariances among the variables into consideration, so it will not equal zero if a linear dependency exists. The total sample variance is commonly used as the denominator (target) when calculating variance-accounted-for measures.
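The generalized variance (determinant) and total sample variance (trace) behave quite differently, as a quick sketch on the slide's covariance matrix shows:

```python
# Generalized variance = determinant of S; total sample variance = trace of S.
S = [[21.322, 155.4],
     [155.4, 2215.007]]              # sample covariance matrix (height, weight)

det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]   # 2 x 2 determinant
trace_S = S[0][0] + S[1][1]

print(round(det_S, 1))    # 23079.2 -- shrinks as covariance (overlap) grows
print(round(trace_S, 3))  # 2236.329 -- ignores the covariances entirely
```

Setting the off-diagonal covariances to zero would leave the trace unchanged but make the determinant as large as possible, matching the two slides above.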

BIVARIATE NORMAL DISTRIBUTION

Bivariate Normal Distribution. The bivariate normal distribution is a statistical distribution for two variables. Each variable is normally distributed marginally (by itself); together, they form a bivariate normal distribution. The bivariate normal density provides the relative frequency of observing any pair of observations (x1, x2).

Bivariate Normal Plot #1. Density surface (3D); density surface (2D): contour plot.

Bivariate Normal Plot #2. Density surface (3D); density surface (2D): contour plot.

MULTIVARIATE DISTRIBUTIONS (VARIABLES >= 2)

Multivariate Normal Distribution. The multivariate normal distribution is the generalization of the univariate normal distribution to multiple variables; the bivariate normal distribution just shown is part of the MVN. The MVN provides the relative likelihood of observing all p variables for a subject i simultaneously. The multivariate normal density function is: f(x) = (2*pi)^(-p/2) * |Sigma|^(-1/2) * exp(-(1/2) * (x - mu)' Sigma^-1 (x - mu)).

The Multivariate Normal Distribution. The mean vector is mu; the covariance matrix is Sigma. The covariance matrix must be non-singular (invertible).

Multivariate Normal Notation. Standard notation for the multivariate normal distribution of p variables is x ~ N_p(mu, Sigma); our bivariate normal would have been x ~ N_2(mu, Sigma). The multivariate normal distribution serves as the basis for most every statistical technique commonly used in the social and educational sciences: general linear models (ANOVA, regression, MANOVA), general linear mixed models (HLM/multilevel models), factor and structural equation models (EFA, CFA, SEM, path models), and multiple imputation for missing data. Simply put, the world of commonly used statistics revolves around the multivariate normal distribution; understanding it is the key to understanding many statistical methods.
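The density formula can be written out elementwise for the bivariate (p = 2) case, where the inverse and determinant have closed forms (a hypothetical sketch; the mean vector and covariance values are illustrative):

```python
import math

# Bivariate (p = 2) multivariate normal density:
# f(x) = (2*pi)**(-p/2) * det(Sigma)**(-1/2) * exp(-0.5 * (x-mu)' Sigma^-1 (x-mu))
def mvn2_pdf(x, mu, Sigma):
    (a, b), (c, d) = Sigma
    det = a * d - b * c                       # must be > 0 (non-singular)
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# The density is highest at the mean vector:
mu = [67.28, 170.0]                                 # illustrative means
Sigma = [[21.322, 155.4], [155.4, 2215.007]]
print(mvn2_pdf(mu, mu, Sigma) > mvn2_pdf([70.0, 150.0], mu, Sigma))   # True
```

The requirement that Sigma be non-singular is visible in the code: a zero determinant would make both the inverse and the normalizing constant undefined.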

Bivariate Normal Plot #2 (Multivariate Normal). Density surface (3D); density surface (2D): contour plot.

Multivariate Normal Properties. The multivariate normal distribution has some useful properties that show up in statistical methods. If x is distributed multivariate normally: 1. Linear combinations of x are normally distributed. 2. All subsets of x are multivariate normally distributed. 3. A zero covariance between a pair of variables of x implies that the variables are independent. 4. Conditional distributions of x are multivariate normal.

Sampling Distributions of MVN Statistics. Just as in univariate statistics, there is a multivariate central limit theorem: whether or not the set of observations on the variables is multivariate normal, the distribution of the mean vector is x_bar ~ N_p(mu, Sigma/N). The distribution of S (the covariance matrix) is Wishart (a multivariate chi-square) with N - 1 degrees of freedom.

The Wishart Distribution. The Wishart distribution is a multivariate chi-square distribution. Input: S (model-predicted covariance matrix). Output: likelihood value. Fixed: the sample value of the covariance matrix. In statistics, it appears whenever data are assumed multivariate normal and only covariance-matrix-based inferences are needed (the mean vector is ignored), mainly in initial ML factor analysis and structural equation modeling.

Sufficient Statistics. The sample estimates x_bar and S are called sufficient statistics: all of the information contained in the data can be summarized by these two statistics alone. No raw data are needed for analysis, only these statistics. This is only true when the data truly follow a multivariate normal distribution and no missing data are present.

ASSESSING UNIVARIATE NORMALITY IN MPLUS

Assessing Multivariate Normality of Data. Up until the last week of class, we will assume our data follow a multivariate normal distribution, so the data should be multivariate normal. Methods exist for getting a rough picture of whether or not data are multivariate normal; I will highlight a graphical method using the Q-Q plot, which compares the actual data with what a normal distribution would predict. Mplus will provide Q-Q plots for each variable: the first step in assessing multivariate normality. If one or more variables are not normal, then the whole set cannot be multivariate normal. In reality, determining the exact distribution of data is difficult if not impossible, and methods that assume multivariate normality are fairly robust to small deviations from that assumption.

Q-Q Plots and Normal-Distribution-Overlaid Histograms in Mplus. If you specify the PLOT section, you can go to Graphs > View Graphs and plot your variables. Q-Q plot: points should fall onto the line. Histogram: the distribution should be symmetric and unimodal.

When and Why to Check Normality. Good news: most methods are fairly robust to violations of normality. For homework we will have to check this assumption for each variable. If normality does not hold: check your data for input errors (e.g., 99 = missing); plot every variable, since visualization is the most important thing; do not clean your data; use a model with a different link function if possible (a generalized model).

Overall Practice of Data Inspection. Overall, checking your data in Mplus is a good thing. It helps make sure variables are input properly and that missing codes are properly read (you would not believe how often this happens in practice). Compare easily obtainable statistics: sample means (the mean vector), covariances, and variances. If values do not match those of another program, check your data for issues in importing/exporting.

WRAPPING UP

Concluding Remarks. Matrix algebra plays a role in concisely depicting structural equation models; many SEM texts and articles are described using matrices. The univariate and multivariate normal distributions serve as the backbone of the most frequently used statistical procedures. They are very robust and very useful, and understanding their form and function will help you learn about a whole host of statistical routines immediately. This course is essentially a course based in normal distributions.