Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D.

Major Characteristics of Multivariate Procedures
Overview of Multivariate Techniques
Bivariate Regression and Correlation
General Rationale and Applications
Different Interpretations of Correlation
Three Limited Truths

Issues in Multivariate Statistics
Distinction between Univariate, Bivariate, and Multivariate statistics.
Types of Variables: Predictor/criterion; Stimulus/response; Input/output; Independent/dependent, etc.
Limitation: Generalization limited to variables in the analysis.
Types of Inferences: Many concerned with individual differences; sometimes concerned with group differences.
Basic Characteristic: Aggregation of variables.

Overview of Procedures
The techniques can be classified in various ways. For example:
I. Relationships between a single dependent variable and a set of other variables.
II. Derivatives of Analysis of Variance (one dependent variable).
III. Relations involving sets of Dependent and Independent variables.
IV. Structure of relationships among variables.
V. Analysis of Frequency Data.
VI. Analysis over serial time.

Category I
I. Multiple Regression and Multiple Correlation (T&F-5)
Purpose: To determine a weighted aggregate to yield the highest correlation with a criterion (least squares).
Basic equation: $Y'_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + \dots$, where $Y'_i$ is the predicted criterion score.
Tests of significance: $R^2$, $b_0$, $b_1$, etc.
Applications:
Test selection
Analysis of variance and MRC analysis
Path analysis
Mediation and moderation
Curve fitting
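The least-squares weighting described for multiple regression can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular package: the function names `solve_linear_system` and `fit_multiple_regression` and the data are hypothetical, and the weights are obtained by solving the normal equations $(X'X)b = X'Y$.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(predictors, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 carries b0
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]
coefficients = fit_multiple_regression(predictors, y)
print([round(b, 6) for b in coefficients])
```

Because the hypothetical criterion is an exact linear function of the predictors, the recovered weights should match the generating values up to rounding error.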

I. Discriminant Function Analysis (T&F-9)
Purpose: To determine weighted aggregates to predict group membership.
Basic equation: $L_{ji} = w_1 X_{1i} + w_2 X_{2i} + \dots$
Tests of Significance: F-ratio (Between/Within)
Applications:
Empirical: Predict group membership in a new sample
Theoretical: Hypothesis underlying the process determining group membership

I. Logistic Regression (T&F-10)
Purpose: To determine a weighted aggregate to maximize the probability of predicting membership in a dichotomous outcome variable (maximum likelihood).
Basic equation:
$$Y'_i = \frac{e^{b_0 + b_1 X_{i1} + b_2 X_{i2} + \dots}}{1 + e^{b_0 + b_1 X_{i1} + b_2 X_{i2} + \dots}} = \text{Prob}(\text{group } 1)$$
Tests of significance: $\chi^2$ goodness of fit (log-likelihood ratio), model comparison, $b_0$, $b_1$, etc. (i.e., odds ratios)
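The logistic equation can be evaluated directly once the weights are known. The sketch below only evaluates the equation; it does not estimate the weights by maximum likelihood. The function name `predicted_probability` and the coefficient values are hypothetical.

```python
import math


def predicted_probability(intercept, weights, scores):
    """Prob(group 1) = e^u / (1 + e^u), with u = b0 + b1*X1 + b2*X2 + ..."""
    u = intercept + sum(w * x for w, x in zip(weights, scores))
    if u >= 0:
        # Algebraically identical form that avoids overflow for large u.
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def odds(p):
    return p / (1.0 - p)


# Hypothetical weights: b0 = -1.0, b1 = 0.8, b2 = -0.5
b0, weights = -1.0, [0.8, -0.5]
p_low = predicted_probability(b0, weights, [1.0, 2.0])
p_high = predicted_probability(b0, weights, [2.0, 2.0])  # X1 raised by one unit

# The ratio of the odds equals e^b1, which is why b1 is read as an odds ratio.
odds_ratio = odds(p_high) / odds(p_low)
print(round(odds_ratio, 4), round(math.exp(weights[0]), 4))
```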

Category II
II. Analysis of Covariance (T&F-6)
Purpose: To perform an analysis of variance having first removed variance associated with extraneous variable(s) (covariate(s)).
Basic Equation: Y ij Y ij ( Y GY + b. ( c Gc ij c ij
Tests of Significance: F-ratios of the effects of interest with the effects of the covariate removed, post hoc tests of the adjusted means.

II. Profile Analysis (T&F-8)
Purpose: To compare profiles for groups of individuals on a series of similarly scaled dependent variables.
Basic Equation: This is simply a split plot analysis of variance where the tests of Within subject effects (i.e., Flatness and Parallelism) are performed using the multivariate approach, while Elevation (Between groups) is based on the univariate approach.
Tests of significance: Hotelling's $T^2$, and F-ratios.

II. Hierarchical Linear Modeling (T&F-15)
Purpose: To determine regression coefficients linking a dependent variable with continuous independent variables of observations belonging to groups (for persons within groups) or individuals (for measures within persons). Note the similarity with analysis of variance and MRC analysis, though this method employs maximum likelihood.
Basic equation (for Persons within Groups):
$$Y_{ij} = \gamma_{00} + \gamma_{10}(X_{ij} - \bar{X}_j) + \mu_{0j} + \mu_{1j}(X_{ij} - \bar{X}_j) + r_{ij}$$
Tests of Significance: Mean Intercept and Mean Slope over groups, variance of intercepts and variance of slopes across groups, covariance of intercepts and slopes with group level attribute(s).

Category III
III. Multivariate Analysis of Variance (T&F-7)
Purpose: To determine weighted aggregates that produce maximum variance of main effects and, where relevant, interaction effects.
Basic equation: $L_{ji} = w_1 X_{1i} + w_2 X_{2i} + \dots$
Tests of Significance: Multivariate tests of significance (Pillai, Hotelling, Wilks, Roy), discriminant functions, F-ratios of univariate effects, post hoc tests of means.

III. Canonical Correlation (T&F-12)
Purpose: To determine weights for a set of X variables and a set of Y variables so that the correlation between the two aggregates is as large as possible. There will be $k_X$ or $k_Y$ such correlations (in decreasing magnitude), whichever is less.
Basic Equations:
$$C_{XIi} = w_1 X_{1i} + w_2 X_{2i} + \dots$$
$$C_{YIi} = w_1 Y_{1i} + w_2 Y_{2i} + \dots$$
Tests of significance: The collection of all $R^2$, followed by sequential tests of the remaining $R^2$ values once the largest ones are removed in order.

Category IV
IV. Principal Components and Factor Analysis (T&F-13)
Purpose: To determine the number and nature of dimensions accounting for covariation (generally correlations) among a set of variables. The nature of the dimensions (factors) is then interpreted by considering the correlations (factor loadings) of each of the variables with the factors. Often the factors are rotated to facilitate interpretation.
Basic Equation (score of individual i on dimension I): $w_{I1} X_{1i} + w_{I2} X_{2i} + \dots$
Tests of Significance: None for the most commonly used procedures (i.e., principal components, principal axis, etc.), but tests of significance for the factor loadings for procedures such as maximum likelihood.

IV. Multidimensional Scaling
Purpose: To determine the number and nature of dimensions underlying similarity/dissimilarity judgments or Euclidean distances between pairs of elements. The rationale is comparable to that of factor analysis. Dimensions are interpreted in terms of what is common to elements with large projections on a dimension and distinguishes them from elements with small projections.

IV. Cluster Analysis
Purpose: To determine discrete groupings of variables in terms of indices of similarity or Euclidean distance. The rationale involves arranging variables in terms of their decreasing similarity to other variables and forming groupings of variables that can be considered to form a set. There are a number of variations on how this is done.

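IV. Structural Equation Modeling (T&F-14)
Purpose: To test a regression model involving latent variables or to evaluate a confirmatory factor analysis of a set of indicator variables.
Basic Equations:
Causal Modeling
$$\Sigma = \begin{bmatrix} \Lambda_Y A(\Gamma\Phi\Gamma' + \Psi)A'\Lambda_Y' + \theta_\varepsilon & \Lambda_Y A\Gamma\Phi\Lambda_X' \\ \Lambda_X\Phi\Gamma'A'\Lambda_Y' & \Lambda_X\Phi\Lambda_X' + \theta_\delta \end{bmatrix}$$
Confirmatory Factor Analysis
$$\Sigma = \Lambda_X\Phi\Lambda_X' + \theta_\delta$$
Tests of Significance: regression coefficients, correlations, variances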

Category V
V. Multiway Frequency Analysis (T&F-16)
Purpose: To investigate associations between a number of categorical variables (A, B, C) and their interactions (AB, AC, BC, ABC), to identify which effects account for the observed frequencies, and to evaluate the goodness of fit of a reduced model. The rationale is not unlike analysis of variance in assessing which of the effects account most for the data, except that it involves assessments of the goodness of fit of a linear model of the logarithm of the expected cell frequencies (hence loglinear analysis).

Category VI
VI. Time Series Analysis (T&F-18, only online)
Purpose: To ascertain whether there are patterns in responses over a number of time periods (>50) for data obtained from a single subject or an aggregate of a number of subjects, or to test whether some intervention interferes with the pattern. The analysis can also be used to predict future patterns based on the existing patterns. It generally consists of three steps: identification, estimation, and diagnosis.

VI. Survival/Failure Analysis (T&F-11)
Purpose: To study the time it takes for an event to happen (e.g., death, malfunction of a piece of equipment, etc.). The dependent variable is the proportion of cases at any given time in the interval; the independent variable is time.

Common Themes
There are a number of themes that are common to most of these procedures:
1. They involve relationships among variables, often considered initially two at a time. As a consequence, a foundation of multivariate procedures is bivariate regression and correlation.
2. They involve weighted aggregation of variables.
3. Generalizations are applicable to the population of subjects for which the sample can be considered representative but are restricted only to the collection of variables included in the analysis.
4. The purposes of the procedures are each very simple, but the mathematics can be a bit complex.

Bivariate Regression and Correlation
Bivariate regression refers to an equation that relates a dependent variable to an independent variable, or a criterion to a predictor. The fundamental equation is:
$$Y' = a + bX$$
with $a$ and $b$ determined such that $\sum (Y - Y')^2$ is a minimum:
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \quad \text{and} \quad a = \bar{Y} - b\bar{X}$$
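The least-squares solution for the slope and intercept translates directly into code. In this minimal sketch the function name `fit_line` and the data are hypothetical; the final comparison illustrates that moving the slope away from the least-squares value increases the sum of squared residuals.

```python
def fit_line(x, y):
    """Return (a, b) for Y' = a + bX using the least-squares formulas."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(x, y, a, b):
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = fit_line(x, y)

best = sum_squared_residuals(x, y, a, b)
worse = sum_squared_residuals(x, y, a, b + 0.2)  # any other slope fits worse
print(a, b, best, worse)
```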

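Bivariate correlation refers to covariation between two variables, X and Y. The most common measure is the Pearson product-moment correlation coefficient, defined as:
$$r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_{bX} S_{bY}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1) S_{uX} S_{uY}}$$
using biased ($S_b$) and unbiased ($S_u$) estimates of the standard deviations, respectively.
Or alternatives:
$$r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
$$r_{XY} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left(N\sum X^2 - (\sum X)^2\right)\left(N\sum Y^2 - (\sum Y)^2\right)}}$$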
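The equivalence of these definitions is easy to check numerically. The sketch below uses hypothetical scores and the standard library's `statistics.pstdev` (biased, divisor n) and `statistics.stdev` (unbiased, divisor n - 1).

```python
import math
import statistics

# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Definition with biased standard deviations (divisor n).
r_biased = cross / (n * statistics.pstdev(x) * statistics.pstdev(y))

# Definition with unbiased standard deviations (divisor n - 1).
r_unbiased = cross / ((n - 1) * statistics.stdev(x) * statistics.stdev(y))

# Raw-score computational formula.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
r_raw = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(r_biased, r_unbiased, r_raw)
```

All three values should agree apart from floating-point rounding, because the choice of divisor cancels between the numerator and the standard deviations.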

Given $Y = Y' + (Y - Y')$, we can compute:
$$\sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2$$
$$SS_{TOTAL} = SS_{REGRESSION} + SS_{RESIDUAL}$$
And with some algebra, we can construct the following summary table:

Source        df       Sums of Squares
Regression    1        r² SS_TOTAL
Residual      n - 2    (1 - r²) SS_TOTAL
Total         n - 1

$$F = \frac{r^2\, SS_{TOTAL}}{(1 - r^2)\, SS_{TOTAL} / (n - 2)} = \frac{r^2 (n - 2)}{1 - r^2}$$
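The summary table can be verified numerically by computing the sums of squares directly and comparing the resulting F-ratio with the shortcut based on r. This is a sketch with hypothetical data; the variable names are illustrative only.

```python
# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_total = sum((yi - mean_y) ** 2 for yi in y)

b = sum_xy / ss_x
a = mean_y - b * mean_x
predicted = [a + b * xi for xi in x]

ss_regression = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))

# F from the sums of squares: (SS_reg / 1) / (SS_res / (n - 2))
f_from_ss = ss_regression / (ss_residual / (n - 2))

# F from the correlation alone: r^2 (n - 2) / (1 - r^2)
r_squared = sum_xy ** 2 / (ss_x * ss_total)
f_from_r = r_squared * (n - 2) / (1 - r_squared)

print(ss_total, ss_regression, ss_residual, f_from_ss, f_from_r)
```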

Consider the sample data set (z_X and z_Y are the standard scores for X and Y):

           X       Y      z_X      z_Y
           3       3    -1.50    -1.50
           4       5     -.75     -.50
           4       5     -.75     -.50
           4       3     -.75    -1.50
           5       7      .00      .50
           5       6      .00      .00
           5       7      .00      .50
           6       9      .75     1.50
           7       7     1.50      .50
           7       8     1.50     1.00
Mean     5.0     6.0      .00      .00
S_u     1.33    2.00     1.00     1.00

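Computing Regression Coefficients and Correlation
Regression of Y on X:
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{20}{16} = 1.25 \qquad a = \bar{Y} - b\bar{X} = 6.0 - (1.25)(5.0) = -.25$$
Regression of X on Y:
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{20}{36} = .56 \qquad a = \bar{X} - b\bar{Y} = 5.0 - (.56)(6.0) = 1.64$$
Correlation:
$$r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1) S_{uX} S_{uY}} = \frac{20}{9(1.33)(2.00)} = .84$$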
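The hand calculations for the sample data set can be reproduced as follows. This is a sketch using the ten (X, Y) pairs from the sample data set; the variable names are illustrative. Without intermediate rounding, the intercept for the regression of X on Y comes out near 1.67 and r near .83; the values 1.64 and .84 reported on the slide arise when the rounded figures .56 and 1.33 are carried through the arithmetic.

```python
# Sample data from the slides (n = 10).
x = [3, 4, 4, 4, 5, 5, 5, 6, 7, 7]
y = [3, 5, 5, 3, 7, 6, 7, 9, 7, 8]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

# Regression of Y on X.
b_yx = sum_xy / ss_x
a_yx = mean_y - b_yx * mean_x

# Regression of X on Y.
b_xy = sum_xy / ss_y
a_xy = mean_x - b_xy * mean_y

# Pearson correlation.
r = sum_xy / (ss_x * ss_y) ** 0.5

print(sum_xy, ss_x, ss_y)
print(round(b_yx, 2), round(a_yx, 2), round(b_xy, 2), round(a_xy, 2), round(r, 2))
```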

[Figure: The two regression plots in raw score form. Left panel: Regression of X on Y (a = 1.64, b = .56). Right panel: Regression of Y on X (a = -.25, b = 1.25).]
In each case:
a = the intercept, the value of the dependent variable when the independent variable is 0.
b = the slope of the regression line (the rise over the run).

[Figure: The two regression plots in standard score form. Panels: Regression of X on Y (r = .84) and Regression of Y on X (r = .84).]
Note. The slope of the regression line in standard score form is the same in each case and is equal to the correlation coefficient (.84). The intercept is 0.

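Different Interpretations of Correlation
1. Correlation is a measure of the linear relation between X and Y:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
$$r = \sqrt{b_{YX}\, b_{XY}} = \sqrt{\frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \cdot \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}}$$
where $Y' = a_{YX} + b_{YX} X$ and $X' = a_{XY} + b_{XY} Y$, and $r$ takes the sign of the slopes.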
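This first interpretation, that r is the geometric mean of the two regression slopes, can be checked on the sample data set. The helper name `slope` is hypothetical.

```python
import math


def slope(predictor, criterion):
    """Least-squares slope for predicting `criterion` from `predictor`."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_c = sum(criterion) / n
    cross = sum((p - mean_p) * (c - mean_c) for p, c in zip(predictor, criterion))
    return cross / sum((p - mean_p) ** 2 for p in predictor)


# Sample data from the slides.
x = [3, 4, 4, 4, 5, 5, 5, 6, 7, 7]
y = [3, 5, 5, 3, 7, 6, 7, 9, 7, 8]

b_yx = slope(x, y)  # regression of Y on X
b_xy = slope(y, x)  # regression of X on Y

# The product of the slopes is r squared; r carries the sign of the slopes.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_xy, r)
```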

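2. Correlation is a measure of the slope of the regression line in standard score form. Writing $z_X$ and $z_Y$ for the standard scores and $r$ for the slope:
$$z'_Y = r z_X$$
Best fit line: $\sum (z_Y - r z_X)^2$ = minimum.
$$\sum (z_Y - r z_X)^2 = \sum z_Y^2 - 2r \sum z_X z_Y + r^2 \sum z_X^2$$
$$\frac{d}{dr}\sum (z_Y - r z_X)^2 = 0 - 2\sum z_X z_Y + 2r \sum z_X^2 = 0$$
$$r = \frac{\sum z_X z_Y}{\sum z_X^2} = \frac{\sum z_X z_Y}{n}$$
Note: $S_{z_X} = 1$, so $\sum z_X^2 = n$.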
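The claim that the best-fitting slope in standard score form equals r can be illustrated with the sample data set. The sketch assumes standardization with the biased standard deviation (`statistics.pstdev`), so that the squared standard scores sum to n; the function names are hypothetical.

```python
import statistics

# Sample data from the slides.
x = [3, 4, 4, 4, 5, 5, 5, 6, 7, 7]
y = [3, 5, 5, 3, 7, 6, 7, 9, 7, 8]
n = len(x)


def standardize(values):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # biased SD, so the z-scores' squares sum to n
    return [(v - mean) / sd for v in values]


z_x = standardize(x)
z_y = standardize(y)

sum_z_squared = sum(z * z for z in z_x)
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / n


def sum_squared_error(candidate_slope):
    """Sum of squared errors when predicting z_Y as candidate_slope * z_X."""
    return sum((zy - candidate_slope * zx) ** 2 for zx, zy in zip(z_x, z_y))


# The error is smallest at slope = r and larger on either side of it.
print(r, sum_squared_error(r - 0.1), sum_squared_error(r), sum_squared_error(r + 0.1))
```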

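3. Correlation is a measure of the accuracy of predicting Y given X:
Given $Y = Y' + (Y - Y')$,
$$S_Y^2 = S_{Y'}^2 + S_{Y - Y'}^2$$
where $Y'$ and $(Y - Y')$ are independent. Dividing through by $S_Y^2$:
$$\frac{S_Y^2}{S_Y^2} = \frac{S_{Y'}^2}{S_Y^2} + \frac{S_{Y - Y'}^2}{S_Y^2}$$
$$r^2 = \frac{S_{Y'}^2}{S_Y^2} \quad \text{and} \quad 1 - r^2 = \frac{S_{Y - Y'}^2}{S_Y^2}$$
$$r = \pm\sqrt{1 - \frac{S_{Y - Y'}^2}{S_Y^2}}$$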
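The variance partition behind this third interpretation can be confirmed on the sample data set. In this sketch the variances use divisor n (`statistics.pvariance`); any consistent divisor gives the same ratios.

```python
import statistics

# Sample data from the slides.
x = [3, 4, 4, 4, 5, 5, 5, 6, 7, 7]
y = [3, 5, 5, 3, 7, 6, 7, 9, 7, 8]
n = len(x)

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

b = sum_xy / ss_x
a = mean_y - b * mean_x
predicted = [a + b * xi for xi in x]
residual = [yi - p for yi, p in zip(y, predicted)]

var_y = statistics.pvariance(y)
var_predicted = statistics.pvariance(predicted)
var_residual = statistics.pvariance(residual)

# Predicted scores and residuals are uncorrelated, so their variances add.
mean_pred = statistics.mean(predicted)
mean_res = statistics.mean(residual)
cov_pred_res = sum(
    (p - mean_pred) * (e - mean_res) for p, e in zip(predicted, residual)
) / n

r_squared = sum_xy ** 2 / (ss_x * ss_y)
print(var_y, var_predicted, var_residual, var_predicted / var_y, r_squared)
```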

Three Limited Truths
1. The Pearson product-moment correlation varies from -1 to +1.
True, only under very specific circumstances.
Proof: Writing $z_X$ and $z_Y$ for the standard scores, given $S = 1$ and $r = \frac{\sum z_X z_Y}{N}$, $r$ can equal +1 only if $z_X = z_Y$, and -1 only if $z_X = -z_Y$.
Thus, for this to be true, the standardized distributions of X and Y must be:
a) Identical
b) Symmetrical (not necessarily normal)
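A small demonstration of this first limited truth: when the two distributions differ in shape, no pairing of the scores can reach +1 or -1. The scores below are hypothetical (X symmetric, Y strongly skewed), and the sketch simply tries every possible pairing.

```python
import math
from itertools import permutations


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]   # hypothetical, symmetric distribution
y = [1, 1, 1, 1, 10]  # hypothetical, strongly skewed distribution

# Every way of pairing the Y scores with the X scores.
correlations = [pearson_r(x, list(order)) for order in permutations(y)]
max_r = max(correlations)
min_r = min(correlations)
print(round(max_r, 4), round(min_r, 4))
```

For these scores the strongest attainable correlations are about .71 and -.71, well short of the nominal limits.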

2. Given a large enough sample size, the correlation will always be significant.
True, only because of artifacts.
Proof: Given
$$X = T_X + E_{XR} + E_{XM}$$
$$Y = T_Y + E_{YR} + E_{YM}$$
(i.e., the measures of X and Y consist of true scores ($T_X$ and $T_Y$), random error ($E_{XR}$ and $E_{YR}$), and measurement error ($E_{XM}$ and $E_{YM}$)).
Given $\rho_{T_X T_Y} = 0$, it is possible that $\rho_{XY} \neq 0$ because the correlations $\rho_{T_X E_{YM}}$, $\rho_{T_Y E_{XM}}$, and $\rho_{E_{XM} E_{YM}}$ are not 0.
Thus, even with two variables that are truly independent, the correlation between measures of those variables may not be 0, and given a large enough sample size it may be significant.
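A small simulation can illustrate this second limited truth. It is a sketch under stated assumptions: all components are standard normal, the true scores are generated independently, and the measurement errors of X and Y share a common component (a hypothetical stand-in for something like shared method variance), which is only one of the three artifact correlations named in the proof. The seed and sample size are arbitrary.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


random.seed(15)  # arbitrary seed, for repeatability only
n = 5000

true_x = [random.gauss(0, 1) for _ in range(n)]
true_y = [random.gauss(0, 1) for _ in range(n)]  # generated independently of true_x
shared_error = [random.gauss(0, 1) for _ in range(n)]  # assumed common measurement error

# Observed score = true score + random error + measurement error.
x = [t + random.gauss(0, 1) + m for t, m in zip(true_x, shared_error)]
y = [t + random.gauss(0, 1) + m for t, m in zip(true_y, shared_error)]

r_true = pearson_r(true_x, true_y)
r_observed = pearson_r(x, y)

# t-ratio for testing the observed correlation against zero (df = n - 2).
t_observed = r_observed * math.sqrt(n - 2) / math.sqrt(1 - r_observed ** 2)
print(round(r_true, 3), round(r_observed, 3), round(t_observed, 1))
```

Under these assumptions the population correlation between the observed measures is 1/3 even though the true scores are uncorrelated, so with a sample this large the observed correlation is significant.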

3. Correlation does not mean causation.
This is not a limitation of the statistic, but rather the nature of the underlying design. Consider an experiment on the effects of the amount of alcohol consumed in the afternoon and number of hours slept that night. This study could be run in controlled conditions with careful attention to detail, etc. The correlation between the two could be considered an index of the linear effects of alcohol on hours slept (and an indication of causality) if the amount consumed was randomly determined and administered by the experimenter. The correlation between the two would simply be an index of the covariation between the two if the amount consumed was not determined randomly. The regression equation would describe the nature of the linear relationship.

References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (Second Edition). Hillsdale, NJ: Lawrence Erlbaum.
Gardner, R. C. (2000). Correlation, causation, motivation and second language acquisition. Canadian Psychology, 41, 10-24.
Tabachnick, B. G. & Fidell, L. S. (2007). Using Multivariate Statistics (Fifth Edition). Needham Heights, MA: Allyn & Bacon.