Jerome Kaltenhauser and Yuk Lee


Correlation Coefficients for Binary Data in Factor Analysis

Jerome Kaltenhauser is systems analyst, Computing Center, and Yuk Lee is associate professor of geography, University of Colorado.

The most commonly used factor analytic models employ a symmetric matrix of correlation coefficients as input. The Pearson product moment coefficient is appropriate when the variables are continuous. For binary (two-valued) data there is no single agreed-upon coefficient [7]. Three coefficients (phi, phi/phimax, and tetrachoric) are frequently discussed in the literature and are the focus of the present investigation. It is assumed that the coefficient is to be used in factor analysis, and specifically in exploratory factor analysis [3]. The characteristics of these coefficients have been discussed by others [2, 4, 8] and will be summarized briefly here.

It is convenient to discuss the coefficients in terms of a two-way table relating the occurrences of the values of the binary variables x and y to be correlated:

    x \ y       0         1
      0         c         d        c + d
      1         b         a        a + b
              b + c     a + d       1.0

Here a, b, c, and d are the joint frequencies of the combinations of values of x and y, while c + d, a + b, b + c, and a + d are the marginal frequencies, or proportions, of the two variables.

Despite the special name and the numerous expressions available for it, phi is simply the Pearson product moment formula applied to binary data. That is,

$$\phi = r = \mathbf{x}^{t}\mathbf{y}, \qquad (1)$$

where x is the vector of standardized values of variable x, scaled to unit length so that x^t x = 1 (x^t is its transpose), and similarly for y; r is the product moment correlation coefficient. A defect of phi is that it can achieve the full range -1.0 to 1.0 only under rare circumstances, when

$$\frac{a+b}{c+d} = \frac{a+d}{b+c}. \qquad (2)$$
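Equation (1) can be checked numerically: phi computed from the four cells of the two-way table should equal the Pearson coefficient of the underlying 0/1 vectors. The sketch below assumes the cell layout read from the table (a: x = 1, y = 1; b: x = 1, y = 0; c: x = 0, y = 0; d: x = 0, y = 1); the counts and the function names `pearson`, `phi_from_table`, and `expand` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Product moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def phi_from_table(a, b, c, d):
    """Phi from the cells of the 2 x 2 table (counts or proportions).

    Assumed layout: a = (x=1, y=1), b = (x=1, y=0),
                    c = (x=0, y=0), d = (x=0, y=1).
    """
    num = a * c - b * d
    den = math.sqrt((a + b) * (c + d) * (b + c) * (a + d))
    return num / den

def expand(a, b, c, d):
    """Turn integer cell counts back into paired 0/1 observations."""
    xs = [1] * a + [1] * b + [0] * c + [0] * d
    ys = [1] * a + [0] * b + [0] * c + [1] * d
    return xs, ys

# Hypothetical counts: 30 + 10 + 45 + 15 = 100 observations.
a, b, c, d = 30, 10, 45, 15
xs, ys = expand(a, b, c, d)
print(phi_from_table(a, b, c, d), pearson(xs, ys))

# Phi reaches 1.0 only when the disagreement cells b and d are empty,
# and that forces the two marginal ratios of equation (2) to be equal.
print(phi_from_table(40, 0, 60, 0), (40 + 0) / (60 + 0) == (40 + 0) / (0 + 60))
```

Both printed values on the first line agree to rounding error, which is the sense in which phi needs no machinery beyond the ordinary product moment formula.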

306 / Geographical Analysis

Thus, phi will usually suffer some restriction in range. The simplest way to bring phi up to full range is to normalize it, and this produces phi/phimax. Phi/phimax is obtained by dividing phi by the maximum value it could assume consistent with the set of marginals from its two-way table:

$$\phi_{\max} = \sqrt{\frac{p_s\,q_l}{p_l\,q_s}}, \qquad (3)$$

where $p_l$ is the largest marginal proportion in the table, $p_s$ is the second largest, and p + q = 1.0 [2].

The tetrachoric coefficient was derived on the assumption that the observed frequencies in the two-way table express an underlying bivariate normal distribution. If a bivariate normal distribution is divided into quadrants by partitions parallel to the x- and y-axes, then the tetrachoric is a maximum likelihood estimate, using only the quadrant frequencies, of the product moment coefficient that would be calculated if the full bivariate distribution were available [4, 6]. Because calculation of the tetrachoric r involves evaluation of an infinite series, approximations must be resorted to.

Of the three coefficients, only phi suffers from restriction of range. This is important when correlations are interpreted directly as a measure of the relationship between two variables. It is not necessarily important in factor analysis, where correlations are not directly interpreted. Phi has been used successfully in factor analysis, and an example of its performance relative to phi/phimax and the tetrachoric will be presented. Later we will see that phi has characteristics that would make it seem an even less likely candidate for any kind of analysis.

Coefficient Performance in Factor Analysis

A preliminary evaluation of the three coefficients was made in a principal components analysis involving social, economic, and demographic data for Colorado counties. A matrix of 62 county observations on 18 variables was prepared and submitted to a principal components analysis using the SPSS package [5].
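The two alternatives to phi defined earlier can be sketched in the same terms. The block below computes phi/phimax from equation (3) and a tetrachoric estimate found by solving for the bivariate normal correlation whose upper quadrant probability reproduces the observed proportion a, with the thresholds h and k taken from the marginals. That root-finding approach (Simpson integration plus bisection) is a swapped-in technique chosen so the example is self-contained; it is not Kirk's approximation [4], which the authors used. The cell layout (a: x = 1, y = 1; b: x = 1, y = 0; c: x = 0, y = 0; d: x = 0, y = 1) and all function names are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, standard deviation 1

def phi_from_table(a, b, c, d):
    """Phi; assumed layout a=(1,1), b=(1,0), c=(0,0), d=(0,1) for (x, y)."""
    return (a * c - b * d) / math.sqrt((a + b) * (c + d) * (b + c) * (a + d))

def phimax(a, b, c, d):
    """Equation (3): p_l is the largest marginal proportion, p_s the second largest."""
    n = a + b + c + d
    margins = sorted([(a + b) / n, (c + d) / n, (b + c) / n, (a + d) / n], reverse=True)
    p_l, p_s = margins[0], margins[1]
    return math.sqrt(p_s * (1 - p_l) / (p_l * (1 - p_s)))

def phi_over_phimax(a, b, c, d):
    return phi_from_table(a, b, c, d) / phimax(a, b, c, d)

def upper_quadrant(h, k, rho, steps=2000):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho.

    Integrates pdf(x) * P(Y > k | X = x) over x with Simpson's rule;
    given X = x, Y is normal with mean rho*x and variance 1 - rho^2.
    """
    s = math.sqrt(1 - rho * rho)
    lo = max(h, -10.0)
    hi = max(lo, 0.0) + 10.0
    w = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * w
        f = STD.pdf(x) * (1 - STD.cdf((k - rho * x) / s))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * w / 3

def tetrachoric(a, b, c, d, tol=1e-7):
    """Correlation of the latent bivariate normal that reproduces the table."""
    n = a + b + c + d
    h = STD.inv_cdf((c + d) / n)   # x = 1 when latent X > h
    k = STD.inv_cdf((b + c) / n)   # y = 1 when latent Y > k
    target = a / n                 # observed proportion with x = 1, y = 1
    lo, hi = -0.9999, 0.9999
    while hi - lo > tol:           # the quadrant probability rises with rho
        mid = (lo + hi) / 2
        if upper_quadrant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Median splits with P(x = 1, y = 1) = 1/3 correspond to rho = 0.5,
# because P(X > 0, Y > 0) = 1/4 + asin(rho) / (2 * pi).
print(round(tetrachoric(2, 1, 2, 1), 3))
print(round(phi_from_table(2, 1, 2, 1), 3), round(phi_over_phimax(2, 1, 2, 1), 3))
```

At median splits phimax is 1, so phi and phi/phimax coincide (both 1/3 for this table) while the tetrachoric recovers the latent value 0.5. Equation (3) as written gives the largest attainable magnitude of phi for the given marginals, so |phi/phimax| never exceeds 1.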
The variables, which are continuous, were transformed to near normality to approximate the requirement of the tetrachoric coefficient (an underlying bivariate normal distribution). For all but three of the variables the transformations were successful (50 percent confidence level using a chi-square test), and the exceptions do not appear to have seriously disturbed the results. With the variables still in continuous form, this procedure produced a principal components analysis of a system of variables; the rotated factor loadings were used subsequently as a criterion against which to measure the loadings produced when the variables were converted to binary form.

To convert variables to binary form, a cutoff point b must be chosen for each, with values below b set to 0 and those above set to 1. Any two such binarized variables may then be related through a two-way table, and phi, phi/phimax, and tetrachoric correlation coefficients computed.

Research Notes and Comments / 307

The cutoff points may differ from variable to variable and may lie anywhere from $-\infty$ to $\infty$. When they leave the range $\mu \pm \sigma$, however, the coefficients will give poor results for moderate sample sizes. In the present example b was restricted to $\pm 0.84\sigma$ about the mean, which, for normal distributions, allows up to an 80 percent-20 percent split of 0s and 1s.

A computer program was written to calculate phi, phi/phimax, and tetrachoric given the desired cutoff points. The tetrachoric coefficient was calculated using the method described by Kirk [4]. A matrix of each type of coefficient was output for each of a large number of different b values, and the matrices were then submitted to the principal components analysis. A set of rotated factor loadings was obtained in each case, to allow comparison with the corresponding loadings from the continuous-variable case. The loadings were compared by computing the root-mean-square (RMS) difference between the continuous and binary situations:

$$\mathrm{RMS} = \sqrt{\frac{1}{MF}\sum_{i=1}^{MF}\left(C_i - B_i\right)^{2}}, \qquad (4)$$

where M is the number of loadings compared on a given factor, F is the number of factors, $C_i$ is a continuous loading, and $B_i$ is the corresponding loading from the binary case. If $B_i = C_i$ for all loadings, RMS is zero, indicating a perfect match. A large RMS indicates a poor match, but the scale is arbitrary. In the present example, the RMS values for phi/phimax and tetrachoric will be compared with those from phi.

Figure 1 shows the results for a large number of runs. The runs are distinguished from each other by the choice of cutoff points used to binarize the data. There was only one continuous-case run, since the choice of binarization points does not affect the continuous case. To exclude insignificant variability, only loadings that exceeded … were used in the comparisons.

[FIG. 1. Performance of Coefficients of Binarized Data. Panels: Tetrachoric vs. Phi; Phi/Phimax vs. Phi. Axis: Tetrachoric and Phi/Phimax RMS Deviation.]

It will be seen in Figure 1 that the RMS error for phi is generally smaller than for either phi/phimax or the tetrachoric, despite the fact that cutoff points far from the mean, which should restrict the range of phi considerably, were chosen in many cases. Thus phi appears in this example to give good results in factor analysis. Of course, one empirical example establishes nothing. Results may vary with sample size (here N = 62) and with variable distributions. Normal marginal distributions, as effected here by transformations, do not guarantee the bivariate normality required by the tetrachoric [1]. To explore the effects of some of these parameters, a simulation study was undertaken.

Simulation Study

A simulation in the present context offers the advantage that the population parameters can be controlled and varied at will, but suffers from the defect that a random element is introduced. Each simulation presents us with only a particular outcome and so can never establish a general case. This can be mitigated, however, by examining a large number of outcomes for regularities. In this study, sets of N values of variables x and y were generated, with $r_{xy}$ being the population correlation coefficient and N the sample size. The set of N x,y pairs may then be regarded as a random sample of N points from an infinite population with correlation coefficient $r_{xy}$. The method of generating x and y simulated drawings from a bivariate normal distribution.
In particular,

$$x = n(z_i;\,0.0,\,1.0), \qquad (5)$$

$$y = x\,r_{xy} + \left(1.0 - r_{xy}^{2}\right)^{1/2} n(z_j;\,0.0,\,1.0), \qquad (6)$$

where $z_i$ and $z_j$ are random numbers and $n(z;\,\mu,\,\sigma)$ is a function that converts a number z to the corresponding point on the normal cumulative distribution curve with mean μ and standard deviation σ, that is, to a normal deviate. x, which is generated first and then substituted into the equation for y, is a random normal variate with (population) zero mean and (population) unit variance. y is similarly distributed, and the x,y pairs are such as to simulate two variables with correlation $r_{xy}$. Once N such x,y pairs are available, the sample correlation coefficient can be calculated; x and y can then be binarized and the phi, phi/phimax, and tetrachoric coefficients calculated. This process can be repeated to obtain a sample of correlation coefficients (in the previous instance only x,y pairs were sampled, whereas here correlation coefficients are sampled). N can be varied to examine its effect on the binary correlation coefficients. In addition, to see the effect of nonnormal distributions, the x,y pairs generated according to equations (5) and (6) were perturbed in two ways: first, varying amounts of uniform random noise were added, and second, x and/or y were transformed according to

$$x' = (x + c)^{h}, \qquad (7)$$

where h was allowed to vary up to 2.75 and c is a constant selected such that x + c > 0. In each instance, x and y were restandardized before the correlation coefficients were calculated.

Figure 2 presents the averages of sample correlation coefficients generated as above. It contains nine panels, each plotting average binary coefficients as a function of sample size N. Columns of panels are differentiated by population $r_{xy}$, and rows by the binarization points employed. The runs in the top row were binarized at the mean (b(x) = 0.0, b(y) = 0.0), and in the lower panels b(y) departs progressively from the mean. b(y) = 0.84 implies that the binarization point for y was at μ + 0.84σ, or, with μ = 0.00 and σ = 1.0, at 0.84: all values of y greater than 0.84 were converted to 1 and all others to 0. All phi values in the lower right-hand panel are less than $r_{xy}$.

[FIG. 2. Artificial Data: Calculated Coefficients with Varying Sample Parameters]
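The sampling scheme of equations (5) and (6), followed by binarization, can be sketched in a few lines. This is a minimal illustration, not the authors' program: `NormalDist().inv_cdf` stands in for n(z; 0.0, 1.0), x is binarized at b(x) = 0.0 and y at b(y) = 0.84, and phi and phi/phimax (equation (3)) are averaged over sixteen samples of N = 50, the eight hundred pairs per data point used for Figure 2. The seed, the choice r_xy = 0.5, and the function names are assumptions; the tetrachoric and the perturbations of equation (7) are left out for brevity.

```python
import math
import random
from statistics import NormalDist, mean

STD = NormalDist()

def n_of(z):
    """n(z; 0.0, 1.0): map a number in (0, 1) to a standard normal deviate."""
    return STD.inv_cdf(z)

def uniform_open(rng):
    """A random number strictly inside (0, 1), as inv_cdf requires."""
    while True:
        u = rng.random()
        if u > 0.0:
            return u

def draw_pairs(n, r_xy, rng):
    """Equations (5) and (6): x = n(z_i), y = x*r + sqrt(1 - r^2)*n(z_j)."""
    pairs = []
    for _ in range(n):
        x = n_of(uniform_open(rng))
        y = x * r_xy + math.sqrt(1.0 - r_xy ** 2) * n_of(uniform_open(rng))
        pairs.append((x, y))
    return pairs

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def phimax_from_binary(xs, ys):
    """Equation (3) evaluated on the marginal proportions of 0/1 data."""
    px, py = mean(xs), mean(ys)
    margins = sorted([px, 1 - px, py, 1 - py], reverse=True)
    p_l, p_s = margins[0], margins[1]
    return math.sqrt(p_s * (1 - p_l) / (p_l * (1 - p_s)))

def simulate(r_xy=0.5, n=50, reps=16, bx=0.0, by=0.84, seed=1):
    rng = random.Random(seed)
    rs, phis, ratios = [], [], []
    while len(phis) < reps:
        pairs = draw_pairs(n, r_xy, rng)
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        bxs = [1 if x > bx else 0 for x in xs]   # binarize x at b(x)
        bys = [1 if y > by else 0 for y in ys]   # binarize y at b(y)
        if len(set(bxs)) < 2 or len(set(bys)) < 2:
            continue                             # a constant column has no phi
        phi = pearson(bxs, bys)                  # phi is Pearson on 0/1 data
        rs.append(pearson(xs, ys))
        phis.append(phi)
        ratios.append(phi / phimax_from_binary(bxs, bys))
    return mean(rs), mean(phis), mean(ratios)

r_bar, phi_bar, ratio_bar = simulate()
print(f"mean r = {r_bar:.3f}, mean phi = {phi_bar:.3f}, mean phi/phimax = {ratio_bar:.3f}")
```

With these settings the average phi falls well below the average continuous r, and phi/phimax comes out larger than phi; the exact values depend on the seed.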

Each data point in Figure 2 represents the average of a number of sample binary correlation coefficients. N = 50 implies that fifty x,y pairs were used for each coefficient; sixteen such coefficients were calculated and averaged, for a total of eight hundred x,y pairs. Eight hundred pairs were used for every data point; thus N = 100 denotes eight coefficients of one hundred x,y pairs apiece, and so forth. Phi, phi/phimax, and tetrachoric show similar trends with N because a single binarized set of data for each N was used to calculate the three coefficients. The data sets for different values of N are independent.

The most striking regularity in Figure 2 is that phi systematically underestimates $r_{xy}$. The defect grows with increasing $r_{xy}$ and with increasing departure of b(y) from the mean. It is apparent that phi is not a close estimate of the continuous product moment coefficient. Phi/phimax and tetrachoric, on the other hand, appear to supply reasonable estimates of this parameter, even for extreme binarization points. (Phi/phimax is absent from the top diagrams of Figures 2-4 because phi and phi/phimax are the same in those situations.)

[FIG. 3. Artificial Data: Values of Fig. 2 Ratioed by the Highest Coefficients. Legend: Phi, Phi/Phimax, Tetrachoric; panels by b(x) = 0.00 and b(y).]

How, then, can phi perform so well in factor analysis? A clue can be found in equation (8) for factor loadings:

$$R_{m\times m} = F_{m\times p}\,F^{t}_{p\times m}, \qquad (8)$$

where $R_{m\times m}$ is the m × m correlation matrix and F is the matrix of factor loadings. If a matrix $K_{m\times m} = kI$, with the constant k everywhere on its diagonal, is introduced, then

$$K_{m\times m}R_{m\times m} = K_{m\times m}F_{m\times p}F^{t}_{p\times m},$$

$$K_{m\times m}R_{m\times m} = K^{1/2}_{m\times m}F_{m\times p}F^{t}_{p\times m}K^{1/2}_{m\times m},$$

or

$$K_{m\times m}R_{m\times m} = \left(K^{1/2}_{m\times m}F_{m\times p}\right)\left(F^{t}_{p\times m}K^{1/2}_{m\times m}\right),$$

or, since K = kI,

$$kR = \left(k^{1/2}F\right)\left(k^{1/2}F\right)^{t}. \qquad (9)$$

In other words, if all values in the correlation matrix are multiplied by k, the corresponding loadings are multiplied by $k^{1/2}$, but the relative values of the factors are unchanged. Interpretation of the factors should not be altered, especially after rotation.

Figure 3 compares the coefficients with the effect of a hypothetical constant k removed. In each panel, the average coefficient for $r_{xy}$ = 0.20 and $r_{xy}$ = 0.50 is divided by the average for $r_{xy}$ = …. If k were not a constant, that is, if k varied with the magnitude of $r_{xy}$, a trend should be visible with varying $r_{xy}$. This does not appear to be so, and it can be concluded that k is very nearly a constant. There is a slight trend with b(y) (compare with the tetrachoric), but in a real data matrix, where the binarization points would be mixed for different variables, it should be of little consequence. This appears to be the explanation of the good performance of phi (relative to phi/phimax and the tetrachoric) in factor analysis, despite its obvious deficiencies.

[FIG. 4. Artificial Data: Sampling Standard Deviation of Calculated Coefficients]

The evidence from Figures 2 and 3 would seem to place phi, phi/phimax, and the tetrachoric on an equal basis for use in factor analysis, whereas there is some evidence, presented above, that phi performs somewhat better. Figure 4 shows why this is so. In Figure 4 the sample standard deviation of each coefficient is presented as a function of $r_{xy}$, N, and b(y), as in Figure 2. It is evident that the sample variance of phi is much smaller than that of phi/phimax and the tetrachoric: the latter two coefficients have standard deviations 50 percent to 100 percent larger than phi, at least for the medium-sized coefficients, which are likely to be numerous in a real correlation matrix.

Conclusions

The simulation runs in Figures 2 through 4 are only a small fraction of the total examined. The others used different binarization points and (slightly) nonnormal population distributions. Those presented here are, however, representative of the total. The sample averages and standard deviations varied as the distributions departed from normal, but the relations among the coefficients did not. It therefore appears that phi is adequate, and perhaps even superior, for use with binary data in factor analysis. With increasing sample size the advantages of phi may disappear, but not so as to preclude its use. Phi has two other advantages: it does not require a bivariate normal distribution, as the tetrachoric does, and it is as easy to calculate in the usual factor analysis program as the continuous product moment coefficient (the mechanics are the same). This study has not addressed the situation where the data matrix contains a mixture of continuous and binary data.
It seems that the use of the product moment formula, which produces Pearson correlation coefficients for continuous data and phi coefficients at the other extreme, should yield satisfactory coefficients in that case also, where the severe restrictions of two-valued variables have been relaxed.

LITERATURE CITED

1. CARROLL, J. B. The Nature of the Data, or How to Choose a Correlation Coefficient. Psychometrika, 26 (1961).
2. GUILFORD, J. P. Fundamental Statistics in Psychology and Education. 3rd ed. New York: McGraw-Hill Book Company.
3. KAISER, HENRY F. A Second-Generation Little Jiffy. Psychometrika, 35 (December 1970).
4. KIRK, DAVID B. On the Numerical Approximation of the Bivariate Normal (Tetrachoric) Correlation Coefficient. Psychometrika, 38 (June 1973).
5. NIE, NORMAN H., DALE H. BENT, and C. HADLAI HULL. SPSS: Statistical Package for the Social Sciences. New York: McGraw-Hill Book Company.
6. PEARSON, K. I. Mathematical Contributions to the Theory of Evolution, VII, On the Correlation of Characters Not Quantitatively Measurable. Phil. Trans. Roy. Soc. London, 195A (1901).
7. RUMMEL, R. J. Applied Factor Analysis. Evanston: Northwestern University Press.
8. WALKER, HELEN M., and JOSEPH LEV. Statistical Inference. New York: Holt, Rinehart and Winston, 1953.


More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Probability and Inference. POLI 205 Doing Research in Politics. Populations and Samples. Probability. Fall 2015

Probability and Inference. POLI 205 Doing Research in Politics. Populations and Samples. Probability. Fall 2015 Fall 2015 Population versus Sample Population: data for every possible relevant case Sample: a subset of cases that is drawn from an underlying population Inference Parameters and Statistics A parameter

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

On Certain Indices for Ordinal Data with Unequally Weighted Classes

On Certain Indices for Ordinal Data with Unequally Weighted Classes Quality & Quantity (2005) 39:515 536 Springer 2005 DOI 10.1007/s11135-005-1611-6 On Certain Indices for Ordinal Data with Unequally Weighted Classes M. PERAKIS, P. E. MARAVELAKIS, S. PSARAKIS, E. XEKALAKI

More information

A process capability index for discrete processes

A process capability index for discrete processes Journal of Statistical Computation and Simulation Vol. 75, No. 3, March 2005, 175 187 A process capability index for discrete processes MICHAEL PERAKIS and EVDOKIA XEKALAKI* Department of Statistics, Athens

More information

REDUNDANCY ANALYSIS AN ALTERNATIVE FOR CANONICAL CORRELATION ANALYSIS ARNOLD L. VAN DEN WOLLENBERG UNIVERSITY OF NIJMEGEN

REDUNDANCY ANALYSIS AN ALTERNATIVE FOR CANONICAL CORRELATION ANALYSIS ARNOLD L. VAN DEN WOLLENBERG UNIVERSITY OF NIJMEGEN PSYCHOMETRIKA-VOL. 42, NO, 2 JUNE, 1977 REDUNDANCY ANALYSIS AN ALTERNATIVE FOR CANONICAL CORRELATION ANALYSIS ARNOLD L. VAN DEN WOLLENBERG UNIVERSITY OF NIJMEGEN A component method is presented maximizing

More information

IE 361 Exam 3 (Form A)

IE 361 Exam 3 (Form A) December 15, 005 IE 361 Exam 3 (Form A) Prof. Vardeman This exam consists of 0 multiple choice questions. Write (in pencil) the letter for the single best response for each question in the corresponding

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

SOME BASICS OF TIME-SERIES ANALYSIS

SOME BASICS OF TIME-SERIES ANALYSIS SOME BASICS OF TIME-SERIES ANALYSIS John E. Floyd University of Toronto December 8, 26 An excellent place to learn about time series analysis is from Walter Enders textbook. For a basic understanding of

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

Appendix A: Matrices

Appendix A: Matrices Appendix A: Matrices A matrix is a rectangular array of numbers Such arrays have rows and columns The numbers of rows and columns are referred to as the dimensions of a matrix A matrix with, say, 5 rows

More information

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE 3.1 Model Violations If a set of items does not form a perfect Guttman scale but contains a few wrong responses, we do not necessarily need to discard it. A wrong

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Inverse Sampling for McNemar s Test

Inverse Sampling for McNemar s Test International Journal of Statistics and Probability; Vol. 6, No. 1; January 27 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Inverse Sampling for McNemar s Test

More information

Concept of Reliability

Concept of Reliability Concept of Reliability 1 The concept of reliability is of the consistency or precision of a measure Weight example Reliability varies along a continuum, measures are reliable to a greater or lesser extent

More information

Incompatibility Paradoxes

Incompatibility Paradoxes Chapter 22 Incompatibility Paradoxes 22.1 Simultaneous Values There is never any difficulty in supposing that a classical mechanical system possesses, at a particular instant of time, precise values of

More information

Patterns in Offender Distance Decay and the Geographic Profiling Problem.

Patterns in Offender Distance Decay and the Geographic Profiling Problem. Patterns in Offender Distance Decay and the Geographic Profiling Problem. Mike O Leary Towson University 2010 Fall Western Section Meeting Los Angeles, CA October 9-10, 2010 Mike O Leary (Towson University)

More information

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT

More information

Calculation and Application of MOPITT Averaging Kernels

Calculation and Application of MOPITT Averaging Kernels Calculation and Application of MOPITT Averaging Kernels Merritt N. Deeter Atmospheric Chemistry Division National Center for Atmospheric Research Boulder, Colorado 80307 July, 2002 I. Introduction Retrieval

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Working Paper 2013:9 Department of Statistics Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Ronnie Pingel Working Paper 2013:9 June

More information

Dimensionality Reduction Techniques (DRT)

Dimensionality Reduction Techniques (DRT) Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,

More information

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations.

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations. POLI 7 - Mathematical and Statistical Foundations Prof S Saiegh Fall Lecture Notes - Class 4 October 4, Linear Algebra The analysis of many models in the social sciences reduces to the study of systems

More information

Estimation of Parameters

Estimation of Parameters CHAPTER Probability, Statistics, and Reliability for Engineers and Scientists FUNDAMENTALS OF STATISTICAL ANALYSIS Second Edition A. J. Clark School of Engineering Department of Civil and Environmental

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

CHAPTER 5 ANALYSIS OF STRUCTURES. Expected Outcome:

CHAPTER 5 ANALYSIS OF STRUCTURES. Expected Outcome: CHAPTER ANALYSIS O STRUCTURES Expected Outcome: Able to analyze the equilibrium of structures made of several connected parts, using the concept of the equilibrium of a particle or of a rigid body, in

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

JUST THE MATHS UNIT NUMBER 1.5. ALGEBRA 5 (Manipulation of algebraic expressions) A.J.Hobson

JUST THE MATHS UNIT NUMBER 1.5. ALGEBRA 5 (Manipulation of algebraic expressions) A.J.Hobson JUST THE MATHS UNIT NUMBER 1.5 ALGEBRA 5 (Manipulation of algebraic expressions) by A.J.Hobson 1.5.1 Simplification of expressions 1.5.2 Factorisation 1.5.3 Completing the square in a quadratic expression

More information

Lecture 6 Positive Definite Matrices

Lecture 6 Positive Definite Matrices Linear Algebra Lecture 6 Positive Definite Matrices Prof. Chun-Hung Liu Dept. of Electrical and Computer Engineering National Chiao Tung University Spring 2017 2017/6/8 Lecture 6: Positive Definite Matrices

More information

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions Required Readings: by Professor Scott H. Irwin Griffiths, Hill and Judge. Some Basic Ideas: Statistical Concepts for Economists,

More information

Chapter. Algebra techniques. Syllabus Content A Basic Mathematics 10% Basic algebraic techniques and the solution of equations.

Chapter. Algebra techniques. Syllabus Content A Basic Mathematics 10% Basic algebraic techniques and the solution of equations. Chapter 2 Algebra techniques Syllabus Content A Basic Mathematics 10% Basic algebraic techniques and the solution of equations. Page 1 2.1 What is algebra? In order to extend the usefulness of mathematical

More information

Discrete Distributions

Discrete Distributions Discrete Distributions STA 281 Fall 2011 1 Introduction Previously we defined a random variable to be an experiment with numerical outcomes. Often different random variables are related in that they have

More information

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers ALGEBRA CHRISTIAN REMLING 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers by Z = {..., 2, 1, 0, 1,...}. Given a, b Z, we write a b if b = ac for some

More information

Investigation into the use of confidence indicators with calibration

Investigation into the use of confidence indicators with calibration WORKSHOP ON FRONTIERS IN BENCHMARKING TECHNIQUES AND THEIR APPLICATION TO OFFICIAL STATISTICS 7 8 APRIL 2005 Investigation into the use of confidence indicators with calibration Gerard Keogh and Dave Jennings

More information

Algebraic Expressions and Identities

Algebraic Expressions and Identities ALGEBRAIC EXPRESSIONS AND IDENTITIES 137 Algebraic Expressions and Identities CHAPTER 9 9.1 What are Expressions? In earlier classes, we have already become familiar with what algebraic expressions (or

More information

Nesting and Equivalence Testing

Nesting and Equivalence Testing Nesting and Equivalence Testing Tihomir Asparouhov and Bengt Muthén August 13, 2018 Abstract In this note, we discuss the nesting and equivalence testing (NET) methodology developed in Bentler and Satorra

More information

NAG Library Chapter Introduction. G08 Nonparametric Statistics

NAG Library Chapter Introduction. G08 Nonparametric Statistics NAG Library Chapter Introduction G08 Nonparametric Statistics Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric Hypothesis Testing... 2 2.2 Types

More information

TOTAL JITTER MEASUREMENT THROUGH THE EXTRAPOLATION OF JITTER HISTOGRAMS

TOTAL JITTER MEASUREMENT THROUGH THE EXTRAPOLATION OF JITTER HISTOGRAMS T E C H N I C A L B R I E F TOTAL JITTER MEASUREMENT THROUGH THE EXTRAPOLATION OF JITTER HISTOGRAMS Dr. Martin Miller, Author Chief Scientist, LeCroy Corporation January 27, 2005 The determination of total

More information

Linear Algebra. Chapter Linear Equations

Linear Algebra. Chapter Linear Equations Chapter 3 Linear Algebra Dixit algorizmi. Or, So said al-khwarizmi, being the opening words of a 12 th century Latin translation of a work on arithmetic by al-khwarizmi (ca. 78 84). 3.1 Linear Equations

More information

FACTOR ANALYSIS AS MATRIX DECOMPOSITION 1. INTRODUCTION

FACTOR ANALYSIS AS MATRIX DECOMPOSITION 1. INTRODUCTION FACTOR ANALYSIS AS MATRIX DECOMPOSITION JAN DE LEEUW ABSTRACT. Meet the abstract. This is the abstract. 1. INTRODUCTION Suppose we have n measurements on each of taking m variables. Collect these measurements

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization. Statistical Tools in Evaluation HPS 41 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific number

More information

Matrix Algebra. Matrix Algebra. Chapter 8 - S&B

Matrix Algebra. Matrix Algebra. Chapter 8 - S&B Chapter 8 - S&B Algebraic operations Matrix: The size of a matrix is indicated by the number of its rows and the number of its columns. A matrix with k rows and n columns is called a k n matrix. The number

More information

Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions

Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions Roger L. Berger Department of Statistics North Carolina State University Raleigh, NC 27695-8203 June 29, 1994 Institute

More information

Ho Chi Minh City University of Technology Faculty of Civil Engineering Department of Water Resources Engineering & Management

Ho Chi Minh City University of Technology Faculty of Civil Engineering Department of Water Resources Engineering & Management Lecturer: Associ. Prof. Dr. NGUYỄN Thống E-mail: nguyenthong@hcmut.edu.vn or nthong56@yahoo.fr Web: http://www4.hcmut.edu.vn/~nguyenthong/index 4/5/2016 1 Tél. (08) 38 691 592-098 99 66 719 CONTENTS Chapter

More information