On the Variance of Eigenvalues in PCA and MCA


Jean-Luc Durand
Laboratoire d'Éthologie Expérimentale et Comparée (LEEC, EA 4443), Université Paris 13, Sorbonne Paris Cité
CARME, Naples, September 2015
J.-L. Durand, On the Variance of Eigenvalues, 1/41

CA, PCA and MCA
Which statistic measures the overall magnitude of the relations between variables in CA, PCA and MCA?
- the sum of the eigenvalues in CA;
- the variance of the eigenvalues in PCA and MCA.

Part 1: Principal Component Analysis (PCA on the correlation matrix)
Overall viewpoint: correlations and eigenvalues.
Local viewpoint: contributions of each variable to the axes.

PCA: Measure of the Overall Magnitude of Correlations
Definition. The average linkage index of p numerical variables is
ALI = Σ_{k≠k'} r²_{kk'} / (p(p − 1)),
the mean of the squared correlations between variables. It satisfies
0 ≤ min_{(k,k')} r²_{kk'} ≤ ALI ≤ max_{(k,k')} r²_{kk'} ≤ 1.
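As a minimal sketch of the definition above (in NumPy, with an illustrative 3×3 correlation matrix whose values are hypothetical, not from the talk), the ALI is just the mean of the squared off-diagonal correlations:

```python
import numpy as np

# Illustrative correlation matrix for p = 3 variables (hypothetical values).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
p = R.shape[0]
off = ~np.eye(p, dtype=bool)                  # the p(p-1) off-diagonal cells
ALI = (R[off] ** 2).sum() / (p * (p - 1))     # mean squared correlation
print(round(ALI, 4))                          # → 0.2333
```

The same mask-based computation works for any correlation matrix, whatever p.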

PCA: Variance of Eigenvalues
Variance of the p eigenvalues:
Var(λ) = Σ_{l=1}^{p} (λ_l − 1)² / p
(the mean eigenvalue is 1, since Σ_l λ_l = p). Bounds: 0 ≤ Var(λ) ≤ p − 1.
Two extreme situations:
- p uncorrelated variables: r_{kk'} = 0 (k ≠ k'), so λ_1 = ... = λ_p = 1 and Var(λ) = 0 (spherical cloud);
- p perfectly correlated variables: r²_{kk'} = 1, so λ_1 = p, λ_2 = ... = λ_p = 0 and Var(λ) = p − 1 (unidimensional cloud).

PCA: Theorem
In PCA of p standardized variables:
Var(λ) = (p − 1) ALI.
The Average Linkage Index is thus a measure:
- of the overall magnitude of correlations;
- of the departure of the cloud from sphericity.
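The theorem is an algebraic identity (it follows from trace(R²) = Σ λ_l²), so it can be checked to machine precision on any data set. A sketch assuming NumPy, with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
# Arbitrary correlated data: 200 observations of p = 5 variables.
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
R = np.corrcoef(X, rowvar=False)              # PCA on the correlation matrix
p = R.shape[0]
lam = np.linalg.eigvalsh(R)                   # eigenvalues; their mean is exactly 1
var_lam = ((lam - 1) ** 2).mean()             # Var(lambda)
ALI = (R[~np.eye(p, dtype=bool)] ** 2).mean() # average linkage index
assert abs(var_lam - (p - 1) * ALI) < 1e-10   # Var(lambda) = (p - 1) ALI
```

The assertion holds for any data, not just this seed, because both sides equal Σ_{k≠k'} r²_{kk'} / p.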

Spearman Example: Correlations
SPEARMAN C., "General Intelligence, Objectively Determined and Measured", The American Journal of Psychology, Vol. 15, No. 2, April 1904 (p. 291).
[Correlation matrix of the six variables: Literature (Lit), French (Fre), English (Eng), Mathematics (Mat), Auditive discrimination (Aud), Music (Mus).]
ALI = .3899, √ALI = .62.

Spearman Example: Eigenvalues
λ_1, λ_2, ..., λ_6, with Σ_l λ_l = 6.
[Bar chart of the six eigenvalues.]
Var(λ) = 5 × ALI = 5 × .3899 = 1.9495.

PCA: Linkage Index of a Variable
Definition. Given p numerical variables, the linkage index of variable k is
LI_k = Σ_{k'≠k} r²_{kk'} / (p − 1),
the mean of the squared correlations between variable k and the others. It satisfies
0 ≤ min_{k'} r²_{kk'} ≤ LI_k ≤ max_{k'} r²_{kk'} ≤ 1.
Spearman example: LI_Lit = .5240, LI_Fre = .4668, LI_Eng = .4037, LI_Mat = .3622, LI_Aud = .3024, LI_Mus = .2804; ALI = .3899.
Mean: Σ_k LI_k / p = ALI.

PCA: Linkage Ratio of a Variable
Definition. The linkage ratio of variable k is LR_k = LI_k / ALI.
Mean: Σ_k LR_k / p = 1.
Spearman example: LR_Lit = 1.34, LR_Fre = 1.20, LR_Eng = 1.04, LR_Mat = 0.93, LR_Aud = 0.78, LR_Mus = 0.72; mean = 1.
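These per-variable indexes are row means of the squared correlation matrix, minus the diagonal. A sketch in NumPy (the correlation values are hypothetical, not Spearman's):

```python
import numpy as np

# Illustrative correlation matrix for p = 3 variables (hypothetical values).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
p = R.shape[0]
LI = ((R ** 2).sum(axis=1) - 1.0) / (p - 1)  # drop the diagonal term r_kk^2 = 1
ALI = LI.mean()                              # the mean of the LI_k is the ALI
LR = LI / ALI                                # linkage ratios; their mean is 1
```

By construction `LR.mean()` is exactly 1, which mirrors the "Mean: Σ_k LR_k / p = 1" property above.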

Spearman Example: Plane 1-2
[Table of the coefficients r_{kl} of the six variables on Axes 1 and 2.]
For each axis l ∈ {1, ..., p}: Σ_k r²_{kl} = λ_l.
For each variable k ∈ {1, ..., p}: Σ_l r²_{kl} = 1.
[Correlation circle in plane 1-2: Axis 1 (68.4%), Axis 2 (10.3%), showing Lit, Fre, Eng, Mat, Aud and Mus.]

Spearman Example: Contributions to Axes
[Table of the contributions of the variables to Axes 1-6 (Ctr_k^l), with their sums and the eigenvalues.]
Contribution of variable k to axis l: Ctr_k^l = r²_{kl} / λ_l.
For each axis l: Σ_k Ctr_k^l = 1. For each variable k: Σ_l Ctr_k^l = 1.
w_k: vector of the contributions of variable k to the axes. For each k: Corr(w_k, λ) = 0.

Property of Uncorrelated Positive Variables
Let x and y be two positive variables with n values.
Mean of x: x̄ = Σ x_i / n.
Weighted average of x by y (the "y-average" of x): Avg_y(x) = Σ y_i x_i / Σ y_i.
Property: r_{xy} = 0 ⟺ x̄ = Avg_y(x) ⟺ ȳ = Avg_x(y).
In PCA on the correlation matrix, for each k: λ̄ = 1 = Avg_{w_k}(λ).

Expansion Ratio
Let x and w be two uncorrelated positive variables.
Variance of x: Var(x) = Σ (x_i − x̄)² / n.
Weighted variance of x by w (the "w-variance" of x): Var_w(x) = Σ w_i (x_i − x̄)² / Σ w_i.
Definition. The w-expansion ratio of x is XR_w(x) = Var_w(x) / Var(x).
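These definitions translate directly into code. A sketch in NumPy (the helper names `avg_w`, `var_w` and `xr` are made up here for illustration), including a small check of the uncorrelatedness property:

```python
import numpy as np

def avg_w(x, w):
    """Weighted average of x by w (the 'w-average' of x)."""
    return (w * x).sum() / w.sum()

def var_w(x, w):
    """Weighted variance of x by w, taken around the plain mean of x."""
    return (w * (x - x.mean()) ** 2).sum() / w.sum()

def xr(x, w):
    """w-expansion ratio of x: weighted variance over plain variance."""
    return var_w(x, w) / x.var()

# When x and w are uncorrelated, the w-average of x equals its plain mean:
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([2.0, 1.0, 1.0, 2.0])   # chosen so that Corr(x, w) = 0
assert abs(avg_w(x, w) - x.mean()) < 1e-12
```

Note that `var_w` centers on the plain mean x̄; this matches the slide, and is consistent because for uncorrelated x and w the w-average of x is x̄ anyway.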

PCA: Theorem
In PCA of p standardized variables, for each variable k:
Var_{w_k}(λ) = (p − 1) LI_k, hence XR_{w_k}(λ) = LR_k.
Interpretation:
- the higher the linkage ratio of a variable, the higher the contributions of this variable to the extreme eigenvalues;
- the lower the linkage ratio of a variable, the higher the contributions of this variable to the central eigenvalues.

Spearman Example: Literature
Linkage ratio: XR = 1.34.
[Bar chart: contributions of Lit to the axes (%).]
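This theorem can also be checked numerically: in PCA on the correlation matrix the contribution of variable k to axis l equals the squared entry of the unit eigenvector, Ctr_k^l = r²_{kl} / λ_l = u²_{kl}. A sketch assuming NumPy, with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary correlated data: 300 observations of p = 4 variables.
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))
R = np.corrcoef(X, rowvar=False)
p = R.shape[0]
lam, U = np.linalg.eigh(R)            # eigenvalues and unit eigenvectors of R
Ctr = U ** 2                          # Ctr_k^l = r_kl^2 / lambda_l = u_kl^2
for k in range(p):
    w_k = Ctr[k]                      # contributions of variable k to the axes
    var_wk = (w_k * (lam - 1) ** 2).sum() / w_k.sum()   # w_k-variance of lambda
    LI_k = ((R[k] ** 2).sum() - 1.0) / (p - 1)          # linkage index of k
    assert abs(var_wk - (p - 1) * LI_k) < 1e-10
```

Here too the identity is exact: Σ_l u²_{kl}(λ_l − 1)² = trace terms of R² restricted to row k, which reduces to (p − 1) LI_k.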

Spearman Example: French
Linkage ratio: XR = 1.20.
[Bar chart: contributions of Fre to the axes (%).]

Spearman Example: English
Linkage ratio: XR = 1.04.
[Bar chart: contributions of Eng to the axes (%).]

Spearman Example: Mathematics
Linkage ratio: XR = 0.93.
[Bar chart: contributions of Mat to the axes (%).]

Spearman Example: Auditive Discrimination
Linkage ratio: XR = 0.78.
[Bar chart: contributions of Aud to the axes (%).]

Spearman Example: Music
Linkage ratio: XR = 0.72.
[Bar chart: contributions of Mus to the axes (%).]

PCA: Summary
In PCA on the correlation matrix:
- the variance of the eigenvalues is proportional to the average linkage index;
- the distribution of the contributions of a variable to the axes depends on the linkage ratio of this variable.

Part 2: Multiple Correspondence Analysis
Overall viewpoint: Φ² coefficients and eigenvalues.
Local viewpoint: contributions of each question to the axes.

MCA: Burt's Data
BURT C., "The Factorial Analysis of Qualitative Data", British Journal of Statistical Psychology, Vol. 3, Issue 3, November 1950 (p. 177).
[Burt table crossing the four questions: Hair (fair, red, dark), Eyes (light, mixed, brown), Head (narrow, wide), Stature (tall, short); the Stature diagonal block counts 43 tall and 57 short.]
Q = 4 questions; K = Σ_q K_q = 3 + 3 + 2 + 2 = 10 categories.
Average number of categories per question: K/Q = 2.5.

MCA: Φ² Coefficients
[Φ² table between the four questions: Hair, Eyes, Head, Stature.]

MCA: Magnitude of the Relationships between Questions
Definition. The average linkage index of Q questions with K categories is
ALI = Φ̄² / (K/Q − 1), with Φ̄² = Σ_{q≠q'} Φ²_{qq'} / (Q(Q − 1))
(the mean of the Φ²_{qq'} coefficients, q ≠ q'). It satisfies 0 ≤ ALI ≤ 1.
Burt example: Φ̄² = …, ALI = ….
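The Φ² coefficient between two questions is the chi-squared statistic of their contingency table divided by n. A sketch in NumPy (the `phi2` helper name is made up here, and the categorical data are simulated, not Burt's):

```python
import numpy as np

def phi2(x, y, Kx, Ky):
    """Phi-squared (chi-squared statistic / n) between two categorical
    variables coded 0..Kx-1 and 0..Ky-1."""
    t = np.zeros((Kx, Ky))
    np.add.at(t, (x, y), 1)                       # contingency table
    n = t.sum()
    e = np.outer(t.sum(axis=1), t.sum(axis=0)) / n  # expected counts
    return ((t - e) ** 2 / e).sum() / n

rng = np.random.default_rng(7)
Ks = [3, 3, 2, 2]                                 # K_q, as in Burt's data
data = [rng.integers(0, Kq, 100) for Kq in Ks]    # 100 simulated answers
Q, K = len(Ks), sum(Ks)
mean_phi2 = np.mean([phi2(data[q], data[r], Ks[q], Ks[r])
                     for q in range(Q) for r in range(Q) if r != q])
ALI = mean_phi2 / (K / Q - 1)                     # average linkage index
```

With independently simulated questions the ALI comes out close to 0, as the spherical-cloud extreme predicts.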

MCA: Variance of Eigenvalues
Burt example: eigenvalues λ_1, ..., λ_6, with Σ_l λ_l = K/Q − 1 = 1.5 and Mean(λ) = 1/Q = 0.25.
[Bar chart of the six eigenvalues.]
Var(λ) = Σ_{l=1}^{K−Q} (λ_l − 1/Q)² / (K − Q).

MCA: Variance of Eigenvalues
Bounds: 0 ≤ Var(λ) ≤ (Q − 1)/Q².
Two extreme situations:
- Q independent questions: Φ²_{qq'} = 0 (q ≠ q'), so λ_1 = ... = λ_{K−Q} = 1/Q and Var(λ) = 0 (spherical cloud);
- Q equivalent questions (all K_q equal, Φ²_{qq'} = K_q − 1 = K/Q − 1): λ_l = 1 for 1 ≤ l ≤ K/Q − 1 and λ_l = 0 for K/Q ≤ l ≤ K − Q, so Var(λ) = (Q − 1)/Q² ((K/Q − 1)-dimensional cloud).

MCA: Theorem
In MCA of a table with Q questions:
Var(λ) = ((Q − 1)/Q²) ALI.
The Average Linkage Index is a measure:
- of the overall magnitude of the relations between questions;
- of the departure of the cloud from sphericity.

MCA: Linkage Index of a Question
Definition. Given Q questions, the linkage index of question q with K_q categories is
LI_q = Φ̄²_q / (K_q − 1), with Φ̄²_q = Σ_{q'≠q} Φ²_{qq'} / (Q − 1)
(the mean of the Φ² coefficients between question q and the others). It satisfies 0 ≤ LI_q ≤ 1.
Average of the linkage indexes weighted by the numbers of categories minus 1:
Σ_q (K_q − 1) LI_q / (K − Q) = ALI.
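The MCA theorem can be verified end to end with a small simulation: run the MCA as a correspondence analysis of the indicator matrix, keep the K − Q nontrivial eigenvalues, and compare their variance with ((Q − 1)/Q²) ALI. A sketch assuming NumPy (the `phi2` helper name is made up, and the data are simulated, not Burt's):

```python
import numpy as np

def phi2(x, y, Kx, Ky):
    """Phi-squared (chi-squared statistic / n) between two coded variables."""
    t = np.zeros((Kx, Ky))
    np.add.at(t, (x, y), 1)
    n = t.sum()
    e = np.outer(t.sum(axis=1), t.sum(axis=0)) / n
    return ((t - e) ** 2 / e).sum() / n

rng = np.random.default_rng(3)
n, Ks = 400, [3, 3, 2, 2]
Q, K = len(Ks), sum(Ks)
data = [rng.integers(0, Kq, n) for Kq in Ks]

# MCA as correspondence analysis of the n x K indicator matrix Z.
Z = np.hstack([np.eye(Kq)[x] for x, Kq in zip(data, Ks)])
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals
lam = np.sort(np.linalg.svd(S, compute_uv=False) ** 2)[::-1][:K - Q]

var_lam = ((lam - 1 / Q) ** 2).mean()                 # K - Q eigenvalues, mean 1/Q
mean_phi2 = np.mean([phi2(data[q], data[s], Ks[q], Ks[s])
                     for q in range(Q) for s in range(Q) if s != q])
ALI = mean_phi2 / (K / Q - 1)
assert abs(var_lam - (Q - 1) / Q ** 2 * ALI) < 1e-9   # Var(lambda) = ((Q-1)/Q^2) ALI
```

The check rests on the classical identity that the sum of the squared MCA eigenvalues equals the mean of the Φ² coefficients over all Q² pairs of questions (diagonal pairs contributing Φ²_{qq} = K_q − 1).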

MCA: Linkage Indexes of the Questions
[Table: for each question (Hair, Eyes, Head, Stature), Φ̄²_q, K_q − 1 and LI_q.]
Average of the linkage indexes weighted by the numbers of categories minus 1: ALI = ….

MCA: Linkage Ratio of a Question
Definition. The linkage ratio of question q is LR_q = LI_q / ALI.
Average of the linkage ratios weighted by the numbers of categories minus 1:
Σ_q (K_q − 1) LR_q / (K − Q) = 1.

MCA: Linkage Ratios of the Questions
[Table of LR_q and K_q − 1 for Hair, Eyes, Head and Stature; weighted average 1. Chart of the linkage ratios, increasing from Head to Hair, Eyes and Stature.]

MCA: Contributions to Axes
[Table of the contributions of the questions to Axes 1-6 (Ctr_q^l), with their sums and the eigenvalues.]
For each axis l: Σ_q Ctr_q^l = 1. For each question q: Σ_l Ctr_q^l = K_q − 1.
w_q: vector of the contributions of question q to the axes. For each q: Corr(w_q, λ) = 0.

MCA: Theorem
In MCA of a table with Q questions, for each question q:
Var_{w_q}(λ) = ((Q − 1)/Q²) LI_q, hence XR_{w_q}(λ) = LR_q.
Interpretation:
- the higher the linkage ratio of a question, the higher the contributions of this question to the extreme eigenvalues;
- the lower the linkage ratio of a question, the higher the contributions of this question to the central eigenvalues.

Burt Example: Stature
Expansion ratio: XR = 1.95.
[Bar chart: contributions of Stature to the axes (%).]

Burt Example: Eyes
Expansion ratio: XR = 1.32.
[Bar chart: contributions of Eyes to the axes (%).]

Burt Example: Hair
Expansion ratio: XR = 0.59.
[Bar chart: contributions of Hair to the axes (%).]

Burt Example: Head
Expansion ratio: XR = 0.23.
[Bar chart: contributions of Head to the axes (%).]

MCA: Summary
In MCA:
- the variance of the eigenvalues is proportional to the average linkage index;
- the distribution of the contributions of a question to the axes depends on the linkage ratio of this question.

Suggestions
- In PCA results: report the ALI and the LIs or LRs of the variables.
- In MCA results: report the Φ² table, the ALI and the LIs or LRs of the questions.
- In agglomerative hierarchical clustering of variables: use the within-class ALI as an aggregation index.


More information

Lecture 5: November 19, Minimizing the maximum intracluster distance

Lecture 5: November 19, Minimizing the maximum intracluster distance Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction

More information

Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors

Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors Kristoffer Hellton Department of Mathematics, University of Oslo May 12, 2015 K. Hellton (UiO) Distribution

More information

Principal Components Theory Notes

Principal Components Theory Notes Principal Components Theory Notes Charles J. Geyer August 29, 2007 1 Introduction These are class notes for Stat 5601 (nonparametrics) taught at the University of Minnesota, Spring 2006. This not a theory

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

Class 11 Maths Chapter 15. Statistics

Class 11 Maths Chapter 15. Statistics 1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class

More information

Correlation and Regression

Correlation and Regression Correlation and Regression. ITRDUCTI Till now, we have been working on one set of observations or measurements e.g. heights of students in a class, marks of students in an exam, weekly wages of workers

More information

Machine Learning - MT Clustering

Machine Learning - MT Clustering Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:

More information

STATISTICAL LEARNING SYSTEMS

STATISTICAL LEARNING SYSTEMS STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

Pharmaceutical Experimental Design and Interpretation

Pharmaceutical Experimental Design and Interpretation Pharmaceutical Experimental Design and Interpretation N. ANTHONY ARMSTRONG, B. Pharm., Ph.D., F.R.Pharm.S., MCPP. KENNETH C. JAMES, M. Pharm., Ph.D., D.Sc, FRSC, F.R.Pharm.S., C.Chem. Welsh School of Pharmacy,

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

BIO 682 Multivariate Statistics (Lite) Spring 2010

BIO 682 Multivariate Statistics (Lite) Spring 2010 BIO 682 Multivariate Statistics (Lite) Spring 2010 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 10 Outline for This Section 1. Multiple regression in ecological and behavioral

More information

arxiv: v1 [cs.lg] 28 Nov 2007

arxiv: v1 [cs.lg] 28 Nov 2007 Covariance and PCA for Categorical Variables arxiv:711.4452v1 [cs.lg] 28 Nov 27 Hirotaka Niitsuma and Takashi Okada November 9, 218 Abstract Covariances from categorical variables are defined using a regular

More information

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision) CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions

More information

18.440: Lecture 25 Covariance and some conditional expectation exercises

18.440: Lecture 25 Covariance and some conditional expectation exercises 18.440: Lecture 25 Covariance and some conditional expectation exercises Scott Sheffield MIT Outline Covariance and correlation Outline Covariance and correlation A property of independence If X and Y

More information

Chapter 10 Conjoint Use of Variables Clustering and PLS Structural Equations Modeling

Chapter 10 Conjoint Use of Variables Clustering and PLS Structural Equations Modeling Chapter 10 Conjoint Use of Variables Clustering and PLS Structural Equations Modeling Valentina Stan and Gilbert Saporta Abstract In PLS approach, it is frequently assumed that the blocks of variables

More information

EXPECTED VALUE of a RV. corresponds to the average value one would get for the RV when repeating the experiment, =0.

EXPECTED VALUE of a RV. corresponds to the average value one would get for the RV when repeating the experiment, =0. EXPECTED VALUE of a RV corresponds to the average value one would get for the RV when repeating the experiment, independently, infinitely many times. Sample (RIS) of n values of X (e.g. More accurately,

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

MULTIVARIATE DISTRIBUTIONS

MULTIVARIATE DISTRIBUTIONS Chapter 9 MULTIVARIATE DISTRIBUTIONS John Wishart (1898-1956) British statistician. Wishart was an assistant to Pearson at University College and to Fisher at Rothamsted. In 1928 he derived the distribution

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

Principal component analysis

Principal component analysis Principal component analysis Angela Montanari 1 Introduction Principal component analysis (PCA) is one of the most popular multivariate statistical methods. It was first introduced by Pearson (1901) and

More information

CS70: Jean Walrand: Lecture 22.

CS70: Jean Walrand: Lecture 22. CS70: Jean Walrand: Lecture 22. Confidence Intervals; Linear Regression 1. Review 2. Confidence Intervals 3. Motivation for LR 4. History of LR 5. Linear Regression 6. Derivation 7. More examples Review:

More information

Econ 371 Problem Set #1 Answer Sheet

Econ 371 Problem Set #1 Answer Sheet Econ 371 Problem Set #1 Answer Sheet 2.1 In this question, you are asked to consider the random variable Y, which denotes the number of heads that occur when two coins are tossed. a. The first part of

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Graph Functional Methods for Climate Partitioning

Graph Functional Methods for Climate Partitioning Graph Functional Methods for Climate Partitioning Mathilde Mougeot - with D. Picard, V. Lefieux*, M. Marchand* Université Paris Diderot, France *Réseau Transport Electrique (RTE) Buenos Aires, 2015 Mathilde

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information