An introduction to multivariate data


Angela Montanari (angela.montanari@unibo.it)

1 The data matrix

The starting point of any analysis of multivariate data is a data matrix, i.e. a collection of n observations on a set of p characters $X_1, \ldots, X_k, \ldots, X_p$ (they may be numeric variables, binary variables, or suitably coded categorical variables):

$$X = \begin{pmatrix} x_{11} & \ldots & x_{1k} & \ldots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \ldots & x_{ik} & \ldots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \ldots & x_{nk} & \ldots & x_{np} \end{pmatrix}$$

The element $x_{ik}$ represents the value of the k-th variable on the i-th observed unit. Each row of $X$ corresponds to an observed unit; each column of $X$ corresponds to an observed variable. The n statistical units can be thought of as n points in the p-dimensional space $\mathbb{R}^p$.

The following data matrix ($24 \times 3$) contains the length, width and height (in mm) of the carapace of 24 male painted turtles (Jolicoeur and Mosimann, 1960). There is one row per turtle and one column per variable.

Multivariate analysis is concerned either with studying the relationships between variables or with studying the similarities between units.

Table 1: Data matrix of the turtle carapace measurements (columns: length, width, height, in mm; the individual values are not reproduced in this transcription).
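To make the row and column conventions concrete, here is a minimal NumPy sketch of a data matrix of this kind; the four rows shown are made-up placeholder measurements, not the actual Jolicoeur and Mosimann values.

```python
import numpy as np

# Illustrative data matrix: one row per unit (turtle), one column per variable
# (carapace length, width, height in mm). The values are made-up placeholders,
# not the actual Jolicoeur-Mosimann measurements.
X = np.array([
    [ 98.0, 81.0, 38.0],
    [103.0, 84.0, 39.0],
    [110.0, 88.0, 42.0],
    [125.0, 96.0, 46.0],
])

n, p = X.shape        # n observed units, p observed variables
print(n, p)           # 4 3
print(X[1, 2])        # x_{2,3}: value of the 3rd variable on the 2nd unit -> 39.0
```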

Among the first set of methods we will consider:

- Principal component analysis
- Factor analysis
- Discriminant analysis

In the second set we will deal with:

- Clustering methods

Starting from the data matrix $X$, a series of different matrices can be derived. We will first concentrate on the matrices dealing with relationships between variables, leaving the theme of measuring dissimilarities between units to when we deal with clustering.

2 The average vector

When dealing with numeric variables, we might be interested in associating to each variable its arithmetic mean. The p means can be collected in a p-dimensional vector

$$\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_k \\ \vdots \\ \bar{x}_p \end{pmatrix} = \frac{1}{n} X' 1_n,$$

where $1_n$ denotes the n-dimensional vector of ones, so that, equivalently, $\bar{x}' = \frac{1}{n} 1_n' X$.
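A short NumPy check of the matrix expression for the mean vector, reusing the placeholder matrix from the sketch above:

```python
import numpy as np

X = np.array([[ 98.0, 81.0, 38.0],
              [103.0, 84.0, 39.0],
              [110.0, 88.0, 42.0],
              [125.0, 96.0, 46.0]])   # placeholder data matrix
n = X.shape[0]
ones = np.ones(n)                     # the vector 1_n

xbar = X.T @ ones / n                 # mean vector (1/n) X' 1_n
print(xbar)
print(np.allclose(xbar, X.mean(axis=0)))   # True: same as the column-wise means
```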

3 The mean centered data matrix

In certain applications it might be useful to express the variables as deviations from the mean. The data matrix becomes

$$\tilde{X} = \begin{pmatrix} \tilde{x}_{11} & \ldots & \tilde{x}_{1k} & \ldots & \tilde{x}_{1p} \\ \vdots & & \vdots & & \vdots \\ \tilde{x}_{i1} & \ldots & \tilde{x}_{ik} & \ldots & \tilde{x}_{ip} \\ \vdots & & \vdots & & \vdots \\ \tilde{x}_{n1} & \ldots & \tilde{x}_{nk} & \ldots & \tilde{x}_{np} \end{pmatrix}$$

where $\tilde{x}_{ik} = x_{ik} - \bar{x}_k$. In matrix form

$$\tilde{X} = X - 1_n \bar{x}' = X - \frac{1}{n} 1_n 1_n' X = \left( I_n - \frac{1}{n} 1_n 1_n' \right) X = AX$$

where $A$ is the so-called centering matrix. $A$ is square ($n \times n$), symmetric and idempotent. Each column of $\tilde{X}$ has zero sum (zero mean). The matrix $\tilde{X}$ defines a translation of the origin of the original reference system: the shape of the point cloud remains unchanged, but the origin of the axes is moved to $\bar{x}$.

4 The standardized data matrix

If one wants to eliminate the effect of different scales on the observed variables, one can resort to the standardized data matrix

$$Z = \begin{pmatrix} z_{11} & \ldots & z_{1k} & \ldots & z_{1p} \\ \vdots & & \vdots & & \vdots \\ z_{i1} & \ldots & z_{ik} & \ldots & z_{ip} \\ \vdots & & \vdots & & \vdots \\ z_{n1} & \ldots & z_{nk} & \ldots & z_{np} \end{pmatrix}$$

where

$$z_{ik} = \frac{\tilde{x}_{ik}}{\sqrt{Var(X_k)}} = \frac{x_{ik} - \bar{x}_k}{\sqrt{Var(X_k)}}.$$

If we denote by $D$ the $p \times p$ diagonal matrix having the variances of the observed variables on the main diagonal, the standardized data matrix can be defined as $Z = \tilde{X} D^{-1/2}$. Each column of $Z$ has zero mean and unit variance.
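The centering matrix and the standardized data matrix can be checked numerically; a minimal sketch on the placeholder matrix (note that NumPy's `var` divides by n by default, matching the 1/n convention of these notes):

```python
import numpy as np

X = np.array([[ 98.0, 81.0, 38.0],
              [103.0, 84.0, 39.0],
              [110.0, 88.0, 42.0],
              [125.0, 96.0, 46.0]])   # placeholder data matrix
n = X.shape[0]

A = np.eye(n) - np.ones((n, n)) / n   # centering matrix A = I_n - (1/n) 1_n 1_n'
print(np.allclose(A, A.T))            # symmetric
print(np.allclose(A @ A, A))          # idempotent

X_tilde = A @ X                       # mean-centered data matrix
print(np.allclose(X_tilde.sum(axis=0), 0))   # each column has zero sum

# Standardized data matrix Z = X_tilde D^{-1/2}
D_inv_sqrt = np.diag(1 / np.sqrt(X.var(axis=0)))
Z = X_tilde @ D_inv_sqrt
print(np.allclose(Z.mean(axis=0), 0), np.allclose(Z.var(axis=0), 1))   # True True
```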

5 The covariance matrix

The covariance between two variables $X_k$ and $X_h$ is defined as

$$Cov(X_k, X_h) = \frac{1}{n} \sum_{i=1}^{n} (x_{ik} - \bar{x}_k)(x_{ih} - \bar{x}_h) = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ik} \tilde{x}_{ih} = \frac{1}{n} \tilde{x}_k' \tilde{x}_h$$

where $\tilde{x}_k$ and $\tilde{x}_h$ are the k-th and the h-th columns of $\tilde{X}$ respectively. It is worth remembering that the covariance between a variable and itself is simply the variance of that variable. Variances and covariances can then be summarized in the so-called covariance matrix $S$:

$$S = \frac{1}{n} \tilde{X}' \tilde{X} = \frac{1}{n} \left( X - 1_n \bar{x}' \right)' \left( X - 1_n \bar{x}' \right) = \begin{pmatrix} Var(X_1) & \ldots & Cov(X_1, X_k) & \ldots & Cov(X_1, X_p) \\ \vdots & & \vdots & & \vdots \\ Cov(X_k, X_1) & \ldots & Var(X_k) & \ldots & Cov(X_k, X_p) \\ \vdots & & \vdots & & \vdots \\ Cov(X_p, X_1) & \ldots & Cov(X_p, X_k) & \ldots & Var(X_p) \end{pmatrix} = \begin{pmatrix} s_{11} & \ldots & s_{1k} & \ldots & s_{1p} \\ \vdots & & \vdots & & \vdots \\ s_{k1} & \ldots & s_{kk} & \ldots & s_{kp} \\ \vdots & & \vdots & & \vdots \\ s_{p1} & \ldots & s_{pk} & \ldots & s_{pp} \end{pmatrix}$$

where the diagonal elements are the variances and the off-diagonal elements are the covariances. The covariance matrix has many relevant properties:

- it is square ($p \times p$);
- it is symmetric;
- it is positive semi definite;
- its trace is the so-called total variance, $tr(S) = \sum_{k=1}^{p} Var(X_k)$.

In order to gain an intuition of why the covariance matrix is positive semi definite, consider the simple case where only two variables have been observed. Because of the symmetry property, their covariance matrix is

$$S = \begin{bmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{bmatrix}$$

This matrix is positive semi definite if its determinant is greater than or equal to 0:

$$\det S = s_{11} s_{22} - s_{12}^2 \geq 0.$$

After dividing both sides of the inequality by $s_{11} s_{22}$ we obtain

$$1 \geq \frac{s_{12}^2}{s_{11} s_{22}}.$$

This inequality is always true, as $\frac{s_{12}^2}{s_{11} s_{22}} = r_{12}^2$ is the squared correlation coefficient between $X_1$ and $X_2$ which, by definition, can only take values between 0 and 1, both included. If $r_{12}^2$ is equal to 1, $S$ is positive semi definite; for all values of $r_{12}^2$ other than 1, $S$ is positive definite.
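A numerical illustration of these properties on the placeholder matrix; note that `np.cov` divides by n − 1 unless `bias=True` is passed, whereas these notes divide by n:

```python
import numpy as np

X = np.array([[ 98.0, 81.0, 38.0],
              [103.0, 84.0, 39.0],
              [110.0, 88.0, 42.0],
              [125.0, 96.0, 46.0]])   # placeholder data matrix
n = X.shape[0]
X_tilde = X - X.mean(axis=0)          # mean-centered matrix

S = X_tilde.T @ X_tilde / n           # S = (1/n) X_tilde' X_tilde
print(np.allclose(S, S.T))                                 # symmetric
print(np.allclose(S, np.cov(X, rowvar=False, bias=True)))  # bias=True divides by n
print(np.isclose(np.trace(S), X.var(axis=0).sum()))        # trace = total variance
print((np.linalg.eigvalsh(S) >= -1e-9).all())              # positive semi definite
```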

6 The correlation matrix

The covariance between two standardized variables $Z_k$ and $Z_h$ is defined as

$$Cov(Z_k, Z_h) = \frac{1}{n} \sum_{i=1}^{n} (z_{ik} - \bar{z}_k)(z_{ih} - \bar{z}_h) = \frac{1}{n} \sum_{i=1}^{n} z_{ik} z_{ih} = \frac{1}{n} z_k' z_h$$

because of the zero mean property of standardized variables. If we replace $z_{ik}$ and $z_{ih}$ by their expressions as functions of the observed variables (see Section 4) we obtain

$$Cov(Z_k, Z_h) = \frac{1}{n} \sum_{i=1}^{n} z_{ik} z_{ih} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_{ik} - \bar{x}_k)(x_{ih} - \bar{x}_h)}{\sqrt{Var(X_k) Var(X_h)}} = \frac{Cov(X_k, X_h)}{\sqrt{Var(X_k) Var(X_h)}} = r_{kh}.$$

This means that the covariance between two standardized variables coincides with their correlation. The correlation of a variable with itself is equal to 1, as is the variance of a standardized variable. In matrix form we have

$$R = \frac{1}{n} Z'Z = \frac{1}{n} D^{-1/2} \tilde{X}' \tilde{X} D^{-1/2} = D^{-1/2} S D^{-1/2} = \begin{pmatrix} 1 & \ldots & r_{1k} & \ldots & r_{1p} \\ \vdots & & \vdots & & \vdots \\ r_{k1} & \ldots & 1 & \ldots & r_{kp} \\ \vdots & & \vdots & & \vdots \\ r_{p1} & \ldots & r_{pk} & \ldots & 1 \end{pmatrix}$$

$R$ is the correlation matrix; it has many relevant properties:

- it is square ($p \times p$);
- it is symmetric;
- it is positive semi definite (as it is the covariance matrix of the standardized variables);
- all its diagonal elements are equal to 1; therefore its trace is $tr(R) = p$.
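The equivalent expressions for $R$ can be checked numerically; a minimal sketch on the placeholder matrix:

```python
import numpy as np

X = np.array([[ 98.0, 81.0, 38.0],
              [103.0, 84.0, 39.0],
              [110.0, 88.0, 42.0],
              [125.0, 96.0, 46.0]])   # placeholder data matrix
n = X.shape[0]
X_tilde = X - X.mean(axis=0)
S = X_tilde.T @ X_tilde / n

D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(S)))   # D^{-1/2}
R = D_inv_sqrt @ S @ D_inv_sqrt                 # R = D^{-1/2} S D^{-1/2}

Z = X_tilde @ D_inv_sqrt                        # standardized data matrix
print(np.allclose(R, Z.T @ Z / n))              # R = (1/n) Z'Z
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
print(np.allclose(np.diag(R), 1.0), np.isclose(np.trace(R), X.shape[1]))
```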

7 Multivariate random variables and derived linear combinations

The data matrix $X$ may be thought of as describing a sequence of n empirical realizations of a p-dimensional random vector $x$. In the following we will first describe multivariate statistical methods considering random vectors (i.e. at the population level) and then derive their sample counterparts.

Let's assume we are dealing with a p-dimensional random vector $x$; we will denote by $\mu$ its p-dimensional expectation and by $\Sigma$ its $p \times p$ covariance matrix. It is worth remembering that, for mean centered random variables, $\Sigma = E(xx')$.

Most of the multivariate statistical methods we will deal with in the following are based on linear combinations of the components of a random vector. We will write such a linear combination as $y = a'x$, where $a$ is the p-dimensional vector of coefficients. Note that $y$ is a scalar random variable. In case the interest is in more than one linear combination, say m, the vectors of coefficients will be the columns of a $p \times m$ matrix $A$, and the m linear combinations will be the components of the m-dimensional random vector $y = A'x$.

Linear combinations defined by an orthogonal matrix $A$ describe an axes rotation in the multidimensional space. Again the simpler two-variable case may help in understanding why. Figure 1 presents the coordinates of a point P, both in the original reference system $X_1, X_2$ and in the new reference system $Y_1, Y_2$ obtained after rotating the system $X_1, X_2$ by an angle $\alpha$. The coordinates of the point P in the original reference system are $(x_1, x_2)$.

[Figure 1: Axes rotation. The original axes $X_1, X_2$ and the rotated axes $Y_1, Y_2$ are shown, together with the coordinates of the point P in both systems.]

The coordinates of the same point in the rotated reference system, $(y_1, y_2)$, can be obtained from the original ones as

$$y_1 = x_1 \cos\alpha + x_2 \sin\alpha$$
$$y_2 = -x_1 \sin\alpha + x_2 \cos\alpha$$

or, with a notation coherent with the one we used before, as

$$y_1 = a_1' x, \qquad y_2 = a_2' x$$

where, because of the identity $\sin^2\alpha + \cos^2\alpha = 1$, both $a_1 = (\cos\alpha, \sin\alpha)'$ and $a_2 = (-\sin\alpha, \cos\alpha)'$ are unit norm vectors. The rotated coordinates are therefore a linear combination of the original ones.
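A small NumPy sketch of such a rotation, with an arbitrary angle $\alpha = \pi/6$; it verifies that $A$ is orthogonal and that the rotation preserves distances from the origin:

```python
import numpy as np

alpha = np.pi / 6                       # arbitrary rotation angle (30 degrees)
a1 = np.array([ np.cos(alpha), np.sin(alpha)])   # coefficients of y_1
a2 = np.array([-np.sin(alpha), np.cos(alpha)])   # coefficients of y_2
A = np.column_stack([a1, a2])           # orthogonal matrix with columns a_1, a_2

print(np.allclose(A.T @ A, np.eye(2)))  # unit-norm, mutually orthogonal columns

x = np.array([2.0, 1.0])                # a point P in the original coordinates
y = A.T @ x                             # its coordinates in the rotated system
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))   # distances preserved
```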

The expected value and the variance of a single linear combination are

$$E(y) = E(a'x) = a' E(x) = a' \mu$$

and

$$V(y) = V(a'x) = a' V(x) a = a' \Sigma a.$$

The expected value and the covariance matrix of multiple linear combinations are

$$E(y) = E(A'x) = A' E(x) = A' \mu$$

and

$$V(y) = V(A'x) = A' V(x) A = A' \Sigma A.$$

Exercise 1. Given a bi-dimensional random vector $x = (x_1, x_2)'$ with expected value $\mu = (\mu_1, \mu_2)'$ and covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}$$

consider the two linear combinations $y_1 = x_1 - x_2$ and $y_2 = x_1 + x_2$ and derive their expected value and covariance (the solution will be provided in class).

Exercise 2. Consider three independent standardized variables $Z_1, Z_2, Z_3$. Assume you transform them as follows, obtaining three new variables $Y_1, Y_2, Y_3$:

$$Y_1 = Z_1, \qquad Y_2 = Y_1 + Z_2, \qquad Y_3 = 10 Z_3.$$

Derive the covariance matrix of the new $Y$ variables.
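These formulas can be verified by Monte Carlo simulation. In the sketch below, $\mu$, $\Sigma$ and the coefficient matrix $A$ are arbitrary illustrative choices (deliberately not those of the exercises), and Gaussian sampling is used only for convenience:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choices (not those of the exercises)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 3.0],
              [2.0, -1.0]])             # columns a_1, a_2 define y = A'x

E_y = A.T @ mu                          # theoretical expectation A' mu
V_y = A.T @ Sigma @ A                   # theoretical covariance matrix A' Sigma A

# Monte Carlo check: sample many x's and compare the sample moments of y = A'x
x = rng.multivariate_normal(mu, Sigma, size=500_000)
y = x @ A                               # row i holds (A' x_i)'
print(np.allclose(y.mean(axis=0), E_y, rtol=0.05))                      # True
print(np.allclose(np.cov(y, rowvar=False, bias=True), V_y, rtol=0.05))  # True
```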

Because of the properties of the covariance matrix, the variance of a linear combination is a positive semi definite quadratic form. It is interesting to study its properties as the vector $a$ varies. For this purpose let's consider again the simple case consisting of two variables only, with expectation and covariance matrix as in Exercise 1. After performing the products we obtain:

$$V(y) = a' \Sigma a = \begin{bmatrix} a_1 & a_2 \end{bmatrix} \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = a_1^2 \sigma_{11} + 2 a_1 a_2 \sigma_{12} + a_2^2 \sigma_{22}$$

If we read this variance as a function of $a_1$, we easily recognize, in the polynomial of degree 2, the equation of a parabola. The coefficient of $a_1^2$ is positive since it is a variance; the equation therefore describes a concave up parabola. The same happens if we read the variance as a function of $a_2$. This means that, as $a$ varies, $V(y)$ never reaches a finite maximum, but only a minimum.
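A quick numerical illustration with an arbitrary $2 \times 2$ covariance matrix: read as a function of $a_1$ with $a_2$ fixed, the quadratic form grows without bound and attains its minimum at the vertex $a_1 = -\sigma_{12} a_2 / \sigma_{11}$:

```python
import numpy as np

# Arbitrary illustrative covariance matrix
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def V(a):
    """Variance of y = a'x, i.e. the quadratic form a' Sigma a."""
    return a @ Sigma @ a

a2 = 1.0                                 # hold a_2 fixed, vary a_1
for a1 in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(a1, V(np.array([a1, a2])))     # grows without bound as |a_1| increases

# The only stationary point (in a_1) is the vertex, a minimum:
a1_star = -Sigma[0, 1] * a2 / Sigma[0, 0]
print(V(np.array([a1_star, a2])) <= V(np.array([a1_star + 0.5, a2])))   # True
```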
