Appendix A : rational of the spatial Principal Component Analysis

Similar documents
Community surveys through space and time: testing the space time interaction

Analysis of Multivariate Ecological Data

Community surveys through space and time: testing the space-time interaction in the absence of replication

Community surveys through space and time: testing the space-time interaction in the absence of replication

Temporal eigenfunction methods for multiscale analysis of community composition and other multivariate data

Example Linear Algebra Competency Test

Figure 43 - The three components of spatial variation

No books, no notes, no calculators. You must show work, unless the question is a true/false, yes/no, or fill-in-the-blank question.

Spatial eigenfunction modelling: recent developments

MATH 829: Introduction to Data Mining and Analysis Principal component analysis

Study Notes on Matrices & Determinants for GATE 2017

MEMGENE package for R: Tutorials

PRINCIPAL COMPONENTS ANALYSIS

22m:033 Notes: 7.1 Diagonalization of Symmetric Matrices

1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables

MEMGENE: Spatial pattern detection in genetic distance data

Bootstrapping, Randomization, 2B-PLS

5.) For each of the given sets of vectors, determine whether or not the set spans R 3. Give reasons for your answers.

Extreme Values and Positive/ Negative Definite Matrix Conditions

Multivariate Statistical Analysis

Multivariate Time Series: VAR(p) Processes and Models

Constant coefficients systems

Matrix Algebra: Summary

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Historical contingency, niche conservatism and the tendency for some taxa to be more diverse towards the poles

What are the important spatial scales in an ecosystem?

Review of Linear Algebra

Spatial genetics analyses using

MS-E2112 Multivariate Statistical Analysis (5cr) Lecture 6: Bivariate Correspondence Analysis - part II

Math 1553, Introduction to Linear Algebra

Chapter 3 Transformations

MATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors.

Principal Component Analysis (PCA) Theory, Practice, and Examples

spring, math 204 (mitchell) list of theorems 1 Linear Systems Linear Transformations Matrix Algebra

Multivariate analysis of genetic data: investigating spatial structures

Smooth Common Principal Component Analysis

Principal Components Analysis (PCA)

Multivariate analysis of genetic data: investigating spatial structures

Testing the significance of canonical axes in redundancy analysis

MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators.

Supplementary Material

Homework 2 Solutions

Mathematical foundations - linear algebra

Matrices related to linear transformations

On common eigenbases of commuting operators

Dimensionality Reduction Techniques (DRT)

Linear Algebra (MATH ) Spring 2011 Final Exam Practice Problem Solutions

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

Should the Mantel test be used in spatial analysis?

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 147 le-tex

This appendix provides a very basic introduction to linear algebra concepts.

Contents. UNIT 1 Descriptive Statistics 1. vii. Preface Summary of Goals for This Text

Estimating and controlling for spatial structure in the study of ecological communitiesgeb_

EIGENVALUES AND SINGULAR VALUE DECOMPOSITION

Basic Calculus Review

MATH 1553 SAMPLE FINAL EXAM, SPRING 2018

PCA and admixture models

Linear Algebra Practice Problems

Should the Mantel test be used in spatial analysis?

Linear Algebra. Paul Yiu. 6D: 2-planes in R 4. Department of Mathematics Florida Atlantic University. Fall 2011

Reduction to the associated homogeneous system via a particular solution

LINEAR ALGEBRA 1, 2012-I PARTIAL EXAM 3 SOLUTIONS TO PRACTICE PROBLEMS

STATISTICAL LEARNING SYSTEMS

Vector Spaces and Linear Transformations

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

Here each term has degree 2 (the sum of exponents is 2 for all summands). A quadratic form of three variables looks as

Matrix Vector Products

Regularized Discriminant Analysis and Reduced-Rank LDA

A = 3 B = A 1 1 matrix is the same as a number or scalar, 3 = [3].

A review of stability and dynamical behaviors of differential equations:

Vectors and Matrices Statistics with Vectors and Matrices

Hotelling s One- Sample T2

Linear Algebra Review. Fei-Fei Li

GEOG 4110/5100 Advanced Remote Sensing Lecture 15

Study Guide for Linear Algebra Exam 2

2. Every linear system with the same number of equations as unknowns has a unique solution.

Chapter 11 Canonical analysis

VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis

Maths for Signals and Systems Linear Algebra in Engineering

Background Mathematics (2/2) 1. David Barber

Spatial autocorrelation: robustness of measures and tests

7. Variable extraction and dimensionality reduction

Eigenvalue and Eigenvector Homework

MATH 221, Spring Homework 10 Solutions

MATH 1553, SPRING 2018 SAMPLE MIDTERM 2 (VERSION B), 1.7 THROUGH 2.9

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Random Vectors, Random Matrices, and Matrix Expected Value

Modelling directional spatial processes in ecological data

Previously Monte Carlo Integration

MATH 1553 PRACTICE FINAL EXAMINATION

14 Singular Value Decomposition

Linear Algebra Final Exam Study Guide Solutions Fall 2012

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Appendix A: Matrices

Multivariate Analysis of Ecological Data using CANOCO

Tutorials in Optimization. Richard Socher

Factor Analysis Continued. Psy 524 Ainsworth

Hands-on Matrix Algebra Using R

Machine Learning 11. week

8. Diagonalization.

Transcription:

Appendix A : rational of the spatial Principal Component Analysis In this appendix, the following notations are used : X is the n-by-p table of centred allelic frequencies, where rows are observations (on individuals or populations) and columns are alleles. L is a row-standardized connection matrix associated to the connection network among genotypes or populations. v refers to any scaled vector of p loadings, so that v 2 = 1. α refers to any vector of n row scores obtained by linear combinaison of the columns of X so that : α = Xv. C(v) is a quantity serving as a score criterion in the analysis, defined as : C(v) = var(xv)i(xv) = 1 n (Xv)T LXv = 1 n vt X T LXv. w refers to any scaled vector of p loadings provided by the spca, thus verifying w 2 = 1. ψ refers to any vector of n row scores obtained by linear combinaison of the columns of X so that : ψ = Xw. The purpose of the spatial Principal Component Analysis is to find the extrema of : which we rewrite, posing A = 1 n XT LX : C(v) = 1 n vt X T LXv = 1 n αt Lα C(v) = v T Av The solution to this problem is well-known when A is symmetric. This is, however, not the case because L is not symmetric itself. To solve this problem, we seek a symmetric matrix B so that : C(v) = v T Bv The expression v T Av is a scalar, so v T Av = (v T Av) T = v T A T v. Thus we have : v T Av = 1 2 (vt Av + v T A T v) = 1 2 (vt (A + A T )v) = v T ( 1 2 (A + AT ))v 1

where (A + A T ) is symmetric because (A + A T ) T C(v) = v T Bv with : = A T + A. As a consequence, B = 1 2 (A + AT ) = 1 2n XT (L + L T )X Hence, we can find the extrema of C(v) = v T Bv using the existing solution (Harville, 1997, p533-534). It is shown that if w 1 and w r are the eigenvectors of B associated to λ 1 and λ r, respectively the highest and lowest eigenvalues of B, then : and so : λ r = w T r Bw r v T Av w T 1 Bw 1 = λ 1 λ r = var(ψ r )I(ψ r ) C(v) var(ψ 1 )I(ψ 1 ) = λ 1 Appendix B : diagram of the multivariate tests of global and local structuring and associated computations Computations of the test statistic The testing procedure (Figure 1) is the same for both multivariate tests (global or local structuring). The matrix of allelic frequencies X (n individuals or genotypes ; p alleles) is first centred and scaled. The obtained matrix is denoted Y. The matrix E is obtained like in Griffith (1996) and Dray et al. (2006) by the eigen analysis of the connection matrix associated to the connection network between genotypes (or populations). Its columns e j (j = 1, q) are the centred and scaled Moran s eigenvector maps (MEMs), which model either global or local structures (Dray et al. 2006). For the purpose of our tests, E contains global MEMs (E = E+) for the test of global structuring and local MEMs (E = E ) for the local test. The coefficients of determinations (R 2 ) are computed after linear regression of each allele on each MEM, giving a matrix S containing p x q values of R 2 computed as : S = (YT E) (Y T E) n 2 where Y T is the transposed matrix of Y and where denotes the Hadamard product. 2

FIG. 1 Diagram of the testing procedure As MEMs are orthogonal vectors, the coefficients of determination of alleles obtained from regression onto E+ are independent from of those obtained with E. Practically, this implies that the test statistics in global and local tests are independent, in the sense that the value of the first is not conditionned by the value of the second (and reciprocally). Note that this is also true for the reference distributions of both statistics. 3

The squared correlations are then averaged for each MEM, giving a value t j for the j th allele defined by : t j = 1 p R 2 (y i, e j ) p i=1 The value of t j is large (respectively small ) when alleles are in average strongly (respectively weakly ) correlated to the j th MEM. The (n 1) values of t j are stored in the vector t, directly computed as : t = 1 p 1T p S where 1 p is the p-dimensional vector whose components are all 1. The test statistic is defined as the maximum of all components of t, max(t). Computation of the reference distribution The reference distribution is defined as the distribution of the test statistic (max(t)) under the null hypothesis H 0 that allelic frequencies of the genotypes (or populations) are distributed at random on the connection network. The alternative hypothesis H 1 depends on the type of test : global test : allelic frequencies of the genotypes (or populations) display at least one global structure local test : allelic frequencies of the genotypes (or populations) display at least one local structure In both tests, the reference distribution is approximated by a Monte Carlo procedure involving a large number of permutations (at least 999) of the rows of Y (genotypes or populations), the test statistic being computed from each permutation. The p-value is computed as the relative frequency of permuted statistics equal to or higher than the initial value of max(t). Assessment of type I error The actual type I error of both tests was assessed using simulated datasets of allelic frequencies with random spatial coordinates. We used datasets with different number of observations (25, 50, 100, 200) and different number of alleles (50, 100, 150). For each size of dataset, 200 simulations were performed and the results were pooled across simulations for each test, yielding a total of 2400 simulations per test. The actual type I error was measured as the relative frequency of reject of H 0 given different nominal α levels (Table 1). The estimated type I error was always very close to the chosen α level. 4

Nominal α level Observed type I error Observed type I error (global test) (local test) 0.1 0.0925 0.1042 0.05 0.0425 0.0512 0.01 0.0075 0.0062 TAB. 1 Estimations of actual type I errors of the global and local tests assessed through simulations, for three nominal α levels. 2400 simulations of datasets with different size (see text) were performed for each test. References Dray S, Legendre P, Peres-Neto P (2006) Spatial modelling : a comprehensive framework for principal coordinate analysis of neighbours matrices (PCNM). Ecological Modelling 196 : 483-493. Griffith D (1996) Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Canadian Geographer 40 : 351-367. Harville D (1997) Matrix algebra from a statistician s perspective. Springer, New York. 5