Multivariate analysis

1 Multivariate analysis
Prof. dr. Ann Vanreusel
- Multidimensional scaling
- SIMPER analysis
- BEST
- ANOSIM

2 [Figure: gradient in species composition along a gradient in environment (beach zonation). Panels: data table of species a to e at sites 1 to 5, similarity dendrogram of the sites, and MDS plot of the sites.]

3 Clustering or classification: some disadvantages
- Even when there is continuous structure in the data matrix, the output is discontinuous (clusters).
- Variation in communities is continuous rather than discontinuous.
- Clustering is nevertheless useful in ecology, mainly in combination with ordination, to recognize structure (communities) in large data matrices.

4 Non-metric multidimensional scaling (MDS) = ordination
- Points close together = sites similar in (species) composition
- Points far apart = sites dissimilar in (species) composition
- In MDS the original (species) composition data are replaced by a matrix of dissimilarity values between sites; this matrix is used to obtain the ordination diagram.
- The choice of dissimilarity measure specifies what "similar" means.
- A measure is needed that expresses how well or badly the distances in the ordination diagram correspond to the dissimilarity values: the stress function.
- MDS chooses the configuration that minimizes the stress.

5 Metric ordination (CA, PCA)
- Stress function depends on the actual numerical values of the dissimilarities
- Chi-square distance: CA
- Euclidean distance: PCA
Non-metric ordination (MDS)
- Stress function depends only on the rank order of the dissimilarities
- Characteristics: greater flexibility, complex algorithm, simple rationale, few if any assumptions

6 Based on ranks of similarities
Raw data → similarities → ranks → ordination
The highest similarity gets the lowest rank.
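The chain from raw data to ranks can be sketched in a few lines of Python. The counts below are invented for illustration, and `bray_curtis_similarity` is a hypothetical helper name rather than a function from any particular package. Ties between similarities are not handled in this sketch.

```python
from itertools import combinations


def bray_curtis_similarity(x, y):
    """Bray Curtis similarity (0 to 100) between two abundance vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    diff = sum(abs(a - b) for a, b in zip(x, y))
    return 100.0 * (1.0 - diff / total)


# Hypothetical counts: one row per site, one column per species (a to e)
counts = {
    "site1": [10, 5, 0, 0, 0],
    "site2": [8, 6, 1, 0, 0],
    "site3": [2, 4, 6, 3, 0],
    "site4": [0, 1, 5, 8, 2],
}

pairs = list(combinations(counts, 2))
sims = {p: bray_curtis_similarity(counts[p[0]], counts[p[1]]) for p in pairs}

# Rank 1 goes to the most similar pair of sites
ordered = sorted(pairs, key=lambda p: sims[p], reverse=True)
ranks = {p: i + 1 for i, p in enumerate(ordered)}

for p in ordered:
    print(p, round(sims[p], 1), ranks[p])
```

Only the `ranks` dictionary would be passed on to a non-metric ordination; the similarity values themselves are no longer used after this step.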

7 [Worked example: raw counts (species a to e at sites 1 to 5) are converted to a Bray Curtis similarity matrix between sites; the similarities are replaced by their ranks; the ranks give the ordination diagram (Resemblance: S17 Bray Curtis similarity, 2D).]

8 What are the stages in the construction of an MDS diagram?
Iterative procedure: the positions of the points are successively refined until they satisfy, as closely as possible, the dissimilarity relationships between samples.
I. Specify the number of dimensions (usually 2).
II. Choose a starting configuration of samples (any arrangement will do).
III. Regress the interpoint distances from this plot on the corresponding dissimilarities.

9 Shepard diagram
- Non-parametric regression = non-metric MDS (parametric regression = metric MDS)
- The best-fitting line moulds itself to the shape of the scatterplot
- The line is constrained to increase (a series of steps)

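10 IV. Measure the goodness of fit of the regression by calculating the stress value:

$$\text{Stress} = \frac{\sum_j \sum_k (d_{jk} - \hat{d}_{jk})^2}{\sum_j \sum_k d_{jk}^2}$$

where $d_{jk}$ is the distance between samples j and k in the ordination and $\hat{d}_{jk}$ is the distance predicted from the regression line. Larger scatter = larger stress.
V. Move the points to new positions, in the direction which decreases the stress most rapidly.
VI. Repeat steps III to V until no further improvement of stress can be achieved.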
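Steps III and IV can be illustrated with a small Python sketch. It assumes the monotone (step-shaped) regression is fitted with the pool-adjacent-violators algorithm, which is a standard choice but is not named on the slides. The function names and the numbers are hypothetical. The ratio is computed as written above; some formulations additionally take its square root.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: best non-decreasing fit to a sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def stress(dissimilarities, distances):
    """Sum of squared residuals around the monotone fit over sum of squared distances."""
    # Order the sample pairs by increasing dissimilarity (the Shepard diagram x-axis)
    order = sorted(range(len(distances)), key=lambda i: dissimilarities[i])
    d = [distances[i] for i in order]
    d_hat = isotonic_fit(d)
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return numerator / denominator


# Hypothetical dissimilarities and ordination distances for four sample pairs
print(stress([10, 20, 30, 40], [1.0, 2.0, 3.0, 4.0]))  # distances already monotone
print(stress([10, 20, 30, 40], [1.0, 3.0, 2.0, 4.0]))  # one pair out of order
```

A full MDS routine would wrap this calculation in steps V and VI, moving the points and recomputing the stress until it stops decreasing.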

11 The iterative procedure gradually finds its way down to a minimum of the stress function. Traps:
- Local minimum of the stress function instead of the global minimum. Repeat the MDS starting from different random positions of the samples; if the same solution re-appears, it is likely the best solution.
- Degenerate solutions, for instance if the data divide into two groups with no species in common. There is then no sensible way to determine how far apart the groups should be placed in the MDS plot (they are infinitely far apart). Run two separate analyses instead.

12 Adequacy of MDS ordination
Is the stress value small? Is a 2-dimensional plot a usable summary of the sample relationships?
- Stress < 0.05: excellent
- Stress < 0.1: good
- Stress < 0.2: potentially useful
- Stress > 0.3: points are close to arbitrarily placed in 2-dimensional space
Does the Shepard diagram appear satisfactory? The stress value totals the scatter around the regression line in a Shepard diagram.
Outliers might need a higher-dimensional representation for accurate placement.

13 Strengths
- Simple in concept
- Based on relevant sample information
- Species deletions are unnecessary
- Generally applicable
- Similarities can be given unequal weight
Weaknesses
- Computationally demanding
- Convergence to the global minimum of stress is not guaranteed
- The algorithm places most weight on the large distances

14 [Figures: MDS based on a road distance matrix; MDS based on a real distance matrix]

15 Bubble plots: distribution of species over stations
[Figures: MDS plots of the sites (Resemblance: S17 Bray Curtis similarity, 2D) with bubbles scaled to the abundance of species a, species c and species e]

16 ANOSIM (Analysis of similarities)
- Tests for statistically significant differences between groups
- Requires an a priori defined structure within the set of samples (e.g. replicates)
- A simple non-parametric permutation procedure applied to the (rank) similarity matrix
- Null hypothesis: no differences in community composition between the a priori defined groups

17 [Data table: species A to F at stations st1a, st1b, st1c, st2a, st2b, st2c, st3a, st3b, st3c. Figure: MDS plot of the nine samples labelled by site (Resemblance: S17 Bray Curtis similarity, 2D).]
Significant differences in species composition between sites?

18 Cf. ANOVA
Compute a test statistic R reflecting the observed differences between sites, contrasted with the differences among replicates within sites.
The test could be based on distances between and within sites but is better based on ranked similarities.
R is based on the difference between
- the average of the rank similarities of all pairs of replicates between sites ($\bar{r}_B$), and
- the average of the rank similarities of all pairs of replicates within sites ($\bar{r}_W$)

$$R = \frac{\bar{r}_B - \bar{r}_W}{\tfrac{1}{2} \cdot \tfrac{n(n-1)}{2}}$$

where n is the total number of samples.
R = 1 when all replicates within sites are more similar to each other than to any replicates from different sites.
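The R statistic can be computed directly from a table of pairwise dissimilarities. The sketch below is a minimal illustration, assuming that rank 1 is given to the most similar pair (the smallest dissimilarity) and that tied values share their average rank. The function name `anosim_r`, the sample names and the numbers are hypothetical.

```python
from itertools import combinations


def anosim_r(dissim, groups):
    """ANOSIM R from pairwise dissimilarities and a sample -> group mapping.

    dissim maps (sample_a, sample_b) with sample_a < sample_b to a dissimilarity.
    """
    samples = sorted(groups)
    pairs = list(combinations(samples, 2))
    values = sorted(dissim[p] for p in pairs)

    def rank_of(v):
        # Average rank for tied values; rank 1 = smallest dissimilarity
        first = values.index(v) + 1
        last = first + values.count(v) - 1
        return (first + last) / 2.0

    between = [rank_of(dissim[p]) for p in pairs if groups[p[0]] != groups[p[1]]]
    within = [rank_of(dissim[p]) for p in pairs if groups[p[0]] == groups[p[1]]]
    n = len(samples)
    m = n * (n - 1) / 2.0  # total number of sample pairs
    r_b = sum(between) / len(between)
    r_w = sum(within) / len(within)
    return (r_b - r_w) / (m / 2.0)


# Hypothetical data: two sites (a, b) with two replicates each
example_dissim = {
    ("a1", "a2"): 10, ("a1", "b1"): 60, ("a1", "b2"): 70,
    ("a2", "b1"): 65, ("a2", "b2"): 75, ("b1", "b2"): 15,
}
example_groups = {"a1": "a", "a2": "a", "b1": "b", "b2": "b"}
print(anosim_r(example_dissim, example_groups))
```

In this example every within-site pair is more similar than every between-site pair, so R reaches its maximum of 1.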

19 Rationale of the permutation test
- All possible allocations of the replicate labels to the samples are examined and the R statistic is calculated for each (when there are too many allocations, a large random subset is used).
- If the observed R statistic falls outside the range of R values obtained after permutation, H0 is rejected (H0: no site differences).
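How many allocations "all possible" amounts to depends on the design. The sketch below counts them for a balanced design, assuming that two allocations count as the same when they differ only in which group carries which label (R is unchanged by such a swap). The function names are hypothetical; the second function is a brute-force check of the formula for three sites with three replicates each.

```python
from itertools import combinations
from math import factorial


def count_allocations(n_groups, n_reps):
    """Distinct ways to split n_groups * n_reps samples into unlabelled groups."""
    n = n_groups * n_reps
    return factorial(n) // (factorial(n_reps) ** n_groups * factorial(n_groups))


def enumerate_allocations_3x3():
    """Brute-force count for 3 groups of 3 replicates (9 samples)."""
    samples = range(9)
    seen = set()
    for g1 in combinations(samples, 3):
        rest = [s for s in samples if s not in g1]
        for g2 in combinations(rest, 3):
            g3 = tuple(s for s in rest if s not in g2)
            # A frozenset ignores the order of the three groups
            seen.add(frozenset([g1, g2, g3]))
    return len(seen)


total = count_allocations(3, 3)
print(total, enumerate_allocations_3x3())  # 280 280
print(count_allocations(2, 3))             # 10, for a pairwise test of two sites
print(round(100.0 / total, 2))             # smallest attainable significance level in %
```

With only 10 allocations available for a pair of sites with three replicates each, a pairwise test cannot reach a significance level below 10%, which is one reason to read the pairwise R values rather than their p values.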

20 Global Test
Sample statistic (Global R): 0,934
Significance level of sample statistic: 0,4%
Number of permutations: 280 (All possible permutations)
Number of permuted statistics greater than or equal to Global R: 1

Pairwise Tests
Columns: Groups, R Statistic, Significance Level %, Possible Permutations, Actual Permutations, Number >= Observed
[Table values not recoverable; the number of possible permutations per pair is flagged as low.]

[Figure: MDS plot of the nine samples labelled by site (Resemblance: S17 Bray Curtis similarity, 2D)]
Significant differences in species composition between sites?
ANOSIM, H0: no site differences. P < 5% (p < 0.05) and R close or equal to 1, so H0 is rejected: the sites are different.

21 If the R statistic falls outside the range of R values obtained after permutation, H0 is rejected (H0: no site differences).
Sample statistic (Global R): 0,934
Under H0 a value of R = 0,934 is very unlikely: it occurs about 4 times in a thousand trials (p = 0,4%).
[Figure: histogram of the permuted R values; x-axis: R, y-axis: Frequency]

22 So far: the global test. To test for specific pairs of sites:
- Repeated significance tests accumulate the risk of drawing an incorrect conclusion (type I error).
- The global test is the most reliable: higher number of replicates, sufficient permutations.
- For pairwise tests, look at R rather than at p.
- R approaching 1: separation (with a low stress value this is also obvious from the MDS).
- R approaching 0: no separation.
ANOSIM is also available for a two-way layout.

23 Correlation with environmental variables: BEST analysis
Selects the environmental variables, or species, "best explaining" the community pattern, by maximising a rank correlation between their respective resemblance matrices. Two algorithms are available:
- BIOENV: all combinations of the trial variables are tried.
- BVSTEP: a stepwise search over the trial variables is tried.
Use BVSTEP if there is a large number of trial variables and BIOENV is too slow.

24 BIO-ENV: linking community analysis to environmental variables
- To what extent are physico-chemical variables related to (do they "explain") the observed biological pattern?
- One approach: superimpose single variables on top of the MDS plot.

25 MDS repeated for specific combinations of environmental variables
- Look for the best-fitting environmental combination: the best match between the two plots.
- The ranks of the two similarity matrices are compared through a (weighted) rank correlation coefficient.
- Take care with collinearity among the environmental variables.
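The search over combinations of variables can be sketched as follows. This is an illustration under stated assumptions, not the BEST routine itself: it uses the plain (unweighted) Spearman rank correlation, Euclidean distance between samples on environmental variables that are assumed to be normalised already, and an exhaustive search over all subsets. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def bioenv(bio_dissim, env, names):
    """Rank every subset of environmental variables by its match to the biota.

    bio_dissim: biotic dissimilarities for sample pairs (i < j), in the order
    produced by itertools.combinations. env: one row of variables per sample.
    """
    pairs = list(combinations(range(len(env)), 2))
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), size):
            dist = [sqrt(sum((env[i][v] - env[j][v]) ** 2 for v in subset))
                    for i, j in pairs]
            results.append((spearman(bio_dissim, dist), [names[v] for v in subset]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Hypothetical example: four samples, two variables
env = [[0.0, 1.0], [1.0, -1.0], [2.0, 1.0], [3.0, -1.0]]
bio = [20, 50, 80, 25, 55, 22]  # pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
for rho, subset in bioenv(bio, env, ["depth", "salinity"]):
    print(round(rho, 3), subset)
```

The number of subsets doubles with every added variable, which is why a stepwise search becomes attractive when many trial variables are available.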

28 SIMPER (similarity percentages)
- An MDS of the species similarity matrix often has high stress.
- Therefore concentrate on the sample similarities and highlight the species responsible for the sample groupings found in the cluster or ordination analysis.
- Compute the average dissimilarity (δ) between all inter-group pairs of samples, i.e. every sample in group 1 paired with every sample in group 2.
- Break this average down into the separate contribution of each species to δ.
A species is a good discriminator
- when it contributes much to the dissimilarity between groups 1 and 2 (its contribution to δ is large), and
- when it does so consistently across the inter-group comparisons of all samples in the 2 groups (the standard deviation of its contribution to δ is small).
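The breakdown of δ into species contributions can be sketched in Python. The example assumes Bray Curtis dissimilarity, for which the contribution of one species to one pair of samples is its absolute abundance difference divided by the total abundance of the pair, and it uses the sample standard deviation over the inter-group pairs. The function name `simper` and the abundances are hypothetical.

```python
from math import sqrt


def simper(group1, group2, species):
    """Average Bray Curtis dissimilarity between two groups, split by species."""
    per_species = {s: [] for s in species}
    for a in group1:
        for b in group2:
            total = sum(a) + sum(b)
            for k, s in enumerate(species):
                # Contribution of species s to the dissimilarity of this pair (%)
                per_species[s].append(100.0 * abs(a[k] - b[k]) / total)
    n_pairs = len(group1) * len(group2)
    rows = []
    for s in species:
        vals = per_species[s]
        mean = sum(vals) / n_pairs
        var = sum((v - mean) ** 2 for v in vals) / (n_pairs - 1) if n_pairs > 1 else 0.0
        sd = sqrt(var)
        rows.append({
            "species": s,
            "av_diss": mean,
            # High ratio = large and consistent contribution; undefined if sd is 0
            "diss_sd": mean / sd if sd > 0 else None,
        })
    overall = sum(r["av_diss"] for r in rows)
    for r in rows:
        r["contrib_pct"] = 100.0 * r["av_diss"] / overall
    rows.sort(key=lambda r: r["av_diss"], reverse=True)
    return overall, rows


# Hypothetical abundances of species A, B, C in two groups of two samples
g1 = [[10, 0, 5], [12, 0, 5]]
g2 = [[0, 10, 5], [0, 8, 5]]
overall, rows = simper(g1, g2, ["A", "B", "C"])
print(round(overall, 2))
for r in rows:
    print(r["species"], round(r["av_diss"], 2), round(r["contrib_pct"], 1))
```

Species C has the same abundance in every sample, so it contributes nothing to δ even though it is common in both groups.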

29 Species that are good discriminators between groups are indicated by *

30 E. affinis explains almost 3 % of the intra-group similarity: a typical species (not necessarily a good discriminator)

31 SIMPER output: species contributions to the average dissimilarity between groups
[Data table: species A to F at stations st1a to st3c]

Groups 1 & 2, average dissimilarity = 19,61
Species  Av.Abund (Group 1)  Av.Abund (Group 2)  Av.Diss  Diss/SD  Contrib%  Cum.%
spec A   1,33                3,67                6,87     3,33     3,4       3,4
spec B   4,                  2,                  ,82      1,69     29,69     64,73
spec D   6,                  ,                   3,67     1,22     18,7      83,44

Groups 1 & 3, average dissimilarity = 8,27
Species  Av.Abund (Group 1)  Av.Abund (Group 3)  Av.Diss  Diss/SD  Contrib%  Cum.%
spec D   6,                  21,                 24,69    6,78     3,76      3,76
spec F   ,                   12,33               2,9      4,78     2,2       ,78
Spec E   ,                   1,                  16,42    11,19    2,46      76,23

Groups 2 & 3, average dissimilarity = 83,2
Species  Av.Abund (Group 2)  Av.Abund (Group 3)  Av.Diss  Diss/SD  Contrib%  Cum.%
spec D   ,                   21,                 26,9     6,69     32,37     32,37
spec F   ,                   12,33               2,3      4,81     24,66     7,3
Spec E   ,                   1,                  16,79    11,29    2,16      77,19

32 SIMPER output: species contributions to the average similarity within groups
[Data table: species A to F at stations st1a to st3c]

Group 1, average similarity: 84,93
Species  Av.Abund  Av.Sim  Sim/SD  Contrib%  Cum.%
spec D   6,        3,34    6,6     3,72      3,72
Spec C   6,33      3,13    24,8    3,48      71,2
spec B   4,        18,78   9,      22,11     93,32

Group 2, average similarity: 87,6
Species  Av.Abund  Av.Sim  Sim/SD  Contrib%  Cum.%
Spec C   ,67       32,6    21,8    37,21     37,21
spec D   ,         26,46   14,13   3,2       67,42
spec A   3,67      2,32    9,16    23,2      9,61

Group 3, average similarity: 9,4
Species  Av.Abund  Av.Sim  Sim/SD  Contrib%  Cum.%
spec D   21,       4,47    11,24   ,         ,
spec F   12,33     23,1    7,24    2,        76,
Spec E   1,        21,6    13,21   23,9      1,


Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data? Agonistic Display in Betta splendens: Data Analysis By Joanna Weremjiwicz, Simeon Yurek, and Dana Krempels Once you have collected data with your ethogram, you are ready to analyze that data to see whether

More information

Principal component analysis

Principal component analysis Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance

More information
