Multivariate analysis
- Rafe Hunt
- 5 years ago
1 Multivariate analysis. Prof. dr. Ann Vanreusel
- Multidimensional scaling
- SIMPER analysis
- BEST
- ANOSIM
2 [Figure: beach zonation example with sites 1–5 and species a–e. Panels show (1–2) the gradient in species composition, (3–4) the gradient in environment, a similarity dendrogram and a 2D MDS plot with its stress value.]
3 Clustering or classification: some disadvantages. Even when there is continuous structure in the data matrix, the output is discontinuous (clusters), whereas variation in communities is usually continuous rather than discontinuous. Clustering is nevertheless still useful in ecology, mainly in combination with ordination, to recognise structure (communities) in large data matrices.
4 Non-metric multidimensional scaling (MDS) = ordination
- points close together = sites similar in (species) composition
- points far apart = sites dissimilar in (species) composition
In MDS the original (species) composition data are replaced by a matrix of dissimilarity values between sites, and this matrix is used to obtain the ordination diagram; the choice of dissimilarity measure specifies what "similar" means. A measure is then needed that expresses how well or how badly the distances in the ordination diagram correspond to the dissimilarity values: the stress function. MDS chooses the configuration that minimises the stress.
5 Metric ordination (CA, PCA): the stress function depends on the actual numerical values of the dissimilarities (chi-square distance in CA, Euclidean distance in PCA).
Non-metric ordination (MDS): the stress function depends only on the rank order of the dissimilarities.
Characteristics of MDS:
- better flexibility
- complex algorithm, but simple rationale
- few, if any, assumptions
6 MDS is based on the ranks of the similarities: raw data → similarities → ranks → ordination. The highest similarity gets the lowest rank (rank 1).
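A minimal Python sketch of the ranking step, assuming the similarities are held in a symmetric list-of-lists matrix; the function name `rank_similarities` and the example values are hypothetical. The highest similarity receives rank 1, and tied similarities share the average of the ranks they occupy (as with the tied ranks 3 and 4 in the worked example).

```python
from itertools import combinations


def rank_similarities(sim):
    """Rank the off-diagonal similarities of a symmetric matrix.

    The highest similarity gets rank 1; tied similarities share the
    average of the ranks they span (e.g. a tie over places 3 and 4
    gives both pairs rank 3.5).
    """
    pairs = list(combinations(range(len(sim)), 2))
    # Most similar pair first.
    order = sorted(pairs, key=lambda p: sim[p[0]][p[1]], reverse=True)
    ranks = {}
    start = 0
    while start < len(order):
        value = sim[order[start][0]][order[start][1]]
        end = start
        # Extend the block over all pairs tied with `value`.
        while end + 1 < len(order) and sim[order[end + 1][0]][order[end + 1][1]] == value:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical similarity matrix for four sites (not the slide's values).
sim = [
    [100, 75, 40, 20],
    [75, 100, 40, 20],
    [40, 40, 100, 60],
    [20, 20, 60, 100],
]
for pair, rank in sorted(rank_similarities(sim).items(), key=lambda kv: kv[1]):
    print(pair, rank)
```

Only the order of these ranks, not the similarity values themselves, is passed on to the non-metric ordination.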
7 [Figure: worked example for sites 1–5 and species a–e: raw counts, the Bray–Curtis similarity matrix between sites, the corresponding rank matrix (tied similarities share ranks, e.g. ranks 3 and 4, and 6 and 7) and the resulting 2D ordination diagram (Resemblance: S17 Bray Curtis similarity).]
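The Bray–Curtis similarity used in this example can be computed as S_jk = 100 (1 − Σ_i |y_ij − y_ik| / Σ_i (y_ij + y_ik)). A minimal sketch, assuming untransformed counts (the slides do not say whether the data were transformed); the function names and the counts are hypothetical.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (0-100) between two samples of species counts."""
    diff = sum(abs(a - b) for a, b in zip(x, y))
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        raise ValueError("both samples are empty")
    return 100.0 * (1.0 - diff / total)


def similarity_matrix(samples):
    """Full symmetric Bray-Curtis similarity matrix between samples."""
    return [[bray_curtis_similarity(a, b) for b in samples] for a in samples]


# Hypothetical counts of species a-c at three sites.
counts = {
    "site 1": [6, 0, 4],
    "site 2": [3, 0, 4],
    "site 3": [0, 5, 0],
}
matrix = similarity_matrix(list(counts.values()))
for name, row in zip(counts, matrix):
    print(name, [round(v, 2) for v in row])
```

Species b is absent from both site 1 and site 2, yet this joint absence does not raise their similarity: a species missing from both samples adds nothing to either sum. Sites with no species in common get a similarity of 0.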
8 What are the stages in the construction of an MDS diagram? It is an iterative procedure that successively refines the positions of the points until they satisfy, as closely as possible, the dissimilarity relationships between samples.
I. Specify the number of dimensions (usually 2).
II. Choose a starting configuration of the samples (arbitrary, e.g. random).
III. Regress the interpoint distances from this plot on the corresponding dissimilarities.
9 Shepard diagram: the regression is non-parametric (monotonic), which is what makes the MDS non-metric (a linear regression would give metric MDS). The best-fitting line moulds itself to the shape of the scatterplot but is constrained to increase, giving a series of steps.
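The stepwise, always-increasing line of the Shepard diagram can be fitted with the pool-adjacent-violators (PAV) algorithm, a standard monotonic least-squares regression. A minimal sketch, assuming one dissimilarity and one ordination distance per pair of samples and ignoring tied dissimilarities; it illustrates the technique and is not necessarily the routine used by the software behind these slides. The function name is hypothetical.

```python
def monotone_regression(dissim, dist):
    """Pool-adjacent-violators (PAV) fit for a Shepard diagram.

    Returns fitted values d_hat, one per pair, that never decrease as the
    dissimilarity increases and are as close as possible (least squares)
    to the ordination distances `dist`.
    """
    order = sorted(range(len(dissim)), key=lambda i: dissim[i])
    blocks = []  # each block: [sum of distances, number of points]
    for i in order:
        blocks.append([dist[i], 1])
        # Pool neighbouring blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # one step of the staircase
    d_hat = [0.0] * len(dist)
    for position, i in enumerate(order):
        d_hat[i] = fitted[position]
    return d_hat


# Hypothetical dissimilarities and ordination distances for four pairs.
print(monotone_regression([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0]))
# → [1.0, 2.5, 2.5, 4.0]
```

The second and third pairs are out of order (distance 3.0 before 2.0), so PAV pools them into one step at their mean, 2.5.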
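10 IV. Assess the goodness of fit of the regression by calculating the stress value:
$$\text{Stress} = \sqrt{\frac{\sum_j \sum_k \left(d_{jk} - \hat{d}_{jk}\right)^2}{\sum_j \sum_k d_{jk}^2}}$$
where $d_{jk}$ is the distance between samples j and k in the ordination and $\hat{d}_{jk}$ is the value predicted from the regression line. A larger scatter around the line gives a larger stress.
V. Move the points to new positions in the direction that decreases the stress most rapidly.
VI. Repeat steps III to V until no further improvement of the stress can be achieved.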
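The stress formula in step IV translates directly into code. A minimal sketch, assuming the fitted values d̂ come from a monotonic regression such as the one in the Shepard diagram; the name `kruskal_stress` and the numbers are illustrative.

```python
from math import sqrt


def kruskal_stress(dist, d_hat):
    """Stress of an ordination.

    dist:  ordination distances d_jk, one per pair of samples
    d_hat: values predicted for the same pairs by the regression
           line of the Shepard diagram
    """
    scatter = sum((d - dh) ** 2 for d, dh in zip(dist, d_hat))
    return sqrt(scatter / sum(d ** 2 for d in dist))


# Hypothetical distances and fitted values for four pairs of samples.
dist = [1.0, 3.0, 2.0, 4.0]
d_hat = [1.0, 2.5, 2.5, 4.0]
print(round(kruskal_stress(dist, d_hat), 3))  # 0.129
```

Dividing by the sum of squared distances makes the stress independent of the overall scale of the plot, so a configuration cannot lower its stress simply by shrinking.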
11 Traps: the iterative procedure gradually finds its way down to a minimum of the stress function, but
- it may reach a local minimum of the stress function instead of the global minimum. Remedy: repeat the MDS starting from different random positions of the samples; if the same (lowest-stress) solution reappears several times, it is likely to be the best solution.
- it may produce degenerate solutions, e.g. if the data divide into two groups with no species in common. There is then no way to determine how far apart the groups should be placed in the MDS plot (in principle infinitely far apart); analyse the two groups separately.
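Steps I–VI, together with the remedy of repeated random starts, can be sketched in Python. This is a compact illustration under stated assumptions. The "move the points" step uses the Guttman transform from the SMACOF algorithm, a swapped-in technique, since the slides only say that points are moved so as to decrease the stress. The sketch also runs a fixed number of iterations instead of testing for convergence and keeps tied dissimilarities in arbitrary order. All names are hypothetical.

```python
import math
import random
from itertools import combinations


def _monotone_fit(values):
    """Pool-adjacent-violators: non-decreasing least-squares fit to `values`."""
    blocks = []  # [sum, count]
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def _stress(x, pairs):
    """Stress of configuration x; `pairs` are sorted by dissimilarity."""
    d = [math.dist(x[i], x[j]) for i, j in pairs]
    d_hat = _monotone_fit(d)
    scatter = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return math.sqrt(scatter / sum(a * a for a in d))


def nmds(dissim, n_dim=2, n_starts=10, n_iter=200, seed=0):
    """Non-metric MDS repeated from several random starts; keeps the lowest stress."""
    n = len(dissim)
    pairs = sorted(combinations(range(n), 2), key=lambda p: dissim[p[0]][p[1]])
    rng = random.Random(seed)
    best_x, best_stress = None, math.inf
    for _ in range(n_starts):
        # Step II: random starting configuration.
        x = [[rng.uniform(-1.0, 1.0) for _ in range(n_dim)] for _ in range(n)]
        for _ in range(n_iter):
            d = [math.dist(x[i], x[j]) for i, j in pairs]
            # Step III: monotonic regression of distances on the dissimilarity order.
            d_hat = _monotone_fit(d)
            # Fix the scale of the fitted values so the plot cannot shrink to a point.
            scale = math.sqrt(len(pairs) / sum(v * v for v in d_hat))
            # Step V: move the points (Guttman transform, as in SMACOF).
            new_x = [[0.0] * n_dim for _ in range(n)]
            for (i, j), dij, target in zip(pairs, d, d_hat):
                if dij > 1e-12:
                    ratio = scale * target / dij
                    for k in range(n_dim):
                        step = ratio * (x[i][k] - x[j][k])
                        new_x[i][k] += step
                        new_x[j][k] -= step
            x = [[v / n for v in row] for row in new_x]
        stress = _stress(x, pairs)  # Step IV for the final configuration
        if stress < best_stress:
            best_x, best_stress = x, stress
    return best_x, best_stress


# Hypothetical dissimilarities: five samples along a single gradient.
positions = [0, 1, 3, 6, 10]
dissim = [[abs(a - b) for b in positions] for a in positions]
coords, stress = nmds(dissim)
print(round(stress, 3))
```

Keeping the lowest stress over `n_starts` random configurations is the "repeat from different random positions" remedy; comparing the retained configurations across starts would show whether the same solution keeps reappearing.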
12 Adequacy of the MDS ordination
- Is the stress value small, i.e. is a 2-dimensional plot a usable summary of the sample relationships?
  Stress < 0.05: excellent
  Stress < 0.1: good
  Stress < 0.2: potentially useful
  Stress > 0.3: points close to arbitrarily placed in the 2-dimensional space
- Does the Shepard diagram appear satisfactory? The stress value totals the scatter around the regression line in the Shepard diagram; outliers may need a higher-dimensional representation for accurate placement.
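The stress thresholds listed above can be encoded as a small helper for reporting. A minimal sketch with a hypothetical function name; the range 0.2–0.3 is left unclassified because the list does not cover it.

```python
def interpret_stress(stress):
    """Rule-of-thumb reading of 2D MDS stress, following the thresholds above."""
    if stress < 0.05:
        return "excellent"
    if stress < 0.1:
        return "good"
    if stress < 0.2:
        return "potentially useful"
    if stress > 0.3:
        return "points close to arbitrarily placed"
    # 0.2 <= stress <= 0.3 is not classified by these guidelines.
    return "not classified: inspect the Shepard diagram"


for s in (0.03, 0.08, 0.15, 0.25, 0.35):
    print(s, interpret_stress(s))
```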
13 Strengths:
- simple in concept
- based on relevant sample information
- species deletions are unnecessary
- generally applicable
- similarities can be given unequal weight
Weaknesses:
- computationally demanding
- convergence to the global minimum of stress is not guaranteed
- the algorithm places most weight on the large distances
14 [Figure: two MDS configurations, one based on a road-distance matrix and one based on a real-distance matrix]
15 Bubble plots: the distribution of species over stations, shown by superimposing the abundances of species a, c and e on the MDS plot of sites 1–5 (Resemblance: S17 Bray Curtis similarity).
16 ANOSIM (analysis of similarities) tests for statistically significant differences between groups when there is an a priori defined structure within the set of samples (e.g. replicates from different sites). It is a simple non-parametric permutation procedure applied to the (rank) similarity matrix. Null hypothesis: there are no significant differences in community composition between the a priori defined groups.
17 [Figure: abundances of species A–F in three replicates at each of three sites (st1a–st1c, st2a–st2c, st3a–st3c) and the corresponding 2D MDS plot (Resemblance: S17 Bray Curtis similarity)]
Are there significant differences in species composition between sites?
18 Compare ANOVA: compute a test statistic R reflecting the observed differences between sites, contrasted with the differences among replicates within sites. The test is based on distances between and within sites, or better, on ranked similarities. R is based on the difference between
- $\bar{r}_B$, the average rank similarity of all pairs of replicates between sites, and
- $\bar{r}_W$, the average rank similarity of all pairs of replicates within sites:
$$R = \frac{\bar{r}_B - \bar{r}_W}{M/2}, \qquad M = \frac{n(n-1)}{2}$$
where n is the total number of samples. R = 1 when all replicates within sites are more similar to each other than to any replicate from a different site.
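The R statistic can be computed directly from a similarity matrix. A minimal sketch, assuming a symmetric list-of-lists similarity matrix and one group label per sample; names and values are hypothetical. Negating the similarities before ranking gives the most similar pair rank 1, consistent with the ranking rule described earlier.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(sim, groups):
    """ANOSIM R from a similarity matrix and one group label per sample."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    # Negate similarities so the most similar pair gets rank 1.
    ranks = average_ranks([-sim[i][j] for i, j in pairs])
    r_between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    r_within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    m = n * (n - 1) / 2
    return (sum(r_between) / len(r_between) - sum(r_within) / len(r_within)) / (m / 2)


# Hypothetical similarities: two groups of two replicates.
sim = [
    [100, 90, 10, 10],
    [90, 100, 10, 10],
    [10, 10, 100, 90],
    [10, 10, 90, 100],
]
print(anosim_r(sim, ["site 1", "site 1", "site 2", "site 2"]))  # 1.0
```

Swapping the labels so that each "site" contains one replicate from each cluster gives a negative R (here −0.5), meaning that replicates are less similar within sites than between them.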
19 Rationale of the permutation test: the site labels are reallocated to the samples and the R statistic is recalculated for each allocation (either for all possible allocations or for a large number of them). If the observed R falls outside the range of Rs obtained after permutation, H0 (no differences between sites) is rejected.
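The permutation test can be sketched by repeatedly shuffling the group labels and recomputing R. A minimal sketch, assuming random shuffles (999 by default, a hypothetical choice) rather than a complete enumeration of all allocations, and counting the observed allocation as one of the permutations when computing p. The function names and similarity values are hypothetical.

```python
import random
from itertools import combinations


def anosim_r(sim, groups):
    """ANOSIM R; the most similar pair gets rank 1 and ties share ranks."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    values = [-sim[i][j] for i, j in pairs]
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks, start = [0.0] * len(values), 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    m = n * (n - 1) / 2
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim_test(sim, groups, n_perm=999, seed=1):
    """Permutation test of H0 (no group differences) by shuffling group labels."""
    r_obs = anosim_r(sim, groups)
    rng = random.Random(seed)
    labels = list(groups)
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if anosim_r(sim, labels) >= r_obs:
            n_ge += 1
    # Count the observed allocation as one of the permutations.
    p = (n_ge + 1) / (n_perm + 1)
    return r_obs, p


# Hypothetical data: three sites with three replicates each.
groups = ["st1"] * 3 + ["st2"] * 3 + ["st3"] * 3
sim = [[100 if i == j else (80 if groups[i] == groups[j] else 30)
        for j in range(9)] for i in range(9)]
r, p = anosim_test(sim, groups)
print(r, p)
```

When every shuffled R is at least as large as the observed one (for example, when all similarities are equal), p becomes 1 and H0 is retained.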
20 ANOSIM output
Global Test
Sample statistic (Global R): 0.934
Significance level of sample statistic: 0.4%
Number of permutations: 280 (All possible permutations)
Number of permuted statistics greater than or equal to Global R: 1
[Table: Pairwise Tests, with columns Groups, R Statistic, Significance Level %, Possible Permutations, Actual Permutations, Number >= Observed]
Are there significant differences in species composition between sites? ANOSIM, H0: no difference between sites. If p < 5% and R is close or equal to 1, H0 is rejected: the sites are different.
21 If the observed R falls outside the range of Rs obtained after permutation, H0 (no site differences) is rejected. Global test, sample statistic (Global R) = 0.934: an R this large is very unlikely under H0, occurring about 4 times in 1000 trials (p = 0.4%).
[Figure: frequency histogram of the permuted R values (R axis from −0.4 to 1.0) with the observed Global R marked]
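The 280 permutations and the 0.4% significance level in the output are consistent with a simple count. With three groups of three replicates there are 9!/(3!·3!·3!·3!) = 280 distinct allocations of the samples to groups: the last 3! is there because relabelling groups of equal size does not change R. If only the observed allocation reaches R = 0.934, then p = 1/280 ≈ 0.4%. A short sketch of that arithmetic, with a hypothetical function name, which also shows why a pairwise test between two groups of three can never give p below 10%.

```python
from collections import Counter
from math import factorial


def distinct_allocations(group_sizes):
    """Number of distinct ways to split the samples into groups of the given sizes.

    Groups of equal size are interchangeable, because relabelling them
    does not change R.
    """
    count = factorial(sum(group_sizes))
    for size in group_sizes:
        count //= factorial(size)
    for multiplicity in Counter(group_sizes).values():
        count //= factorial(multiplicity)
    return count


for design in ([3, 3, 3], [3, 3]):
    total = distinct_allocations(design)
    # Smallest attainable p: only the observed allocation gives R this large.
    print(design, total, f"min p = {100 / total:.1f}%")
```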
22 So far: the global test. Testing specific pairs of sites requires repeated significance tests, which accumulate the risk of drawing an incorrect conclusion (type I error). The global test is the most reliable: it uses more replicates and therefore has sufficient permutations. For pairwise tests, look at R rather than at p:
- R approaching 1: separation (with a low stress value, also obvious from the MDS plot)
- R approaching 0: no separation
ANOSIM is also available for a two-way layout.
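The accumulation of type I error risk over repeated tests can be quantified with a one-line calculation. A minimal sketch, assuming the tests are independent; pairwise tests that share groups are not independent, so treat the result only as a rough indication. The function name is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false rejection among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Three sites give three pairwise tests (1-2, 1-3, 2-3).
print(round(familywise_error(0.05, 3), 4))  # 0.1426
```

With three pairwise tests at α = 0.05, the chance of at least one spurious "significant" result is therefore roughly 14%, not 5%.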
23 Correlation with environmental variables: BEST analysis. BEST selects the environmental variables, or species, "best explaining" the community pattern by maximising a rank correlation between their respective resemblance matrices. Two algorithms are available. In the BIOENV algorithm all combinations of the trial variables are tried; in the BVSTEP algorithm a stepwise search over the trial variables is made. Use BVSTEP if there is a large number of trial variables and BIOENV is too slow.
24 BIO-ENV: linking community analysis to environmental variables. To what extent are the physico-chemical variables related to (do they "explain") the observed biological pattern? A first approach is to superimpose the values of each variable (univariates) on top of the MDS plot.
25 The ordination (MDS) is repeated for specific combinations of environmental variables to find the best-fitting environmental combination, i.e. the best match between the two plots. The ranks of the two similarity matrices are compared through a (weighted) rank correlation coefficient (take care with collinearity among the environmental variables).
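The BIOENV idea can be sketched as an exhaustive search. For every combination of environmental variables, compute Euclidean distances between samples on the normalised variables, then correlate their ranks with the ranks of the biological dissimilarities. A minimal sketch under stated assumptions: it uses the plain (unweighted) Spearman coefficient rather than the weighted version mentioned above, and it performs no collinearity check. Names and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def average_ranks(values):
    """1-based ascending ranks; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, start = [0.0] * len(values), 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var = sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var) if var > 0 else 0.0


def bioenv(bio_dissim, env):
    """Rank every subset of environmental variables by how well its Euclidean
    distances (normalised variables) match the biological dissimilarities.
    env: one list of variable values per sample."""
    n, n_vars = len(env), len(env[0])
    cols = []
    for v in range(n_vars):
        col = [row[v] for row in env]
        mu, sd = mean(col), pstdev(col) or 1.0
        cols.append([(x - mu) / sd for x in col])  # normalised variable
    pairs = list(combinations(range(n), 2))
    bio = [bio_dissim[i][j] for i, j in pairs]
    results = []
    for size in range(1, n_vars + 1):
        for subset in combinations(range(n_vars), size):
            env_d = [math.dist([cols[v][i] for v in subset], [cols[v][j] for v in subset])
                     for i, j in pairs]
            results.append((spearman(bio, env_d), subset))
    return sorted(results, reverse=True)


# Hypothetical data: five samples; variable 0 follows the biological
# gradient, variable 1 does not.
gradient = [0, 1, 3, 7, 15]
bio_dissim = [[abs(a - b) for b in gradient] for a in gradient]
env = [[g, x] for g, x in zip(gradient, [5, 3, 4, 1, 2])]
for rho, subset in bioenv(bio_dissim, env):
    print(subset, round(rho, 3))
```

The number of subsets doubles with every extra variable, which is why a stepwise search such as BVSTEP is preferred when there are many trial variables.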
28 SIMPER (similarity percentages). An MDS of the species similarity matrix often has high stress, so SIMPER instead concentrates on the sample similarities and highlights the species responsible for the sample groupings found in cluster or ordination analysis.
- Compute the average dissimilarity (δ) between all pairs of intergroup samples, i.e. every sample in group 1 paired with every sample in group 2.
- Break this average down into the contributions of each species i to δ.
A species is a good discriminator when it contributes much to the dissimilarity between groups 1 and 2 (its average contribution δ_i is large) and does so consistently across all intergroup comparisons of samples (the standard deviation of δ_i is small, so the ratio δ_i/SD is large).
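The SIMPER breakdown can be sketched for Bray–Curtis dissimilarity, where each pair's dissimilarity splits exactly into per-species terms 100·|y_ij − y_ik| / Σ_i (y_ij + y_ik). A minimal sketch, assuming untransformed abundances and output columns modelled on the SIMPER tables in this section (Av.Diss, Diss/SD, Contrib%, Cum.%); the function name and data are hypothetical.

```python
import math
from itertools import product
from statistics import mean, stdev


def simper_between(group1, group2, species):
    """Break the average Bray-Curtis dissimilarity between two groups
    down into species contributions.

    Returns (average dissimilarity, rows) where each row is
    (species, Av.Diss, Diss/SD, Contrib%, Cum.%), sorted by Av.Diss.
    """
    per_pair = []
    for a, b in product(group1, group2):  # every sample in group 1 with every sample in group 2
        total = sum(x + y for x, y in zip(a, b))
        # Per-species terms that sum to the pair's Bray-Curtis dissimilarity.
        per_pair.append([100.0 * abs(x - y) / total for x, y in zip(a, b)])
    delta = mean([sum(c) for c in per_pair])
    rows = []
    for i, name in enumerate(species):
        contrib = [c[i] for c in per_pair]
        av = mean(contrib)
        sd = stdev(contrib) if len(contrib) > 1 else 0.0
        ratio = av / sd if sd > 0 else (math.inf if av > 0 else 0.0)
        rows.append((name, av, ratio, 100.0 * av / delta))
    rows.sort(key=lambda r: r[1], reverse=True)
    cumulative, table = 0.0, []
    for name, av, ratio, pct in rows:
        cumulative += pct
        table.append((name, av, ratio, pct, cumulative))
    return delta, table


# Hypothetical abundances of spec A and spec B.
group1 = [[6, 2], [4, 2]]
group2 = [[2, 2]]
delta, table = simper_between(group1, group2, ["spec A", "spec B"])
print(round(delta, 2))
for row in table:
    print(row[0], [round(v, 2) for v in row[1:]])
```

Here spec B has the same abundance in every sample, so it contributes nothing; all of the between-group dissimilarity comes from spec A.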
29 Species that are good discriminators between groups are indicated by *
30 E. affinis explains almost 3 %. Intragroup similarity: typical species (not necessarily good discriminators).
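The within-group part of SIMPER (typical species) applies the same decomposition to similarity. For Bray–Curtis, species i contributes 200·min(y_ij, y_ik) / Σ_i (y_ij + y_ik) to the similarity of samples j and k. A minimal sketch, assuming untransformed abundances; the function name and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def simper_within(group, species):
    """Average Bray-Curtis similarity within one group, split by species.

    Returns (average similarity, rows) with rows of
    (species, Av.Sim, Sim/SD, Contrib%) sorted by Av.Sim.
    """
    per_pair = []
    for a, b in combinations(group, 2):
        total = sum(x + y for x, y in zip(a, b))
        # Per-species terms that sum to the pair's Bray-Curtis similarity.
        per_pair.append([200.0 * min(x, y) / total for x, y in zip(a, b)])
    s_bar = mean([sum(c) for c in per_pair])
    rows = []
    for i, name in enumerate(species):
        contrib = [c[i] for c in per_pair]
        av = mean(contrib)
        sd = stdev(contrib) if len(contrib) > 1 else 0.0
        ratio = av / sd if sd > 0 else (math.inf if av > 0 else 0.0)
        rows.append((name, av, ratio, 100.0 * av / s_bar))
    rows.sort(key=lambda r: r[1], reverse=True)
    return s_bar, rows


# Hypothetical abundances of spec A and spec B in three replicates.
group = [[6, 2], [4, 2], [5, 2]]
s_bar, rows = simper_within(group, ["spec A", "spec B"])
print(round(s_bar, 2))
for row in rows:
    print(row[0], [round(v, 2) for v in row[1:]])
```

A species with a high Sim/SD ratio contributes to within-group similarity consistently across all pairs of replicates, which is what makes it "typical" of the group, whether or not it also discriminates between groups.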
31 [Table: SIMPER dissimilarity breakdown for the example data (species A–F in st1a–st3c), with columns Species, Av.Abund (each group), Av.Diss, Diss/SD, Contrib%, Cum.%]
- Groups 1 & 2: average dissimilarity = 19.61; main contributors spec A, spec B, spec D (cumulative 83.44%)
- Groups 1 & 3: average dissimilarity = 80.27; main contributors spec D, spec F, Spec E (cumulative 76.23%)
- Groups 2 & 3: average dissimilarity = 83.25; main contributors spec D, spec F, Spec E (cumulative 77.19%)
32 [Table: SIMPER similarity breakdown within each group for the same data, with columns Species, Av.Abund, Av.Sim, Sim/SD, Contrib%, Cum.%]
- Group 1: average similarity = 84.93; main contributors spec D, Spec C, spec B (cumulative 93.32%)
- Group 2: average similarity = 87.60; main contributors Spec C, spec D, spec A (cumulative 90.61%)
- Group 3: average similarity = 90.04; main contributors spec D, spec F, Spec E (cumulative 100%)
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationSources of randomness
Random Number Generator Chapter 7 In simulations, we generate random values for variables with a specified distribution Ex., model service times using the exponential distribution Generation of random
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationLecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis
Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially
More informationCorrespondence Analysis & Related Methods
Correspondence Analysis & Related Methods Michael Greenacre SESSION 9: CA applied to rankings, preferences & paired comparisons Correspondence analysis (CA) can also be applied to other types of data:
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationESP 178 Applied Research Methods. 2/23: Quantitative Analysis
ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and
More informationMULTIVARIATE ANALYSIS OF VARIANCE
MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More information1km. the estuary (J). The spatial extent of a site (50m x25m) is approximately as tall as each letter and twice as wide
1km Fig. 5. Map of sampling sites within the Orewa estuary. Sites are labeled alphabetically and sequentially from the estuary mouth (A) to the inner reaches of the estuary (J). The spatial extent of a
More informationLecture: Mixture Models for Microbiome data
Lecture: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data Outline: - - Sequencing thought experiment Mixture Models (tangent) - (esp. Negative Binomial) - Differential abundance
More informationLINGUIST 716 Week 9: Compuational methods for finding dimensions
LINGUIST 716 Week 9: Compuational methods for finding dimensions Kristine Yu Department of Linguistics, UMass Amherst November 1, 2013 Computational methods for finding dimensions 716 Fall 2013 Week 9
More informationCorrespondence Analysis & Related Methods
Corresponence Analysis & Relate Methos Michael Greenacre SESSION 3: MUIDIMENSIONA SCAING (MDS DIMENSION REDUCION CASSICA MDS NONMERIC MDS Distances an issimilarities... n objects = istance between object
More informationDisadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means
Stat 529 (Winter 2011) Analysis of Variance (ANOVA) Reading: Sections 5.1 5.3. Introduction and notation Birthweight example Disadvantages of using many pooled t procedures The analysis of variance procedure
More informationLinear Regression Models
Linear Regression Models Model Description and Model Parameters Modelling is a central theme in these notes. The idea is to develop and continuously improve a library of predictive models for hazards,
More informationCorrelation and Regression (Excel 2007)
Correlation and Regression (Excel 2007) (See Also Scatterplots, Regression Lines, and Time Series Charts With Excel 2007 for instructions on making a scatterplot of the data and an alternate method of
More informationSimulating Uniform- and Triangular- Based Double Power Method Distributions
Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions
More informationFreeman (2005) - Graphic Techniques for Exploring Social Network Data
Freeman (2005) - Graphic Techniques for Exploring Social Network Data The analysis of social network data has two main goals: 1. Identify cohesive groups 2. Identify social positions Moreno (1932) was
More informationPrentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)
National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the
More informationLecture 3: Mixture Models for Microbiome data. Lecture 3: Mixture Models for Microbiome data
Lecture 3: Mixture Models for Microbiome data 1 Lecture 3: Mixture Models for Microbiome data Outline: - Mixture Models (Negative Binomial) - DESeq2 / Don t Rarefy. Ever. 2 Hypothesis Tests - reminder
More information