Fitted vectors. Ordination and environment. Alternatives to vectors. Example: River bryophytes


Vegetation Analysis: Ordination and environment

Ordination and environment

We take for granted that vegetation is controlled by environment, so:
1. Two sites close to each other in ordination have similar vegetation, and
2. If two sites have similar vegetation, they have similar environment; moreover
3. Two sites far away from each other in ordination have dissimilar vegetation, and perhaps
4. If two sites have different vegetation, they have different environment.

Fitted vectors

The direction of a fitted vector shows the gradient, and its length shows the importance of the gradient. For every arrow there is an equally long arrow in the opposite direction: the decreasing direction of the gradient. A fitted vector implies a linear model: project sample plots onto the vector to get their expected values. Class values are shown as weighted averages.

[Figure: ordination plot with fitted vectors for Fe, pH, Mo, N, S, Zn, Ca, Mg, Humdepth and Mn; the horizontal axis is Dimension 1.]

Alternatives to vectors

Fitted vectors are natural in constrained ordination, since these methods have linear constraints. Distant sites are different, but they may be different in various ways: environmental variables may have a non-linear relation to the ordination.

[Figure: three panels for Ca plotted over the ordination: linear fit, observed values, and GAM.]

Example: River bryophytes

[Figure: panels of environmental variables against the MDS ordination (Partsize, Currvel, slope, Depth, width, Instab, pH, Conduc), each labelled with its correlation r.]
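The linear model behind a fitted vector can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in any ordination package: the function name `fit_vector` and the example data are hypothetical, the ordination is assumed to have two axes, and the multiple correlation r is used as the measure of importance.

```python
import math

def fit_vector(scores, values):
    """Least-squares fit of one environmental variable on two ordination axes.

    scores: list of (x1, x2) site scores; values: one environmental value per site.
    Returns (direction, r): the unit-length direction of increase and the
    multiple correlation between fitted and observed values.
    """
    n = len(values)
    m1 = sum(s[0] for s in scores) / n
    m2 = sum(s[1] for s in scores) / n
    mz = sum(values) / n
    # Centre everything so that the regression needs no intercept.
    x1 = [s[0] - m1 for s in scores]
    x2 = [s[1] - m2 for s in scores]
    z = [v - mz for v in values]
    # Normal equations for z ~ b1 * x1 + b2 * x2.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1z = sum(a * c for a, c in zip(x1, z))
    s2z = sum(b * c for b, c in zip(x2, z))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1z - s12 * s2z) / det
    b2 = (s11 * s2z - s12 * s1z) / det
    # Expected values are projections of the sites onto the fitted direction.
    fitted = [b1 * a + b2 * b for a, b in zip(x1, x2)]
    r = math.sqrt(sum(f * f for f in fitted) / sum(c * c for c in z))
    norm = math.hypot(b1, b2)
    return (b1 / norm, b2 / norm), r

# Hypothetical site scores and an environmental variable that is exactly linear.
example_scores = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
example_values = [3 + 2 * x - 2 * y for x, y in example_scores]
print(fit_vector(example_scores, example_values))
```

A variable with a non-linear relation to the ordination would give a low r here even if it were strongly related to the site configuration, which is the motivation for the alternatives to vectors.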

Lessons from environmental interpretation

Environmental variables need not be parallel to ordination axes. Axes cannot be taken as gradients; gradients are oblique to axes. You cannot tear off an axis from an ordination. Never calculate a correlation between an axis and an environmental variable. Environmental variables need not be linearly correlated with the ordination, but locations in ordination can be exceptional.

Constrained vs. unconstrained aims

Unconstrained ordination tries to display the variation in the data. Constrained ordination tries to display only the variation that can be explained with the constraining variables. You can observe only things that you have measured.

The constraining toolbox

Linear tools based on the PCA framework: Discriminant Analysis, Canonical Correlations, Redundancy Analysis (RDA). Of these, only RDA is useful in community ecology, and only if a linear model is adequate.

Unimodal tools based on the CA framework: Constrained or Canonical Correspondence Analysis (CCA). This is absolutely the most important constrained ordination in ecology, and the only one dealt with in these lectures.

Constrained Correspondence Analysis (CCA)

Ordinary Correspondence Analysis gives:
1. Site scores, which may be regarded as describing the gradients.
2. Species scores, which may be taken as the locations of species optima in the space spanned by the site scores.

Constrained or Canonical Correspondence Analysis gives in addition:
3. Environmental scores, which define the gradient space.

It also optimizes the interpretability of the results.

CCA: Algorithm

1. Fit a weighted linear regression to all species individually, using all constraints as explanatory variables.
2. Analyse the fitted values using CA.

CCA: Alternating regression algorithm

CA:
1. Start with arbitrary site scores.
2. Species scores as weighted averages of site scores.
3. Site scores as weighted averages of species scores.
4. Orthogonalize site scores.
5. Any change in scores? If so, repeat from step 2.

DCA:
1. Start with arbitrary site scores.
2. Species scores as weighted averages of site scores.
3. Site scores as weighted averages of species scores.
4. Detrend site scores.
5. Any change in scores? If so, repeat from step 2.

CCA:
1. Start with arbitrary LC scores.
2. Species scores as weighted averages of LC scores.
3. WA scores as weighted averages of species scores.
4. LC site scores as predicted values of a linear regression.
5. Any change in scores? If so, repeat from step 2.

There are two kinds of site scores:
1. LC scores are the predicted values of a multiple regression with the constraining variables, i.e. the constraints.
2. WA scores are weighted averages of species scores.

Those numbers...

Eigenvalues are exactly like in CA. The CCA eigenvalue should be lower than in CA, or constraining may have been useless. The eigenvalue has nothing to do with variance, so there is no "variance explained" either.

Species-environment correlation: the multiple correlation from the constraining regression. It is usually high even with poor models.

Pointwise goodness of fit can be expressed either as the residual distance from the ordination space or as the proportion of the projection from the total Chi-squared distance, exactly like in CA.

CCA plot

Like a CA plot, but now a triplot: vectors for linear constraints, classes as weighted averages. Most use LC scores: these are the constraints. It is popular to scale species relative to eigenvalues, but keep sites unscaled. Sites do not display the configuration, but their projections onto the environmental vectors are the estimated values.

[Figure: CCA plot with abbreviated species labels.]
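The CA column of the alternating algorithm above (weighted averaging in both directions until the scores stop changing) can be sketched for the first axis as follows. This is a minimal illustration, not the code of any ordination program: the function name `reciprocal_averaging`, the tolerance, and the example table are assumptions, and the orthogonalization step here only removes the trivial constant axis by weighted centring.

```python
def reciprocal_averaging(table, tol=1e-10, max_iter=1000):
    """First CA axis of a sites-by-species table by reciprocal averaging.

    Returns (site_scores, species_scores, eigenvalue).
    """
    n = len(table)
    m = len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    site = [float(i) for i in range(n)]  # arbitrary starting site scores
    eig = 0.0
    for _ in range(max_iter):
        # Species scores as weighted averages of site scores.
        species = [sum(table[i][j] * site[i] for i in range(n)) / col_tot[j]
                   for j in range(m)]
        # Site scores as weighted averages of species scores.
        new = [sum(table[i][j] * species[j] for j in range(m)) / row_tot[i]
               for i in range(n)]
        # Orthogonalize against the trivial constant solution (weighted centring).
        mean = sum(w * s for w, s in zip(row_tot, new)) / total
        new = [s - mean for s in new]
        # The shrinkage of the scores in one cycle estimates the eigenvalue.
        eig = (sum(w * s * s for w, s in zip(row_tot, new)) / total) ** 0.5
        new = [s / eig for s in new]
        change = max(abs(a - b) for a, b in zip(new, site))
        site = new
        if change < tol:  # any change in scores?
            break
    return site, species, eig

# Hypothetical abundance table: three sites (rows) by three species (columns).
example_table = [[10, 5, 0],
                 [5, 10, 5],
                 [0, 5, 10]]
sites, species, eigenvalue = reciprocal_averaging(example_table)
print(sites, species, eigenvalue)
```

In the CCA column, the only change would be an extra regression step that replaces the site scores with their predicted values from the constraints (the LC scores) before the next cycle.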

Class constraints

Class variables are usually entered as dummy variates: make m - 1 indicator variables out of m levels. Indicator scoring: 1 if the site belongs to the class, 0 otherwise. There is one dummy less than there are levels, because the full set would be redundant. Ordered factors may be better expressed with polynomial constraints.

[Figure: ordination of sites with polynomial Moisture constraints Moisture.L, Moisture.Q and Moisture.C.]

Predicted values of constraints

Project a site point onto an environmental arrow to get the prediction. This is exact with two constraints; a multidimensional space is warped.

LC or WA Scores?

Mike Palmer: Use LC scores, because they give the best fit with the environment, and WA scores are a step from CCA towards CA.

Bruce McCune: LC scores are excellent if you have no error in the constraining variables. Even with small error, LC scores become miserable, but WA scores are good even with noisy data.

[Figure: four panels comparing LC scores and WA scores with error-free constraints and with noisy constraints (normal error).]

WA and LC scores with class constraints

With class constraints, CCA tries to:
1. Make class centroids as distinct as possible.
2. Make clouds about centroids as compact as possible.

Success is measured by λ. LC scores are the class centroids: the expected locations. If λ is high, WA scores are close to LC scores. With several class variables, or together with continuous variables, the simple structure becomes blurred.

[Figure: LC scores and WA scores of sites; dune meadows, Landuse constraints.]
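The dummy coding described under class constraints (m - 1 indicator variables out of m levels) can be sketched as below. The function name `dummy_code`, the choice of the alphabetically first level as the dropped reference level, and the land-use labels are all hypothetical choices made for illustration.

```python
def dummy_code(classes):
    """Turn a class variable with m levels into m - 1 indicator variables.

    The alphabetically first level is dropped as the reference level
    (an arbitrary choice for this sketch).
    Returns (kept_levels, rows), one row of 0/1 indicators per site.
    """
    levels = sorted(set(classes))
    kept = levels[1:]
    rows = [[1 if c == level else 0 for level in kept] for c in classes]
    return kept, rows

# Hypothetical land-use classes for five sites.
landuse = ["hayfield", "pasture", "both", "pasture", "hayfield"]
kept_levels, indicators = dummy_code(landuse)
print(kept_levels)
for row in indicators:
    print(row)
```

A site in the reference level gets zeros in every column, which is why the dropped indicator carries no extra information.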

LC scores are the constraints

LC scores are the scaled and weighted constraints, and shuffling the order of sites in the community data does not change the configuration.

Number of constraints and the plot

[Figure: two CCA plots with abbreviated species labels and the constraints Ca, S, Mg, Zn, Humdepth, Mn, pH, Mo, Fe and N; one panel is titled "Shuffled sites, LC scores".]

Number of constraints and curvature

Curvature is cured because the solution is forced to linear constraints. A high number of constraints means no constraint. Absolute limit: number of constraints = min(s, N) - 1, but release from the constraints can begin much earlier. Reduce the environmental variables so that only the important ones remain: heuristic value is better than statistics. This reduces multicollinearity as well.

[Figure: eigenvalue against axis for the unconstrained solution and for solutions with different numbers of constraints.]

DECORANA in Disguise

Constrained Correspondence Analysis replaced Decorana as the canonical method, and indeed it is Decorana in disguise:
Detrending: based on fitted values from linear regression.
Rescaling: linear combinations of environmental variables are scaled similarly.
Downweighting: rare species fit poorly.

Polynomial Constraints: A Bad Idea

Unconstrained CA produces curves, because species have non-linear responses to gradients. Constrained CA straightens up curves, because it forces linear species responses. Polynomial constraints produce quadratic fitted values: the ordination will be quadratic.

[Figure: four panels: Truth (x and y in sd units); CA; CCA with linear constraints x and y; CCA with polynomial constraints.]

Constrained horseshoe

The curve is removed in CCA because the solution is forced to linear constraints. If constraints have a quadratic relation to each other, a curve may re-appear. Polynomial constraints and interactions are generally a bad idea!

[Figure: ordination of sites with polynomial Moisture constraints Moisture.L, Moisture.Q and Moisture.C.]

Levels of environmental intervention

[Diagram: background variables (covariates) constrain a partial CCA; environmental variables (constraints) constrain a CCA of the residual; environmental variates (correlates) are correlated with a CA or DCA of the residual.]

Partial CCA removes the effect of background variables, such as random or nuisance variables, before the proper (C)CA. Residual ordinations may be analysed at all levels: partitioning of variation. Constraints are linear: if levels of environmental variables are not orthogonal, this may result in negative components of variation. Information of lower levels is mixed with upper levels.

Significance of constraints

CCA maximizes the eigenvalue with constraints. Permutation tests can be used to assess significance:
1. Permute the lines of the environmental data.
2. Repeat CCA with the permuted data.
3. If the observed λ is higher than in (most) permutations, it is regarded as significant.

Many constraints mean much opportunity for optimizing: significance is usually lower.

[Figure: permutation test with constraints; eigenvalue against dimension for the observed data and the permutations.]
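The permutation procedure for the significance of constraints can be sketched generically. This is an illustration under loud assumptions: the names `permutation_p_value` and `r_squared` are hypothetical, a simple squared correlation stands in for the CCA eigenvalue so that the block stays self-contained, and the p-value convention (counting the observed value among the permutations) is one common choice rather than something stated on the slide.

```python
import random

def permutation_p_value(statistic, response, constraints, n_perm=199, seed=0):
    """Permute the lines of the environmental data and recompute the statistic.

    statistic(response, constraints) must return a number that is large
    when the constraints explain the response well.
    """
    rng = random.Random(seed)
    observed = statistic(response, constraints)
    hits = 0
    for _ in range(n_perm):
        shuffled = constraints[:]   # copy, then permute the site order
        rng.shuffle(shuffled)
        if statistic(response, shuffled) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return (hits + 1) / (n_perm + 1)

def r_squared(y, x):
    """Stand-in statistic (NOT a CCA eigenvalue): squared correlation."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - my) * (b - mx) for a, b in zip(y, x))
    sxx = sum((b - mx) ** 2 for b in x)
    syy = sum((a - my) ** 2 for a in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical data: the response follows the constraint exactly.
env = [float(v) for v in range(12)]
resp = [2 * v + 1 for v in env]
print(permutation_p_value(r_squared, resp, env))
```

In a real analysis the statistic would be the constrained eigenvalue (or a pseudo-F) recomputed by a full CCA on each permuted data set.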

Permutation statistic

Without constraints, the sum of all eigenvalues is a natural choice. Testing of the first eigenvalue has an unclear meaning. In partial models, use pseudo-F:

$$F_{p,q} = \frac{\left(\sum_{i=1}^{p} \lambda_i^{(c)}\right) / p}{\left(\sum_{i=1}^{q} \lambda_i^{(r)}\right) / q}$$

with constrained (c) and residual (r) eigenvalues λ and the respective numbers of axes p and q. It is not at all distributed like a real F, but it is used in permutation tests.

Number of permutations

Too few permutations: cannot detect a significant response when it is close to a critical limit. Too many permutations waste time. Sequential testing: permute so many times that the assessed significance is certainly outside the grey zone where it could be either significant or nonsignificant.

[Figure: sequential testing against the number of permutations.]

What is permuted?

No conditioning variables: community data or constraints can be permuted.

In partial models with conditioning variables: community data cannot be permuted, because it is dependent on the conditions. Constraints cannot be permuted, because they correlate with the conditions. Residuals are exchangeable if they are independent and identically distributed. The reduced model permutes residuals after conditions; the full model permutes residuals after conditions and constraints.

Selecting constraining variables

A small number of variables means stricter constraints, reduced curvature, improved interpretation and increased significance: try to end up with one or two constraints for each independent factor. Significance tests and visual inspection help in selecting environmental variables. Automated selection is dangerous: small changes in the data set can change the whole selection history, and omission of a variable does not mean it is unimportant. The final selection must be made with heuristic criteria.

The purpose of computation is insight, not numbers.
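The pseudo-F statistic from the permutation statistic slide is simple arithmetic on the two sets of eigenvalues. The sketch below assumes the eigenvalues are already available as lists; the function name `pseudo_f` and the numbers are hypothetical.

```python
def pseudo_f(constrained, residual):
    """Pseudo-F: mean constrained eigenvalue over mean residual eigenvalue.

    constrained: eigenvalues of the p constrained axes.
    residual: eigenvalues of the q residual axes.
    """
    p = len(constrained)
    q = len(residual)
    return (sum(constrained) / p) / (sum(residual) / q)

# Hypothetical eigenvalues: two constrained axes and three residual axes.
print(pseudo_f([0.4, 0.2], [0.3, 0.2, 0.1]))
```

Because the statistic does not follow a real F distribution, its value is only meaningful when compared with the values obtained from permuted data.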

Automatic stepping is dangerous

Automatic model selection may give different results depending on stepping direction, scope or small changes in the data set!

[Figure: models selected by forward stepping, forward stepping with interactions, and backward stepping.]

Components of Variation

Take a partial model CCA(Y ~ X | Z). The explained inertia can be decomposed into two components:
1. Explained by X in a simple model CCA(Y ~ X).
2. The residual effect of X after removing the variation caused by the conditioning variable Z.

After conditioning by Z, the eigenvalue of X decreases by the amount of the shared component of variation.

Negative Components of Variation

CCA(Y ~ X | Z) is equal to CCA(Res(CCA(Y ~ Z)) ~ X | Z). If variables are better predictors together than in isolation, then λ_XZ > λ_X + λ_Z. Constraints allow the reappearance of the curve.

[Figure: ordination with constraints Ca and pH; red line: Ca; light blue levels: pH.]
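The arithmetic behind components of variation, including the negative case, can be sketched as below. The decomposition assumes that the total explained by X and Z together is split into a part unique to X, a part unique to Z and a shared part; the function name `partition` and the eigenvalue totals are hypothetical.

```python
def partition(lam_x, lam_z, lam_xz):
    """Split explained inertia into unique and shared components.

    lam_x: explained by X alone, lam_z: explained by Z alone,
    lam_xz: explained by X and Z together.
    """
    shared = lam_x + lam_z - lam_xz
    return {
        "X after Z": lam_xz - lam_z,   # what is left for X after conditioning by Z
        "Z after X": lam_xz - lam_x,
        "shared": shared,              # negative when lam_xz > lam_x + lam_z
    }

# Hypothetical totals where X and Z predict better together than in isolation.
print(partition(0.30, 0.20, 0.60))
```

With these numbers the shared component is negative, which is the situation described on the slide: conditioning by Z increases, rather than decreases, the eigenvalue of X.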

Multivariate Analysis of Ecological Communities in R

Multivariate Analysis of Ecological Communities in R Multivariate Analysis of Ecological Communities in R Jari Oksanen March 16, 2005 Abstract This tutorial demostrates the use of basic ordination methods in R package vegan. The tutorial assumes basic familiarity

More information

Multivariate Analysis of Ecological Communities in R: vegan tutorial

Multivariate Analysis of Ecological Communities in R: vegan tutorial Multivariate Analysis of Ecological Communities in R: vegan tutorial Jari Oksanen June 9, 008 Abstract This tutorial demostrates the use of ordination methods in R package vegan. The tutorial assumes familiarity

More information

Multivariate Analysis of Ecological Communities in R: vegan tutorial

Multivariate Analysis of Ecological Communities in R: vegan tutorial Multivariate Analysis of Ecological Communities in R: vegan tutorial Jari Oksanen August 11, 011 Abstract This tutorial demostrates the use of ordination methods in R package vegan. The tutorial assumes

More information

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the

-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation

More information

4/2/2018. Canonical Analyses Analysis aimed at identifying the relationship between two multivariate datasets. Cannonical Correlation.

4/2/2018. Canonical Analyses Analysis aimed at identifying the relationship between two multivariate datasets. Cannonical Correlation. GAL50.44 0 7 becki 2 0 chatamensis 0 darwini 0 ephyppium 0 guntheri 3 0 hoodensis 0 microphyles 0 porteri 2 0 vandenburghi 0 vicina 4 0 Multiple Response Variables? Univariate Statistics Questions Individual

More information

Chapter 11 Canonical analysis

Chapter 11 Canonical analysis Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Multivariate Analysis of Ecological Data using CANOCO

Multivariate Analysis of Ecological Data using CANOCO Multivariate Analysis of Ecological Data using CANOCO JAN LEPS University of South Bohemia, and Czech Academy of Sciences, Czech Republic Universitats- uric! Lanttesbibiiothek Darmstadt Bibliothek Biologie

More information

Introduction to ordination. Gary Bradfield Botany Dept.

Introduction to ordination. Gary Bradfield Botany Dept. Introduction to ordination Gary Bradfield Botany Dept. Ordination there appears to be no word in English which one can use as an antonym to classification ; I would like to propose the term ordination.

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA David Zelený & Ching-Feng Li INTRODUCTION TO MULTIVARIATE ANALYSIS Ecologial similarity similarity and distance indices Gradient analysis regression,

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Multivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques

Multivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques Multivariate Statistics Summary and Comparison of Techniques P The key to multivariate statistics is understanding conceptually the relationship among techniques with regards to: < The kinds of problems

More information

Why ordination? Major ordination methods. Species space

Why ordination? Major ordination methods. Species space Vegetaton Analyss Major ordnaton methods Slde 67 Vegetaton Analyss Major ordnaton methods Slde 68 Why ordnaton? Major ordnaton methods Prncpal Components Analyss (PCA) Factor Analyss (FA) Prncpal Co-ordnates

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Gradient types. Gradient Analysis. Gradient Gradient. Community Community. Gradients and landscape. Species responses

Gradient types. Gradient Analysis. Gradient Gradient. Community Community. Gradients and landscape. Species responses Vegetation Analysis Gradient Analysis Slide 18 Vegetation Analysis Gradient Analysis Slide 19 Gradient Analysis Relation of species and environmental variables or gradients. Gradient Gradient Individualistic

More information

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis

Multivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Copy of slides and exercises PAST software download

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

Indirect Gradient Analysis

Indirect Gradient Analysis Indirect Gradient Analysis Gavin Simpson May 12, 2006 Summary This practical will use the PONDS dataset to demonstrate methods of indirect gradient analysis (PCA, CA, and DCA) of species and environmental

More information

4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata)

4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata) 0 Correlation matrix for ironmental matrix 1 2 3 4 5 6 7 8 9 10 11 12 0.087451 0.113264 0.225049-0.13835 0.338366-0.01485 0.166309-0.11046 0.088327-0.41099-0.19944 1 1 2 0.087451 1 0.13723-0.27979 0.062584

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS 121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves

More information

Mean Ellenberg indicator values as explanatory variables in constrained ordination. David Zelený

Mean Ellenberg indicator values as explanatory variables in constrained ordination. David Zelený Mean Ellenberg indicator values as explanatory variables in constrained ordination David Zelený Heinz Ellenberg Use of mean Ellenberg indicator values in vegetation analysis species composition observed

More information

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False EXAM PRACTICE 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False Stats 1: What is a Hypothesis? A testable assertion about how the world works Hypothesis

More information

Design decisions and implementation details in vegan

Design decisions and implementation details in vegan Design decisions and implementation details in vegan Jari Oksanen processed with vegan 2.3-5 in R version 3.2.4 Patched (2016-03-28 r70390) on April 4, 2016 Abstract This document describes design decisions,

More information

An Introduction to R for the Geosciences: Ordination I

An Introduction to R for the Geosciences: Ordination I An Introduction to R for the Geosciences: Ordination I Gavin Simpson April 29, 2013 Summary This practical will use the PONDS dataset to demonstrate methods of indirect gradient analysis (PCA, CA, and

More information

UPDATE NOTES: CANOCO VERSION 3.10

UPDATE NOTES: CANOCO VERSION 3.10 UPDATE NOTES: CANOCO VERSION 3.10 Cajo J.F. ter Braak 1 New features in CANOCO 3.10 compared to CANOCO 2.x: 2 2 Summary of the ordination....................................... 5 2.1 Summary of the ordination

More information

Analysis of Multivariate Ecological Data

Analysis of Multivariate Ecological Data Analysis of Multivariate Ecological Data School on Recent Advances in Analysis of Multivariate Ecological Data 24-28 October 2016 Prof. Pierre Legendre Dr. Daniel Borcard Département de sciences biologiques

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

1 Principal Components Analysis

1 Principal Components Analysis Lecture 3 and 4 Sept. 18 and Sept.20-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Principal Components Analysis Principal components analysis (PCA) is a very popular technique for

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Regression-Discontinuity Analysis

Regression-Discontinuity Analysis Page 1 of 11 Home» Analysis» Inferential Statistics» Regression-Discontinuity Analysis Analysis Requirements The basic RD Design is a two-group pretestposttest model as indicated in the design notation.

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis

Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis

More information

CS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA)

CS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) CS68: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) Tim Roughgarden & Gregory Valiant April 0, 05 Introduction. Lecture Goal Principal components analysis

More information

Dimensionality Reduction Techniques (DRT)

Dimensionality Reduction Techniques (DRT) Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,

More information

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations. Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

INTRODUCTORY REGRESSION ANALYSIS

INTRODUCTORY REGRESSION ANALYSIS ;»»>? INTRODUCTORY REGRESSION ANALYSIS With Computer Application for Business and Economics Allen Webster Routledge Taylor & Francis Croup NEW YORK AND LONDON TABLE OF CONTENT IN DETAIL INTRODUCTORY REGRESSION

More information

Principal Component Analysis CS498

Principal Component Analysis CS498 Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy

More information

Lab 7. Direct & Indirect Gradient Analysis

Lab 7. Direct & Indirect Gradient Analysis Lab 7 Direct & Indirect Gradient Analysis Direct and indirect gradient analysis refers to a case where you have two datasets with variables that have cause-and-effect or mutual influences on each other.

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Indexes for Multimedia Data 14 Indexes for Multimedia

More information

Principal Component Analysis (PCA) Theory, Practice, and Examples

Principal Component Analysis (PCA) Theory, Practice, and Examples Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

Categorical Predictor Variables

We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

Principal component analysis

Motivation for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance

Constrained and Unconstrained Optimization Prof. Adrijit Goswami Department of Mathematics Indian Institute of Technology, Kharagpur

Lecture 01: Introduction to Optimization. Today, we will start the constrained

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

References: [1] G. A. F. Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

Lecture 6: Methods for high-dimensional problems

Hector Corrada Bravo and Rafael A. Irizarry, March 2010. In this section we will discuss methods where data lies on high-dimensional spaces. In particular,

Lecture 13: Orthogonal projections and least squares (Section 3.2-3.3). Thang Huynh, UC San Diego, 2/9/2018

Orthogonal projection onto subspaces. Theorem. Let $W$ be a subspace of $\mathbb{R}^n$. Then, each $x$ in $\mathbb{R}^n$ can

Chapter 19: Logistic regression

Self-test answers. SELF-TEST: Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis: To open the main Logistic Regression dialog

Least Squares Optimization

The following is a brief review of least squares optimization and constrained optimization techniques. Broadly, these techniques can be used in data analysis and visualization

BIO 682 Multivariate Statistics Spring 2008

Steve Shuster, http://www4.nau.edu/shustercourses/bio682/index.htm. Lecture 11: Properties of Community Data (Gauch 1982, Causton 1988, Jongman 1995). a. Qualitative:

4. Ordination in reduced space

Université Laval, Analyse multivariable, March-April 2008. 4.1. Generalities. Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination

Eigenvalues, Eigenvectors, and an Intro to PCA

Changing Basis. We've talked so far about rewriting our data using a new set of variables, or a new basis.

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

Fuzzy coding in constrained ordinations

Michael Greenacre. Department of Economics and Business, Faculty of Biological Sciences, Fisheries & Economics, Universitat Pompeu Fabra, University of Tromsø, 08005

DIMENSION REDUCTION AND CLUSTER ANALYSIS

EECS 833, 6 March 2006. Geoff Bohling, Assistant Scientist, Kansas Geological Survey, geoff@kgs.ku.edu, 864-2093. Overheads and resources available at http://people.ku.edu/~gbohling/eecs833

Dr. Junchao Xia, Center of Biophysics and Computational Biology. Fall 2016, 11/1/2016

BIO5312 Biostatistics, Lecture 10: Regression and Correlation Methods. Outline: In this lecture, we will discuss topics

Diagnostics and Transformations Part 2

Bivariate Linear Regression. James H. Steiger, Department of Psychology and Human Development, Vanderbilt University. Multilevel Regression Modeling, 2009. Diagnostics

Least Squares Optimization

The following is a brief review of least squares optimization and constrained optimization techniques, which are widely used to analyze and visualize data. Least squares (LS)

Linear Models for Regression. Sargur Srihari

Sargur Srihari, srihari@cedar.buffalo.edu. Topics in Linear Regression: What is regression? Polynomial Curve Fitting with Scalar Input. Linear Basis Function Models. Maximum Likelihood

Figure 43 - The three components of spatial variation

Université Laval, Analyse multivariable, March-April 2008. 6.3 Modeling spatial structures. 6.3.1 Introduction: the 3 components of spatial structure. For a good understanding of the nature of spatial variation,

NONLINEAR REDUNDANCY ANALYSIS AND CANONICAL CORRESPONDENCE ANALYSIS BASED ON POLYNOMIAL REGRESSION

Ecology, 8(4),, pp. 4, by the Ecological Society of America. Vladimir Makarenkov and Pierre Legendre, Département

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Lecture 39: Regression Analysis. Hello and welcome to the course on Biostatistics

CSCI5654 (Linear Programming, Fall 2013) Lectures 10-12

Today's Lecture: 1. Introduction to norms: $L_1$, $L_2$, $L_\infty$. 2. Casting absolute value and max operators. 3. Norm minimization problems.

CHAPTER 2: QUADRATIC PROGRAMMING

Overview. Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,

Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17

Outline: Filters and Rotations. Generating co-varying random fields. Translating co-varying fields into

Chapter 14 Student Lecture Notes 14-1

Business Statistics: A Decision-Making Approach, 6th Edition. Chapter 14: Multiple Regression Analysis and Model Building. Chapter Goals: After completing this

Final Review. Yang Feng (Columbia University)

http://www.stat.columbia.edu/~yangfeng. Outline: 1. Multiple Linear Regression (Estimation, Inference). 2. Special Topics for Multiple

Factor Analysis. Qian-Li Xue

Biostatistics Program, Harvard Catalyst, The Harvard Clinical & Translational Science Center. Short course, October 7, 06. Well-used latent variable models. Latent variable scale

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

Multicollinearity. Read Section 7.5 in textbook. Example of multicollinear

Stat 5101 Lecture Notes

Charles J. Geyer. Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer. May 7, 2001. Contents: 1 Random Variables and Change of Variables. 1.1 Random

Constrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method.

Optimization. Unconstrained optimization: one-dimensional, multi-dimensional. Newton's method: basic Newton, Gauss-Newton, quasi-Newton. Descent methods: gradient descent, conjugate gradient. Constrained optimization

Analysis of Multivariate Ecological Data

School on Recent Advances in Analysis of Multivariate Ecological Data, 24-28 October 2016. Prof. Pierre Legendre, Dr. Daniel Borcard, Département de sciences biologiques

Regression Clustering

In regression clustering, we assume a model of the form $y = f_g(x, \theta_g) + \epsilon_g$ for observations $y$ and $x$ in the $g$th group. Usually, of course, we assume linear models of the form

Machine Learning Linear Classification. Prof. Matteo Matteucci

Recall from the first lecture. Regression: $X \in \mathbb{R}^p$, $Y \in \mathbb{R}$ (continuous output). Classification: $X \in \mathbb{R}^p$, $Y \in \{\Omega_0, \Omega_1, \dots, \Omega_K\}$ (discrete output).

INF Shape descriptors

Anne Solberg, 3.09.09. Mandatory exercise 1: http://www.uio.no/studier/emner/matnat/ifi/inf4300/h09/undervisningsmateriale/inf4300 termprojecti 009 final.pdf

8. FROM CLASSICAL TO CANONICAL ORDINATION

Manuscript of Legendre, P. and H. J. B. Birks. 2012. From classical to canonical ordination. Chapter 8, pp. 201-248 in: Tracking Environmental Change using Lake Sediments, Volume 5: Data handling and numerical

Optimization methods

Lecture notes 3, February 8, 016. 1 Introduction. In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

STAT 704: Checking Model Assumptions

Recall we assumed the following in our model: (1) The regression relationship between the response and the predictor(s) specified in the model is appropriate (2)

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Hamid Beigy (Sharif University of Technology). Outline: 1. Introduction. 2. Feature selection

DETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)

Dipartimento di Biologia Evoluzionistica Sperimentale, Centro Interdipartimentale di Ricerca per le Scienze Ambientali in Ravenna. International Winter School, University of Bologna.

Chapter Two Elements of Linear Algebra

Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to

ECEN 615 Methods of Electric Power Systems Analysis Lecture 18: Least Squares, State Estimation

Prof. Tom Overbye, Dept. of Electrical and Computer Engineering, Texas A&M University, overbye@tamu.edu. Announcements

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University

This week: Motivation, PCA algorithms, Applications, PCA shortcomings, Autoencoders, Kernel PCA. PCA Applications: Data Visualization

Linear Models for Regression CS534

Prediction Problems. Predict housing price based on house size, lot size, location, # of rooms. Predict stock price based on price history of the past month. Predict the

IN 5520: Practical guidelines for classification. Evaluation. Feature selection. Principal component transform. Anne Solberg

Anne Solberg (anne@ifi.uio.no), 30.10.18. Literature: Practical guidelines of classification

An Introduction to Ordination Connie Clark

Ordination is a collective term for multivariate techniques that adapt a multidimensional swarm of data points in such a way that when it is projected onto a

Duke University, Department of Electrical and Computer Engineering. Optimization for Scientists and Engineers. © Alex Bronstein, 2014

Linear Algebra: A Brief Reminder. Purpose. The purpose of this document

Mathematical optimization

Determine the best solutions to certain mathematically defined problems that are under constrained; determine optimality criteria; determine the convergence of the

Lecture 5: Clustering, Linear Regression

Reading: Chapter 10, Sections 3.1-3.2. STATS 202: Data mining and analysis, October 4, 2017. Hierarchical clustering

COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma

Announcements. HW1: Please submit as a group. Watch out for zero variance features (Q5). HW2 will be released

Multivariate Ordination Analyses: Principal Component Analysis. Dilys Vela

Dilys Vela, Tatiana Boza. Multivariate Analyses: A multivariate data set includes more than one variable recorded from a number of replicate

MATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors.

Orthogonal sets. Let $V$ be a vector space with an inner product. Definition. Nonzero vectors $v_1$, $v$