Diversity and composition of termites in Amazonia CSDambros 09 January, 2015


Abstract

Put the abstract here. Missing code is being cleaned.

Contents

1 Intro
2 Load required packages
3 Import data
4 Extract geographical coordinates for transects and grids
5 Create default MAP
6 Create site x species matrix
7 Calculate PCoA
8 Dealing with missing values
9 Create connectivity matrix
    9.1 Transect-transect within grids
    9.2 Transect-transect among grids
    9.3 Merge local and regional dispersal into a single matrix
    9.4 Plot connectivity matrix
10 Create Moran Eigenvectors Maps (MEMs)
11 Calculate spatial autocorrelation of MEMs
12 Analysis
    Regression analysis
    Termite abundance
14 Results in MAP
    14.1 Observed
    14.2 Predicted
    14.3 Residual
15 Forward Selection of MEMs
16 Define Broad and Fine scale MEMs
17 Variance partitioning
18 Significance of environmental variables controlling for space
    18.1 Plot variance partitioning graph
19 Supplementary Analysis: RDA and dbRDA
    19.1 RDA (Euclidean distance on Hellinger transformed data)
    19.2 Distance-Based RDA

1 Intro

This script describes in detail all analyses conducted for the paper XXX. All datasets used are publicly available online, and the R code provides links for their individual download. Some datasets were modified beforehand to facilitate their use here. The original datasets, as well as the R scripts used for their transformation, are available online at XXX. The clean R code presented in this document is also available at XXX in .r format.

2 Load required packages

Some of the analyses in our study were coded along the text, some rely on pre-built packages, and some were turned into functions, which are publicly available online. The following lines load the required libraries, and download and source the necessary functions.

    rm(list=ls())
    library(vegan)    # installed using install.packages("vegan")
    library(raster)   # installed using install.packages("raster")
    library(maptools) # installed using install.packages("maptools")
    library(rgdal)    # installed using install.packages("rgdal")
    library(spdep)    # installed using install.packages("spdep")
    library(sp)       # installed using install.packages("sp")
    #source(" # poncho function for figures
    source("additionalfunctions.r")
    source("pie.ord.r")

3 Import data

As with the functions, we made our datasets publicly available online. They are downloaded using the next chunk of code. If you have problems running these lines, you might have an Internet connection problem. Try to download the files to your directory using the provided links, and then read the files directly from your folder.

The files imported here are not the original datasets. These data were transformed to facilitate analyses. For example, environmental data from rasters were extracted prior to these analyses and incorporated into the main environmental data frame. The original datasets, as well as the scripts used for data transformation, are also available at: XXX

    # Import maps
    MAP<-readShapeSpatial("MAP.shp")
    rivers<-readShapeSpatial("rivers.shp")
    # Termite data
    # Import termite record data
    isoptera<-read.csv("termiteproject.csv",row.names=1)
    # Import termite species data - Path to be changed
    isopterataxonid<-read.csv("~/dropbox/labtermes/tabela_geral/isopterataxonid.csv")

    # Environmental data
    env<-read.csv("termiteprojectenvironment.csv")

4 Extract geographical coordinates for transects and grids

    coords<-env[,c("utm_easting","utm_northing")]
    regional.coords<-aggregate(coords,list(env$moduloid),mean) # Spatial coordinates for grids

5 Create default MAP

Before starting the analyses, the following lines can be run to create a default map to be used when plotting the results. This avoids repeating these lines several times when plotting maps.

    # Create default map
    MAP.1<-function(){
      # Reduce fig margins
      par(mar=c(0.1,0.1,0.1,0.1))
      # Nice positioning of labels
      lab.coords<-matrix(c( , ,886041, ,151940, , ,77515, , , , , , , , , , ),ncol=2)
      # Country labels
      labels<-c("Brazil","F. Guiana","Guiana","Suriname","Venezuela","Colombia","Peru","Bolivia",
                "Brazilian Amazonia")
      # Plot maps
      plot(MAP,border="grey40",col="grey80",bg="grey100",xlim=c(-80463, ),ylim=c( , ))
      plot(MAP[9,],col="grey34",bg="grey100",border="grey34",add=TRUE)
      plot(MAP[6,],border="grey20",col=NA,bg=NA,add=TRUE)
      # Add labels
      text(lab.coords[,1],lab.coords[,2],labels,adj=0,col=rep(c("grey30","grey70"),c(length(labels)-1,1)))
      plot(rivers,col="grey20",add=TRUE)
      # Add grid locations
      points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
      op<-par(no.readonly=TRUE)
      # Add rectangle for barplot
      par(fig=c(0.52,.98,0.1,.33),usr=c(0,1,0,1),new=TRUE)
      plot.new()
      rect(0,0,1,1,col="grey85",border="grey15")

      par(op)
      par(fig=c(0.6,.95,0.16,.3),new=TRUE)
      plot.new()
      # Add labels for barplot
      axis(1,at=c(0.05,0.95),labels=c("Broad","Fine"),tick=FALSE,cex.axis=1,padj=-1)
      axis(1,at=0.5,labels="MEMs",tick=FALSE,cex.axis=1,padj=-.5)
      axis(2,at=c(0,1),labels=0:1,cex.axis=1,las=2)
      axis(2,at=0.5,labels=expression(R^2),tick=FALSE,cex.axis=1)
      box(bty="l")
      par(op)
    }
    # Call map
    #MAP.1()

    # The following lines only define where points will actually be plotted to avoid overlapping
    x2<-c( ,828511,209624,693235,70966,615452,432829,388864,453120,977315, ,422683,970551)
    y2<-c( , , , , , , , , , , , , )
    # They were all merged into the nx table to facilitate plotting later
    nx<-cbind(regional.coords[,-1],x2,y2)

6 Create site x species matrix

The termite data provided are in the long format, that is, each occurrence is a row in the original spreadsheet. Species, sampling location, etc. are attributes (columns) in this spreadsheet. To run our analyses, we started by transforming the termite data into a short table, where each row represents a sampling location and each column represents a species. Note that the long format contains more information and is preferable for storage.

Before starting, the variables PlotID and TaxonID in the table isoptera were changed to be in the same order as these variables in the environmental table (env) and the species info table (isopterataxonid).
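
The long-to-short conversion described above can be sketched on a few made-up records. The data frame records, its values, and the object site.by.species are hypothetical and only illustrate the mechanics: factor levels fix the row and column order (including a plot with no records), and tapply sums abundances per plot and species.

```r
# Hypothetical long-format records: one row per occurrence
records <- data.frame(
  plotid  = c("P1", "P1", "P2", "P3", "P3", "P3"),
  taxonid = c("sp1", "sp2", "sp1", "sp2", "sp3", "sp3"),
  n       = c(2, 1, 4, 3, 1, 2)
)

# Factor levels set the order of rows and columns; P4 has no records
records$plotid  <- factor(records$plotid,  levels = c("P1", "P2", "P3", "P4"))
records$taxonid <- factor(records$taxonid, levels = c("sp1", "sp2", "sp3"))

# Sum abundances for every plot x species combination
site.by.species <- tapply(records$n, list(records$plotid, records$taxonid), sum)

# Combinations without records come back as NA and are true zeros here
site.by.species[is.na(site.by.species)] <- 0
site.by.species
```

Setting the levels before calling tapply is what keeps empty plots in the table, so that its rows line up with the environmental table.
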
The number of subsamples sampled within each transect was also recorded.

    # Make sure transects in the termite dataset are in the same order as in the environmental dataset
    isoptera$plotid<-factor(isoptera$plotid,levels=env$plotid)
    # Make sure species in the termite dataset are in the same order as in the species dataset
    isoptera$taxonid<-factor(isoptera$taxonid,levels=isopterataxonid$taxonid)
    attach(isoptera)
    termite.plot.obs<-tapply(isoptera$n,list(plotid,taxonid),sum) # Create transect x species matrix
    termite.plot.obs[is.na(termite.plot.obs)]<-0 # NAs are true zeros
    effort.plot<-tapply(n_subplots,plotid,mean) # Number of sections sampled per transect

    detach(isoptera)

The number of subsamples in each transect varied, as can be checked with the following code:

    range(effort.plot)
    [1] 5 12

To avoid comparing transects with different sampling effort, transects with more than 5 sections were rarefied. In other words, 5 sections were randomly selected in each transect, and the average abundance of each species, the probability of presence (average presence), the overall abundance, and the species richness were calculated. If all transects had only 5 sections, the results would be exactly the same for the abundance and presence/absence matrices, species abundance per transect, and species richness per transect. The rarefaction was performed for all species, and for wood-feeding and soil-feeding termites separately.

    # Abundance matrix
    termite.plot<-(5*termite.plot.obs)/as.vector(effort.plot)
    termite.plot.wood<-termite.plot[,isopterataxonid$fg1=="w"]
    termite.plot.soil<-termite.plot[,isopterataxonid$fg1=="s"]

    # Presence-absence matrix
    #termite.plot.pa<-1-((1-termite.plot.obs/as.vector(effort))^5) # With replacement
    termite.plot.pa<-poccur(as.vector(effort.plot),termite.plot.obs,5) # Without replacement (as simulated)
    Warning: NaNs produced
    termite.plot.pa[is.nan(termite.plot.pa)]<-1
    termite.plot.pa.wood<-termite.plot.pa[,isopterataxonid$fg1=="w"]
    termite.plot.pa.soil<-termite.plot.pa[,isopterataxonid$fg1=="s"]

    # Abundance
    termite.N<-rowSums(termite.plot,na.rm=TRUE)
    termite.N.wood<-rowSums(termite.plot.wood,na.rm=TRUE)
    termite.N.soil<-rowSums(termite.plot.soil,na.rm=TRUE)

    # Richness
    termite.s<-rowSums(termite.plot.pa,na.rm=TRUE)
    termite.s.wood<-rowSums(termite.plot.pa.wood,na.rm=TRUE)
    termite.s.soil<-rowSums(termite.plot.pa.soil,na.rm=TRUE)

7 Calculate PCoA
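
The ordination in this section combines Bray-Curtis dissimilarities with classical scaling (cmdscale). As a rough, self-contained sketch of that idea, the lines below use a made-up abundance matrix and a hand-written Bray-Curtis function as a base-R stand-in for vegdist; the names comm, bray_curtis, comm.d, and pcoa are hypothetical. The extra column of ones mirrors the cbind(...,1) used in this script, which keeps the dissimilarity defined for transects without any records.

```r
set.seed(1)
# Hypothetical abundance matrix: 6 sites x 4 species
comm <- matrix(rpois(24, lambda = 2), nrow = 6,
               dimnames = list(paste0("site", 1:6), paste0("sp", 1:4)))
comm[3, ] <- 0 # a site without any records

# Bray-Curtis dissimilarity between all pairs of rows
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# The dummy column of ones avoids 0/0 for pairs involving the empty site
comm.d <- bray_curtis(cbind(comm, 1))

# Classical scaling (PCoA); keep the first two axes
pcoa <- cmdscale(comm.d, k = 2)
pcoa
```

Without the dummy column, the dissimilarity between two empty sites would be 0/0 and the ordination would fail.
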

    # Composition
    termite.bray<-vegdist(cbind(termite.plot,1),"bray")
    termite.pcoa<-scores(cmdscale(termite.bray))
    termite.pcoa1<-termite.pcoa[,1]
    termite.pcoa2<-termite.pcoa[,2]

    # Composition soil
    termite.bray.soil<-vegdist(cbind(termite.plot.soil,1),"bray")
    termite.pcoa.soil<-scores(cmdscale(termite.bray.soil))
    termite.pcoa1.soil<-termite.pcoa.soil[,1]
    termite.pcoa2.soil<-termite.pcoa.soil[,2]

    # Composition wood
    termite.bray.wood<-vegdist(cbind(termite.plot.wood,1),"bray")
    termite.pcoa.wood<-scores(cmdscale(termite.bray.wood))
    termite.pcoa1.wood<-termite.pcoa.wood[,1]
    termite.pcoa2.wood<-termite.pcoa.wood[,2]

8 Dealing with missing values

Some environmental soil variables and ant data were not available for all transects. Removing these transects from the analyses could prevent the detection of associations between other variables and termite species abundance and composition. To overcome the problem of missing data we performed data imputation, that is, we filled missing data with values. For transects with missing values, we randomly selected values from other transects. We analyzed the ant data by including an interaction of grid:antdensity, so that in grids without ant information the slopes are close to zero due to data imputation, whereas in the remaining transects the slopes are not affected by data imputation.

The number of samples without environmental data was 58 for ants, 61 for P and Clay, and 51 for Mg and K. Grids with environmental data were sampled in all sampling regions of the study, so data imputation was spread across the study region.

    # Data imputation
    # For those transects without env data, randomly select values from other transects
    set.seed(102)
    # Randomly select from observations
    env.1<-as.data.frame(sapply(env,function(x)ifelse(is.na(as.numeric(x)),sample(x[!is.na(x)]),x)))
    # Restore variables that are factors
    env.1[,colnames(env)[(sapply(env,is.factor))]]<-env[,sapply(env,is.factor)]
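
The random imputation described in this section can be illustrated on a tiny made-up table. This is only a sketch under stated assumptions: the helper impute_random and the data frame env.demo are hypothetical, and the donor values are drawn with replacement, which is a simplification of the sample/ifelse construction used above.

```r
# Replace missing entries of a vector with values drawn at random
# from the observed entries of the same vector
impute_random <- function(x) {
  missing <- is.na(x)
  if (any(missing) && any(!missing)) {
    donors <- x[!missing]
    # sample.int on positions avoids the sample() pitfall with a single donor
    picked <- sample.int(length(donors), sum(missing), replace = TRUE)
    x[missing] <- donors[picked]
  }
  x
}

set.seed(102)
# Hypothetical environmental table with gaps
env.demo <- data.frame(clay = c(12, NA, 30, 25, NA),
                       p    = c(NA, 3.1, 2.4, NA, 4.0))
env.filled <- as.data.frame(lapply(env.demo, impute_random))
env.filled
```

Observed values are left untouched; only the gaps are filled, and the filled values always come from the same variable.
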

9 Create connectivity matrix

9.1 Transect-transect within grids

    # Create dispersal matrix to represent spatial autocorrelation
    # Local communities
    # Connect plots that are less than 1700 meters apart, but not self
    local.nb.mat<-((as.matrix(dist(coords))<1700&as.matrix(dist(coords))>5)*1)
    # Prob of leaving a plot to metacom = prob of leaving to another plot within grid
    pleave<-1/(colSums(local.nb.mat)+1)
    # Probs to disperse to neighboring plots within grid
    within<-t(local.nb.mat/(colSums(local.nb.mat)+1))

9.2 Transect-transect among grids

    # Regional community (dispersal between grids of plots)
    reg.hub<-as.integer(env$moduloid) # Determine in which grid (hub in metacomm) a particular plot is
    # Use a Gabriel graph (a subgraph of the Delaunay triangulation) to create metacommunity structure (requires spdep package)
    regional.nb<-graph2nb(gabrielneigh(as.matrix(regional.coords[,2:3])),sym=TRUE,row.names = regional.coords[,
    # Calculate the number of neighbors a grid (hub) has
    reg.nlink<-sapply(regional.nb,length)
    # Create a matrix of 1s & 0s of dispersal from plots to grids
    regional.nb.mat<-sapply(regional.nb,function(x){(1*matrix(reg.hub%in%as.integer(x)))})
    # Create matrix of dispersal from plot to plot through the metacommunity
    regional.nb.mat1<-t(t(regional.nb.mat)*(1/reg.nlink))[,reg.hub]
    # Define how many plots each grid is connected to
    hub.out<-(1/(table(reg.hub)))[reg.hub]
    # Final probability of dispersal plot-plot in metacommunity
    between<-matrix(hub.out,198,198)*regional.nb.mat1

9.3 Merge local and regional dispersal into a single matrix

    # Merge local and regional: p(plot2plot)=p(leave & arrive by metacomm hub) or p(stay in grid and move)
    plot2plot<-(t(pleave*t(between)))+within
    # Transform into a list removing zeros (sparse matrix)
    plot2plot.nb<-sapply(as.data.frame(plot2plot),function(x)(1:length(x))[x!=0])

    # Transform into a class nb (spdep), just for plotting
    attributes(plot2plot.nb)<-list(class="nb",sym=TRUE)

9.4 Plot connectivity matrix

Figure 1: plot of chunk plotgabriel

10 Create Moran Eigenvectors Maps (MEMs)

    # Calculate Moran eigenvectors representing spatial autocorrelation
    w<-(t(plot2plot)+plot2plot)/2 # Make weight (dispersal) matrix symmetric
    w<-t(t((w-rowMeans(w)))-colMeans(w)+mean(w)) # Normalize data (column and row means to zero)
    MEMs<-eigen(w) # Calculate eigenvalues and eigenvectors for w
    # Eliminate vector where eigenvalue equals about zero
    MEMs$vectors<-MEMs$vectors[,abs(MEMs$values)!=min(abs(MEMs$values))]
    # Eliminate value where eigenvalue equals about zero
    MEMs$values<-MEMs$values[abs(MEMs$values)!=min(abs(MEMs$values))]

11 Calculate spatial autocorrelation of MEMs

    # Detect if Moran eigenvectors have significant spatial autocorrelation
    pvals<-sapply(1:length(MEMs$values),function(x){
      morani.pvals(MEMs$vectors[,x],plot2plot,c("less","greater")[(MEMs$values[x]>0)+1],100)})
    # Select only vectors with significant spatial autocorrelation
    MEMs.signif.vec <- as.data.frame(MEMs$vectors[,pvals<0.01])
    MEMs.signif.val <- MEMs$values[pvals<0.01]
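
The construction of MEMs in sections 10 and 11 can be checked on a toy example. The sketch below assumes a hypothetical connectivity matrix (six sites on a line, each linked to its immediate neighbours) instead of the dispersal matrix used in this script; the names w.c, mem.values, mem.vectors, and moran_i are illustrative. It shows that, after double centring, each retained eigenvector has a Moran's I proportional to its eigenvalue, which is why positive eigenvalues correspond to positive (broad-scale) autocorrelation and negative ones to negative (fine-scale) autocorrelation.

```r
# Hypothetical symmetric connectivity: 6 sites on a line
n <- 6
w <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

# Double centring: row and column means become zero
w.c <- t(t(w - rowMeans(w)) - colMeans(w)) + mean(w)

# Eigen-decomposition; drop the eigenvector whose eigenvalue is about zero
eig <- eigen(w.c, symmetric = TRUE)
keep <- abs(eig$values) > 1e-8
mem.values  <- eig$values[keep]
mem.vectors <- eig$vectors[, keep, drop = FALSE]

# Moran's I of a vector v for weights w
moran_i <- function(v, w) {
  z <- v - mean(v)
  (length(v) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}
moran <- apply(mem.vectors, 2, moran_i, w = w)

# Moran's I next to the eigenvalues: they differ only by the factor n/sum(w)
cbind(eigenvalue = mem.values, moran.I = moran)
```

In this sketch the zero eigenvalue is removed with a numerical tolerance rather than by matching the minimum absolute value, which is a design choice of the example and not of the script.
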

12 Analysis

To analyze the data, we first created a string with the predictor variables to be included in our models. By creating the string this way, we avoided repeating the same long string in every section where a model is fitted. Some variables were also log transformed prior to the analyses. These are represented by I(log(variable + h)), where h is a constant added to all log transformed variables so that zeros do not produce an error (log(0)=-Inf; log(1)=0).

    # Select environmental variables to include in the model
    h=1 # Factor to be added to variables prior to log transformation
    # Create dataframe only with predictor variables
    attach(env.1)
    predictors<-data.frame(layer.1,layer.12,TCLandsat,Clay,lnca=log(ca+h),lnk=log(env.1$k+h),lnp=log(p+h))
    detach(env.1)
    head(predictors)
    layer.1 layer.12 TCLandsat Clay lnca lnk lnp

Regression analysis

We ran two types of models. For univariate models, we ran multiple regressions. For multivariate models (species composition), we ran Redundancy Analysis (RDA), which is an extension of multiple regression. Note that using rda with a single response variable produces exactly the same results as a multiple regression, so we used the function rda for univariate models in some cases where the RDA implementation is easier. In this annotated script, only the analyses for termite abundance and overall composition are shown for brevity. The analyses for species richness, and for the abundance, richness, and composition of wood-feeding and soil-feeding termites, are available in the clean R script provided along with the data.

Termite abundance

As with the predictor variables, the response variable was called as a string. This variable is then called later with the code get(response). We conducted the analyses using the string method because the same code is repeated several times, so that only the string needs to be modified when changing the response variable.
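
The string method with get(response) can be sketched as follows. All objects here (predictors.demo, abundance.demo, richness.demo) are simulated and hypothetical; the point is only that the model call stays identical while the string changes.

```r
set.seed(42)
# Hypothetical predictors and two hypothetical response variables
predictors.demo <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
abundance.demo <- 2 + 1.5 * predictors.demo$x1 + rnorm(30)
richness.demo  <- 5 - 0.5 * predictors.demo$x2 + rnorm(30)

# The response is named by a string and retrieved with get()
response <- "abundance.demo"
fit.abundance <- lm(get(response) ~ ., data = predictors.demo)

# Changing the string is enough to refit the same model to another response
response <- "richness.demo"
fit.richness <- lm(get(response) ~ ., data = predictors.demo)

coef(fit.abundance)
coef(fit.richness)
```

The fitted coefficients are the same as those obtained by writing the response name directly in the formula; only the term label of the response differs.
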
We also performed regression analyses with only environmental variables as predictors, and then compared the spatial structure observed in the data, the spatial structure expected from the environmental variables, and the spatial structure in the residuals.

    # Create regression model
    REG<-lm(termite.N~.,data=predictors)
    summary(REG)

    Call:
    lm(formula = termite.N ~ ., data = predictors)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                      ***
    layer.1                                          **
    layer.12
    TCLandsat
    Clay
    lnca                                             *
    lnk
    lnp                                              *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 4.61 on 190 degrees of freedom
    Multiple R-squared: 0.173, Adjusted R-squared:
    F-statistic: 5.67 on 7 and 190 DF, p-value: 5.74e-06

    # Com in this case is just the termite.N variable, termite abundance
    Com<-termite.N
    # Termite abundance predicted by the environment using the regression model
    Env<-predict(REG,type="response")
    # Residuals from the regression model
    Res<-termite.N-predict(REG,type="response")
    # Create a dataframe with observed, predicted, and residuals
    scores.n<-cbind(Com,Env,Res)

We then calculated the variance explained by individual MEMs. These R2 values were grouped into 28 groups from broad to fine scale (smoothed MEMs; Dray et al. 2012). P-values for each group were calculated by randomizing the values of termite abundance and recalculating the R2 values 1000 times (Monte Carlo permutations).

P-value = (number of randomizations in which R2 is greater than or equal to the observed R2 + 1) / (number of randomizations + 1)

    # Calculate variance explained by MEMs
    test.n<-groupr2.pvals(scores.n,MEMs$vectors[,1:196],28,1000)

14 Results in MAP

14.1 Observed

    # Observed
    MAP.1()
    segments(nx[,1],nx[,2],nx[,3],nx[,4],col="white")
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    pie.ord(nx[as.integer(env$moduloid),3:4],scores.n[,1]-min(scores.n[,1]),groups=env$moduloid,
            col=c("green","green"),starpie=TRUE,expand=.08,border=adjustcolor(1,alpha=.8))
    Loading required package: plotrix
    points(nx[,3:4],pch=21,bg=1,cex=.6,col="white")
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    barplot(test.n$obs[,1],ylim=c(0,1),col=(test.n$pvals[,1]<0.05),axes=FALSE)
    legend.ord(0.80,0.40,round(seq(max(scores.n[,1]),min(scores.n[,1]),length=4)),
               title="Abundance",col="green",bg="grey85",border="grey15")

Figure 2: plot of chunk YinMap

    #dev.copy2pdf(file="yn1.pdf") # Save figure as a pdf

14.2 Predicted

    # Predicted
    MAP.1()
    segments(nx[,1],nx[,2],nx[,3],nx[,4],col="white")
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    pie.ord(nx[as.integer(env$moduloid),3:4],scores.n[,2]-min(scores.n[,2]),groups=env$moduloid,
            col=c("green","green"),starpie=TRUE,expand=.08,border=adjustcolor(1,alpha=.8))
    points(nx[,3:4],pch=21,bg=1,cex=.6,col="white")
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    barplot(test.n$obs[,2],ylim=c(0,1),col=(test.n$pvals[,2]<0.05),axes=FALSE)
    legend.ord(0.80,0.40,round(seq(max(scores.n[,2]),min(scores.n[,2]),length=4)),
               title="Abundance",col="green",bg="grey85",border="grey15")

Figure 3: plot of chunk FinMap

    #dev.copy2pdf(file="fn1.pdf") # Save figure as a pdf

14.3 Residual

    # Residual
    MAP.1()

    segments(nx[,1],nx[,2],nx[,3],nx[,4],col="white")
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    pie.ord(nx[as.integer(env$moduloid),3:4],scores.n[,3],groups=env$moduloid,
            col=c("gold","green"),starpie=TRUE,expand=.08,border=adjustcolor(1,alpha=.8))
    points(nx[,3:4],pch=21,bg=1,cex=.6,col="white")
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    barplot(test.n$obs[,3],ylim=c(0,1),col=(test.n$pvals[,3]<0.05),axes=FALSE)
    legend.ord(0.80,0.40,round(seq(max(scores.n[,3]),min(scores.n[,3]),length=4)),
               title="Abundance",col=c("green","green","gold","gold"),bg="grey85",border="grey15")

Figure 4: plot of chunk RinMap

    #dev.copy2pdf(file="rn1.pdf") # Save figure as a pdf

15 Forward Selection of MEMs

After defining the response variable, we performed a forward selection of MEMs and retained only those spatial predictors that significantly increased the adjusted R2 of the models, that is, the MEMs that best explained the variance in the response variable.

    # Forward selection and definition of broad and fine scale variables
    # Create null model
    reg.0 <- rda(termite.N ~ 1, data=MEMs.signif.vec)

    # Create model with all MEMs
    reg.full <- rda(termite.N ~ ., data=MEMs.signif.vec)
    # Use rda to perform forward selection based on R2
    reg.fw <- ordiR2step(reg.0, formula(reg.full), steps = 10000, direction="forward", R2scope=FALSE)
    # Extract the names of the MEMs selected
    selected<-attr(reg.fw$terms,"term.labels")
    # Create dataframe with selected MEMs
    vec.sel<-MEMs.signif.vec[,match(selected,colnames(MEMs.signif.vec))]
    # Create vector with the corresponding eigenvalues for selected MEMs
    val.sel<-MEMs.signif.val[match(selected,colnames(MEMs.signif.vec))]

16 Define Broad and Fine scale MEMs

    broad <- vec.sel[,val.sel>0] # Selected MEMs with positive eigenvalues: broad scale
    fine <- vec.sel[,val.sel<0]  # Selected MEMs with negative eigenvalues: fine scale

17 Variance partitioning

After conducting the forward selection and splitting the MEMs into broad and fine scale MEMs, we partitioned the variance of termite abundance into 1) variance explained by environmental predictors, 2) variance explained by broad scale spatial autocorrelation, and 3) variance explained by fine scale spatial autocorrelation.

    # Spatial structure of data and variance partitioning
    varpart.table.n <- varpart(termite.N, predictors, broad, fine)
    varpart.table.n

    Partition of variation in RDA

    Call: varpart(Y = termite.N, X = predictors, broad, fine)

    Explanatory tables:
    X1: predictors
    X2: broad
    X3: fine

    No. of explanatory tables: 3
    Total variation (SS):
    Variance:
    No. of observations: 198

    Partition table:
                          Df R.square Adj.R.square Testable

    [a+d+f+g] = X1                                 TRUE
    [b+d+e+g] = X2                                 TRUE
    [c+e+f+g] = X3                                 TRUE
    [a+b+d+e+f+g] = X1+X2                          TRUE
    [a+c+d+e+f+g] = X1+X3                          TRUE
    [b+c+d+e+f+g] = X2+X3                          TRUE
    [a+b+c+d+e+f+g] = All                          TRUE
    Individual fractions
    [a] = X1 | X2+X3                               TRUE
    [b] = X2 | X1+X3                               TRUE
    [c] = X3 | X1+X2                               TRUE
    [d]                                            FALSE
    [e]                                            FALSE
    [f]                                            FALSE
    [g]                                            FALSE
    [h] = Residuals                                FALSE
    Controlling 1 table X
    [a+d] = X1 | X3                                TRUE
    [a+f] = X1 | X2                                TRUE
    [b+d] = X2 | X3                                TRUE
    [b+e] = X2 | X1                                TRUE
    [c+e] = X3 | X1                                TRUE
    [c+f] = X3 | X2                                TRUE
    ---
    Use function 'rda' to test significance of fractions of interest

18 Significance of environmental variables controlling for space

    # Run anova by margin (each variable against all others)
    pval.n<-anova(rda(termite.N~.+Condition(as.matrix(vec.sel)),data=predictors),by="margin",step=999)
    pval.n

    Model: rda(formula = termite.N ~ layer.1 + layer.12 + TCLandsat + Clay + lnca + lnk + lnp + Condition
    Permutation test for rda under reduced model
    Marginal effects of terms

              Df Var F N.Perm Pr(>F)
    layer.1
    layer.12
    TCLandsat                        *
    Clay                             *
    lnca
    lnk
    lnp
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
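
The fractions reported by varpart are differences between adjusted R2 values of nested models. The sketch below illustrates this arithmetic for the simpler case of two explanatory tables (environment and space) rather than the three used in section 17. The data are simulated, and the names env.demo, space.demo, adj_r2, and fractions are hypothetical.

```r
set.seed(7)
n <- 100
# Hypothetical spatial and environmental predictors; the environment is
# itself spatially structured, so part of the explained variance is shared
space.demo <- data.frame(s1 = rnorm(n))
env.demo   <- data.frame(e1 = rnorm(n) + 0.5 * space.demo$s1)
y <- 1 + env.demo$e1 + space.demo$s1 + rnorm(n)

# Adjusted R2 of a multiple regression of y on all columns of x
adj_r2 <- function(y, x) summary(lm(y ~ ., data = x))$adj.r.squared

r2.env   <- adj_r2(y, env.demo)
r2.space <- adj_r2(y, space.demo)
r2.both  <- adj_r2(y, cbind(env.demo, space.demo))

# Fractions obtained by subtraction
fractions <- c(env.only   = r2.both - r2.space,
               space.only = r2.both - r2.env,
               shared     = r2.env + r2.space - r2.both,
               residual   = 1 - r2.both)
round(fractions, 3)
```

Only the fractions that correspond to a fitted model or to a difference between two nested models can be tested, which is why the shared fractions appear as not testable in the varpart output.
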

18.1 Plot variance partitioning graph

    #
    plot(varpart.table.n)
    title("Abundance",cex.main=2)

Figure 5: plot of chunk plotvarpart (Venn diagram titled "Abundance"; Residuals = 0.50; values < 0 not shown)

    dev.copy2pdf(file="vennn.pdf")
    pdf
      2

We then repeated the same procedure for species richness, PCoA1, and PCoA2. The code for these analyses is not shown to avoid being repetitive, but you can access the complete code in the clean R script at XXX.

19 Supplementary Analysis: RDA and dbRDA

19.1 RDA (Euclidean distance on Hellinger transformed data)

Hellinger transformation

    # Hellinger transform termite data (Poisson-like ---> Normal-like, weight in abundant species)
    termite.plot.h <- decostand(termite.plot, method="hellinger")
    termite.plot.h.w <- decostand(termite.plot.wood, method="hellinger")
    termite.plot.h.s <- decostand(termite.plot.soil, method="hellinger")
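
The Hellinger transformation is the square root of the relative abundance of each species within a site. As a small base-R sketch of what decostand(..., method="hellinger") computes, using a made-up abundance matrix (comm and hel are hypothetical names):

```r
# Hypothetical abundance matrix: 3 sites x 3 species
comm <- matrix(c(1, 0, 3,
                 4, 4, 8,
                 0, 9, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# Relative abundance within each site, then square root
hel <- sqrt(comm / rowSums(comm))
hel

# Euclidean distance on the transformed data is the Hellinger distance
dist(hel)
```

After the transformation every site has unit length (its squared values sum to one), which is what makes Euclidean-based methods such as RDA appropriate for abundance data.
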

RDA

19.2 Distance-Based RDA


Inference with Heteroskedasticity

Inference with Heteroskedasticity Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Chapter 11 Canonical analysis

Chapter 11 Canonical analysis Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R

Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Gilles Lamothe February 21, 2017 Contents 1 Anova with one factor 2 1.1 The data.......................................... 2 1.2 A visual

More information

Introduction to Statistics and R

Introduction to Statistics and R Introduction to Statistics and R Mayo-Illinois Computational Genomics Workshop (2018) Ruoqing Zhu, Ph.D. Department of Statistics, UIUC rqzhu@illinois.edu June 18, 2018 Abstract This document is a supplimentary

More information

Temporal eigenfunction methods for multiscale analysis of community composition and other multivariate data

Temporal eigenfunction methods for multiscale analysis of community composition and other multivariate data Temporal eigenfunction methods for multiscale analysis of community composition and other multivariate data Pierre Legendre Département de sciences biologiques Université de Montréal Pierre.Legendre@umontreal.ca

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing

More information

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29 Analysis of variance Gilles Guillot gigu@dtu.dk September 30, 2013 Gilles Guillot (gigu@dtu.dk) September 30, 2013 1 / 29 1 Introductory example 2 One-way ANOVA 3 Two-way ANOVA 4 Two-way ANOVA with interactions

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Outline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses

Outline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses UCLA Department of Statistics Statistical Consulting Center Introduction to Regression in R Part II: Multivariate Linear Regression Denise Ferrari denise@stat.ucla.edu Outline 1 Preliminaries 2 Introduction

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra

More information

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False

EXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False EXAM PRACTICE 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False Stats 1: What is a Hypothesis? A testable assertion about how the world works Hypothesis

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

G562 Geometric Morphometrics. Statistical Tests. Department of Geological Sciences Indiana University. (c) 2012, P. David Polly

G562 Geometric Morphometrics. Statistical Tests. Department of Geological Sciences Indiana University. (c) 2012, P. David Polly Statistical Tests Basic components of GMM Procrustes This aligns shapes and minimizes differences between them to ensure that only real shape differences are measured. PCA (primary use) This creates a

More information

Unbalanced Data in Factorials Types I, II, III SS Part 1

Unbalanced Data in Factorials Types I, II, III SS Part 1 Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted

More information

General Linear Statistical Models - Part III

General Linear Statistical Models - Part III General Linear Statistical Models - Part III Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Interaction Models Lets examine two models involving Weight and Domestic in the cars93 dataset.

More information

Data Exploration and Unsupervised Learning with Clustering

Data Exploration and Unsupervised Learning with Clustering Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a

More information

GMM - Generalized method of moments

GMM - Generalized method of moments GMM - Generalized method of moments GMM Intuition: Matching moments You want to estimate properties of a data set {x t } T t=1. You assume that x t has a constant mean and variance. x t (µ 0, σ 2 ) Consider

More information

Appendix S1. Alternative derivations of Weibull PDF and. and data testing

Appendix S1. Alternative derivations of Weibull PDF and. and data testing Appendix S1. Alternative derivations of Weibull PDF and CDF for energies E and data testing General assumptions A methylation change at a genomic region has an associated amount of information I processed

More information

SPSS LAB FILE 1

SPSS LAB FILE  1 SPSS LAB FILE www.mcdtu.wordpress.com 1 www.mcdtu.wordpress.com 2 www.mcdtu.wordpress.com 3 OBJECTIVE 1: Transporation of Data Set to SPSS Editor INPUTS: Files: group1.xlsx, group1.txt PROCEDURE FOLLOWED:

More information

Estimated Simple Regression Equation

Estimated Simple Regression Equation Simple Linear Regression A simple linear regression model that describes the relationship between two variables x and y can be expressed by the following equation. The numbers α and β are called parameters,

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Stat 5303 (Oehlert): Analysis of CR Designs; January

Stat 5303 (Oehlert): Analysis of CR Designs; January Stat 5303 (Oehlert): Analysis of CR Designs; January 2016 1 > resin

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Homework 9 Sample Solution

Homework 9 Sample Solution Homework 9 Sample Solution # 1 (Ex 9.12, Ex 9.23) Ex 9.12 (a) Let p vitamin denote the probability of having cold when a person had taken vitamin C, and p placebo denote the probability of having cold

More information

Community surveys through space and time: testing the space-time interaction in the absence of replication

Community surveys through space and time: testing the space-time interaction in the absence of replication Community surveys through space and time: testing the space-time interaction in the absence of replication Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

Appendix A : rational of the spatial Principal Component Analysis

Appendix A : rational of the spatial Principal Component Analysis Appendix A : rational of the spatial Principal Component Analysis In this appendix, the following notations are used : X is the n-by-p table of centred allelic frequencies, where rows are observations

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users Unit Regression and Correlation 1 of - Practice Problems Solutions Stata Users 1. In this exercise, you will gain some practice doing a simple linear regression using a Stata data set called week0.dta.

More information

Introduction GeoXp : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées.

Introduction GeoXp : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées. Presentation of Presentation of Use of Introduction : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées. Authors of : Christine Thomas-Agnan,

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data?

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data? Univariate analysis Example - linear regression equation: y = ax + c Least squares criteria ( yobs ycalc ) = yobs ( ax + c) = minimum Simple and + = xa xc xy xa + nc = y Solve for a and c Univariate analysis

More information

Statistical Analysis. G562 Geometric Morphometrics PC 2 PC 2 PC 3 PC 2 PC 1. Department of Geological Sciences Indiana University

Statistical Analysis. G562 Geometric Morphometrics PC 2 PC 2 PC 3 PC 2 PC 1. Department of Geological Sciences Indiana University PC 2 PC 2 G562 Geometric Morphometrics Statistical Analysis PC 2 PC 1 PC 3 Basic components of GMM Procrustes Whenever shapes are analyzed together, they must be superimposed together This aligns shapes

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Fractional Factorial Designs

Fractional Factorial Designs Fractional Factorial Designs ST 516 Each replicate of a 2 k design requires 2 k runs. E.g. 64 runs for k = 6, or 1024 runs for k = 10. When this is infeasible, we use a fraction of the runs. As a result,

More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

Lecture 10. Factorial experiments (2-way ANOVA etc)

Lecture 10. Factorial experiments (2-way ANOVA etc) Lecture 10. Factorial experiments (2-way ANOVA etc) Jesper Rydén Matematiska institutionen, Uppsala universitet jesper@math.uu.se Regression and Analysis of Variance autumn 2014 A factorial experiment

More information

the logic of parametric tests

the logic of parametric tests the logic of parametric tests define the test statistic (e.g. mean) compare the observed test statistic to a distribution calculated for random samples that are drawn from a single (normal) distribution.

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

Multivariate Analysis of Variance

Multivariate Analysis of Variance Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)

More information

GRAD6/8104; INES 8090 Spatial Statistic Spring 2017

GRAD6/8104; INES 8090 Spatial Statistic Spring 2017 Lab #5 Spatial Regression (Due Date: 04/29/2017) PURPOSES 1. Learn to conduct alternative linear regression modeling on spatial data 2. Learn to diagnose and take into account spatial autocorrelation in

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

STAT 510 Final Exam Spring 2015

STAT 510 Final Exam Spring 2015 STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and

More information

Six Sigma Black Belt Study Guides

Six Sigma Black Belt Study Guides Six Sigma Black Belt Study Guides 1 www.pmtutor.org Powered by POeT Solvers Limited. Analyze Correlation and Regression Analysis 2 www.pmtutor.org Powered by POeT Solvers Limited. Variables and relationships

More information

MEMGENE package for R: Tutorials

MEMGENE package for R: Tutorials MEMGENE package for R: Tutorials Paul Galpern 1,2 and Pedro Peres-Neto 3 1 Faculty of Environmental Design, University of Calgary 2 Natural Resources Institute, University of Manitoba 3 Département des

More information

WELCOME! Lecture 13 Thommy Perlinger

WELCOME! Lecture 13 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 13 Thommy Perlinger Parametrical tests (tests for the mean) Nature and number of variables One-way vs. two-way ANOVA One-way ANOVA Y X 1 1 One dependent variable

More information