BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression
- Diane Williams
Introduction to Correlation and Regression

The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested in testing hypotheses about differences in the locations of several populations in terms of one or more "factors" with discrete levels. However, we may also be interested in examining the relationship between random variables Y and X, where both are continuous measures on each subject sampled from a single population. When there are just two such variables, the relationship is known as bivariate. If there is no implied relationship in which, say, variable Y "depends" on variable X, then we are simply asking whether the two variables, Y and X, are associated, and we calculate a correlation coefficient to determine the strength of this association. However, if the two variables are related in such a way that the value of one variable, X, is useful in predicting the value of the other variable, Y, then we can explore the regression of Y on X by fitting a linear model. Correlation and linear regression analyses are useful in evaluating the association between variables and expressing the nature of their relationship.

Correlation

Correlation measures the strength of the association between two variables, say Y and X. Correlation is related to regression, but correlation analyses make different assumptions about the data. First, in correlation there is no independent or dependent variable, so one is not predicting Y from X as in regression. The Pearson product-moment linear correlation coefficient assumes the data are independent and bivariate normal - that is, that the joint probability distribution of (Y, X) is bivariate normal.
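To make the bivariate-normal assumption concrete, here is a minimal sketch (simulated data, not the lab data) that draws (X, Y) pairs from a bivariate normal distribution with a chosen population correlation using MASS::mvrnorm and then estimates the sample correlation:

```r
# A minimal sketch, not the lab data: simulate (X, Y) pairs from a
# bivariate normal distribution with population correlation 0.7,
# then estimate the sample correlation.
library(MASS)  # for mvrnorm()

set.seed(1)
Sigma <- matrix(c(1.0, 0.7,
                  0.7, 1.0), nrow = 2)  # unit variances, correlation 0.7
xy <- mvrnorm(n = 200, mu = c(0, 0), Sigma = Sigma)
x <- xy[, 1]
y <- xy[, 2]
cor(x, y)  # the sample r should fall near 0.7
```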
The formula for Pearson's r is:

r = Σ (y_i − ȳ)(x_i − x̄) / [ Σ (y_i − ȳ)² · Σ (x_i − x̄)² ]^(1/2)

Pearson's r takes on values between −1 and +1. Values near 0 indicate no correlation (no association); values near +1 indicate a strong positive association in which Y and X increase together; and values near −1 indicate a strong negative association, in which the value of Y goes down as the value of X goes up, or vice versa. Significance tests are available to establish that the estimated correlation is unlikely to have arisen by chance at some α level. There are also non-parametric correlation coefficients. The most commonly used is Spearman's rho, ρ. To calculate Spearman's ρ, first rank the Y and X values separately, and then calculate the difference in the Y and X ranks for each subject (d_i = R_y − R_x). Spearman's ρ is then:
ρ = 1 − 6 Σ d_i² / [ n(n² − 1) ]

In R, one can apply the functions "cor" or "cor.test" to calculate Pearson's, Spearman's, or Kendall's correlation coefficient. The function "cor" does not provide a significance test, but "cor.test" does. The necessary arguments are cor.test(x, y, method="pearson"), or alternatively insert "spearman" or "kendall" for the method argument. For example, using the data file kenyabees.csv we can perform a correlation analysis computing Pearson's r or Spearman's ρ. The data represent the coefficient of variation in bee abundance (CVN) at pan traps on farms spread across several regions in Kenya. The coefficient of variation is calculated as CV = (s / x̄) × 100 and is a measure of variability that attempts to remove the fact that data with high means have higher variances, in order to compare variability among samples with different means. The variable CTYPE is the number of different crop species planted at each farm. Zero indicates that the farm had no planted crops, only pasture. Read in the data and obtain a scatter plot.

dat = read.csv("k:/biometry/biometry-fall-2015/lab9/kenyabees.csv", header=TRUE)
head(dat)
plot(dat$ctype, dat$cvn, xlab="Number of Crop Species", ylab="CV Bees",
     lwd=1, cex.lab=1.25, cex.axis=1.25, cex=1.5)
There is a trend toward lower values of the CV in bee abundance on farms with more crop species. However, it is difficult to tell how strong the trend is. Now calculate Pearson's r using cor.test:

cor.test(dat$ctype, dat$cvn, na.rm=TRUE, method="pearson")

	Pearson's product-moment correlation

data:  dat$ctype and dat$cvn
t = , df = 93, p-value =
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:

sample estimates:
cor

Now calculate Spearman's ρ using cor.test:

cor.test(dat$ctype, dat$cvn, method="spearman")

Warning in cor.test.default(dat$ctype, dat$cvn, method = "spearman"):
  Cannot compute exact p-value with ties

	Spearman's rank correlation rho

data:  dat$ctype and dat$cvn
S = , p-value =
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
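The correlation coefficients that cor.test reports can also be computed from the formulas directly. A sketch with simulated data (not kenyabees.csv) that checks both hand calculations against R's cor() function:

```r
# A sketch with simulated data, not kenyabees.csv: compute Pearson's r
# and Spearman's rho directly from the formulas and compare to cor().
set.seed(2)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

# Pearson's r from the product-moment formula
r_manual <- sum((y - mean(y)) * (x - mean(x))) /
  sqrt(sum((y - mean(y))^2) * sum((x - mean(x))^2))

# Spearman's rho from the rank-difference formula (valid here: no ties)
n <- length(x)
d <- rank(y) - rank(x)
rho_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

r_manual - cor(x, y, method = "pearson")     # should be essentially 0
rho_manual - cor(x, y, method = "spearman")  # should be essentially 0
```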
Neither correlation coefficient is significant at α = 0.05, but the Pearson's r is suggestive of a trend. If one just wanted the value of the correlation coefficient, the cor function could be used to calculate Pearson's, Spearman's, or Kendall's correlation coefficient. Note, however, that cor does not automatically discard cases with incomplete or missing data as cor.test does. One must specify the "use=" argument to be "complete.obs":

cor(dat$ctype, dat$cvn, use="complete.obs", method="pearson")
[1]

Regression

In regression, Y is termed the dependent variable and X the independent variable, and one builds a model using X to predict values of Y. For example, consider the relationship between crop yield and precipitation. Yield (Y) is a function of precipitation (X), since we hypothesize that water availability affects plant growth, but on average plant growth does not affect precipitation. Using linear regression, we can quantify the observed relationship between the two variables. We might ask if there is a significant regression of yield on precipitation, indicating that yield can be predicted from knowledge of precipitation, or if no regression exists and yield cannot be predicted from knowledge of precipitation. In bivariate regression, we assume that the relationship between variables can be described by a straight line. The line relating two variables X and Y is described by the equation:

Y = b0 + b1 X + ε

where b0 is called the intercept, corresponding to the point where X = 0 and the line intercepts the Y axis, and b1 is the slope, the change in Y per unit change in X. The independent variable, X, is used to predict Y, the dependent variable. b0 and b1 are the regression coefficients, the parameters of a line which fix and define the linear relationship between Y and X. ε is the "error," that component of the variation in the values of Y that cannot be predicted by the regression of Y on X.
ε arises both because the fit of the regression equation (regression model) to the data may be inadequate and because there is inherent variability in the values of Y observed at each value of X. The differences (errors) between the actual values of Y and the values predicted by the regression equation - a line fitting the data - are called residuals. The estimation of the parameters is performed by finding the "best fit" regression line, the line that minimizes the sum of squares of the deviations of the observed values from the predicted values. This method is called the method of Ordinary Least Squares (OLS). To fit a regression model to the data on the CV of bee abundance from Kenya, we use CV as the dependent variable and CTYPE as the independent variable. This is because it
seems possible that the variability in bee abundance might depend on the number of crop species on a farm, but unlikely that the number of crop types on a farm depends on the variability in bee abundance; rather, it is up to the farmer how many crops to plant. To fit a linear regression in R, we use the "lm" (linear model) function and then get a summary of the model fit. All we need to specify is a formula for the model, which is of the form y ~ x (y is a function of x).

m1 = lm(dat$cvn ~ dat$ctype)
summary(m1)

Call:
lm(formula = dat$cvn ~ dat$ctype)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
dat$ctype
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 93 degrees of freedom
  (9 observations deleted due to missingness)
Multiple R-squared:  ,	Adjusted R-squared:
F-statistic:  on 1 and 93 DF,  p-value:

Using the str(m1) function in R, you can see all the information contained in the model object m1. Several items can be extracted using the extractor symbol $, for example $coefficients, $residuals, and $fitted.values. The summary of the model object includes a summary of the values of the residuals. The regression coefficients and their standard errors are given in the "Estimate" and "Std. Error" columns, respectively. The row labeled "(Intercept)" is the y-intercept (b0) in the model, and the row labeled dat$ctype is the regression coefficient for CTYPE (b1). On the same rows there are t-tests of the null hypothesis that the respective coefficient equals 0. At the bottom of the summary, you will see the residual standard error (the square root of the mean square residual), an F-test for the significance of the overall model, and an estimate of R², the proportion of variation in the Y variable that is accounted for by the X variable. We can plot the fitted regression line on the scatter plot of points using the function "abline" in R.
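The coefficient estimates reported by summary() come from the least-squares formulas. As a check, here is a minimal sketch with simulated data (not the kenyabees data) reproducing them by hand:

```r
# A sketch with simulated data: reproduce lm()'s coefficients from the
# least-squares formulas
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
set.seed(3)
x <- runif(50, 0, 10)
y <- 2 + 0.8 * x + rnorm(50)

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)
coef(fit)   # intercept and slope from lm()
c(b0, b1)   # should match coef(fit)
```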
We see the downward trend in CV, but there is substantial scatter of the data about the fitted line.
plot(dat$ctype, dat$cvn, xlab="Number of Crop Species", ylab="CV Bees",
     lwd=1, cex.lab=1.25, cex.axis=1.25, cex=1.5)
abline(lm(dat$cvn ~ dat$ctype), lwd=2)

Assumptions of Linear Regression

In using linear regression, a number of assumptions must be made; these are discussed in detail in lecture. In summary, it is assumed that:

1. each value of X is measured without error;
2. the set of observed (X, Y) values consists of n independent measures;
3. ε is a normally distributed error with a mean of zero and some standard deviation which is constant for all values of X (homogeneity of variances);
4. Y is a linear function of X.

If we make these assumptions, fitting the regression model is very simple; however, to use the regression model to predict or estimate values of Y, we must test the assumptions we have made. The residuals are the differences between the observed and predicted values of Y. These deviations can be used as a tool to see if the necessary assumptions for regression have been met, and to further investigate the adequacy or goodness-of-fit of the regression model. If the assumptions have been met, the residuals (errors) should be independent and, for each value of X, the set of possible residuals should be approximately normally distributed with a mean of zero and a variance (σe²) that is not a function of X. If the residuals do not have these characteristics, then some of the assumptions made in fitting the
model must be incorrect and the results of the regression cannot be deemed valid. Therefore, it is not reasonable to accept a regression without examining whether the assumptions are met. There are many methods that can be used to evaluate the residuals from a regression equation. Some of these methods are objective hypothesis tests; the alternative is to graph the residuals versus the values of X and evaluate them subjectively. The properties to be evaluated are: 1) independence, 2) normality, and 3) constant variance.

The property of independence can be tested in a variety of ways. Basically, though, we can assume that if the errors are independent, they will not tend to show any pattern; they will be random. If they are not random, this fact will be evident because there will be a few long series of either positive or negative values, instead of numerous shorter series. A hypothesis test known as a "runs" test can be used to evaluate the randomness of the residuals; alternatively, a subjective evaluation can be performed. Other objective tests are also available.

Normality is another desirable property of the residuals. What is required is that for any particular value of X the set of possible errors should be normally distributed. Unless a large number of measurements of Y have been obtained for each of the values of X, there is no reasonable way of testing this assumption directly. Therefore, we examine the overall distribution of the residuals for normality using graphical and/or statistical approaches.

The requirement of a constant variance is very important in evaluating regression models. However, unless the number of residuals is relatively large, a subjective visual evaluation is usually all that can be performed. Visually, residuals that suggest that the variances are constant look like a rectangular scatter that is evenly spread about the regression line all along its length.
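The three checks can be sketched in R on simulated, well-behaved data. The sign-run count below is only an informal stand-in for a formal runs test, and the Shapiro-Wilk test is one of several possible normality tests:

```r
# A sketch of the three residual checks on simulated data.
set.seed(4)
x <- runif(60, 0, 10)
y <- 1 + 0.5 * x + rnorm(60)
fit <- lm(y ~ x)
res <- resid(fit)

# 1) independence: count runs of same-signed residuals (ordered by x);
#    independent errors should give many short runs, not a few long ones
res_by_x <- res[order(x)]
n_runs <- sum(diff(sign(res_by_x)) != 0) + 1

# 2) normality: Shapiro-Wilk test plus a normal Q-Q plot of the residuals
shapiro.test(res)
qqnorm(res); qqline(res)

# 3) constant variance: residuals vs fitted values should form an
#    even band about the zero line along the whole length of the plot
plot(fitted(fit), res); abline(h = 0, lty = 2)
```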
However, when sample sizes are small it is often difficult to make an accurate judgment about constancy of variances based on a visual inspection of the residuals. One way in R to get some quick diagnostic information to help determine if the data meet the assumptions of regression is to plot the model object. This will produce 4 diagnostic plots, so if you want to see them simultaneously you need to set the plotting parameter mfrow, which will generate two rows of plots, each with two plots:

par(mfrow=c(2,2))
plot(m1)
The upper left plot shows the raw residuals plotted against the predicted y-values from the model. I prefer to plot the standardized residuals against either the values of X or the standardized values of X. In either case, the scatter of the residuals should be similar along the length of the plot. The lower left plot shows the square root of the absolute standardized residuals, with a similar expected pattern. These plots are useful to determine if the data meet the homogeneity of variances assumption. An equal range of the scatter of the data about the horizontal line would suggest that the assumption is met. The upper right plot is a normal probability plot of the residuals. If the residuals are normally distributed, then the points should fall on the dotted line. In this case they deviate substantially at the upper end of the plot. Using the norm function in the QuantPsyc package on the residuals, show that the residuals are not normally distributed.

library(QuantPsyc)
Loading required package: boot
Loading required package: MASS

Attaching package: 'QuantPsyc'

The following object is masked from 'package:base':

    norm

norm(m1$residuals)
         Statistic  SE  t-val  p
Skewness                        e-07
Kurtosis                        e-02

The final plot in the lower right is used to diagnose data points that are outliers or overly influential. The solid red line demarcates where points would have Cook's D values of 1 or more from those with Cook's D values of less than 1. Cook's D is a measure of the influence of a data point on the regression. Small values of Cook's D (values < 1) are preferred. To get a plot of the standardized residuals versus the X values, we have to get the X values used in the model fit, since some data points were dropped because the CV values were missing (NA). If you use the str(m1) command, you will see that m1 consists of 13 lists. The second item in the 13th list contains the values of the X variable that were used. If you just tried to plot CTYPE versus m1$residuals, an error indicating that the vectors are not the same length would occur, because the cases in CTYPE for which CV was NA are still present.

nctype = m1[[13]][[2]]
stdresid = scale(m1$residuals, center=TRUE, scale=TRUE)
plot(nctype, stdresid)
abline(h=0, lty=2)
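As an alternative to indexing into the model object by position, a sketch with simulated data (standing in for the kenyabees variables): model.frame() returns exactly the rows lm() kept after dropping NAs, so its X column always matches the residual vector in length, and rstandard() computes internally studentized residuals, a slightly different standardization than the scale() approach above:

```r
# A sketch with simulated data: model.frame(fit) holds exactly the cases
# lm() used (rows with NA in y were dropped), so its x column matches the
# residual vector in length; rstandard() gives studentized residuals.
set.seed(5)
x <- runif(40, 0, 5)
y <- 3 - 0.6 * x + rnorm(40)
y[c(4, 17)] <- NA             # mimic missing CV values

fit <- lm(y ~ x)
x_used <- model.frame(fit)$x  # the 38 x values actually used in the fit
plot(x_used, rstandard(fit),
     xlab = "x", ylab = "Standardized residual")
abline(h = 0, lty = 2)
```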
This plot also does not have an equal scatter about the fitted line (represented by the horizontal dashed line). However, we cannot tell if the low scatter on the right end of the plot is due to inherently unequal variances, or to there being few farms that plant a large number of crops in our sample of farms.

Further Instructions for Lab 9

Data files for regression and correlation require that each subject be represented by a line in the data file and each column represent a variable. So, for correlation or bivariate regression, an R data file need only have 2 columns of values. However, if you have more than two variables for a single set of subjects for which you want to calculate correlations, just enter all the variables in separate columns and R can calculate the correlations between the variables in each pair of columns - a correlation matrix. Instead of inserting the X and Y variable names when using the 'cor' function, insert the name of the data.frame, and all pairwise correlations will be calculated.

LAB 9 Assignment

PART 1: Introduction to Correlation and Regression

The Bermuda Petrel is an oceanic bird spending most of its year on the open sea, only returning to land during the breeding season. Its nesting sites are on a small, uninhabited island of the Bermuda group, where careful hatching records have been kept over several years. The Bermuda Petrel feeds only upon fish caught in the open ocean waters far from land. Unfortunately, DDT is now so widespread, and is so concentrated by the biological amplification system known as the "food chain," that the Bermuda Petrel can no longer lay hard-shelled eggs. Since DDT breaks down so slowly, it would appear that this beautiful bird is doomed to extinction (along with how many others?). The data below represent hatching rates of clutches of eggs over a number of years.
Use correlation and linear regression in R to see if there is a significant relationship between the percent of clutches hatching and year. Interpret the output. Also produce a scatter plot of the relationship between hatching rate and year.

Year    % of Clutches Hatching
PART 2: Assumptions of simple linear regression

A) Using the kenyabees.csv data, is it possible to transform the CV data to an alternative scale on which the residuals and the Y variable are normally distributed? For example, what if we log-transformed the CV data?

B) Estimate the linear regression model for each of the three sample data sets (reg1, reg3, reg5) using the lm function in R. Use the data in column 1 as the X-variate and column 2 as the Y-variate in each data file.

C) Write the regression equation for at least two of the data sets.

D) Reiterating from the lab, the null hypothesis to be tested in each instance states that Y is not a linear function of X, and thus X will not be a good predictor of Y. More specifically, under the null hypothesis we are testing that the slope, b1, will be equal to zero, since this would be indicative of no relationship between the two variables. At the α = 0.05 level, based on the output of the regression alone (F-test), for which of the three data sets would you reject the null hypothesis?

E) Based on the R² values, which model reveals the best fit?

F) To see if the models are adequate, you must check whether the assumptions of regression have been met. Use graphical and/or statistical methods to assess the assumptions of normality, homogeneity of variances, and linearity for each data set. For which data sets is linear regression appropriate, and for which is it clear that a linear regression model should not be imposed on the data? Would some transformation of scale for the Y or X data make these data normal and homoscedastic? Would transformation of X improve linearity?
More informationCorrelation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?
Comment: notes are adapted from BIOL 214/312. I. Correlation. Correlation A) Correlation is used when we want to examine the relationship of two continuous variables. We are not interested in prediction.
More informationStatistics. Introduction to R for Public Health Researchers. Processing math: 100%
Statistics Introduction to R for Public Health Researchers Statistics Now we are going to cover how to perform a variety of basic statistical tests in R. Correlation T-tests/Rank-sum tests Linear Regression
More informationData Set 1A: Algal Photosynthesis vs. Salinity and Temperature
Data Set A: Algal Photosynthesis vs. Salinity and Temperature Statistical setting These data are from a controlled experiment in which two quantitative variables were manipulated, to determine their effects
More informationCorrelation and regression
Correlation and regression Patrick Breheny December 1, 2016 Today s lab is about correlation and regression. It will be somewhat shorter than some of our other labs, as I would also like to spend some
More informationBIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users
BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing
More informationChapter 3: Examining Relationships
Chapter 3: Examining Relationships Most statistical studies involve more than one variable. Often in the AP Statistics exam, you will be asked to compare two data sets by using side by side boxplots or
More informationLecture (chapter 13): Association between variables measured at the interval-ratio level
Lecture (chapter 13): Association between variables measured at the interval-ratio level Ernesto F. L. Amaral April 9 11, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015.
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationComparing Nested Models
Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent
More informationGeneralised linear models. Response variable can take a number of different formats
Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationBivariate Relationships Between Variables
Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods
More informationChapter 12 - Part I: Correlation Analysis
ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,
More informationChapter 5 Exercises 1
Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine
More informationLecture 11: Simple Linear Regression
Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink
More informationLecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015
Lecture 8: Fitting Data Statistical Computing, 36-350 Wednesday October 7, 2015 In previous episodes Loading and saving data sets in R format Loading and saving data sets in other structured formats Intro
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationInteractions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept
Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and
More informationMeasuring relationships among multiple responses
Measuring relationships among multiple responses Linear association (correlation, relatedness, shared information) between pair-wise responses is an important property used in almost all multivariate analyses.
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationSimple linear regression
Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More informationSimple Linear Regression
Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent
More informationConsider fitting a model using ordinary least squares (OLS) regression:
Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful
More informationR in Linguistic Analysis. Wassink 2012 University of Washington Week 6
R in Linguistic Analysis Wassink 2012 University of Washington Week 6 Overview R for phoneticians and lab phonologists Johnson 3 Reading Qs Equivalence of means (t-tests) Multiple Regression Principal
More informationRegression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.
TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted
More informationASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful
ASSIGNMENT 3 SIMPLE LINEAR REGRESSION In the simple linear regression model, the mean of a response variable is a linear function of an explanatory variable. The model and associated inferential tools
More informationPractical Statistics for the Analytical Scientist Table of Contents
Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning
More informationSimple linear regression: estimation, diagnostics, prediction
UPPSALA UNIVERSITY Department of Mathematics Mathematical statistics Regression and Analysis of Variance Autumn 2015 COMPUTER SESSION 1: Regression In the first computer exercise we will study the following
More informationChapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania
Chapter 10 Regression Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Scatter Diagrams A graph in which pairs of points, (x, y), are
More informationChapter 8: Correlation & Regression
Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates
More informationRemedial Measures, Brown-Forsythe test, F test
Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function
More informationFormal Statement of Simple Linear Regression Model
Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationLAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2
LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 Data Analysis: The mean egg masses (g) of the two different types of eggs may be exactly the same, in which case you may be tempted to accept
More information1 Correlation and Inference from Regression
1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationPubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES
PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,
More informationWeek 7 Multiple factors. Ch , Some miscellaneous parts
Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires
More informationANOVA (Analysis of Variance) output RLS 11/20/2016
ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More information