Online Resource 2: Why Tobit regression?
March 8, 2017

Contents

1 Introduction
2 Inspect data graphically
3 Why is linear regression not good enough?
  3.1 Model assumptions are not fulfilled
  3.2 Pragmatism vs. rigorousness
  3.3 Why not log-transform?
  3.4 The sampling design of the predictors induces a systematic error
  3.5 Why not ANOVA?
  3.6 Discussion
4 Tobit regression in practice
  4.1 Assumptions are fulfilled
5 Conclusions
1 Introduction

In this brief document we explain why Tobit regression was used to analyse the data, what its advantages over a linear model are and how it can be implemented in R. To do that, we introduce a toy example in which the effect of the distance from the next garden on the cover of Trachycarpus fortunei is analysed.

Disclaimer: this document does not aim at being an introduction to Tobit regression. Here we illustrate the reasons that led us to use this method in our specific analysis. The book Analysis of Failure and Survival Data by Peter Smith (Chapman & Hall/CRC) can be used as an introduction to Tobit regression and related techniques.

The data used here is a subset of the real data analysed in this paper. We start by loading it and extracting the needed variables.

d.1 <- readRDS(file = "DataConedera2017.RDS")
d.1 <- subset(d.1, select = c(T.Tra, Tra, dng))
str(d.1)
'data.frame': 200 obs. of 3 variables:
 $ T.Tra: num
 $ Tra  : num
 $ dng  : num
head(d.1)
  T.Tra Tra dng

## install.packages('regr0', repos='
require(regr0)
require(lattice)

T.Tra is the transformed cover of hemp palm, one of the four response variables analysed in this publication. Tra is the untransformed cover of hemp palm (range ). dng, the sole predictor here, is the untransformed distance from the next garden (given in metres).
2 Inspect data graphically

We start by plotting the values of the response variable against the predictor. As explained in the main text, we arc-sine square-root transformed the response variable to stabilise the variance of the residuals¹. We also chose the logarithmic scale for the distance to the next garden. To better visualise the data we used jittering on the x-axis (we add a small amount of noise to the x values) and transparency (observations are semi-transparent). Both measures were taken to alleviate the effect of overlap and make the large number of observations with zero cover stand out clearly.

[Figure: hemp palm cover (asin-sqrt transformed) plotted against log distance to next garden (m), with a least-squares regression line.]

The line drawn on this graph is a least-squares regression line; it shows that sampling plots close to gardens have the highest covers of hemp palm. This is not unexpected, as the gardens in the study region often contain the hemp palm and may act as seed reservoirs.

3 Why is linear regression not good enough?

One could fit a linear regression to this data. However, as we will show below, this has several important drawbacks.

3.1 Model assumptions are not fulfilled

In order to make statistical inference (i.e. compute p-values and confidence intervals) on a normal linear model, we assume that the errors are normally distributed, that they are independent of each other and that their variance is constant. Mathematically, we can summarise this as

    y = β0 + β1·x + ε,   ε ~ iid N(0, σ²_ε)   (1)

¹ This is standard procedure for proportions.
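Model (1) can be made concrete with a short simulation. The sketch below is in Python for illustration only (the analysis itself is in R), and all parameter values are invented: it draws data from the model and recovers the coefficients by least squares.

```python
# Simulate data from the normal linear model y = b0 + b1*x + eps,
# eps ~ iid N(0, sigma^2), then recover (b0, b1) by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
beta0, beta1, sigma = 0.4, -0.08, 0.05   # hypothetical values
x = rng.uniform(1, 6, size=n)            # e.g. a log-distance predictor
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

b1, b0 = np.polyfit(x, y, deg=1)         # polyfit returns (slope, intercept)
print(round(b0, 2), round(b1, 2))
```

With errors that really are iid normal with constant variance, the estimates land close to the true values; the sections below show what happens when these assumptions fail.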
In our example, y is the transformed cover of the hemp palm (i.e. T.Tra), x is the log-distance from gardens (i.e. log(dng)), the βs are the regression coefficients (intercept and slope) and ε are the errors. We fit the model:

lm.0 <- lm(T.Tra ~ log(dng), data = d.1)

The usual way to assess whether the model assumptions are fulfilled is to produce residual diagnostics. For linear models the most important tool is the Tukey-Anscombe plot (i.e. a plot of the residuals against the fitted values). We reproduce this plot here for the linear model fitted to the toy data. Note that transparency is used again.

[Figure: Tukey-Anscombe plot for lm.0 (residuals against fitted values).]

The variance of the residuals is evidently not constant, but increases with the fitted values. As the range of the vertical axis clearly shows, the residuals are far from symmetric. In addition, we can clearly see the bounding effect of the zeros: all the residuals of the observations with zero cover lie on a line at the bottom of the graph. A quantile-quantile plot (not drawn here) would also show that the residuals do not follow a normal distribution. Thus, the model assumptions are grossly violated.

3.2 Pragmatism vs. rigorousness

The above arguments against the use of linear regression for this analysis may sound overly rigorous and not very pragmatic. Indeed, the model assumptions are never perfectly fulfilled. However, there are also other practical implications of fitting a linear model to this data. As an example, the predicted values for sampling plots at more than 250 metres from a garden are negative. Given that we are modelling cover, this does not make sense.
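The line of zero-cover residuals at the bottom of the plot is a direct consequence of the bound: for any observation with y = 0, the residual is 0 − ŷ = −ŷ, so all such points fall exactly on a line of slope −1 through the origin. A minimal sketch (Python, with invented fitted values):

```python
# For observations fixed at the lower bound (zero cover), the residual
# equals minus the fitted value, which produces the diagonal line
# visible in the Tukey-Anscombe plot.
import numpy as np

rng = np.random.default_rng(1)
fitted = rng.uniform(0.0, 0.4, size=50)   # hypothetical fitted values
observed = np.zeros(50)                   # plots with zero cover
resid = observed - fitted

print(np.allclose(resid, -fitted))        # → True
```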
3.3 Why not log-transform?

To force the fitted values to be positive, we could log-transform the response variable. Note that we need to add a small positive value prior to the log transform, as some observed covers are 0. The positive constant added to the response variable prior to transformation is usually either 1 or the smallest non-zero value observed in the data. Adding 1 is arbitrary, since its effect depends on the measurement unit of the variable (e.g. percent vs. parts per thousand). We therefore used the second choice. A better-behaved modified logarithm that solves the problem of zeros in a principled way is implemented in the function logst() of the package regr0, which we use for our analysis.

( min.cover <- min(d.1$Tra[d.1$Tra != 0]) )
[1]
d.1$log.tra <- log(d.1$Tra + min.cover)

We then plot the newly obtained response variable against the predictor to inspect their relationship.

[Figure: hemp palm cover (log transformed) plotted against log distance to next garden (m).]

We now fit the model on the newly obtained response variable:

lm.log <- lm(log.tra ~ log(dng), data = d.1)
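The unit-dependence of adding 1 can be checked directly. With the smallest non-zero value as the constant, rescaling the variable (say, from percent to proportion) merely shifts all log-transformed values by a constant, so pairwise differences between observations are preserved; with the constant 1 they are not. A sketch in Python with made-up cover values:

```python
# Compare log(x + c) under two choices of c when the same covers are
# expressed in percent and as proportions.
import math

cover_pct = [0.0, 0.5, 5.0, 50.0]            # hypothetical covers in percent
cover_prop = [c / 100 for c in cover_pct]    # the same covers as proportions

def log_shift(values, const):
    return [math.log(v + const) for v in values]

def gaps(t):
    return [t[i + 1] - t[i] for i in range(len(t) - 1)]

m_pct = min(v for v in cover_pct if v > 0)   # smallest non-zero value
m_prop = m_pct / 100

# min-based constant: gaps between transformed values are unit-invariant
g1 = gaps(log_shift(cover_pct, m_pct))
g2 = gaps(log_shift(cover_prop, m_prop))
print(all(abs(a - b) < 1e-9 for a, b in zip(g1, g2)))   # → True

# constant 1: the gaps depend on the measurement unit
h1 = gaps(log_shift(cover_pct, 1.0))
h2 = gaps(log_shift(cover_prop, 1.0))
print(all(abs(a - b) < 1e-9 for a, b in zip(h1, h2)))   # → False
```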
In order to assess the model assumptions, we look at the Tukey-Anscombe plot again.

[Figure: Tukey-Anscombe plot for lm.log (residuals against fitted values).]

It is clear that the assumptions of the log-transformed model are not fulfilled either. In addition, the back-transformed fitted values are not all positive, as we had hoped. Indeed, the addition of a constant prior to transformation results, in some cases, in negative fitted values. As an example, sites at 350 and 450 metres are predicted to have a negative cover (see below).

y.hat <- predict(lm.log, newdata = data.frame(dng = c(25, 75, 250, 350, 450)))
round((exp(y.hat) - min.cover), 2)

Adding 1 instead of the minimum cover makes no difference. Because we are modelling cover, negative fitted values in the original space are clearly not sensible. Thus, the log-transformation is of no help here in obtaining strictly positive fitted values. To somehow patch this, negative values could be rounded to zero. Nevertheless, as we will show, Tobit regression offers a more elegant solution to this problem and solves other issues too.

3.4 The sampling design of the predictors induces a systematic error

Again, one could argue that in this publication we are only interested in comparing the effects of the predictors in a fair way, so that whether the model assumptions are perfectly fulfilled and whether the fitted values are all positive is unimportant. From a very pragmatic point of view this could be considered true. Nevertheless, we should note that the regression coefficient of dng depends on the sampling of the predictor. In particular, if we had sampled sites at further distances (e.g. at 750, 1500 and 5000 metres), we would very likely have observed zero covers. This would change the estimate obtained for dng (i.e. a flatter line would be obtained). Below we display this situation graphically. We create two fake data sets with additional observations at further distances: d.fake.1 goes up to 5000 metres, while d.fake.2 goes up to 20000 metres. All these new observations have zero cover. We then fit two linear models with these additional observations. Ideally, modifying the sampling of the observations should not have any influence on the estimates.

d.temp.1 <- data.frame(T.Tra = 0, Tra = NA, log.tra = NA, dng = rep(c(750, 1500, 5000), each = 10))
d.fake.1 <- rbind(d.1, d.temp.1)
lm.1 <- update(lm.0, data = d.fake.1)

d.temp.2 <- data.frame(T.Tra = 0, Tra = NA, log.tra = NA, dng = rep(c(750, 1500, 5000, 10000, 20000), each = 10))
d.fake.2 <- rbind(d.1, d.temp.2)
lm.2 <- update(lm.0, data = d.fake.2)

To display how the further sampling would affect the estimates, we reproduce the graph from Section 2 and add the fitted values of the two new models.

[Figure: hemp palm cover (asin-sqrt transformed) against log distance to next garden (m), with regression lines for the original data, fake 1 and fake 2.]

The lines obtained here are clearly flatter than the one obtained with the original data (blue line). The pink line is obtained with the data set whose observations go up to 5000 metres; the green line, which is even flatter, represents the regression with data up to 20000 metres. We can formally compare the coefficients obtained.

coef(lm.0)["log(dng)"]
log(dng)

coef(lm.1)["log(dng)"]
log(dng)
coef(lm.2)["log(dng)"]
log(dng)

The regression coefficient for log-distance from the next garden (as well as the t- and p-values) differs between the three models. The further out you sample the predictor (i.e. the distance to the next garden), the smaller the regression estimate will be. This is unfortunate and unwanted, as the design should not influence the estimated regression coefficient in a systematic way.

3.5 Why not ANOVA?

We could analyse this data within the ANOVA framework (i.e. take dng as a categorical variable with 6 levels). However, note that the other predictors analysed in the main analysis are continuous variables. This implies that the comparison between predictors would not be fair, as the numbers of estimated parameters differ. Note, in addition, that if we were to analyse this toy data within the ANOVA framework, we could formally compare levels of the factor in a post-hoc analysis. Here, we would conclude that the groups at 250 m, 350 m and 450 m are not statistically different from each other². However, in practice we would expect the cover at these distances to differ. Thus, it is important to highlight that observed cover does not indicate suitability here. To make a more extreme example, we could compare a plot at 250 metres with one in the middle of the lake. In both cases all observations would be zero. However, if we enlarged our sample, we might be able to find the hemp palm in the 250-metre plot, but certainly not in the one in the middle of the lake. We therefore need to account for the fact that not all zeros carry the same information in this context. Essentially, we require a technique that allows zeros to be different. One possible solution to this problem is Tobit regression. This method enables the user to discriminate between zeros. In this example, all zero observations are said to be censored. In other words, we assume that zero is the lowest value that we can possibly measure.
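The idea that zeros carry different information can be quantified. If a latent "potential" μ drives the cover and zero is observed whenever μ plus a normal error falls below zero, then the probability of observing zero is Φ(−μ/σ). A site just beyond the species' current reach and a site in the middle of the lake then have very different probabilities of remaining at zero in an enlarged sample. A sketch with invented values (Python for illustration):

```python
# Probability of observing zero cover as a function of a hypothetical
# latent potential, under a normal error with scale sigma.
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 0.1
potential_250m = -0.05   # slightly negative: just past the species' reach
potential_lake = -1.0    # strongly negative: middle of the lake

# probability that a new observation at the site would still be zero
p_zero_250m = phi(-potential_250m / sigma)   # ≈ 0.69
p_zero_lake = phi(-potential_lake / sigma)   # ≈ 1.00
print(p_zero_250m < p_zero_lake)             # → True
```

Both sites show zero cover, but only the lake site is (practically) certain to stay at zero; this is exactly the distinction the censored-regression model encodes.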
3.6 Discussion

As we have seen above, not accounting for the censoring of the data can lead to misleading results. In addition, the model assumptions were clearly violated in all fitted models. Thus, looking for a more appropriate model is supported by both practical (i.e. biological interpretation of the results) and theoretical reasons (i.e. distributional assumptions).

² To carry out this post-hoc analysis, we would have to assume that the dng factor has a significant effect and that the model assumptions are fulfilled. This is not the case, as will be shown further down.
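The systematic error induced by the sampling design (Section 3.4) is easy to reproduce with simulated data. The sketch below (Python, with fabricated numbers that only mimic the structure of the toy example) appends zero-cover plots at large distances and shows how least squares attenuates the slope:

```python
# Appending zero-cover observations at ever larger distances flattens
# the least-squares slope, even though the underlying relationship
# has not changed.
import numpy as np

rng = np.random.default_rng(2)
d = rng.uniform(25, 450, size=200)   # distances as in the toy design
cover = np.clip(0.35 - 0.07 * np.log(d) + rng.normal(0, 0.03, 200), 0, None)

slope_orig = np.polyfit(np.log(d), cover, 1)[0]

# fake plots at 750, 1500 and 5000 m, all with zero cover
d_far = np.repeat([750.0, 1500.0, 5000.0], 10)
d_all = np.concatenate([d, d_far])
cover_all = np.concatenate([cover, np.zeros(d_far.size)])
slope_fake = np.polyfit(np.log(d_all), cover_all, 1)[0]

# the slope is systematically attenuated by the extra zeros
print(abs(slope_fake) < abs(slope_orig))   # → True
```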
4 Tobit regression in practice

A satisfactory way out of the discussed difficulties consists of using a model that is suitable for target variables which are non-negative, with a potentially high probability for the value zero. This is called Tobit regression and relies on the following idea: the occurrence of the plants is driven by a variable that we can call the potential for their growth. For clearly positive values, the potential is the expected coverage, from which the observed coverage deviates by the usual random error. If the potential declines to zero and below, the probability of observing zero coverage grows and eventually reaches one. More precisely, this probability equals the probability that the potential plus the random error is negative. This corresponds exactly to a regular linear model with the modification that the observations are censored at 0.

Fitting a Tobit model with the regr() function (package regr0) is straightforward. In this analysis, we consider all zero observations to be censored.

d.1$censored <- d.1$T.Tra == 0
table(d.1$censored)
FALSE  TRUE
  139    61

There are 61 censored observations out of 200. Here we fit a Tobit model using the wrapper function regr(). With the limit argument we declare the censoring point (i.e. the smallest value that can possibly be observed). After fitting the model we can look at the summary output.

tob.0 <- regr(Tobit(T.Tra, limit = 0) ~ log(dng), data = d.1)
summary(tob.0)

Call:
regr(formula = Tobit(T.Tra, limit = 0) ~ log(dng), data = d.1)
Fitting function: survreg
Terms:
            coef  df  cilow  cihigh  R2.x  signif  p.value  p.symb
(Intercept)                           NA      NA
log(dng)                                                     ***
log(scale)            NA     NA      NA      NA      NA      NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

       deviance  df  p.value
Model                   e-17
Null               NA

Distribution: gaussian. Shape parameter (`scale`):
AIC:

Not unexpectedly, the summary tells us that the distance to the next garden has a strong negative effect on the response variable.
In addition, the regression coefficient that takes the censoring of the data into account is clearly more negative than the one obtained with the normal linear model. Indeed, the zeros are no longer all considered to carry the same information, and therefore the zero covers observed at 150 metres are treated differently from those at 450 metres.
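For readers without access to regr0, the mechanics of the fit can be sketched from scratch: Tobit regression maximises a censored-normal likelihood in which uncensored observations contribute the normal density and censored ones the probability that the latent variable lies below the limit. A minimal sketch in Python (numpy/scipy, simulated data with made-up parameters, not the paper's analysis):

```python
# Tobit regression by direct maximum likelihood on left-censored data.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)
n = 500
beta0, beta1, sigma = 2.0, -1.0, 0.5         # true (invented) parameters
x = rng.uniform(0, 4, size=n)
latent = beta0 + beta1 * x + rng.normal(0, sigma, n)
y = np.clip(latent, 0, None)                 # observations censored at 0
cens = y == 0

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                        # keep the scale positive
    mu = b0 + b1 * x
    # uncensored: normal density; censored: P(latent <= 0) = Phi(-mu/s)
    ll = stats.norm.logpdf(y[~cens], mu[~cens], s).sum()
    ll += stats.norm.logcdf(-mu[cens] / s).sum()
    return -ll

fit = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
b1_tobit = fit.x[1]
b1_ols = np.polyfit(x, y, 1)[0]              # naive OLS on the censored data

# the Tobit slope recovers the latent slope; OLS is attenuated by the zeros
print(abs(b1_tobit - beta1) < abs(b1_ols - beta1))
```

This is the same likelihood that survreg maximises behind regr(); the sketch only makes the two contributions, density and censoring probability, explicit.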
Again, we can compare the coefficients of the two models.

coef(lm.0)["log(dng)"]
log(dng)

coef(tob.0)["log(dng)"]
log(dng)

4.1 Assumptions are fulfilled

The model fitted here assumes that the observed data is censored at zero. However, what is actually modelled is a latent (unobserved) variable that is not censored. In this case the latent variable is assumed to follow a normal distribution. Biologically, the latent variable can be interpreted as suitability. We can thus check the model assumptions with the classical Tukey-Anscombe plot³.

set.seed(1)
res.sim.1 <- resid(tob.0)[, "random"]

[Figure: Tukey-Anscombe plot for the Tobit model (residuals against fitted values; censored observations marked).]

No obvious violations of the model assumptions are visible here. There is no bounding effect, and the variance of the residuals appears reasonably stable (i.e. homoscedastic). Note that the apparently smaller spread of the residuals for small fitted values is partly due to the fact that there are fewer observations in this range⁴. Finally, we show that the sampling design has no effect on the estimates: the regression coefficients obtained are exactly the same for all data sets used.

³ Note that to obtain meaningful plots, the residuals of censored observations are simulated.
⁴ A scale-location plot (i.e. absolute residuals plotted against the fitted values) would show this clearly. It is not shown here for the sake of brevity, and because the information conveyed is partly redundant with the Tukey-Anscombe plot shown.
tob.fake.1 <- update(tob.0, data = d.fake.1)
tob.fake.2 <- update(tob.0, data = d.fake.2)

coef(tob.0)
(Intercept)    log(dng)

coef(tob.fake.1)
(Intercept)    log(dng)

coef(tob.fake.2)
(Intercept)    log(dng)

5 Conclusions

The advantages of using Tobit regression in this context are multiple. From a practical point of view, it is important to note that the sampling design of the predictors does not influence the regression coefficients obtained. In addition, the modelling of a latent variable (i.e. suitability) enables us to discriminate between zero covers, and solves the problem of the negative fitted covers obtained with the other models. From a more rigorous point of view, the model assumptions are fulfilled and we can compare the effects of the predictors in a fair manner.
More informationSingle and multiple linear regression analysis
Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics
More informationThe Multiple Regression Model
Multiple Regression The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & or more independent variables (X i ) Multiple Regression Model with k Independent Variables:
More informationEstimability Tools for Package Developers by Russell V. Lenth
CONTRIBUTED RESEARCH ARTICLES 195 Estimability Tools for Package Developers by Russell V. Lenth Abstract When a linear model is rank-deficient, then predictions based on that model become questionable
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationBasic Business Statistics, 10/e
Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:
More informationIntroduction to Statistics and R
Introduction to Statistics and R Mayo-Illinois Computational Genomics Workshop (2018) Ruoqing Zhu, Ph.D. Department of Statistics, UIUC rqzhu@illinois.edu June 18, 2018 Abstract This document is a supplimentary
More informationMultiple Comparisons
Multiple Comparisons Error Rates, A Priori Tests, and Post-Hoc Tests Multiple Comparisons: A Rationale Multiple comparison tests function to tease apart differences between the groups within our IV when
More informationRegression and Models with Multiple Factors. Ch. 17, 18
Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least
More informationVariance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.
10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationRelations in epidemiology-- the need for models
Plant Disease Epidemiology REVIEW: Terminology & history Monitoring epidemics: Disease measurement Disease intensity: severity, incidence,... Types of variables, etc. Measurement (assessment) of severity
More informationChapter 4: Regression Models
Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationIn the previous chapter, we learned how to use the method of least-squares
03-Kahane-45364.qxd 11/9/2007 4:40 PM Page 37 3 Model Performance and Evaluation In the previous chapter, we learned how to use the method of least-squares to find a line that best fits a scatter of points.
More informationData Analysis and Statistical Methods Statistics 651
y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent
More informationWe like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
Statistical Methods in Business Lecture 5. Linear Regression We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationOnline Courses for High School Students
Online Courses for High School Students 1-888-972-6237 Algebra I Course Description: Students explore the tools of algebra and learn to identify the structure and properties of the real number system;
More informationLinear Regression Models
Linear Regression Models Model Description and Model Parameters Modelling is a central theme in these notes. The idea is to develop and continuously improve a library of predictive models for hazards,
More informationNotes on Maxwell & Delaney
Notes on Maxwell & Delaney PSY710 9 Designs with Covariates 9.1 Blocking Consider the following hypothetical experiment. We want to measure the effect of a drug on locomotor activity in hyperactive children.
More informationRegression in R I. Part I : Simple Linear Regression
UCLA Department of Statistics Statistical Consulting Center Regression in R Part I : Simple Linear Regression Denise Ferrari & Tiffany Head denise@stat.ucla.edu tiffany@stat.ucla.edu Feb 10, 2010 Objective
More informationModule 03 Lecture 14 Inferential Statistics ANOVA and TOI
Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module
More informationChapter 3 - Linear Regression
Chapter 3 - Linear Regression Lab Solution 1 Problem 9 First we will read the Auto" data. Note that most datasets referred to in the text are in the R package the authors developed. So we just need to
More informationLinear Modelling: Simple Regression
Linear Modelling: Simple Regression 10 th of Ma 2018 R. Nicholls / D.-L. Couturier / M. Fernandes Introduction: ANOVA Used for testing hpotheses regarding differences between groups Considers the variation
More information1 A Review of Correlation and Regression
1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then
More informationSTAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS
STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in
More informationAssumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals
Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,
More informationGeneralized Linear Models
York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear
More informationMULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES. Business Statistics
MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES Business Statistics CONTENTS Multiple regression Dummy regressors Assumptions of regression analysis Predicting with regression analysis Old exam question
More informationComparing Several Means: ANOVA
Comparing Several Means: ANOVA Understand the basic principles of ANOVA Why it is done? What it tells us? Theory of one way independent ANOVA Following up an ANOVA: Planned contrasts/comparisons Choosing
More informationStats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data
1 Stats fest 2007 Analysis of variance murray.logan@sci.monash.edu.au Single factor ANOVA 2 Aims Description Investigate differences between population means Explanation How much of the variation in response
More informationBivariate data analysis
Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green
More informationAcknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression
INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical
More informationChapter 5 Exercises 1
Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine
More information