Introduction to Linear regression analysis. Part 2. Model comparisons

Size: px
Start display at page:

Download "Introduction to Linear regression analysis. Part 2. Model comparisons"

Transcription

1 Introduction to Linear regression analysis Part Model comparisons 1

2 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual Full model y i = + 1 x i + i Unexplained variation in Y from full model = SS Residual ( y y ) i i

3 Reduced model (H true) Reduced model (H : 1 = true): y i = + i (Mean and error) Unexplained variation in Y from reduced model = SS Total ( y y) i Model comparison Difference in unexplained variation between full and reduced models: SS Total -SS Residual = SS Regression ( y y) i Variation explained by including 1 in model 3

4 Explained variation Proportion of variation in Y explained by linear relationship with X Termed r, coefficient of determination: SS Regression SS Total r is simply square of correlation coefficient (r) between X and Y. Which is the better model?? Y1 3 Y X X 4

5 Which is the better model?? Y X Dep Var: Y1 N: 6 Multiple R: Squared multiple R: Adjusted squared multiple R: Standard error of estimate: Effect Coefficient Std Error Std Coef Tolerance t P( Tail) CONSTANT X Analysis of Variance Source Sum-of-Squares df Mean-Square F-ratio P Regression.39543E E Residual E Which is the better model?? Y X Dep Var: Y N: 5 Multiple R: Squared multiple R: Adjusted squared multiple R: Standard error of estimate: Effect Coefficient Std Error Std Coef Tolerance t P( Tail) CONSTANT X Analysis of Variance Source Sum-of-Squares df Mean-Square F-ratio P Regression E E Residual

6 Which is the better model?? Y1 3 Y X n = 6 P =.9 r = X n = 5 P =.386 r =.94 Which is the better model?? 95% Confidence bands (for slope) Y1 3 Y X n = 6 P =.9 r = X n = 5 P =.386 r =.94 6

7 Assumptions Normality Y normally distributed at each value of X: Boxplot of y should be symmetrical - watch out for outliers and skewness Transformations often help 7

8 Homogeneity of variance Variance (spread) of Y should be constant for each value of x i (homogeneity of variance): Very difficult to assess usually (for models with only one value of y per x). Y x y i 1 i Y1 Y x 1 x X 8

9 Homogeneity of variance Variance (spread) of Y should be constant for each value of x i (homogeneity of variance): Very difficult to assess usually (for models with only one value of y per x). Spread of residuals should be even when plotted against x i or predicted y i s Transformations often help Transformations that improve normality of Y will also usually make variance of Y more constant Tests of Homogeneity of variance Y Standardized Residual 1-1 RESIDUAL X X ESTIMATE 9

10 Independence Values of y i are independent of each other: watch out for data which are a time series on same experimental or sampling units should be considered at design stage Linearity True population relationship between Y and X is linear: scatterplot of Y against X watch out for asymptotic or exponential patterns transformations of Y or Y and X often help Always look at residuals 1

11 EDA and regression diagnostics Check assumptions Check fit of model Warn about influential observations and outliers EDA Boxplots of Y (and X): check for normality, outliers etc. Scatterplot of Y and X: check for linearity, homogeneity of variance, outliers etc. 11

12 Anscombe (1973) data set R =.667, y = *x, t = 4.4, P =

13 Limited or weighted data Smoothers (for data exploration especially useful for model fitting) Nonparametric description of relationship between Y and X unconstrained by specific model structure Useful exploratory technique: is linear model appropriate? are particular observations influential? Limited or weighted data Smoothers Each observation replaced by mean or median of surrounding observations or predicted value of regression model through surrounding observations Surrounding observations in window (or band) covers range along X-axis size of window (number of observations) determined by smoothing parameter 13

14 Limited or weighted data Smoothers Adjacent windows overlap resulting line is smooth smoothness controlled by smoothing parameter (width of window) Any section of line robust to extreme values in other windows Types of limited or weighted data smoothers (examples) Running (moving) means or averages: means or medians within each window Lo(w)ess: locally weighted regression scatterplot smoothing observations within a window weighted differently observations replaced by predicted values from local regression line 14

15 Residuals very useful for examining regression assumptions Difference between observed value and value predicted or fitted by the model Residual for each observation: difference between observed y and value of y predicted by linear regression equation: ( y y ) i i Studentised residuals residual / SE residuals follow a t-distribution studentised residuals can be compared between different regressions Observations with large residual (or studentised residual) are outliers from fitted model. 15

16 Plot residuals against predicted y i Residual y +se -se x Predicted y i Even spread of Y around line No pattern in residuals Indicates assumptions OK y x Uneven spread of Y around line Residual +se -se Predicted y i Increasing spread of residuals, ie. wedge-shape Unequal variance in Y Skewed distribution of Y Transformation of Y helps 16

17 Violations of assumptions may not always lead to non-significant result Relationship between birth rate and Per capita income.6 Birthrate per female (per year) Per Capita Income (thousands) Violations of assumptions may not always lead to non-significant result Relationship between birth rate and Per capita income Birthrate per female (per year) Dep Var: BIRTHRATE N: 57 Multiple R: Squared multiple R:.5761 Adjusted squared multiple R: Standard error of estimate:.8899 Analysis of Variance Source Sum-of-Squares df Mean-Square F-ratio P Regression Residual Per Capita Income (thousands) 17

18 Violations of assumptions may not always lead to non-significant result Perhaps consider transformation Birthrate per female (per year) Per Capita Income (thousands) RESIDUAL ESTIMATE Violations of assumptions may not always lead to non-significant result Log transform of Per Capita Income Birthrate per female (per year) Per Capita Income (log (thousands)) RESIDUAL ESTIMATE 18

19 Comparison of models Birthrate per female (per year) Per Capita Income (thousands) Dep Var: BIRTHRATE N: 57 Multiple R: Squared multiple R:.5761 Adjusted squared multiple R: Standard error of estimate:.8899 Analysis of Variance Source Sum-of-Squares df Mean-Square F-ratio P Regression Residual Birthrate per female (per year) Per Capita Income (log (thousands)) Dep Var: BIRTHRATE N: 57 Multiple R: Squared multiple R:.851 Adjusted squared multiple R: Standard error of estimate:.6444 Analysis of Variance Source Sum-of-Squares df Mean-Square F-ratio P Regression Residual Outliers Leverage Influence Other indicators 19

20 Outliers Observations further from fitted model than remaining observations might be different from sample outliers in boxplots Large residual outlier Use robust estimator.syz Leverage How extreme observation is for X-variable Measures how much each x i influences predicted y i Large leverage

21 Influence Cook s D statistic: incorporates leverage & residual observations with large influence on estimated slope observations with D near or greater than 1 should be checked Y 1 Observation 1 is X and Y outlier but not influential Observation has large residual - outlier Observation 3 is very influential (large Cook s D) - also outlier X 3 1

22 Transformations If Y (and error terms) is skewed: log or power transformation of Y improves homogeneity of variance can reduce influence of outliers If nonlinear relationship: linearize by transformation of Y and/or X Transformed variables must make biological sense Robust regression Help with outliers in Y Least Absolute Deviation (LAD) regression Minimizes sum of residuals (instead of squares) M (maximum likelihood) regression Many types often based on iteratively reweighted least squares, not useful for issues with leverage. Converges on OLS when assumptions are met Help with outliers in Y and X Least Median of Squares (LMS) regression Minimizes median of squares of the residuals Least Trimmed Squares (LTS) regression Uses a trimmed set of observations and operates on sums of residuals Rank regression Uses ranks of residuals rather than values

23 Regression through origin Fit model: y i = 1 x i + i Problems: extrapolation below smallest x i ANOVA partition of SS no longer additive r difficult to interpret Always test H that = first Recommendation - Only do this if there is a compelling reason to do so (usually there is not) 3

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression OI CHAPTER 7 Important Concepts Correlation (r or R) and Coefficient of determination (R 2 ) Interpreting y-intercept and slope coefficients Inference (hypothesis testing and confidence

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Introduction to hypothesis testing

Introduction to hypothesis testing Introduction to hypothesis testing Review: Logic of Hypothesis Tests Usually, we test (attempt to falsify) a null hypothesis (H 0 ): includes all possibilities except prediction in hypothesis (H A ) If

More information

Remedial Measures for Multiple Linear Regression Models

Remedial Measures for Multiple Linear Regression Models Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Data Set 8: Laysan Finch Beak Widths

Data Set 8: Laysan Finch Beak Widths Data Set 8: Finch Beak Widths Statistical Setting This handout describes an analysis of covariance (ANCOVA) involving one categorical independent variable (with only two levels) and one quantitative covariate.

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance Using categorical and continuous predictor variables Example An experiment is set up to look at the effects of watering on Oak Seedling establishment Three levels of watering: (no

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Soil Phosphorus Discussion

Soil Phosphorus Discussion Solution: Soil Phosphorus Discussion Summary This analysis is ambiguous: there are two reasonable approaches which yield different results. Both lead to the conclusion that there is not an independent

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression Robust Statistics robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3 Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency

More information

Regression Diagnostics Procedures

Regression Diagnostics Procedures Regression Diagnostics Procedures ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X For any fixed value of the independent variable X, the distribution of the

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Using Mult Lin Regression Derived variables Many alternative models Which model to choose? Model Criticism Modelling Objective Model Details Data and Residuals Assumptions 1

More information

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Topic 19: Remedies Outline Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Regression Diagnostics Summary Check normality of the residuals

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

Data Set 1A: Algal Photosynthesis vs. Salinity and Temperature

Data Set 1A: Algal Photosynthesis vs. Salinity and Temperature Data Set A: Algal Photosynthesis vs. Salinity and Temperature Statistical setting These data are from a controlled experiment in which two quantitative variables were manipulated, to determine their effects

More information

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data 1 Stats fest 2007 Analysis of variance murray.logan@sci.monash.edu.au Single factor ANOVA 2 Aims Description Investigate differences between population means Explanation How much of the variation in response

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Introduction to Analysis of Variance (ANOVA) Part 2

Introduction to Analysis of Variance (ANOVA) Part 2 Introduction to Analysis of Variance (ANOVA) Part 2 Single factor Serpulid recruitment and biofilms Effect of biofilm type on number of recruiting serpulid worms in Port Phillip Bay Response variable:

More information

Problem Set 1 ANSWERS

Problem Set 1 ANSWERS Economics 20 Prof. Patricia M. Anderson Problem Set 1 ANSWERS Part I. Multiple Choice Problems 1. If X and Z are two random variables, then E[X-Z] is d. E[X] E[Z] This is just a simple application of one

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

4.1. Introduction: Comparing Means

4.1. Introduction: Comparing Means 4. Analysis of Variance (ANOVA) 4.1. Introduction: Comparing Means Consider the problem of testing H 0 : µ 1 = µ 2 against H 1 : µ 1 µ 2 in two independent samples of two different populations of possibly

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Bivariate data analysis

Bivariate data analysis Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Multivariate Lineare Modelle

Multivariate Lineare Modelle 0-1 TALEB AHMAD CASE - Center for Applied Statistics and Economics Humboldt-Universität zu Berlin Motivation 1-1 Motivation Multivariate regression models can accommodate many explanatory which simultaneously

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

****Lab 4, Feb 4: EDA and OLS and WLS

****Lab 4, Feb 4: EDA and OLS and WLS ****Lab 4, Feb 4: EDA and OLS and WLS ------- log: C:\Documents and Settings\Default\Desktop\LDA\Data\cows_Lab4.log log type: text opened on: 4 Feb 2004, 09:26:19. use use "Z:\LDA\DataLDA\cowsP.dta", clear.

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Chapter 9 - Correlation and Regression

Chapter 9 - Correlation and Regression Chapter 9 - Correlation and Regression 9. Scatter diagram of percentage of LBW infants (Y) and high-risk fertility rate (X ) in Vermont Health Planning Districts. 9.3 Correlation between percentage of

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Chapter 11: Robust & Quantile regression

Chapter 11: Robust & Quantile regression Chapter 11: Robust & Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 13 11.3: Robust regression Leverages h ii and deleted residuals t i

More information

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like. Measurement Error Often a data set will contain imperfect measures of the data we would ideally like. Aggregate Data: (GDP, Consumption, Investment are only best guesses of theoretical counterparts and

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

S The Over-Reliance on the Central Limit Theorem

S The Over-Reliance on the Central Limit Theorem S04-2008 The Over-Reliance on the Central Limit Theorem Abstract The objective is to demonstrate the theoretical and practical implication of the central limit theorem. The theorem states that as n approaches

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Summarizing Data: Paired Quantitative Data

Summarizing Data: Paired Quantitative Data Summarizing Data: Paired Quantitative Data regression line (or least-squares line) a straight line model for the relationship between explanatory (x) and response (y) variables, often used to produce a

More information

For Bonnie and Jesse (again)

For Bonnie and Jesse (again) SECOND EDITION A P P L I E D R E G R E S S I O N A N A L Y S I S a n d G E N E R A L I Z E D L I N E A R M O D E L S For Bonnie and Jesse (again) SECOND EDITION A P P L I E D R E G R E S S I O N A N A

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 Learning Objectives Upon successful completion of this module, the student should

More information

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants. The idea of ANOVA Reminders: A factor is a variable that can take one of several levels used to differentiate one group from another. An experiment has a one-way, or completely randomized, design if several

More information

Diagnostics and Remedial Measures

Diagnostics and Remedial Measures Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Simple Linear Regression for the MPG Data

Simple Linear Regression for the MPG Data Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

More information

Correlation and simple linear regression

Correlation and simple linear regression 8 Correlation and simple linear regression Correlation and regression are techniques used to examine associations and relationships between continuous variables collected on the same set of sampling or

More information

Linear Models in Statistics

Linear Models in Statistics Linear Models in Statistics ALVIN C. RENCHER Department of Statistics Brigham Young University Provo, Utah A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

One-sided and two-sided t-test

One-sided and two-sided t-test One-sided and two-sided t-test Given a mean cancer rate in Montreal, 1. What is the probability of finding a deviation of > 1 stdev from the mean? 2. What is the probability of finding 1 stdev more cases?

More information

Ch Inference for Linear Regression

Ch Inference for Linear Regression Ch. 12-1 Inference for Linear Regression ACT = 6.71 + 5.17(GPA) For every increase of 1 in GPA, we predict the ACT score to increase by 5.17. population regression line β (true slope) μ y = α + βx mean

More information

SPSS Guide For MMI 409

SPSS Guide For MMI 409 SPSS Guide For MMI 409 by John Wong March 2012 Preface Hopefully, this document can provide some guidance to MMI 409 students on how to use SPSS to solve many of the problems covered in the D Agostino

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

More information

Checking model assumptions with regression diagnostics

Checking model assumptions with regression diagnostics @graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

9. Robust regression

9. Robust regression 9. Robust regression Least squares regression........................................................ 2 Problems with LS regression..................................................... 3 Robust regression............................................................

More information

What to do if Assumptions are Violated?

What to do if Assumptions are Violated? What to do if Assumptions are Violated? Abandon simple linear regression for something else (usually more complicated). Some examples of alternative models: weighted least square appropriate model if the

More information

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution

More information

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity 1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

Residuals from regression on original data 1

Residuals from regression on original data 1 Residuals from regression on original data 1 Obs a b n i y 1 1 1 3 1 1 2 1 1 3 2 2 3 1 1 3 3 3 4 1 2 3 1 4 5 1 2 3 2 5 6 1 2 3 3 6 7 1 3 3 1 7 8 1 3 3 2 8 9 1 3 3 3 9 10 2 1 3 1 10 11 2 1 3 2 11 12 2 1

More information

Handout 11: Measurement Error

Handout 11: Measurement Error Handout 11: Measurement Error In which you learn to recognise the consequences for OLS estimation whenever some of the variables you use are not measured as accurately as you might expect. A (potential)

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

8. Example: Predicting University of New Mexico Enrollment

8. Example: Predicting University of New Mexico Enrollment 8. Example: Predicting University of New Mexico Enrollment year (1=1961) 6 7 8 9 10 6000 10000 14000 0 5 10 15 20 25 30 6 7 8 9 10 unem (unemployment rate) hgrad (highschool graduates) 10000 14000 18000

More information

Nonlinear Regression Functions

Nonlinear Regression Functions Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

One-Sample and Two-Sample Means Tests

One-Sample and Two-Sample Means Tests One-Sample and Two-Sample Means Tests 1 Sample t Test The 1 sample t test allows us to determine whether the mean of a sample data set is different than a known value. Used when the population variance

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Prepared by: Prof Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang M L Regression is an extension to

More information

3 Joint Distributions 71

3 Joint Distributions 71 2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Generalized Additive Models

Generalized Additive Models Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions

More information