Inferences for linear regression (sections 12.1, 12.2)


Regression case history: do bigger national parks help prevent extinction?

Example: area of nature reserves and extinction; 6 national parks in Tanzania (Newmark, W. D. 1996. Insularization of Tanzanian parks and local extinction of large mammals. Conservation Biology 10:1549-1556).

    area (km²)   yrs protected   initial species   present species
        a              t              S_0               S_t
       100            36               35                33
       137            35               26                25
      1834            83               23                21
      2600            38               41                40
     12950            44               39                39
     23051            55               49                49

Species extinction is modeled as an exponential decay equation:

    $S_t = S_0 e^{-kt}$

where $k$ is the extinction rate, $t$ is time in years, $e = 2.71828\ldots$, $S_0$ is the number of species at time 0, and $S_t$ is the number of species remaining after $t$ years.
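As a quick numerical check of the model, plug in the first park's values ($S_0 = 35$, $t = 36$, and its $k = 0.0016345$ from the table computed below):

    $S_t = S_0 e^{-kt} = 35\,e^{-(0.0016345)(36)} = 35\,e^{-0.0588} \approx 33$

which matches the 33 species actually present.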

Rearranged (solving for $k$):

    $k = -\dfrac{\log_e(S_t / S_0)}{t}$

Each park has a different value of $k$; calculate it using the amount of time the park has been protected. Also calculate the logarithm of the area of each park (Newmark used base 10 for this calculation):

        k         log a
    0.0016345    2.00000
    0.0011206    2.13672
    0.0010960    3.26340
    0.0006498    3.41497
    0.0000000    4.11227
    0.0000000    4.36269

A scatterplot of $k$ vs. $\log a$ reveals a strong relationship between the size of the preserve and the extinction rate.
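The $k$ and $\log a$ columns can be reproduced in a few lines; here is a minimal sketch in Python (NumPy assumed available; the variable names are mine, not from the original notes):

    import numpy as np

    # park data from the table above
    t    = np.array([36, 35, 83, 38, 44, 55])              # years protected
    S0   = np.array([35, 26, 23, 41, 39, 49])              # initial species count
    St   = np.array([33, 25, 21, 40, 39, 49])              # present species count
    area = np.array([100, 137, 1834, 2600, 12950, 23051])  # km^2

    k = np.log(S0 / St) / t    # solve S_t = S_0 e^{-kt} for k
    log_area = np.log10(area)  # base-10 logarithm, as Newmark used

    for ki, la in zip(k, log_area):
        print(f"{ki:.7f}  {la:.5f}")   # reproduces the table, e.g. 0.0016345  2.00000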

[Figure: scatterplot of k (vertical axis, 0.0000 to 0.0018) vs. log_area (horizontal axis, 2.0 to 4.5)]

Output of the MINITAB regression analysis:

    Regression Analysis: k versus log_area

    The regression equation is
    k = 0.00277 - 0.000627 log_area

    Predictor     Coef         SE Coef      T        P
    Constant      0.0027656    0.0004077    6.78     0.002
    log_area     -0.0006269    0.0001222   -5.13     0.007

    S = 0.000267731   R-Sq = 86.8%   R-Sq(adj) = 83.5%

    Analysis of Variance
    Source           DF   SS           MS           F       P
    Regression        1   1.88767E-06  1.88767E-06  26.33   0.007
    Residual Error    4   2.86720E-07  7.16800E-08
    Total             5   2.17439E-06

    Predicted Values for New Observations
    New Obs   Fit        SE Fit     95% CI                  95% PI
    1         0.000885   0.000112   (0.000573, 0.001197)    (0.000079, 0.001691)

    Values of Predictors for New Observations
    New Obs   log_area
    1         3.00
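The headline numbers in this output can be checked against, for example, scipy.stats.linregress; a sketch assuming SciPy is installed, continuing with the k and log_area arrays from the sketch above (linregress reports the slope test but not the full ANOVA table):

    from scipy import stats

    res = stats.linregress(log_area, k)
    print(f"k = {res.intercept:.7f} + ({res.slope:.7f}) log_area")    # 0.0027656, -0.0006269
    print(f"R-Sq = {res.rvalue**2:.3f}")                              # 0.868
    print(f"t = {res.slope / res.stderr:.2f}, p = {res.pvalue:.3f}")  # -5.13, 0.007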

[Figure: residual plots for k — normal probability plot of the residuals, residuals versus the fitted values, histogram of the residuals, and residuals versus the order of the data. Fitted line plot: k = 0.002766 - 0.000627 log_area with 95% CI and 95% PI bands; S = 0.0002677, R-Sq = 86.8%, R-Sq(adj) = 83.5%]

Sample: $y$ is the response variable (dependent variable); $x$ is the predictor variable; $\hat{y} = a + bx$ is the least squares regression line.

Population: at each value of $x$ there is a whole frequency distribution of $y$ values, with mean

    $\mu_y = \alpha + \beta x$

$\mu_y$: the population mean of $y$ depends (linearly) on the value of $x$. $\alpha$ and $\beta$ are the true values of the line constants in the population; $a$ and $b$ are their estimates from the sample. $\sigma$: the standard deviation of $y$ around $\mu_y$ (assume it does not depend on $x$).
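One way to picture this model is to simulate it: at each $x$ the responses $y$ scatter around the line $\mu_y = \alpha + \beta x$ with standard deviation $\sigma$. A small illustrative sketch (the values of $\alpha$, $\beta$, $\sigma$ here are arbitrary, chosen only for the picture):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta, sigma = 2.0, 0.5, 1.0       # hypothetical population values

    x = np.repeat(np.arange(1, 6), 40)       # a few x values, many y's at each
    mu_y = alpha + beta * x                  # population mean of y at each x
    y = mu_y + rng.normal(0, sigma, x.size)  # y varies around mu_y with sd sigma

Plotting y against x would show a whole distribution of $y$ values at each $x$, centered on the line.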

Tasks:
1. estimate $\alpha$, $\beta$, $\sigma$; estimate $\mu_y = \alpha + \beta x$ (the height of the line at $x$)
2. measure the strength of the linear association
3. confidence intervals for $\alpha$, $\beta$
4. hypothesis tests for $\alpha$, $\beta$ (especially $H_0\!: \beta = 0$ vs. $H_a\!: \beta \neq 0$)
5. confidence interval for $\mu_y$ at $x$
6. prediction interval for a future response, $y$, at $x$

1. Estimates of the model parameters

Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

    $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$

    $\hat{\beta} = b = r\,\dfrac{s_y}{s_x}$        $\hat{\alpha} = a = \bar{y} - b\bar{x}$

$r$: the sample correlation; $s_y$: standard deviation of the $y$'s; $s_x$: standard deviation of the $x$'s.

    $\hat{\mu}_y = \hat{y} = a + bx$

(the estimated height of the line at any particular value of $x$, used to predict $y$ at $x$)
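Applied to the park data (continuing the NumPy sketch above), these formulas reproduce the coefficients in the Minitab output:

    x, y = log_area, k
    r = np.corrcoef(x, y)[0, 1]              # sample correlation
    b = r * y.std(ddof=1) / x.std(ddof=1)    # b = r s_y / s_x    -> -0.0006269
    a = y.mean() - b * x.mean()              # a = ybar - b xbar  ->  0.0027656
    fit_at_3 = a + b * 3.0                   # predicted k at log_area = 3 -> 0.000885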

$\hat{y}_i = a + bx_i$  (the estimated height of the line at observation $x_i$)

$e_i = y_i - \hat{y}_i = y_i - (a + bx_i)$  (the residual, or error of prediction, at observation $x_i$)

$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$  (sum of squared errors)

$\hat{\sigma}^2 = \dfrac{SSE}{n-2}$,    $\hat{\sigma} = \sqrt{\dfrac{SSE}{n-2}}$

2. Measuring the strength of the linear association

$r$: the sample correlation coefficient

    $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$

i.e., the average product of the standardized values of $x$ and $y$.

Properties of $r$:
1. $-1 \leq r \leq 1$
2. the closer the sample data points are to lying on a straight line, the closer $r$ is to $-1$ or $1$

3. $r = 1$ or $-1$: a perfect linear relationship
4. a scatterplot with no linear relationship: the data cloud is like a shotgun blast, and $r$ is near zero
5. $r$ near zero does not necessarily indicate absence of a relationship (the relationship, for example, might be strong but nonlinear)

r-squared ($r^2$, the coefficient of determination) is the square of the correlation coefficient $r$. $r^2$ is the proportion of the variability in $y$ explained, or accounted for, by the regression model $\hat{y} = a + bx$.

$0 \leq r^2 \leq 1$: if $r^2 = 0$, the regression equation is no better than $\bar{y}$ for prediction; if $r^2 = 1$, the regression equation predicts perfectly.

The idea of $r^2$: consider prediction when there are no explanatory variables. With one quantitative response variable $y$ and no explanatory variables, we would use $\bar{y}$ to predict a new value of $y$. $\bar{y}$ is the least squares prediction; it is the value of $c$ that minimizes the sum of squared errors $\sum e_i^2 = \sum (y_i - c)^2$.

Think of the total (unexplained) variability in $y$ as the sum of squared errors from using just $\bar{y}$ as the predictor of $y$:

    total variability $= \sum (y_i - \bar{y})^2$

Then compare this to the sum of squared errors in the regression model:

    variability unexplained by the regression model $= \sum (y_i - a - bx_i)^2$

    $r^2 = \dfrac{\text{total variability} - \text{variability unexplained by regression model}}{\text{total variability}} = \dfrac{\text{variability explained by regression model}}{\text{total variability}}$
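For the park data, all of these quantities can be computed directly (continuing the sketch above, with x, y, a, b, and r already defined); the results agree with the Minitab output:

    e   = y - (a + b * x)                    # residuals
    SSE = np.sum(e**2)                       # unexplained variability -> 2.86720e-07
    sigma_hat = np.sqrt(SSE / (len(y) - 2))  # -> 0.000267731, Minitab's S

    SST  = np.sum((y - y.mean())**2)         # total variability -> 2.17439e-06
    r_sq = (SST - SSE) / SST                 # explained / total -> 0.868
    print(np.isclose(r_sq, r**2))            # True: same as squaring the correlation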