Simple Linear Regression for the Climate Data


(Figure: scatterplot of Temperature versus CO2 with the fitted line and prediction interval bands.)

What do we do with the data?

$y_i$ = temperature in year $i$, $x_i$ = CO2 in year $i$, for $i = 1, \ldots, n$, with sample size $n = 55$.

Primary Research Questions:
1. How does CO2 relate to temperature?
2. Can we predict temperatures based on a CO2 scenario?
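As a concrete starting point, here is a minimal R sketch of how such data might be set up; the file name climate.csv and the column names Year, Temp, and CO2 are assumptions for illustration, not the course's actual files.

# Read the climate data (file and column names are hypothetical)
climate <- read.csv("climate.csv")   # assumed columns: Year, Temp, CO2

y <- climate$Temp   # y_i = temperature in year i
x <- climate$CO2    # x_i = CO2 in year i
n <- length(y)      # sample size (n = 55 in the lecture)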

Exploratory Results

$r = 0.93$, $\mathrm{Cov}(X, Y) = 5.55$

(Figure: scatterplot of Temperature versus CO2.)

What to look for:
1. Form: linear?
2. Direction: positive or negative?
3. Strength
4. Outliers
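These exploratory summaries are one-liners in R; a sketch using the hypothetical climate data frame from above:

# Exploratory summaries (r = 0.93 and Cov = 5.55 in the lecture)
cor(climate$CO2, climate$Temp)   # sample correlation
cov(climate$CO2, climate$Temp)   # sample covariance
plot(climate$CO2, climate$Temp, xlab = "CO2", ylab = "Temperature")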

SLR Model Fit and Assumptions

Fitted model: $\hat{y} = -3.08 + 0.01\,(\mathrm{CO2})$, with $\hat{\sigma} = 0.092$ and $R^2 = 0.8649$.
Predictive bias: approximately 0; RPMSE: approximately 0.1.

Assumptions to check:
1. Linear?
2. Independent?
3. Normal?
4. Equal variance?

(Figures: scatterplot of Temperature vs. CO2 with the fitted line, histogram of standardized residuals, and residuals vs. fitted values.)
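A sketch of fitting the model and producing the diagnostic plots in R, assuming the same hypothetical data frame as above:

# Fit the simple linear regression of temperature on CO2
fit <- lm(Temp ~ CO2, data = climate)
summary(fit)            # coefficients, R^2 of about 0.86, sigma-hat of about 0.09

# Assumption checks
plot(climate$CO2, climate$Temp); abline(fit)             # linearity
hist(rstandard(fit), main = "Standardized Residuals")    # normality
plot(fitted(fit), resid(fit), xlab = "Fitted Values", ylab = "Residuals")  # equal variance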

Accounting for Uncertainty in Predictions

Q: If CO2 were to increase by 1, how much would global temperatures increase on average?
A: Our best guess is 0.01, but we are uncertain.

Note: If we took another sample, our estimate $\hat{\beta}_1$ would change. How do we incorporate sampling variability (uncertainty) into our regression results?

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

1. Hypothesis Testing

One can show (in Stat 535, not in 330) that

$$ t = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}\big/\sqrt{\sum_i (x_i - \bar{x})^2}} \sim T_{n-2}, $$

a t-distribution with $n - 2$ degrees of freedom, so the t-distribution can be used to compute p-values.

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

1. Hypothesis Testing

For example, for the test $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$, the test statistic is $t = 18.37$ and the p-value is, essentially, 0. So our conclusion, which accounts for uncertainty, is: the effect of CO2 on global temperature is not zero. Or, CO2 has a significant effect on temperature.

(Figure: $T_{n-2}$ density with the observed t statistic marked.)
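The test statistic and p-value appear directly in the coefficient table of the fitted model, or can be computed from the formula above; a sketch assuming the fit object from earlier:

# Slope row of the coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef(summary(fit))["CO2", ]

# Or by hand from the t-statistic formula
b1  <- coef(fit)["CO2"]
se1 <- coef(summary(fit))["CO2", "Std. Error"]
t.stat <- (b1 - 0) / se1                                          # about 18.37
2 * pt(abs(t.stat), df = nrow(climate) - 2, lower.tail = FALSE)   # two-sided p-value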

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

1. Hypothesis Testing

For the test $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$ with $t = 18.37$, the p-value is, essentially, 0. How do you interpret the p-value? Assuming the null hypothesis is true, the probability of observing a slope of 0.01, or one more extreme, is essentially zero.

Note: Statements about statistical significance are really just asking you to incorporate uncertainty into your results.

(Figure: $T_{n-2}$ density with the observed t statistic marked.)

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

2. Confidence Intervals

Because we know $\dfrac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim T_{n-2}$,

$$ \mathrm{Prob}\left( t^\star_{0.025} < \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} < t^\star_{0.975} \right) = 0.95. $$

2.5% quantile: the value such that 2.5% of the distribution is BELOW it.
97.5% quantile: the value such that 97.5% of the distribution is BELOW it.

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

2. Confidence Intervals

Because $\dfrac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim T_{n-2}$, we know more generally that for any $0 < \alpha < 1$,

$$ \mathrm{Prob}\left( t^\star_{\alpha/2} < \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} < t^\star_{1-\alpha/2} \right) = 1 - \alpha. $$
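The quantiles $t^\star_{\alpha/2}$ and $t^\star_{1-\alpha/2}$ come from the $T_{n-2}$ distribution and are available from qt() in R; a short sketch with n = 55:

alpha <- 0.05
n <- 55
qt(alpha / 2, df = n - 2)       # 2.5% quantile, about -2.01
qt(1 - alpha / 2, df = n - 2)   # 97.5% quantile, about 2.01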

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

2. Confidence Intervals

Rearranging, we find

$$ \mathrm{Prob}\left( \hat{\beta}_1 - t^\star_{1-\alpha/2}\,SE(\hat{\beta}_1) < \beta_1 < \hat{\beta}_1 - t^\star_{\alpha/2}\,SE(\hat{\beta}_1) \right) = \mathrm{Prob}\left( \hat{\beta}_1 - t^\star_{1-\alpha/2}\,SE(\hat{\beta}_1) < \beta_1 < \hat{\beta}_1 + t^\star_{1-\alpha/2}\,SE(\hat{\beta}_1) \right) = 1 - \alpha, $$

which gives rise to the $100(1-\alpha)\%$ confidence interval formula

$$ \hat{\beta}_1 \pm t^\star_{1-\alpha/2}\,SE(\hat{\beta}_1). $$

For the climate data, the 95% confidence interval for $\beta_1$ is (0.0085, 0.0106).
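In R this interval comes directly from confint(), or can be assembled from the formula; a sketch assuming the fit object from earlier:

# 95% confidence interval for the slope; about (0.0085, 0.0106)
confint(fit, "CO2", level = 0.95)

# Equivalent by-hand construction
b1    <- coef(fit)["CO2"]
se1   <- coef(summary(fit))["CO2", "Std. Error"]
tstar <- qt(0.975, df = nrow(climate) - 2)
b1 + c(-1, 1) * tstar * se1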

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

2. Confidence Intervals

Q: If CO2 were to increase by 1, how much would global temperatures increase on average?

(0.0085, 0.0106): We are 95% confident that a 1-unit increase in CO2 would increase global temperature by between 0.0085 and 0.0106 degrees, on average.

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_1$:

2. Confidence Intervals

Q: What do we mean by "confident"? If sampling were repeated, 95% of all such confidence intervals would contain the true increase in global temperature.

Important Note: The uncertainty expressed here is only the uncertainty due to sampling variability. That is, the confidence interval quantifies the uncertainty that arises from basing conclusions about a population on a sample.

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta_0$:

1. Hypothesis Testing:

$$ t = \frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} = \frac{\hat{\beta}_0 - \beta_0}{\hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}} \sim T_{n-2} $$

For $H_0: \beta_0 = 0$ versus $H_a: \beta_0 < 0$, the p-value is approximately 0.

2. Confidence Intervals: $\hat{\beta}_0 \pm t^\star_{1-\alpha/2}\,SE(\hat{\beta}_0) = (-3.45, -2.72)$

But this is hardly ever done because $\beta_0$ is hard to interpret!

Centered SLR Model

Issue: $\beta_0$ is (often) not interpretable.

Solution: Center the predictor (covariate), $x^\star_i = x_i - \bar{x}$, then fit the model $y_i = \beta^\star_0 + \beta^\star_1 x^\star_i + \epsilon_i$.

Least Squares Estimators:

$$ \hat{\beta}^\star_0 = \bar{y} - \hat{\beta}^\star_1 \bar{x}^\star = \bar{y} $$

$$ \hat{\beta}^\star_1 = \frac{\sum_{i=1}^n (y_i - \bar{y})\,x^\star_i}{\sum_{i=1}^n (x^\star_i)^2} = \frac{\sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \hat{\beta}_1 $$

$$ \hat{\sigma}^\star = \sqrt{\frac{\sum_{i=1}^N (y_i - \hat{\beta}^\star_0 - \hat{\beta}_1 x^\star_i)^2}{N}} = \sqrt{\frac{\sum_{i=1}^N (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{N}} = \hat{\sigma} $$

Centered SLR Model

Advantages of the Centered Model:
1. The intercept is now interpretable: it is the predicted value of $y$ when $x_i = \bar{x}$, or equivalently the average value of $y$.
2. The slope and variation parameter stay the same.
3. The slope and intercept are now independent: $\mathrm{Corr}(\hat{\beta}^\star_0, \hat{\beta}_1) = 0$.

The Centered Model: $\hat{y} = 0.25 + 0.01\,x^\star$
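Centering and refitting takes one line each in R; a sketch with the same hypothetical data frame:

# Center the predictor and refit
climate$CO2.star <- climate$CO2 - mean(climate$CO2)
fit.c <- lm(Temp ~ CO2.star, data = climate)

coef(fit.c)          # intercept is now about 0.25; the slope is unchanged
mean(climate$Temp)   # the centered intercept equals the mean response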

Accounting for Uncertainty in Parameters

Ways to Incorporate Uncertainty in Claims About $\beta^\star_0$:

1. Hypothesis Testing:

$$ t = \frac{\hat{\beta}^\star_0 - \beta^\star_0}{SE(\hat{\beta}^\star_0)} = \frac{\hat{\beta}^\star_0 - \beta^\star_0}{\hat{\sigma}/\sqrt{n}} \sim T_{n-2} $$

For $H_0: \beta^\star_0 = 0$ versus $H_a: \beta^\star_0 > 0$, the p-value is approximately 0.

2. Confidence Intervals: $\hat{\beta}^\star_0 \pm t^\star_{1-\alpha/2}\,SE(\hat{\beta}^\star_0) = (0.226, 0.276)$

We are 95% confident that the average global temperature is between 0.226 and 0.276 degrees.

Accounting for Uncertainty in Predictions

Q: Projections indicate an increase of CO2 into the future. What do you predict the global temperature will be for a CO2 level of 400?

Recall: $\hat{y} = -3.08 + 0.01\,x$ (uncentered) and $\hat{y} = 0.25 + 0.01\,x^\star$ (centered).

A: Our best guess is 0.729, but we are uncertain.

Note: If we took another sample, our prediction would change. How do we incorporate sampling variability (uncertainty) into our predictions?

Accounting for Uncertainty in Predictions

Ways to Incorporate Uncertainty in Predictions:

1. Confidence Intervals for the Mean

Stat 535 Result:

$$ \frac{\hat{\beta}_0 + \hat{\beta}_1 x - (\beta_0 + \beta_1 x)}{SE(\hat{\beta}_0 + \hat{\beta}_1 x)} \sim T_{n-2}, \qquad SE(\hat{\beta}_0 + \hat{\beta}_1 x) = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}, $$

so

$$ \mathrm{Prob}\left( t^\star_{\alpha/2} < \frac{\hat{\beta}_0 + \hat{\beta}_1 x - (\beta_0 + \beta_1 x)}{SE(\hat{\beta}_0 + \hat{\beta}_1 x)} < t^\star_{1-\alpha/2} \right) = 1 - \alpha, $$

which can be rearranged to the interval $(\hat{\beta}_0 + \hat{\beta}_1 x) \pm t^\star_{1-\alpha/2}\,SE(\hat{\beta}_0 + \hat{\beta}_1 x)$.

Note: the variability depends on where you are predicting.

Accounting for Uncertainty in Predictions

Ways to Incorporate Uncertainty in Predictions:

2. Prediction Intervals for Individuals

Stat 535 Result:

$$ \frac{\hat{y} - y}{SE(\hat{y})} \sim T_{n-2}, \qquad SE(\hat{y}) = \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}, $$

so

$$ \mathrm{Prob}\left( t^\star_{\alpha/2} < \frac{\hat{y} - y}{SE(\hat{y})} < t^\star_{1-\alpha/2} \right) = 1 - \alpha, $$

which can be rearranged to the interval $\hat{y} \pm t^\star_{1-\alpha/2}\,SE(\hat{y})$.

Note: the variability depends on where you are predicting.
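Both intervals are available from predict() in R; a sketch evaluating them at CO2 = 400 with the uncentered fit from earlier:

new.x <- data.frame(CO2 = 400)

# Confidence interval for the mean temperature at CO2 = 400
predict(fit, newdata = new.x, interval = "confidence", level = 0.95)

# Prediction interval for an individual year's temperature at CO2 = 400
predict(fit, newdata = new.x, interval = "prediction", level = 0.95)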

Accounting for Uncertainty in Predictions

Q: Projections indicate an increase of CO2 into the future. What do you predict the global temperature will be for a CO2 level of 400?

Recall: $\hat{y} = -3.08 + 0.01\,x$ (uncentered) and $\hat{y} = 0.25 + 0.01\,x^\star$ (centered).

Confidence interval for the mean: 0.728 ± 2.01(0.028) = (0.671, 0.786).
Prediction interval: 0.728 ± 2.01(0.096) = (0.535, 0.922).

A: We are 95% confident that if the CO2 level were 400, the global temperature would be between 0.535 and 0.922 degrees.

Accounting for Uncertainty in Predictions

Q: Projections indicate an increase of CO2 into the future. What do you predict the global temperature will be for a CO2 level of 400?

Recall: $\hat{y} = -3.08 + 0.01\,x$ (uncentered) and $\hat{y} = 0.25 + 0.01\,x^\star$ (centered).

(Figure: Temperature vs. CO2 scatterplot showing the prediction at CO2 = 400 with its confidence interval and prediction interval.)

Cross-Validation Revisited

When we perform cross-validation, we are used to calculating:
1. Bias
2. RPMSE

But we should also generate a prediction interval for each test observation and calculate:
1. Coverage: the percentage of prediction intervals that contain the true value
2. Prediction interval width: the average width of the prediction intervals
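A sketch of such a cross-validation loop in R, adding coverage and interval width to bias and RPMSE; the number of replicates and the test-set size are arbitrary illustrative choices:

set.seed(1)
n.reps <- 100
n.test <- 5
bias <- rpmse <- cover <- width <- numeric(n.reps)

for (r in 1:n.reps) {
  test  <- sample(1:nrow(climate), n.test)
  train <- setdiff(1:nrow(climate), test)
  cv.fit <- lm(Temp ~ CO2, data = climate[train, ])
  pi <- predict(cv.fit, newdata = climate[test, ], interval = "prediction")

  bias[r]  <- mean(pi[, "fit"] - climate$Temp[test])
  rpmse[r] <- sqrt(mean((pi[, "fit"] - climate$Temp[test])^2))
  cover[r] <- mean(pi[, "lwr"] < climate$Temp[test] & climate$Temp[test] < pi[, "upr"])
  width[r] <- mean(pi[, "upr"] - pi[, "lwr"])
}

mean(bias); mean(rpmse); mean(cover); mean(width)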

Regression Assumptions Revisited

What happens if our assumptions aren't met?

Linearity: if the relationship is non-linear, everything breaks. Don't fit a line to non-linear data!
Independence: estimates are still unbiased (i.e., we fit the right line), but measures of the accuracy of those estimates (the standard errors) are typically too small.
Normality: estimates are still unbiased (i.e., we fit the right line) and standard errors are correct, BUT confidence/prediction intervals are wrong (we can't use the t-distribution).
Equal variance: estimates are still unbiased, but standard errors are wrong (and we don't know how wrong).

End of Climate Analysis (see webpage for R and SAS code)