
Linear regression model

We assume that two quantitative variables, x and y, are linearly related; that is, the entire population of (x, y) pairs is related by an ideal population regression line

y = α + βx + e

where α and β represent the y-intercept and slope coefficients. The quantity e is included to represent the fact that the relation is subject to random error; e can be interpreted as either
- the deviation of that value of y from its mean (the height of the population regression line at the given x), or
- the error in using the line to predict a value of y from the corresponding given x.
We assume that e is a normally distributed random variable with mean μ_e = 0 and standard deviation σ_e, which is large when errors are large and small when errors are small.

Note that e is a different random variable for different values of x; all such e are assumed to be independent of each other and identically distributed. For a fixed value x* of x, the quantity α + βx* represents the (fixed) height of the regression line at x = x*, so y = α + βx* + e is subject to the same kind of variability as e: namely, y is normally distributed with mean μ_y = α + βx* and standard deviation σ_y = σ_e. β, being the slope of the line, represents the change in μ_y associated with a unit change in x; that is, β is the average change in y associated with a unit change in x. The parameters of interest for the regression model are σ_e, which measures the typical size of errors in using the line to make predictions of y values, and β, which measures the average change in y associated with a unit change in x.
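As a concrete illustration of the model, the following Python sketch simulates y values at a fixed x* from a population line with made-up parameters (α = 1.3, β = 0.9, σ_e = 0.8 are hypothetical values chosen for illustration, not taken from the notes):

```python
# Simulate the ideal population model y = alpha + beta*x + e at a fixed x*.
# alpha, beta, sigma_e are hypothetical illustration values.
import random

random.seed(1)
alpha, beta, sigma_e = 1.3, 0.9, 0.8

def draw_y(x):
    """One draw of y at a fixed x: normal error e with mean 0, sd sigma_e."""
    e = random.gauss(0.0, sigma_e)
    return alpha + beta * x + e

# At x* = 4 the model says y is Normal(mean = alpha + beta*4 = 4.9, sd = 0.8).
x_star = 4.0
ys = [draw_y(x_star) for _ in range(100_000)]
mean_y = sum(ys) / len(ys)                                  # close to 4.9
sd_y = (sum((v - mean_y) ** 2 for v in ys) / (len(ys) - 1)) ** 0.5  # close to 0.8
```

The simulated mean and standard deviation of y at x* match α + βx* and σ_e, exactly as the paragraph above states.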

Estimating the regression parameters

Estimating σ_e: the standard deviation about the regression line,

s_e = √( SSResid / (n − 2) ),

is not an unbiased estimator of σ_e (although s_e² is an unbiased estimator of σ_e²). [TI-83: STAT TESTS LinRegTTest (denoted s).]

Estimating β: the slope of the least-squares line,

b = r · (s_y / s_x),

is an unbiased estimator of β. [TI-83: STAT TESTS LinRegTTest; also STAT CALC LinReg(a+bx).]
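A minimal sketch of these estimates in Python, using a small made-up data set (the x and y values are illustrative only):

```python
# Least-squares estimates: b (for beta), a (for alpha), and s_e (for sigma_e).
import math

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)          # = (n - 1) * s_x^2
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b = sxy / sxx                # slope estimate; algebraically equal to r * s_y / s_x
a = ybar - b * xbar          # intercept estimate

ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(ss_resid / (n - 2))   # sd about the regression line, df = n - 2
```

For these numbers the fitted line is ŷ = 1.3 + 0.9x, the same values LinRegTTest would report for a and b.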

The sampling distribution of b

The sampling distribution of b is studied to determine how estimates of β behave from sample to sample. Assuming that the n data points produce independent, identically normally distributed errors e, all with mean 0 and standard deviation σ_e, we have:
- μ_b = β
- σ_b = σ_e / (s_x √(n − 1))
- the sampling distribution of b is normal.
Since neither σ_e nor σ_b is known, we estimate σ_e with the statistic s_e, and σ_b with the statistic

s_b = s_e / (s_x √(n − 1)),

and then standardize b with the statistic

t = (b − β) / s_b, which has df = n − 2.

Confidence interval for β

Assuming that the n data points produce independent, identically normally distributed errors e, all with mean 0 and standard deviation σ_e, we obtain the following confidence interval for β:

b ± (t critical value) · s_b

where the t critical value is based on df = n − 2.
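A sketch of this interval for a small made-up data set, at 95% confidence (the t critical value 3.182 for df = 3 is read from a t table):

```python
# 95% confidence interval for beta: b +/- t_crit * s_b, with df = n - 2.
import math

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s_e = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

s_b = s_e / math.sqrt(sxx)   # same as s_e / (s_x * sqrt(n - 1))
t_crit = 3.182               # t table value for df = n - 2 = 3, central area 0.95
lo, hi = b - t_crit * s_b, b + t_crit * s_b
```

Here the interval excludes 0, which foreshadows the model utility test below: a slope of 0 is not a plausible value for β at the 5% level.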

Model utility test for linear regression

If the slope of the population regression line is β = 0, then the line is horizontal and the mean of y does not depend on x, so there is no point in using a linear equation in x to predict y. A test of whether β = 0 can determine whether it is appropriate to model a linear relation between the variables x and y.

Hypotheses: H_0: β = 0 versus H_a: β ≠ 0
Test statistic: t = (b − 0) / s_b, with df = n − 2
Assumptions: independent, normally distributed errors with mean 0 and equal standard deviations
[TI-83: STAT TESTS LinRegTTest]
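Under the stated assumptions, the test statistic can be sketched as follows for a small made-up data set (the 3.182 cutoff is the two-sided 5% t critical value for df = 3, from a table):

```python
# Model utility test: t = (b - 0) / s_b, compared with the df = n - 2 cutoff.
import math

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s_e = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
s_b = s_e / math.sqrt(sxx)

t = (b - 0) / s_b            # roughly 3.58 here
t_crit = 3.182               # two-sided 5% cutoff for df = 3
reject_h0 = abs(t) > t_crit  # True: conclude beta != 0 at the 5% level
```

LinRegTTest reports this same t statistic along with its p-value; here |t| exceeds the cutoff, so H_0: β = 0 is rejected.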

Residual analysis

We can use a residual plot (a plot of residuals vs. x values) to check whether it is reasonable to assume that the errors are independent, identically distributed normal variables. The z-scores of the residuals can be used to display a standardized residual plot:

z_resid = (resid − 0) / s_resid

but the standard deviation of each residual varies from point to point and is not automatically calculated by the TI-83. Many statistical packages, however, do perform these calculations.

What to look for:
- absence of patterns in the (standardized) residual plot
- very few large residuals (more than 2 standard deviations from the x-axis)
- no variation in the spread of the residuals (such variation would indicate that σ_e varies with x)
- no influential points (residual points far removed from the bulk of the plot)
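One common convention used by statistical packages (an assumption here, since the notes only say the per-point standard deviations vary) is to standardize each residual by s_e·√(1 − h_i), where h_i = 1/n + (x_i − x̄)²/S_xx is the leverage of the i-th point. A sketch with made-up data:

```python
# Standardized residuals: resid_i / (s_e * sqrt(1 - h_i)), where the leverage
# h_i = 1/n + (x_i - xbar)^2 / Sxx makes each residual's sd point-dependent.
import math

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
resids = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s_e = math.sqrt(sum(r ** 2 for r in resids) / (n - 2))

z_resids = [
    r / (s_e * math.sqrt(1 - (1 / n + (xi - xbar) ** 2 / sxx)))
    for xi, r in zip(x, resids)
]
large = [z for z in z_resids if abs(z) > 2]   # flag unusually large residuals
```

For these numbers no standardized residual exceeds 2 in absolute value, so nothing would be flagged on the standardized residual plot.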

The sampling distribution of a + bx*

Assuming that the n data points produce independent, identically normally distributed errors e, all with mean 0 and standard deviation σ_e, we study the distribution of the prediction statistic a + bx* for a fixed choice of x = x*.
- a + bx* is an unbiased estimate of the true regression value α + βx*, which thus represents μ_{a+bx*}.
- σ_{a+bx*} = σ_e √( 1/n + z_{x*}² / (n − 1) ), where z_{x*} = (x* − x̄) / s_x, and it is estimated by the statistic s_{a+bx*} = s_e √( 1/n + z_{x*}² / (n − 1) ).
- a + bx* is normally distributed, but replacing σ_{a+bx*} with the estimate s_{a+bx*} produces a standardized t variable with df = n − 2.

Confidence interval for a + bx*

With the same assumptions as above, the confidence interval for α + βx*, the mean value of y when x = x*, is

(a + bx*) ± (t critical value) · s_{a+bx*}

where t has df = n − 2.

Prediction intervals

With the same assumptions as above, the prediction interval for y*, a single y value observed when x = x*, is

(a + bx*) ± (t critical value) · √( s_e² + s_{a+bx*}² )

where t has df = n − 2. (The variability comes not only from the size of the error e but also from the extent to which the estimate a + bx* differs from the true mean value.)
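Both intervals can be sketched together for a small made-up data set; note that 1/n + z_{x*}²/(n − 1) equals 1/n + (x* − x̄)²/S_xx, since S_xx = (n − 1)s_x², which is the form used below:

```python
# Confidence interval for the mean of y at x*, and prediction interval for a
# single new y at x*; both use the t critical value with df = n - 2.
import math

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
s_e = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x_star = 4
y_hat = a + b * x_star
s_mean = s_e * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)  # sd of a + b*x*
s_pred = math.sqrt(s_e ** 2 + s_mean ** 2)                    # adds error sd

t_crit = 3.182               # t table value for df = 3, 95% confidence
ci = (y_hat - t_crit * s_mean, y_hat + t_crit * s_mean)       # mean of y at x*
pi = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)       # single new y at x*
```

As the parenthetical remark above explains, the prediction interval is always wider than the confidence interval at the same x*, because it carries the extra s_e² term for the error in a single observation.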