SIMPLE LINEAR REGRESSION STAT 251


OUTLINE
Relationships in Data: The Beginning; Scatterplots; Correlation; The Least Squares Line
Cautions: Association vs. Causation; Extrapolation; Outliers
Inference, Simple Linear Regression: The Theoretical Model; Testing and Estimating the Slope Coefficient; Confidence Intervals and Prediction Intervals; Verifying Assumptions
2

3 EXPLORING RELATIONSHIPS Up to now we've mostly concentrated on exploring one variable at a time. In ANOVA and hypothesis testing, we determined whether or not there is a relationship, but we've yet to explore how two variables relate. Here, we address the question of exploring relationships between two quantitative variables. We'll use an ongoing example to illustrate the concepts of data exploration.

EXPLORING LINEAR RELATIONSHIPS IN DATA 4

5 Television, Physicians, and Life Expectancy

Country           Life expectancy   People/TV   People/physician
Argentina         70.5              4           370
Bangladesh        53.5              315         6166
Brazil            65                4           684
Canada            76.5              1.7         449
China             70                8           643
Colombia          71                5.6         1551
Egypt             60.5              15          616
Ethiopia          51.5              503         36660
France            78                2.6         403
Germany           76                2.6         346
India             57.5              44          2471
Indonesia         61                24          7427
Iran              64.5              23          2992
Italy             78.5              3.8         233
Japan             79                1.8         609
Kenya             61                96          7615
Korea, North      70                90          370
Korea, South      70                4.9         1066
Mexico            72                6.6         600
Morocco           64.5              21          4873
Myanmar (Burma)   54.5              592         3485
Pakistan          56.5              73          2364
Peru              64.5              14          1016
Philippines       64.5              8.8         1062
Poland            73                3.9         480
Romania           72                6           559
Russia            69                3.2         259
South Africa      64                11          1340
Spain             78.5              2.6         275
Sudan             53                23          12550
Taiwan            75                3.2         965
Tanzania          52.5              *           25229
Thailand          68.5              11          4883
Turkey            70                5           1189
Ukraine           70.5              3           226
United Kingdom    76                3           611
United States     75.5              1.3         404
Venezuela         74.5              5.6         576
Vietnam           65                29          3096
Zaire             54                *           23193

* People/TV not available.
SOURCE: _The World Almanac and Book of Facts 1993_ (1993), New York: Pharos Books.

6 OUR GOAL We hope to establish a relationship between two variables. This relationship will allow us to make predictions about the value of one variable based on the observed value of the other. Explanatory Variable (X): the explanatory (also called predictor) variable is used to try to explain or predict the other variable. E.g. the number of people per physician in a country could be used to predict the life expectancy of its citizens. Response Variable (Y): the response variable is the variable we are trying to predict. E.g. life expectancy.

7 USING A GRAPHICAL DEVICE TO VISUALIZE THE RELATIONSHIP Is there a relationship between the number of people per physician in a country and its citizens' life expectancy? Would making histograms for both variables help us answer this question? Why? A scatterplot helps us to visualize the relationship between two quantitative variables and determine if there is any association between them.

8 MAKING A SCATTERPLOT We first identify our explanatory, or independent, variable (X), which is conveniently plotted on the x-axis, and our response, or dependent, variable (Y), conveniently plotted on the y-axis. Secondly, we collect and plot each point (x_i, y_i); these are the observations for case i. In our example, (x_1, y_1) is the number of people per physician and the life expectancy for Argentina. The origin does not have to be included in the plot.

9

10 PATTERNS OF SCATTERPLOTS When looking for patterns on a scatterplot, there are generally 4 things we should think about: the direction, the form, the scatter of the points, and whether there are any outliers.

11 1 DIRECTION The direction is Positive if the x and y values tend to go in the same direction (i.e. when x is low, y is generally low, and when x is high, y is generally high). The direction is Negative if the x and y values tend to go in opposite directions (i.e. when x is low, y is generally high, and when x is high, y is generally low).

12 2 FORM Linear Form Non-linear or Curved form No clear form

13 3 SCATTER When there is a clear form, the scatter of the points about the line or curve indicates the strength of the association: are the points tight to their form, or are they loose?

14 4 OUTLIERS Scatterplots will often reveal outliers if they are present.

TRANSFORMATIONS It's much simpler to deal with linear relationships than curved ones (for one, it allows us to use correlation). When a scatterplot has a form which is nonlinear, we can transform one or both variables to render it linear. In our example, the relationship is not linear. The most common transformation is taking the log of one of the variables. On the following page, the log of X, the number of citizens per physician, was taken.

16

17 PLOTTING GUIDELINES 1 quantitative variable: histogram, stem-and-leaf, boxplot. 1 categorical variable: bar chart, pie chart. 1 quantitative + 1 categorical: side-by-side boxplots. 2 categorical: bar charts, stacked bar charts. 2 quantitative: scatterplots.

CORRELATION As with module 1, we've first discussed how to use graphical tools to explore the relationship between two quantitative variables, and now we follow this with a numerical measure. Scatterplots are useful at displaying a relationship, but are inherently vague. Correlation measures the strength of the linear association between two quantitative variables, and is denoted r (sample) or ρ (population).

19 CALCULATING CORRELATION The sample correlation is r = Σ(x_i − x̄)(y_i − ȳ) / [(n − 1) s_x s_y]. You won't be expected to calculate these by hand!

20 PROPERTIES The direction of the association dictates the sign of the correlation, i.e. if the association is positive, then r is positive. r is always between -1 and 1. Correlations of 1 or -1 indicate perfect positive or negative association. r near zero indicates very weak or absent linear association. x and y are interchangeable. Linear transformations of the variables (with positive scale factors) do not affect the correlation.
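The invariance property above can be checked numerically. This is an illustrative sketch with made-up temperature/time data (echoing the Fahrenheit-to-Celsius question later in the slides); the `corr` helper implements the sample correlation formula directly.

```python
# Sketch: r is unchanged by a positive linear transformation of a
# variable, e.g. Fahrenheit to Celsius, C = (5/9)F - 160/9.
# The data below are made up for illustration.
from math import sqrt

def corr(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

temps_f = [300.0, 325.0, 350.0, 375.0, 400.0]  # temperature (F)
times = [60.0, 48.0, 41.0, 33.0, 28.0]         # cooking time (min)

r_f = corr(temps_f, times)
temps_c = [(5 / 9) * f - 160 / 9 for f in temps_f]
r_c = corr(temps_c, times)
print(round(r_f, 6) == round(r_c, 6))  # True: r is unchanged
```

This is exactly why the answer to the Betty Crocker question below is that the correlation stays -0.77 after converting units.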

21

22 GUIDELINES
|r| ≥ 0.90: very strong association
0.90 > |r| ≥ 0.70: fairly strong association
0.70 > |r| ≥ 0.50: somewhat weak association
0.50 > |r| ≥ 0.30: very weak association
|r| < 0.30: no/little association
* Note that we are talking about linear association.

23

24 CAREFUL Correlation is used on quantitative variables. It is a measure of linear association; the scatterplot should indicate/support linear association. Outliers can severely distort the correlation.

For the scatterplots below, assign what you think is the appropriate correlation for each, choosing from the list of numbers below: r : -1, -0.95, -0.77, -0.55, -0.35, 0, 0, 0.35, 0.50, 0.75, 0.95, 1 25

Q1 A fellow researcher is exploring the relationships between various variables. She informs you that the correlation between variables A and B is 0.85. You ask for the plot to confirm linearity and she says: "With a correlation so high, it has to be linear." Do you agree with this statement? A. Yes B. No 26

Q2 Betty Crocker was exploring the relationship between cooking time and temperature for brownies. She works in degrees Fahrenheit and found a sample correlation of -0.77. We would like to use these findings, but we are working in degrees Celsius, where C = (5/9)F − 160/9. What correlation should we report? A. -0.77 B. (5/9)(-0.77) C. (5/9)(-0.77) − 160/9 D. 1 − 0.77 27

Q3 In the Betty Crocker example, she found a correlation of -0.77. Which of the following statements is correct? A. If we increase the temperature, the cooking time will decrease. B. If we increase the temperature, we expect the cooking time will decrease. C. If we increase the temperature, the cooking time will increase. D. If we increase the temperature, we expect the cooking time will increase. 28

LINEAR REGRESSION: FITTING A LINE TO THE DATA 29

FITTING A LINE Once a linear relation is established, we seek a numerical quantification of the linear relationship between two quantitative variables. We categorized variables as explanatory (predictor) variables or response variables, and in these chapters we discuss how to go about predicting. Linear Regression is a method of fitting a straight line to a scatterplot and predicting the response (y) using the explanatory variable (x).

When trying to fit a line to a scatterplot, there are many lines which are acceptable candidates. In order to properly discuss how we choose the best-fitting one, we'll need to develop some vocabulary first. 31

THE MODEL The model fits a straight line to the data which can be used to make predictions on the response. Mathematically, the model is ŷ = b0 + b1x. The slope b1 tells us how big a change in y to expect for a unit increase in x. If it's positive, y will increase with x. Note: the intercept b0 can sometimes be meaningless.

33 RESIDUALS Once we have our regression line (from the model), for any value of x we can predict the value of y by using the corresponding y-value on the line. These predicted values are called fitted values and are denoted by ŷ. For an observed value of y, we can obtain the difference between the observation and the predicted value, which we call the residual: e = y − ŷ.

It is apparent from the above that the smaller the residuals are, the better our model is at making predictions. 34

35 MINIMIZE THE SQUARED DEVIATIONS? The residuals are both positive and negative, so we can't minimize them directly, and there are infinitely many lines whose residuals sum to zero. Recall that when calculating variance, we faced the same issue and used the sum of squared differences to quantify the spread in the data. The same trick is used again: it is the sum of squared residuals, the deviations from the line, which we minimize. Only one line leads to the minimization of the squared residuals. We call this line the least squares line.

36 THE LEAST SQUARES LINE Only one line leads to the minimization of the squared residuals. We call this line the least squares line. The Regression line and Least Squares line are the same line.
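The "only one line minimizes the squared residuals" claim can be demonstrated numerically. A minimal sketch with made-up points: compute the least-squares slope and intercept from the usual formulas, then check that perturbing either coefficient only increases the sum of squared residuals.

```python
# Sketch: among candidate lines, the least-squares line minimizes the
# sum of squared residuals (SSE). Tiny made-up dataset.
def sse(b0, b1, pts):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in pts)

pts = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
n = len(pts)
mx = sum(x for x, _ in pts) / n
my = sum(y for _, y in pts) / n
b1 = (sum((x - mx) * (y - my) for x, y in pts)
      / sum((x - mx) ** 2 for x, _ in pts))
b0 = my - b1 * mx

# Perturbing either coefficient can only increase the SSE.
best = sse(b0, b1, pts)
print(all(sse(b0 + d0, b1 + d1, pts) >= best
          for d0 in (-0.5, 0, 0.5) for d1 in (-0.1, 0, 0.1)))  # True
```

The SSE is a convex quadratic in (b0, b1), which is why the minimizer is unique.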

37

38 EXAMPLE An experiment was designed for the Department of Materials Engineering to study hydrogen embrittlement properties based on electrolytic hydrogen pressure measurements. The solution used was 0.1 N NaOH, the material being a certain type of stainless steel. The cathodic charging current density was controlled and varied at four levels. Here are some summary statistics:

Variable                                  Mean     Sample SD
Charging current density (mA/cm²)         2.03     1.187
Effective hydrogen pressure (atm)         282.68   173.025
Correlation: 0.929

Find the regression line 39

40 MORE QUESTIONS What would you predict the pressure to be if the current was 2.1 ma/cm 2? What about 4.0 ma/ cm 2? For every 1 ma/cm 2 increase in current, what is the expected increase in effective hydrogen pressure? On the 16 th trial, the current was set to 1.5 and the pressure was measured at 275.1. What would the residual for this observation be?
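These questions can be worked directly from the summary statistics, using the facts that b1 = r·(SDy/SDx) and that the least-squares line passes through (x̄, ȳ). A minimal sketch (the rounding in the comments is mine):

```python
# Worked sketch of the embrittlement example from the slide's
# summary statistics (means, SDs, correlation).
xbar, sx = 2.03, 1.187      # charging current density (mA/cm^2)
ybar, sy = 282.68, 173.025  # effective hydrogen pressure (atm)
r = 0.929

b1 = r * sy / sx            # slope = r * (SD_y / SD_x)
b0 = ybar - b1 * xbar       # the line passes through (xbar, ybar)
print(round(b1, 2), round(b0, 2))   # ~135.42 and ~7.78

predict = lambda x: b0 + b1 * x
print(round(predict(2.1), 1))       # predicted pressure at 2.1 mA/cm^2
print(round(predict(4.0), 1))       # predicted pressure at 4.0 mA/cm^2

# Residual for the 16th trial: observed 275.1 atm at current 1.5
resid = 275.1 - predict(1.5)
print(round(resid, 1))              # ~64.2
```

Note that b1 itself answers the "expected increase in pressure per 1 mA/cm² increase in current" question.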

SOME REMARKS The regression line goes through the mean-mean point (x̄, ȳ). Interpreting the slope b1: on average, an increase of one SDx in X is associated with a change of r × SDy in Y. So in our example, for every 1.187 mA/cm² shift in current, we have an expected shift of (0.929)(173.025) ≈ 160.7 atm in pressure. Interpreting the intercept b0: the predicted value for x = 0.

CAUTIONS: CAUSATION, EXTRAPOLATION AND OUTLIERS 42

CAREFUL ASSOCIATION IS NOT CAUSATION To make predictions, we only need association, not causation. Observing strong association does not imply causation. Causation leads to association, but association does not necessarily lead to causation. Association may be purely due to chance. 43

HOW WERE THE DATA COLLECTED? There may be an underlying variable, called a lurking variable, which is associated with both x and y. The way the data are obtained dictates whether we can infer causation: an Experiment removes the influence of other variables, so we can conclude causation; an observational Study is susceptible to the influence of other variables. 44

EXAMPLES

46 (The Television, Physicians, and Life Expectancy table from slide 5 is repeated here.) SOURCE: _The World Almanac and Book of Facts 1993_ (1993), New York: Pharos Books.

EXTRAPOLATION Extrapolation is when we try to predict the response variable for a value of the explanatory variable which is outside the range of our observed explanatory variable. Interpolation is when we try to predict within that range. Extrapolating assumes that the relationship between the two variables continues beyond the limits of this range. Often this gives misleading predictions, as this assumption doesn't hold. Predicting the future through regression is always extrapolating. 47

EXAMPLES Some data on weight and age of girls between the ages of 2 and 10 were collected. The relationship is linear and very strong. The model which arises from the data is: Weight = 14.34 + 5.89(age). So one could interpolate the weight of an average 5-year-old to be 14.34 + 5.89(5) = 43.79. But were we to trust this model to go on beyond the range of ages here, what would we predict the weight of a 40-year-old to be? Does this make sense? 48

The danger of making predictions outside the range of the observed x values is that the linear relationship for the observed data may no longer hold once we leave the range. 49
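The weight example above can be evaluated in two lines; the function simply encodes the fitted model quoted on the slide.

```python
# Sketch of the extrapolation caution: the model fitted to girls
# aged 2-10 gives a sensible interpolation at age 5 but an absurd
# extrapolation at age 40.
weight = lambda age: 14.34 + 5.89 * age

print(round(weight(5), 2))   # 43.79 -- inside the observed age range
print(round(weight(40), 2))  # 249.94 -- far outside the range; not credible
```

The arithmetic is fine at both ages; it is the linearity assumption, not the computation, that fails at 40.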

BEWARE OF INFLUENTIAL POINTS We've already discussed how problematic outliers can be in the context of summary statistics (e.g. mean and variance). Outliers are also problematic in regression. We'll define three types of outliers and how they differ in their effect on the regression line: y-outliers, x-outliers (high-leverage points), and model outliers. 50

51

INFLUENTIAL POINTS We call an observation influential if omitting it from the analysis will largely change the model. If a high-leverage point or a y-outlier is also a model outlier, then it is an influential point. When an outlier is present, one should fit two models: one with and one without the potentially influential point. The outlier shouldn't be omitted without justification. 52

THE EFFECT OF A NON-INFLUENTIAL OUTLIER Outliers which aren't model outliers can still affect the regression. Including the outlier can, in some cases, raise the R². (The two plots shown: R² = 0.263 and R² = 0.7367.) 53

Q4 The next step Betty Crocker took was to use her data to estimate a regression line. What is the response variable here? A. Temperature B. Cooking Time 54

Q5 The estimated regression line was: Y = 25 − (1/30)X. The best interpretation of this slope would be: A. For every degree we raise the temperature, we reduce the cooking time by 1/30 of a minute. B. The cooking time decreases as we increase the temperature. C. For every degree we raise the temperature, we expect to reduce the cooking time by 1/30 of a minute. D. On average, increasing the temperature by one degree will decrease the cooking time by 1/30 of a second. 55

Q6 The estimated regression line found by Betty Crocker is: Y = 25 − (1/30)X. She then cooked brownies in 10 minutes. What temperature do you predict she cooked at? A. 25 − (1/30)×10 = 24.66 B. (25 − 10)×30 = 450 C. (25 + 10)×30 = 1050 D. Can't tell from this information 56

THE STOCHASTIC MODEL AND ASSUMPTIONS

BRINGING THE STATISTICS TO REGRESSION The Least Squares Line is found on purely mathematical grounds. In order to make statistical inference, we expand the model slightly: Y = β0 + β1x + ε. So for a single response we have Y_i = β0 + β1 x_i + ε_i, where ε_i ~ N(0, σ²). 58

FIGURE 17.3 DISTRIBUTION OF Y GIVEN X

THE ASSUMPTIONS There are 4 assumptions made in Simple Linear Regression: the errors are Normally distributed; the variance of the errors is constant; the observations are independent; the relation is linear. Note: the assumptions and the model go hand in hand. Equivalently, in terms of the responses: the observations (y_i) are Normally distributed, the variance is constant (homoscedasticity), and the relationship is linear. 60

DIAGNOSTICS Independence: it is determined through design, not by graphical investigation. Normality: verified using a histogram or a QQ plot of the residuals. Homoscedasticity (constant variance): plot the residuals against the fitted values; look out for patterns, as they indicate that the assumptions are not met. Linearity: verified using a scatterplot or a residual plot. Outliers: we should also look at the scatterplots for outliers, as these can be influential points and should be investigated. 61
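The residual-based checks above all start the same way: fit the line, then compute fitted values and residuals. A minimal sketch with made-up data (in practice you would plot `resid` against `fitted`, plus a histogram or QQ plot of `resid`; here only the always-true zero-sum property of least-squares residuals is verified):

```python
# Diagnostic sketch: fit a least-squares line, then examine residuals.
# Patterns in (fitted, residual) pairs suggest non-linearity or
# non-constant variance. Data are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.2, 4.1, 5.8, 8.3, 9.9, 12.2]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))
b0 = my - b1 * mx

fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

# Least-squares residuals always sum to (essentially) zero.
print(abs(sum(resid)) < 1e-9)  # True
```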

62

63

INFERENCE PART 1: INFERENCE ON THE MODEL 64

WHEN GIVEN A MODEL Suppose someone collected data and estimated a model. Without the data, we have no idea how good the model is. Here are questions we may want to ask about the model: How good is the model at predicting? How strong is the relationship? Should we use the explanatory variable to estimate the mean of the response? Is the slope significant? How can I construct a confidence interval for the mean of the response? How can I create a prediction interval for an individual meeting certain criteria? 65

ANSWERS How good is the model at predicting? The Coefficient of Determination. Is the slope significant? The t-test for the slope. How can I construct a confidence interval for the mean of the response? A confidence interval for the expected value of y. How can I create a prediction interval for an individual meeting certain criteria? A prediction interval. 66

SO YOU'VE DETERMINED THAT THE MODEL IS LINEAR Having determined that the model is a line and not just a mean, we want to know: how good at making predictions is our model? Recall that correlation is a measure of the linear association between two variables. Obviously, the stronger the association, the better the model will be at making predictions. 67

COEFFICIENT OF DETERMINATION The Coefficient of Determination is simply a re-expression of the correlation which lends itself better to the question at hand: R² = r², i.e. Coefficient of Determination = Correlation². It can be interpreted as the proportion of the variation in y explained by the regression on x. 68

QUESTIONS Does a higher Coefficient of Determination imply a better model? If we reject the Null Hypothesis of the ANOVA test, do we also reject the t-test for the slope? What if we fail to reject the ANOVA, do we also fail to reject the t-test? 70

TESTING THE SLOPE As before, ANOVA is a generalization of a t-test. We can use a t-test to test for the slope; unlike ANOVA, it also allows one-sided alternatives. H0: β1 = 0 vs. HA: β1 ≠ 0, or HA: β1 < 0, or HA: β1 > 0. The conditions required for this test are those required for Simple Linear Regression. What are they? 71

TESTING THE SLOPE 2 If these are met, then the sampling distribution of b1 is: Normal, with mean β1 and standard error SE(b1) = s_e / √(Σ(x_i − x̄)²), where s_e = √(SSE / (n − 2)). 72

TESTING THE SLOPE 3 The test follows the same form that all our t-tests have followed: t = b1 / SE(b1), with n − 2 degrees of freedom. We can also construct a confidence interval for the slope: b1 ± t* SE(b1). 73
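The slope test can be sketched numerically from raw data; the dataset below is made up for illustration, and only the test statistic (not the p-value) is computed.

```python
# Sketch of the slope t-test under the SLR model (H0: beta1 = 0):
#   t = b1 / SE(b1),  SE(b1) = s_e / sqrt(Sxx),  s_e = sqrt(SSE/(n-2)),
# on n - 2 degrees of freedom. Data are made up.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.9, 4.2, 5.7, 8.1, 9.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
b0 = my - b1 * mx

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se_b1 = sqrt(sse / (n - 2)) / sqrt(sxx)
t = b1 / se_b1
print(round(t, 1), "on", n - 2, "df")  # a large |t| rejects H0
```

Comparing t to the t distribution with n − 2 df gives the p-value; here the nearly perfect linear trend makes t very large.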

Q7 A suspicious Elf measured the relation between the value of toys given at Christmas and the degree of goodness of children (don't ask how). He obtained the following 95% confidence interval for the slope: [-1.2, 5.6]. Should we use degrees of goodness to predict the value of gifts? A. Yes, the slope appears to be positive. B. Yes, it's better than using nothing. C. No, the slope is not significant. 74

75 INFERENCE USING THE MODEL

HOW THIS DIFFERS The inference we saw in the last section pertained to the model itself: Is the mean of the response variable Y a constant, or is it a conditional mean, conditional on the value of the explanatory variable? If it is conditional, how much are we gaining in predictive power by using a conditional mean instead of a constant? In this section, we look to infer on: the conditional population mean; the result of an individual within a conditional population. 76

Figure 17.3 Distribution of y Given x

CONFIDENCE INTERVALS The book talks about Expected Value, which is just a fancy word for Mean. In this case it's a conditional mean. Given a specific value x0 of x, we can construct a confidence interval for the mean response: ŷ ± t* s_e √(1/n + (x0 − x̄)²/Sxx), with n − 2 degrees of freedom. 78

PREDICTION INTERVALS Given a specific value x0 of x, we may be interested in predicting the behaviour of an individual rather than the mean. We have to widen the interval slightly to account for the extra variability observed in individuals rather than means: ŷ ± t* s_e √(1 + 1/n + (x0 − x̄)²/Sxx). 79

Interval Estimates and Prediction Intervals

New example here AND new problem after

EXAMPLE: OXYGEN DEMAND One of the more challenging problems confronting the water pollution control field is presented by the tanning industry. Their wastes are chemically complex. We consider the experimental data obtained from 33 samples of chemically treated waste. The variables are: the percent reduction in total solids, and the percent reduction in chemical oxygen demand. 82

Variable                      Mean    Sample SD
Solid residue reduction       33.45   11.39
Oxygen demand reduction       34.06   10.77
Correlation: 0.955; SSE: 10026.02 83

EXERCISE 1. Estimate the regression line 2. Construct a 95% confidence interval for the slope. 3. Construct a 95% Confidence Interval for the mean chemical oxygen demand of water with 32% solids reduction. 4. Construct a 95% Prediction Interval for water with 40% solids reduction. 84
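Parts 1 and 2 of the exercise can be sketched from the summary statistics alone, taking the SSE value as given on the slide. The critical value t_{0.025, 31} ≈ 2.04 is hard-coded as an approximation (in practice you would look it up or use a t-distribution function); parts 3 and 4 additionally need the 1/n + (x0 − x̄)²/Sxx adjustment shown on the earlier slides, so only the point estimate at x0 = 32 is printed here.

```python
# Sketch of the oxygen-demand exercise from the slide's summary
# statistics (n = 33 samples; SSE as given on the slide).
from math import sqrt

n = 33
xbar, sx = 33.45, 11.39   # % reduction in total solids
ybar, sy = 34.06, 10.77   # % reduction in chemical oxygen demand
r, sse = 0.955, 10026.02

# Part 1: estimated regression line
b1 = r * sy / sx
b0 = ybar - b1 * xbar
print(round(b1, 3), round(b0, 3))   # slope ~0.903, intercept ~3.854

# Part 2: 95% CI for the slope, b1 +/- t* s_e / sqrt(Sxx)
sxx = (n - 1) * sx ** 2
s_e = sqrt(sse / (n - 2))
t_star = 2.04                        # approx. t_{0.025, 31}
half = t_star * s_e / sqrt(sxx)
print(round(b1 - half, 3), round(b1 + half, 3))

# Point estimate of mean oxygen demand at 32% solids reduction (part 3
# would widen this into a CI; part 4 into a prediction interval).
predict = lambda x: b0 + b1 * x
print(round(predict(32), 2))
```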