SIMPLE LINEAR REGRESSION STAT 251
- Agatha George
OUTLINE
Relationships in Data
- The Beginning
- Scatterplots
- Correlation
- The Least Squares Line
Cautions
- Association vs. Causation
- Extrapolation
- Outliers
Inference: Simple Linear Regression
- The Theoretical Model
- Testing and Estimating the Slope Coefficient
- Confidence Intervals and Prediction Intervals
- Verifying Assumptions
EXPLORING RELATIONSHIPS
Up to now we've mostly concentrated on exploring one variable at a time. In ANOVA and hypothesis testing, we determined whether or not there is a relationship, but we've yet to explore how two variables relate. Here, we address the question of exploring relationships between two quantitative variables. We'll use an ongoing example to illustrate the concepts of data exploration.
EXPLORING LINEAR RELATIONSHIPS IN DATA
Television, Physicians, and Life Expectancy
Columns: Country, Life expectancy, People/TV, People/physician
Countries: Argentina, Bangladesh, Brazil, Canada, China, Colombia, Egypt, Ethiopia, France, Germany, India, Indonesia, Iran, Italy, Japan, Kenya, Korea (North), Korea (South), Mexico, Morocco, Myanmar (Burma), Pakistan, Peru, Philippines, Poland, Romania, Russia, South Africa, Spain, Sudan, Taiwan, Tanzania 52.5*, Thailand, Turkey, Ukraine, United Kingdom, United States, Venezuela, Vietnam, Zaire 54*
SOURCE: The World Almanac and Book of Facts 1993 (1993), New York: Pharos Books.
OUR GOAL
We hope to establish a relationship between two variables. This relationship will allow us to make predictions about the value of one variable based on the observed value of the other.
- Explanatory Variable (X): The explanatory (also called predictor) variable is used to try to explain or predict the other variable. E.g. the number of physicians in a country could be used to predict the life expectancy of its citizens.
- Response Variable (Y): The response variable is the variable we are trying to predict. E.g. life expectancy.
USING A GRAPHICAL DEVICE TO VISUALIZE THE RELATIONSHIP
Is there a relationship between the number of physicians in a country and its citizens' life expectancy? Would making histograms for both variables help us answer this question? Why?
A scatterplot helps us to visualize the relationship between two quantitative variables and determine if there is any association between them.
MAKING A SCATTERPLOT
First, we identify our explanatory, or independent, variable (X), which is conventionally plotted on the x-axis, and our response, or dependent, variable (Y), conventionally plotted on the y-axis.
Second, we collect and plot each point $(x_i, y_i)$. These are the observations for case i. In our example, $(x_1, y_1)$ are the number of people per physician and the life expectancy for Argentina.
The origin does not have to be included in the plot.
PATTERNS OF SCATTERPLOTS
When looking for patterns on a scatterplot, there are generally four things to consider: the direction, the form, the scatter of the points, and whether there are any outliers.
1. DIRECTION
The direction is positive if the x and y values tend to go in the same direction, i.e. when x is low y is generally low, and when x is high y is generally high.
The direction is negative if the x and y values tend to go in opposite directions, i.e. when x is low y is generally high, and when x is high y is generally low.
2. FORM
- Linear form
- Non-linear or curved form
- No clear form
3. SCATTER
When there is a clear form, the scatter of the points about the line or curve indicates the strength of the association. Are the points tight to their form, or are they loose?
4. OUTLIERS
Scatterplots will often reveal outliers if they are present.
TRANSFORMATIONS
It's much simpler to deal with linear relationships than curved ones (for one, it will allow us to use correlation). When a scatterplot has a nonlinear form, we can transform one or both variables to render it linear.
In our example, the relationship is not linear. The most common transformation is taking the log of one of the variables. On the following slide, the log of X, the number of citizens per physician, was taken.
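As a rough illustration of why a log transformation can straighten a curved pattern, the sketch below uses made-up numbers (not the almanac data) in which y drops by a fixed amount every time x is multiplied by 10. The variable names and the helper `consecutive_slopes` are hypothetical and only serve this example.

```python
import math

# Hypothetical values (not the almanac data): y drops by 10 whenever x is
# multiplied by 10, which is curved in x but a straight line in log10(x).
x = [100, 1000, 10000, 100000]
y = [80.0, 70.0, 60.0, 50.0]

def consecutive_slopes(xs, ys):
    """Slope of the segment joining each pair of neighbouring points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

raw_slopes = consecutive_slopes(x, y)
log_slopes = consecutive_slopes([math.log10(v) for v in x], y)

print("slopes against x:       ", raw_slopes)  # change from segment to segment
print("slopes against log10(x):", log_slopes)  # essentially constant
```

If the slopes between neighbouring points are roughly constant after the transformation, a straight line is a reasonable description of the transformed data.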
PLOTTING GUIDELINES
- 1 quantitative variable: histogram, stem-and-leaf, boxplot
- 1 categorical variable: bar chart, pie chart
- 1 quantitative, 1 categorical: side-by-side boxplots
- 2 categorical: bar charts, stacked bar charts
- 2 quantitative: scatterplots
CORRELATION
As with module 1, we've first discussed how to use graphical tools to explore the relationship between two quantitative variables, and now we follow this with a numerical measure. Scatterplots are useful for displaying a relationship, but are inherently vague.
Correlation measures the strength of the linear association between two quantitative variables, and is denoted by the letter r (sample) or ρ (population).
CALCULATING CORRELATION
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
You won't be expected to calculate these by hand!
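A minimal sketch of the calculation, assuming the usual definition of the sample correlation as the sum of products of standardized x and y values divided by n - 1. The data and the function names `sample_sd` and `correlation` are made up for illustration.

```python
import math

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x, sd_y = sample_sd(xs), sample_sd(ys)
    z_products = (((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(f"r = {r:.3f}")
```

In practice a statistics package does this for you; the point is only that r is built from standardized values, which is why it has no units.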
PROPERTIES
- The direction of the association dictates the sign of the correlation, i.e. if the association is positive, then r is positive.
- r is always between -1 and 1.
- Correlations of 1 or -1 indicate perfect positive or negative linear association.
- r near zero indicates very weak or absent linear association.
- x and y are interchangeable.
- Linear transformations of the variables (changes of units) do not affect the size of the correlation; multiplying one variable by a negative constant only flips its sign.
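These properties can be checked numerically. The sketch below uses made-up height and weight values; the `correlation` helper and all variable names are assumptions for the example, not part of the course material.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, for illustration only.
height_in = [60, 62, 65, 68, 71, 74]
weight_lb = [110, 125, 130, 155, 160, 190]

# Linear transformations: rescale one variable, rescale and shift the other.
height_cm = [2.54 * h for h in height_in]
weight_transformed = [0.4536 * w + 5 for w in weight_lb]

r_original = correlation(height_in, weight_lb)
r_transformed = correlation(height_cm, weight_transformed)
r_swapped = correlation(weight_lb, height_in)
r_negated = correlation([-h for h in height_in], weight_lb)

print(f"original:             {r_original:.4f}")
print(f"after unit changes:   {r_transformed:.4f}")  # same value
print(f"x and y swapped:      {r_swapped:.4f}")      # same value
print(f"x multiplied by -1:   {r_negated:.4f}")      # same size, opposite sign
```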
GUIDELINES
- $|r| \ge 0.90$: very strong association
- $0.90 > |r| \ge 0.70$: fairly strong association
- $0.70 > |r| \ge 0.50$: somewhat weak association
- $0.50 > |r| \ge 0.30$: very weak association
- $|r| < 0.30$: no/little association
* Note that we are talking about linear association.
CAREFUL
- Correlation is used on quantitative variables.
- It is a measure of linear association. The scatterplot should indicate/support linear association.
- Outliers can severely distort the correlation.
For the scatterplots below, assign what you think is the appropriate correlation for each, choosing from the list of numbers below:
r: -1, -0.95, -0.77, -0.55, -0.35, 0, 0, 0.35, 0.50, 0.75, 0.95, 1
Q1
A fellow researcher is exploring the relationships between various variables. She informs you that the correlation between variables A and B is very high. You ask for the plot to confirm linearity and she says: "With a correlation so high it has to be linear." Do you agree with this statement?
A. Yes
B. No
Q2
Betty Crocker was exploring the relationship between cooking time and temperature for brownies. She works in degrees Fahrenheit and found a sample correlation of -0.77. We would like to use these findings, but we are working in degrees Celsius, which is (5/9)X - 160/9, where X is the temperature in degrees Fahrenheit. What correlation should we report?
A.
B. (5/9)(-0.77)
C. (5/9)(-0.77) - 160/9
D.
Q3
In the Betty Crocker example, she found a correlation of -0.77. Which of the following statements is correct?
A. If we increase the temperature, the cooking time will decrease.
B. If we increase the temperature, we expect the cooking time will decrease.
C. If we increase the temperature, the cooking time will increase.
D. If we increase the temperature, we expect the cooking time will increase.
LINEAR REGRESSION: FITTING A LINE TO THE DATA
FITTING A LINE
Once a linear relation is established, we seek a numerical quantification of the linear relationship between two quantitative variables. We categorized variables as being explanatory (predictor) variables or response variables, and in these chapters we discuss how to go about predicting.
Linear regression is a method of fitting a straight line to a scatterplot and predicting the response (y) using the explanatory variable (x).
When trying to fit a line to a scatterplot, there are many lines which are acceptable candidates. In order to properly discuss how we choose the best-fitting line, we'll need to develop some vocabulary first.
THE MODEL
The model fits a straight line to the data which can be used to make predictions of the response. Mathematically, the model is
$\hat{y} = b_0 + b_1 x$
where $b_0$ is the intercept and $b_1$ is the slope.
The slope tells us how big a change in y to expect for a unit increase in x. If it's positive, y will increase with x.
Note: the intercept can sometimes be meaningless.
RESIDUALS
Once we have our regression line (from the model), for any value of x we can predict the value of y by using the corresponding y-value on the line. These predicted values are called fitted values and are denoted by $\hat{y}$.
For an observed value of y, we can obtain the difference between the observation and the predicted value, which we call the residual: $e = y - \hat{y}$.
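To make the vocabulary concrete, the sketch below computes fitted values and residuals for a hypothetical line and a handful of made-up observations. The intercept, slope, and data are assumptions chosen only for illustration.

```python
# Hypothetical line y-hat = 2.0 + 0.5 x and made-up observations.
b0, b1 = 2.0, 0.5
x = [1, 2, 3, 4]
y = [2.4, 3.1, 3.4, 4.2]

# Fitted value: the height of the line at each observed x.
fitted = [b0 + b1 * xi for xi in x]

# Residual: observed y minus fitted y (positive when the point is above the line).
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x={xi}  observed={yi:.1f}  fitted={fi:.1f}  residual={ei:+.1f}")
```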
The smaller the residuals are, the better our model is at making predictions.
MINIMIZE THE SQUARED DEVIATIONS?
The residuals are both positive and negative, so we can't simply minimize their sum: there are infinitely many lines whose residuals sum to zero.
Recall that when calculating variance, we faced the same issue and used the sum of squared differences to quantify the spread in the data. The same trick is used again: it is the sum of squared residuals, or squared deviations from the line, which we minimize.
THE LEAST SQUARES LINE
Only one line leads to the minimization of the squared residuals. We call this line the least squares line.
The regression line and the least squares line are the same line.
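A minimal sketch of the idea, assuming the standard summary-statistic formulas for the least squares line (slope $b_1 = r \, s_y / s_x$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$). The data are made up and the function names are hypothetical. The last step nudges the intercept and slope to show that each nearby line has a larger sum of squared residuals.

```python
import math

def least_squares_from_summaries(mean_x, mean_y, sd_x, sd_y, r):
    """Slope and intercept of the least squares line from summary statistics."""
    b1 = r * sd_y / sd_x        # slope: r times (SD of y / SD of x)
    b0 = mean_y - b1 * mean_x   # the line passes through (mean_x, mean_y)
    return b0, b1

def sum_squared_residuals(b0, b1, xs, ys):
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / ((n - 1) * sd_x * sd_y)

b0, b1 = least_squares_from_summaries(mean_x, mean_y, sd_x, sd_y, r)
best = sum_squared_residuals(b0, b1, x, y)

# Any nudge to the intercept or slope should give a larger sum of squares.
nudged = [
    sum_squared_residuals(b0 + d0, b1 + d1, x, y)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.05, 0.0, 0.05)
    if (d0, d1) != (0.0, 0.0)
]

print(f"least squares line: y-hat = {b0:.2f} + {b1:.2f} x")
print(f"sum of squared residuals: {best:.2f}")
print("every nudged line is worse:", all(s > best for s in nudged))
```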
EXAMPLE
An experiment was designed for the Department of Materials Engineering to study hydrogen embrittlement properties based on electrolytic hydrogen pressure measurements. The solution used was 0.1 N NaOH, the material being a certain type of stainless steel. The cathodic charging current density was controlled and varied at four levels.
Here are some summary statistics (Mean, Sample SD, Correlation) for the two variables:
- Charging Current Density (mA/cm^2)
- Effective Hydrogen Pressure (atm)
Find the regression line.
MORE QUESTIONS
- What would you predict the pressure to be if the current was 2.1 mA/cm^2? What about 4.0 mA/cm^2?
- For every 1 mA/cm^2 increase in current, what is the expected increase in effective hydrogen pressure?
- On the 16th trial, the current was set to 1.5 mA/cm^2 and the pressure was measured. What would the residual for this observation be?
SOME REMARKS
- The regression line goes through the mean-mean point $(\bar{x}, \bar{y})$.
- Interpreting the slope $b_1$: on average, an increase of 1 $s_x$ (one standard deviation of X) in X is associated with a change of $r \cdot s_y$ in Y. So in our example, for every 1.187 mA/cm^2 shift in current, we have an expected shift of (0.929)(sample SD of pressure) in pressure.
- Interpreting the intercept $b_0$: the predicted value for x = 0.
CAUTIONS: CAUSATION, EXTRAPOLATION AND OUTLIERS
CAREFUL: ASSOCIATION IS NOT CAUSATION
- To make predictions, we only need association, not causation.
- Observing strong association does not imply causation.
- Causation leads to association, but association does not necessarily lead to causation.
- Association may be purely due to chance.
HOW WERE THE DATA COLLECTED?
There may be an underlying variable, called a lurking variable, which is associated with both x and y.
The way the data are obtained determines whether we can infer causation:
- An experiment removes the influence of other variables, so we can conclude causation.
- An observational study is susceptible to the influence of other variables.
EXAMPLES
Television, Physicians, and Life Expectancy
The country table from the start of the module (life expectancy, people per TV, people per physician) serves as the example here.
SOURCE: The World Almanac and Book of Facts 1993 (1993), New York: Pharos Books.
EXTRAPOLATION
Extrapolation is when we try to predict the response variable for a value of the explanatory variable which is outside the range of our observed explanatory variable. Interpolation is when we try to predict within that range.
Extrapolating assumes that the relationship between the two variables continues beyond the limits of this range. Often this assumption doesn't hold, and the predictions can be misleading.
Predicting the future through regression is always extrapolating.
EXAMPLES
Some data on weight and age of girls between the ages of 2 and 10 were collected. The relationship is linear and very strong. The model which arises from the data is a line of the form Weight = $b_0 + b_1$(age), so one could interpolate the weight of an average 5-year-old.
But were we to trust this model to go on beyond the range of ages here, what would we predict the weight of a 40-year-old to be? Does this make sense?
The danger of making predictions outside the range of the observed x values is that the linear relationship for the observed data may not hold anymore once we leave the range.
BEWARE OF INFLUENTIAL POINTS
We've already discussed how problematic outliers can be in the context of summary statistics (e.g. mean and variance). Outliers are also problematic in regression.
We'll define three types of outliers and how they differ in their effect on the regression line, including:
- y-outliers
- x-outliers
INFLUENTIAL POINTS
We call an observation influential if omitting it from the analysis substantially changes the fitted model. If a high-leverage point or a y-outlier is also a model outlier, then it is an influential point.
When an outlier is present, one should fit two models: one with and one without the potentially influential point. The outlier shouldn't be omitted without justification.
THE EFFECT OF A NON-INFLUENTIAL OUTLIER
Outliers which aren't model outliers can still affect the regression. Including the outlier can, in some cases, raise the $R^2$. In the example shown, one of the two fits has $R^2 = 0.263$.
Q4
The next step Betty Crocker took was to use her data to estimate a regression line. What is the response variable here?
A. Temperature
B. Cooking Time
Q5
The estimated regression line was: Y = 25 - (1/30)X
The best interpretation of this slope would be:
A. For every degree we raise the temperature, we reduce the cooking time by 1/30 of a minute.
B. The cooking time decreases as we increase the temperature.
C. For every degree we raise the temperature, we expect to reduce the cooking time by 1/30 of a minute.
D. On average, increasing the temperature by one degree will decrease the cooking time by 1/30 of a second.
Q6
The estimated regression line found by Betty Crocker is: Y = 25 - (1/30)X
She then cooked brownies in 10 minutes. What temperature do you predict she cooked at?
A. 25 - (1/30) x 10 = 24.67
B. (25 - 10) x 30 = 450
C. (25 + 10) x 30 = 1050
D. Can't tell from this information
THE STOCHASTIC MODEL AND ASSUMPTIONS
BRINGING THE STATISTICS TO REGRESSION
The least squares line is found on purely mathematical grounds. In order to make statistical inference, we expand the model slightly:
$Y = \beta_0 + \beta_1 x + \varepsilon$
So for a single response we have
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$
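One way to get a feel for the expanded model is to simulate from it. The sketch below assumes hypothetical parameter values ($\beta_0 = 10$, $\beta_1 = 2$, $\sigma = 1.5$) and draws many responses at one fixed x. The responses scatter Normally around the line's height at that x, with standard deviation close to $\sigma$.

```python
import math
import random

random.seed(251)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 10.0, 2.0, 1.5

def draw_response(x):
    """One response from Y = beta0 + beta1 * x + error, error ~ N(0, sigma^2)."""
    return beta0 + beta1 * x + random.gauss(0, sigma)

x_value = 2.0
draws = [draw_response(x_value) for _ in range(10_000)]

mean_draw = sum(draws) / len(draws)
sd_draw = math.sqrt(sum((d - mean_draw) ** 2 for d in draws) / (len(draws) - 1))

print(f"line height at x = {x_value}: {beta0 + beta1 * x_value:.2f}")
print(f"average of simulated responses: {mean_draw:.2f}")
print(f"SD of simulated responses:      {sd_draw:.2f}")
```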
FIGURE 17.3: DISTRIBUTION OF Y GIVEN X
THE ASSUMPTIONS
There are 4 assumptions made in simple linear regression:
- The errors are Normally distributed
- The variance of the errors is constant
- The observations are independent
- The relation is linear
Note: the assumptions and the model go hand in hand:
- The observations ($y_i$) are Normally distributed
- The variance is constant (homoscedasticity)
- The relationship is linear
DIAGNOSTICS
- Independence: determined through the design of the study, not by graphical investigation.
- Normality: verified using a histogram or a QQ plot of the residuals.
- Homoscedasticity (constant variance): plot the residuals against the fitted values. Look out for patterns, as they indicate that the assumptions are not met.
- Linearity: verified using a scatterplot or a residual plot.
- Outliers: we should also look at the scatterplots for outliers. Those that substantially change the fitted model are called influential points and need careful handling.
INFERENCE PART 1: INFERENCE ON THE MODEL
WHEN GIVEN A MODEL
Suppose someone collected data and estimated a model. Without the data, we have no idea how good the model is. Here are questions we may want to ask about the model:
- How good is the model at predicting? How strong is the relationship?
- Should we use the explanatory variable to estimate the mean of the response? Is the slope significant?
- How can I construct a confidence interval for the mean of the response?
- How can I create a prediction interval for an individual meeting certain criteria?
ANSWERS
- How good is the model at predicting? Coefficient of determination
- Is the slope significant? t-test for the slope
- How can I construct a confidence interval for the mean of the response? Confidence interval for the expected value of y
- How can I create a prediction interval for an individual meeting certain criteria? Prediction interval
SO YOU'VE DETERMINED THAT THE MODEL IS LINEAR
Having determined that the model is a line and not just a mean, we want to know: how good is our model at making predictions?
Recall that correlation is a measure of the linear association between two variables. The stronger the association, the better the model will be at making predictions.
COEFFICIENT OF DETERMINATION
The coefficient of determination is simply a re-expression of the correlation which lends itself better to the question at hand:
$R^2 = r^2$, i.e. coefficient of determination = (correlation)$^2$
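A quick numerical check, assuming the common alternative expression $R^2 = 1 - SSE/SST$ (the fraction of the variation in y accounted for by the line). The sketch uses made-up data and hypothetical variable names, and shows that this expression agrees with the squared correlation for a least squares line.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares (SST)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares line and its sum of squared residuals (SSE).
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)
r_squared_from_ss = 1 - sse / syy

print(f"r^2         = {r ** 2:.3f}")
print(f"1 - SSE/SST = {r_squared_from_ss:.3f}")
```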
QUESTIONS
- Does a higher coefficient of determination imply a better model?
- If we reject the null hypothesis of the ANOVA test, do we also reject the null hypothesis of the t-test for the slope?
- What if we fail to reject the null hypothesis of the ANOVA test: do we also fail to reject it in the t-test?
TESTING THE SLOPE
As before, ANOVA is a generalization of a t-test, so we can use a t-test to test the slope. Compared to ANOVA, we can test a specific side and not only whether the slope is nonzero.
$H_0: \beta_1 = 0$
$H_A: \beta_1 \ne 0$ or $H_A: \beta_1 < 0$ or $H_A: \beta_1 > 0$
The conditions required for this test are those required for simple linear regression. What are they?
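TESTING THE SLOPE 2
If these are met, then the sampling distribution of $b_1$:
- is Normal
- has mean $\beta_1$
- has standard error $SE(b_1) = \frac{s_e}{s_x \sqrt{n-1}}$
Here $s_e = \sqrt{SSE/(n-2)}$ is the standard deviation of the residuals and $s_x$ is the sample standard deviation of x.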
TESTING THE SLOPE 3
The test follows the same form that all our t-tests have followed:
$t = \frac{b_1 - 0}{SE(b_1)}$
with n - 2 degrees of freedom.
We can also construct a confidence interval for the slope: $b_1 \pm t^*_{n-2} \, SE(b_1)$
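A sketch of the slope test on made-up data, assuming the usual textbook formulas: $s_e = \sqrt{SSE/(n-2)}$, $SE(b_1) = s_e / \sqrt{\sum (x_i - \bar{x})^2}$ (the same as $s_e / (s_x\sqrt{n-1})$), and $t = b_1 / SE(b_1)$ on n - 2 degrees of freedom. The function name `slope_inference` is hypothetical, and the critical value is read from a t table rather than computed.

```python
import math

def slope_inference(xs, ys, t_critical):
    """Slope, its standard error, t statistic for H0: slope = 0, and a CI."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

    s_e = math.sqrt(sse / (n - 2))   # residual standard deviation
    se_b1 = s_e / math.sqrt(sxx)     # same as s_e / (s_x * sqrt(n - 1))
    t_stat = b1 / se_b1              # compare with t on n - 2 df
    ci = (b1 - t_critical * se_b1, b1 + t_critical * se_b1)
    return b1, se_b1, t_stat, ci

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# 3.182 is the two-sided 95% critical value of t with n - 2 = 3 df (t table).
b1, se_b1, t_stat, ci = slope_inference(x, y, t_critical=3.182)

print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}, t = {t_stat:.3f}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these made-up numbers the interval contains zero, so the two-sided test at the 5% level would not reject $H_0: \beta_1 = 0$. The confidence interval and the two-sided t-test always agree in this way.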
Q7
A suspicious Elf measured the relation between the value of toys given at Christmas and the degree of goodness of children (don't ask how). He obtained the following 95% confidence interval for the slope: [-1.2, 5.6]. Should we use degrees of goodness to predict the value of gifts?
A. Yes, the slope appears to be positive.
B. Yes, it's better than using nothing.
C. No, the slope is not significant.
INFERENCE USING THE MODEL
HOW THIS DIFFERS
The inference we saw in the last section pertained to the model itself:
- Is the mean of the response variable Y a constant, or is it a conditional mean that depends on the value of the explanatory variable?
- If it is conditional, how much are we gaining in predictive power by using a conditional mean instead of a constant?
In this section, we look to infer on:
- The conditional population mean
- The result of an individual within a conditional population
Figure 17.3: Distribution of y Given x
CONFIDENCE INTERVALS
The book talks about Expected Value, which is just a fancy word for mean. In this case it's a conditional mean.
Given a specific value of x, we can construct a confidence interval for the mean response at that value.
PREDICTION INTERVALS
Given a specific value of x, we may be interested in predicting the behaviour of an individual rather than the mean.
We have to change the interval slightly to account for the extra variability observed in individuals rather than means.
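The two intervals can be compared side by side. The sketch below assumes the standard formulas: both intervals are centred at $\hat{y} = b_0 + b_1 x^*$; the confidence interval for the mean uses $s_e\sqrt{1/n + (x^* - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, and the prediction interval adds 1 under the square root for the extra variability of an individual. The data and the function name `interval_estimates` are made up for illustration.

```python
import math

def interval_estimates(xs, ys, x_new, t_critical):
    """Fitted value at x_new with a CI for the mean and a PI for an individual."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_new
    spread = 1 / n + (x_new - mean_x) ** 2 / sxx
    se_mean = s_e * math.sqrt(spread)       # uncertainty in the mean response
    se_pred = s_e * math.sqrt(1 + spread)   # plus individual variability

    ci = (y_hat - t_critical * se_mean, y_hat + t_critical * se_mean)
    pi = (y_hat - t_critical * se_pred, y_hat + t_critical * se_pred)
    return y_hat, ci, pi

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# 3.182 is the two-sided 95% critical value of t with n - 2 = 3 df (t table).
y_hat, ci, pi = interval_estimates(x, y, x_new=4, t_critical=3.182)

print(f"fitted value at x = 4: {y_hat:.2f}")
print(f"95% CI for the mean response: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for an individual:     ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always wider than the confidence interval at the same x, and both widen as x moves away from the mean of the observed x values.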
Interval Estimates and Prediction Intervals
New example here and new problem after
EXAMPLE: OXYGEN DEMAND
One of the more challenging problems confronting the water pollution control field is presented by the tanning industry. Their wastes are chemically complex. We consider the experimental data obtained from 33 samples of chemically treated waste. The variables are:
- The percent reduction in total solids
- The percent reduction in chemical oxygen demand
Summary statistics for Solid Residue and Oxygen Demand: Mean, Sample SD, Correlation, SSE
EXERCISE
1. Estimate the regression line.
2. Construct a 95% confidence interval for the slope.
3. Construct a 95% confidence interval for the mean chemical oxygen demand of water with 32% solids reduction.
4. Construct a 95% prediction interval for water with 40% solids reduction.
More informationCorrelation and regression
NST 1B Experimental Psychology Statistics practical 1 Correlation and regression Rudolf Cardinal & Mike Aitken 11 / 12 November 2003 Department of Experimental Psychology University of Cambridge Handouts:
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationKeller: Stats for Mgmt & Econ, 7th Ed July 17, 2006
Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationExample: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV)
Program L13 Relationships between two variables Correlation, cont d Regression Relationships between more than two variables Multiple linear regression Two numerical variables Linear or curved relationship?
More informationAssumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals
Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,
More informationHOLLOMAN S AP STATISTICS BVD CHAPTER 08, PAGE 1 OF 11. Figure 1 - Variation in the Response Variable
Chapter 08: Linear Regression There are lots of ways to model the relationships between variables. It is important that you not think that what we do is the way. There are many paths to the summit We are
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More information04 June Dim A W V Total. Total Laser Met
4 June 218 Member State State as on 4 June 218 Acronyms are listed in the last page of this document. AUV Mass and Related Quantities Length PR T TF EM Mass Dens Pres F Torq Visc H Grav FF Dim A W V Total
More informationAnnouncements. Lecture 18: Simple Linear Regression. Poverty vs. HS graduate rate
Announcements Announcements Lecture : Simple Linear Regression Statistics 1 Mine Çetinkaya-Rundel March 29, 2 Midterm 2 - same regrade request policy: On a separate sheet write up your request, describing
More informationAMS 7 Correlation and Regression Lecture 8
AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation
More information3.1 Scatterplots and Correlation
3.1 Scatterplots and Correlation Most statistical studies examine data on more than one variable. In many of these settings, the two variables play different roles. Explanatory variable (independent) predicts
More informationSingle and multiple linear regression analysis
Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics
More informationChapter 6 Scatterplots, Association and Correlation
Chapter 6 Scatterplots, Association and Correlation Looking for Correlation Example Does the number of hours you watch TV per week impact your average grade in a class? Hours 12 10 5 3 15 16 8 Grade 70
More informationChapter 4 Data with Two Variables
Chapter 4 Data with Two Variables 1 Scatter Plots and Correlation and 2 Pearson s Correlation Coefficient Looking for Correlation Example Does the number of hours you watch TV per week impact your average
More informationLAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION
LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationAP Statistics. Chapter 9 Re-Expressing data: Get it Straight
AP Statistics Chapter 9 Re-Expressing data: Get it Straight Objectives: Re-expression of data Ladder of powers Straight to the Point We cannot use a linear model unless the relationship between the two
More informationMr. Stein s Words of Wisdom
Mr. Stein s Words of Wisdom I am writing this review essay for two tests the AP Stat exam and the Applied Stat BFT. The topics are more or less the same, so reviewing for the two tests should be a similar
More information1. Create a scatterplot of this data. 2. Find the correlation coefficient.
How Fast Foods Compare Company Entree Total Calories Fat (grams) McDonald s Big Mac 540 29 Filet o Fish 380 18 Burger King Whopper 670 40 Big Fish Sandwich 640 32 Wendy s Single Burger 470 21 1. Create
More informationLecture 4 Scatterplots, Association, and Correlation
Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variable In this lecture: We shall look at two quantitative variables.
More informationBiostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras
Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics
More informationMath 243 OpenStax Chapter 12 Scatterplots and Linear Regression OpenIntro Section and
Math 243 OpenStax Chapter 12 Scatterplots and Linear Regression OpenIntro Section 2.1.1 and 8.1-8.2.6 Overview Scatterplots Explanatory and Response Variables Describing Association The Regression Equation
More informationAppendix B: Detailed tables showing overall figures by country and measure
44 country and measure % who report that they are very happy Source: World Values Survey, 2010-2014 except United States, Pew Research Center 2012 Gender and Generations survey and Argentina 32% 32% 36%
More informationChapter 7. Scatterplots, Association, and Correlation
Chapter 7 Scatterplots, Association, and Correlation Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 29 Objective In this chapter, we study relationships! Instead, we investigate
More informationM 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75
M 140 est 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDI! Problem Max. Points Your Points 1-10 10 11 10 12 3 13 4 14 18 15 8 16 7 17 14 otal 75 Multiple choice questions (1 point each) For questions
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationINFERENCE FOR REGRESSION
CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We
More informationInference for the Regression Coefficient
Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates
More informationSTAT 350 Final (new Material) Review Problems Key Spring 2016
1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,
More informationdetermine whether or not this relationship is.
Section 9-1 Correlation A correlation is a between two. The data can be represented by ordered pairs (x,y) where x is the (or ) variable and y is the (or ) variable. There are several types of correlations
More informationNotes 11: OLS Theorems ECO 231W - Undergraduate Econometrics
Notes 11: OLS Theorems ECO 231W - Undergraduate Econometrics Prof. Carolina Caetano For a while we talked about the regression method. Then we talked about the linear model. There were many details, but
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationHarvard University. Rigorous Research in Engineering Education
Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected
More informationSTAT 4385 Topic 03: Simple Linear Regression
STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis
More informationRegression Models. Chapter 4. Introduction. Introduction. Introduction
Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More informationECON 497: Lecture 4 Page 1 of 1
ECON 497: Lecture 4 Page 1 of 1 Metropolitan State University ECON 497: Research and Forecasting Lecture Notes 4 The Classical Model: Assumptions and Violations Studenmund Chapter 4 Ordinary least squares
More informationUNIT 12 ~ More About Regression
***SECTION 15.1*** The Regression Model When a scatterplot shows a relationship between a variable x and a y, we can use the fitted to the data to predict y for a given value of x. Now we want to do tests
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationChapter 7. Scatterplots, Association, and Correlation. Scatterplots & Correlation. Scatterplots & Correlation. Stat correlation
Chapter 7 Scatterplots, Association, and Correlation 1 Scatterplots & Correlation Here, we see a positive relationship between a bear s age and its neck diameter. As a bear gets older, it tends to have
More informationLECTURE 15: SIMPLE LINEAR REGRESSION I
David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).
More informationChi-square tests. Unit 6: Simple Linear Regression Lecture 1: Introduction to SLR. Statistics 101. Poverty vs. HS graduate rate
Review and Comments Chi-square tests Unit : Simple Linear Regression Lecture 1: Introduction to SLR Statistics 1 Monika Jingchen Hu June, 20 Chi-square test of GOF k χ 2 (O E) 2 = E i=1 where k = total
More informationChapter 10: Comparing Two Quantitative Variables Section 10.1: Scatterplots & Correlation
Stat 300: Intro to Probability & Statistics Textbook: Introduction to Statistical Investigations Name: American River College Chapter 10: Comparing Two Quantitative Variables Section 10.1: Scatterplots
More informationApplied Regression Analysis. Section 2: Multiple Linear Regression
Applied Regression Analysis Section 2: Multiple Linear Regression 1 The Multiple Regression Model Many problems involve more than one independent variable or factor which affects the dependent or response
More informationDoes socio-economic indicator influent ICT variable? II. Method of data collection, Objective and data gathered
Does socio-economic indicator influent ICT variable? I. Introduction This paper obtains a model of relationship between ICT indicator and macroeconomic indicator in a country. Modern economy paradigm assumes
More informationia PU BLi s g C o M Pa K T Wa i n CD-1576
M h M y CD-1576 o M Pa g C n ar ia PU BLi s in K T Wa i n ed National Geography Standards National Geography Standards Teachers leading discussions while completing units and activities is a prerequisite
More informationBIVARIATE DATA data for two variables
(Chapter 3) BIVARIATE DATA data for two variables INVESTIGATING RELATIONSHIPS We have compared the distributions of the same variable for several groups, using double boxplots and back-to-back stemplots.
More information3 Non-linearities and Dummy Variables
3 Non-linearities and Dummy Variables Reading: Kennedy (1998) A Guide to Econometrics, Chapters 3, 5 and 6 Aim: The aim of this section is to introduce students to ways of dealing with non-linearities
More informationChapter 7: Correlation and regression
Slide 7.1 Chapter 7: Correlation and regression Correlation and regression techniques examine the relationships between variables, e.g. between the price of doughnuts and the demand for them. Such analyses
More informationLecture 4 Scatterplots, Association, and Correlation
Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.
More information