Data Analysis and Statistical Methods Statistics 651


Lecture 31 (MWF): Review of the test for independence and starting with linear regression
Suhasini Subba Rao

Review: Test for independence
In many situations we observe two variables on each individual, for example gender and favourite colour. Often we want to see whether there is a dependence between the two observations (does gender influence colour preference?). If there is no dependence, then the proportions within each subpopulation should be the same as the proportions over the entire population. If there is a dependence, this is no longer true. The principle of the test for independence is to calculate the counts we would expect if the two variables were independent and compare them with what we observe.

Example I: Test for independence
Psychologists wanted to investigate whether there was a dependence between height and how bossy someone is (aka: do short men have a Napoleon complex?). They gathered data on 1000 men, classified by height (short, medium, large) and by whether or not they were bossy.
[Table: counts of bossy / not bossy men in each height group, with row and column totals.]
Test the hypothesis that there is no dependence between height and bossiness against the alternative that there is.

Solution I
Recall that independence means the following: the probability that a randomly selected person is bossy is the same as the probability that a person randomly selected from a subpopulation (only tall, only short or only medium-sized people) is bossy. If this is the case, then bossiness does not depend on size. In reality we cannot calculate these probabilities, because we do not observe the entire population, but we do have a sample from the population: in this case, a sample of 1000 people.
First look at the data. The proportion of short men who are bossy is larger than the proportion of medium and large men who are bossy, so from looking at the data there appears to be a dependence. But this difference could be due to random variation, so we want to test whether it is significant. Our objective is to test:
$H_0$: There is no dependence between height and bossiness.
$H_A$: There is a dependence between height and bossiness.
We first have to make a table of expected values under the null hypothesis that there is no dependence between height and bossiness.

Motivation
In the sample as a whole, 30% (300/1000) of the men are bossy and 70% (700/1000) are not bossy. If height and bossiness are independent, these percentages should carry over to each subgroup of short, medium and large men: we expect 30% of the short men, 30% of the medium men and 30% of the large men to be bossy, and 70% of each group to be not bossy.

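Writing this out for each cell, the expected number of bossy men in a height group is $0.3 \times (\text{number of men in that group}) = \frac{300 \times (\text{column total})}{1000}$, and the expected number of not-bossy men is $0.7 \times (\text{number of men in that group}) = \frac{700 \times (\text{column total})}{1000}$.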

In summary, to fill in the table of expected counts, multiply each row total by the corresponding column total and divide by the overall total (here 1000):
$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{overall total}}$.
We can now evaluate the test statistic, by first taking the difference between the observed and expected count in each cell.

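For each cell we compute $(O - E)^2 / E$, where $O$ is the observed count and $E$ the expected count; for example, the short, bossy cell has $O = 90$ and $E = 60$ and contributes $(90 - 60)^2 / 60 = 15$. The test statistic is the sum over all six cells:
$T = \sum_{\text{cells}} \frac{(O - E)^2}{E} = \frac{(90 - 60)^2}{60} + \cdots \approx 26$.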

Because the table has $3 \times 2 = 6$ cells (3 columns by 2 rows), under the null $T$ has a $\chi^2$ distribution with $(3-1)(2-1) = 2$ degrees of freedom. From Table 7, $\chi^2_2(0.05) = 5.99$. The p-value is $P(\chi^2_2 > 26) \approx 2 \times 10^{-6}$. Since $T \approx 26 > 5.99$, there is enough evidence to reject the null; equivalently, the p-value is very small. That is, based on the data there appears to be a dependence between size and bossiness.
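The table lookup can be checked by hand in this special case. For 2 degrees of freedom (and only then) the $\chi^2$ distribution is an exponential distribution with mean 2, so $P(\chi^2_2 > t) = e^{-t/2}$ and the 5% critical value is $-2\ln(0.05)$. The sketch below uses only these closed forms; the function names are hypothetical helpers, not from any library.

```python
import math

def df_independence(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom of the chi-squared test for independence."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_2df_pvalue(t: float) -> float:
    """Upper-tail probability P(chi^2_2 > t); valid only for 2 degrees of freedom."""
    return math.exp(-t / 2)

def chi2_2df_critical(alpha: float) -> float:
    """Critical value c with P(chi^2_2 > c) = alpha; valid only for 2 degrees of freedom."""
    return -2 * math.log(alpha)

df = df_independence(2, 3)       # rows: bossy / not bossy; columns: short / medium / large
crit = chi2_2df_critical(0.05)   # table value chi^2_2(0.05)
p = chi2_2df_pvalue(26)          # p-value for T = 26
print(df, round(crit, 2), f"{p:.1e}")  # 2 5.99 2.3e-06
```

For other degrees of freedom there is no such simple formula, and a table (or a statistics library) is needed.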

Example 2
A group of space explorers has discovered a planet inhabited by alien creatures. They notice that there are three main groups of aliens: Pink, Blue and Green. One of the explorers happens to be a statistician. She notices that the size of the aliens tends to differ amongst the population, so she sets out to determine whether there is any dependence between the size and the colour of an alien. She randomly selects 160 aliens and notes their colour and size (grouped as either big or little). This is the data she collected:

            Pink   Blue   Green   Subtotal
Big           30     20      50        100
Little        10     20      30         60
Subtotal      40     40      80        160

State the null and alternative hypotheses. What do you think were the conclusions of the statistician's research (use α = 0.05)?

Solution 2
$H_0$: There is no dependence between size and colour.
$H_A$: There is a dependence between size and colour.
We do a chi-squared test for independence, so we need a table of what we expect to observe if there is no dependence between size and colour. Each entry is (row subtotal × column subtotal) / 160:

            Pink               Blue               Green              Subtotal
Big         100×40/160 = 25    100×40/160 = 25    100×80/160 = 50    100
Little       60×40/160 = 15     60×40/160 = 15     60×80/160 = 30     60
Subtotal    40                 40                 80                 160
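The expected table can be computed mechanically from any table of observed counts. This is a minimal sketch, assuming the observed counts are the ones that appear in the $T$ statistic of this example (Big: 30, 20, 50; Little: 10, 20, 30 for Pink, Blue, Green); `expected_counts` is a hypothetical helper name.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: Big, Little; columns: Pink, Blue, Green
observed = [[30, 20, 50],
            [10, 20, 30]]
expected = expected_counts(observed)
print(expected)  # [[25.0, 25.0, 50.0], [15.0, 15.0, 30.0]]
```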

We now construct the T statistic:
$T = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(50-50)^2}{50} + \frac{(10-15)^2}{15} + \frac{(20-15)^2}{15} + \frac{(30-30)^2}{30} = 5.33$.
Under the null, $T$ has a $\chi^2$ distribution with $(3-1)(2-1) = 2$ degrees of freedom. From the tables, $\chi^2_2(0.05) = 5.99$. The p-value for 5.33 is about 0.07, which is greater than 0.05. Since 5.33 < 5.99, there is not enough evidence to reject the null. Therefore we cannot conclude from the data that there is clear evidence of a dependence between colour and size.
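The whole test can be reproduced in a few lines as a check on the arithmetic. This sketch assumes the observed counts used in $T$ above and, because the table is 2 × 3, uses the 2-degrees-of-freedom closed form $P(\chi^2_2 > t) = e^{-t/2}$ for the p-value; the helper name is hypothetical.

```python
import math

def chi_squared_statistic(observed):
    """T = sum over cells of (O - E)^2 / E, with E = row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    t = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            t += (o - e) ** 2 / e
    return t

observed = [[30, 20, 50],   # Big:    Pink, Blue, Green
            [10, 20, 30]]   # Little: Pink, Blue, Green
T = chi_squared_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 2                  # the closed-form p-value below holds only for df = 2
p_value = math.exp(-T / 2)
critical = -2 * math.log(0.05)  # chi^2_2(0.05), about 5.99
print(round(T, 2), df, round(p_value, 3), T > critical)  # 5.33 2 0.069 False
```

`False` in the last column means $T$ does not exceed the critical value, matching the decision not to reject $H_0$.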

Linear regression
Suppose I randomly pick an adult and ask you to guess their height. You would probably give me an interval of, say, 4.5 to 6.5 feet (this can be considered as a CI). Suppose I then gave you the additional information that their shoe size is 5: would you reassess your previous estimate? You would probably change it and give a narrower interval for their height. The shoe size gives us additional information about that person; it allows us to narrow down our estimate and make a more precise estimate of their height.

Put into statistical terms, without knowledge of their shoe size the standard deviation is quite large. Recall that the standard deviation is a measure of error. Once we know their shoe size, the standard deviation (amount of error) decreases.
Often we believe that one variable may have an influence on another variable. For example, the variable X (shoe size of a person) may influence the variable Y (the height of that person). We call X the independent variable and Y the dependent variable. To see whether X has an influence on Y, we often draw a scatter plot with X on the x-axis and Y on the y-axis, and look for a relationship between the two.

Sometimes it is not clear what influences what (for example, does shoe size have an influence on height, or does height have an influence on shoe size?). In that case, let the dependent variable Y be the variable of interest.

Smoking and lung cancer
[Figure: scatter plot of lung cancer incidence against cigarettes smoked per capita, by state.]
The independent variable is the number of cigarettes smoked per capita in a state and the dependent variable is the incidence of lung cancer per 100K people.

Smoking and leukemia
[Figure: scatter plot of leukemia incidence against cigarettes smoked per capita, by state.]
The independent variable is the number of cigarettes smoked per capita in a state and the dependent variable is the incidence of leukemia per 100K people.

Neither plot follows a straight line exactly. To check whether x has a linear effect on Y, we could fit a line through the points. We can use the line to predict the average value of Y given x; for example, the average height of people with shoe size 5. Which line is the best one to use? And how can we check whether this line has any meaning at all (after all, we can put a line through any scatter plot)?

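Recall the equation of a line, $y = mx + c$, where $m = \Delta y / \Delta x$ is the slope and $c$ is the intercept (the value of $y$ where the line crosses the $y$-axis). In linear regression we fit this line through the data.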

Least squares: the line of best fit
We fit the line $\beta_0 + \beta_1 x$ through the data, choosing $\beta_0$ and $\beta_1$ by the method of least squares. We have the observations $\{(y_1, x_1), \dots, (y_n, x_n)\}$ and believe that $y_i$ depends linearly on $x_i$. We use $x_i$ to predict $y_i$; the predictor is $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$. We want $\hat y_i$ to be as close as possible to $y_i$, hence we choose $\hat\beta_0$ and $\hat\beta_1$ to minimise
$\sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.
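To make "choose the line that minimises the sum of squares" concrete, the sketch below evaluates the criterion for many candidate lines and keeps the best one. It is a crude grid search over steps of 0.1, for illustration only (the closed-form formulas for $\hat\beta_0$ and $\hat\beta_1$ give the minimiser exactly), and the five data points are made up.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared vertical distances between each y_i and the line b0 + b1 * x_i."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 8, 9]

# Try every line with intercept and slope in [-5, 5] (steps of 0.1); keep the smallest SSE
best = min(
    ((i / 10, j / 10) for i in range(-50, 51) for j in range(-50, 51)),
    key=lambda b: sse(b[0], b[1], xs, ys),
)
print(best)  # (1.3, 1.5)
```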

A graphical representation
[Figure: five points $(x_1, y_1), \dots, (x_5, y_5)$ and a fitted line, with the vertical distance $y_i - \hat y_i$ from each point to the line marked.]

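Quantities required
We need the average of the $x$'s, $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$, and the average of the $y$'s, $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$. We also need to calculate
$S_{xy} = (y_1 - \bar y)(x_1 - \bar x) + \dots + (y_n - \bar y)(x_n - \bar x) = \sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x)$,
$S_{xx} = (x_1 - \bar x)^2 + \dots + (x_n - \bar x)^2 = \sum_{i=1}^{n} (x_i - \bar x)^2$.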
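These four quantities translate directly into code. A minimal sketch on made-up data; the last line checks that $S_{xx}$ equals the sample variance of the $x$'s multiplied by $n - 1$, using the standard-library `statistics.variance` (which divides by $n - 1$).

```python
import statistics

def summary_quantities(xs, ys):
    """Return x_bar, y_bar, S_xy and S_xx as defined above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return x_bar, y_bar, s_xy, s_xx

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 8, 9]
x_bar, y_bar, s_xy, s_xx = summary_quantities(xs, ys)
print(x_bar, y_bar, s_xy, s_xx)

# S_xx is the sample variance of the x's before dividing by (n - 1)
assert abs(s_xx - (len(xs) - 1) * statistics.variance(xs)) < 1e-12
```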

The equations for the parameter estimators
The least squares estimator minimises the sum of all these squared vertical distances; basically, it gives the line of best fit through the observations. The coefficients $\hat\beta_1$ and $\hat\beta_0$ are evaluated using the formulas
$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$, where $S_{xy} = \sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x)$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2$,
with $\bar x$ and $\bar y$ the sample means of $X$ and $Y$: $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$; and
$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.

Therefore, given $\hat\beta_0$ and $\hat\beta_1$, for any value $x$ of the regressor (explanatory variable) we can predict $y$ using the predictor $\hat y = \hat\beta_0 + \hat\beta_1 x$.
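The formulas $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, followed by prediction, can be sketched as below. The data points are made up and the function names are hypothetical; the guard for $S_{xx} = 0$ is there because the slope is undefined when all $x$ values are equal.

```python
def least_squares_fit(xs, ys):
    """Return (beta0_hat, beta1_hat) with beta1 = S_xy / S_xx and beta0 = y_bar - beta1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def predict(beta0, beta1, x):
    """Predicted value y_hat = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 8, 9]
b0, b1 = least_squares_fit(xs, ys)
print(round(b0, 2), round(b1, 2))     # 1.3 1.5
print(round(predict(b0, b1, 6), 2))   # 10.3
```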

What $S_{xy}$ and $S_{xx}$ mean
$S_{xy}$ and $S_{xx}$ simply fall out when minimising the least squares criterion. However, they do have a useful interpretation. Start by centring the data, i.e. replacing $Y_i$ by $Y_i - \bar Y$ and $X_i$ by $X_i - \bar X$; this does not change the slope. Suppose that $X_i$ exerts a positive influence on $Y_i$. Then large negative values of $X_i - \bar X$ are likely to come with large negative values of $Y_i - \bar Y$, and large positive values of $X_i - \bar X$ with large positive values of $Y_i - \bar Y$. This means that each product $(X_i - \bar X)(Y_i - \bar Y)$ is likely to be positive, and thus $\sum_i (X_i - \bar X)(Y_i - \bar Y)$ is highly likely to be positive (highly likely, because the data are random, so we can never be sure that an effect shows up in the data). By a similar argument, if $X_i$ exerts a negative influence on $Y_i$, then $\sum_i (X_i - \bar X)(Y_i - \bar Y)$ is highly likely to be negative. On the other hand, if $X_i$ does not exert any linear influence on $Y_i$, then the product $(X_i - \bar X)(Y_i - \bar Y)$ can be either negative or positive, the negative and positive terms in the sum $\sum_i (X_i - \bar X)(Y_i - \bar Y)$ tend to cancel out, and the sum is likely to be close to zero.
$S_{xx}$ is simply the sample variance of the $x$'s before dividing by $n - 1$; it measures the amount of variation in the independent variable.
The value of the coefficient $\hat\beta_1$ depends on the units you use. For example, suppose you want to measure the effect that temperature has on the volume of ice on a lake: if you measure the temperature in Celsius, the slope will be different from the slope obtained with the temperature in Fahrenheit. Thus the slope (like the mean) is sensitive to the units used.
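The effect of units on the slope can be seen by fitting the same (made-up) ice-volume data twice, once against temperature in Celsius and once in Fahrenheit. Since $F = \frac{9}{5}C + 32$, stretching the $x$-axis by $9/5$ divides the slope by $9/5$; the shift by 32 has no effect on the slope. The negative slope here also illustrates a negative $S_{xy}$. The measurements are hypothetical.

```python
def slope(xs, ys):
    """Least squares slope S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical measurements: temperature (Celsius) and ice volume (arbitrary units)
temp_c = [-10, -5, 0, 5]
ice = [90, 70, 55, 30]
temp_f = [9 / 5 * c + 32 for c in temp_c]  # the same temperatures in Fahrenheit

slope_c = slope(temp_c, ice)
slope_f = slope(temp_f, ice)
print(round(slope_c, 3), round(slope_f, 3))  # -3.9 -2.167
```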

Toy example: height of a person and their shoe size
This example shows the mechanics of how the slope and intercept are calculated. You do not have to learn the precise details; however, it does give you some idea of what exactly $S_{xx}$ and $S_{xy}$ are. Let $x_i$ be a person's shoe size and $y_i$ their height. We observe the height and shoe size of 5 people. It is natural to believe that there is a possible linear dependence between shoe size and height. Summary statistics: $\bar y = 14$ and $\bar x = 4$.

For each person we tabulate $y_i - \bar y$, $x_i - \bar x$, $(x_i - \bar x)^2$ and $(y_i - \bar y)(x_i - \bar x)$, and sum the columns:
$S_{xy} = \sum_{i=1}^{5} (y_i - \bar y)(x_i - \bar x) = 60$ and $S_{xx} = \sum_{i=1}^{5} (x_i - \bar x)^2 = 20$.
Then $\hat\beta_1 = 60/20 = 3$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 14 - 3 \times 4 = 2$.
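The last step of the toy example only needs the four summary numbers given in the text ($\bar y = 14$, $\bar x = 4$, $S_{xy} = 60$, $S_{xx} = 20$), so the arithmetic can be checked directly:

```python
# Summary statistics of the shoe-size example, as given in the text
y_bar, x_bar = 14, 4
s_xy, s_xx = 60, 20

beta1_hat = s_xy / s_xx                 # 60 / 20
beta0_hat = y_bar - beta1_hat * x_bar   # 14 - 3 * 4
print(f"y_hat = {beta0_hat:g} + {beta1_hat:g} x")  # y_hat = 2 + 3 x
```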

The line of best fit is $\hat y = 2 + 3x$.
[Figure: the observations (points) and the line of best fit; x-axis shoe size, y-axis height.]

Interpreting the slope
What does the fitted line $\hat y = 2 + 3x$ (where $x$ is the shoe size and $\hat y$ is the predicted height) tell us about the relationship between shoe size and height? At face value, the fact that the slope 3 is large and positive may make you think that there is a positive relationship (a zero slope would indicate no relationship, and this slope is far from zero). DO NOT be fooled by this! We have estimated the slope from a sample of only 5 people, and a slope of 3 could easily be obtained by chance when there is no relationship at all. Recall that the slope is built from the terms $(X_i - \bar X)(Y_i - \bar Y)$, which are random. The slope $\hat\beta_1 = 3$ is only an estimate of the true slope (which we define later).
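A small simulation shows how a sizeable slope can appear with only 5 people even when there is no relationship. This is a sketch under made-up assumptions: the shoe sizes are fixed at 2, 3, 4, 5, 6 (mean 4), and heights are drawn independently of shoe size from a normal distribution with mean 14 and standard deviation 5, so the true slope is zero. Under these assumptions the estimated slope has standard deviation $5/\sqrt{10} \approx 1.6$, and roughly 6% of samples give $|\hat\beta_1| \ge 3$.

```python
import random

def slope(xs, ys):
    """Least squares slope S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

random.seed(1)            # fixed seed so the run is reproducible
xs = [2, 3, 4, 5, 6]      # hypothetical shoe sizes (mean 4)
n_sims = 20000

slopes = []
for _ in range(n_sims):
    # Heights drawn independently of shoe size (assumed mean 14, sd 5): true slope is zero
    ys = [random.gauss(14, 5) for _ in xs]
    slopes.append(slope(xs, ys))

mean_slope = sum(slopes) / n_sims
frac_large = sum(abs(b) >= 3 for b in slopes) / n_sims
print(f"average slope: {mean_slope:.3f}")
print(f"fraction of samples with |slope| >= 3: {frac_large:.3f}")
```

The slopes average out near zero, yet individual samples of 5 regularly produce slopes as large as 3.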

Therefore our objectives are:
(i) Is the slope (estimator) significant, i.e. is there really evidence of a relationship? Here we need statistical techniques since, as usual, we do not observe the entire population (in this example we have just 5 people!).
(ii) If there is a relationship, how strong is it? Again, the size of the slope on its own does not tell us this; the strength of a relationship is determined by how well the line fits the points.


More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015 Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences 18.30 21.15h, February 12, 2015 Question 1 is on this page. Always motivate your answers. Write your answers in English. Only the

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Biostatistics 4: Trends and Differences

Biostatistics 4: Trends and Differences Biostatistics 4: Trends and Differences Dr. Jessica Ketchum, PhD. email: McKinneyJL@vcu.edu Objectives 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

Section 11: Quantitative analyses: Linear relationships among variables

Section 11: Quantitative analyses: Linear relationships among variables Section 11: Quantitative analyses: Linear relationships among variables Australian Catholic University 214 ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced or

More information

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Chapter 5 Least Squares Regression

Chapter 5 Least Squares Regression Chapter 5 Least Squares Regression A Royal Bengal tiger wandered out of a reserve forest. We tranquilized him and want to take him back to the forest. We need an idea of his weight, but have no scale!

More information

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as ST 51, Summer, Dr. Jason A. Osborne Homework assignment # - Solutions 1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

HOLLOMAN S AP STATISTICS BVD CHAPTER 08, PAGE 1 OF 11. Figure 1 - Variation in the Response Variable

HOLLOMAN S AP STATISTICS BVD CHAPTER 08, PAGE 1 OF 11. Figure 1 - Variation in the Response Variable Chapter 08: Linear Regression There are lots of ways to model the relationships between variables. It is important that you not think that what we do is the way. There are many paths to the summit We are

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Extra Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences , July 2, 2015

Extra Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences , July 2, 2015 Extra Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences 12.00 14.45, July 2, 2015 Also hand in this exam and your scrap paper. Always motivate your answers. Write your answers in

More information

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Chapter 26: Comparing Counts (Chi Square)

Chapter 26: Comparing Counts (Chi Square) Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

a) Do you see a pattern in the scatter plot, or does it look like the data points are

a) Do you see a pattern in the scatter plot, or does it look like the data points are Aim #93: How do we distinguish between scatter plots that model a linear versus a nonlinear equation and how do we write the linear regression equation for a set of data using our calculator? Homework:

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Lecture 10: F -Tests, ANOVA and R 2

Lecture 10: F -Tests, ANOVA and R 2 Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

a. Yes, it is consistent. a. Positive c. Near Zero

a. Yes, it is consistent. a. Positive c. Near Zero Chapter 4 Test B Multiple Choice Section 4.1 (Visualizing Variability with a Scatterplot) 1. [Objective: Analyze a scatter plot and recognize trends] Doctors believe that smoking cigarettes lowers lung

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000 Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

THE PEARSON CORRELATION COEFFICIENT

THE PEARSON CORRELATION COEFFICIENT CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There

More information