15.8 MULTIPLE REGRESSION WITH MANY EXPLANATORY VARIABLES

The method of multiple regression that we studied with the two-explanatory-variable life expectancies example can be extended to any number of explanatory (sometimes called predictor) variables. This is very useful when we have several potentially useful explanatory variables measured for each observation and wish to explore which of them help predict a response variable of interest. Performing a multiple regression analysis on all the available potential explanatory variables can help with this. Consider the following example.

Example. Data were gathered for a period of 17 years on the average price of beef and a number of factors believed to have potential effects on the price of beef. But we do not know, without statistical analysis, whether all six explanatory variables are simultaneously needed to predict beef pricing. These explanatory variables are as follows:*

CBE: consumption of beef per capita (lb)
PPO: price of pork (cents/lb)
CPO: consumption of pork per capita (lb)
DINC: disposable income per capita index
CFO: food consumption per capita index
RDINC: index of real disposable income per capita

It is easy to see why each of these variables could have an effect on the price of beef. For example, the more pork that is consumed, or the cheaper pork is, the less the demand for beef should be. The usual least squares multiple regression analysis, obtainable from any standard statistical computer package, yields the following regression relationship between the price of beef (in cents/lb) and the six explanatory variables:

Beef price = ... (CBE) + 0.32(PPO) + 0.87(CPO) + 0.07(DINC) + 0.37(CFO) - 0.16(RDINC)

The ANOVA table for regression, as introduced in the previous section, is as follows:

Source        Sum of squares   Degrees of freedom   Mean square   F
Regression        743.13              6                  ...       ...
Error              ...               10                  ...
Total              ...               16

Clearly, we strongly reject the null hypothesis that the regression is not worthwhile (consult the F tables for the 5% and 1% significance values to see whether you agree!). The equation we have constructed thus does have the power to explain the price of beef. Two essential issues arise, however. First, why does the RDINC coefficient have a sign opposite to the one we might expect? That is, we would expect that as real disposable income (RDINC) increased, the price of beef would increase, since people would consume more high-priced beef, increasing its demand and driving up its price. However, the negative sign for that term indicates the opposite relationship. It is possible that prediction is helped by the RDINC term, but we wonder!

*F. V. Waugh, Graphic Analysis in Agricultural Economics, Agricultural Handbook 128 (Washington, D.C.: U.S. Department of Agriculture, 1957). The term per capita means per person here.
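The computation behind such a fit is easy to sketch. The following Python/NumPy fragment is a minimal illustration (not the package or data used in the text) of obtaining least squares coefficients for six explanatory variables and forming the overall ANOVA F statistic; the simulated arrays simply stand in for the 17 years of beef data.

    # Minimal sketch: least squares fit and overall F statistic (illustrative data only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 17, 6                       # 17 years, 6 explanatory variables
    X = rng.normal(size=(n, k))        # stand-in for CBE, PPO, CPO, DINC, CFO, RDINC
    y = rng.normal(size=n)             # stand-in for beef price (cents/lb)

    X1 = np.column_stack([np.ones(n), X])           # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least squares coefficients

    fitted = X1 @ beta
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - fitted) ** 2)
    ss_regression = ss_total - ss_error

    df_reg, df_err = k, n - k - 1                   # 6 and 10 degrees of freedom
    F = (ss_regression / df_reg) / (ss_error / df_err)
    print("coefficients:", beta)
    print("overall F statistic:", F)

The F statistic produced this way is the one compared with the tabled 5% and 1% points of the F distribution with 6 and 10 degrees of freedom.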

A second and crucial question is: could we have gotten by with fewer explanatory variables and done just as well in explaining the price of beef? Perhaps we are overfitting the data by including some useless explanatory variables. We will address these questions later in this section.

Estimation of the Regression Equation and Using the Equation for Prediction

The idea of using least squares to find the coefficients in linear regression was introduced in Section 3.5 and discussed briefly in the last section. In Section 3.5 we had only one explanatory variable, and we determined what value of the slope of the regression line would minimize the mean squared error. The method here is exactly the same, except that we are minimizing over all explanatory variables simultaneously: we want to find the set of coefficients that, when applied in the regression equation, minimizes the mean squared error. Of course, we cannot perform this minimization by hand without severe difficulty and a great amount of time. That is why we turn to a convenient computer package to provide the least squares estimates of the coefficients of the explanatory variables. Once this regression equation has been found, we can use it, as we did in Chapter 3, to make predictions of future responses based on observed explanatory values. As was explained in Chapter 3, we want to be careful to use only interpolation, not extrapolation. In multiple regression, interpolation means that all of the observed explanatory values used to make the prediction should be within the range of the data values of the corresponding explanatory variable used in forming the regression equation. For example, with the beef data, the range of each of the explanatory variables was as follows:

CBE: ...
PPO: ...
CPO: ...
DINC: ...
CFO: ...
RDINC: ...

So, if we were going to use this equation for prediction, we would want to make sure that each of the explanatory values we were using was within these ranges (a few exceptions, as long as they are not too far out of the range, would be acceptable).
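As a small illustration of the interpolation check just described, the sketch below (Python; the range values and the new observation are hypothetical, not the actual beef data ranges) flags any explanatory value that falls outside the range observed when the equation was fit.

    # Sketch of an extrapolation guard; the ranges and the new observation are made up.
    ranges = {
        "CBE": (50.0, 60.0), "PPO": (40.0, 90.0), "CPO": (45.0, 70.0),
        "DINC": (20.0, 50.0), "CFO": (85.0, 100.0), "RDINC": (60.0, 100.0),
    }

    def outside_ranges(new_obs, ranges):
        """Return the variables whose new value lies outside the observed range."""
        return [name for name, value in new_obs.items()
                if not (ranges[name][0] <= value <= ranges[name][1])]

    new_obs = {"CBE": 55.0, "PPO": 65.0, "CPO": 58.0,
               "DINC": 35.0, "CFO": 92.0, "RDINC": 110.0}
    flagged = outside_ranges(new_obs, ranges)
    if flagged:
        print("Prediction would be extrapolation for:", flagged)
    else:
        print("All values fall within the observed ranges; prediction is interpolation.")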

Testing Explanatory Variables for Usefulness

We now want to explore the questions we posed at the end of the Example: namely, why one of the coefficients in the regression equation has a sign opposite to what was expected, and whether we can eliminate certain explanatory variables without lessening our ability to explain the price of beef. It turns out that the answers to these questions are related.

First, it is important to understand that which of the other explanatory variables are present in a particular multiple regression equation changes the coefficient of a particular variable such as RDINC. There exists a certain amount of total variation in the response variable, as measured by its total sum of squares. Although one of the explanatory variables may explain some of this total response variation, other explanatory variables will also share in explaining the total variation in the response variable, depending on which of them are included in the model. To understand this perhaps puzzling idea, let's consider an example. Consider prediction of college freshman grade point average (GPA) using both SAT and ACT college entrance scores. First, we note that although they are somewhat different, these two college entrance tests measure very similar things and are highly correlated in the population entering college. We would expect, as is true, that the SAT score by itself does a good job of predicting freshman GPA, and hence will have an influential coefficient in the regression equation GPA = c + m(SAT score). But if both scores are used together as explanatory variables, the coefficient of the SAT score will be much smaller, because the ACT score is now also sharing in predicting freshman GPA. The point is that if several explanatory variables are present, then the coefficient of each variable represents the explanatory capacity of that variable viewed in cooperation with the explanatory capacities of the other variables.

Let's see how this relates to our beef pricing prediction equation. Consider the case of the variable RDINC. We now understand that the variation it explains in the regression equation of the Example is variation not being explained by the other five variables, including in particular the variable DINC. It is natural to assume that RDINC and DINC, which are indeed defined to be very similar, would be explaining much the same variation in the price of beef. In fact, the sample correlation coefficient between RDINC and DINC is 0.82, indicating a strong relationship between them and hence a similar prediction role for them. The point is that the coefficient for any particular explanatory variable in a regression equation has to be understood in the context of all the other explanatory variables present in the equation.
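The SAT/ACT point can be seen numerically. The short simulation below (Python/NumPy, with fabricated scores, so the specific numbers carry no meaning) fits GPA on SAT alone and then on SAT and ACT together; the SAT coefficient shrinks once the highly correlated ACT score shares the explanatory work.

    # Sketch: a coefficient shrinks when a highly correlated predictor is added.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    sat = rng.normal(size=n)
    act = 0.9 * sat + 0.3 * rng.normal(size=n)            # strongly correlated with SAT
    gpa = 0.5 * sat + 0.4 * act + rng.normal(scale=0.5, size=n)

    def ols(columns, y):
        X = np.column_stack([np.ones(len(y))] + list(columns))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta

    print("corr(SAT, ACT):", np.corrcoef(sat, act)[0, 1])
    print("GPA on SAT alone:   ", ols([sat], gpa))        # larger SAT coefficient
    print("GPA on SAT and ACT: ", ols([sat, act], gpa))   # SAT coefficient shrinks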

This leads to the issue of whether it is of value to include both RDINC and DINC in the regression. If they both explain approximately the same thing, then why include them both? This is a very important issue that statisticians doing multiple regression address: in developing models to predict and explain the world around us, we are always interested in creating the simplest model possible (in particular, one with the fewest explanatory variables) while retaining our ability to explain or predict the response variable as well as possible.

The ANOVA table shown in the Example considers the regression in one line of the table. However, it is possible (though we do not explain how here) to split off, from the regression sum of squares, a sum of squares component for RDINC. For our example, this expanded ANOVA table is as follows:

Source                 Sum of squares   Degrees of freedom   Mean square   F
Other five variables        ...                 5                 ...
RDINC                       ...                 1                 ...      0.16
Error                       ...                10                 ...
Total                       ...                16

Note that the six-degrees-of-freedom regression sum of squares (743.13) of the Example has here been decomposed into the five-degrees-of-freedom sum of squares for the combined influence of CBE, PPO, CPO, DINC, and CFO and the single-degree-of-freedom sum of squares for RDINC, which, as the theory says, must add to 743.13 (check it!). Now we have a separate F test for the explanatory variable RDINC. It is important to recall, however, that this sum of squares for RDINC is the sum of squares assuming that the other five explanatory variables are included in the equation. Thus the F test asks whether including the RDINC variable helps predict the response variable (the price of beef) given that the other five variables are part of the prediction equation. Because of this, if we find by using the F distribution that an explanatory variable is not important, we will want to redo the least squares regression with that variable removed. When a variable is removed, the coefficients of the remaining explanatory variables will change.

In our example, the RDINC F statistic is 0.16. We test the null hypothesis that the RDINC variable is of no use in the model (that its coefficient is 0) by comparing 0.16 to the 5% value of the F distribution with 1 numerator and 10 denominator degrees of freedom. This 5% point is 4.96, so we clearly cannot reject the null hypothesis. Thus we conclude that RDINC is not of use in the presence of the other explanatory variables (and hence its original negative coefficient was not to be trusted). We remove this variable from the regression equation.
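The test just described is often called an extra-sum-of-squares (or partial) F test. A minimal sketch of the computation, assuming simulated data in place of the beef data, is given below; it fits the full model and the model without the variable under test, then compares the resulting drop in the error sum of squares to the full-model mean squared error.

    # Sketch of the extra-sum-of-squares F test (illustrative data only).
    import numpy as np

    def sse(X, y):
        """Error sum of squares from a least squares fit with an intercept."""
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return np.sum((y - X1 @ beta) ** 2)

    def partial_F(X_full, X_reduced, y):
        n, k_full = X_full.shape
        df_err = n - k_full - 1
        q = k_full - X_reduced.shape[1]                   # number of variables tested
        return ((sse(X_reduced, y) - sse(X_full, y)) / q) / (sse(X_full, y) / df_err)

    rng = np.random.default_rng(2)
    X = rng.normal(size=(17, 6))                          # six candidate predictors
    y = X[:, :4] @ np.array([1.3, 0.3, -0.8, 0.4]) + rng.normal(size=17)

    # Test the sixth variable (playing the role of RDINC) given the other five.
    F = partial_F(X, X[:, :5], y)
    print("partial F with (1, 10) degrees of freedom:", F)
    # Compare with the tabled 5% point of the F(1, 10) distribution, 4.96.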

We could explore removing other explanatory variables. Indeed, it is important to ask how many and which variables are needed to obtain a regression equation in which each included variable is useful for prediction in addition to the others present, and in which adding any other explanatory variable does not improve prediction. An advanced ANOVA analysis that considers all possible regression equations formed by including various subsets of the six explanatory variables produces a solution to this question:

Beef price = ... (CFO) + 1.27(CBE) + 0.78(CPO) + 0.31(PPO)

Here the F test for each coefficient rejects the null hypothesis that the coefficient is zero, indicating the predictive usefulness of each of the variables, even with the other three explanatory variables present. Recall that the multiple correlation coefficient was only 66% in the life expectancy example discussed earlier. By contrast, the value of 100R² here is 97%, a very high value indicating very effective predictive capability. Comparing the coefficients of the four explanatory variables in the equation above and in the original six-variable equation of the Example, we note that two of the coefficients changed little, one changed a moderate amount, and one is now much different. Interestingly, this four-explanatory-variable equation has dropped both RDINC and DINC because they are ineffective in the presence of the other four variables.
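Before turning to the exercises, here is a rough sketch of the "all possible regressions" idea: with six candidate variables there are 63 non-empty subsets, and each can be fit and compared. The fragment below (Python/NumPy, simulated data with hypothetical coefficients) ranks subsets by R²; a real analysis, like the one quoted above, would also apply F tests to each coefficient in the candidate models.

    # Sketch of all-subsets regression ranked by R-squared (illustrative data only).
    import numpy as np
    from itertools import combinations

    def r_squared(X, y):
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

    rng = np.random.default_rng(3)
    names = ["CBE", "PPO", "CPO", "DINC", "CFO", "RDINC"]
    X = rng.normal(size=(17, 6))                      # stand-in for the beef data
    y = X @ np.array([1.2, 0.3, -0.8, 0.0, -1.5, 0.0]) + rng.normal(size=17)

    results = []
    for size in range(1, 7):
        for cols in combinations(range(6), size):
            results.append((r_squared(X[:, list(cols)], y), [names[c] for c in cols]))

    for r2, subset in sorted(results, reverse=True)[:5]:
        print(f"R^2 = {r2:.3f}  variables: {subset}")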

SECTION 15.8 EXERCISES

1. For both of the following sets of values, using the regression equation found in the section, predict the price of beef, if it is appropriate. If it is not appropriate, explain why not.
   a. CBE = 52, PPO = 51.2, CPO = 56.3, DINC = 48.7, CFO = 90.5, RDINC = ...
   b. CBE = 48, PPO = 86.3, CPO = 48.1, DINC = 21.9, CFO = 96.2, RDINC = ...

2. Answer true or false, and explain: Since the sign for the explanatory variable DINC is positive, the correlation between DINC and the price of beef is necessarily also positive.

3. Can you determine the value of R² for the beef price example? Refer back to the Example on page 656.

4. Consider the regression with two predictor variables based on 59 metropolitan areas, where Y = average income in $1000s, X = average educational level, and Z = percentage of workers who are white-collar.
   a. The ANOVA table below shows the sum of squares for X as the explanatory variable, and the sum of squares for Z after X has explained what it can:

      Source   Sum of squares   Degrees of freedom   Mean square   F
      X             ...                1                  ?         ?
      Z             ...                1                  ?         ?
      Error         ...               56                  ?
      Total         ...               58

      Fill in the mean squares for X, Z, and Error, and the F statistics for X and Z.
   b. Perform the F test for the X variable, with significance level .05. What do you conclude?
   c. Perform the F test for the Z variable. What do you conclude?
   d. The next ANOVA table shows the sum of squares for Z as the explanatory variable, and a blank for the sum of squares for X after Z has explained what it can:

      Source   Sum of squares   Degrees of freedom   Mean square   F
      Z             ...                1                  ?         ?
      X              ?                 1                  ?         ?
      Error         ...               56                  ?
      Total         ...               58

      Fill in the sum of squares for X, the mean squares, and the F's.
   e. Perform the F test for the Z variable. What do you conclude? Compare your conclusion to that in part (c). Is there a contradiction? Explain.
   f. Perform the F test for the X variable. What do you conclude?
   g. Which equation would you prefer? Explain.
      (i) Y = a + bX + cZ
      (ii) Y = a + bX
      (iii) Y = a + cZ

5. The scores for 107 statistics students included the following: HW, score on book homework; Labs, score on computer laboratory assignments; In Class, score on in-class assignments; Exams, score on exams during the semester (not including the final); and Final, score on the final exam.
   a. Let Y = Final. Which single one of the other variables would you expect to best predict Y?
   b. The ANOVA table below has the sums of squares for predicting the final exam score from the others. The first line has the sum of squares due to the three variables Labs, In Class, and Exams, and the second has the sum of squares due to HW after the other three have explained what they can:

      Source                   Sum of squares   Degrees of freedom   Mean square   F
      Labs, In Class, Exams         ...                ?                  ?         ?
      HW                             ?                 ?                  ?         ?
      Error                         ...                ?                  ?
      Total                         ...                ?

      Fill in the spaces that have question marks.
   c. Test whether Labs, In Class, and Exams together have significant predictive value for the final exam score.
   d. Test whether HW has significant additional predictive power after the other three variables have explained what they can.
   e. The next ANOVA table has the sum of squares for Labs and In Class together, then the sum of squares for the additional effect of Exams:

      Source             Sum of squares   Degrees of freedom   Mean square   F
      Labs, In Class          ...                ?                  ?         ?
      Exams                    ?                 ?                  ?         ?
      Error                   ...                ?                  ?
      Total                   ...                ?

      Fill in the missing information.

   f. Test whether the Labs and In Class variables combined have significant predictive power.
   g. Test whether Exams has significant additional predictive power after Labs and In Class have explained what they can.
   h. Which of the following equations would you prefer for predicting the final score? Why?
      (i) Final = a + b(Labs) + c(In Class) + d(Exams)
      (ii) Final = a + b(HW) + c(Labs) + d(In Class) + e(Exams)
      (iii) Final = a + b(Labs) + c(In Class)
