Statistics and Quantitative Analysis U4320


Statistics and Quantitative Analysis U4320
Lecture 13: Explaining Variation
Prof. Sharyn O'Halloran

Explaining Variation: Adjusted R² (cont.)

Definition of Adjusted R²
We'd like a measure like R², but one that takes into account the fact that adding extra variables always increases your explanatory power. The statistic we use for this is called the Adjusted R², and its formula is:

    Adjusted R² = 1 − (1 − R²) · (n − 1) / (n − k),

where n = number of observations and k = number of independent variables (including the constant). The Adjusted R² can actually fall if the variable you add doesn't explain much of the variance.

Explaining Variation: Adjusted R² (cont.)

Back to the Example
Comparing Adjusted R²:
    Model 1: .29
    Model 2: .31
    Model 3: .33
    Model 4: .32

Interpretation
You can see that the adjusted R² rises from equation 1 to equation 2, and from equation 2 to equation 3. But then it falls from equation 3 to equation 4, when we add in the variables for national parks and the zodiac.

Explaining Variation: Adjusted R² (cont.)

Example: Equation 2

    Analysis of Variance
                  DF    Sum of Squares    Mean Square
    Regression          .73               3.39
    Residual      7     1.75              .391
    F = .13    Signif F = .

    Multiple R           .15
    R Square             .3555
    Adjusted R Square    .31
    Standard Error       .5

    Variable      B       SE B        Beta    T        Sig T
    SKOOL         .1      .97                 1.59     .153
    TUBETIME      -.1     -.15777             -3.31    .9
    (Constant)    .1      .15                 13.

We calculate:

    Adjusted R² = 1 − (1 − R²) · (n − 1) / (n − k)
                = 1 − (1 − .3555) · (n − 1) / (n − k)
                = .31

Copyright Sharyn O'Halloran
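The adjusted R² formula above is easy to check numerically. Below is a minimal sketch in Python with NumPy; the function names and toy data are illustrative, not from the lecture. It shows the key fact from the slide: plain R² can never fall when you add a variable, while adjusted R² can.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS fit of y on X (X already contains a constant column).
    Returns (R^2, k), where k counts every regressor including the constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid                    # sum of squared errors
    sst = ((y - y.mean()) ** 2).sum()      # total sum of squares
    return 1 - sse / sst, X.shape[1]

def adjusted_r_squared(y, X):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k), as in the lecture."""
    r2, k = r_squared(y, X)
    n = len(y)
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Toy data: y truly depends on x1; x2 is pure noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])        # without the extra variable
X2 = np.column_stack([np.ones(n), x1, x2])    # with the extra variable

# Plain R^2 can never fall when x2 is added; adjusted R^2 can.
print(r_squared(y, X2)[0] >= r_squared(y, X1)[0])   # True
```

Because (n − 1)/(n − k) > 1 whenever the model has at least one slope, adjusted R² always sits below plain R², and the gap grows with every variable you add.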

Explaining Variation: Adjusted R² (cont.)

Stepwise Regression
One strategy for model building is to add variables only if they increase your adjusted R². This technique is called stepwise regression. However, I don't want to emphasize this approach too strongly. Just as people can fixate on R², they can fixate on adjusted R². If you have a theory that suggests that certain variables are important for your analysis, then include them whether or not they increase the adjusted R². Negative findings can be important!

Comparing Models: F-Tests

When to use an F-Test?
Say you add a number of variables to a regression model and you want to see whether, as a group, they are significant in explaining variation in your dependent variable Y. The F-test tells you whether a group of variables, or even an entire model, is jointly significant. This is in contrast to a t-test, which tells you whether an individual coefficient is significantly different from zero. In short: does the specified model explain a significant proportion of the total variation?

Equations
To be precise, say our original equation is:

    Model 1: Y = b₀ + b₁X₁ + b₂X₂.

We add two more variables, so the new equation is:

    Model 2: Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + b₄X₄.

We want to test the joint hypothesis that X₃ and X₄ together are not significant factors in determining Y:

    H₀: β₃ = β₄ = 0.

Using Adjusted R² First
There's an easy way to tell if these two variables are not significant. First, run the regression without X₃ and X₄ in it, then run the regression with X₃ and X₄. Now look at the adjusted R²'s for the two regressions. If the adjusted R² went down, then X₃ and X₄ are not jointly significant. So the adjusted R² can serve as a quick test for insignificance.

Calculating an F-Test
If the adjusted R² goes up, then you need to do a more complicated test, the F-test.

Ratio
Let regression 1 be the model without X₃ and X₄, and let regression 2 include X₃ and X₄. The basic idea of the F statistic, then, is to compute the ratio

    (SSE₁ − SSE₂) / SSE₂.

Correction
We have to correct for the number of independent variables we add. So the complete statistic is:

    F = [(SSE₁ − SSE₂) / m] / [SSE₂ / (n − k)],

where m = number of restrictions (the number of additional variables added to the model) and k = number of independent variables. Remember: k is the total number of independent variables, including the ones that you are testing and the constant.

Correction (cont.)
This equation defines an F-statistic with m and n − k degrees of freedom. We write it F(m, n − k). To get critical values for the F statistic, we use a set of tables, just as for the normal and t-statistics.

Example
Adding Extra Variables: Are a group of variables jointly significant?
Are the variables YELOWSTN and MYSIGN jointly significant?

    Model 1: TRUSTTV = b₀ + b₁LIKEJPAN + b₂SKOOL + b₃TUBETIME
    Model 2: TRUSTTV = b₀ + b₁LIKEJPAN + b₂SKOOL + b₃TUBETIME + b₄MYSIGN + b₅YELOWSTN
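The ratio and correction above translate directly into code. A sketch under the same notation, using synthetic data rather than the TRUSTTV survey (the function names are my own):

```python
import numpy as np

def sse(y, X):
    """Sum of squared errors from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def nested_f(y, X_small, X_full):
    """F = [(SSE1 - SSE2)/m] / [SSE2/(n - k)] for nested OLS models,
    where m = number of added variables and k = number of columns of the
    full model (including the constant), matching the lecture's notation."""
    sse1, sse2 = sse(y, X_small), sse(y, X_full)
    n, k = X_full.shape
    m = k - X_small.shape[1]
    return ((sse1 - sse2) / m) / (sse2 / (n - k))

# Synthetic example: x3 and x4 are truly irrelevant to y.
rng = np.random.default_rng(1)
n = 100
x1, x2, x3, x4 = rng.normal(size=(4, n))
y = 1 + 2 * x1 - x2 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1, x2])
X_full = np.column_stack([np.ones(n), x1, x2, x3, x4])

# m = 2 and n - k = 95 degrees of freedom; the 5% critical value is
# roughly 3.1, so a small F here means we fail to reject H0: b3 = b4 = 0.
F = nested_f(y, X_small, X_full)
```

Because x3 and x4 carry no signal, F will usually land well below the critical value, mirroring the "not jointly significant" outcome the lecture walks through next.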

Adding Extra Variables (cont.)

State the null hypothesis:

    H₀: β₄ = β₅ = 0

Calculate the F-statistic. Our formula for the F-statistic is:

    F = [(SSE₁ − SSE₂) / m] / [SSE₂ / (n − k)]

What is SSE₁? The sum of squared errors in the first regression. What is SSE₂? The sum of squared errors in the second regression. Here m = 2 (the two added variables) and k = 6 (five independent variables plus the constant). Plugging in the sums of squared errors from the two printouts gives:

    F = .319

Reject or fail to reject the null hypothesis? The critical value at the 5% level, F(m, n − k) from the table, is about 3.0. Is the F-statistic greater than this critical value? If yes, then we reject the null hypothesis that the variables are not significantly different from zero; otherwise we fail to reject. Since .319 < 3.0, we fail to reject the null hypothesis: MYSIGN and YELOWSTN are not jointly significant.

Testing All Variables: Is the Model Significant?

Equation: Impact of school and TV watched

                  DF    Sum of Squares    Mean Square
    Regression          .7                3.37
    Residual      7     1.7               .39
    F Statistic = .1    Signif F = .E-

    Multiple R           .15
    R Square             .35
    Adjusted R Square    .31
    Standard Error       .5

    Dependent Variable: Trust TV

    Variable      B       SE B    Beta     T        Sig T
    SKOOL         .1      .11     .9       1.59     .15
    TUBETIME      -.      .1      -.15     -3.31    .1
    (Constant)    .1      .15              13.

Hypothesis Testing:

State hypothesis:

    H₀: β₁ = β₂ = 0

Calculate test statistic. Again, we start with our formula:

    F = [(SSE₁ − SSE₂) / m] / [SSE₂ / (n − k)]

SSE₂ = 1.7, the residual sum of squares reported in your printout. SSE₁ is the sum of squared errors when there are no explanatory variables at all. If there are no explanatory variables, then SSR must be 0; in this case SSE = SST. So we can substitute SST for SSE₁ in our formula:

    SST = SSR + SSE = 19.5

    F = [(SST − SSE) / 2] / [SSE / (n − 3)]

Reject or fail to reject the null hypothesis? The critical value at the 5% level, F(2, n − 3) from your table, is about 3.0. The F-statistic is larger, so this time we can reject the null hypothesis that β₁ = β₂ = 0.

Interpretation? The model explains a significant amount of the total variation in how much people trust what is said on TV.

Comparing Models: Example
Study of 78 seventh-grade students in a mid-western school.

[Path diagram: Gender and test score as hypothesized positive influences on GPA]
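The substitution SSE₁ = SST when the null model has no explanatory variables gives the overall significance test. A sketch (my own function name and toy data, not the Trust TV survey):

```python
import numpy as np

def overall_f(y, X):
    """Overall significance test, H0: every slope is zero.
    With no regressors the best fit is the mean of y, so SSE1 = SST and
    F = [(SST - SSE)/m] / [SSE/(n - k)], with m = k - 1 slopes restricted
    (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    n, k = X.shape
    m = k - 1
    return ((sst - sse) / m) / (sse / (n - k))

# Toy model with two genuinely relevant regressors.
rng = np.random.default_rng(3)
n = 80
x1, x2 = rng.normal(size=(2, n))
y = 0.5 + 1.5 * x1 - x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

F = overall_f(y, X)   # far above the ~3.1 critical value of F(2, 77)
```

This is the same number that regression printouts report as the model's "F Statistic": a test of all the slopes jointly against the mean-only model.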

Data

Variables:
    Test score = student's score on a standardized test
    GPA = student's grade point average
    Gender = student's gender (1 for male; 0 for female)

Descriptive Statistics

                        GPA       Test score    Gender
    Mean                7.5       1.9           .
    Standard Error      .         1.9           .
    Mode                9.17      111.          1.
    Sample Variance     .1        173.7         .
    Kurtosis            1.1       .             -1.7
    Minimum             .53       7.            .
    Sum                 5.3       9.            7.

Averages by Gender

    Gender         Average of GPA    Average of test score
    0 (female)     7.9537            15.3797
    1 (male)       7.139             11.957
    Grand Total    7.53              1.9379

Graphs
    [Scatter plot: relation between test score and GPA]
    [Scatter plot: relation between test score and GPA, women only]
    [Scatter plot: relation between test score and GPA, men only]

Hypothesis Testing

Hypotheses concerning coefficients:

    H₀: β₁ = 0
    Hₐ: β₁ ≠ 0

We want to know if the test score and Gender explain a significant amount of the variation in GPA.

Hypotheses concerning models:

    H₀: β₁ = β₂ = 0
    Hₐ: at least one of β₁, β₂ ≠ 0

Estimation

Model I: GPA = b₀ + b₁ · Test score

    [Scatter plot with fitted line: y = .11x − 3.5571]

SUMMARY OUTPUT (Dependent Variable: GPA)

    Regression Statistics
    Multiple R           .3
    R Square             .
    Adjusted R Square    .39
    Standard Error       1.3
    Observations         78

    ANOVA
                  df    SS      MS      F       Significance F
    Regression    1     13.3    13.3    51.1    .7373E-1
    Residual      76    3.11    .7
    Total         77    339.3

                  Coefficients    Standard Error    t Stat    P-value
    Intercept     -3.5571         1.55              -.9       .59
    Test score    .11             .1                7.1       .7373E-1

    Estimated equation: GPA = −3.5571 + .11 · Test score

Copyright Sharyn O'Halloran

Model II: GPA = b₀ + b₁ · Test score + b₂ · Gender

    [Scatter plot with the two fitted lines:
     GPA = −3.5571 + .11 · Test score            (Model I)
     GPA = −3.73 + .11 · Test score − .97 · Gender  (Model II)]

SUMMARY OUTPUT (Dependent Variable: GPA)

    Regression Statistics
    Multiple R           .7
    R Square             .5
    Adjusted R Square    .
    Standard Error       1.5
    Observations         78

    ANOVA
                  df    SS       MS     F      Significance F
    Regression    2     153.1    7.5    3..    .
    Residual      75    1.7      .
    Total         77    339.3

                  Coefficients    Standard Error    t Stat    P-value
    Intercept     -3.73           1.5               -.9       .1
    Test score    .11             .1                7.77      .
    Gender        -.97            .37               -.        .1

    Estimated equation: GPA = −3.73 + .11 · Test score − .97 · Gender

Is Model II better than Model I?

F-test statistic:

    F = [(SSE₁ − SSE₂) / 1] / [SSE₂ / (78 − 3)]

The F-statistic exceeds its 5% critical value, so yes it is: adding Gender improves the model.

Interactive Terms

    Regression Statistics
    Multiple R           .75
    R Square             .5
    Adjusted R Square    .33
    Standard Error       1.55
    Observations         77.

    ANOVA
                  df    SS       MS        F      Significance F
    Regression    3     153.5    51.175    .33    .
    Residual      73    13.5     .513
    Total         76    33.9

                         Coefficients    Standard Error    t Stat    P-value
    Intercept            -.19            .19               -1.13     .315
    Test score           .93             .                 .53       .
    Gender               -3.99           3.55              -1.7      .
    Score × Gender       .7              .                 .97       .333

The interactive term is not statistically significant: a high or low test score has the same effect on GPA independent of gender.

Interpretation

Coefficients: Both the test score and Gender matter. GPA increases by .11 points for each additional point on the test, holding Gender constant. GPA decreases by .97 points for males (Gender = 1), holding the test score constant.

Models: The F-statistic shows that the model that includes Gender performs significantly better in explaining variation than does the model with the test score alone. We are therefore able to reject, at the 5% significance level, the null hypothesis that Model I and Model II explain the same amount of variation.
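Model II's gender dummy and the interactive term amount to extra columns in the design matrix. A sketch on simulated data (the coefficients below are toy values loosely echoing the lecture's estimates, not the actual seventh-grade dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 78                                   # sample size used in the lecture's example
score = rng.normal(100, 13, size=n)      # standardized test score (toy values)
gender = rng.integers(0, 2, size=n)      # 1 = male, 0 = female, as in the lecture
# Simulated GPA: same slope for both genders, males shifted down,
# so the true interaction effect is zero.
gpa = -3.5 + 0.11 * score - 0.9 * gender + rng.normal(0, 1.5, size=n)

ones = np.ones(n)
X2 = np.column_stack([ones, score, gender])                   # Model II
X3 = np.column_stack([ones, score, gender, score * gender])   # + interaction

b2, *_ = np.linalg.lstsq(X2, gpa, rcond=None)
b3, *_ = np.linalg.lstsq(X3, gpa, rcond=None)

# b2 = [intercept, score slope, gender shift]; since the data were built
# with no interaction, b3[3] should wander near zero, mirroring the
# insignificant interactive term in the lecture's output.
```

Testing whether the interaction coefficient differs significantly from zero is exactly the nested F-test (equivalently, a t-test on that single coefficient) described earlier.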

Final Paper

- Clearly state your hypothesis. Use a path diagram to present the causal relation, and use the correlations to help you determine what causes what. State the alternative hypothesis.
- Present descriptive statistics. This includes a correlation matrix and a histogram or scatter plot.
- Estimate your model. You can do simple regression, include interactive terms, do path analysis, use dummy variables; whatever is appropriate to your hypothesis.
- Present your results.
- Interpret your results. Draw out the policy implications of your analysis.
- The paper should begin with a brief introduction that states the basic project and your main findings.