Finding Relationships Among Variables

BUS 230: Business and Economic Research and Communication

1 Goals

Specific goals:
- Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis tests, p-values.
- Be able to distinguish different types of data and prescribe appropriate statistical methods.
- Conduct a number of hypothesis tests using methods appropriate for questions involving only one or two variables.

Learning objectives:
- LO2: Interpret data using statistical analysis.
- LO2.3: Formulate conclusions and recommendations based upon statistical results.

What to Look For

There is a closed-book, closed-note quiz tomorrow. For each test, remember the following:
- In plain English, be able to describe the purpose of the test.
- Know whether the test is a parametric or a non-parametric test.
- Know the null and alternative hypotheses.
- Know what types of variables are appropriate for applying the test.

2 Relationships Between Two Variables

2.1 Correlation

A correlation exists between two variables when one of them is related to the other in some way. The Pearson linear correlation coefficient is a measure of the strength of the linear relationship between two variables. It is a parametric test.
- Null hypothesis: there is zero linear correlation between the two variables.
- Alternative hypothesis: there is a linear correlation (either positive or negative) between the two variables.

Spearman's Rank Test

A non-parametric test. Behind the scenes, it replaces the actual data with their ranks and computes the Pearson coefficient using the ranks. The hypotheses are the same.

Positive linear correlation

Positive correlation: the two variables move in the same direction. The stronger the correlation, the closer the correlation coefficient is to 1. Perfect positive correlation: ρ = 1.

Negative linear correlation

Negative correlation: the two variables move in opposite directions. The stronger the correlation, the closer the correlation coefficient is to −1. Perfect negative correlation: ρ = −1.

No linear correlation

(Scatter plot panels not reproduced here.) Panel (g): no relationship at all. Panel (h): a strong relationship, but not a linear relationship; regular correlation cannot detect this.
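The notes run these tests in SPSS; purely as an illustration, here is a minimal sketch of both tests in Python with SciPy. The data and variable names (ads, sales) are made up for this example.

```python
# Pearson and Spearman correlation tests on hypothetical data.
import numpy as np
from scipy import stats

# Hypothetical data: advertising spend and sales revenue for 10 firms.
ads = np.array([1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 5.9, 6.3, 7.1, 8.0])
sales = np.array([20, 24, 27, 30, 29, 33, 36, 38, 40, 45])

# Pearson: parametric test of H0: zero linear correlation.
r, p_pearson = stats.pearsonr(ads, sales)

# Spearman: non-parametric; ranks the data, then computes Pearson on the ranks.
rho, p_spearman = stats.spearmanr(ads, sales)

print(f"Pearson r = {r:.3f}, p-value = {p_pearson:.4f}")
print(f"Spearman rho = {rho:.3f}, p-value = {p_spearman:.4f}")
```

A small p-value in either test leads us to reject the null hypothesis of zero correlation.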

2.2 Chi-Squared Test of Independence

Used to determine if two categorical variables (e.g., nominal variables) are related.

Example: Suppose a hotel manager surveys guests who indicate they will not return:

                     Reason for Not Returning
Reason for Stay      Price   Location   Amenities
Personal/Vacation       56         49           0
Business                20         47          27

Data in the table are always frequencies that fall into individual categories. We can use such a table to test whether the two variables are independent.

Test of independence
- Null hypothesis: there is no relationship between the row variable and the column variable (they are independent).
- Alternative hypothesis: there is a relationship between the row variable and the column variable (they are dependent).

Test statistic:

χ² = Σ (O − E)² / E

- O: observed frequency in a cell of the contingency table.
- E: expected frequency, computed under the assumption that the variables are independent.

Large χ² values indicate the variables are dependent (reject the null hypothesis).
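As an illustration only (the notes use SPSS), the hotel example can be run in Python with SciPy. The frequencies come from the table above; everything else is a sketch.

```python
# Chi-squared test of independence for the hotel survey table.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [56, 49,  0],   # Personal/Vacation: Price, Location, Amenities
    [20, 47, 27],   # Business
])

# chi2_contingency computes the expected frequencies under independence
# and the chi-squared statistic for us.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-squared = {chi2:.2f}, df = {dof}, p-value = {p_value:.4f}")
# A small p-value: reject H0, so the reason for not returning appears to
# depend on the reason for the stay.
```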

3 Regression

3.1 Single Variable Regression

Regression line: the equation of the line that describes the linear relationship between variable x and variable y. We need to assume that the independent variable influences the dependent variable.
- x: independent or explanatory variable.
- y: dependent variable.
Variable x can influence the value of variable y, but not vice versa. Example: How do advertising expenditures affect sales revenue?

Regression line

Population regression line:

y_i = β_0 + β_1 x_i + ε_i

The actual coefficients β_0 and β_1 describing the relationship between x and y are unknown. We use sample data to come up with an estimate of the regression line:

y_i = b_0 + b_1 x_i + e_i

Since x and y are not perfectly correlated, we still need an error term.

Predicted values and residuals

Given a value for x_i, we can compute a predicted value for y_i, denoted ŷ_i:

ŷ_i = b_0 + b_1 x_i

This is not likely to be the actual value of y_i. The residual is the difference in the sample between the actual value y_i and the predicted value ŷ_i:

e_i = y_i − ŷ_i = y_i − b_0 − b_1 x_i

3.2 Multiple Regression

Multiple regression line (population):

y_i = β_0 + β_1 x_{1,i} + β_2 x_{2,i} + ... + β_{k−1} x_{k−1,i} + ε_i

Multiple regression line (sample):

y_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i} + ... + b_{k−1} x_{k−1,i} + e_i

- k: the number of parameters (coefficients) you are estimating.
- ε_i: error term, needed because the linear relationship between the x variables and y is not perfect.
- e_i: residual, the difference between the predicted value ŷ_i and the actual value y_i.

Interpreting the slope

The slope β is the amount y is predicted to increase when x increases by one unit.
- When β < 0 there is a negative linear relationship.
- When β > 0 there is a positive linear relationship.
- When β = 0 there is no linear relationship between x and y.

SPSS reports sample estimates for the coefficients, along with the following (see the Python sketch below for an equivalent):
- estimates of the standard errors,
- t-test statistics for H_0: β = 0,
- p-values for the t-tests,
- confidence intervals for the coefficients.
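The notes rely on SPSS for estimation; the following is a minimal sketch of the same workflow in Python with statsmodels. The data are simulated and the variable names (advertising, price, sales) are made up for illustration.

```python
# Single- and multiple-variable OLS regression on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
advertising = rng.uniform(1, 10, n)
price = rng.uniform(5, 15, n)
sales = 12 + 3.0 * advertising - 1.5 * price + rng.normal(0, 2, n)

# Single-variable regression: sales on advertising.
X1 = sm.add_constant(advertising)              # adds the intercept b_0
simple = sm.OLS(sales, X1).fit()

# Multiple regression: sales on advertising and price.
X2 = sm.add_constant(np.column_stack([advertising, price]))
multiple = sm.OLS(sales, X2).fit()

# Like SPSS, the summary reports coefficient estimates, standard errors,
# t-statistics for H0: beta = 0, p-values, and confidence intervals.
print(multiple.summary())
```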

3.3 Variance Decomposition

Sum of Squares Measures of Variation

Sum of Squares Regression (SSR): a measure of the amount of variability in the dependent variable (y) that is explained by the independent variables (the x's):

SSR = Σ_{i=1}^{n} (ŷ_i − ȳ)²

Sum of Squares Error (SSE): a measure of the unexplained variability in the dependent variable:

SSE = Σ_{i=1}^{n} (y_i − ŷ_i)²

Sum of Squares Total (SST): a measure of the total variability in the dependent variable:

SST = Σ_{i=1}^{n} (y_i − ȳ)²

SST = SSR + SSE.

Coefficient of determination

The coefficient of determination is the percentage of variability in y that is explained by x:

R² = SSR / SST

R² will always be between 0 and 1. The closer R² is to 1, the better x is able to explain y. The more variables you add to the regression, the higher R² will be.

Adjusted R²

R² will likely increase (slightly) even when you add nonsense variables. Adding such variables improves in-sample fit, but will likely hurt out-of-sample forecasting accuracy. The adjusted R² penalizes R² for additional variables:

R²_adj = 1 − (1 − R²) (n − 1) / (n − k − 1)

where k in this formula counts the explanatory variables (excluding the intercept).
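To make the decomposition concrete, here is a small sketch (simulated data; statsmodels is assumed, as in the earlier sketches) that computes SSR, SSE, and SST by hand and checks R² against the packaged value.

```python
# Variance decomposition computed by hand for a simple regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 5 + 2 * x + rng.normal(0, 3, 40)

model = sm.OLS(y, sm.add_constant(x)).fit()
y_hat, y_bar = model.fittedvalues, y.mean()

SSR = ((y_hat - y_bar) ** 2).sum()   # explained variability
SSE = ((y - y_hat) ** 2).sum()       # unexplained variability
SST = ((y - y_bar) ** 2).sum()       # total variability

print(f"SST = {SST:.1f} = SSR ({SSR:.1f}) + SSE ({SSE:.1f})")
print(f"R^2 by hand: {SSR / SST:.4f}  vs statsmodels: {model.rsquared:.4f}")
print(f"Adjusted R^2: {model.rsquared_adj:.4f}")
```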

When the adjusted R² increases when adding a variable, the additional variable really did help explain the dependent variable. When the adjusted R² decreases when adding a variable, the additional variable does not help explain the dependent variable.

F-test for Regression Fit

F-test for Regression Fit: tests whether the regression line explains the data. Very, very, very similar to the ANOVA F-test.
- H_0: β_1 = β_2 = ... = β_{k−1} = 0.
- H_1: at least one of the variables has explanatory power (i.e., at least one slope coefficient is not equal to zero).

F = [SSR / (k − 1)] / [SSE / (n − k)]

where k is the number of estimated coefficients including the intercept, as defined above (so k − 1 is the number of explanatory variables).
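As a sketch (simulated data), the F statistic can be computed by hand from the sums of squares and compared with the value statsmodels reports. Note the statsmodels attribute names: ess is the explained sum of squares (our SSR) and ssr is the residual sum of squares (our SSE).

```python
# F-test for overall regression fit, by hand and via statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = 4 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 2, n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

k = X.shape[1]                 # number of estimated coefficients (here 3)
SSR = model.ess                # explained sum of squares
SSE = model.ssr                # residual (error) sum of squares
F = (SSR / (k - 1)) / (SSE / (n - k))

print(f"F by hand: {F:.2f}  vs statsmodels: {model.fvalue:.2f}")
print(f"p-value: {model.f_pvalue:.2e}")
```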

3.4 Regression Assumptions

Assumptions from the CLT

Using the normal distribution to compute p-values depends on results from the Central Limit Theorem.
- Sufficiently large sample size (much more than 30): useful for the normality result from the Central Limit Theorem, and also necessary as you increase the number of explanatory variables.
- Normally distributed dependent and independent variables: useful for small sample sizes, but not essential as the sample size increases.
- Types of data: the dependent variable must be interval or ratio; independent variables can be interval, ratio, or dummy variables.

Crucial Assumptions for Regression

- Linearity: a straight line reasonably describes the data. Exceptions: the effect of experience on productivity, or ordinal data such as education level on income. Consider transforming variables (see the sketch below).
- Stationarity: the Central Limit Theorem describes the behavior of statistics as the sample size approaches infinity, so the mean and variance must exist and be constant. This is a big issue in economic and financial time series.
- Exogeneity of explanatory variables: the dependent variable must not influence the explanatory variables, and the explanatory variables must not be influenced by excluded variables that can influence the dependent variable. Example problem: how does advertising affect sales? (Sales may also influence advertising budgets, which would violate exogeneity.)
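As a sketch of how one might check the linearity assumption (not a procedure from the notes; the data and names are simulated), plot the residuals against the fitted values. A curved pattern suggests the straight-line assumption fails and a transformation may help.

```python
# Residuals-vs-fitted plot as a simple linearity diagnostic.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(3)
experience = rng.uniform(0, 30, 80)
# Productivity that levels off with experience: a nonlinear relationship.
productivity = 10 * np.log1p(experience) + rng.normal(0, 1.5, 80)

model = sm.OLS(productivity, sm.add_constant(experience)).fit()

plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("A curved pattern suggests transforming x (e.g., log)")
plt.show()
```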