Introduction to Regression

1 Introduction to Regression
Interdepartmental Postgraduate Programme in Techno-Economic Systems
Dimitris Fouskakis (Δημήτρης Φουσκάκης)

2 Introduction
Basic idea: use data to identify relationships among variables, and use these relationships to make predictions. Regression analysis describes the relationship between two (or more) variables. Examples: income and educational level; demand for electricity and the weather; home sales and interest rates.

3 Simple Example
A linear model for hours worked:
Hours worked = a + b × (GDP per capita)
where hours worked is the dependent variable (Y), GDP per capita is the independent variable (X), and a (the intercept, or baseline) and b (the slope) are the regression coefficients.

4 Simple Example
The slope of this line is
$$b = \frac{\text{change in hours worked}}{\text{change in GDP per capita}}$$
If b > 0, hours worked increase with the level of income. If b < 0, the work week gets shorter as a country develops.

5 Simple Example
We want to find coefficient values that give a good fit to the data. A plot of the data is called a scatter diagram. Here it describes the relationship between hours worked and GDP per capita for several countries.

6 Scatter Diagram: Hours Worked and GDP per Capita
[Figure: scatter diagram of weekly hours worked (vertical axis) against GDP per capita (horizontal axis, up to $40,000)]

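7 So Many Choices...
[Figure: the same scatter diagram with several candidate lines drawn through the points, one of them labeled Line C; weekly hours worked (vertical axis, 30 to 55) against GDP per capita (horizontal axis)]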

8 Simple Example
The regression line is the line that best summarizes the data. More precisely, it is the line that minimizes the sum of the squared vertical distances between the points in the scatter diagram and the corresponding points on the line. This method of estimating the regression line is called least squares.

9 Scatter Diagram and Regression Line
[Figure: scatter diagram of weekly hours worked against GDP per capita, with the fitted regression line]

10 Simple Example
In our example the regression line is:
Hours Worked = … − … × Per-capita GDP
(1.52)  (…)
A $1,000 increase in GDP per capita reduces weekly hours worked by a quarter of an hour. The standard errors (in parentheses) measure the statistical precision with which the coefficients are estimated.
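As a quick check on how the slope is interpreted: b is measured in hours per dollar of GDP per capita, so a $1,000 increase changes weekly hours by 1,000 × b. A reduction of a quarter of an hour therefore corresponds to a slope of roughly −0.00025:

$$\Delta(\text{hours worked}) = b \times \Delta(\text{GDP per capita}) \quad\Rightarrow\quad -0.25 \approx b \times 1000 \quad\Rightarrow\quad b \approx -0.00025 \text{ hours per dollar}$$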

11 Predicting Sales from Advertising Expenditures
Table 1: Advertising expenditures and first-year sales of AppleGlo by region.
Columns: Region | Advertising expenditures, $ million (xᵢ) | First-year sales, $ million (yᵢ)

12 Scatter Plot of First-Year Sales and Advertising Expenditures
[Figure: first-year sales ($ million, vertical axis) against advertising expenditures ($ million, horizontal axis)]

13 Notation
n = 14 observations
Y = first-year sales ($ million)
x = advertising expenditures ($ million)
We try to fit a simple linear regression model:
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
where ε is noise: a Normally distributed random variable with mean 0 and standard deviation σ (which we estimate from the data).
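The Python sketch below makes the model concrete by generating data from Y = β₀ + β₁x + ε. The parameter values (β₀ = 3, β₁ = 2, σ = 1) and the x values are hypothetical and chosen only for illustration. They are not estimates from the AppleGlo data. The function name `simulate_linear_model` is also illustrative.

```python
import random


def simulate_linear_model(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i, with eps_i ~ Normal(mean 0, sd sigma)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical advertising levels ($ million) and parameters, for illustration only.
xs = [0.5, 1.0, 1.5, 2.0]
ys = simulate_linear_model(xs, beta0=3.0, beta1=2.0, sigma=1.0)

# With sigma = 0 there is no noise, and every point lies exactly on the line.
line_values = simulate_linear_model(xs, beta0=3.0, beta1=2.0, sigma=0.0)
print(line_values)  # [4.0, 5.0, 6.0, 7.0]
```

Changing the seed gives a different sample scattered around the same line. Least squares tries to recover β₀ and β₁ from one such sample.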

14 Estimation
If b₀ and b₁ are the estimates of the intercept β₀ and the slope β₁, their values are called the regression coefficients. We want to choose b₀ and b₁ so that the line gives the best fit to the observations (xᵢ, yᵢ), i = 1, …, 14.

15 Estimation
If we have trial values of b₀ and b₁, then the estimated (predicted) values of the dependent variable are:
$$\hat{y}_i = b_0 + b_1 x_i$$

16 Estimation
The differences between the observed values of the dependent variable and the predicted values are called residuals:
$$e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$$

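17 Estimation
Our goal is to select values of b₀ and b₁ so that the residuals are as small as possible, in the sense that their sum of squares is minimized (the least squares method):
$$\min_{b_0, b_1} \sum_{i=1}^{n} e_i^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The solution is
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$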
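The closed-form least squares formulas translate directly into code. The sketch below uses only the Python standard library and a small hypothetical data set, not the AppleGlo table. It also computes the fitted values ŷᵢ and the residuals eᵢ.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fitted_and_residuals(xs, ys, b0, b1):
    """Predicted values yhat_i = b0 + b1 * x_i and residuals e_i = y_i - yhat_i."""
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return fitted, residuals


# Hypothetical data (not the AppleGlo table).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
b0, b1 = least_squares_fit(xs, ys)  # b0 = 0.5, b1 = 1.6 (up to rounding)
fitted, residuals = fitted_and_residuals(xs, ys, b0, b1)
# With an intercept in the model, the least-squares residuals sum to zero (up to rounding).
```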

18 Estimation
[Figure: first-year sales ($ million) and the fitted values against advertising expenditures ($ million), showing the fitted regression line]

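19 Computer Output
[Regression output: an analysis-of-variance table (Source, SS, df, MS for Model, Residual and Total) with Number of obs, F(1, 12), Prob > F, R-squared, Adj R-squared and Root MSE; and a coefficient table (Coef., Std. Err., t, P>|t|, [95% Conf. Interval]) for adv_exp and _cons.]
The slide's annotations point out: Prob > F (whether a line can be drawn at all); Root MSE (S, the estimate of σ); R-squared (the coefficient of determination); Adj R-squared (an attempt to take into account the sampling); the regression coefficients (Y = … + … x); the P-values; and the confidence intervals for the coefficients.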

20 Coefficient of Determination
R² is the proportion of the total variation in the observed values of the dependent variable Y that is accounted for by the regression on the independent variable(s). Always 0 ≤ R² ≤ 1. A value closer to 1 means that the points lie closer to the fitted line. That is why R² is frequently used to measure how well the regression model fits the data. However, using R² alone to decide whether the model is valid is WRONG! There are other ways to determine whether a linear regression is valid or not.
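The sketch below shows how R² is computed, using the standard decomposition R² = 1 − SSE/SST. Here SSE is the residual sum of squares and SST is the total sum of squares of Y around its mean. The data and the fitted line are hypothetical and for illustration only. For these particular points, b₀ = 0.5 and b₁ = 1.6 is the least squares fit.

```python
def r_squared(ys, fitted):
    """R^2 = 1 - SSE/SST for observed values ys and fitted values."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
    return 1.0 - sse / sst


# Hypothetical data; b0 = 0.5, b1 = 1.6 is the least-squares line for these points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
fitted = [0.5 + 1.6 * x for x in xs]
r2 = r_squared(ys, fitted)  # 1 - 0.2/13, about 0.985
```

A high value like this only says that the points lie close to the line. As the slide stresses, it does not by itself show that the model's assumptions hold.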

21 Coefficient of Determination
[Figure: scatter plots with fitted regression lines for several data sets (var4, var5, var6, and first-year sales against advertising expenditures ($ million)), each labeled with its R² value]

22 Coefficient of Determination and Sample Correlation
It can be proved that
$$R^2 = [\operatorname{corr}(x, y)]^2$$
Thus R² is the square of the sample correlation coefficient between the independent variable x and the dependent variable y. The estimate of the slope b₁ in the simple linear regression model can be written as
$$b_1 = \operatorname{corr}(x, y)\,\frac{S_y}{S_x}$$
where S_x and S_y are the sample standard deviations of x and y, respectively.
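Both identities can be checked numerically. The sketch below uses hypothetical data and the Python standard library. `statistics.fmean` requires Python 3.8 or later. It computes the slope once with the least squares formula and once as corr(x, y) · S_y / S_x, then compares R² from the fitted line with the squared correlation.

```python
import math
import statistics


def sample_corr(xs, ys):
    """Sample (Pearson) correlation coefficient between xs and ys."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]

r = sample_corr(xs, ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations

# Slope two ways: the least-squares formula and corr(x, y) * S_y / S_x.
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b1_ls = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b1_corr = r * s_y / s_x

# R^2 from the fitted line, to compare with r ** 2.
b0 = y_bar - b1_ls * x_bar
sse = sum((y - (b0 + b1_ls * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)
r2 = 1.0 - sse / sst
print(round(r ** 2, 6), round(r2, 6), round(b1_ls, 6), round(b1_corr, 6))
```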

23 Multiple Regression
Simple linear regression is a model to predict the value of one variable from another. Multiple regression is a natural extension of this model: we use it to predict values of an outcome from several predictors.

24 Predicting Sales of a product based on Multiple Factors
Table: Sales of Nature-Bar, advertising expenditures, promotion expenditures, and competitors' sales, by region.
Columns: Region | Sales, $ million (Yᵢ) | Advertising expenditures, $ million (X₁ᵢ) | Promotion expenditures, $ million (X₂ᵢ) | Competitors' sales, $ million (X₃ᵢ)
Regions: Selkirk, Susquehanna, Kittery, Acton, Finger Lakes, Berkshire, Central, Providence, Nashua, Dunster, Endicott, Five-Towns, Waldeboro, Jackson, Stowe

25 Predicting Sales of a product based on Multiple Factors
Y: dependent variable, sales of Nature-Bar
k = 3 independent variables: x₁ = advertising expenditures, x₂ = promotional expenditures, x₃ = competitors' sales
n = 15 observations

26 Predicting Sales of a product based on Multiple Factors
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i, \qquad i = 1, \dots, n$$
where the εᵢ are independent, Normally distributed random variables with mean 0 and standard deviation σ.
β₀: the baseline, i.e. the expected value of Y when x₁ = x₂ = x₃ = 0.
β₁, β₂, β₃: the change in Y per unit change in x₁, x₂, x₃, respectively (holding the other variables fixed).

27 Predicting Sales of a product based on Multiple Factors
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
$$E(Y \mid x_1, x_2, x_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3, \qquad \operatorname{SD}(Y \mid x_1, x_2, x_3) = \sigma$$
The standard deviation σ does not depend on x₁, x₂, x₃.

28 Predicting Sales of a product based on Multiple Factors
Let b₀, b₁, b₂, b₃ be the estimates of β₀, β₁, β₂, β₃. The predicted values are then
$$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i}$$
and the residuals are
$$e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - b_3 x_{3i}$$

29 Predicting Sales of a product based on Multiple Factors
The residual sum of squares is
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - b_3 x_{3i})^2$$
The best-fitting regression equation is obtained by choosing b₀, b₁, b₂, b₃ to minimize this quantity.
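With several predictors, the residual sum of squares is usually minimized with a linear-algebra least squares solver rather than by hand. The sketch below uses NumPy's `np.linalg.lstsq`. The data are hypothetical and generated without noise from known coefficients, so the fit should recover those coefficients up to rounding. The columns mirror the slide (advertising, promotions, competitors' sales), but the numbers are invented.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Least-squares estimates (b0, b1, ..., bk) for y = b0 + b1*x1 + ... + bk*xk.

    X has one row per observation and one column per independent variable.
    Returns the coefficient vector and the residuals e_i = y_i - yhat_i.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend a column of 1s for b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # minimizes the sum of e_i**2
    residuals = y - design @ coef
    return coef, residuals


# Hypothetical data: x1 = advertising, x2 = promotions, x3 = competitors' sales ($ million).
X = np.array([
    [0.5, 0.4, 30.0],
    [1.0, 0.3, 32.0],
    [1.5, 0.9, 29.0],
    [0.8, 0.6, 31.0],
    [1.2, 0.2, 33.0],
    [1.9, 0.7, 28.0],
])
true_coef = np.array([1.0, 2.0, -0.5, 3.0])  # hypothetical b0, b1, b2, b3
y = true_coef[0] + X @ true_coef[1:]  # noise-free, so the fit is exact

coef, residuals = fit_multiple_regression(X, y)
rss = float(np.sum(residuals ** 2))  # residual sum of squares, about 0 here
```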

30 Predicting Sales of a product based on Multiple Factors
With our data, the fitted regression equation is:
Ŷ = … + … x₁ + … x₂ + … x₃
Based on this regression, suppose that we want to predict next year's sales of Nature-Bar in the Nashua region. We plan to spend $0.7 million on advertising and $0.6 million on promotions, and we estimate that competitors' sales will remain flat at their current level of $31.30 million. Substituting x₁ = 0.7, x₂ = 0.6 and x₃ = 31.30 into the equation gives Ŷ = … $ million.

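31 Computer Output
[Regression output: an analysis-of-variance table (Source, SS, df, MS for Model, Residual and Total) with Number of obs, F(3, 11), Prob > F, R-squared, Adj R-squared and Root MSE; and a coefficient table for sales (Coef., Std. Err., t, P>|t|, [95% Conf. Interval]) for advertising, promotion, competitors and _cons.]
Let c be the number for which P(−c ≤ T ≤ c) = β/100, where T follows a t-distribution with n − k − 1 degrees of freedom. If |t| > c, we are confident at the β% level that the coefficient is not 0.
For example, for one of the coefficients t = … / 23.625 = 2.53. With β = 95 and df = n − k − 1 = 15 − 3 − 1 = 11, the t tables give c = 2.201. Since 2.53 > 2.201, this coefficient is significantly different from 0 at the 95% level. The confidence interval is (b − c × Std. Err., b + c × Std. Err.) = (…, …).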
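The decision rule and the confidence interval fit in a small function. The sketch below assumes the critical value c is looked up in a t table, as on the slide (c = 2.201 for β = 95% and 11 degrees of freedom). If SciPy is available, `scipy.stats.t.ppf(0.975, 11)` returns the same value to about three decimals. The coefficient and standard error are hypothetical and for illustration only.

```python
def coefficient_inference(coef, std_err, c):
    """t statistic, beta% confidence interval and significance decision for one coefficient.

    c is the two-sided critical value with P(-c <= T <= c) = beta/100,
    where T has a t-distribution with n - k - 1 degrees of freedom.
    """
    t = coef / std_err
    interval = (coef - c * std_err, coef + c * std_err)
    significant = abs(t) > c  # equivalently: the interval excludes 0
    return t, interval, significant


n, k = 15, 3
df = n - k - 1  # 11 degrees of freedom, as on the slide
c = 2.201       # t-table value for beta = 95% and 11 df (from the slide)

# Hypothetical coefficient estimate and standard error, for illustration only.
t, interval, significant = coefficient_inference(coef=5.0, std_err=2.0, c=c)
print(df, t, significant)  # 11 2.5 True
```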

32 Validation
Linearity
Normality of the εᵢ
Heteroscedasticity
Autocorrelation

33 Linearity
The dependent variable Y should depend linearly on the values of the independent variables x₁, x₂, x₃. When k = 1, check this with a scatter plot. When k > 1, rely on common sense. Check the value of R², but, as discussed before, with caution. If there is a problem with linearity, you might need to add a quadratic term, for example, or transform both the dependent and the independent variables.
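Adding a quadratic term amounts to adding x² as an extra column in the regression. The NumPy sketch below uses hypothetical data that follow an exact curve. The straight-line fit leaves a large residual sum of squares, while the model with the quadratic term fits exactly.

```python
import numpy as np


def fit_and_rss(columns, y):
    """Fit y on an intercept plus the given columns by least squares; return (coef, RSS)."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef, float(np.sum(residuals ** 2))


# Hypothetical curved relationship: y = 1 + 0.5 x + 2 x^2 (no noise).
x = np.arange(6, dtype=float)
y = 1.0 + 0.5 * x + 2.0 * x ** 2

linear_coef, linear_rss = fit_and_rss([x], y)
quad_coef, quad_rss = fit_and_rss([x, x ** 2], y)
# linear_rss is clearly positive; quad_rss is about 0 and quad_coef is about [1, 0.5, 2].
```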

34 Normality
The linear regression model Yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₃ᵢ + εᵢ assumes that εᵢ ~ N(0, σ²). To check this, plot a histogram of the regression residuals
$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i})$$
[Figure: histogram of the residuals (frequency against residuals)]
If there is evidence of non-normality, you might need to transform your variables, usually the dependent variable.

35 Heteroscedasticity
The linear regression model Yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₃ᵢ + εᵢ assumes that εᵢ ~ N(0, σ²), i.e., that all εᵢ have the same standard deviation. This property is called homoscedasticity, and its violation is called heteroscedasticity. Plot the residuals against the independent variables or against the fitted values ŷᵢ and check that there is no pattern. If there is a pattern, you need to transform your dependent variable.

36 Heteroscedasticity
[Figure: residuals against advertising_expenditures]

37 Heteroscedasticity

38 Autocorrelation
The linear regression model Yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₃ᵢ + εᵢ assumes that εᵢ ~ N(0, σ²), with the εᵢ independent. Autocorrelation can occur when this independence assumption is violated, for example when the data have a time component (say, data for the last 14 weeks). Plot the residuals in the time order of the observations and see whether there is any kind of pattern. If there is, incorporate time as one of the independent variables.
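Incorporating time as an independent variable means adding an observation-number column (1, 2, …, n) to the regression. The NumPy sketch below uses 14 weeks of hypothetical data with a built-in upward time trend. Without the time column, the residuals move with time. With it, they vanish. The names `add_time_index` and `ols` are illustrative.

```python
import numpy as np


def add_time_index(X):
    """Append a column t = 1, 2, ..., n (the time order of the observations)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    t = np.arange(1, len(X) + 1, dtype=float)
    return np.column_stack([X, t])


def ols(X, y):
    """Least-squares fit of y on an intercept plus the columns of X; returns (coef, residuals)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


# Hypothetical 14 weeks: advertising x and sales y with a time trend of 0.3 per week.
x = np.array([1, 3, 2, 4, 1, 3, 2, 4, 1, 3, 2, 4, 1, 3], dtype=float)
weeks = np.arange(1, 15, dtype=float)
y = 2.0 + 1.5 * x + 0.3 * weeks

_, resid_no_time = ols(x, y)
coef_with_time, resid_with_time = ols(add_time_index(x), y)
# Residuals without the time column track the week number (strong positive correlation).
trend_corr = np.corrcoef(resid_no_time, weeks)[0, 1]
```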

39 Autocorrelation

40 Autocorrelation
[Figure: residuals against observation_number]

41 Warnings and Issues
1. Overspecification by the addition of too many independent variables. Use only the independent variables that make sense. Adding variables can never decrease R², so more variables may look better, but the simpler your model, the better. A rule of thumb is n ≥ 5(k + 2). Use stepwise multiple regression: start from the null model and, at each step, add the best remaining variable, until R² is sufficiently large or its increase becomes too small.
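The stepwise procedure can be sketched in NumPy as forward selection by R². It starts from the null model, adds at each step the variable that raises R² the most, and stops when the gain is too small. The stopping threshold `min_gain = 0.01` and the simulated data are hypothetical. Statistical packages implement stepwise regression with their own entry and exit rules, which are often significance-based. The sketch also encodes the rule of thumb n ≥ 5(k + 2).

```python
import numpy as np


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def forward_stepwise(X, y, names, min_gain=0.01):
    """Start from the null model; add the best variable while R^2 rises by >= min_gain."""
    chosen, remaining, current_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = {j: r_squared([X[:, i] for i in chosen + [j]], y) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - current_r2 < min_gain:
            break  # the increase in R^2 is too small
        chosen.append(best)
        remaining.remove(best)
        current_r2 = scores[best]
    return [names[j] for j in chosen], current_r2


def max_predictors(n):
    """Largest k allowed by the rule of thumb n >= 5(k + 2)."""
    return max(n // 5 - 2, 0)


# Simulated (hypothetical) data: y depends strongly on a, weakly on b, not on c.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=40)

selected, final_r2 = forward_stepwise(X, y, names=["a", "b", "c"])
print(selected, max_predictors(15))
```

For n = 15 observations, the rule of thumb gives k ≤ 1.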

42 Warnings and Issues
2. Extrapolating beyond the range of the data.
Ŷ = … + … x₁ + … x₂ + … x₃
Notice that all of the advertising expenditures (x₁) for the regions in the data table lie between $0.4 million and $1.9 million. The regression model is valid only within this range. It would therefore be unwise to use it to predict sales if we were to spend $10 million on advertising.

43 Warnings and Issues
3. Multicollinearity: two independent variables are highly correlated. Suspect it if R² is high but one or more of the variables does not pass the significance test. Check all pairwise correlations before running the regression. If multicollinearity occurs, drop one of the two highly correlated independent variables.
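Checking all pairwise correlations before running the regression takes only a few lines of NumPy. The sketch below assumes a hypothetical cutoff of 0.8 for "highly correlated", since the slides do not specify one. The data are invented and loosely modeled on the GPA/GMAT example.

```python
import numpy as np


def highly_correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for every pair of columns with |r| >= threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # columns are variables
    pairs = []
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[0]):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Invented data for 6 students (not the 25-student table from the slides).
X = np.array([
    [2.8, 540, 27],
    [3.0, 560, 31],
    [3.2, 590, 25],
    [3.4, 610, 29],
    [3.6, 640, 26],
    [3.8, 660, 30],
])
names = ["undergrad_gpa", "gmat", "other_var"]
pairs = highly_correlated_pairs(X, names)
# Only (undergrad_gpa, gmat) is flagged, with r close to 1.
```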

44 Multicollinearity
Table: Undergraduate grade point average (GPA), GMAT score and graduate school GPA for 25 MBA students.
Columns: Student number | Undergraduate GPA | GMAT | Graduate school GPA

45 Multicollinearity
Graduate GPA = … + … (Undergraduate GPA) + … (GMAT), R² = …; Corr(Undergraduate GPA, GMAT) = …; not significant.
Graduate GPA = … + … (Undergraduate GPA), R² = …; significant.

46 Outliers
Outliers are observations that lie outside the overall pattern of the other observations: observations with large residuals, which fall far from the regression line and do not follow the pattern of the relationship apparent in the others.

47 Outliers

48 Outliers
Outliers can distort the regression results, so many scientists remove them to get a better fit. But be CAREFUL! Remove an outlier only if you are sure that it is a bad data point. Transforming the data is one way to soften the impact of outliers. The most commonly used transformations, square roots and logarithms, shrink large values much more than they shrink small values. Outliers should be investigated carefully. They often contain valuable information about the process under investigation or about the data gathering and recording process. Before considering removing these points from the data, try to understand why they appeared and whether similar values are likely to continue to appear. Of course, outliers are often bad data points.
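A first step in investigating outliers is to list the observations whose residuals are large relative to s, the estimated standard deviation of the errors. The sketch below uses the Python standard library for simple regression. It assumes a hypothetical cutoff of |eᵢ|/s > 2 and invented data with one deliberately corrupted point. Flagged points are candidates for investigation, not for automatic removal.

```python
import math


def flag_large_residuals(xs, ys, threshold=2.0):
    """Indices i whose residual satisfies |e_i| / s > threshold in a simple regression.

    s = sqrt(SSE / (n - 2)) estimates the error standard deviation sigma.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e) / s > threshold]


# Invented data on the line y = 2x, except observation 5 (x = 6), which is off by +10.
xs = list(range(1, 11))
ys = [2 * x for x in xs]
ys[5] += 10
flagged = flag_large_residuals(xs, ys)  # [5]
```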

49 Other Types of Regression
Nonlinear (e.g., add a quadratic term)

50 Other Types of Regression
Logistic regression: the dependent variable Y is binary (common in medical research).
Poisson regression: the dependent variable Y is a count.

51 Dummy Variables
We would like to use linear regression to estimate the effect that a particular phenomenon has on the value of the dependent variable, where the phenomenon in question either takes place or does not.

52 Dummy Variables
Table: Annual repair costs for 19 vehicles at an automobile dealership.
Columns: Vehicle | Age of vehicle (years) | Automatic transmission (Yes = 1, No = 0) | Annual repair costs ($)

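53 Dummy Variables
$$\text{Repair Cost} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
where x₁ is the age of the vehicle and x₂ is a dummy variable (x₂ = 1 or 0 depending on whether or not the vehicle has an automatic transmission). R² = …
[Output table: Coeff. and St.Err. for Intercept, Age and Automatic]
Fitted model: Repair Cost = … + … x₁ + … x₂, where the coefficient of x₂ is the estimate of the additional annual repair cost if the vehicle has an automatic transmission.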
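A dummy variable needs no special treatment in the fit. The 0/1 column enters the regression like any other independent variable, and its coefficient estimates the shift in the dependent variable when the phenomenon is present. The NumPy sketch below uses invented, noise-free repair-cost data (not the 19-vehicle table) generated with a hypothetical $200 automatic-transmission effect.

```python
import numpy as np

# Hypothetical vehicles: age in years and automatic transmission (1 = yes, 0 = no).
age = np.array([1, 2, 3, 4, 5, 6, 2, 4], dtype=float)
automatic = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
# Invented annual repair costs ($): 100 + 50 * age + 200 * automatic, with no noise.
repair_cost = 100.0 + 50.0 * age + 200.0 * automatic

design = np.column_stack([np.ones(len(age)), age, automatic])
coef, *_ = np.linalg.lstsq(design, repair_cost, rcond=None)
intercept, per_year, automatic_effect = coef
# automatic_effect estimates the additional annual repair cost of an automatic transmission.
```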


More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

ECON 497 Final Exam Page 1 of 12

ECON 497 Final Exam Page 1 of 12 ECON 497 Final Exam Page of 2 ECON 497: Economic Research and Forecasting Name: Spring 2008 Bellas Final Exam Return this exam to me by 4:00 on Wednesday, April 23. It may be e-mailed to me. It may be

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

ECON Introductory Econometrics. Lecture 17: Experiments

ECON Introductory Econometrics. Lecture 17: Experiments ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information
