Introduction to Regression
1 Introduction to Regression. Interdepartmental Postgraduate Programme in Techno-Economic Systems. Dimitris Fouskakis
2 Introduction Basic idea: Use data to identify relationships among variables and use these relationships to make predictions. Regression analysis describes the relationship between two (or more) variables. Examples: Income and educational level. Demand for electricity and the weather. Home sales and interest rates.
3 Simple Example A linear model for hours worked: Hours Worked = a + b × per-capita GDP, where Hours Worked is the dependent variable (Y), GDP per capita is the independent variable (X), and a (the intercept, or baseline) and b (the slope) are the regression coefficients.
4 Simple Example The slope of this line gives: b = (change in Hours Worked) / (change in GDP per capita). If b > 0, hours worked increase with the level of income. If b < 0, the work week gets shorter as a country develops.
5 Simple Example We want to find coefficient values that give a good fit to the data. A plot of the data is called a scatter diagram; it describes the relationship between Hours Worked and GDP per capita for several countries.
6 Scatter Diagram: Hours Worked and GDP per Capita [scatter plot: Weekly Hours Worked vs. GDP per capita]
7 So Many Choices... [scatter plot of Weekly Hours Worked vs. GDP per capita with several candidate lines, e.g. Line C]
8 Simple Example The regression line is the line that best summarizes the data. More precisely, it is the line that minimizes the sum of squared vertical distances between the points in the scatter diagram and the corresponding points on the line. This method of estimating the regression line is called least squares.
9 Scatter Diagram and Regression Line [scatter plot of Weekly Hours Worked vs. GDP per capita with the fitted regression line]
10 Simple Example In our example the regression line is: Hours Worked = b0 + b1 × Per-capita GDP, with standard errors in parentheses (1.52, …). A $1,000 increase in GDP per capita reduces weekly hours worked by about a quarter of an hour. The standard errors are a measure of the statistical precision with which the coefficients are estimated.
11 Predicting Sales from Advertising Expenditures Table 1: Advertising expenditures ($ million, xi) and first-year sales ($ million, yi) of AppleGlo, by region.
12 Scatter Plot of First-Year Sales and Advertising Expenditures [scatter plot: First-Year Sales ($ million) vs. Advertising Expenditures ($ million)]
13 Notation n = 14 observations; Y = First-Year Sales ($ million); x = Advertising Expenditures ($ million). Try to fit a simple linear regression model: Y = β0 + β1 x + ε, where ε is noise, a Normally distributed random variable with mean 0 and standard deviation σ (which we estimate from the data).
14 Estimation If b0 and b1 are the estimates of the intercept β0 and the slope β1, their values are called the regression coefficients. We want to choose b0 and b1 in such a way that the line is the best fit to the data observations (xi, yi), i = 1, …, 14.
15 Estimation If we have trial values of b0 and b1, then the estimated or predicted values of the dependent variable are: ŷi = b0 + b1 xi
16 Estimation The differences between the observed values of the dependent variable and the predicted values are called residuals: ei = yi − ŷi = yi − b0 − b1 xi
17 Estimation Our goal is to select values of b0 and b1 that make the residuals as small as possible: min Σ ei² = min Σ (yi − ŷi)², with the sums over i = 1, …, n (the Least Squares Method). The solution is: b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)², and b0 = ȳ − b1 x̄.
18 Estimation [scatter plot: First-Year Sales ($ million) vs. Advertising Expenditures ($ million), with fitted values]
19 Computer Output Stata output for the regression of first-year sales on advertising expenditures: an ANOVA table (Model, Residual, Total SS with df and MS), the number of observations, the F(1, 12) statistic with Prob > F, R-squared and Adj R-squared (the coefficient of determination; the adjusted version attempts to take sampling into account), and Root MSE; below it, for adv_exp and _cons, the regression coefficients (Coef., giving the fitted line Y = b0 + b1 x), standard errors, t statistics, P-values (P>|t|), and 95% confidence intervals for the coefficients.
20 Coefficient of Determination R² is the proportion of the total variation of the observed values of the dependent variable Y that is accounted for by the regression equation of the independent variables. Always 0 ≤ R² ≤ 1; closer to 1 means that the points lie closer to the straight line. That is why the value of R² is frequently used to measure the extent to which the regression model fits the data. This is WRONG!!!!!!! There are other ways to determine whether a linear regression is valid or not.
21 Coefficient of Determination [two scatter plots with fitted lines, illustrating a high and a low value of R²]
22 Coefficient of Determination and Sample Correlation It can be proved that R² = [corr(x, y)]². Thus R² is the square of the sample correlation coefficient between the independent variable x and the dependent variable y. The estimate of the slope b1 in the simple linear regression model can be written as b1 = corr(x, y) · (S_Y / S_X), where S_X and S_Y are the sample standard deviations of x and y respectively.
23 Multiple Regression Simple Linear Regression is a model to predict the value of one variable from another. Multiple Regression is a natural extension of this model: We use it to predict values of an outcome from several predictors.
24 Predicting Sales of a Product Based on Multiple Factors Table: Sales of Nature-Bar (Y, $ million), advertising expenditures (X1, $ million), promotion expenditures (X2, $ million), and competitors' sales (X3, $ million), by region: Selkirk, Susquehanna, Kittery, Acton, Finger Lakes, Berkshire, Central, Providence, Nashua, Dunster, Endicott, Five-Towns, Waldeboro, Jackson, Stowe.
25 Predicting Sales of a Product Based on Multiple Factors Y: dependent variable, sales of Nature-Bar; k = 3 independent variables: x1 = advertising expenditures, x2 = promotional expenditures, x3 = competitors' sales; n = 15 observations.
26 Predicting Sales of a Product Based on Multiple Factors Yi = β0 + β1 x1i + β2 x2i + β3 x3i + εi, with i = 1, …, n, where the εi are independent Normally distributed random variables with mean 0 and standard deviation σ. β0: baseline, the value of Y when x1 = x2 = x3 = 0; β1, β2, β3 denote the change in Y per unit change in x1, x2, x3 respectively.
27 Predicting Sales of a Product Based on Multiple Factors Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε, where ε ~ N(0, σ²). E(Y | x1, x2, x3) = β0 + β1 x1 + β2 x2 + β3 x3 and Standard Deviation(Y | x1, x2, x3) = σ, which does not depend on x1, x2, x3.
28 Predicting Sales of a Product Based on Multiple Factors Let b0, b1, b2, b3 be the estimates of β0, β1, β2, β3. The predicted values then are: ŷi = b0 + b1 x1i + b2 x2i + b3 x3i. The residuals are: ei = yi − ŷi = yi − b0 − b1 x1i − b2 x2i − b3 x3i
29 Predicting Sales of a Product Based on Multiple Factors The residual sum of squares is: Σ ei² = Σ (yi − ŷi)² = Σ (yi − b0 − b1 x1i − b2 x2i − b3 x3i)², with the sums over i = 1, …, n. The best regression line is the one that chooses b0, b1, b2, b3 to minimize the above quantity.
30 Predicting Sales of a Product Based on Multiple Factors With our data we obtain a fitted equation Y = b0 + b1 x1 + b2 x2 + b3 x3. Based on this regression, suppose we want to predict sales of Nature-Bar for next year in the Nashua region, given that we plan to spend $0.7 million on advertising and $0.6 million on promotions, and we estimate that competitors' sales will remain flat at their current level of $31.30 million. Plugging x1 = 0.7, x2 = 0.6, x3 = 31.30 into the fitted equation gives the predicted sales in $ million.
31 Computer Output Stata output for the regression of sales on the three predictors: an ANOVA table (Model, Residual, Total), the number of observations, F(3, 11) with Prob > F, R-squared, Adj R-squared, and Root MSE, followed by Coef., Std. Err., t, P>|t| and 95% confidence intervals for advertising, promotion, competitors, and _cons. Example: t = Coef./Std. Err. = …/23.625 = 2.53; with β = 95 and df = n − k − 1 = 15 − 3 − 1 = 11, the t tables give c = 2.201. Let c be the number for which P(−c ≤ T ≤ c) = β/100, where T follows a t-distribution with (n − k − 1) df. If |t| > c, we are confident at the β% confidence level that the coefficient ≠ 0.
32 Validation Linearity; Normality of the εi; Heteroscedasticity; Autocorrelation.
33 Linearity The dependent variable Y depends linearly on the values of the independent variables x1, x2, x3. When k = 1, check this with a scatter plot; with k > 1, rely on common sense. Check the value of R², but, as discussed before, with caution. If there is a problem with linearity, you might need to add a quadratic term, for example, or transform both the dependent and independent variables.
34 Normality The linear regression model Yi = β0 + β1 x1i + β2 x2i + β3 x3i + εi assumes that εi ~ N(0, σ²). To check this, plot a histogram of the regression residuals ei = yi − ŷi = yi − b0 − b1 x1i − b2 x2i − b3 x3i. If there is evidence of non-normality, you might need to transform your variables, usually the dependent one.
35 Heteroscedasticity The linear regression model Yi = β0 + β1 x1i + β2 x2i + β3 x3i + εi assumes that εi ~ N(0, σ²), i.e. all εi have the same standard deviation. This property is called homoscedasticity. Plot residuals versus the independent variables or versus the fitted values ŷi and check that there is no pattern. If there is a pattern, you need to transform your dependent variable.
36 Heteroscedasticity [residual plot: residuals vs. advertising expenditures]
37 Heteroscedasticity
38 Autocorrelation The linear regression model Yi = β0 + β1 x1i + β2 x2i + β3 x3i + εi assumes that the εi ~ N(0, σ²) are independent. The phenomenon of autocorrelation can occur if the assumption of independence is violated, for example when the regression model is specified with a time component (data for the last 14 weeks). Plot the residuals in time order of the observations and see if there is any kind of pattern. If there is such a pattern, incorporate time as one of the independent variables.
39 Autocorrelation
40 Autocorrelation [residual plot: residuals vs. observation number]
41 Warnings and Issues 1. Overspecification by the addition of too many independent variables. Use only the independent variables that make sense. It is true that "the more the better" in the sense that R² cannot decrease when variables are added, but the simpler your model the better. A rule of thumb: n ≥ 5(k + 2). Use stepwise multiple regression (start from the null model and add the best variable at each step until R² is quite large, or its increase is too small).
42 Warnings and Issues 2. Extrapolating beyond the range of the data. Notice that all of the advertising expenditures (x1) for the regions in the data table are between $0.4 and $1.9 million. The regression model is valid in this range; thus it would be unwise to use the model to predict sales if we had spent $10 million on advertising.
43 Warnings and Issues 3. Multicollinearity: two independent variables are highly correlated. Suspect it if R² is high but one or more of the variables does not pass the significance test. Check all correlations before running the regression. If multicollinearity occurs, drop one of the independent variables that is highly correlated with another one.
44 Multicollinearity Table: Undergraduate grade point average (GPA), GMAT score, and graduate-school GPA for 25 MBA students.
45 Multicollinearity Regressing graduate GPA on both undergraduate GPA and GMAT gives a high R², but the GMAT coefficient is not significant, because Corr(undergraduate GPA, GMAT) is high. Regressing graduate GPA on undergraduate GPA alone gives a similar R² with a significant coefficient.
46 Outliers Observations that lie outside the overall pattern of the other observations: observations with large residuals, falling far from the regression line while not following the pattern of the relationship apparent in the others.
47 Outliers
48 Outliers Outliers can distort the regression results, so many scientists remove them to get a better fit. But be CAREFUL!!!!!!! Remove an outlier only if you are sure that it is a bad data point. Transforming the data is one way to soften the impact of outliers, since the most commonly used transformations, square roots and logarithms, shrink larger values to a much greater extent than they shrink smaller values. Outliers should be investigated carefully: often they contain valuable information about the process under investigation or the data gathering and recording process. Before considering the possible elimination of these points from the data, one should try to understand why they appeared and whether it is likely that similar values will continue to appear. Of course, outliers are often bad data points.
49 Other Types of Regression Non-linear (e.g. add a quadratic term)
50 Other Types of Regression Logistic Regression: the dependent variable Y is binary (common in medical research). Poisson Regression: the dependent variable Y is a count.
51 Dummy Variables We would like to use linear regression to predict the effect that a particular phenomenon has on the value of the dependent variable, where the phenomenon in question either takes place or not.
52 Dummy Variables Table: Annual repair costs ($) for 19 vehicles at an automobile dealership, with the age of each vehicle (years) and whether it has an automatic transmission (Yes = 1, No = 0).
53 Dummy Variables Repair Cost = β0 + β1 x1 + β2 x2 + ε, where ε ~ N(0, σ²) and x2 is a dummy variable (x2 = 1 or 0 depending on whether or not the vehicle has an automatic transmission). The fitted equation is Repair Cost = b0 + b1 x1 + b2 x2, and the estimate b2 gives the additional annual repair cost if you have an automatic transmission.
Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear relationship between: - one independent variable X and -
More informationHomework 2: Simple Linear Regression
STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA
More informationCorrelation Analysis
Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the
More informationChapter 15 Multiple Regression
Multiple Regression Learning Objectives 1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables. 2. Be able
More informationProblem 4.1. Problem 4.3
BOSTON COLLEGE Department of Economics EC 228 01 Econometric Methods Fall 2008, Prof. Baum, Ms. Phillips (tutor), Mr. Dmitriev (grader) Problem Set 3 Due at classtime, Thursday 14 Oct 2008 Problem 4.1
More informationIntroduction to Regression
Regression Introduction to Regression If two variables covary, we should be able to predict the value of one variable from another. Correlation only tells us how much two variables covary. In regression,
More informationECON Introductory Econometrics. Lecture 6: OLS with Multiple Regressors
ECON4150 - Introductory Econometrics Lecture 6: OLS with Multiple Regressors Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 6 Lecture outline 2 Violation of first Least Squares assumption
More informationBinary Dependent Variables
Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome
More informationregression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist
regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist sales $ (y - dependent variable) advertising $ (x - independent variable)
More information4.1 Least Squares Prediction 4.2 Measuring Goodness-of-Fit. 4.3 Modeling Issues. 4.4 Log-Linear Models
4.1 Least Squares Prediction 4. Measuring Goodness-of-Fit 4.3 Modeling Issues 4.4 Log-Linear Models y = β + β x + e 0 1 0 0 ( ) E y where e 0 is a random error. We assume that and E( e 0 ) = 0 var ( e
More informationSTAT 212 Business Statistics II 1
STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb
More informationDecision 411: Class 7
Decision 411: Class 7 Confidence limits for sums of coefficients Use of the time index as a regressor The difficulty of predicting the future Confidence intervals for sums of coefficients Sometimes the
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationChapter 9. Correlation and Regression
Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in
More informationSampling Distributions in Regression. Mini-Review: Inference for a Mean. For data (x 1, y 1 ),, (x n, y n ) generated with the SRM,
Department of Statistics The Wharton School University of Pennsylvania Statistics 61 Fall 3 Module 3 Inference about the SRM Mini-Review: Inference for a Mean An ideal setup for inference about a mean
More informationMultiple Regression. Peerapat Wongchaiwat, Ph.D.
Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model
More informationRegression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate
More informationLI EAR REGRESSIO A D CORRELATIO
CHAPTER 6 LI EAR REGRESSIO A D CORRELATIO Page Contents 6.1 Introduction 10 6. Curve Fitting 10 6.3 Fitting a Simple Linear Regression Line 103 6.4 Linear Correlation Analysis 107 6.5 Spearman s Rank Correlation
More informationPlease discuss each of the 3 problems on a separate sheet of paper, not just on a separate page!
Econometrics - Exam May 11, 2011 1 Exam Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page! Problem 1: (15 points) A researcher has data for the year 2000 from
More informationProblem Set 1 ANSWERS
Economics 20 Prof. Patricia M. Anderson Problem Set 1 ANSWERS Part I. Multiple Choice Problems 1. If X and Z are two random variables, then E[X-Z] is d. E[X] E[Z] This is just a simple application of one
More informationAt this point, if you ve done everything correctly, you should have data that looks something like:
This homework is due on July 19 th. Economics 375: Introduction to Econometrics Homework #4 1. One tool to aid in understanding econometrics is the Monte Carlo experiment. A Monte Carlo experiment allows
More informationCorrelation and Linear Regression
Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means
More informationUniversity of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points
EEP 118 / IAS 118 Elisabeth Sadoulet and Kelly Jones University of California at Berkeley Fall 2008 Introductory Applied Econometrics Final examination Scores add up to 125 points Your name: SID: 1 1.
More informationLecture 12: Interactions and Splines
Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies
More informationEssential of Simple regression
Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship
More informationLinear Regression with Multiple Regressors
Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution
More informationStatistics for Managers using Microsoft Excel 6 th Edition
Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of
More information