L6: Regression II. JJ Chen. July 2, 2015


2 Today's Plan: Review basic inference based on the sample average and the difference in sample averages. Extend that knowledge to sample regression coefficients: standard errors and robust standard errors, hypothesis testing, and confidence intervals.

3 Population Parameters

4 Toy Population Data: Some population data on people's college type and their later earnings. $Y_i$: earnings; $P_i$: dummy variable for attending a private college. Suppose there are 1,000,000 people in the population. With this many observations, measures such as the expected value, variance, and standard deviation are useful for summarizing the data.

5 Data Source: local data frame [1,000,000 x 2] with columns P and Y

6 Distribution: Count

7 Distribution: Density

8 E, Var, SD
E(Y) | Var(Y) | SD(Y) | n(y)
78,… | …,266,203 | 19,500 | 1,000,000
$E(Y)$, $Var(Y)$, and $SD(Y)$ are population parameters. Their Greek names are $\mu_Y$, $\sigma^2_Y$, and $\sigma_Y$.

9 E, Var, SD
E(Y) | Var(Y) | SD(Y) | n(y)
78,… | …,759,816 | 19,462 | 1,000,000
Expected value measures central tendency. Variance measures the dispersion of the data. Standard deviation is the square root of variance and has the same unit as $Y_i$. Expectation and standard deviation are also useful for normalization.
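A minimal base-R sketch of these summary measures, assuming a hypothetical simulated population (the group share, group means, and group SDs only loosely mimic the slides; they are not the lecture's data). It also shows that the population variance divides by N, whereas R's `var()` and `sd()` use the sample divisor N - 1.

```r
# Hypothetical toy population: P ~ Bernoulli(0.43); earnings normal within each group.
# These numbers only loosely mimic the slides; they are not the lecture's data.
set.seed(1)
N <- 100000                                   # the slides use 1,000,000; smaller here for speed
P <- rbinom(N, size = 1, prob = 0.43)
Y <- ifelse(P == 1, rnorm(N, 90000, 15000), rnorm(N, 70000, 18000))

pop_mean <- mean(Y)                           # E(Y) = mu_Y
pop_var  <- mean((Y - pop_mean)^2)            # Var(Y) = sigma^2_Y, divisor N (population formula)
pop_sd   <- sqrt(pop_var)                     # SD(Y) = sigma_Y, same unit as Y

# var() and sd() in R divide by N - 1 (the sample formulas)
round(c(mean = pop_mean, var = pop_var, sd = pop_sd, var_r = var(Y)))
```

With N this large the two variance formulas are nearly identical, but they are not the same quantity.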

10 Normalization I: To review the idea of normalization (something you already know from stat class), suppose we also have data on another population, maybe a country that has severe inflation:
E(Y2) | Var(Y2) | SD(Y2) | N(Y2)
7,870,027 | 14,751,343,877,384 | 3,840,… | …,000

11 Normalization II: Suppose we want to compare two persons' earnings, but they are from two different populations: person A from population 1 earns 48,527 dollars, and person Z from population 2 earns #4,558,819. A natural thing to do is to normalize their earnings so that they are comparable. There are many ways of normalizing, usually by finding a fixed unit, for example the amount of dollars person Z can buy, or the number of coffee beans each person can buy.

12 Normalization III: Expectation and standard deviation can be used to normalize. Step 1: find the deviation from the mean. Step 2: rescale the deviation by the standard deviation. For example:
Person A: $\frac{48{,}527 - E[Y_i]}{SD(Y_i)} = \frac{48{,}527 - 78{,}541}{19{,}456} \approx -1.5$
Person Z: $\frac{4{,}558{,}819 - E[Y2_i]}{SD(Y2_i)} = \frac{4{,}558{,}819 - 7{,}870{,}027}{\sqrt{14{,}751{,}343{,}877{,}384}} \approx -0.86$
These scores are normalized deviations from the means of the two populations.

13 Normalization IV: Suppose we randomly pick two persons from the first population: person A earns 48,527 dollars and person B earns 95,301 dollars. We can also use normalization to say something about the relative positions of their earnings:
Person A: $\frac{48{,}527 - E[Y_i]}{SD(Y_i)} = \frac{48{,}527 - 78{,}541}{19{,}456} \approx -1.5$
Person B: $\frac{95{,}301 - E[Y_i]}{SD(Y_i)} = \frac{95{,}301 - 78{,}541}{19{,}456} \approx 0.86$
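The normalization arithmetic can be checked with a few lines of R. This sketch uses the numbers quoted on slides 10 to 13 and assumes a population-1 mean of 78,541 (the value that reproduces the slide's scores of about -1.5 and 0.86); the population-2 standard deviation is taken as the square root of its reported variance. `z_score` is an illustrative helper name, not from the lecture.

```r
# Normalized score: deviation from the mean, rescaled by the standard deviation
z_score <- function(y, mu, sigma) (y - mu) / sigma

mu1 <- 78541; sd1 <- 19456                    # population 1 (mean assumed, see lead-in)
mu2 <- 7870027; sd2 <- sqrt(14751343877384)   # population 2: SD from its variance

z_A <- z_score(48527, mu1, sd1)               # person A, population 1
z_B <- z_score(95301, mu1, sd1)               # person B, population 1
z_Z <- z_score(4558819, mu2, sd2)             # person Z, population 2
round(c(A = z_A, B = z_B, Z = z_Z), 2)
```

Relative to their own populations, person Z (roughly 0.86 SD below the mean) is better placed than person A (roughly 1.54 SD below), even though their raw earnings are in different currencies.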

14 Conditional Parameters: Back to the first population, we can focus on more interesting population expectations, the conditional expectations (CE): $E[Y_{1i}] = E[Y_i \mid P_i = 1]$ and $E[Y_{0i}] = E[Y_i \mid P_i = 0]$. Together they make up the conditional expectation function (CEF), $E[Y_i \mid P_i]$. For each group, we can also find its population variance (and standard deviation), the conditional variances (CV): $Var(Y_{1i})$ and $Var(Y_{0i})$. Sometimes the CVs are the same, but most of the time they are not.
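A sketch of the conditional expectations and conditional variances, again on a hypothetical simulated population (not the lecture's `college` data). `tapply` splits `Y` by the dummy `P`; `pop_var` is an illustrative helper that uses the population divisor.

```r
set.seed(2)
N <- 100000                                    # hypothetical population size
P <- rbinom(N, size = 1, prob = 0.43)
Y <- ifelse(P == 1, rnorm(N, 90000, 15000), rnorm(N, 70000, 18000))

pop_var <- function(y) mean((y - mean(y))^2)   # population variance, divisor N_p

ce <- tapply(Y, P, mean)                       # E[Y | P = 0], E[Y | P = 1]: the CEF
cv <- tapply(Y, P, pop_var)                    # Var(Y | P = 0), Var(Y | P = 1)
round(rbind(CE = ce, CV = cv, CSD = sqrt(cv)))
```

In this simulated population the two conditional variances differ by construction, matching the slide's remark that CVs are usually not equal.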

15 Conditional Distribution

16 Pop Scatter Plot

17 Pop Scatter Plot: Jitter Version

18 Pop Conditional Expectation

19 Pop Conditional Expectation Func

20 Population Regression
Call: lm(formula = Y ~ P, data = college)
Coefficients: (Intercept), P

21 Population Regression: Plot

22 Pop Expectation
P | Cond. E(Y) | Std. Dev. | Total Observations
0 | 69,978 | 17,… | …
1 | …,015 | 14,… | …,967

23 Population Parameters: To summarize, we have the following fixed population parameters:
$E[Y_i] = 78{,}614$; $SD(Y_i) = 19{,}466$
$E[Y_i \mid P_i = 0] = \mu_0 = 69{,}949$
$E[Y_i \mid P_i = 1] = \mu_1 = 89{,}926$
$E[Y_i \mid P_i = 1] - E[Y_i \mid P_i = 0] = \Delta\mu = 19{,}977$
$SD(Y_{0i}) = \sigma_{Y_0} = 17{,}981$; $SD(Y_{1i}) = \sigma_{Y_1} = 14{,}976$
Reg. intercept: $\alpha = 69{,}949$; Reg. slope: $\beta = 19{,}977$
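With a single dummy regressor, the population regression intercept equals $\mu_0$ and the slope equals $\Delta\mu$, which is why $\alpha$ and $\beta$ match $\mu_0$ and $\Delta\mu$ in the list above. A quick base-R check on a hypothetical simulated population (not the lecture's data):

```r
set.seed(3)
N <- 100000                                    # hypothetical population size
P <- rbinom(N, size = 1, prob = 0.43)
Y <- ifelse(P == 1, rnorm(N, 90000, 15000), rnorm(N, 70000, 18000))

mu0 <- mean(Y[P == 0])                         # E[Y | P = 0]
mu1 <- mean(Y[P == 1])                         # E[Y | P = 1]
fit <- lm(Y ~ P)                               # population regression of Y on the dummy P

# Intercept alpha = mu0 and slope beta = mu1 - mu0 (= Delta mu)
rbind(regression = coef(fit), group_means = c(mu0, mu1 - mu0))
```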

24 Unknown Population: Of course, we don't have population data for many problems, so we really don't know the population parameters. Statisticians use a sample to make inferences about a population.

25 Inference: Sample Average

26 Sample 1: Suppose we get a 1% sample from the population.
Sample 1: Summary Statistics
Sample Average | Sample Variance | Sample Standard Deviation | Sample Size
78,… | …,454,177 | 19,505 | 10,000
Sample 1: Summary Statistics for Groups
P | Sample Average | Sample Variance | Sample Standard Deviation | Sample Size
0 | 69,… | …,719,278 | 17,909 | 5,697
1 | 90,132 | …,719,421 | 15,057 | 4,303

27 Sample 2: Suppose we are lucky and have another 1% sample from the population.
Sample 2: Summary Statistics
Sample Average | Sample Variance | Sample Standard Deviation | Sample Size
78,… | …,841,427 | 19,541 | 10,000
Sample 2: Summary Statistics for Groups
P | Sample Average | Sample Variance | Sample Standard Deviation | Sample Size
0 | 69,… | …,349,745 | 17,842 | 5,670
1 | 89,888 | …,887,797 | 15,063 | 4,330

28 Sample 3: Another 1% sample.
Sample 3: Summary Statistics
Sample Average | Sample Variance | Sample Standard Deviation | Sample Size
78,… | …,556,306 | 19,585 | 10,000
Sample 3: Summary Statistics for Groups
P | Sample Average | Sample Variance | Sample Standard Deviation | Sample Size
0 | 70,… | …,989,121 | 18,083 | 5,664
1 | 90,303 | …,058,638 | 15,201 | 4,336

29 Sample Average is Random: All three samples give different sample statistics. Since the sample average is random, it also has a mean and a standard error:
$E(\bar{Y}_1) = E(Y_{1i})$ and $SE(\bar{Y}_1) = \frac{\sigma_{Y_1}}{\sqrt{n}}$,
where the standard error comes from the square root of the sampling variance $Var(\bar{Y}_1)$.

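30 Review Sampling Variance
$$\begin{aligned}
V(\bar{Y}) &= V\Big(\frac{1}{n}\sum_{i=1}^{n} Y_i\Big) && \text{def. of } \bar{Y}\\
&= \frac{1}{n^2}\, V\Big(\sum_{i=1}^{n} Y_i\Big) && \text{since } V(aY) = a^2 V(Y)\\
&= \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) && \text{each } Y_i \text{ is independently drawn}\\
&= \frac{1}{n^2}\sum_{i=1}^{n} V(Y) && \text{each } Y_i \text{ is from the same distribution}
\end{aligned}$$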

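31 Review of Sampling Variance: Simplifying further,
$$\begin{aligned}
V(\bar{Y}) &= \frac{1}{n^2}\sum_{i=1}^{n} V(Y) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2_Y && \text{Greek name}\\
&= \frac{n\,\sigma^2_Y}{n^2} && \text{sum of } n \text{ identical quantities}\\
&= \frac{\sigma^2_Y}{n} && \text{cancel } n
\end{aligned}$$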
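The result $V(\bar{Y}) = \sigma^2_Y / n$ can be checked by simulation: draw many samples, compute each sample average, and compare the spread of those averages with the formula. A minimal sketch, assuming normally distributed draws for convenience (the derivation itself only needs independent draws from the same distribution); all numbers are hypothetical.

```r
set.seed(4)
sigma <- 15000                                  # population SD (hypothetical)
n     <- 400                                    # size of each sample
reps  <- 5000                                   # number of repeated samples

# Draw many samples of size n and keep each sample average
ybar <- replicate(reps, mean(rnorm(n, mean = 90000, sd = sigma)))

c(sim_var = var(ybar), formula_var = sigma^2 / n,
  sim_se  = sd(ybar),  formula_se  = sigma / sqrt(n))
```

The simulated variance of the averages should land close to $15{,}000^2/400 = 562{,}500$, i.e. a standard error near 750.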

32 The Problem: All three samples give different sample statistics, but they are all very close to the population parameter:
Sample 1: $\bar{Y}_1 = 90{,}132$; Sample 2: $\bar{Y}_1 = 89{,}888$; Sample 3: $\bar{Y}_1 = 90{,}303$; Population: $E[Y_{1i}] = 89{,}926$.
The difficulties, of course, are that (1) we don't know the population parameters, and (2) we don't have that many samples. The problem is: how do we draw inference based on just one sample?

33 Hypothesis Testing: Hypothesis testing is one way of drawing inference. We don't know the population parameters? Make a guess, say $\mu_1 = 90{,}000$. We only have one sample? Normalize based on (1) the assumed expectation and (2) the standard error of the sample average, and find the relative position; for sample 1:
$$\frac{\bar{Y}_1 - \mu_1}{SE(\bar{Y}_1)} = \frac{90{,}132 - 90{,}000}{SE(\bar{Y}_1)}$$

34 Scaled by Est. SE: Bad news: the standard error of $\bar{Y}_1$ also contains a population parameter we don't know, $SE(\bar{Y}_1) = \sigma_{Y_1}/\sqrt{n}$. Instead, we use the sample standard deviation $\hat{\sigma}_{Y_1}$ and get the estimated SE: $\widehat{SE}(\bar{Y}_1) = \hat{\sigma}_{Y_1}/\sqrt{n}$.

35 Rescaling Scores

36 The rescaling score based on sample 1 is
$$\frac{\bar{Y}_1 - \mu_1}{\widehat{SE}(\bar{Y}_1)} = \frac{90{,}132 - 90{,}000}{\hat{\sigma}_{Y_1}/\sqrt{n}} = \frac{90{,}132 - 90{,}000}{17{,}964/\sqrt{10{,}000}} \approx 0.73$$
That is, under the hypothesis that $\mu_1 = 90{,}000$, the sample average lies 0.73 estimated standard errors above the hypothesized mean.

37 Is it good? t value and t stat: The score is a particular value calculated from the rescaling formula
$$t = \frac{\bar{Y}_1 - \mu_1}{\widehat{SE}(\bar{Y}_1)}$$
We call the score a $t$ value, and the formula is used to construct a $t$ statistic.

38 t CLT: The $t$ stat has a very good property: IF $E[Y_i]$ is indeed equal to $\mu$, then as long as the sample is large enough, the quantity $t(\mu)$ has a sampling distribution that is very close to the bell-shaped standard normal distribution, regardless of how $Y_i$ is distributed.
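A small simulation sketch of this property, using a deliberately skewed Exponential(1) population (true mean 1) as an assumed example: if $t(\mu)$ is close to standard normal, about 5% of the simulated $|t|$ values should exceed 1.96.

```r
set.seed(5)
n    <- 200                              # sample size
reps <- 4000                             # number of simulated samples
mu   <- 1                                # true mean of an Exponential(rate = 1)

t_vals <- replicate(reps, {
  y <- rexp(n, rate = 1)                 # right-skewed, clearly non-normal data
  (mean(y) - mu) / (sd(y) / sqrt(n))     # t(mu) using the estimated SE
})

# Share of |t| beyond 1.96; close to 0.05 if t(mu) is approximately N(0, 1)
reject_rate <- mean(abs(t_vals) > 1.96)
reject_rate
```

With strongly skewed data the approximation is a little off in moderate samples, so expect a share near, not exactly at, 0.05.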

39 Hypothesis Testing Reasoning: Say sample 1 gives a sample average of 90,132. Make a guess about the population expectation, say 90,000. Normalize the sample average and get the value of $t$.

40 Hypothesis Testing Reasoning: If our guess is indeed true and the sample is large, the $t$ stat should follow a standard normal distribution, so a $t$ value of 0.73 is not bad (it is within the range [-2, +2]). That is, under the null hypothesis that our guess is true, the probability of a $t$ value at least this extreme is acceptable (two-sided p-value ≈ 0.47, not that small). Conclusion/Decision: the sample doesn't provide strong evidence to reject the null; we choose to live with our guess unless we find new evidence later.
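Under the standard normal approximation, the p-value for a $t$ value of 0.73 can be computed with `pnorm`. This sketch shows the two-sided and one-sided tail probabilities next to the CDF value $\Phi(0.73)$:

```r
t_value <- 0.73
p_two_sided <- 2 * pnorm(-abs(t_value))  # P(|Z| >= 0.73) if the null is true
p_one_sided <- pnorm(-t_value)           # P(Z >= 0.73)
cdf_value   <- pnorm(t_value)            # P(Z <= 0.73), not a p-value
round(c(two_sided = p_two_sided, one_sided = p_one_sided, cdf = cdf_value), 3)
# two_sided 0.465, one_sided 0.233, cdf 0.767
```

Note that $\Phi(0.73) \approx 0.77$ is the probability of falling below the $t$ value, not a tail probability; the two-sided p-value is about 0.47.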

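41 Confidence Interval: Making a reasonable guess requires us to think; to save our energy for thinking up other ideas, why not just give a range of population expectations? The confidence interval is another way of drawing inference. When calculated in repeated samples, the 95% confidence interval contains the population expectation in approximately 95% of the samples. With $\widehat{SE}(\bar{Y}) \approx 179.6$, the first sample gives the CI
$$[\bar{Y} - 2\,\widehat{SE}(\bar{Y}),\ \bar{Y} + 2\,\widehat{SE}(\bar{Y})] = [90{,}132 - 2(179.6),\ 90{,}132 + 2(179.6)] \approx [89{,}773,\ 90{,}491]$$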
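The repeated-sampling interpretation can be illustrated by simulation. This sketch (a hypothetical normal population with mean 90,000 and SD 15,000, samples of size 100) builds $[\bar{Y} - 2\widehat{SE}, \bar{Y} + 2\widehat{SE}]$ in each sample and records how often it contains the true mean; the share should be close to 95%.

```r
set.seed(6)
mu    <- 90000                            # true population mean (hypothetical)
sigma <- 15000                            # population SD (hypothetical)
n     <- 100
reps  <- 2000

covered <- replicate(reps, {
  y  <- rnorm(n, mu, sigma)
  se <- sd(y) / sqrt(n)                   # estimated SE of the sample average
  ci <- mean(y) + c(-2, 2) * se           # [Ybar - 2 SE^, Ybar + 2 SE^]
  ci[1] <= mu && mu <= ci[2]              # does this interval contain mu?
})
coverage <- mean(covered)                 # share of intervals that contain mu
coverage
```

Using 2 rather than 1.96 makes the nominal coverage slightly above 95% in large samples.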

42 Inference: Difference in Sample Average

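43 HT and CI: We're also interested in the difference in population expectations, $\Delta\mu$. The difference in sample averages, $\Delta\bar{Y} = \bar{Y}_1 - \bar{Y}_0$, is a naive comparison. Making inference based on the difference in sample averages is almost the same as before.
Hypothesis testing: the normalization score is based on the difference in sample averages, $\Delta\bar{Y}$, and its standard error, $SE(\Delta\bar{Y})$.
Confidence interval: $[\Delta\bar{Y} - 2\,\widehat{SE}(\Delta\bar{Y}),\ \Delta\bar{Y} + 2\,\widehat{SE}(\Delta\bar{Y})]$.
The only complication is the formula for the estimated standard error, $\widehat{SE}(\Delta\bar{Y})$.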

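44 Complication: To see the estimated standard error formula, recall that it's based on a standard error formula, which is in turn based on the sampling variance of $\Delta\bar{Y} = \bar{Y}_1 - \bar{Y}_0$:
$$\begin{aligned}
V(\Delta\bar{Y}) &= V(\bar{Y}_1 - \bar{Y}_0) && \text{def. of } \Delta\bar{Y}\\
&= V(\bar{Y}_1) + V(\bar{Y}_0) && \text{each } Y_i \text{ is independently drawn}\\
&= \frac{V_1(Y_i)}{n_1} + \frac{V_0(Y_i)}{n_0} && \text{def. of sampling variance}
\end{aligned}$$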

45 Complication:
$$V(\Delta\bar{Y}) = \frac{V_1(Y_i)}{n_1} + \frac{V_0(Y_i)}{n_0}$$
The complication is whether you want to make the assumption that $V_1(Y_i) = V_0(Y_i)$, that is, whether earnings for group 1 and group 0 have the same population variance (and hence the same standard deviation) of the underlying variable $Y_i$.

46 Pop Box Plot

47 Pop Conditional Distribution

48 Unequal Var.: In our particular example, the population group variances are different, so we have the sampling variance, standard error, and estimated standard error for unequal group variances:
$$V(\Delta\bar{Y}) = \frac{V_1(Y_i)}{n_1} + \frac{V_0(Y_i)}{n_0}, \qquad SE(\Delta\bar{Y}) = \sqrt{\frac{V_1(Y_i)}{n_1} + \frac{V_0(Y_i)}{n_0}}, \qquad \widehat{SE}(\Delta\bar{Y}) = \sqrt{\frac{\hat{V}_1(Y_i)}{n_1} + \frac{\hat{V}_0(Y_i)}{n_0}}$$

49 Equal Var.: If you're willing to assume equal group variances, $V_1(Y_i) = V_0(Y_i) = V(Y_i)$, the three formulas simplify to
$$V(\Delta\bar{Y}) = V(Y_i)\Big[\frac{1}{n_1} + \frac{1}{n_0}\Big], \qquad SE(\Delta\bar{Y}) = SD(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}, \qquad \widehat{SE}(\Delta\bar{Y}) = \widehat{SD}(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}$$
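A sketch of both estimated standard errors for a difference in sample averages, on hypothetical simulated groups whose sizes and spreads loosely resemble sample 1. The equal-variance version uses the usual pooled standard deviation as $\widehat{SD}(Y_i)$, which is an assumption about how that quantity is estimated.

```r
set.seed(7)
n1 <- 4300; n0 <- 5700
y1 <- rnorm(n1, 90000, 15000)   # hypothetical group P = 1
y0 <- rnorm(n0, 70000, 18000)   # hypothetical group P = 0

diff_means <- mean(y1) - mean(y0)
# Unequal group variances: SE^ = sqrt(V1^/n1 + V0^/n0)
se_unequal <- sqrt(var(y1) / n1 + var(y0) / n0)
# Equal group variances: pooled SD times sqrt(1/n1 + 1/n0)
sd_pooled <- sqrt(((n1 - 1) * var(y1) + (n0 - 1) * var(y0)) / (n1 + n0 - 2))
se_equal  <- sd_pooled * sqrt(1 / n1 + 1 / n0)
c(diff = diff_means, se_unequal = se_unequal, se_equal = se_equal)
```

Here the larger group also has the larger variance, so the pooled SE comes out somewhat larger than the unequal-variance one. `t.test(y1, y0)` (Welch) and `t.test(y1, y0, var.equal = TRUE)` use these two standard errors respectively.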

50 What to Choose? In most cases, especially when data is not from an experimental setting, the assumption of equal variance would be too strong

51 Hypothesis Testing Reasoning: Assuming group variances are not equal, we can now perform hypothesis testing just as we did for the sample average. We have a difference in sample averages, $\Delta\bar{Y}$, but we want to know the difference in population expectations, $\Delta\mu = E[Y_{1i}] - E[Y_{0i}] = E[Y_i \mid P_i = 1] - E[Y_i \mid P_i = 0]$. We don't know population parameters, so we make a guess, say $\Delta\mu$ = a number. Normalize the difference in sample averages and get a $t$ value. Given the guess, see whether the $t$ value is so extreme that it's unacceptable.

52 Confidence Interval: Again, the confidence interval is just the similar bound: $[\Delta\bar{Y} - 2\,\widehat{SE}(\Delta\bar{Y}),\ \Delta\bar{Y} + 2\,\widehat{SE}(\Delta\bar{Y})]$.
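The testing and interval recipe for $\Delta\mu$, sketched in R with a hypothetical guess $\Delta\mu = 20{,}000$ and simulated groups; it uses the unequal-variance estimated SE and the large-sample normal approximation for the p-value.

```r
set.seed(8)
y1 <- rnorm(4300, 90000, 15000)           # hypothetical group P = 1
y0 <- rnorm(5700, 70000, 18000)           # hypothetical group P = 0

d_hat  <- mean(y1) - mean(y0)             # difference in sample averages
se_hat <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))  # unequal variances

delta0  <- 20000                          # our guess for Delta mu (null hypothesis)
t_value <- (d_hat - delta0) / se_hat      # normalized difference
p_value <- 2 * pnorm(-abs(t_value))       # large-sample two-sided p-value
ci      <- d_hat + c(-2, 2) * se_hat      # [d - 2 SE^, d + 2 SE^]
c(t = t_value, p = p_value, lower = ci[1], upper = ci[2])
```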

53 Test Sample Means by Group
P | mean(y) | var(y) | sd(y) | n(y)
0 | 69,… | …,719,278 | 17,909 | 5,697
1 | 90,132 | …,719,421 | 15,057 | 4,303
Sample Mean Comparison: Difference in Sample Mean: …; Est. Std. Err.: (323.5)

54 Inference: Sample Regression Coefficient

55 Bivariate Sample Regression: Consider again a bivariate regression with a dummy variable, but this time the regression is estimated from a sample:
Call: lm(formula = Y ~ P, data = sam1)
Coefficients: (Intercept), P

56 Estimated Regression: The estimated regression is $\hat{Y}_i = \hat{\alpha} + \hat{\beta} P_i = \ldots + \ldots P_i$. How would you interpret the estimated slope?

57 The Logic Carries Over: The estimated slope gives the difference in sample averages. So if we're worried about the sampling uncertainty associated with the estimated slope, we can again use an estimated standard error for hypothesis testing and confidence intervals.
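A sketch checking, on a hypothetical simulated sample (`sam` stands in for the lecture's `sam1`), that the estimated slope equals the difference in group averages and that `lm`'s default standard error is the equal-variance (pooled) SE of that difference.

```r
set.seed(9)
n <- 10000
P <- rbinom(n, 1, 0.43)
Y <- ifelse(P == 1, rnorm(n, 90000, 15000), rnorm(n, 70000, 18000))
sam <- data.frame(Y = Y, P = P)   # hypothetical stand-in for a 1% sample

fit   <- lm(Y ~ P, data = sam)
slope <- unname(coef(fit)["P"])
se_lm <- summary(fit)$coefficients["P", "Std. Error"]   # default (homoskedastic) SE

y1 <- sam$Y[sam$P == 1]; y0 <- sam$Y[sam$P == 0]
n1 <- length(y1); n0 <- length(y0)
sd_pooled <- sqrt(((n1 - 1) * var(y1) + (n0 - 1) * var(y0)) / (n - 2))
c(slope = slope, diff_means = mean(y1) - mean(y0),
  se_lm = se_lm, se_pooled = sd_pooled * sqrt(1 / n1 + 1 / n0))
```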

58 Displaying Reg. Results: Software always gives estimated coefficients and estimated standard errors:
term | estimate | std.error | statistic | p.value
(Intercept) | … | … | … | …
P | … | … | … | …
It's customary to display both the estimated regression coefficients and their estimated standard errors.

59 Displaying an Equation:
$$\hat{Y}_i = \underset{(221.8)}{\ldots} + \underset{(338.13)}{\ldots}\,P_i$$
Again, the estimated standard error tells us how precisely a coefficient is estimated.

60 Display a Table: A more common way to display regression results is a regression table:
A Regression Table
Variable | Y
P | … (332.5)
Intercept | … (217.9)
Note: Estimated standard errors are in parentheses.

61 Complication: In a bivariate regression with a dummy variable, the estimated slope gives the difference in sample averages. Just as with the standard errors of a difference in sample averages, there are two basic ways of computing standard errors for estimated regression coefficients: assume equal group variances (homoskedasticity), or assume unequal group variances (heteroskedasticity).

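62 Homoskedasticity: Homoskedasticity is an old-fashioned assumption; throughout the world, perhaps only undergrads taking introductory stats/metrics are still using it. Homoskedasticity gives a simple formula for the standard error and the estimated standard error:
$$SE(\hat{\beta}) = \frac{1}{\sqrt{n}}\,\frac{\sigma_e}{\sigma_X} \qquad\text{and}\qquad \widehat{SE}(\hat{\beta}) = \frac{1}{\sqrt{n}}\,\frac{\hat{\sigma}_e}{\hat{\sigma}_X}$$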
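The homoskedasticity formula can be checked against `lm`'s default standard error. This sketch assumes $\hat{\sigma}_e$ is the residual standard deviation with divisor $n - 2$ (what `sigma()` returns for an `lm` fit) and that $\hat{\sigma}_X$ uses divisor $n$; with those conventions the two agree exactly. Data are simulated and hypothetical.

```r
set.seed(10)
n <- 10000
P <- rbinom(n, 1, 0.43)
Y <- ifelse(P == 1, rnorm(n, 90000, 15000), rnorm(n, 70000, 18000))
fit <- lm(Y ~ P)

sigma_e_hat <- sigma(fit)                       # residual SD, divisor n - 2
sigma_x_hat <- sqrt(mean((P - mean(P))^2))      # SD of the regressor, divisor n
se_formula  <- sigma_e_hat / (sqrt(n) * sigma_x_hat)
se_lm       <- summary(fit)$coefficients["P", "Std. Error"]
c(se_formula = se_formula, se_lm = se_lm)
```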

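63 Heteroskedasticity: A more realistic assumption is heteroskedasticity, which gives other formulas, usually called (estimated) robust standard error formulas:
$$RSE(\hat{\beta}) = \sqrt{\frac{1}{n}\,\frac{V\big((X_i - \mu_X)\,e_i\big)}{(\sigma^2_X)^2}} \qquad\text{and}\qquad \widehat{RSE}(\hat{\beta}) = \sqrt{\frac{1}{n}\,\frac{\hat{V}\big((X_i - \bar{X})\,\hat{e}_i\big)}{(\hat{\sigma}^2_X)^2}}$$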
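A base-R sketch of the estimated robust standard error, written with the demeaned regressor $(X_i - \bar{X})\hat{e}_i$ and divisor-$n$ moments, and compared with the matrix sandwich form $(X'X)^{-1}X'\,\mathrm{diag}(\hat{e}^2_i)\,X(X'X)^{-1}$ (the HC0 variant). Software often adds small-sample adjustments; for example HC1 rescales the HC0 variance by $n/(n-k)$, which is what Stata's robust option uses. Data are hypothetical.

```r
set.seed(11)
n <- 10000
P <- rbinom(n, 1, 0.43)
Y <- ifelse(P == 1, rnorm(n, 90000, 15000), rnorm(n, 70000, 18000))
fit <- lm(Y ~ P)
e <- resid(fit)

# Slide-style formula with divisor-n moments; mean((P - mean(P)) * e) is 0 by OLS,
# so the variance of (X - Xbar) e equals the mean of its square
x_dev <- P - mean(P)
rse_formula <- sqrt(mean((x_dev * e)^2) / mean(x_dev^2)^2 / n)

# Same quantity from the matrix sandwich (HC0)
X <- model.matrix(fit)
bread <- solve(crossprod(X))          # (X'X)^-1
meat  <- crossprod(X * e)             # X' diag(e^2) X: each row of X scaled by its residual
V_hc0 <- bread %*% meat %*% bread
c(rse_formula = rse_formula, rse_sandwich = sqrt(V_hc0[2, 2]))
```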

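64 HT and CI: Hypothesis testing and confidence intervals are based on a similar recipe:
$$t(\beta) = \frac{\hat{\beta} - \beta}{\widehat{RSE}(\hat{\beta})}, \qquad \big[\hat{\beta} - 2\,\widehat{RSE}(\hat{\beta}),\ \hat{\beta} + 2\,\widehat{RSE}(\hat{\beta})\big]$$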
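The same recipe in R, using a robust SE computed by hand with the HC2 variant, which reweights each squared residual by $1/(1 - h_{ii})$, where $h_{ii}$ is the leverage. With a single dummy regressor, HC2 reproduces the unequal-variance standard error of the difference in group averages. The data and the null value $\beta = 0$ are hypothetical.

```r
set.seed(12)
n <- 10000
P <- rbinom(n, 1, 0.43)
Y <- ifelse(P == 1, rnorm(n, 90000, 15000), rnorm(n, 70000, 18000))
fit <- lm(Y ~ P)

# HC2 robust variance: residuals reweighted by 1 / sqrt(1 - leverage)
X <- model.matrix(fit)
w <- resid(fit) / sqrt(1 - hatvalues(fit))
bread <- solve(crossprod(X))
V_hc2 <- bread %*% crossprod(X * w) %*% bread
rse   <- sqrt(V_hc2[2, 2])

beta_hat <- unname(coef(fit)["P"])
beta0    <- 0                                   # null hypothesis: no difference
t_value  <- (beta_hat - beta0) / rse
ci       <- beta_hat + c(-2, 2) * rse           # [beta^ - 2 RSE^, beta^ + 2 RSE^]

# With a single dummy regressor, HC2 equals the unequal-variance SE
se_welch <- sqrt(var(Y[P == 1]) / sum(P == 1) + var(Y[P == 0]) / sum(P == 0))
c(rse_hc2 = rse, se_welch = se_welch, t = t_value)
ci
```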

65 Demo: Many software packages still use homoskedasticity standard errors as the default, so we usually need to put in a little extra effort to get the estimated standard errors we want. In Stata, this can be done by specifying the robust option: reg Y P, robust. In R, it's still a bit complicated.

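66 Demo: Homoskedasticity Std. Err.

```r
# sam1 is the 1% sample data frame (columns Y and P) used on the earlier slides
est.reg <- lm(Y ~ P, data = sam1)
est.reg   # prints Call: lm(formula = Y ~ P, data = sam1) and the coefficients (Intercept), P
```

67 Demo: Homoskedasticity Std. Err.

```r
library(broom)
tidy(est.reg)   # term, estimate, std.error, statistic, p.value (default, homoskedastic SEs)
```

68 Demo: Heteroskedasticity Std. Err.

```r
library(sandwich)
vcovHC(est.reg)               # robust variance-covariance matrix; default type is "HC3"
sqrt(diag(vcovHC(est.reg)))   # robust SEs; on the slide, sqrt(55143) ~ 235 and sqrt(104667) ~ 324
# vcovHC(est.reg, type = "HC1") matches Stata's "reg Y P, robust"
```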


Simple Regression Model. January 24, 2011 Simple Regression Model January 24, 2011 Outline Descriptive Analysis Causal Estimation Forecasting Regression Model We are actually going to derive the linear regression model in 3 very different ways

More information

At this point, if you ve done everything correctly, you should have data that looks something like:

At this point, if you ve done everything correctly, you should have data that looks something like: This homework is due on July 19 th. Economics 375: Introduction to Econometrics Homework #4 1. One tool to aid in understanding econometrics is the Monte Carlo experiment. A Monte Carlo experiment allows

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

1. Create a scatterplot of this data. 2. Find the correlation coefficient.

1. Create a scatterplot of this data. 2. Find the correlation coefficient. How Fast Foods Compare Company Entree Total Calories Fat (grams) McDonald s Big Mac 540 29 Filet o Fish 380 18 Burger King Whopper 670 40 Big Fish Sandwich 640 32 Wendy s Single Burger 470 21 1. Create

More information

Multiple Regression Analysis: Heteroskedasticity

Multiple Regression Analysis: Heteroskedasticity Multiple Regression Analysis: Heteroskedasticity y = β 0 + β 1 x 1 + β x +... β k x k + u Read chapter 8. EE45 -Chaiyuth Punyasavatsut 1 topics 8.1 Heteroskedasticity and OLS 8. Robust estimation 8.3 Testing

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist sales $ (y - dependent variable) advertising $ (x - independent variable)

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information

POL 681 Lecture Notes: Statistical Interactions

POL 681 Lecture Notes: Statistical Interactions POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent

More information

Practice exam questions

Practice exam questions Practice exam questions Nathaniel Higgins nhiggins@jhu.edu, nhiggins@ers.usda.gov 1. The following question is based on the model y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + u. Discuss the following two hypotheses.

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Bio 183 Statistics in Research. B. Cleaning up your data: getting rid of problems

Bio 183 Statistics in Research. B. Cleaning up your data: getting rid of problems Bio 183 Statistics in Research A. Research designs B. Cleaning up your data: getting rid of problems C. Basic descriptive statistics D. What test should you use? What is science?: Science is a way of knowing.(anon.?)

More information

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects Economics 113 Simple Regression Models Simple Regression Assumptions Simple Regression Derivation Changing Units of Measurement Nonlinear effects OLS and unbiased estimates Variance of the OLS estimates

More information

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation? Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation

More information

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression Outline 1. Hypothesis tests and confidence intervals for a single coefficie. Joint hypothesis tests on multiple coefficients 3.

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

FNCE 926 Empirical Methods in CF

FNCE 926 Empirical Methods in CF FNCE 926 Empirical Methods in CF Lecture 2 Linear Regression II Professor Todd Gormley Today's Agenda n Quick review n Finish discussion of linear regression q Hypothesis testing n n Standard errors Robustness,

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer. Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright c 2015 by Clifford A. Shaffer Computer Science Title page Computer Science Clifford A. Shaffer Fall 2015 Clifford A. Shaffer

More information

y n 1 ( x i x )( y y i n 1 i y 2

y n 1 ( x i x )( y y i n 1 i y 2 STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered

More information

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc. Chapter 8 Linear Regression Copyright 2010 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu: Copyright

More information

Lecture 30. DATA 8 Summer Regression Inference

Lecture 30. DATA 8 Summer Regression Inference DATA 8 Summer 2018 Lecture 30 Regression Inference Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu) Contributions by Fahad Kamran (fhdkmrn@berkeley.edu) and

More information

III. Inferential Tools

III. Inferential Tools III. Inferential Tools A. Introduction to Bat Echolocation Data (10.1.1) 1. Q: Do echolocating bats expend more enery than non-echolocating bats and birds, after accounting for mass? 2. Strategy: (i) Explore

More information

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already

More information

Comparing Means from Two-Sample

Comparing Means from Two-Sample Comparing Means from Two-Sample Kwonsang Lee University of Pennsylvania kwonlee@wharton.upenn.edu April 3, 2015 Kwonsang Lee STAT111 April 3, 2015 1 / 22 Inference from One-Sample We have two options to

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................

More information

Nonlinear Regression Functions

Nonlinear Regression Functions Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.

More information

STA 101 Final Review

STA 101 Final Review STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2 3pm today (Old Chem

More information

using the beginning of all regression models

using the beginning of all regression models Estimating using the beginning of all regression models 3 examples Note about shorthand Cavendish's 29 measurements of the earth's density Heights (inches) of 14 11 year-old males from Alberta study Half-life

More information

Interactions and Factorial ANOVA

Interactions and Factorial ANOVA Interactions and Factorial ANOVA STA442/2101 F 2017 See last slide for copyright information 1 Interactions Interaction between explanatory variables means It depends. Relationship between one explanatory

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Interactions and Factorial ANOVA

Interactions and Factorial ANOVA Interactions and Factorial ANOVA STA442/2101 F 2018 See last slide for copyright information 1 Interactions Interaction between explanatory variables means It depends. Relationship between one explanatory

More information

Varieties of Count Data

Varieties of Count Data CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function

More information