L6: Regression II. JJ Chen. July 2, 2015
1 L6: Regression II JJ Chen July 2, 2015
2 Today's Plan: Review basic inference based on (a) the sample average and (b) the difference in sample averages. Extrapolate that knowledge to sample regression coefficients: standard error, robust standard error, hypothesis testing, confidence interval.
3 Population Parameters
4 Toy Population Data: Some population data on people's college type and their later earnings. $Y_i$: earnings. $P_i$: dummy variable for attending a private college. Suppose there are 1,000,000 people in the population. With a lot of observations, measures such as the expected value, variance, and standard deviation are useful to summarize the data.
5 Data Source: local data frame [1,000,000 x 2] P Y
6 Distribution: Count
7 Distribution: Density
8 E, Var, SD E(Y) Var(Y) SD(Y) n(y) 78, ,266,203 19,500 1,000,000 $E(Y)$, $Var(Y)$, and $SD(Y)$ are population parameters. Their Greek names are $\mu_Y$, $\sigma^2_Y$, and $\sigma_Y$.
9 E, Var, SD E(Y) Var(Y) SD(Y) n(y) 78, ,759,816 19,462 1,000,000 Expected value measures central tendency. Variance measures dispersion of the data. Standard deviation is the square root of variance, so SD has the same unit as $Y_i$. Expectation and standard deviation are also useful for normalization.
10 Normalization I: To review the idea of normalization (something you already know from stat class), suppose we also have data on another population, maybe a country that has severe inflation. E(Y2) Var(Y2) SD(Y2) N(Y2) 7,870,027 14,751,343,877,384 3,840, ,000
11 Normalization II: If we want to compare two persons' earnings but they are from two different populations (person A from population 1: $48,527; person Z from population 2: #4,558,819), a natural thing to do is to normalize their earnings so that they are comparable. There are many ways to normalize, usually by finding a fixed unit, for example the amount of dollars person Z can buy, or the number of coffee beans both persons can buy.
12 Normalization III: Expectation and standard deviation can be used to do normalization. Step 1: find the deviation from the mean. Step 2: rescale the deviation using the std. dev. For example, Person A: $\frac{48{,}527 - E[Y_i]}{SD(Y_i)} = (48{,}527 - 78{,}541)/19{,}456 \approx -1.5$; Person Z: $\frac{4{,}558{,}819 - E[Y2_i]}{SD(Y2_i)}$. These scores are normalized deviations from the means of the two populations.
13 Normalization IV: Suppose we randomly pick two persons from the first population: Person A from population 1: \$48,527; Person B from population 1: \$95,301. We can also use normalization to say something about the relative position of their earnings. Person A: $\frac{48{,}527 - E[Y_i]}{SD(Y_i)} = (48{,}527 - 78{,}541)/19{,}456 = -1.5$. Person B: $\frac{95{,}301 - E[Y_i]}{SD(Y_i)} = (95{,}301 - 78{,}541)/19{,}456 = 0.86$.
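The two-step normalization can be sketched in a few lines of R. This is only an illustration: the helper name `normalize` is made up here, and the population mean (78,541) and standard deviation (19,456) are read off the slide's arithmetic, so treat them as assumptions.

```r
# Normalized score: deviation from the mean, rescaled by the standard deviation.
# `normalize` is a hypothetical helper, not something from the lecture.
normalize <- function(y, mean_y, sd_y) (y - mean_y) / sd_y

# Population 1 values, taken from the slide's arithmetic (assumed).
mean_y <- 78541
sd_y   <- 19456

z_a <- normalize(48527, mean_y, sd_y)  # person A
z_b <- normalize(95301, mean_y, sd_y)  # person B

round(c(A = z_a, B = z_b), 2)
```

Person A comes out at about -1.54 and person B at about 0.86, so A sits roughly one and a half standard deviations below the population mean while B sits a bit less than one standard deviation above it.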
14 Conditional Parameters: Back to the first population, we can focus on more interesting population expectations. CE: $E[Y^1_i] = E[Y_i \mid P_i = 1]$. CE: $E[Y^0_i] = E[Y_i \mid P_i = 0]$. Together they make a CEF: $E[Y_i \mid P_i]$. For each group, we can also find its population variance (and standard deviation). CV: $Var(Y^1_i)$, $Var(Y^0_i)$. Sometimes the CVs are the same, but most of the time they are not.
15 Conditional Distribution
16 Pop Scatter Plot
17 Pop Scatter Plot: Jitter Version
18 Pop Conditional Expectation
19 Pop Conditional Expectation Func
20 Population Regression Call: lm(formula = Y ~ P, data = college) Coefficients: (Intercept) P
21 Population Regression: Plot
22 Pop Expectation P Cond. E(Y) Std. Dev. Total Observations 0 69,978 17, , ,015 14, ,967
23 Population Parameters: To summarize, we have the following fixed population parameters: $E[Y_i] = 78{,}614$; $SD(Y_i) = 19{,}466$; $E[Y_i \mid P_i = 0] = \mu_0 = 69{,}949$; $E[Y_i \mid P_i = 1] = \mu_1 = 89{,}926$; $E[Y_i \mid P_i = 1] - E[Y_i \mid P_i = 0] = \Delta\mu = 19{,}977$; $SD(Y^0) = \sigma_{Y^0} = 17{,}981$; $SD(Y^1) = \sigma_{Y^1} = 14{,}976$; Reg. intercept: $\alpha = 69{,}949$; Reg. slope: $\beta = 19{,}977$.
24 Unknown Population: Of course, we don't have population data for many problems. Thus, we really don't know all the population parameters. Statisticians use a sample to make inferences about a population.
25 Inference: Sample Average
26 Sample 1 Suppose we get a 1% sample from the population Sample 1: Summary Statistics Sample Average Sample Variance Sample Standard Deviation Sample Size P 78, ,454,177 19,505 10,000 Sample Average Sample 1: Summary Statistics for Groups Sample Variance Sample Standard Deviation Sample Size 0 69, ,719,278 17,909 5, , ,719,421 15,057 4,303
27 Sample 2 Suppose we are lucky and have another 1% sample from the population Sample 2: Summary Statistics Sample Average Sample Variance Sample Standard Deviation Sample Size P 78, ,841,427 19,541 10,000 Sample Average Sample 2: Summary Statistics for Groups Sample Variance Sample Standard Deviation Sample Size 0 69, ,349,745 17,842 5, , ,887,797 15,063 4,330
28 Sample 3 Another 1% sample Sample 3: Summary Statistics Sample Average Sample Variance Sample Standard Deviation Sample Size P 78, ,556,306 19,585 10,000 Sample Average Sample 3: Summary Statistics for Groups Sample Variance Sample Standard Deviation Sample Size 0 70, ,989,121 18,083 5, , ,058,638 15,201 4,336
29 Sample Average is Random: All 3 samples give different sample statistics. Since the sample average is random, it also has a mean and a standard error: $E(\bar{Y}^1_i) = E(Y^1_i)$ and $SE(\bar{Y}^1_i) = \frac{\sigma_Y}{\sqrt{n}}$, the latter coming from the square root of the sampling variance $Var(\bar{Y}^1_i)$.
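30 Review Sampling Variance
$$\begin{aligned}
V(\bar{Y}) &= V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) && \text{def. of } \bar{Y} \\
&= \frac{1}{n^2}\, V\left(\sum_{i=1}^{n} Y_i\right) && \text{since } V(aY) = a^2 V(Y) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) && \text{each } Y_i \text{ is independently drawn} \\
&= \frac{1}{n^2}\sum_{i=1}^{n} V(Y) && \text{each } Y_i \text{ is from the same distribution}
\end{aligned}$$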
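31 Review of Sampling Variance: Simplifying further,
$$\begin{aligned}
V(\bar{Y}) &= \frac{1}{n^2}\sum_{i=1}^{n} V(Y) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2_Y && \text{Greek name} \\
&= \frac{n\,\sigma^2_Y}{n^2} && \text{sum of } n \text{ identical quantities} \\
&= \frac{\sigma^2_Y}{n} && \text{cancel } n
\end{aligned}$$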
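The result that the sampling variance of the sample average is $\sigma^2_Y / n$ can be checked with a small simulation. The sketch below uses made-up settings (a normal population with mean 10 and variance 4, samples of size 50); none of these numbers come from the lecture.

```r
set.seed(1)
n      <- 50       # sample size (illustrative)
reps   <- 20000    # number of repeated samples (illustrative)
sigma2 <- 4        # population variance of Y (illustrative)

# Draw `reps` independent samples of size n and keep each sample average.
ybar <- replicate(reps, mean(rnorm(n, mean = 10, sd = sqrt(sigma2))))

simulated   <- var(ybar)     # variance of the sample average across samples
theoretical <- sigma2 / n    # sigma_Y^2 / n from the derivation

c(simulated = simulated, theoretical = theoretical)
```

The two numbers should agree closely, and the gap shrinks further as `reps` grows.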
32 The Problem: All 3 samples give different sample statistics, but they are all very close to the population parameter. Sample 1: $\bar{Y}^1_i = 90{,}132$. Sample 2: $\bar{Y}^1_i = 89{,}888$. Sample 3: $\bar{Y}^1_i = 90{,}303$. Population: $E[Y^1_i] = 89{,}926$. The difficulties, of course, are that (1) we don't know the population parameters and (2) we don't have that many samples. The problem is: how do we draw inference based on just one sample?
33 Hypothesis Testing: Hypothesis testing is one way of drawing inference. We don't know the population parameters? Make a guess, say $\mu_1 = 90{,}000$. We only have one sample? Normalize the sample average using (1) the assumed expectation and (2) the standard error of the sample average, and find its relative position. For sample 1: $\frac{\bar{Y}^1_i - \mu_1}{SE(\bar{Y}^1_i)} = \frac{90{,}132 - 90{,}000}{SE(\bar{Y}^1_i)}$.
34 Scaled by Est. SE: Unfortunately, the standard error of $\bar{Y}^1_i$ also contains a population parameter we don't know, $\sigma_Y$: $SE(\bar{Y}^1_i) = \frac{\sigma_Y}{\sqrt{n}}$. We use the sample standard deviation $\hat{\sigma}_Y$ instead and get the estimated SE: $\text{Est. } SE(\bar{Y}^1_i) = \widehat{SE}(\bar{Y}^1_i) = \frac{\hat{\sigma}_Y}{\sqrt{n}}$.
35 Rescaling Scores
36 The rescaling score based on sample 1 is $\frac{\bar{Y}^1_i - \mu_1}{\widehat{SE}(\bar{Y}^1_i)} = \frac{90{,}132 - 90{,}000}{\widehat{SE}(\bar{Y}^1_i)} = \frac{90{,}132 - 90{,}000}{\hat{\sigma}_Y/\sqrt{n}} = \frac{90{,}132 - 90{,}000}{\ldots{,}964/\sqrt{10{,}000}} = 0.73$. That is, under the hypothesis that $\mu_1 = 90{,}000$, the sample average sits 0.73 standard errors above the hypothesized mean.
37 Is it good? t value and t stat: The score is a particular value calculated from the rescaling formula $t = \frac{\bar{Y}^1_i - \mu_1}{\widehat{SE}(\bar{Y}^1_i)}$. We call the score a t value, and the formula is used to construct a t statistic.
38 CLT: The t stat has a very good property: if $E[Y_i]$ is indeed equal to $\mu$, then as long as the sample is large enough, the quantity $t(\mu)$ has a sampling distribution that is very close to a bell-shaped standard normal distribution, regardless of how $Y_i$ is distributed.
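A quick simulation can make this property concrete. The sketch below draws samples from an exponential distribution, which is heavily skewed, and computes the t statistic under the true mean each time; the distribution, sample size, and number of replications are illustrative choices, not values from the lecture.

```r
set.seed(2)
mu   <- 1          # true mean of an exponential(rate = 1) variable
n    <- 500        # sample size (illustrative)
reps <- 5000       # number of repeated samples (illustrative)

t_stat <- replicate(reps, {
  y <- rexp(n, rate = 1)                 # heavily skewed, not normal
  (mean(y) - mu) / (sd(y) / sqrt(n))     # t(mu) evaluated at the true mean
})

# Under a standard normal, roughly 95% of draws fall within [-2, +2].
share_within_2 <- mean(abs(t_stat) <= 2)
c(mean = mean(t_stat), sd = sd(t_stat), share_within_2 = share_within_2)
```

If the approximation works, the standard deviation of the simulated t values is close to 1 and about 95% of them land inside [-2, +2], even though the underlying data look nothing like a bell curve.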
39 Hypothesis Testing Reasoning: Say sample 1 gives a sample average of 90,132. Make a guess about the population expectation, say, 90,000. Normalize the sample average and get the t value.
40 Hypothesis Testing Reasoning: If our guess is indeed true and the sample is large, the t stat should follow a standard normal distribution, so a t value of 0.73 is not bad (within the range $[-2, +2]$). That is, under the null hypothesis that our guess is true, the likelihood of having this t value is acceptable (two-sided p value ≈ 0.47, not that small). Conclusion/Decision: the sample doesn't provide strong evidence to reject the null; we choose to live with our guess unless we find new evidence later.
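The p value for a t value of 0.73 can be computed directly from the standard normal distribution. The sketch below assumes a two-sided test, which matches the symmetric [-2, +2] band used in the reasoning.

```r
t_value <- 0.73

# Two-sided p value under the standard normal approximation:
# probability of a score at least this far from zero in either direction.
p_two_sided <- 2 * pnorm(-abs(t_value))

# Note: pnorm(t_value) is the cumulative probability (about 0.77),
# which is not a tail probability and should not be read as a p value.

# Decision rule from the slide: is the score inside [-2, +2]?
inside_band <- abs(t_value) <= 2

round(p_two_sided, 2)
inside_band
```

The two-sided p value is about 0.47 and the score is inside the band, so the sample gives no strong evidence against the guess.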
41 Confidence Interval: Making a reasonable guess asks us to think, but to save the energy of thinking up ideas, why not just give a range for the population expectation? The confidence interval is another way of drawing inference. When calculated in repeated samples, the 95% confidence interval contains the population expectation approximately 95% of the time. The first sample gives a CI: $[\bar{Y} - 2\,\widehat{SE}(\bar{Y}),\ \bar{Y} + 2\,\widehat{SE}(\bar{Y})]$ = [89773, …]
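The repeated-sampling interpretation of the interval can be illustrated by simulation. In the sketch below, the population mean, standard deviation, and sample size are hypothetical; only the "plus or minus 2 estimated standard errors" rule is taken from the slide.

```r
set.seed(3)
mu   <- 90000     # hypothetical population expectation
sdev <- 15000     # hypothetical population standard deviation
n    <- 1000      # sample size (illustrative)
reps <- 4000      # number of repeated samples (illustrative)

covered <- replicate(reps, {
  y  <- rnorm(n, mean = mu, sd = sdev)
  se <- sd(y) / sqrt(n)                        # estimated standard error
  ci <- c(mean(y) - 2 * se, mean(y) + 2 * se)  # the slide's interval
  ci[1] <= mu && mu <= ci[2]                   # does this interval contain mu?
})

coverage <- mean(covered)
coverage
```

The share of intervals containing the true mean should be close to 95%. Any single interval either contains the parameter or does not; the 95% describes the procedure across repeated samples.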
42 Inference: Difference in Sample Average
43 HT and CI: We're also interested in the difference in population expectations, $\Delta\mu$ (a naive comparison). Making inference based on the difference in sample averages is almost the same. Hypothesis testing: a normalized score based on the difference in sample averages, $\Delta\bar{Y}_i$, and its standard error, $SE(\Delta\bar{Y}_i)$. Confidence interval: $[\Delta\bar{Y} - 2\,\widehat{SE}(\Delta\bar{Y}),\ \Delta\bar{Y} + 2\,\widehat{SE}(\Delta\bar{Y})]$. The only complication is the estimated standard error formula, $\widehat{SE}(\Delta\bar{Y})$.
44 Complication: To see the estimated standard error formula, recall that it's based on a standard error formula, which is based on the sampling variance of $\Delta\bar{Y}_i = \bar{Y}^1_i - \bar{Y}^0_i$: $V(\Delta\bar{Y}) = V(\bar{Y}^1 - \bar{Y}^0)$ (def. of $\Delta\bar{Y}$) $= V(\bar{Y}^1) + V(\bar{Y}^0)$ (each $Y_i$ is independently drawn) $= \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}$ (def. of sampling var.).
45 Complication: $V(\Delta\bar{Y}) = \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}$. The complication is whether you want to assume that $V^1(Y_i) = V^0(Y_i)$, that is, whether earnings for group 1 and group 0 have the same population variance (and hence standard deviation) of the underlying variable $Y_i$.
46 Pop Box Plot
47 Pop Conditional Distribution
48 Unequal Var.: In our particular example, the population group variances are different, so we have the sampling variance, standard error, and estimated standard error for unequal group variances: $V(\Delta\bar{Y}) = \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}$; $SE(\Delta\bar{Y}) = \sqrt{\frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}}$; $\widehat{SE}(\Delta\bar{Y}) = \sqrt{\frac{\hat{V}^1(Y_i)}{n_1} + \frac{\hat{V}^0(Y_i)}{n_0}}$.
49 Equal Var.: If you're willing to assume equal group variances, the three formulas simplify to $V(\Delta\bar{Y}) = \frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0} = V(Y_i)\left[\frac{1}{n_1} + \frac{1}{n_0}\right]$; $SE(\Delta\bar{Y}) = SD(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}$; $\widehat{SE}(\Delta\bar{Y}) = \widehat{SD}(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}$.
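The two estimated standard error formulas can be compared side by side in R. This is a sketch under stated assumptions: `se_diff` is a hypothetical helper, the data are simulated rather than the lecture's sample, and the equal-variance branch uses the usual pooled variance as the common variance estimate, which the slides do not spell out.

```r
# Estimated standard error of a difference in sample averages.
# y1, y0: outcome vectors for the two groups (hypothetical inputs).
se_diff <- function(y1, y0, equal_var = FALSE) {
  n1 <- length(y1)
  n0 <- length(y0)
  if (equal_var) {
    # Pooled variance: one common variance estimate for both groups (assumed).
    pooled <- ((n1 - 1) * var(y1) + (n0 - 1) * var(y0)) / (n1 + n0 - 2)
    sqrt(pooled * (1 / n1 + 1 / n0))
  } else {
    # Unequal variances: each group keeps its own variance estimate.
    sqrt(var(y1) / n1 + var(y0) / n0)
  }
}

set.seed(4)
y1 <- rnorm(4000, mean = 90000, sd = 15000)  # simulated private-college group
y0 <- rnorm(6000, mean = 70000, sd = 18000)  # simulated comparison group

c(unequal = se_diff(y1, y0), equal = se_diff(y1, y0, equal_var = TRUE))
```

When the group variances and group sizes both differ, as in this simulated example, the two formulas give noticeably different answers, which is why the choice matters.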
50 What to Choose? In most cases, especially when data is not from an experimental setting, the assumption of equal variance would be too strong
51 Hypothesis Testing Reasoning: Assuming group variances are not equal, we can now perform hypothesis testing just as we did for the sample average. We have a difference in sample averages, $\Delta\bar{Y}_i$, but we want to know the difference in population expectations, $\Delta\mu = \Delta E[Y_i] = E[Y^1_i \mid P_i = 1] - E[Y^0_i \mid P_i = 0]$. We don't know the population parameters, so we make a guess, say $\Delta\mu$ = a number. Normalize the difference in sample averages and get a t value. Given the guess, see if the t value is so dramatic that it's unacceptable.
52 Confidence Interval: Again, the confidence interval is just the similar bound: $[\Delta\bar{Y} - 2\,\widehat{SE}(\Delta\bar{Y}),\ \Delta\bar{Y} + 2\,\widehat{SE}(\Delta\bar{Y})]$
53 Test Sample Means by Group P mean(y) var(y) sd(y) n(y) 0 69, ,719,278 17,909 5, , ,719,421 15,057 4,303 Sample Mean Comparison Difference in Sample Mean Est. Std. Err. (323.5)
54 Inference: Sample Regression Coefficient
55 Bivariate Sample Regression Consider again a bivariate regression with a dummy variable But this time the regression is estimated based on a sample Call: lm(formula = Y ~ P, data = sam1) Coefficients: (Intercept) P
56 Estimated Regression: The estimated regression is $\hat{Y}_i = \hat{\alpha} + \hat{\beta} P_i$. How would you interpret the estimated slope?
57 The Logic Carries: The estimated slope gives the difference in sample averages. So if we're worried about the sampling uncertainty associated with the estimated slope, we can also use an estimated standard error for hypothesis testing and confidence intervals.
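That the slope on a dummy regressor reproduces the difference in group averages is easy to confirm numerically. The sketch below uses simulated data with made-up parameters; the variable names `Y` and `P` follow the lecture, everything else is illustrative.

```r
set.seed(5)
n <- 2000
P <- rbinom(n, size = 1, prob = 0.45)           # private-college dummy (simulated)
Y <- 70000 + 20000 * P + rnorm(n, sd = 17000)   # simulated earnings
sam <- data.frame(Y = Y, P = P)

fit <- lm(Y ~ P, data = sam)

mean_p0       <- mean(sam$Y[sam$P == 0])
diff_in_means <- mean(sam$Y[sam$P == 1]) - mean_p0

# Intercept matches the P = 0 group average; slope matches the difference.
rbind(regression = coef(fit), by_hand = c(mean_p0, diff_in_means))
```

The two rows agree up to floating-point error, so any inference tool built for a difference in sample averages carries over to the estimated slope.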
58 Displaying Reg. Results: Software always gives estimated coefficients and estimated standard errors: term estimate std.error statistic p.value 1 (Intercept) P It's customary to display both the estimated regression coefficients and the estimated standard errors.
59 Displaying An Equation term estimate std.error statistic p.value 1 (Intercept) P Y^i = Pi (221.8) (338.13) Again, the estimated standard error tells us how precisely a coefficient is estimated.
60 Displaying a Table term estimate std.error statistic p.value 1 (Intercept) P A more common way to display regression results is a regression table: A Regression Table P Y (332.5) Intercept (217.9) Note: Estimated standard errors are in parentheses.
61 Complication: The estimated slope gives the difference in sample averages in a bivariate regression with a dummy variable. Just like the standard errors of the difference in sample averages, there are two basic ways of computing standard errors for estimated reg. coef.: assume equal group variance (homoskedasticity), or assume unequal group variance (heteroskedasticity).
62 Homoskedasticity: Homoskedasticity is an old-fashioned assumption. Throughout the world, perhaps only undergrads taking introductory stat/metrics are using it. Homoskedasticity gives a simple formula for the std. err. and est. std. err.: $SE(\hat{\beta}) = \frac{\sigma_e}{\sqrt{n}} \cdot \frac{1}{\sigma_X}$ and $\widehat{SE}(\hat{\beta}) = \frac{\hat{\sigma}_e}{\sqrt{n}} \cdot \frac{1}{\hat{\sigma}_X}$.
63 Heteroskedasticity: A more realistic assumption is heteroskedasticity, and it gives other formulas, usually called (est.) robust standard error formulas: $RSE(\hat{\beta}) = \sqrt{\frac{1}{n}\,\frac{V(X_i e_i)}{(\sigma^2_{X_i})^2}}$ and $\widehat{RSE}(\hat{\beta}) = \sqrt{\frac{1}{n}\,\frac{\hat{V}(X_i e_i)}{(\hat{\sigma}^2_{X_i})^2}}$.
64 HT and CI: Hypothesis testing and confidence intervals follow the same recipe: $t(\beta) = \frac{\hat{\beta} - \beta}{\widehat{RSE}(\hat{\beta})}$ and $[\hat{\beta} - 2\,\widehat{RSE}(\hat{\beta}),\ \hat{\beta} + 2\,\widehat{RSE}(\hat{\beta})]$.
65 Demo: Many software packages still use homoskedastic standard errors as the default, so we usually need a little extra effort to get the est. std. err. we want. In Stata, this can be done by specifying the robust option: reg Y P, robust. In R, it's still a bit complicated.
66 Demo: Homoskedasticity Std. Err. lm(Y ~ P, data = sam1) Call: lm(formula = Y ~ P, data = sam1) Coefficients: (Intercept) P est.reg = lm(Y ~ P, data = sam1)
67 Demo: Homoskedasticity Std. Err. library(broom) tidy(est.reg) term estimate std.error statistic p.value 1 (Intercept) P
68 Demo: Heteroskedasticity Std. Err. library(sandwich) vcovHC(est.reg) (Intercept) P (Intercept) P sqrt(55143) [1] 235 sqrt(104667) [1] 324
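To see what a robust variance estimate does under the hood, the sketch below builds the simplest heteroskedasticity-robust ("HC0" sandwich) covariance matrix by hand in base R. It uses simulated data with unequal group variances, not the lecture's `sam1`. Note that `vcovHC()` in the sandwich package applies a small-sample adjustment by default (type "HC3"), so its numbers will differ slightly from HC0.

```r
set.seed(6)
n <- 2000
P <- rbinom(n, size = 1, prob = 0.45)
# Simulated earnings with a different spread in each group (illustrative).
Y <- 70000 + 20000 * P + rnorm(n, sd = ifelse(P == 1, 15000, 18000))
fit <- lm(Y ~ P)

X <- model.matrix(fit)            # n x 2 design matrix: intercept and P
e <- residuals(fit)               # OLS residuals

bread <- solve(crossprod(X))      # (X'X)^{-1}
meat  <- crossprod(X * e)         # sum over i of e_i^2 * x_i x_i'
vcov_hc0 <- bread %*% meat %*% bread

robust_se  <- sqrt(diag(vcov_hc0))
default_se <- sqrt(diag(vcov(fit)))   # homoskedastic formula used by lm

rbind(default = default_se, robust = robust_se)
```

For a dummy regressor, the robust slope standard error computed this way equals the unequal-variance formula for a difference in sample averages, with each group's squared residuals summed and divided by the squared group size. That ties the regression demo back to the earlier two-group comparison.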
More informationEconomics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects
Economics 113 Simple Regression Models Simple Regression Assumptions Simple Regression Derivation Changing Units of Measurement Nonlinear effects OLS and unbiased estimates Variance of the OLS estimates
More informationLinear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?
Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation
More informationChapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression Outline 1. Hypothesis tests and confidence intervals for a single coefficie. Joint hypothesis tests on multiple coefficients 3.
More informationMultiple Regression Analysis
Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators
More informationT.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS
ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only
More informationFNCE 926 Empirical Methods in CF
FNCE 926 Empirical Methods in CF Lecture 2 Linear Regression II Professor Todd Gormley Today's Agenda n Quick review n Finish discussion of linear regression q Hypothesis testing n n Standard errors Robustness,
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationCS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.
Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright c 2015 by Clifford A. Shaffer Computer Science Title page Computer Science Clifford A. Shaffer Fall 2015 Clifford A. Shaffer
More informationy n 1 ( x i x )( y y i n 1 i y 2
STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered
More informationChapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.
Chapter 8 Linear Regression Copyright 2010 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu: Copyright
More informationLecture 30. DATA 8 Summer Regression Inference
DATA 8 Summer 2018 Lecture 30 Regression Inference Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu) Contributions by Fahad Kamran (fhdkmrn@berkeley.edu) and
More informationIII. Inferential Tools
III. Inferential Tools A. Introduction to Bat Echolocation Data (10.1.1) 1. Q: Do echolocating bats expend more enery than non-echolocating bats and birds, after accounting for mass? 2. Strategy: (i) Explore
More informationOverview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation
Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already
More informationComparing Means from Two-Sample
Comparing Means from Two-Sample Kwonsang Lee University of Pennsylvania kwonlee@wharton.upenn.edu April 3, 2015 Kwonsang Lee STAT111 April 3, 2015 1 / 22 Inference from One-Sample We have two options to
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationAMS 7 Correlation and Regression Lecture 8
AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation
More informationCHAPTER 6: SPECIFICATION VARIABLES
Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero
More informationSimple, Marginal, and Interaction Effects in General Linear Models
Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationApplied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections
Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................
More informationNonlinear Regression Functions
Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.
More informationSTA 101 Final Review
STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2 3pm today (Old Chem
More informationusing the beginning of all regression models
Estimating using the beginning of all regression models 3 examples Note about shorthand Cavendish's 29 measurements of the earth's density Heights (inches) of 14 11 year-old males from Alberta study Half-life
More informationInteractions and Factorial ANOVA
Interactions and Factorial ANOVA STA442/2101 F 2017 See last slide for copyright information 1 Interactions Interaction between explanatory variables means It depends. Relationship between one explanatory
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationQuestions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.
Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized
More informationappstats8.notebook October 11, 2016
Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus
More informationInteractions and Factorial ANOVA
Interactions and Factorial ANOVA STA442/2101 F 2018 See last slide for copyright information 1 Interactions Interaction between explanatory variables means It depends. Relationship between one explanatory
More informationVarieties of Count Data
CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function
More information