Statistics, inference and ordinary least squares. Frank Venmans

1 Statistics, inference and ordinary least squares Frank Venmans

2 Statistics

3 Conditional probability
Consider 2 events for one roll of a fair die:
A: die shows 1, 3 or 5 => $P(A) = 3/6$
B: die shows 3 or 6 => $P(B) = 2/6$
$A \cap B$: A and B occur: die shows 3 => $P(A \cap B) = 1/6$
$A \cup B$: A or B occurs: die shows 1, 3, 5 or 6 => $P(A \cup B) = 4/6$
Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (~ Venn diagram)
Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ (~ Venn diagram)
$P(A \mid B)$: probability of event A given that B occurs = 1/2
$P(B \mid A)$: probability of event B given that A occurs = 1/3
Bayes' law: $P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A)$
An event can be any set of outcomes. Example:
A: random draw from the Belgian population with income > 30,000
B: random draw from the Belgian population with education > 12 years
$P(A \mid B) \neq P(A)$ (Venn diagram of income > 30,000 and education > 12)
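The die example can be checked by enumerating the six outcomes. The following is a minimal Python sketch (the names `prob`, `p_a_given_b` and so on are illustrative and not part of the slides); it uses exact fractions so that the addition rule and Bayes' law hold without rounding.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # faces of a fair six-sided die
A = {1, 3, 5}                 # event A: die shows 1, 3 or 5
B = {3, 6}                    # event B: die shows 3 or 6

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

p_a, p_b = prob(A), prob(B)
p_and = prob(A & B)           # A and B occur
p_or = prob(A | B)            # A or B occurs

p_a_given_b = p_and / p_b     # conditional probability P(A|B)
p_b_given_a = p_and / p_a     # conditional probability P(B|A)

# Addition rule and Bayes' law, checked exactly
addition_rule_holds = p_or == p_a + p_b - p_and
bayes_holds = p_a_given_b * p_b == p_b_given_a * p_a == p_and

print(p_a_given_b, p_b_given_a, addition_rule_holds, bayes_holds)
```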

4 Independence
2 events A and B: $P(A \mid B) = P(A) \Leftrightarrow P(B \mid A) = P(B) \Leftrightarrow$ A and B are independent
Two variables X and Y: $f(x \mid y) = f(x) \Leftrightarrow f(y \mid x) = f(y) \Leftrightarrow$ X and Y are independent
X and Y are independent if the conditional distribution of X given Y is the same as the unconditional distribution of X.
Variables without any causal link do not necessarily have a zero correlation. Example: the height of my son and Indian GDP are correlated (both are affected by time).
Dependent variables may have a zero correlation in exceptional cases. Example: selection bias may compensate a causal effect (see further).
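That dependent variables can have zero correlation is easy to verify on a small constructed case. The sketch below is a hypothetical example, not the selection-bias case from the slide: X takes the values -1, 0, 1 with equal probability and Y = X². Y is fully determined by X, yet the covariance is exactly zero.

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = Fraction(1, 3)                    # each value of X is equally likely
pairs = [(x, x * x) for x in xs]      # Y = X^2, so Y depends on X

e_x = sum(p * x for x, _ in pairs)
e_y = sum(p * y for _, y in pairs)
cov_xy = sum(p * (x - e_x) * (y - e_y) for x, y in pairs)

# Dependence: the conditional distribution of Y differs from the unconditional one
p_y1 = sum(p for _, y in pairs if y == 1)
p_y1_given_x0 = sum(p for x, y in pairs if x == 0 and y == 1) / sum(
    p for x, _ in pairs if x == 0
)

print(cov_xy, p_y1, p_y1_given_x0)
```

Correlation only measures linear association, which is why it misses the purely quadratic link here.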

5 Cumulative Distribution Function (CDF) and Probability Density Function (PDF)
Notation:
Random variables X, Y: e.g. yearly earnings and level of education
Discrete if earnings are multiples of 100 and education is in years
~Continuous if earnings are expressed in eurocents and education in seconds
Specific values of random variables: a, b or x, y
Cumulative Distribution Function: probability that X is smaller than or equal to a: $F(a) = P(X \le a)$
Probability Density Function:
For discrete variables: $f(a) = P(X = a)$
For continuous variables: $f(a) = \frac{dF(a)}{da} \Leftrightarrow F(a) = \int_{-\infty}^{a} f(x)\,dx$
Area under the pdf = 1 because $F(\infty) = 1$

6 Joint Cumulative Distribution Function
Assume Y is yearly earnings and X is the level of education: $F(x, y) = P(X \le x \text{ and } Y \le y)$

7 Density function
Joint Density Function
For discrete variables: $f(x, y) = P(X = x \text{ and } Y = y)$
For continuous variables: $f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}$
Marginal Density Function
Discrete variables: $f(x) = \sum_{y} f(x, y) = P(X = x)$, disregarding y
Continuous variables: $f(x) = \int f(x, y)\,dy$ (red and blue line)
Conditional Density Function
Discrete variables: $f(x \mid y) = \frac{f(x, y)}{f(y)} = P(X = x \mid Y = y)$
Continuous variables: $f(x \mid y) = \frac{f(x, y)}{f(y)}$ (intersections through the joint density function)

8 Regression as a conditional density function

9 Expected value
Unconditional expected value
For a discrete random variable: $E[X] = \sum_i x_i P(x_i) = \mu$
For a continuous random variable: $E[X] = \int x f(x)\,dx = \mu$
Conditional expected value (in finance many expectations are conditional on the information set at time t)
$E[X \mid Y] = E_Y[X] = \sum_i x_i P(x_i \mid Y)$
$E[X \mid Y] = \int x f(x \mid y)\,dx$
Variance $= \sigma^2 = E[(X - \mu)^2]$
Covariance between X and Y $= \sigma_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)]$
Skewness $= E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$
Kurtosis $= E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$

10 Normal distribution
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$
Notation: $X \sim N(\mu, \sigma^2)$
Skewness = 0, Kurtosis = 3
Jarque-Bera test for normality: tests if skewness and kurtosis are close to 0 and 3.
Any linear combination of jointly normally distributed variables (correlated or not) is normally distributed.
Central limit theorem: the standardized sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, whatever their distribution; the approximation becomes exact as the number of variables goes to infinity.

11 Chi square distribution
$Y = \sum_{i=1}^{n} X_i^2$ with $X_i \sim N(0,1)$ and all $X_i$ independent follows a $\chi^2$ distribution with n degrees of freedom: $Y \sim \chi^2_n$

12 Student t distribution
$Z = \frac{X}{\sqrt{Y/n}}$ with $X \sim N(0,1)$, $Y \sim \chi^2_n$ and X independent from Y follows a Student or t-distribution with n degrees of freedom: $Z \sim t_n$
Higher variance and kurtosis than the standardized normal distribution
Converges to the normal distribution for large n: $t_\infty = N(0,1)$

13 F distribution
$Z = \frac{X/n}{Y/m}$ with $X \sim \chi^2_n$, $Y \sim \chi^2_m$ and X independent from Y follows an F distribution with n and m degrees of freedom: $Z \sim F_{n,m}$

14 Inference

15 Statistical inference
Try to say something about the real distribution of a random variable based on a sample.
The real distribution corresponds to, for example, an infinitely repeated event (e.g. rolling dice), the entire population, or the entire set of possible states of the world in a future period.

16 3 types of inference
Point estimator:
Ex: sample mean, sample variance, marginal effect in a linear regression (beta), correlation
Concept of repeated sampling: every sample gives another estimate $\hat{\theta}$ => $\hat{\theta}$ follows a probability distribution
Unbiased: the expected value of the estimator equals the real parameter: $E[\hat{\theta}] = \theta$
Consistent: the estimator gets arbitrarily close to the real parameter as the sample size increases: $\operatorname{plim}_{n \to \infty} \hat{\theta} = \theta$
Ex: the sample variance estimator $\tilde{s}^2 = \frac{1}{n}\sum_i (y_i - \bar{y})^2$ is a biased but consistent estimator of the variance
Efficient estimator: $\operatorname{var}(\hat{\theta})$ is small
Interval estimation:
Ex: given the observed sample, the real mean lies between 1 and 3 with 95% probability
Hypothesis testing:
Ex: if the null hypothesis is true ($\mu = 2$), what is the probability that a random sample gives a more extreme (less likely) outcome than the observed sample mean of 4 and sample variance of 2?
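The bias of the variance estimator that divides by n can be made concrete with repeated sampling taken to its limit: enumerate every possible sample. The sketch below is an illustration under assumed conditions (a fair die as the population, samples of size n = 2 drawn with replacement); it computes the exact expected value of both variance estimators.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                       # population: a fair die
mu = Fraction(sum(faces), 6)
sigma2 = sum((Fraction(f) - mu) ** 2 for f in faces) / 6   # true variance

n = 2
samples = list(product(faces, repeat=n))  # all 36 equally likely samples

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    ybar = Fraction(sum(sample), len(sample))
    return sum((y - ybar) ** 2 for y in sample)

# Expected value of each estimator over all possible samples
e_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
e_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, e_biased, e_unbiased)
```

The estimator dividing by n has expectation $(n-1)/n$ times the true variance, so the bias vanishes as n grows, which is why it is still consistent.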

17 Example: Sample mean
Income of Belgian households: a random variable following a distribution with mean μ and variance σ² (the distribution is skewed, not normal).
You have a sample of n individuals. You want to say something about μ and σ².
Estimator of μ: sample mean $\bar{y} = \frac{y_1 + y_2 + y_3 + \dots + y_n}{n}$
The estimate will be different each time you draw a different sample => the sample mean follows a distribution, which is different from the distribution of y.
Central limit theorem => the distribution of the sample mean converges to a normal distribution even if y does not follow a normal distribution.
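A small simulation illustrates how the sample mean of a skewed variable behaves under repeated sampling. This is a sketch under assumptions that are not in the slides: income is drawn from an exponential distribution with mean 1 and variance 1 (skewness 2), each sample has n = 50 observations, and 2000 samples are drawn with a fixed seed.

```python
import math
import random

random.seed(42)

def mean(values):
    return sum(values) / len(values)

def std(values):
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))

def skewness(values):
    m, s = mean(values), std(values)
    return mean([((v - m) / s) ** 3 for v in values])

n, n_samples = 50, 2000
# Skewed "income" variable: exponential with mean 1 and variance 1
incomes = [random.expovariate(1.0) for _ in range(n * n_samples)]
# One sample mean per sample of n observations
sample_means = [mean(incomes[k * n:(k + 1) * n]) for k in range(n_samples)]

print("skewness of y:          ", round(skewness(incomes), 2))
print("skewness of sample mean:", round(skewness(sample_means), 2))
print("sd of sample mean:      ", round(std(sample_means), 3),
      "vs sigma/sqrt(n) =", round(1 / math.sqrt(n), 3))
```

The sample means are centred on μ, their standard deviation is close to $\sigma/\sqrt{n}$, and their distribution is far less skewed than that of y itself.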

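18 Sample mean: variance known
$\bar{y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ asymptotically $\Leftrightarrow \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ asymptotically
This allows us to determine a 95% confidence interval:
$P\left(-1.96 < \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95 \Leftrightarrow P\left(\bar{y} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{y} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
When the interval includes zero, we say that the sample mean is not significantly different from zero at the 5% significance level.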
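The interval with known variance is a one-line computation. A minimal Python sketch follows; the function name `mean_confidence_interval` and the numbers in the usage line (sample mean 2, σ = 1, n = 100) are hypothetical.

```python
from statistics import NormalDist

def mean_confidence_interval(ybar, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * sigma / n ** 0.5
    return ybar - half_width, ybar + half_width

# Hypothetical numbers: sample mean 2, known sigma 1, n = 100
low, high = mean_confidence_interval(ybar=2.0, sigma=1.0, n=100)
includes_zero = low < 0 < high   # if True: not significantly different from zero

print(round(low, 3), round(high, 3), includes_zero)
```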

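19 Sample mean: variance unknown and y normally distributed
Both mean and variance will need to be estimated.
Estimator for variance: $s^2 = \frac{1}{n-1}\sum_i (y_i - \bar{y})^2$
If Y follows a normal distribution:
$\frac{(n-1)s^2}{\sigma^2} = \sum_i \frac{(y_i - \bar{y})^2}{\sigma^2} \sim \chi^2_{n-1}$ (no proof but intuitive)
$\frac{\bar{y} - \mu}{s/\sqrt{n}} = \frac{(\bar{y} - \mu)/(\sigma/\sqrt{n})}{s/\sigma} = \frac{(\bar{y} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)s^2}{(n-1)\sigma^2}}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} = t_{n-1}$
This allows us to determine a 95% confidence interval (ex. n = 21, so 20 degrees of freedom):
$P\left(-2.086 < \frac{\bar{y} - \mu}{s/\sqrt{n}} < 2.086\right) = 0.95 \Leftrightarrow P\left(\bar{y} - 2.086\frac{s}{\sqrt{n}} < \mu < \bar{y} + 2.086\frac{s}{\sqrt{n}}\right) = 0.95$
For large n, the t distribution converges to the normal distribution.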

20 Hypothesis testing
Null hypothesis $H_0: \theta = \theta_0$, ex: $H_0: \theta = 0$
One sided test $H_A: \theta > \theta_0$ (or $\theta < \theta_0$), ex: $H_A: \theta > 0$
Two sided test $H_A: \theta \neq \theta_0$, ex: $H_A: \theta \neq 0$
2 regions:
If the observed data (test statistic) falls in the rejection region => reject $H_0$
If the observed data (test statistic) falls in the acceptance region => do not reject $H_0$
Imagine you have 10 months of data, you observe a mean monthly return of the stock of Apple of 0.8%, and you want to test if this mean is different from a zero return. Assume the standard deviation of the return is observed to be 1.58%, so the standard error of the mean is $1.58\%/\sqrt{10} = 0.5\%$.

21 One sided test vs 2-sided test
One sided test: if the real mean was zero, what would be the probability to observe an estimate larger than 0.8% (with s = 1.58%)?
Standardize your outcome under $H_0: \mu = 0$:
$\frac{\bar{y} - 0}{s/\sqrt{n}} \sim t_{n-1}$, so $P\left(T > \frac{\bar{y}}{s/\sqrt{n}}\right) = P(T > 1.6) \approx 0.07$ (P-value given by Stata)
Two sided test: if the real mean was zero, what would be the probability that the standardized sample mean was outside the interval $\left[-\frac{\bar{y}}{s/\sqrt{n}}, \frac{\bar{y}}{s/\sqrt{n}}\right]$?
$1 - P\left(-\frac{\bar{y}}{s/\sqrt{n}} < T < \frac{\bar{y}}{s/\sqrt{n}}\right) = 1 - P(-1.6 < T < 1.6) \approx 0.14$ (P-value given by Stata)
Remark: n is small, so the assumption of normally distributed returns is needed.
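The tail probabilities of the t distribution can be computed without a statistics package by integrating its density numerically. The following Python sketch does this for the Apple example (mean 0.8%, standard deviation 1.58%, n = 10), assuming n - 1 = 9 degrees of freedom. The helper names `t_pdf` and `t_upper_tail` are illustrative, and Simpson's rule is just one convenient way to integrate.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3     # the density is symmetric around zero

n = 10
mean_return, sd_return = 0.8, 1.58            # in percent, from the example
t_stat = mean_return / (sd_return / math.sqrt(n))

p_one_sided = t_upper_tail(t_stat, df=n - 1)  # H_A: mu > 0
p_two_sided = 2 * p_one_sided                 # H_A: mu != 0

print(round(t_stat, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```

As a sanity check on the integration, `t_upper_tail(2.086, 20)` should be close to 0.025, the critical value used for the 95% interval with n = 21.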

22 Type I and type II errors
            Do not reject H0        Reject H0
H0 true     correct decision        Type I error, α (ex 5%)
H0 false    Type II error, β        correct decision, 1-β = power of test
Level of significance: probability to reject the null if the null is true
Power of the test: probability to reject the null if the null is false
Reduce probability of type I error => increase probability of type II error
More efficient estimator => reduce probability of type II error = increase power of the test
Increase sample size => reduce probability of type II error = increase power of the test
General rule: go for a large sample. In small samples you may only see phenomena as big as an elephant, which you already knew about before doing the test; all the rest has an insignificant effect.
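The link between sample size and power can be sketched with a one-sided z-test. The assumptions here are not from the slides: the standard deviation σ is treated as known, the test statistic is normal, and the true effect (0.5) and σ (1.58) are hypothetical values. The function name `power_one_sided` is illustrative.

```python
from statistics import NormalDist

def power_one_sided(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against H_A: mu > 0.

    effect is the true mean; sigma is assumed known.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)             # reject H0 if z-statistic > critical
    shift = effect * n ** 0.5 / sigma           # mean of the z-statistic under H_A
    return 1 - z.cdf(critical - shift)

for n in (10, 40, 160):
    print(n, round(power_one_sided(effect=0.5, sigma=1.58, n=n), 3))

# With no true effect, the rejection probability equals the significance level
size = power_one_sided(effect=0.0, sigma=1.58, n=10)
```

Lowering alpha raises the critical value and therefore lowers the power for a given n, which is the trade-off between the two error types.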

23 Ordinary least squares

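24 Regression
Assume we want to know the relationship between sales and advertising expenditure.
OLS: minimize the squared distance between the points and the regression line.
[Figure: scatter plot of Y = sales against X = advertising expenditure with the regression line $Y = \alpha + \beta x$ (intercept α, slope β); $\varepsilon_i$ is the vertical distance between the observation $Y_i$ and the line at $\alpha + \beta x_i$.]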

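25 Population regression line vs sampling regression line
Estimators and regression line (orange) will be different for each sample.
Would you use a one-sided or two-sided t-test for beta?
[Figure: Y = sales against advertising expenditures, showing the population regression line $Y = \alpha + \beta x$ with errors $\varepsilon_i$ and the sample regression line $\hat{Y} = \hat{\alpha} + \hat{\beta} x$ with residuals $\hat{\varepsilon}_i$.]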

26 Assumptions of OLS
5 Gauss-Markov assumptions:
1. The true model is $Y = \alpha_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ with $E[\varepsilon] = 0$ (linearity)
2. No perfect collinearity (you cannot write $X_1$ as a linear combination of the other $X_j$'s)
3. Homoscedastic errors: $E[\varepsilon_i^2] = \sigma_i^2 = \sigma^2$
4. Uncorrelated errors: $E[\varepsilon_i \varepsilon_j] = 0$ for $i \neq j$ (assumptions 3 and 4 in matrix notation: $E[\varepsilon\varepsilon'] = \sigma^2 I$)
5. $E[\varepsilon \mid X_1, X_2] = 0$ (exogenous explanatory variables, no endogeneity)
If the 5 assumptions are met, OLS is the Best Linear Unbiased Estimator (BLUE).
If the errors are normally distributed, OLS is the Best Unbiased Estimator (BUE).
OLS with non-normal errors is still unbiased and consistent!
$\hat{\beta}$ follows a t-distribution only if errors are normal => be prudent with interpreting confidence intervals in small samples.

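27 The math behind OLS (optional, only for those who like it)
Consider the matrix notation of the model: $Y = X\beta + \varepsilon$
If there is an intercept, X contains a column of ones.
Minimum distance estimator:
$\min_\beta \sum_i^n \varepsilon_i^2 = \min_\beta \varepsilon'\varepsilon = \min_\beta (Y - X\beta)'(Y - X\beta) = \min_\beta \left(Y'Y - 2\beta'X'Y + \beta'X'X\beta\right)$
First derivative: $-2X'Y + 2X'X\beta = 0 \Leftrightarrow \hat{\beta} = (X'X)^{-1}X'Y$
Method of moments: errors must be uncorrelated with the regressors
$X'\varepsilon = 0 \Leftrightarrow X'(Y - X\beta) = 0 \Leftrightarrow \hat{\beta} = (X'X)^{-1}X'Y$
Maximum likelihood under normal distribution of the error term:
likelihood $= \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_i^2}{2\sigma^2}\right)$
log-likelihood $= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i^n \varepsilon_i^2$
Maximising the log-likelihood with respect to β boils down to the minimum distance estimator => OLS is BUE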
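For one regressor plus an intercept, the formula $\hat{\beta} = (X'X)^{-1}X'Y$ reduces to inverting a 2x2 matrix by hand. The sketch below does that in plain Python on made-up advertising and sales figures (the data and the function name `ols_simple` are hypothetical) and checks the moment condition $X'\hat{\varepsilon} = 0$.

```python
def ols_simple(x, y):
    """OLS with intercept and one regressor via the normal equations.

    X has a column of ones and the column x, so
    X'X = [[n, sum x], [sum x, sum x^2]] and X'Y = [sum y, sum xy].
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx               # determinant of X'X
    alpha = (sxx * sy - sx * sxy) / det   # first row of (X'X)^-1 X'Y
    beta = (n * sxy - sx * sy) / det      # second row of (X'X)^-1 X'Y
    return alpha, beta

advertising = [1, 2, 3, 4, 5]             # hypothetical data
sales = [3, 5, 4, 8, 10]

alpha_hat, beta_hat = ols_simple(advertising, sales)
residuals = [yi - (alpha_hat + beta_hat * xi)
             for xi, yi in zip(advertising, sales)]

# Moment conditions X'e = 0: residuals sum to zero and are orthogonal to x
sum_resid = sum(residuals)
sum_x_resid = sum(xi * e for xi, e in zip(advertising, residuals))

print(alpha_hat, beta_hat, round(sum_resid, 10), round(sum_x_resid, 10))
```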

28 OLS inference
If errors are normally distributed, the standardized estimate $\hat{\beta}$ follows a Student t distribution (this is only asymptotically the case if errors are not normally distributed).
If errors are correlated or heteroscedastic, the estimated variance of beta $\sigma^2_{\hat{\beta}}$ can be adjusted (usually increased) to take that into account (option robust in Stata).
Stata command: regress Y X1 X2, robust
The default includes a constant; you can add the option noconstant.
Exogeneity of the X's cannot be tested. Endogeneity is the most important obstacle to a causal interpretation of the betas (see next week).

29 Avoid endogeneity
Conditions 1 and 5 imply $E[Y \mid X_1, X_2] = \alpha_0 + \beta_1 X_1 + \beta_2 X_2$.
This allows a causal interpretation of the betas: an increase of $X_1$ by one unit, all other relevant factors being equal, will have an effect $\beta_1$ on Y.
Intuition: all other factors being equal implies that all factors that drive the error term (and thus Y) are uncorrelated with the variables of interest X.

30 The effect of marketing «all else being equal»
[Diagram with nodes: innovative company; marketing expenditures; sales; error = all other factors (competitors, quality of product, delivery time, business cycle).]
$E[\varepsilon \mid X] \neq 0$, $\operatorname{cov}(\varepsilon, X) \neq 0$: ε and X are driven by common factors.

31 Fixed effect panel regression to avoid endogeneity
[Diagram with nodes: marketing expenditures; innovative company; sales; fixed effect = all factors that are constant over time (competitors, quality of product); idiosyncratic error = all other factors that change over time (delivery time, business cycle).]

32 Fixed effects - random effect - pooled panel
3 ways of writing the same fixed effect model:
1. $Y = X\beta + \sum_i \gamma_i D_i + \varepsilon$ with $D_i$ a dummy variable for company i
2. $Y_{it} = X_{it}\beta + \gamma_i + \varepsilon_{it}$
3. $Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + \varepsilon_{it} - \bar{\varepsilon}_i$ = within estimator (obtained by subtracting from eq. 2 its mean over time periods)
Beta measures the effect of a deviation from mean marketing expenditures on the deviation from mean sales within a company i.
=> The difference in mean sales between a company with high and a company with low average marketing expenditures does not drive the estimation of beta.
=> Be careful with measurement errors and lagged effects, because part of the variability is filtered out.
If theory indicates that you may avoid a source of endogeneity and you have enough data to find significant effects => use fixed effects!
The Random Effect model and the Pooled panel regression assume that none of the factors that drive a company-specific effect drives any of the X's as well: the company-specific effects are uncorrelated with the X's.
Pooled panel regression is OLS as if there was no panel structure: every observation has equal weight.
The Random Effect model is a Generalized Least Squares (GLS) estimator: the observed heterogeneity and serial correlation are used to make the estimator more efficient than the pooled panel regression.
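A tiny constructed panel shows why the within transformation matters when the company effect is correlated with X. In this hypothetical example (two companies, three periods, no idiosyncratic error), sales follow $Y_{it} = 2X_{it} + \gamma_i$, and the company with low marketing expenditures has the high fixed effect. The names `panel` and `slope` are illustrative.

```python
from statistics import mean

# Hypothetical panel: Y_it = 2 * X_it + gamma_i, with gamma_A = 10 and gamma_B = 0
panel = {
    "A": {"x": [1, 2, 3], "y": [12, 14, 16]},
    "B": {"x": [4, 5, 6], "y": [8, 10, 12]},
}

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )

# Pooled regression: ignore the panel structure
x_all = [v for c in panel.values() for v in c["x"]]
y_all = [v for c in panel.values() for v in c["y"]]
beta_pooled = slope(x_all, y_all)

# Within estimator: subtract each company's mean over time, then regress
x_dm, y_dm = [], []
for c in panel.values():
    x_bar, y_bar = mean(c["x"]), mean(c["y"])
    x_dm += [v - x_bar for v in c["x"]]
    y_dm += [v - y_bar for v in c["y"]]
beta_within = slope(x_dm, y_dm)

print(round(beta_pooled, 3), beta_within)
```

The pooled slope even has the wrong sign here because it mixes the within-company effect with the difference between companies, while demeaning removes $\gamma_i$ and recovers the effect of 2.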

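33 Alternative functional forms
If X increases by 1, Y will increase by β:
$Y = \alpha + \beta X + \varepsilon$; $\frac{dY}{dX} = \beta$ = marginal effect
If X increases by 1%, Y will increase by approximately β%:
$\ln Y = \alpha + \beta \ln X + \varepsilon \Leftrightarrow Y = e^{\alpha} X^{\beta} e^{\varepsilon}$; $\frac{d\ln Y}{d\ln X} = \beta = \frac{dY/Y}{dX/X}$ = elasticity
If X increases by 1, Y will increase by approximately 100β%:
$\ln Y = \alpha + \beta X + \varepsilon \Leftrightarrow Y = e^{\alpha} e^{\beta X} e^{\varepsilon}$; $\frac{d\ln Y}{dX} = \beta = \frac{dY/Y}{dX}$ = growth rate (think of X as time)
If X increases by 1%, Y will increase by approximately β/100:
$Y = \alpha + \beta \ln X + \varepsilon \Leftrightarrow e^{Y} = e^{\alpha} X^{\beta} e^{\varepsilon}$; $\frac{dY}{d\ln X} = \beta = \frac{dY}{dX/X}$
Any transformation (ln X, 1/X, X², X³, exp X) is in principle allowed.
A transformation can be justified by theory (in most cases) or by the data (see graphs).
[Graphs: three plots of Y against X.]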
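The percentage interpretations of the log models are first-order approximations, and it can help to see how close they are. The sketch below compares the exact change in Y with the rule of thumb, using hypothetical coefficient values; the function names are illustrative.

```python
import math

def pct_change_log_linear(beta, dx=1.0):
    """Exact % change in Y when X rises by dx in ln Y = alpha + beta * X."""
    return 100 * (math.exp(beta * dx) - 1)

def pct_change_log_log(beta, pct_x=1.0):
    """Exact % change in Y when X rises by pct_x % in ln Y = alpha + beta * ln X."""
    return 100 * ((1 + pct_x / 100) ** beta - 1)

def change_linear_log(beta, pct_x=1.0):
    """Exact change in Y when X rises by pct_x % in Y = alpha + beta * ln X."""
    return beta * math.log(1 + pct_x / 100)

# Hypothetical coefficients
growth = pct_change_log_linear(0.05)     # rule of thumb: 100 * 0.05 = 5 %
elasticity = pct_change_log_log(0.5)     # rule of thumb: 0.5 %
level = change_linear_log(200)           # rule of thumb: 200 / 100 = 2 units

print(round(growth, 3), round(elasticity, 4), round(level, 3))
```

The approximations are good for small β and small changes in X, and deteriorate for large coefficients: with β = 1 in the log-linear model the exact change is about 172%, not 100%.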

34 Some useful commands in Stata
Type help in the command window for the following commands:
summarize : summarize information about a variable or dataset
tabulate var1 var2 : tabulation table to explore data
destring : define a variable as numerical if it was imported as string (text)
generate var1=var2+var3 : generates a new variable (also for log transformation)
replace var1=0 if var2==. & var3>36 : logical expressions with == and & (and); dot = missing => a missing value counts as larger than any number, so if var3 is missing, it satisfies var3>36!
replace var1=1 if l.var2==25 | var3==var4 : | means or; the lag operator l. works only if the dataset is defined as time series or panel
regress y x1 x2 : regression with many options
tsset year : define the dataset to be a time series, with year the name of the time variable

35 Commands in Stata:
xtset i t : define a dataset to be a panel, with i the name of the person or company identifier and t the time variable
xtreg : fixed effect or random effect panel regression
Create an id per companyname: egen id = group(companyname)
egen stands for extensions to generate and operates on groups of observations; generate only applies to one observation at a time.
Calculate the mean (over time) of variable income per company id: bysort id: egen meanincome = mean(income)
Eliminate extreme values beyond the 99th percentile of variable income:
summarize income, detail
replace income=. if income>`r(p99)'
Most commands store different results as macros (mentioned at the end of the help document). You can use them with `name'. Since they behave like local macros, they are erased at some point (in this case by the next command that uses r() to store results).


Inference in Regression Analysis ECNS 561 Inference Inference in Regression Analysis Up to this point 1.) OLS is unbiased 2.) OLS is BLUE (best linear unbiased estimator i.e., the variance is smallest among linear unbiased estimators)

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Graduate Econometrics Lecture 4: Heteroskedasticity

Graduate Econometrics Lecture 4: Heteroskedasticity Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

6. Assessing studies based on multiple regression

6. Assessing studies based on multiple regression 6. Assessing studies based on multiple regression Questions of this section: What makes a study using multiple regression (un)reliable? When does multiple regression provide a useful estimate of the causal

More information

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0 Introduction to Econometrics Midterm April 26, 2011 Name Student ID MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. (5,000 credit for each correct

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 10: Panel Data Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 10 VŠE, SS 2016/17 1 / 38 Outline 1 Introduction 2 Pooled OLS 3 First differences 4 Fixed effects

More information

Econometrics. 9) Heteroscedasticity and autocorrelation

Econometrics. 9) Heteroscedasticity and autocorrelation 30C00200 Econometrics 9) Heteroscedasticity and autocorrelation Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Heteroscedasticity Possible causes Testing for

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

Short T Panels - Review

Short T Panels - Review Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of

More information

PhD/MA Econometrics Examination January 2012 PART A

PhD/MA Econometrics Examination January 2012 PART A PhD/MA Econometrics Examination January 2012 PART A ANSWER ANY TWO QUESTIONS IN THIS SECTION NOTE: (1) The indicator function has the properties: (2) Question 1 Let, [defined as if using the indicator

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

Financial Econometrics

Financial Econometrics Financial Econometrics Estimation and Inference Gerald P. Dwyer Trinity College, Dublin January 2013 Who am I? Visiting Professor and BB&T Scholar at Clemson University Federal Reserve Bank of Atlanta

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 1 Jakub Mućk Econometrics of Panel Data Meeting # 1 1 / 31 Outline 1 Course outline 2 Panel data Advantages of Panel Data Limitations of Panel Data 3 Pooled

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 7 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 68 Outline of Lecture 7 1 Empirical example: Italian labor force

More information

This paper is not to be removed from the Examination Halls

This paper is not to be removed from the Examination Halls ~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,

More information

The regression model with one stochastic regressor.

The regression model with one stochastic regressor. The regression model with one stochastic regressor. 3150/4150 Lecture 6 Ragnar Nymoen 30 January 2012 We are now on Lecture topic 4 The main goal in this lecture is to extend the results of the regression

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Introduction to Econometrics. Heteroskedasticity

Introduction to Econometrics. Heteroskedasticity Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Applied Health Economics (for B.Sc.)

Applied Health Economics (for B.Sc.) Applied Health Economics (for B.Sc.) Helmut Farbmacher Department of Economics University of Mannheim Autumn Semester 2017 Outlook 1 Linear models (OLS, Omitted variables, 2SLS) 2 Limited and qualitative

More information

Lecture 4: Heteroskedasticity

Lecture 4: Heteroskedasticity Lecture 4: Heteroskedasticity Econometric Methods Warsaw School of Economics (4) Heteroskedasticity 1 / 24 Outline 1 What is heteroskedasticity? 2 Testing for heteroskedasticity White Goldfeld-Quandt Breusch-Pagan

More information

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE 1. You have data on years of work experience, EXPER, its square, EXPER, years of education, EDUC, and the log of hourly wages, LWAGE You estimate the following regressions: (1) LWAGE =.00 + 0.05*EDUC +

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Review of probability and statistics 1 / 31

Review of probability and statistics 1 / 31 Review of probability and statistics 1 / 31 2 / 31 Why? This chapter follows Stock and Watson (all graphs are from Stock and Watson). You may as well refer to the appendix in Wooldridge or any other introduction

More information

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Asymptotics Asymptotics Multiple Linear Regression: Assumptions Assumption MLR. (Linearity in parameters) Assumption MLR. (Random Sampling from the population) We have a random

More information

Econometrics I. by Kefyalew Endale (AAU)

Econometrics I. by Kefyalew Endale (AAU) Econometrics I By Kefyalew Endale, Assistant Professor, Department of Economics, Addis Ababa University Email: ekefyalew@gmail.com October 2016 Main reference-wooldrigde (2004). Introductory Econometrics,

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information