Econ 5150: Applied Econometrics. Dynamic Demand Model and Model Selection. Sung Y. Park, CUHK


Simple dynamic models

A typical simple model:
$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + x_t'\beta_0 + x_{t-1}'\beta_1 + u_t,$$
where $y_t$ is per-capita U.S. gasoline consumption and $x_t$ is a vector of exogenous variables, for example $x_t = (1, p_t, z_t)'$. The lag operator:
$$y_{t-1} = L y_t, \qquad y_{t-2} = L y_{t-1} = L^2 y_t.$$
Then
$$(1 - \alpha_1 L - \alpha_2 L^2)\, y_t = \alpha_0 + (\beta_0 + \beta_1 L)' x_t + u_t.$$

Simple dynamic models

More compactly,
$$A(L)\, y_t = \alpha_0 + B(L)' x_t + u_t.$$
It is tempting to solve the above by writing
$$y_t = A(L)^{-1}\alpha_0 + A(L)^{-1} B(L)' x_t + A(L)^{-1} u_t.$$
This model is called a linear transfer function model. How can we interpret it? We may want to explain the notion of equilibrium forms of the above model.

Simple dynamic models

Stability in linear difference equations: consider the simplest possible case,
$$X_t = a X_{t-1}.$$
By repeated substitution,
$$X_t = a X_{t-1} = a^2 X_{t-2} = \cdots = a^t X_0,$$
where $X_0$ denotes an initial condition.
- $|a| < 1$: $X_t \to 0$
- $|a| > 1$: $|X_t| \to \infty$
- $|a| = 1$: either $X_t \equiv X_0$ or $X_t = \pm X_0$

Simple dynamic models

Consider the second-order difference equation:
$$X_t = a_1 X_{t-1} + a_2 X_{t-2}.$$
The solutions take the form
$$X_t = A_1 \theta_1^t + A_2 \theta_2^t,$$
where $A_1$ and $A_2$ are constants determined by initial conditions and the $\theta$'s depend on the $a$'s. Substituting,
$$A_1 \theta_1^t + A_2 \theta_2^t = a_1\big(A_1 \theta_1^{t-1} + A_2 \theta_2^{t-1}\big) + a_2\big(A_1 \theta_1^{t-2} + A_2 \theta_2^{t-2}\big),$$
or
$$0 = A_1 \theta_1^t \big(1 - a_1 \theta_1^{-1} - a_2 \theta_1^{-2}\big) + A_2 \theta_2^t \big(1 - a_1 \theta_2^{-1} - a_2 \theta_2^{-2}\big).$$

Simple dynamic models

Suppose we find the roots of the quadratic equation
$$1 - a_1 z - a_2 z^2 = 0$$
and call these roots $\theta_1^{-1}$ and $\theta_2^{-1}$. Done... Stability? Suppose all the roots are real: then both $\theta_1$ and $\theta_2$ must be less than one in absolute value. If $\theta$ is complex, $\theta = \lambda_1 + \lambda_2 i$, we can represent $\theta$ in polar coordinates, $\theta = r\big(\cos(\phi) + i \sin(\phi)\big)$, where $r = (\lambda_1^2 + \lambda_2^2)^{1/2}$, $\cos(\phi) = \lambda_1 / r$, $\sin(\phi) = \lambda_2 / r$.

Simple dynamic models

Thus, it is necessary that the roots of the equation
$$1 - a_1 z - a_2 z^2 = 0$$
lie outside the unit circle.
- Roots outside the unit circle are good (stability).
- Roots inside the unit circle: explosive behavior.
- Roots on the unit circle: unit root.
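As a quick illustration, here is a minimal sketch (assuming numpy is available; the parameter values are made up) that checks stability numerically by locating the roots of $1 - a_1 z - a_2 z^2$:

import numpy as np

def is_stable(a1, a2):
    # numpy orders coefficients from the highest degree down: -a2 z^2 - a1 z + 1
    roots = np.roots([-a2, -a1, 1.0])
    # stability requires all roots of 1 - a1 z - a2 z^2 to lie outside the unit circle
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stable(0.5, 0.3))  # True: both roots outside the unit circle
print(is_stable(0.9, 0.3))  # False: a1 + a2 > 1, so one root falls inside the unit circle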

Impulse response functions

Interpreting the expression $D(L) = A(L)^{-1} B(L)$: consider
$$B(L) = A(L) D(L),$$
or
$$\beta_0 + \beta_1 L + \cdots + \beta_s L^s = (1 - \alpha_1 L - \cdots - \alpha_r L^r)(\delta_0 + \delta_1 L + \cdots).$$

Impulse response functions

For $j \le s$, matching coefficients gives
$$\beta_0 = \delta_0, \qquad \beta_1 = -\delta_0 \alpha_1 + \delta_1, \qquad \beta_2 = -\delta_0 \alpha_2 - \delta_1 \alpha_1 + \delta_2, \qquad \ldots, \qquad \beta_j = -\delta_0 \alpha_j - \cdots - \delta_{j-1}\alpha_1 + \delta_j.$$
This means that the system can be solved recursively for the $\delta$'s, given the $\alpha$'s and $\beta$'s. More generally,
$$\delta_j = \begin{cases} \sum_{i=1}^{j \wedge r} \alpha_i \delta_{j-i} + \beta_j & j \le s \\ \sum_{i=1}^{j \wedge r} \alpha_i \delta_{j-i} & j > s \end{cases}$$
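The recursion is straightforward to implement. A small sketch (assuming numpy; the alpha and beta values below are arbitrary) that computes $\delta_0, \ldots, \delta_H$ and the cumulative responses:

import numpy as np

def impulse_coeffs(alpha, beta, horizon):
    # alpha = [a_1, ..., a_r], beta = [b_0, ..., b_s]
    r = len(alpha)
    delta = []
    for j in range(horizon + 1):
        d = beta[j] if j < len(beta) else 0.0            # beta_j = 0 for j > s
        d += sum(alpha[i - 1] * delta[j - i] for i in range(1, min(j, r) + 1))
        delta.append(d)
    return np.array(delta)

delta = impulse_coeffs(alpha=[0.5], beta=[1.0, 0.2], horizon=20)
print(delta[:3])           # [1.0, 0.7, 0.35]: e.g. delta_1 = a_1 delta_0 + beta_1
print(delta.cumsum()[-1])  # ~2.4, the long-run response B(1)/A(1) = 1.2/0.5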

Impulse response functions

The function of cumulative sums of the $\delta$'s,
$$\Delta(j) = \sum_{i=1}^{j} \delta_i,$$
is the impulse response function: it provides a complete picture of the time path of the response of $y$ to a once-and-for-all unit shock in $x$. Case: a single exogenous variable $x$ stays at $x_0$ for a long time, so $y$ is randomly fluctuating around an equilibrium value $y_0$. Now $x$ changes to $x_1$ and stays there. What happens to $y$?

Impulse response functions

$$E\,\Delta y_t = A(L)^{-1} B(L)\, \Delta x_t = D(L)\, \Delta x_t, \qquad D(1)\,\Delta x = \Big(\sum_i \delta_i\Big) \Delta x.$$
- A new equilibrium: the accumulation of the short-run impulse responses.
- A new equilibrium can also be calculated simply by letting $y_t = y^e$ and $x_t = x^e$,
- provided the roots of $A(z) = 0$ lie outside the unit circle... Inferences?

Error correction form

Consider the following simple dynamic model:
$$y_t = \alpha_1 y_{t-1} + \alpha_0 + \beta_0 x_t + \beta_1 x_{t-1} + u_t.$$
In equilibrium with $x_t \equiv x^e$,
$$y^e = \frac{\alpha_0}{1 - \alpha_1} + \frac{\beta_0 + \beta_1}{1 - \alpha_1}\, x^e + \frac{1}{1 - \alpha_1}\, u_t.$$
Subtract $y_{t-1}$ from both sides of the model and add and subtract $\beta_0 x_{t-1}$:
$$\Delta y_t = (\alpha_1 - 1)\, y_{t-1} + \alpha_0 + \beta_0\, \Delta x_t + (\beta_0 + \beta_1)\, x_{t-1} + u_t,$$
or
$$\Delta y_t = \beta_0\, \Delta x_t + (\alpha_1 - 1)\Big[y_{t-1} - \frac{\alpha_0}{1 - \alpha_1} - \frac{\beta_0 + \beta_1}{1 - \alpha_1}\, x_{t-1}\Big] + u_t.$$
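A simulation sketch (assuming numpy; the parameter values are illustrative, not from the lecture) of the once-and-for-all shock: $x$ jumps from 0 to 1, and the mean path of $y$ moves from the old equilibrium to the new $y^e$ implied by the formula above:

import numpy as np

a0, a1, b0, b1 = 1.0, 0.6, 0.5, 0.3
x = np.concatenate([np.zeros(50), np.ones(150)])   # x changes permanently at t = 50
y = np.zeros_like(x)
for t in range(1, len(x)):
    # mean path of the dynamic model (noise suppressed)
    y[t] = a0 + a1 * y[t - 1] + b0 * x[t] + b1 * x[t - 1]

y_new = a0 / (1 - a1) + (b0 + b1) / (1 - a1) * 1.0
print(y[49], y[-1], y_new)   # ~2.5 before the shock; ~4.5 = new equilibrium after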

Model selection

Consider a collection of parametric models $\{f_j(x, \theta)\}$, where $\theta \in \Theta_j$ for $j = 1, \ldots, J$. Some linear structure is usually imposed on the parameter space: $\Theta_j = M_j$, where $M_j$ is a linear subspace of $\mathbb{R}^{p_J}$ of dimension $p_j$, and $p_1 < p_2 < \cdots < p_J$. Also assume that the models are nested: $\Theta_1 \subset \Theta_2 \subset \cdots \subset \Theta_J$.

Model selection

Akaike information criterion [Akaike (1969)]:
$$AIC(j) = l_j(\hat\theta) - p_j,$$
where $l_j(\hat\theta)$ denotes the log-likelihood of the $j$-th model. Akaike's selection rule is simply: choose the model $j$ that maximizes $AIC(j)$.

Schwarz's information criterion [Schwarz (1978)]:
$$SIC(j) = l_j(\hat\theta) - \tfrac{1}{2}\, p_j \log n,$$
with selection rule $\hat{j} = \arg\max_j SIC(j)$; under SIC, $P(\hat{j} = j^*) \to 1$ (consistency). Since $\tfrac{1}{2}\log n > 1$ for $n \ge 8$, the SIC penalty is larger than the AIC penalty.

Model selection

Connection with classical hypothesis testing: under quite general conditions, for nested models with $p_j > p_i$ (when model $i$ is true),
$$2\big(l_j(\hat\theta_j) - l_i(\hat\theta_i)\big) \sim \chi^2_{p_j - p_i}.$$
SIC would choose $j$ over $i$ iff
$$\frac{2(l_j - l_i)}{p_j - p_i} > \log n,$$
so $\log n$ can be interpreted as an implicit critical value for the model selection decision based on SIC. Make sense? For AIC the implicit critical value is 2: a positive probability of a Type I error.

Model selection

SIC in the linear regression model: consider the Gaussian linear regression model, with log-likelihood
$$l(\beta, \sigma) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{S}{2\sigma^2},$$
where $S = (y - X\beta)'(y - X\beta)$. Evaluating at $\hat\beta$ and $\hat\sigma^2 = S/n$,
$$l(\hat\beta, \hat\sigma) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\hat\sigma^2 - \frac{n}{2}.$$
Thus maximizing $SIC = l_j - \tfrac{1}{2}\, p_j \log(n)$ is the same as minimizing
$$\log\hat\sigma_j^2 + \frac{p_j}{n}\log n.$$
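A minimal sketch (assuming numpy; the design and the true model below are invented for illustration) of SIC selection over nested regressions, using the $\log\hat\sigma^2$ form just derived:

import numpy as np

def sic_select(y, X, sizes):
    # sizes: nested model dimensions p_j, using the first p_j columns of X
    n = len(y)
    crit = []
    for p in sizes:
        b = np.linalg.lstsq(X[:, :p], y, rcond=None)[0]
        sigma2 = np.mean((y - X[:, :p] @ b) ** 2)
        crit.append(np.log(sigma2) + p * np.log(n) / n)  # minimize log s2_j + (p_j/n) log n
    return sizes[int(np.argmin(crit))]

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X[:, :3] @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)  # true dimension is 3
print(sic_select(y, X, sizes=[1, 2, 3, 4, 5]))                  # typically selects 3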

Model selection

Connection with the F-test statistic: note that
$$l_i - l_j = \frac{n}{2}\big(\log\hat\sigma_j^2 - \log\hat\sigma_i^2\big) = \frac{n}{2}\log\big(\hat\sigma_j^2 / \hat\sigma_i^2\big) = \frac{n}{2}\log\Big(1 + \frac{\hat\sigma_j^2 - \hat\sigma_i^2}{\hat\sigma_i^2}\Big).$$
Using the usual Taylor-series approximation for $\log(1 \pm a)$ with $a$ small,
$$2(l_i - l_j) \approx \frac{n\big(\hat\sigma_j^2 - \hat\sigma_i^2\big)}{\hat\sigma_i^2}.$$
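A quick numerical check of the approximation (plain numpy; the residual variances are made-up values with the ratio near one, and model $i$ is the larger model so $\hat\sigma_i^2 \le \hat\sigma_j^2$):

import numpy as np

n, s2_j, s2_i = 200, 1.10, 1.05            # hypothetical residual variances
lr = n * np.log(s2_j / s2_i)               # exact: 2(l_i - l_j)
approx = n * (s2_j - s2_i) / s2_i          # first-order Taylor approximation
print(round(lr, 2), round(approx, 2))      # 9.3 vs 9.52: close for small variance gaps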

Model selection, Shrinkage and the LASSO

The information criterion approach balances two objectives: simplicity (penalty) and goodness-of-fit (fidelity).
- Too simple a model risks serious bias.
- Too complicated a model risks a high degree of uncertainty.
Start with the Bayesian method for the linear regression model: shrinkage methods, or Stein-rule methods.

Model selection, Shrinkage and the LASSO

Consider the linear model
$$y = X\beta + u,$$
where $u \sim N(0, \sigma^2 I)$, so that
$$L(y \mid b) = (2\pi)^{-n/2}\, \sigma^{-n} \exp\Big\{-\frac{1}{2\sigma^2}(\hat\beta - b)' X'X (\hat\beta - b)\Big\}.$$
Suppose that we have a prior $\beta \sim N(\beta_0, \Omega)$, i.e.,
$$\pi(b) = (2\pi)^{-p/2}\, |\Omega|^{-1/2} \exp\Big\{-\frac{1}{2}(b - \beta_0)' \Omega^{-1} (b - \beta_0)\Big\}.$$
Using Bayes' rule,
$$p(b \mid y) = \frac{L(y \mid b)\,\pi(b)}{\int L(y \mid b)\,\pi(b)\,db}.$$

Model selection, Shrinkage and the LASSO

Then
$$p(b \mid y) = \kappa \exp\Big\{-\frac{1}{2}(b - \bar\beta)'\big(\sigma^{-2} X'X + \Omega^{-1}\big)(b - \bar\beta)\Big\},$$
where $\kappa$ is a constant and
$$\bar\beta = \big(\sigma^{-2} X'X + \Omega^{-1}\big)^{-1}\big(\sigma^{-2} X'X \hat\beta + \Omega^{-1}\beta_0\big).$$
The posterior distribution is also Gaussian, with mean $\bar\beta$. $\hat\beta$ and $\beta_0$ have covariance matrices $\sigma^2(X'X)^{-1}$ and $\Omega$, respectively, and they are weighted by the inverses of these covariance matrices.
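A numerical sketch of the precision-weighted posterior mean (assuming numpy; the prior and design below are invented):

import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 100, 3, 1.0
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=n)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)            # OLS estimate
beta0, Omega_inv = np.zeros(p), np.eye(p) / 10.0    # prior: N(0, 10 I)
precision = XtX / sigma2 + Omega_inv                # posterior precision matrix
beta_bar = np.linalg.solve(precision, XtX @ beta_hat / sigma2 + Omega_inv @ beta0)
print(beta_hat)
print(beta_bar)   # shrunk from beta_hat toward the prior mean beta0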

Model selection, Shrinkage and the LASSO

Tibshirani (1996) considered the $\ell_1$ norm in the penalty term,
$$\text{Pen}(\theta) = \sum_{i=1}^p |\theta_i|,$$
and proposed the regression estimator
$$\min_\theta \sum_i (y_i - x_i'\theta)^2 + \lambda\, \text{Pen}(\theta)$$
for some appropriately chosen $\lambda$: the lasso (least absolute shrinkage and selection operator). Ridge regression:
$$\min_\theta \sum_i (y_i - x_i'\theta)^2 + \lambda \sum_{i=1}^p \theta_i^2.$$
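For concreteness, a sketch using scikit-learn (an assumption: the lecture prescribes no software; note that sklearn's Lasso scales the squared-error term by $1/(2n)$, so its alpha is a rescaled $\lambda$):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
theta = np.zeros(p)
theta[:3] = [2.0, -1.5, 1.0]                 # sparse truth: only 3 nonzero coefficients
y = X @ theta + rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)           # l1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)          # l2 penalty
print((lasso.coef_ == 0).sum())              # typically several exact zeros: selection
print((ridge.coef_ == 0).sum())              # typically zero: ridge shrinks but never deletes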

Model selection, Shrinkage and the LASSO

One can also use the $\ell_1$ fidelity criterion:
$$\min_\theta \sum_i |y_i - x_i'\theta| + \lambda\, \text{Pen}(\theta).$$
This has been done by Wang, Li and Jiang (2007, JBES).

Model selection, Shrinkage and the LASSO Figure: LASSO and Ridge shrinkage

Bias and Variance

Consider the following stylized situation in regression:
$$y = X\beta + Z\gamma + u \quad \text{(long model)}$$
$$y = X\beta + v \quad \text{(short model)}$$
What is the price we pay when we misspecify the model...

Bias and Variance

Assume that the long model is true and we estimate the short model (omitted variables):
$$E\hat\beta_s = E(X'X)^{-1}X'y = E(X'X)^{-1}X'(X\beta + Z\gamma + u) = \beta + (X'X)^{-1}X'Z\gamma.$$
The bias associated with estimation of $\beta$ is
$$G\gamma = (X'X)^{-1}X'Z\gamma,$$
where $G$ is obtained by regressing the columns of $Z$ on the columns of $X$. The bias vanishes if $\gamma = 0$ or if $X$ is orthogonal to $Z$.
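A small numpy sketch (the design is invented) confirming that the short-model estimate centers on $\beta + G\gamma$ rather than $\beta$:

import numpy as np

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 2))
Z = 0.8 * X[:, [0]] + rng.normal(size=(n, 1))   # omitted regressor correlated with X
beta, gamma = np.array([1.0, -1.0]), np.array([0.5])

G = np.linalg.solve(X.T @ X, X.T @ Z)           # columns of Z regressed on columns of X
print(beta + G @ gamma)                         # predicted center of the short estimate

y = X @ beta + Z @ gamma + rng.normal(size=n)   # the long model generates the data
beta_s = np.linalg.solve(X.T @ X, X.T @ y)      # short-model OLS
print(beta_s)                                   # close to beta + G gamma, biased for beta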

Bias and Variance

Example: one estimates a static model when a dynamic one is the true model. Suppose the correct specification is
$$y_t = \alpha + \sum_{i=0}^p \beta_i x_{t-i} + u_t,$$
where $x_t$ is an exogenous variable. Instead we estimate the static model
$$y_t = \alpha + \beta_0 x_t + v_t.$$
What is the relationship between our estimate of $\beta_0$ in the static model and the coefficients of the dynamic model...

Bias and Variance

$$E\hat\beta_0 = \beta_0 + \sum_{i=1}^p g_i \beta_i,$$
where $g_i$ denotes the slope coefficient obtained in a regression of $x_{t-i}$ on $x_t$ and an intercept. If $x_t$ is strongly trended, these $g_i$ will tend to be close to one and $E\hat\beta_0$ will be close to $\sum_{i=0}^p \beta_i$: the long-run effect.

Bias and Variance

Assume that the short model is true and we estimate the long model. Bias?
$$E\hat\beta_L = E(X'M_Z X)^{-1} X'M_Z y = E(X'M_Z X)^{-1} X'M_Z (X\beta + u) = \beta,$$
where $M_Z = I - Z(Z'Z)^{-1}Z'$ projects onto the orthogonal complement of the column space of $Z$. Happy? There is a price to be paid for estimating the parameters $\gamma$...

Bias and Variance

Proposition. $\hat\beta_s = \hat\beta_L + G\hat\gamma_L$.

Proposition. Assuming $V(y) = E(y - Ey)(y - Ey)' = \sigma^2 I$,
$$V(\hat\beta_L) = V(\hat\beta_s) + G\, V(\hat\gamma_L)\, G'.$$
... the variability of the long estimate always exceeds the variability of the short estimate... but...
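A Monte Carlo sketch (numpy; invented design) of this variance ranking when the short model is true ($\gamma = 0$), so both estimators are unbiased but the long one is noisier:

import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 2000
X = rng.normal(size=(n, 1))
Z = 0.9 * X + 0.5 * rng.normal(size=(n, 1))     # Z strongly collinear with X
W = np.hstack([X, Z])

bs, bL = [], []
for _ in range(reps):
    y = 1.0 * X[:, 0] + rng.normal(size=n)      # short model is true: gamma = 0
    bs.append(np.linalg.lstsq(X, y, rcond=None)[0][0])   # short-model estimate
    bL.append(np.linalg.lstsq(W, y, rcond=None)[0][0])   # long-model estimate
print(np.var(bs), np.var(bL))                   # Var(beta_L) > Var(beta_s)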

Fishing

"Fishing" concerns the difficulties associated with preliminary testing and model selection... based on Freedman (1983, American Statistician); see also Leeb and Pötscher (2005, ET). He considers a model of the form
$$y_i = x_i'\beta_0 + u_i,$$
where $u_i \sim$ iid $N(0, \sigma^2)$. The matrix $X = (x_i)$ is $n \times p$ and $X'X = I_p$. Moreover, $p \to \infty$ as $n \to \infty$ so that $p/n \to \rho$ for some $0 < \rho < 1$. He also assumes $\beta_0 = 0$.

Fishing

Theorem. For the above model, $R_n^2 \to \rho$ and $F_n \to 1$.

Proof: Since $\beta_0 = 0$, the usual $F_n$ statistic for the model really is $F$-distributed. So
$$EF_n = \frac{n - p}{n - p - 2},$$
which tends to 1. And
$$F_n = \frac{n - p - 1}{p} \cdot \frac{R_n^2}{1 - R_n^2}, \qquad\text{so}\qquad R_n^2 = F_n \Big/ \Big(\frac{n - p - 1}{p} + F_n\Big).$$
Thus, since $F_n \to 1$ and $(n - p - 1)/p \to (1 - \rho)/\rho$, we have $R_n^2 \to \rho$.

Fishing

Now consider the following case: all $p$ variables are initially tried; those attaining $\alpha$-level significance in a standard t-test are retained, say $q_{n,\alpha}$ of them; and the model is re-estimated with only these variables.

Theorem. For the above model,
$$R_{n,\alpha}^2 \to g(\lambda_\alpha)\,\rho \qquad\text{and}\qquad F_{n,\alpha} \to \frac{g(\lambda_\alpha)/\alpha}{\big(1 - g(\lambda_\alpha)\rho\big)\big/\big(1 - \alpha\rho\big)},$$
where
$$g(\lambda) = \int_{|z| > \lambda} z^2 \phi(z)\,dz$$
and $\lambda_\alpha$ is chosen so that $\Phi(\lambda_\alpha) = 1 - \alpha/2$.

Fishing

Example: suppose $n = 100$ and $p = 50$, so $\rho = 1/2$. Set $\alpha = 0.25$, so $\lambda_\alpha = 1.15$ and $g(\lambda_\alpha) = 0.72$. Then
$$E(Z^2 \mid |Z| > \lambda_\alpha) = g(\lambda_\alpha)/\alpha \approx 2.9,$$
$$R_{n,\alpha}^2 \approx g(\lambda_\alpha)\,\rho = 0.72 \times 0.5 = 0.36,$$
$$F_{n,\alpha} \approx \frac{g(\lambda_\alpha)/\alpha}{\big(1 - g(\lambda_\alpha)\rho\big)/\big(1 - \alpha\rho\big)} \approx 4.0,$$
$$Eq_{n,\alpha} = \alpha\rho n = 0.25 \times 0.50 \times 100 = 12.5.$$
For comparison, $F_{12,88,0.05} = 1.88$ and $P(F_{12,88} > 4.0) \approx 0.0001$: the fished model looks highly significant even though every true coefficient is zero.
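A simulation sketch of Freedman's experiment (assuming numpy and scipy; one draw with an arbitrary seed): the truth is pure noise, yet the refitted equation passes a conventional F-test:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p, alpha = 100, 50, 0.25
X = np.linalg.qr(rng.normal(size=(n, p)))[0]    # orthonormal columns: X'X = I_p
y = rng.normal(size=n)                          # beta_0 = 0: y is pure noise

# first pass: alpha-level t-test screen on all p coefficients
b = X.T @ y                                     # OLS, since X'X = I
s = np.sqrt((y - X @ b) @ (y - X @ b) / (n - p))
keep = np.abs(b / s) > stats.norm.ppf(1 - alpha / 2)

# second pass: refit using only the q survivors
Xk = X[:, keep]
q = Xk.shape[1]
bk = np.linalg.lstsq(Xk, y, rcond=None)[0]
rss = ((y - Xk @ bk) ** 2).sum()
F = ((y @ y - rss) / q) / (rss / (n - q))       # no-intercept F statistic
print(q, F, 1 - stats.f.cdf(F, q, n - q))       # roughly q ~ 12, F ~ 4, p-value ~ 0.0001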