Mostly Dangerous Econometrics: How to do Model Selection with Inference in Mind


Econometrics: How to do Model Selection with Inference in Mind
June 25, 2015, MIT

Outline: Introduction; Analysis in Low-Dimensional Settings; Analysis in High-Dimensional Settings; Bonus Track: Generalizations.

Introduction

Richer data and methodological developments lead us to consider more elaborate econometric models than before. Focus the discussion on the linear endogenous model

y_i = d_i α + ∑_{j=1}^p x_{ij} β_j + ε_i,   E[ε_i | x_i, z_i] = 0,   (1)

where y_i is the outcome, d_i the treatment, α the effect of interest, the x_{ij} are controls, z_i collects the exogenous variables (instruments), and ε_i is noise.

Controls can be richer as more features become available (Census characteristics, housing characteristics, geography, text data): big data.

Controls can contain transformations of the raw controls in an effort to make models more flexible: nonparametric series modeling, machine learning.

Confronting Model Selection

This forces us to explicitly consider model selection, to pick out the controls that are most relevant. Model selection techniques:

CLASSICAL: t- and F-tests
MODERN: Lasso, Regression Trees, Random Forests, Boosting

If you are using any of these model selection techniques directly in (1), you are doing it wrong. You have to do additional selection to make it right.

An Example: Effect of Institutions on the Wealth of Nations

Acemoglu, Johnson, Robinson (2001) study the impact of institutions on wealth:

y_i = d_i α + ∑_{j=1}^p x_{ij} β_j + ε_i,   (2)

where y_i is log GDP per capita today, d_i is the quality of institutions, and the x_{ij} are geography controls. Instrument z_i: early settler mortality (200 years ago). Sample size n = 67. Specification of controls:

Basic: constant, latitude (p = 2)
Flexible: + cubic spline in latitude, continent dummies (p = 16)

Example: The Effect of Institutions

                    Institutions Effect   Std. Err.
Basic Controls      .96                   .21
Flexible Controls   .98                   .80

Is it ok to drop the additional controls? Potentially Dangerous. Very.

Analysis in Low-Dimensional Settings: things can go wrong even with p = 1

Consider a very simple exogenous model

y_i = d_i α + x_i β + ε_i,   E[ε_i | d_i, x_i] = 0.

Common practice is the following post-single-selection procedure:

Step 1. Include x_i only if it is a significant predictor of y_i as judged by a conservative test (t-test, Lasso, etc.). Drop it otherwise.
Step 2. Refit the model after selection; use standard confidence intervals.

This can fail miserably if β is close to zero but not equal to zero; formally, if β is of order 1/√n.

What can go wrong? The distribution of √n(α̂ − α) is not what you think

y_i = d_i α + x_i β + ε_i,   d_i = x_i γ + v_i,
α = 0, β = .2, γ = .8, n = 100, ε_i ~ N(0, 1),

with (d_i, x_i) jointly normal, mean zero, unit variances, and correlation .8.

Whether the selection is done by a t-test or by Lasso, we reject H0: α = 0 (the truth) about 50% of the time (with a nominal size of 5%).
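This failure is easy to reproduce. Below is a minimal Monte Carlo sketch (ours, not the authors' code) of the post-single-selection procedure with t-test selection in the design above; the rejection rate of the true null comes out far above the nominal 5%.

```python
# Post-single selection: keep x only if it passes a 5%-level t-test, then
# refit and test H0: alpha = 0 with standard errors. Design from the slide:
# alpha = 0, beta = .2, corr(d, x) = .8, n = 100.
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta, reps = 100, 0.0, 0.2, 2000
crit = 1.96  # ~5% two-sided normal critical value
rejections = 0

for _ in range(reps):
    # (d_i, x_i) jointly normal, unit variances, correlation .8
    dx = rng.multivariate_normal([0.0, 0.0], [[1, .8], [.8, 1]], size=n)
    d, x = dx[:, 0], dx[:, 1]
    y = d * alpha + x * beta + rng.standard_normal(n)

    # Step 1: keep x only if significant in the long regression of y on (d, x)
    X_long = np.column_stack([d, x])
    b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)
    resid = y - X_long @ b_long
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X_long.T @ X_long)
    keep_x = abs(b_long[1]) / np.sqrt(cov[1, 1]) > crit

    # Step 2: refit after selection, use standard confidence intervals
    X = X_long if keep_x else d[:, None]
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    s2 = r @ r / (n - X.shape[1])
    se_a = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    rejections += abs(b[0]) / se_a > crit

rej_rate = rejections / reps
print(f"rejection rate of true H0 (nominal 5%): {rej_rate:.2f}")
```

Because x is often (wrongly) dropped, the omitted-variable bias contaminates α̂ and the test over-rejects badly.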

Solutions? Pseudo-solutions:

Practical: bootstrap (does not work).
Classical: assume the problem away, by assuming that either β = 0 or β is bounded away from zero.
Conservative: don't do selection.

Solution: Post-Double Selection

Post-double-selection procedure (Belloni, Chernozhukov, Hansen: 2010 ES World Congress; ReStud, 2013):

Step 1. Include x_i if it is a significant predictor of y_i as judged by a conservative test (t-test, Lasso, etc.).
Step 2. Include x_i if it is a significant predictor of d_i as judged by a conservative test (t-test, Lasso, etc.). [In IV models, one must also include x_i if it is a significant predictor of z_i.]
Step 3. Refit the model after selection; use standard confidence intervals.

Theorem (Belloni, Chernozhukov, Hansen: WC ES 2010, ReStud 2013). Double selection works in the low-dimensional setting and in high-dimensional approximately sparse settings.

Double Selection Works

Same design as before: y_i = d_i α + x_i β + ε_i, d_i = x_i γ + v_i, with α = 0, β = .2, γ = .8, n = 100, ε_i ~ N(0, 1), and (d_i, x_i) jointly normal with unit variances and correlation .8. With double selection done by t-tests or by Lasso, we reject H0: α = 0 (the truth) about 5% of the time (for a nominal size of 5%).
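The fix can be verified in the same design. A minimal sketch (ours, not the authors' code) of post-double selection with t-test selection:

```python
# Post-double selection: x is kept if it predicts y OR d at the 5% level,
# then the refit uses standard confidence intervals. Design as on the slide.
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, gamma, reps, crit = 100, 0.0, 0.2, 0.8, 2000, 1.96

def tstat(y, X, j):
    """t-statistic on coefficient j in the OLS regression of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    s2 = r @ r / (n - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[j, j])
    return b[j] / se

rejections = 0
for _ in range(reps):
    x = rng.standard_normal(n)
    d = gamma * x + np.sqrt(1 - gamma**2) * rng.standard_normal(n)  # corr .8
    y = alpha * d + beta * x + rng.standard_normal(n)

    # Step 1: does x predict y (controlling for d)?
    keep1 = abs(tstat(y, np.column_stack([d, x]), 1)) > crit
    # Step 2: does x predict d?
    keep2 = abs(tstat(d, x[:, None], 0)) > crit
    # Step 3: refit with the union of selected controls
    X = np.column_stack([d, x]) if (keep1 or keep2) else d[:, None]
    rejections += abs(tstat(y, X, 0)) > crit

rej_rate = rejections / reps
print(f"rejection rate: {rej_rate:.3f}")
```

Here x strongly predicts d, so Step 2 almost always retains it and the refitted test has roughly its nominal size.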

Intuition

The double selection, that is, the selection among the controls x_i that predict either d_i or y_i, creates this robustness. It finds controls whose omission would lead to a large omitted-variable bias, and includes them in the regression. In essence, the procedure is a model selection version of the Frisch-Waugh-Lovell partialling-out procedure for estimating linear regression. The double selection method is robust to moderate selection mistakes in the two selection steps.

More Intuition via OMVB Analysis

Think about the omitted-variable bias in

y_i = α d_i + β x_i + ζ_i;   d_i = γ x_i + v_i.

If we drop x_i, the short regression of y_i on d_i gives

√n(α̃ − α) = good term + √n (D′D/n)^{-1} (X′X/n)(γβ),

where the second term is the omitted-variable bias (OMVB). The good term is asymptotically normal, and we want √n γβ → 0.

Single selection can drop x_i only if β = O(√(1/n)); but then √n · γ · √(1/n) = γ, which does not vanish.

Double selection can drop x_i only if both β = O(√(1/n)) and γ = O(√(1/n)), so that √n γβ = O(1/√n) → 0.

Example: The Effect of Institutions, Continued

Going back to Acemoglu, Johnson, Robinson (2001). Double selection: include the x_{ij}'s that are significant predictors of either y_i or d_i or z_i, as judged by Lasso; drop them otherwise.

                    Institutions Effect   Std. Err.
Basic Controls      .96                   .21
Flexible Controls   .98                   .80
Double Selection    .78                   .19

Application: Effect of Abortion on Murder Rates in the U.S.

Estimate the consequences of abortion rates on crime in the U.S. (Donohue and Levitt, 2001):

y_it = α d_it + x_it′ β + ζ_it,

where y_it is the change in the crime rate in state i between t − 1 and t, and d_it is the change in the (lagged) abortion rate. Two specifications of controls:

1. x_it = basic controls (time-varying confounding state-level factors, trends; p = 20)
2. x_it = flexible controls (basic + state initial conditions + two-way interactions of all these variables; p = 251)

Sample size n = 576.

Effect of Abortion on Murder, Continued

Estimator            Effect    Std. Err.
Basic Controls       −0.204    0.068
Flexible Controls    −0.321    1.109
Single Selection     −0.202    0.051
Double Selection     −0.166    0.216

Double selection by Lasso: 8 controls selected, including state initial conditions and trends interacted with initial conditions.

This is sort of a negative result, unlike in AJR (2001): double selection does not always overturn results. There are plenty of positive results confirming:

Barro and Lee's convergence results in cross-country growth rates;
Poterba et al.'s results on the positive impact of 401(k) plans on savings;
Acemoglu et al.'s (2014) results on democracy causing growth.

High-Dimensional Prediction Problems

Generic prediction problem:

u_i = ∑_{j=1}^p x_{ij} π_j + ζ_i,   E[ζ_i | x_i] = 0,   i = 1, …, n,

which can have p = p_n small, p_n ∝ n, or even p_n ≫ n. In the double selection procedure, u_i could be the outcome y_i, the treatment d_i, or the instrument z_i. We need to find good predictors among the x_{ij}'s.

APPROXIMATE SPARSITY: after sorting, the absolute values of the coefficients decay fast enough: |π|_{(j)} ≤ A j^{−a}, a > 1, j = 1, …, p = p_n.

RESTRICTED ISOMETRY: small groups of x_{ij}'s are not close to being collinear.

Selection of Predictors by Lasso

Assume the x_{ij}'s are normalized to have second empirical moment equal to 1.

Ideal (Akaike, Schwarz): minimize

∑_{i=1}^n (u_i − ∑_{j=1}^p x_{ij} b_j)² + λ ∑_{j=1}^p 1{b_j ≠ 0}.

Lasso (Bickel, Ritov, Tsybakov, Annals of Statistics, 2009): minimize

∑_{i=1}^n (u_i − ∑_{j=1}^p x_{ij} b_j)² + λ ∑_{j=1}^p |b_j|,   λ = 2 √(E[ζ²]) · √(2n log(pn)).

Root Lasso (Belloni, Chernozhukov, Wang, Biometrika, 2011): minimize

√(∑_{i=1}^n (u_i − ∑_{j=1}^p x_{ij} b_j)²) + λ ∑_{j=1}^p |b_j|,   λ = √(2n log(pn)).
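For intuition, the Lasso minimizer can be computed by coordinate descent with soft-thresholding; the penalty zeroes out weak predictors. The sketch below is our illustration, not a production implementation: the plug-in λ mirrors the slide's scaling with E[ζ²] set to 1, and the data-generating design (three nonzero coefficients) is our own.

```python
# Coordinate-descent Lasso for sum_i (u_i - x_i'b)^2 + lam * sum_j |b_j|.
import numpy as np

def lasso_cd(X, u, lam, iters=200):
    """Cyclic coordinate descent; each update soft-thresholds one coordinate."""
    n, p = X.shape
    b = np.zeros(p)
    norms = (X**2).sum(axis=0)
    for _ in range(iters):
        for j in range(p):
            # correlation of x_j with the partial residual (excluding coord j)
            rho = X[:, j] @ (u - X @ b + X[:, j] * b[j])
            b[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / norms[j]
    return b

rng = np.random.default_rng(2)
n, p = 100, 50
X = rng.standard_normal((n, p))
X /= np.sqrt((X**2).mean(axis=0))          # second empirical moment = 1
pi = np.array([1.0, 0.5, 0.25] + [0.0] * (p - 3))   # approximately sparse
u = X @ pi + rng.standard_normal(n)

lam = 2.0 * np.sqrt(2 * n * np.log(p * n))  # plug-in scaling, sigma taken as 1
b = lasso_cd(X, u, lam)
selected = np.flatnonzero(b)
print("selected coefficients:", selected)
```

With this λ, the strong predictor survives (with its coefficient shrunk toward zero) while the tail coefficients and the pure-noise regressors are set exactly to zero.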

Lasso Provides High-Quality Model Selection

Theorem (Belloni and Chernozhukov: Bernoulli, 2013; Annals of Statistics, 2014). Under approximate sparsity and restricted isometry conditions, Lasso and Root Lasso find parsimonious models of approximately optimal size s = n^{1/(2a)}. Using these models, OLS can approximate the regression function at the nearly optimal rate in root mean square error:

√((s/n) log(pn)).

Double Selection in Approximately Sparse Regression

Exogenous model:

y_i = d_i α + ∑_{j=1}^p x_{ij} β_j + ζ_i,   E[ζ_i | d_i, x_i] = 0,   i = 1, …, n,
d_i = ∑_{j=1}^p x_{ij} γ_j + ν_i,   E[ν_i | x_i] = 0,   i = 1, …, n,

which can have p small, p ∝ n, or even p ≫ n.

APPROXIMATE SPARSITY: after sorting, the absolute values of the coefficients decay fast enough: |β|_{(j)} ≤ A j^{−a} and |γ|_{(j)} ≤ A j^{−a}, a > 1.

RESTRICTED ISOMETRY: small groups of x_{ij}'s are not close to being collinear.

Double Selection Procedure

Post-double-selection procedure (BCH, 2010 ES World Congress; ReStud, 2013):

Step 1. Include the x_{ij}'s that are significant predictors of y_i, as judged by Lasso or another high-quality selection procedure.
Step 2. Include the x_{ij}'s that are significant predictors of d_i, as judged by Lasso or another high-quality selection procedure.
Step 3. Refit the model by least squares after selection; use standard confidence intervals.

Uniform Validity of the Double Selection

Theorem (Belloni, Chernozhukov, Hansen: WC 2010, ReStud 2013). Uniformly within a class of approximately sparse models satisfying restricted isometry conditions,

σ_n^{−1} √n (α̌ − α₀) →_d N(0, 1),

where σ_n² is the conventional variance formula for least squares. Under homoscedasticity, the estimator is semi-parametrically efficient. Model selection mistakes are asymptotically negligible thanks to double selection. An analogous result also holds for endogenous models; see Chernozhukov, Hansen, Spindler, Annual Review of Economics, 2015.

Monte Carlo Confirmation

In this simulation we used p = 200, n = 100, α₀ = .5, and an approximately sparse model with R² = .5 in each equation:

y_i = d_i α + x_i′ β + ζ_i,   ζ_i ~ N(0, 1),
d_i = x_i′ γ + v_i,   v_i ~ N(0, 1),
β_j ∝ 1/j²,   γ_j ∝ 1/j²,

with correlated Gaussian regressors: x ~ N(0, Σ), Σ_kj = (0.5)^{|j−k|}.
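A scaled-down sketch of this experiment can be run with a hand-rolled coordinate-descent Lasso. This is our illustration, not the authors' code: we use p = 50 rather than 200 to keep it fast, skip the exact R² = .5 calibration, and use a simple plug-in λ with the noise level taken as 1.

```python
# Post-double-selection Lasso, Monte Carlo sketch of the slide's design
# (scaled down: p = 50, uncalibrated coefficient scale).
import numpy as np

def lasso_cd(X, u, lam, iters=50):
    """Coordinate descent for sum (u - Xb)^2 + lam * sum |b_j|,
    keeping the residual r = u - Xb updated incrementally."""
    n, p = X.shape
    b = np.zeros(p)
    r = u.copy()
    norms = (X**2).sum(axis=0)
    for _ in range(iters):
        for j in range(p):
            rho = X[:, j] @ r + norms[j] * b[j]
            bj = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / norms[j]
            if bj != b[j]:
                r += X[:, j] * (b[j] - bj)
                b[j] = bj
    return b

rng = np.random.default_rng(3)
n, p, alpha0, reps = 100, 50, 0.5, 100
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
L = np.linalg.cholesky(Sigma)
beta = 1.0 / np.arange(1, p + 1) ** 2          # beta_j ~ 1/j^2
gamma = beta.copy()                             # gamma_j ~ 1/j^2
lam = 2 * np.sqrt(2 * n * np.log(p * n))        # plug-in, sigma taken as 1

rej, est = 0, []
for _ in range(reps):
    x = rng.standard_normal((n, p)) @ L.T       # x ~ N(0, Sigma)
    d = x @ gamma + rng.standard_normal(n)
    y = d * alpha0 + x @ beta + rng.standard_normal(n)
    # union of controls selected in the y- and d-equations
    keep = np.union1d(np.flatnonzero(lasso_cd(x, y, lam)),
                      np.flatnonzero(lasso_cd(x, d, lam)))
    W = np.column_stack([d, x[:, keep]])
    b, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ b
    s2 = resid @ resid / (n - W.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(W.T @ W)[0, 0])
    est.append(b[0])
    rej += abs(b[0] - alpha0) / se > 1.96

rej_rate = rej / reps
print(f"mean estimate {np.mean(est):.3f} (true {alpha0}); "
      f"rejection rate {rej_rate:.3f}")
```

Even in this rough, uncalibrated version, the post-double-selection estimate stays centered near α₀ and the test of H0: α = α₀ rejects at close to its nominal rate.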

Distribution of the Post-Double-Selection Estimator, p = 200, n = 100. [Histogram over the range −8 to 8; figure omitted.]

Distribution of the Post-Single-Selection Estimator, p = 200, n = 100. [Histogram over the range −8 to 8; figure omitted.]

Bonus Track. Generalization: Orthogonalized or Doubly Robust Moment Equations

Goal: inference on a structural parameter α (e.g., an elasticity), having done Lasso-and-friends fitting of the reduced forms η(·). Use orthogonalization methods to remove biases; this often amounts to solving auxiliary prediction problems. In a nutshell, we want to set up moment conditions

E[g(W, α₀, η₀)] = 0,

where W is the data, α₀ the structural parameter, and η₀ the reduced form, such that the orthogonality condition holds:

∂_η E[g(W, α₀, η)] |_{η = η₀} = 0.

See my website for papers on this.
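A quick numerical illustration of the orthogonality condition, in a simple linear model of our own choosing: perturbing the reduced form η around η₀ moves the orthogonalized (residual-on-residual) moment only to second order, while a naive moment moves to first order.

```python
# Model: y = alpha0*d + beta*x + noise, d = gamma*x + noise.
# Nuisance eta = (ell, m) with eta0 = (beta, gamma).
import numpy as np

rng = np.random.default_rng(4)
n, alpha0, beta, gamma = 200_000, 0.5, 1.0, 0.8
x = rng.standard_normal(n)
d = gamma * x + rng.standard_normal(n)
y = alpha0 * d + beta * x + rng.standard_normal(n)

def naive_moment(ell):
    # E[(y - alpha0*d - ell*x) * d]: depends on ell to FIRST order
    return np.mean((y - alpha0 * d - ell * x) * d)

def orth_moment(ell, m):
    # E[(y - alpha0*d - ell*x) * (d - m*x)]: orthogonal in (ell, m)
    return np.mean((y - alpha0 * d - ell * x) * (d - m * x))

eps = 0.1  # size of the nuisance-estimation error
naive_shift = naive_moment(beta + eps) - naive_moment(beta)
orth_shift = orth_moment(beta + eps, gamma + eps) - orth_moment(beta, gamma)
print("naive moment shift:", naive_shift)        # order eps
print("orthogonal moment shift:", orth_shift)    # order eps^2
```

This insensitivity to first-order errors in η is what lets Lasso-type (biased, regularized) reduced-form fits be plugged in without distorting inference on α.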

Inference on Structural/Treatment Parameters: estimator distributions without orthogonalization vs. with orthogonalization. [Figures omitted.]

Conclusion

It is time to address model selection.
Mostly dangerous: naive (post-single) selection does not work.
Double selection works.
More generally, the key is to use orthogonalized moment conditions for inference.