To Hold Out or Not. Frank Schorfheide and Ken Wolpin. April 4, 2011, University of Pennsylvania
1 To Hold Out or Not
Frank Schorfheide and Ken Wolpin
University of Pennsylvania
April 4, 2011
2 Introduction
Randomized controlled trials (RCTs) to evaluate policies, e.g., cash transfers for school attendance, have become a prominent methodology in applied economics.
Limitation: one cannot extrapolate outside of the treatment variation in the particular experiment. Given their cost, RCTs cannot be used to perform ex ante policy evaluation over a wide range of policy alternatives.
Extrapolation to new treatments requires developing models that embed behavioral and statistical assumptions. It is thus important to have methods for assessing the relative credibility of models.
3 Introduction
In practice researchers often hold out data from estimation to use for external validation, e.g., Wise (1985), Todd and Wolpin (2006), Duflo, Hanna and Ryan (2009).
Although it has intuitive appeal, the use of holdout samples is puzzling from a Bayesian perspective, which prescribes using the entire sample to form posteriors.
Our contributions:
1. Provide a formal, albeit stylized, framework in which data mining poses an impediment to the implementation of the ideal Bayesian analysis.
2. Provide a numerical illustration of the potential costs of data mining and the potential benefits of holdout samples designed to discourage data mining. We measure losses relative to the ideal Bayesian solution.
(Structural) data mining: the process by which a modeler tries to improve the fit of a structural model during estimation, e.g., by changing functional forms, allowing for unobserved heterogeneity, or adding latent state variables.
4 To Fix Ideas... A Working Example
Evaluate the impact of a monetary subsidy to low-income households based on school attendance of their children. There is no direct tuition cost of schooling.
Structural models M_i: the household solves (a = 1 means attend school)
    max_{a in {0,1}} U_i(c, a; x, ɛ, ϑ_i)   s.t.   c = y + w(1 - a).
Decision rule: a = ϕ_i(y, w; x, ɛ, ϑ_i).
An attendance subsidy s modifies the budget constraint:
    c = y + w(1 - a) + s·a = (y + s) + (w - s)(1 - a) = ỹ + w̃(1 - a),
where ỹ = y + s and w̃ = w - s.
The optimal attendance choice in the presence of the subsidy is a = ϕ_i(ỹ, w̃; x, ɛ, ϑ_i).
5 Example Continued
Social experiment: a randomly selected treatment sample has been offered a subsidy, s = s̄; a randomly selected control sample has not been offered the subsidy, s = 0.
The policy maker would like an estimate of how sensitive the outcome is to varying the subsidy level, but it is too costly to vary the subsidy in the experiment.
6 Example Continued
Change of notation: Y is the outcome; S the subsidy; X_i, i = 1, 2, are characteristics such as income and wage.
Assumptions: n observations, 50% control and 50% treatment sample. Let X = [X_1, X_2]. Then
    (1/n) X'X  ->p  Γ = [ σ1²  ρσ1σ2 ; ρσ1σ2  σ2² ].
The treatment is determined independently of the covariates: (1/n) X'S ->p 0.
7 Two Modelers
The policy maker engages two modelers in this endeavor: M_i, i = 1, 2.
Structural models embody restrictions that allow the extrapolation of policy effects even though no variation in the policy instrument has been observed ("extrapolation by theory").
Approximation/simplification of the attendance function ϕ_i(·): write the model as a linear regression,
    M_i:  Y = X_i β_i + Sθ + U,   U | (X, S) ~ N(0, I),   i = 1, 2.
(Structural) model restriction: θ = β_i. This cross-coefficient restriction rules out the need for a treatment sample for identification.
Prior: θ ~ N(0, 1/(nλ²)).
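As a concrete illustration of the linear-regression approximation, here is a minimal simulation sketch (variable names and parameter values are ours, not the paper's): under the restriction θ = β_i, the model collapses to a regression of Y on X_i + S, so the control sample alone identifies θ.

```python
import numpy as np

# Minimal sketch of M_i under the restriction theta = beta_i
# (illustrative parameter values; not taken from the paper).
rng = np.random.default_rng(0)
n, lam = 1000, 1.0

theta = rng.normal(0.0, 1.0 / np.sqrt(n * lam**2))  # prior draw: theta ~ N(0, 1/(n lam^2))
x = rng.normal(0.0, np.sqrt(2.0), size=n)           # one relevant characteristic, sigma^2 = 2
s = np.repeat([0.0, 2.0], n // 2)                   # 50% control (s = 0), 50% treatment (s = 2)
u = rng.normal(size=n)
y = x * theta + s * theta + u                       # Y = X theta + S theta + U

# Because theta = beta_i, regressing Y on Z = X + S identifies theta,
# even though Z = X on the control sample where S = 0.
z = x + s
theta_ols = (z @ y) / (z @ z)
```

This is the sense in which the cross-coefficient restriction removes the need for treatment variation: the coefficient on X in the control sample already pins down θ.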
8 Policy Maker
Goal: predicting the effect of a counterfactual subsidy level s* different from s̄.
Assumption: no counterfactual policy predictions are possible with a reduced-form model.
The policy maker can estimate a simple reduced-form model:
    M_pol:  Y = Sθ + V,   with estimator θ̂(Y, S).
M_pol provides a consistent estimate of the treatment effect, but it is unable to answer the question of interest.
For model selection/averaging the particular counterfactual policy s* is irrelevant: the policy maker either weights models based on fit or on their ability to predict the effect of the actual subsidy level s̄.
9 Ideal Case: Full Bayesian Analysis
The policy maker assigns prior probabilities π_{i,0} to M_i. From the policy maker's perspective, the overall posterior distribution of the treatment effect is given by the mixture
    p(θ | Y, X, S) = Σ_{i=1,2} π_{i,n} p(θ | Y, X, S, M_i).
Model weights are based on the marginal likelihood
    p(Y | X, S, M_i) = ∫_Θ p(Y | θ, X, S, M_i) p(θ | M_i) dθ.
Treatment-effect estimates conditional on the full sample:
    p(θ | Y, X, S, M_i) = p(Y | X, S, θ, M_i) p(θ | M_i) / p(Y | X, S, M_i).
Posterior odds:
    π_{1,n} / π_{2,n} = (π_{1,0} / π_{2,0}) · p(Y | X, S, M_1) / p(Y | X, S, M_2).
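In the Gaussian setup the marginal likelihood is available in closed form. A sketch of the computation (our own code; the function exploits the matrix determinant lemma and the Sherman-Morrison identity for the scalar-coefficient case, and the data-generating values are illustrative only):

```python
import numpy as np

def log_marginal(y, z, v):
    """Log marginal likelihood of Y = z*theta + U, U ~ N(0, I),
    with prior theta ~ N(0, v), i.e. Y ~ N(0, I + v z z')."""
    n = y.size
    zz, zy = z @ z, z @ y
    # |I + v z z'| = 1 + v z'z ;  (I + v z z')^{-1} = I - v z z' / (1 + v z'z)
    logdet = np.log(1.0 + v * zz)
    quad = y @ y - v * zy**2 / (1.0 + v * zz)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Posterior model probabilities from equal prior odds (pi_{1,0} = pi_{2,0}):
rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
s = np.repeat([0.0, 2.0], n // 2)
y = (x1 + s) * 0.2 + rng.normal(size=n)      # data generated from M_1

lm1 = log_marginal(y, x1 + s, v=1.0 / n)     # M_1: regressor X_1 + S, prior var 1/(n lam^2)
lm2 = log_marginal(y, x2 + s, v=1.0 / n)     # M_2: regressor X_2 + S
pi1 = 1.0 / (1.0 + np.exp(lm2 - lm1))        # posterior probability of M_1
```

The posterior odds are then exp(lm1 - lm2) times the prior odds, exactly as in the last display above.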
10 Remark: Full Bayesian Analysis
The assumption that θ = β_i ~ N(0, 1/(nλ²)) implies that the models remain asymptotically difficult to distinguish: the log posterior odds of M_1 and M_2 do not diverge as n -> ∞.
In reality policy makers are confronted with multiple models that are potentially consistent with the observed data.
11 Impediments to Full Bayesian Analysis
The policy maker is concerned that the modelers engage in data mining and do not report the marginal data densities p(Y | X, S, M_i) associated with their models truthfully.
The policy maker has the option of providing the modelers with only a subset of the outcome data: partition Y = [Y_r, Y_p], where r stands for regression and p stands for prediction (holdout sample). We assume the researchers have access to the full data vectors X_1, X_2, S.
The policy maker can then request:
- a predictive density for the holdout sample, p(Y_p | Y_r, X, S, M_i);
- a predictive distribution for the policy maker's M_pol estimate of the treatment effect, p(θ̂([Y_r, Y_p], S) | Y_r, X, S, M_i).
Next step: characterize the behavior of the modelers if they have access to the full sample Y (Case 1) or only to the sub-sample Y_r (Case 2).
12 Case 1: Modeler Has Access to Full Sample Y
Our stylized representation of data mining = data-based modification of the prior distribution:
1. break the link between β_i and θ by introducing an additional parameter ψ such that θ = β_i + ψ;
2. center the prior at the maximum likelihood estimate.
Step 1: Write the model as
    Y = X_i(θ - ψ) + Sθ + U = X̃_i θ - X_i ψ + U,   where X̃_i = X_i + S.
Let M_X̃i = I - X̃_i(X̃_i'X̃_i)⁻¹X̃_i'. The least-squares estimate is
    ψ̂ = -(X_i' M_X̃i X_i)⁻¹ X_i' M_X̃i Y.
The data-miner subsequently imposes the relationship θ = β_i + ψ̂.
13 Case 1: Modeler Has Access to Full Sample Y
Step 2: Modified model
    M̃_i:  Ỹ_i = X̃_i θ + U,   with Ỹ_i = Y + X_i ψ̂.
Maximum likelihood estimator:
    θ̃_i = (X̃_i'X̃_i)⁻¹ X̃_i'Ỹ_i.
Data-mining prior:
    θ | M̃_i ~ N( θ̃_i, (κ X̃_i'X̃_i)⁻¹ ).
14 Case 1: Modeler Has Access to Full Sample Y
The modeler is able to raise the marginal likelihood from
    p(Y | X, S, M_i) = (2π)^{-n/2} λ |X̃_i'X̃_i/n + λ²|^{-1/2} exp{ -½ Y'(I - X̃_i(X̃_i'X̃_i + nλ²)⁻¹X̃_i')Y }
to
    p̃(Y | X, S, M̃_i) = (2π)^{-n/2} (κ/(κ+1))^{1/2} exp{ -½ Ỹ_i'(I - X̃_i(X̃_i'X̃_i)⁻¹X̃_i')Ỹ_i }.
The penalty term is eliminated. The in-sample-fit term Ỹ_i'(I - X̃_i(X̃_i'X̃_i)⁻¹X̃_i')Ỹ_i corresponds to the unrestricted regression Y = X_i β_i + Sθ + U.
The policy maker ends up computing distorted model posteriors based on p̃(Y | X, S, M̃_i).
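The claim that the data-mined in-sample-fit term reproduces the unrestricted regression can be checked numerically. In this sketch (our code, scalar regressors, illustrative values), ψ is estimated by least squares in the reparameterized model Y = X̃θ - Xψ + U, and the residual sum of squares of Ỹ on X̃ coincides with that of the unrestricted regression of Y on [X, S]:

```python
import numpy as np

# Numerical check (our sketch, not the paper's code) that the modified
# model Y~ = X~ theta + U attains the unrestricted in-sample fit.
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=(n, 1))
s = np.vstack([np.zeros((n // 2, 1)), np.full((n // 2, 1), 2.0)])
y = 0.5 * (x + s) + rng.normal(size=(n, 1))   # generated under theta = beta = 0.5

xt = x + s                                    # X~ = X + S
# OLS in Y = X~ theta - X psi + U, i.e. regressors [X~, -X]
coef, *_ = np.linalg.lstsq(np.hstack([xt, -x]), y, rcond=None)
psi_hat = coef[1, 0]

yt = y + x * psi_hat                          # Y~ = Y + X psi-hat
theta_tilde = (xt.T @ yt).item() / (xt.T @ xt).item()
ssr_mined = float(((yt - xt * theta_tilde) ** 2).sum())

# Unrestricted regression Y = X beta + S theta + U
b, *_ = np.linalg.lstsq(np.hstack([x, s]), y, rcond=None)
ssr_unres = float(((y - np.hstack([x, s]) @ b) ** 2).sum())
```

Since span([X̃, X]) = span([X, S]), the two residual sums of squares agree exactly, which is why the mined marginal likelihood rewards unrestricted fit while the prior-centering step removes the complexity penalty.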
15 Case 2: Modeler Only Has Access to Subsample Y_r
The modeler is asked to report a predictive density for Y_p. The modeler contemplates reporting p̃(Y_p | Y_r, X, S, M̃_i) instead of p(Y_p | Y_r, X, S, M_i).
By Jensen's inequality, the expected log ratio of the predictive likelihoods is
    ∫ ln[ p̃(Y_p | Y_r, X, S, M̃_i) / p(Y_p | Y_r, X, S, M_i) ] p(Y_p | Y_r, X, S, M_i) dY_p ≤ 0.
Deduce: the use of predictive densities for a holdout sample makes it optimal for the modeler to reveal p(Y_p | Y_r, X, S, M_i).
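The inequality is the non-negativity of the Kullback-Leibler divergence: the expected log score of any misreported density falls short of that of the true predictive density. A quick Monte Carlo sketch with two hypothetical Gaussian predictive densities (our illustration, not the paper's):

```python
import numpy as np

# If Y_p ~ p = N(0, 1) but the modeler reports q = N(0.5, 1), then
# E_p[ log q(Y_p) - log p(Y_p) ] = -KL(p || q) = -0.5 * 0.5^2 = -0.125 < 0.
rng = np.random.default_rng(4)
yp = rng.normal(size=100_000)                 # draws from the true predictive density

def log_norm_pdf(y, mu):
    # log density of N(mu, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2

gap = np.mean(log_norm_pdf(yp, 0.5) - log_norm_pdf(yp, 0.0))
```

A modeler scored by the predictive density of the holdout sample therefore expects to lose by misreporting, which is the source of the truth-telling incentive on this slide.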
16 Case 2: Modeler Only Has Access to Subsample Y_r
However, we allow the modeler to consider a reference model M_i0 that takes the form (similar to above)
    M_i0:  Y = X_i β_i + Sθ + U,   β_i ~ N(0, 1/(nλ²)),   θ ~ N(0, 1/(nλ²)).
The modeler computes posterior probabilities for M_i and M_i0. Predictive distribution for the holdout sample:
    p_i(Y_p | Y_r, X, S) = π_{i0,r} p(Y_p | Y_r, X, S, M_i0) + π_{i,r} p(Y_p | Y_r, X, S, M_i).
Behavioral implication (approximately): if the modeler finds M_i rejected against M_i0 (π_{i0,r} ≈ 1), he reports p(Y_p | Y_r, X, S, M_i0): data mining on the predictive density. Otherwise, he reports p(Y_p | Y_r, X, S, M_i).
17 So far: From the Policy Maker's Perspective...
If the modelers are provided with the entire sample Y, they data-mine and report results from model M̃_i.
If the modelers are provided with a subsample Y_r, they can potentially assess their restriction θ = β_i and either report results from their actual model M_i or from the reference model M_i0, depending on the relative fit.
If Y_r contains no information from the treatment sample, then the modelers have no evidence against θ = β_i and always reveal M_i.
In the case of a holdout sample, the policy maker could use predictive distributions for either Y_p or θ̂(Y_p, ·) to weight the competing models.
18 "When you come to a fork in the road, take it." (Yogi Berra)
Two choices face the policy maker:
- model weights based on the predictive density of Y_p given Y_r, or on the predictive density of θ̂ given Y_r;
- post-model-averaging estimation based only on the Y_r sample (clearly dominated), or based on the full Y sample.
As r -> 0, weights based on the Y_p predictive density implement Bayesian model weights, since lim_{r->0} p(Y_{1-r} | Y_r, X, S, M_i) = p(Y | X, S, M_i); combined with full-sample estimation this implements the full-sample Bayesian solution (see illustration).
But: model building without data? Reporting high-dimensional predictive densities for Y_p?
Current practice in the treatment-effect literature comes closest to choosing model weights based on the θ̂-predictive density.
19 Numerical Illustration
First, we present results conditional on M_i and/or (θ, β_i). Second, we present results under the marginal distribution of the data
    p(Y, X, S) = ½ p(Y, X, S | M_1) + ½ p(Y, X, S | M_2),
where
    p(Y, X, S | M_i) = p(X) p(S) ∫ p(Y | θ, β, X, S, M_i) p(β, θ | M_i) d(β, θ).
τ is the fraction of observations from the treatment group in the regression sample Y_r. We consider:
- τ = τ_min, where τ_min = 0 for r ≤ 0.5 and then converges to 0.5 as r -> 1;
- τ = 0.5.
Rather than conducting model averaging, we consider degenerate model weights that are either 0 or 1 (model selection).
20 Parameterization
Observable characteristics X: σ1² = σ2² = σ² = 2, ρ = 0.2. Treatment: s̄ = 2. Sample size: n = 1,000 (we have a well-defined limit distribution).
Policy maker: prior probability 0.5 for M_1 and M_2. Modelers: prior probability 0.52 for M_i and 0.48 for M_i0; λ_1 = λ_2 = 1.
Implication of the experimental design: probability that the highest posterior probability model is the true model:
- Integrated: 0.68
- Conditional on θ equal to 5 prior standard deviations: 1.00
- Conditional on θ equal to 0.2 prior standard deviations: 0.51
21 Policy Experiment and Loss Function
Raise the subsidy from s̄ = 2 to s* = 4. Predict the outcome for an individual whose relevant characteristic equals σ and whose irrelevant characteristic equals ρσ.
The loss function is quadratic: L(y, ŷ) = (y - ŷ)².
The optimal predictor is the posterior mean; we consider the posterior mean conditional on the highest posterior probability model:
    ŷ_bayes = β̂ x_i + θ̂ S = θ̂_bayes (σ + s*).
We report the expected value of (ŷ - ŷ_bayes)² under the marginal density of Y (integrated risk differential).
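For concreteness, the counterfactual predictor and the loss it incurs relative to the ideal Bayes predictor can be written out directly. The posterior means below are hypothetical numbers chosen only to illustrate the computation; they are not taken from the paper:

```python
import numpy as np

# y_hat = theta_hat * (sigma + s*): prediction for an individual whose
# relevant characteristic equals sigma, under counterfactual subsidy s* = 4.
sigma, s_star = np.sqrt(2.0), 4.0     # sigma^2 = 2 from the parameterization
theta_bayes = 0.50                    # hypothetical posterior mean, ideal Bayesian analysis
theta_alt = 0.45                      # hypothetical posterior mean from a distorted posterior

y_bayes = theta_bayes * (sigma + s_star)
y_alt = theta_alt * (sigma + s_star)
risk_diff = (y_alt - y_bayes) ** 2    # quadratic loss relative to the Bayes predictor
```

Averaging `risk_diff` over the marginal density of Y gives the integrated risk differential that the slides report.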
22 Policy Experiment and Loss Function
Suppose M_1 is the highest posterior probability model. The following outcomes are possible.
Full-sample data mining (if the modelers have access to the full sample Y): the modelers introduce ψ; the estimates of β and θ deviate from θ̂_bayes.
Data mining on the predictive density:
1. Modeler 1 is honest and M_1 is selected: ŷ = ŷ_bayes.
2. Modeler 1 is not honest and the policy maker ends up selecting M_1,0: misses the restriction θ = β_i.
3. Modeler 2 is honest and M_2 is selected: uses the wrong x.
4. Modeler 2 is not honest and the policy maker ends up selecting M_2,0: uses the wrong x and misses the restriction.
[Table for r = 0.5, τ = 0.5: probability of each case and conditional E[(ŷ_j - ŷ_bayes)²] for Cases 1-4; numerical entries not recoverable.]
23 Composition of Estimation Sample Y_r, n = 1,000
[Table: numbers of control and treatment observations in Y_r under τ = τ_min and τ = 0.5 for several values of r; numerical entries not recoverable.]
24 Probability that Modeler 1 is Honest, Cond. on M_1 [figure: M1 solid; τ = τ_min blue, τ = 0.5 red]
25 Probability that Modeler 2 is Honest, Cond. on M_1 [figure: M2 dashed; τ = τ_min blue, τ = 0.5 red]
26 Probability that Modelers are Honest, Cond. on M_1 [figure: M1 solid, M2 dashed; τ = τ_min blue]
27 Probability that Modelers are Honest, Cond. on M_1 [figure: M1 solid, M2 dashed; τ = 0.5 red]
28 Probability that Modelers are Honest, Cond. on M_1 [figure: M1 solid, M2 dashed; τ = τ_min blue, τ = 0.5 red]
29 Probability that Modelers are Honest
For r ≤ 0.5, τ_min = 0: the modelers have no information that allows them to test the restriction of their model. In turn, they are honest with probability 1.
For τ = 0.5 and small θ: even for small values of r the modelers find their restrictions rejected with some probability.
For large values of θ, modeler M_1 does not find his restriction rejected, whereas modeler M_2 does with probability 0.6 for r = ….
For small values of θ, both modelers find their restrictions rejected with approximately equal probability.
30 Prob. PM Finds Best Model, Cond. on M_1 and θ [figure: θ̂-density-based selection; τ = τ_min blue, τ = 0.5 red]
31 Prob. PM Finds Best Model, Cond. on M_1 and θ
The figure confounds the probability that the modelers are honest and the probability that the predictive-density-based selection finds the highest posterior probability model.
Large value of θ: τ = τ_min dominates τ = 0.5. Inverted U-shape: around r = 0.5 the policy maker finds the highest probability model almost with certainty. Conjecture: small r suffers from an imprecise estimate of θ; large r from a short evaluation sample Y_p.
Small value of θ: the policy maker finds the highest posterior probability model with at most probability 1/2. For τ = 0.5 and r < 0.5 there is a visible effect of predictive data mining, i.e., the use of M_i0 instead of M_i.
32 Risk, Cond. on M_1 and θ [figure: θ̂-density-based selection; τ = τ_min blue, τ = 0.5 red; data mining on full sample green]
33 Risk, Cond. on M_1 and θ
The results mirror the probability of the PM finding the highest posterior probability model.
For large values of θ the policy maker can, with r = 0.5 and τ_min = 0, obtain a risk differential that is essentially zero.
The risk associated with full-sample data mining is large for both small and large values of θ.
34 Integrated Probability that Modelers are Honest [figure: M1 solid, M2 dashed; τ = τ_min blue, τ = 0.5 red]
35 Integrated Probability that Modelers are Honest
Blue vs. red lines: if r ≤ 0.5, then τ_min = 0, so the modelers have no information that allows them to test the restriction of their model; in turn, they are honest with probability 1. If τ = 0.5, then even for small values of r the modelers find their restrictions rejected with some probability. For large values of r the difference between τ = 0.5 and τ = τ_min vanishes as τ_min -> 0.5.
Solid vs. dashed lines: conditional on M_1, the probability that M_2 finds his model rejected is higher than that of M_1, and vice versa.
36 Integrated Probability that PM Finds Best Model, and Risk [figure: θ̂-density-based selection; τ = τ_min blue, τ = 0.5 red; data mining on full sample green]
37 Relationship to Existing Literature
Stone (1976): cross-validation; model validation on pseudo-holdout samples can generate a measure of fit that penalizes model complexity.
Leamer (1984): effect of specification searches on inference in a non-experimental setting.
Data snooping: Lo and MacKinlay (1990) correct tests of asset pricing theories based on data-snooped portfolios; White (2000) corrects standard errors in tests of no predictive superiority for specification searches.
Discussion: in our framework the researcher has no access to Y_p before the policy maker weights the models. Cross-validation does not rule out our kind of data mining: in the context of structural modeling it is not feasible to mimic the data mining / specification search on samples that could have been observed.
38 Extensions
Model misspecification: include a third model, such that the policy maker entertains the possibility that neither M_1 nor M_2 is correct.
Specification search versus data mining: the modelers could discover that their restrictions hold conditional on additional regressors.
Non-random holdout samples.
39 Conclusion
We develop a framework that allows us to characterize the potential costs of data mining and the potential benefits of holdout samples designed to discourage data mining.
In our numerical illustration we find that model weighting based on a predictive density for the subsidy-effect estimate, which the policy maker can generate on the full sample, is preferable to selection based on full-sample marginal likelihoods that are contaminated by data mining.
In our setup the best results are obtained if the holdout sample consists purely of observations from the control group.
40 Literature: Examples of Random Holdout Samples
Wise (1985): housing rent subsidy experiment.
Todd and Wolpin (2006): student attendance subsidy experiment.
Duflo, Hanna and Ryan (2009): teacher attendance subsidy experiment.
41 Literature: Examples of Non-random Holdout Samples
Lumsdaine, Stock, and Wise (1992): effect of introducing a pension window on retirement. Estimation sample: pre-window period; holdout sample: post-window period.
Kaboski and Townsend (2007): effect of the Thai Million Baht Program, a transfer to 80,000 villages to start village banks, on village investment. Estimation sample: pre-program period; holdout sample: post-program period.
Keane and Wolpin (2007): effect of welfare on female schooling, labor supply, marriage, fertility, and take-up. Estimation sample: individuals in five states (California, Michigan, New York, North Carolina, Ohio); holdout sample: Texas (a very low welfare state).
NBER Working Paper Series: "To Hold Out or Not to Hold Out," Frank Schorfheide and Kenneth I. Wolpin, Working Paper 19565, http://www.nber.org/papers/w19565, National Bureau of Economic Research, 1050 Massachusetts Avenue, Cambridge, MA.
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationEcon 2148, spring 2019 Statistical decision theory
Econ 2148, spring 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53 Takeaways for this part of class 1. A general framework to think about what makes a
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationVariational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data
for Parametric and Non-Parametric Regression with Missing Predictor Data August 23, 2010 Introduction Bayesian inference For parametric regression: long history (e.g. Box and Tiao, 1973; Gelman, Carlin,
More informationLecture 11/12. Roy Model, MTE, Structural Estimation
Lecture 11/12. Roy Model, MTE, Structural Estimation Economics 2123 George Washington University Instructor: Prof. Ben Williams Roy model The Roy model is a model of comparative advantage: Potential earnings
More informationExtended Bayesian Information Criteria for Model Selection with Large Model Spaces
Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Jiahua Chen, University of British Columbia Zehua Chen, National University of Singapore (Biometrika, 2008) 1 / 18 Variable
More informationCEPA Working Paper No
CEPA Working Paper No. 15-06 Identification based on Difference-in-Differences Approaches with Multiple Treatments AUTHORS Hans Fricke Stanford University ABSTRACT This paper discusses identification based
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationWarwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014
Warwick Business School Forecasting System Summary Ana Galvao, Anthony Garratt and James Mitchell November, 21 The main objective of the Warwick Business School Forecasting System is to provide competitive
More informationWeak Identification in Maximum Likelihood: A Question of Information
Weak Identification in Maximum Likelihood: A Question of Information By Isaiah Andrews and Anna Mikusheva Weak identification commonly refers to the failure of classical asymptotics to provide a good approximation
More informationExtending causal inferences from a randomized trial to a target population
Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationBayesian methods in economics and finance
1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent
More informationLecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from
Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationPrinciples Underlying Evaluation Estimators
The Principles Underlying Evaluation Estimators James J. University of Chicago Econ 350, Winter 2019 The Basic Principles Underlying the Identification of the Main Econometric Evaluation Estimators Two
More informationEconometrics I. Professor William Greene Stern School of Business Department of Economics 1-1/40. Part 1: Introduction
Econometrics I Professor William Greene Stern School of Business Department of Economics 1-1/40 http://people.stern.nyu.edu/wgreene/econometrics/econometrics.htm 1-2/40 Overview: This is an intermediate
More informationLinear Models and Estimation by Least Squares
Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:
More informationChapter 8. Quantile Regression and Quantile Treatment Effects
Chapter 8. Quantile Regression and Quantile Treatment Effects By Joan Llull Quantitative & Statistical Methods II Barcelona GSE. Winter 2018 I. Introduction A. Motivation As in most of the economics literature,
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationBayes Correlated Equilibrium and Comparing Information Structures
Bayes Correlated Equilibrium and Comparing Information Structures Dirk Bergemann and Stephen Morris Spring 2013: 521 B Introduction game theoretic predictions are very sensitive to "information structure"
More informationHypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationHeterogeneity and Lotteries in Monetary Search Models
SÉBASTIEN LOTZ ANDREI SHEVCHENKO CHRISTOPHER WALLER Heterogeneity and Lotteries in Monetary Search Models We introduce ex ante heterogeneity into the Berentsen, Molico, and Wright monetary search model
More informationUsing data to inform policy
Using data to inform policy Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) data and policy 1 / 41 Introduction The roles of econometrics Forecasting: What will be?
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationChapter 7: Model Assessment and Selection
Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has
More informationBayesian Inference for DSGE Models. Lawrence J. Christiano
Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Preliminaries. Probabilities. Maximum Likelihood. Bayesian
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationModelling Czech and Slovak labour markets: A DSGE model with labour frictions
Modelling Czech and Slovak labour markets: A DSGE model with labour frictions Daniel Němec Faculty of Economics and Administrations Masaryk University Brno, Czech Republic nemecd@econ.muni.cz ESF MU (Brno)
More informationOnline Appendix for Investment Hangover and the Great Recession
ONLINE APPENDIX INVESTMENT HANGOVER A1 Online Appendix for Investment Hangover and the Great Recession By MATTHEW ROGNLIE, ANDREI SHLEIFER, AND ALP SIMSEK APPENDIX A: CALIBRATION This appendix describes
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2015 Julien Berestycki (University of Oxford) SB2a MT 2015 1 / 16 Lecture 16 : Bayesian analysis
More informationMeasurement error as missing data: the case of epidemiologic assays. Roderick J. Little
Measurement error as missing data: the case of epidemiologic assays Roderick J. Little Outline Discuss two related calibration topics where classical methods are deficient (A) Limit of quantification methods
More informationST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks
(9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate
More informationThe Normal Linear Regression Model with Natural Conjugate Prior. March 7, 2016
The Normal Linear Regression Model with Natural Conjugate Prior March 7, 2016 The Normal Linear Regression Model with Natural Conjugate Prior The plan Estimate simple regression model using Bayesian methods
More informationThe linear model is the most fundamental of all serious statistical models encompassing:
Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x
More informationBayesian Model Diagnostics and Checking
Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in
More information