Using Mixed Integer Programming for Matching in Observational Studies

1 Using Mixed Integer Programming for Matching in Observational Studies
José R. Zubizarreta
Department of Statistics, The Wharton School, University of Pennsylvania

2 Key takeaway points
Optimal matching method
Get the balance you want
Know it is infeasible
Eliminate guesswork
Directly balance several statistics beyond means
Keep the adjustments simple enough
People can talk about them
Sensitivity analysis to unobserved biases
José R. Zubizarreta (Statistics, Wharton) Mixed Integer Programming for Matching 05/25/12 1 / 17

3 Outline
The 2010 Chilean earthquake
Optimal matching via mixed integer programming
Applications
Summary and remarks

5 The 2010 Chilean earthquake
4th strongest earthquake in the world in the last 50 years
Sebastián Martínez/AP Photo

6 Effect of the earthquake
Effect of the earthquake on posttraumatic stress?
The post earthquake survey (EPT)
[Figure: timeline]
Re-interviewed 22,456 households from CASEN 2009
Detailed measurements of the same individuals before and after

7 Intensity of the earthquake
Peak ground acceleration (PGA) in the communes of the EPT

8 Matched design
Matched respondents with PGA < to those with PGA
We matched
  exactly for sex, age and ethnic groups
  with fine balance for self-rated health, quality of the housing
  balancing the entire empirical distributions of income
  mean balancing the 46 covariates in the study

10 Notation
Let $\mathcal{T} = \{t_1, \ldots, t_T\}$ be the set of treated units, and $\mathcal{C} = \{c_1, \ldots, c_C\}$ the set of potential controls, with $T \le C$.
Define $\mathcal{P} = \{p_1, \ldots, p_P\}$ as the set of observed covariates.
Each treated unit $t \in \mathcal{T}$ has a vector of observed covariates $x_{t,\cdot} = \{x_{t,p_1}, \ldots, x_{t,p_P}\}$, and each control $c \in \mathcal{C}$ has a similar vector $x_{c,\cdot} = \{x_{c,p_1}, \ldots, x_{c,p_P}\}$.
Based on these covariates there is a distance $0 \le \delta_{t,c} < \infty$ between treated and control units.
Decision variable:
$$
a_{t,c} =
\begin{cases}
1 & \text{if treated unit } t \text{ is assigned to control } c \\
0 & \text{otherwise}
\end{cases}
$$

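11 The assignment algorithm
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$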
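To make the formulation concrete, here is a minimal sketch for the pair-matching case m = 1. It is not a MIP solver and not the method used in the talk: it simply enumerates every feasible assignment, which is only viable for a handful of units. The function name and the distance values are hypothetical.

```python
from itertools import permutations

def optimal_pair_match(delta):
    """Return (total_distance, assignment) minimizing the sum of distances.

    delta[t][c] is the distance between treated unit t and control c.
    Each treated unit gets exactly one control (m = 1) and each control
    is used at most once. Exhaustive search: tiny inputs only.
    """
    n_treated = len(delta)
    n_controls = len(delta[0])
    best_cost, best_assignment = float("inf"), None
    # Each permutation lists the distinct controls given to treated units 0..T-1.
    for controls in permutations(range(n_controls), n_treated):
        cost = sum(delta[t][c] for t, c in enumerate(controls))
        if cost < best_cost:
            best_cost, best_assignment = cost, controls
    return best_cost, best_assignment

# Hypothetical distances: 2 treated units (rows), 3 potential controls (columns).
delta = [
    [1.0, 4.0, 2.0],
    [1.5, 3.0, 5.0],
]
cost, match = optimal_pair_match(delta)
print(cost, match)
```

In this toy example the greedy choice (treated unit 0 takes its closest control, control 0) is not optimal, because treated unit 1 needs that control more.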

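12 A MIP with direct balance via the objective function
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} + \sum_{j \in J} \omega_j \mu_j(a) \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$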

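13 A MIP with direct balance via the constraints
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C} \\
& \nu_j(a) \le \varepsilon_j, \quad j \in J
\end{aligned}
$$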

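15 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} + \sum_{j \in J} \omega_j \left| \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j} \right| \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$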

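16 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a,\,z}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} + \sum_{j \in J} \omega_j z_j \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C} \\
& z_j \ge \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j}, \quad j \in J \\
& z_j \ge -\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j} a_{t,c}}{mT} + \bar{x}_{\mathcal{T},j}, \quad j \in J
\end{aligned}
$$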
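A short check of why the two inequalities on $z_j$ can stand in for an absolute difference in means. The shorthand $u_j(a)$ is introduced here for illustration, and the argument assumes $\omega_j > 0$, which the slides do not state explicitly.

$$
u_j(a) = \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j}
$$

$$
z_j \ge u_j(a) \ \text{ and } \ z_j \ge -u_j(a) \quad \Longleftrightarrow \quad z_j \ge \max\{u_j(a), -u_j(a)\} = |u_j(a)|
$$

Because the objective is increasing in each $z_j$ when $\omega_j > 0$, any optimal solution sets $z_j = |u_j(a)|$, so the linear program in $(a, z)$ penalizes the absolute imbalance in the mean of covariate $j$ without a nonlinear term.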

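17 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c} a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T} \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C} \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C} \\
& \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j} a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j} \le \varepsilon_j, \quad j \in J \\
& -\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j} a_{t,c}}{mT} + \bar{x}_{\mathcal{T},j} \le \varepsilon_j, \quad j \in J
\end{aligned}
$$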
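The constrained version can be illustrated with the same kind of toy enumeration, again for m = 1 and a single covariate. This is a sketch under stated assumptions, not the mipmatch implementation: the function name, the data, and the tolerance values are all hypothetical, and exhaustive search replaces the MIP solver. It also shows the "know it is infeasible" idea: when no assignment meets the tolerance, the function returns None.

```python
from itertools import permutations

def match_with_mean_balance(delta, x_treated, x_controls, eps):
    """Minimum-distance pair match (m = 1) subject to |mean difference| <= eps.

    Returns (cost, assignment), or None when no assignment satisfies the
    balance constraint, i.e. the requested balance is infeasible.
    """
    n_treated = len(delta)
    treated_mean = sum(x_treated) / n_treated
    best = None
    for controls in permutations(range(len(x_controls)), n_treated):
        matched_mean = sum(x_controls[c] for c in controls) / n_treated
        if abs(matched_mean - treated_mean) > eps:
            continue  # this assignment violates the balance constraint
        cost = sum(delta[t][c] for t, c in enumerate(controls))
        if best is None or cost < best[0]:
            best = (cost, controls)
    return best

# Hypothetical data: 2 treated units, 3 potential controls, one covariate.
delta = [
    [1.0, 4.0, 2.0],
    [1.5, 3.0, 5.0],
]
x_controls = [40, 58, 52]

tight = match_with_mean_balance(delta, [50, 60], x_controls, eps=1.0)
loose = match_with_mean_balance(delta, [50, 60], x_controls, eps=10.0)
infeasible = match_with_mean_balance(delta, [90, 95], x_controls, eps=1.0)
print(tight, loose, infeasible)
```

With the tight tolerance the match changes and its total distance rises, which is the trade-off between closeness and balance that the constraint makes explicit.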

18 Balancing the means of the covariates (2)
[Figure: absolute standardized differences in means for each covariate, shown before matching, after matching with the assignment algorithm, and after matching with mipmatch. Covariates plotted: age_09, sex_09, n_per_hh_09, mar_coh_09, div_wid_09, single_09, rural_09, hlth_prb_d_09, psy_prb_09, hptlizd_09, dis_ss_lo_09, dis_md_sv_09, dis_no_09, dis_ndat_09, fonasa_09, isapre_09, others_09, no_ins_09, dnk_ins_09, yrs_edu_09, employ_09, unempl_09, inacti_09, w_i_09, pc_a_i_09, pc_t_i_09, poor_09, hs_no_oc_09, hs_md_oc_09, hs_cr_oc_09, hs_own_09, hs_rent_09, hs_ced_09, hs_irr_09.]
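The quantity on the horizontal axis, the absolute standardized difference in means, can be computed as in the sketch below. The slides do not say which denominator was used; this example assumes the common convention of pooling the two group variances with equal weight, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def abs_std_diff(treated, controls):
    """Absolute difference in means divided by a pooled standard deviation.

    Assumes the denominator sqrt((s_t^2 + s_c^2) / 2), with sample variances.
    """
    pooled_sd = sqrt((variance(treated) + variance(controls)) / 2)
    return abs(mean(treated) - mean(controls)) / pooled_sd

# Hypothetical values of one covariate.
treated = [30, 40, 50]
all_controls = [20, 30, 40]      # before matching
matched_controls = [32, 38, 50]  # after matching

before = abs_std_diff(treated, all_controls)
after = abs_std_diff(treated, matched_controls)
print(before, after)
```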

23 Balancing Kolmogorov-Smirnov statistics (1)
$$
\omega_j \mu_j(a) = \omega_j \sup_{x_{c,p} \in G(x_{\mathcal{T},p})} \left| F_{\mathcal{T}}(x_{c,p}) - F_{\mathcal{C}}(x_{c,p}, a) \right| = \omega_j z_j, \quad j \in J
$$
$$
z_j \ge \left| \frac{1}{|G(x_{\mathcal{T},p})|} - \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{1_{\{x_{g-1;p} \le x_{c,p} < x_{g;p}\}} a_{t,c}}{mT} \right|, \quad x_{g;p} \in G(x_{\mathcal{T},p})
$$

24 Balancing Kolmogorov-Smirnov statistics (2)
[Figure: ECDF(x) against x = household per capita income (thousand pesos). Curves: controls; exposed, before matching; exposed, after matching with the assignment algorithm; exposed, after matching with mipmatch.]
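The distance between two empirical distribution curves like these is what the Kolmogorov-Smirnov statistic measures. The sketch below computes it directly as the largest gap between two ECDFs, evaluated at every observed value; this is a plain two-sample calculation with hypothetical incomes, not the grid-based MIP formulation.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    def ecdf(sample, x):
        # Share of the sample at or below x.
        return sum(1 for v in sample if v <= x) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    grid = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in grid)

# Hypothetical household incomes (thousand pesos).
exposed = [100, 200, 300, 400]
poorly_balanced = [500, 600, 700, 800]
well_balanced = [150, 250, 350, 900]

ks_poor = ks_statistic(exposed, poorly_balanced)
ks_good = ks_statistic(exposed, well_balanced)
print(ks_poor, ks_good)
```

Note that the second comparison group has an extreme value of 900, which would distort a comparison of means but barely affects the statistic, since it depends on the whole distribution.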

27 Fine and near-fine balance for several covariates (1)
Fine balance:
$$
\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} a_{t,c} 1_{\{x_{c,p} = b\}} = m \sum_{t \in \mathcal{T}} 1_{\{x_{t,p} = b\}}, \quad \forall b \in B
$$
Near-fine balance:
$$
\left| \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} a_{t,c} 1_{\{x_{c,p} = b\}} - m \sum_{t \in \mathcal{T}} 1_{\{x_{t,p} = b\}} \right| \le \xi, \quad \forall b \in B
$$
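Fine balance constrains category counts rather than who is paired with whom, so it can be checked after the fact by counting. The sketch below is an illustrative check with hypothetical function names and data: it compares the matched controls' counts in each category with m times the treated counts, and treats a tolerance of 0 as fine balance.

```python
from collections import Counter

def fine_balance_gaps(treated_categories, matched_control_categories, m=1):
    """Per-category gap: matched control count minus m times treated count."""
    treated = Counter(treated_categories)
    controls = Counter(matched_control_categories)
    categories = set(treated) | set(controls)
    return {b: controls[b] - m * treated[b] for b in categories}

def is_near_fine_balanced(gaps, xi=0):
    """xi = 0 is fine balance; xi > 0 allows a deviation in each category."""
    return all(abs(gap) <= xi for gap in gaps.values())

# Hypothetical self-rated health categories.
treated = ["poor", "good", "fair", "good"]
controls_fine = ["good", "good", "poor", "fair"]
controls_near = ["good", "good", "good", "fair"]

gaps_fine = fine_balance_gaps(treated, controls_fine)
gaps_near = fine_balance_gaps(treated, controls_near)
print(is_near_fine_balanced(gaps_fine), is_near_fine_balanced(gaps_near),
      is_near_fine_balanced(gaps_near, xi=1))
```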

28 Fine and near-fine balance for several covariates (2)
Table: Fine balance for self-rated health
  Columns: Exposed, Controls
  Rows: Poor, Good, Fair
Table: Fine balance for material quality of the housing
  Columns: Exposed, Controls
  Rows: Acceptable, Unacceptable, Beyond repair

29 Density plot of PTS scores
[Figure: two panels. Estimated density of pair differences: density estimate against x = exposed-minus-control pair differences in PTS scores. Boxplot of pair differences: exposed-minus-control pair differences in PTS scores.]

31 Summary and remarks
Explicitly optimize or constrain the criteria used to assess the quality of the match
  Meet the criteria
  Know that the criteria are infeasible
Directly balance
  Means
  Variances and skewness
  Correlations
  Quantiles
  Kolmogorov-Smirnov statistic
While matching with exact, near-exact, fine and near-fine balance for more than one covariate
A systematic method for improving covariate balance

32 Extensions
Optimal subset matching
Building a stronger instrumental variable
Enhancing regression discontinuity designs
R package mipmatch

33 References
Zubizarreta, J. R. (2012), Using Mixed Integer Programming for Matching in an Observational Study of Acute Kidney Injury after Surgery, under revision.
Zubizarreta, J. R., Cerdá, M. and Rosenbaum, P. R. (2012), Effect of the 2010 Chilean Earthquake on Posttraumatic Stress: Designing an Observational Study to be Less Sensitive to Unmeasured Biases, under revision.
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2011), Matching for Several Sparse Nominal Variables in a Case-Control Study of Readmission Following Surgery, The American Statistician, 65,


More information

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate

More information

Technical Appendix C: Methods. Multilevel Regression Models

Technical Appendix C: Methods. Multilevel Regression Models Technical Appendix C: Methods Multilevel Regression Models As not all readers may be familiar with the analytical methods used in this study, a brief note helps to clarify the techniques. The firewall

More information

11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should

11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should 11 CHI-SQUARED Chapter 11 Chi-squared Objectives After studying this chapter you should be able to use the χ 2 distribution to test if a set of observations fits an appropriate model; know how to calculate

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Chapter 7: Hypothesis testing

Chapter 7: Hypothesis testing Chapter 7: Hypothesis testing Hypothesis testing is typically done based on the cumulative hazard function. Here we ll use the Nelson-Aalen estimate of the cumulative hazard. The survival function is used

More information

Lecture 24: Partial correlation, multiple regression, and correlation

Lecture 24: Partial correlation, multiple regression, and correlation Lecture 24: Partial correlation, multiple regression, and correlation Ernesto F. L. Amaral November 21, 2017 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015. Statistics: A

More information

Mixed Integer Programming (MIP) for Daily Fantasy Sports, Statistics and Marketing

Mixed Integer Programming (MIP) for Daily Fantasy Sports, Statistics and Marketing Mixed Integer Programming (MIP) for Daily Fantasy Sports, Statistics and Marketing Juan Pablo Vielma Massachusetts Institute of Technology AM/ES 121, SEAS, Harvard. Boston, MA, November, 2016. MIP & Daily

More information

Matching Methods for Observational Microarray Studies

Matching Methods for Observational Microarray Studies Bioinformatics Advance Access published December 19, 2008 Matching Methods for Observational Microarray Studies Ruth Heller 1,, Elisabetta Manduchi 2 and Dylan Small 1 1 Department of Statistics, Wharton

More information

Might using the Internet while travelling affect car ownership plans of Millennials? Dr. David McArthur and Dr. Jinhyun Hong

Might using the Internet while travelling affect car ownership plans of Millennials? Dr. David McArthur and Dr. Jinhyun Hong Might using the Internet while travelling affect car ownership plans of Millennials? Dr. David McArthur and Dr. Jinhyun Hong Introduction Travel habits among Millennials (people born between 1980 and 2000)

More information

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) In 2007, the number of wins had a mean of 81.79 with a standard

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

Formula for the t-test

Formula for the t-test Formula for the t-test: How the t-test Relates to the Distribution of the Data for the Groups Formula for the t-test: Formula for the Standard Error of the Difference Between the Means Formula for the

More information

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015 Introduction to causal identification Nidhiya Menon IGC Summer School, New Delhi, July 2015 Outline 1. Micro-empirical methods 2. Rubin causal model 3. More on Instrumental Variables (IV) Estimating causal

More information

Structured Problems and Algorithms

Structured Problems and Algorithms Integer and quadratic optimization problems Dept. of Engg. and Comp. Sci., Univ. of Cal., Davis Aug. 13, 2010 Table of contents Outline 1 2 3 Benefits of Structured Problems Optimization problems may become

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

Lab 4, modified 2/25/11; see also Rogosa R-session

Lab 4, modified 2/25/11; see also Rogosa R-session Lab 4, modified 2/25/11; see also Rogosa R-session Stat 209 Lab: Matched Sets in R Lab prepared by Karen Kapur. 1 Motivation 1. Suppose we are trying to measure the effect of a treatment variable on the

More information

Simple New Keynesian Model without Capital

Simple New Keynesian Model without Capital Simple New Keynesian Model without Capital Lawrence J. Christiano March, 28 Objective Review the foundations of the basic New Keynesian model without capital. Clarify the role of money supply/demand. Derive

More information

Instrumental Variables

Instrumental Variables Instrumental Variables James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Instrumental Variables 1 / 10 Instrumental Variables

More information

What Accounts for the Growing Fluctuations in FamilyOECD Income March in the US? / 32

What Accounts for the Growing Fluctuations in FamilyOECD Income March in the US? / 32 What Accounts for the Growing Fluctuations in Family Income in the US? Peter Gottschalk and Sisi Zhang OECD March 2 2011 What Accounts for the Growing Fluctuations in FamilyOECD Income March in the US?

More information

Topic 9: Canonical Correlation

Topic 9: Canonical Correlation Topic 9: Canonical Correlation Ying Li Stockholm University October 22, 2012 1/19 Basic Concepts Objectives In canonical correlation analysis, we examine the linear relationship between a set of X variables

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX The following document is the online appendix for the paper, Growing Apart: The Changing Firm-Size Wage

More information

Model generation and model selection in credit scoring

Model generation and model selection in credit scoring Model generation and model selection in credit scoring Vadim STRIJOV Russian Academy of Sciences Computing Center EURO 2010 Lisbon July 14 th The workflow Client s application & history Client s score:

More information

Hypothesis testing. 1 Principle of hypothesis testing 2

Hypothesis testing. 1 Principle of hypothesis testing 2 Hypothesis testing Contents 1 Principle of hypothesis testing One sample tests 3.1 Tests on Mean of a Normal distribution..................... 3. Tests on Variance of a Normal distribution....................

More information

Statistics and parameters

Statistics and parameters Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize

More information

Introduction to Propensity Score Matching: A Review and Illustration

Introduction to Propensity Score Matching: A Review and Illustration Introduction to Propensity Score Matching: A Review and Illustration Shenyang Guo, Ph.D. School of Social Work University of North Carolina at Chapel Hill January 28, 2005 For Workshop Conducted at the

More information

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Prepared for consideration for PAA 2013 Short Abstract Empirical research

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

Instrumental Variables

Instrumental Variables James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 3 4 Instrumental variables allow us to get a better estimate of a causal

More information

Concepts and Applications of Kriging

Concepts and Applications of Kriging Esri International User Conference San Diego, California Technical Workshops July 24, 2012 Concepts and Applications of Kriging Konstantin Krivoruchko Eric Krause Outline Intro to interpolation Exploratory

More information

Supplement to The cyclical dynamics of illiquid housing, debt, and foreclosures (Quantitative Economics, Vol. 7, No. 1, March 2016, )

Supplement to The cyclical dynamics of illiquid housing, debt, and foreclosures (Quantitative Economics, Vol. 7, No. 1, March 2016, ) Supplementary Material Supplement to The cyclical dynamics of illiquid housing, debt, and foreclosures Quantitative Economics, Vol. 7, No. 1, March 2016, 289 328) Aaron Hedlund Department of Economics,

More information

Comparing latent inequality with ordinal health data

Comparing latent inequality with ordinal health data Comparing latent inequality with ordinal health data David M. Kaplan University of Missouri Longhao Zhuo University of Missouri Midwest Econometrics Group October 2018 Dave Kaplan (Missouri) and Longhao

More information

A METHODOLOGY TO COMPUTE REGIONAL HOUSING INDEX PRICE. Dusan Paredes-Araya USING MATCHING ESTIMATOR METHODS

A METHODOLOGY TO COMPUTE REGIONAL HOUSING INDEX PRICE. Dusan Paredes-Araya USING MATCHING ESTIMATOR METHODS The Regional Economics Applications Laboratory (REAL) is a unit of University of Illinois focusing on the development and use of analytical models for urban and region economic development. The purpose

More information

Optimal Data-Driven Regression Discontinuity Plots. Supplemental Appendix

Optimal Data-Driven Regression Discontinuity Plots. Supplemental Appendix Optimal Data-Driven Regression Discontinuity Plots Supplemental Appendix Sebastian Calonico Matias D. Cattaneo Rocio Titiunik November 25, 2015 Abstract This supplemental appendix contains the proofs of

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Simultaneous quations and Two-Stage Least Squares So far, we have studied examples where the causal relationship is quite clear: the value of the

More information

Diploma Part 2. Quantitative Methods. Examiners Suggested Answers

Diploma Part 2. Quantitative Methods. Examiners Suggested Answers Diploma Part 2 Quantitative Methods Examiners Suggested Answers Q1 (a) A frequency distribution is a table or graph (i.e. a histogram) that shows the total number of measurements that fall in each of a

More information

Empirical approaches in public economics

Empirical approaches in public economics Empirical approaches in public economics ECON4624 Empirical Public Economics Fall 2016 Gaute Torsvik Outline for today The canonical problem Basic concepts of causal inference Randomized experiments Non-experimental

More information

Proofs and derivations

Proofs and derivations A Proofs and derivations Proposition 1. In the sheltering decision problem of section 1.1, if m( b P M + s) = u(w b P M + s), where u( ) is weakly concave and twice continuously differentiable, then f

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a

More information

2. Variance and Higher Moments

2. Variance and Higher Moments 1 of 16 7/16/2009 5:45 AM Virtual Laboratories > 4. Expected Value > 1 2 3 4 5 6 2. Variance and Higher Moments Recall that by taking the expected value of various transformations of a random variable,

More information