Using Mixed Integer Programming for Matching in Observational Studies
1 Using Mixed Integer Programming for Matching in Observational Studies. José R. Zubizarreta, Department of Statistics, The Wharton School, University of Pennsylvania
2 Key takeaway points. Optimal matching method: get the balance you want, or know that it is infeasible; eliminate guesswork. Directly balance several statistics beyond means. Keep the adjustments simple enough that people can talk about them. Sensitivity analysis to unobserved biases. José R. Zubizarreta (Statistics, Wharton) Mixed Integer Programming for Matching 05/25/12 1 / 17
3 Outline: The 2010 Chilean earthquake; Optimal matching via mixed integer programming; Applications; Summary and remarks
5 The 2010 Chilean earthquake: 4th strongest earthquake in the world in the last 50 years. [Photo: Sebastián Martínez/AP Photo]
6 Effect of the earthquake. Effect of the earthquake on posttraumatic stress? The post-earthquake survey (EPT) [Figure: survey timeline] re-interviewed 22,456 households from CASEN 2009. Detailed measurements of the same individuals before and after.
7 Intensity of the earthquake: peak ground acceleration (PGA) in the communes of the EPT
8 Matched design. Matched respondents with PGA < … to those with PGA …. We matched exactly for sex, age and ethnic groups, with fine balance for self-rated health and quality of the housing, balancing the entire empirical distributions of income, and mean balancing the 46 covariates in the study.
10 Notation
Let $\mathcal{T} = \{t_1, \ldots, t_T\}$ be the set of treated units and $\mathcal{C} = \{c_1, \ldots, c_C\}$ the set of potential controls, with $T \le C$.
Define $\mathcal{P} = \{p_1, \ldots, p_P\}$ as the set of observed covariates.
Each treated unit $t \in \mathcal{T}$ has a vector of observed covariates $x_{t,\cdot} = \{x_{t,p_1}, \ldots, x_{t,p_P}\}$, and each control $c \in \mathcal{C}$ has a similar vector $x_{c,\cdot} = \{x_{c,p_1}, \ldots, x_{c,p_P}\}$.
Based on these covariates there is a distance $0 \le \delta_{t,c} < \infty$ between treated and control units.
Decision variable: $a_{t,c} = 1$ if treated unit $t$ is assigned to control $c$, and $a_{t,c} = 0$ otherwise.
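11 The assignment algorithm
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c}\, a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T}, \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C}, \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$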
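As a rough illustration of the assignment problem for pair matching ($m = 1$), the sketch below enumerates every way of giving each treated unit a distinct control and keeps the cheapest. Brute-force enumeration stands in for a proper assignment or MIP solver and is only workable for tiny examples; the function name `optimal_pair_match` and the distance values are hypothetical.

```python
from itertools import permutations

def optimal_pair_match(dist):
    """Return (total_distance, controls) for the cheapest 1:1 match.

    dist[t][c] is the distance between treated unit t and control c.
    controls[t] is the index of the control assigned to treated unit t.
    """
    n_treated, n_controls = len(dist), len(dist[0])
    best_cost, best_controls = float("inf"), None
    # Each permutation gives every treated unit a different control,
    # so each control is used at most once.
    for controls in permutations(range(n_controls), n_treated):
        cost = sum(dist[t][c] for t, c in enumerate(controls))
        if cost < best_cost:
            best_cost, best_controls = cost, controls
    return best_cost, best_controls

# Hypothetical distances: 2 treated units (rows), 3 potential controls (columns).
dist = [[1.0, 4.0, 5.0],
        [2.0, 1.5, 6.0]]
cost, controls = optimal_pair_match(dist)
print(cost, controls)
```

Here treated unit 0 takes control 0 and treated unit 1 takes control 1; the objective is the sum of the two matched distances.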
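12 A MIP with direct balance via the objective function
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c}\, a_{t,c} + \sum_{j \in \mathcal{J}} \omega_j\, \mu_j(a) \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T}, \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C}, \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$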
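13 A MIP with direct balance via the constraints
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c}\, a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T}, \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C}, \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}, \\
& \nu_j(a) \le \varepsilon_j, \quad j \in \mathcal{J}
\end{aligned}
$$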
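15 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c}\, a_{t,c} + \sum_{j \in \mathcal{J}} \omega_j \left| \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j}\, a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j} \right| \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T}, \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C}, \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}
\end{aligned}
$$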
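16 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a,\,z}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c}\, a_{t,c} + \sum_{j \in \mathcal{J}} \omega_j z_j \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T}, \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C}, \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}, \\
& z_j \ge \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j}\, a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j}, \quad j \in \mathcal{J}, \\
& z_j \ge -\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j}\, a_{t,c}}{mT} + \bar{x}_{\mathcal{T},j}, \quad j \in \mathcal{J}
\end{aligned}
$$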
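The step from an absolute value in the objective to a pair of linear constraints can be spelled out as follows. The shorthand $u_j(a)$ is introduced here only for illustration, and the argument assumes $\omega_j > 0$.

$$
u_j(a) = \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j}\, a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j},
\qquad
z_j \ge u_j(a) \ \text{ and } \ z_j \ge -u_j(a) \iff z_j \ge |u_j(a)| .
$$

Because $z_j$ enters the objective with a positive weight and appears nowhere else, the minimization pushes it down to its lower bound, so at any optimum $z_j = |u_j(a)|$. The objective then equals the one with the absolute value, but every term is linear in $(a, z)$.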
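17 Balancing the means of the covariates (1)
$$
\begin{aligned}
\underset{a}{\text{minimize}} \quad & \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \delta_{t,c}\, a_{t,c} \\
\text{subject to} \quad & \sum_{c \in \mathcal{C}} a_{t,c} = m, \quad t \in \mathcal{T}, \\
& \sum_{t \in \mathcal{T}} a_{t,c} \le 1, \quad c \in \mathcal{C}, \\
& a_{t,c} \in \{0, 1\}, \quad t \in \mathcal{T},\ c \in \mathcal{C}, \\
& \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j}\, a_{t,c}}{mT} - \bar{x}_{\mathcal{T},j} \le \varepsilon_j, \quad j \in \mathcal{J}, \\
& -\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{x_{c,j}\, a_{t,c}}{mT} + \bar{x}_{\mathcal{T},j} \le \varepsilon_j, \quad j \in \mathcal{J}
\end{aligned}
$$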
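A toy version of the constrained formulation, for one covariate and pair matching ($m = 1$), might look like the sketch below. It searches all matches by brute force rather than calling a MIP solver, and returns `None` when no match meets the tolerance, which mirrors the "know it is infeasible" point. The function name, data and tolerances are hypothetical.

```python
from itertools import permutations

def match_with_mean_balance(dist, x_treated, x_control, eps):
    """Cheapest 1:1 match whose matched-control mean is within eps
    of the treated mean; returns (cost, controls) or None if infeasible."""
    n_treated = len(dist)
    treated_mean = sum(x_treated) / n_treated
    best = None
    for controls in permutations(range(len(x_control)), n_treated):
        control_mean = sum(x_control[c] for c in controls) / n_treated
        if abs(control_mean - treated_mean) > eps:
            continue  # this match violates the mean-balance constraint
        cost = sum(dist[t][c] for t, c in enumerate(controls))
        if best is None or cost < best[0]:
            best = (cost, controls)
    return best

# Hypothetical data: 2 treated units, 3 potential controls, one covariate.
dist = [[1.0, 4.0, 5.0],
        [2.0, 1.5, 6.0]]
x_treated = [10.0, 20.0]        # treated mean is 15
x_control = [10.0, 12.0, 19.0]

loose = match_with_mean_balance(dist, x_treated, x_control, eps=1.0)
tight = match_with_mean_balance(dist, x_treated, x_control, eps=0.25)
print(loose, tight)
```

With `eps=1.0` the cheapest unconstrained match (controls 0 and 1, mean 11) is ruled out and a costlier but balanced match is returned; with `eps=0.25` no subset of controls gets close enough to the treated mean, so the result is `None`.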
18 Balancing the means of the covariates (2) [Figure: absolute standardized differences in means for each covariate (age_09, sex_09, n_per_hh_09, mar_coh_09, div_wid_09, single_09, rural_09, hlth_prb_d_09, psy_prb_09, hptlizd_09, dis_ss_lo_09, dis_md_sv_09, dis_no_09, dis_ndat_09, fonasa_09, isapre_09, others_09, no_ins_09, dnk_ins_09, yrs_edu_09, employ_09, unempl_09, inacti_09, w_i_09, pc_a_i_09, pc_t_i_09, poor_09, hs_no_oc_09, hs_md_oc_09, hs_cr_oc_09, hs_own_09, hs_rent_09, hs_ced_09, hs_irr_09), shown before matching, after matching with the assignment algorithm, and after matching with mipmatch.]
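The slides do not define the plotted statistic. The sketch below assumes one common definition: the absolute difference in group means divided by the square root of the average of the two group variances. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance

def abs_std_diff(treated, control):
    """Absolute standardized difference in means for one covariate,
    using the pooled standard deviation sqrt((s_t^2 + s_c^2) / 2)."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return abs(mean(treated) - mean(control)) / pooled_sd

# Hypothetical covariate values for treated units and two control groups.
treated = [1.0, 2.0, 3.0]
all_controls = [2.0, 3.0, 4.0]       # before matching
matched_controls = [1.0, 2.0, 3.5]   # after matching

before = abs_std_diff(treated, all_controls)
after = abs_std_diff(treated, matched_controls)
print(before, after)
```

Some authors keep the pre-matching pooled standard deviation in the denominator after matching, so that a change in the statistic reflects only the change in means. That variant would need the denominator passed in separately.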
23 Balancing Kolmogorov-Smirnov statistics (1)
$$
\omega_j\, \mu_j(a) = \omega_j \sup_{x_{c,p} \in G(x_{T,p})} \left| F_T(x_{c,p}) - F_C(x_{c,p}, a) \right|, \quad j \in \mathcal{J}
$$
$$
= \omega_j z_j, \qquad z_j \ge \left| \frac{1}{|G(x_{T,p})|} - \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} \frac{1_{\{x_{g-1;p} \le x_{c,p} < x_{g;p}\}}\, a_{t,c}}{mT} \right|, \quad x_{g;p} \in G(x_{T,p})
$$
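For reference, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical distribution functions. The sketch below computes it directly and is not the grid-based MIP encoding; the function names and data are hypothetical.

```python
def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup over x of |F_a(x) - F_b(x)|.

    Both ECDFs are right-continuous step functions that only jump at
    observed values, so checking the pooled data points is enough.
    """
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))

# Hypothetical incomes for exposed units and matched controls.
exposed = [1.0, 2.0, 3.0, 4.0]
controls = [3.0, 4.0, 5.0, 6.0]
ks = ks_statistic(exposed, controls)
print(ks)
```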
24 Balancing Kolmogorov-Smirnov statistics (2) [Figure: ECDF(x) of x = household per capita income (thousand pesos) for controls; exposed, before matching; exposed, after matching with the assignment algorithm; and exposed, after matching with mipmatch.]
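27 Fine and near-fine balance for several covariates (1)
Fine balance:
$$
\sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} a_{t,c}\, 1_{\{x_{c,p} = b\}} = m \sum_{t \in \mathcal{T}} 1_{\{x_{t,p} = b\}}, \quad b \in \mathcal{B}
$$
Near-fine balance:
$$
\left| \sum_{t \in \mathcal{T}} \sum_{c \in \mathcal{C}} a_{t,c}\, 1_{\{x_{c,p} = b\}} - m \sum_{t \in \mathcal{T}} 1_{\{x_{t,p} = b\}} \right| \le \xi, \quad b \in \mathcal{B}
$$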
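Fine balance only constrains category counts in the two matched groups, not who is paired with whom. A minimal check of a finished match, with hypothetical function names and data, could look like this:

```python
from collections import Counter

def fine_balance_deviation(treated_cats, matched_control_cats, m=1):
    """Largest absolute gap, over categories b, between the number of
    matched controls in b and m times the number of treated units in b."""
    t_counts = Counter(treated_cats)
    c_counts = Counter(matched_control_cats)
    categories = set(t_counts) | set(c_counts)
    return max(abs(c_counts[b] - m * t_counts[b]) for b in categories)

def is_finely_balanced(treated_cats, matched_control_cats, m=1):
    """Fine balance holds when every category count matches exactly."""
    return fine_balance_deviation(treated_cats, matched_control_cats, m) == 0

# Hypothetical self-rated health categories.
treated = ["poor", "fair", "good", "good"]
controls_a = ["good", "poor", "good", "fair"]   # same counts, different order
controls_b = ["good", "good", "good", "fair"]   # one "poor" replaced by "good"

balanced = is_finely_balanced(treated, controls_a)
deviation = fine_balance_deviation(treated, controls_b)
print(balanced, deviation)
```

A near-fine balance constraint with tolerance $\xi$ corresponds to requiring `fine_balance_deviation(...) <= xi`.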
28 Fine and near-fine balance for several covariates (2) Table: Fine balance for self-rated health. Columns: Exposed, Controls. Rows: Poor, Good, Fair. Table: Fine balance for material quality of the housing. Columns: Exposed, Controls. Rows: Acceptable, Unacceptable, Beyond repair.
29 Density plot of PTS scores [Figure, two panels: estimated density of pair differences (x = exposed-minus-control pair differences in PTS scores, y = density estimate); boxplot of pair differences (exposed-minus-control pair differences in PTS scores).]
31 Summary and remarks. Explicitly optimize or constrain the criteria used to assess the quality of the match: meet the criteria, or know that the criteria are infeasible. Directly balance means, variances and skewness, correlations, quantiles, and the Kolmogorov-Smirnov statistic, while matching with exact, near-exact, fine and near-fine balance for more than one covariate. A systematic method for improving covariate balance.
32 Extensions: optimal subset matching; building a stronger instrumental variable; enhancing regression discontinuity designs. R package mipmatch.
Lab 4, modified 2/25/11; see also Rogosa R-session Stat 209 Lab: Matched Sets in R Lab prepared by Karen Kapur. 1 Motivation 1. Suppose we are trying to measure the effect of a treatment variable on the
More informationSimple New Keynesian Model without Capital
Simple New Keynesian Model without Capital Lawrence J. Christiano March, 28 Objective Review the foundations of the basic New Keynesian model without capital. Clarify the role of money supply/demand. Derive
More informationInstrumental Variables
Instrumental Variables James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Instrumental Variables 1 / 10 Instrumental Variables
More informationWhat Accounts for the Growing Fluctuations in FamilyOECD Income March in the US? / 32
What Accounts for the Growing Fluctuations in Family Income in the US? Peter Gottschalk and Sisi Zhang OECD March 2 2011 What Accounts for the Growing Fluctuations in FamilyOECD Income March in the US?
More informationTopic 9: Canonical Correlation
Topic 9: Canonical Correlation Ying Li Stockholm University October 22, 2012 1/19 Basic Concepts Objectives In canonical correlation analysis, we examine the linear relationship between a set of X variables
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationGROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX
GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX The following document is the online appendix for the paper, Growing Apart: The Changing Firm-Size Wage
More informationModel generation and model selection in credit scoring
Model generation and model selection in credit scoring Vadim STRIJOV Russian Academy of Sciences Computing Center EURO 2010 Lisbon July 14 th The workflow Client s application & history Client s score:
More informationHypothesis testing. 1 Principle of hypothesis testing 2
Hypothesis testing Contents 1 Principle of hypothesis testing One sample tests 3.1 Tests on Mean of a Normal distribution..................... 3. Tests on Variance of a Normal distribution....................
More informationStatistics and parameters
Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize
More informationIntroduction to Propensity Score Matching: A Review and Illustration
Introduction to Propensity Score Matching: A Review and Illustration Shenyang Guo, Ph.D. School of Social Work University of North Carolina at Chapel Hill January 28, 2005 For Workshop Conducted at the
More informationNeighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones
Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Prepared for consideration for PAA 2013 Short Abstract Empirical research
More informationInstructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses
ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout
More informationInstrumental Variables
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 3 4 Instrumental variables allow us to get a better estimate of a causal
More informationConcepts and Applications of Kriging
Esri International User Conference San Diego, California Technical Workshops July 24, 2012 Concepts and Applications of Kriging Konstantin Krivoruchko Eric Krause Outline Intro to interpolation Exploratory
More informationSupplement to The cyclical dynamics of illiquid housing, debt, and foreclosures (Quantitative Economics, Vol. 7, No. 1, March 2016, )
Supplementary Material Supplement to The cyclical dynamics of illiquid housing, debt, and foreclosures Quantitative Economics, Vol. 7, No. 1, March 2016, 289 328) Aaron Hedlund Department of Economics,
More informationComparing latent inequality with ordinal health data
Comparing latent inequality with ordinal health data David M. Kaplan University of Missouri Longhao Zhuo University of Missouri Midwest Econometrics Group October 2018 Dave Kaplan (Missouri) and Longhao
More informationA METHODOLOGY TO COMPUTE REGIONAL HOUSING INDEX PRICE. Dusan Paredes-Araya USING MATCHING ESTIMATOR METHODS
The Regional Economics Applications Laboratory (REAL) is a unit of University of Illinois focusing on the development and use of analytical models for urban and region economic development. The purpose
More informationOptimal Data-Driven Regression Discontinuity Plots. Supplemental Appendix
Optimal Data-Driven Regression Discontinuity Plots Supplemental Appendix Sebastian Calonico Matias D. Cattaneo Rocio Titiunik November 25, 2015 Abstract This supplemental appendix contains the proofs of
More informationRockefeller College University at Albany
Rockefeller College University at Albany PAD 705 Handout: Simultaneous quations and Two-Stage Least Squares So far, we have studied examples where the causal relationship is quite clear: the value of the
More informationDiploma Part 2. Quantitative Methods. Examiners Suggested Answers
Diploma Part 2 Quantitative Methods Examiners Suggested Answers Q1 (a) A frequency distribution is a table or graph (i.e. a histogram) that shows the total number of measurements that fall in each of a
More informationEmpirical approaches in public economics
Empirical approaches in public economics ECON4624 Empirical Public Economics Fall 2016 Gaute Torsvik Outline for today The canonical problem Basic concepts of causal inference Randomized experiments Non-experimental
More informationProofs and derivations
A Proofs and derivations Proposition 1. In the sheltering decision problem of section 1.1, if m( b P M + s) = u(w b P M + s), where u( ) is weakly concave and twice continuously differentiable, then f
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationAnalysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates
Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a
More information2. Variance and Higher Moments
1 of 16 7/16/2009 5:45 AM Virtual Laboratories > 4. Expected Value > 1 2 3 4 5 6 2. Variance and Higher Moments Recall that by taking the expected value of various transformations of a random variable,
More information