Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Size: px
Start display at page:

Download "Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design"

Transcription

1 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary Thompson) November 13, 2015

2 2 / 32 1 Empirical Likelihood for Two-sample Problems 2 Survey Samples for Observational Studies 3 Concluding Remarks

3 3 / 32 1 Empirical Likelihood for Two-sample Problems 2 Survey Samples for Observational Studies 3 Concluding Remarks

4 4 / 32 (1) Two Independent Samples Two groups: treatment vs control Response Y: Y 1 for treatment and Y 0 for control Sample data on Y: { Y11,, Y 1n1 } Covariates might be involved µ 1 = E(Y 1 ) and µ 0 = E(Y 0 ) and F 1 (t) = P(Y 1 t) and F 0 (t) = P(Y 0 t) S 1 (t) = P(Y 1 > t) = 1 F 1 (t) S 0 (t) = P(Y 0 > t) = 1 F 0 (t) { Y01,, Y 0n0 }

5 5 / 32 (1) Two Independent Samples Inference on treatment effect: θ = µ 1 µ 0 Inference on survival functions (distribution functions): Y 1 is stochastically larger than Y 0 if S 1 (t) > S 0 (t) for t > 0. Empirical likelihood methods for two-sample problems (Wu and Yan, 2012): Two independent samples with no missing values Two independent samples with responses missing at random Two independent survey samples with no missing values

6 6 / 32 (2) Pretest-Posttest Studies A very popular approach in medical and social sciences Measure changes resulting from a treatment or an intervention A version of two-sample problems Design I: Paired comparison (less popular one) Select a random sample of n units from the target population; Measure the response Y on all units BEFORE and AFTER the treatment/intervention.

7 7 / 32 (2) Pretest-Posttest Studies Design II: (commonly used method) A random sample of n units is taken from the target population Measures on certain baseline variables Z are obtained for ALL n individuals (pretest measures) Among the n units, n 1 are randomly selected and assigned to treatment ; Values of the response variable Y are obtained (posttest measures) The other n 0 = n n 1 units are assigned to control ; Values of the response variable Y are also obtained

8 8 / 32 (2) Pretest-Posttest Studies Y 1 : response under treatment Y 0 : response under control The available data (n = n 1 + n 0 ) i 1 2 n 1 n n n Z Z 1 Z 2 Z n1 Z n1 +1 Z n1 +2 Z n Y 1 Y 11 Y 12 Y 1n1 Y 0 Y 0(n1 +1) Y 0(n1 +2) Y 0n Two distinct features of the design: Response Missing-by-Design Availability of baseline information for all n units The Z variables follow the same distributions for both groups (due to randomization)

9 9 / 32 (2) Pretest-Posttest Studies A two-sample problem with unique features Parameter of primary interest: θ = µ 1 µ 0 Test H 0 : θ = 0 vs H 1 : θ 0 (or θ > 0) Test H 0 : F 1 = F 0 vs H 1 : F 1 < F 0 (OR H 0 : S 1 = S 0 vs H 1 : S 1 > S 0 ) Question: How to effectively use the baseline information and the feature of missing-by-design?

10 10 / 32 (3) Two-sample Problems for Observational Studies Baseline information (Z) collected for all n units Each unit is assigned to either treatment or control (Missing-by-Design) Assignments to treatment or control are not randomized: Example 1. Patients self-selection of treatment among two alternative choices. Example 2. Voluntary participation in a school smoking-intervention education program. Example 3. Modes of survey data collection: Web versus telephone interview.

11 11 / 32 An EL Approach to Pretest-Posttest Studies (Huang, Qin and Follmann, JASA, 2008) Parameter of interest: θ = µ 1 µ 0 Find the EL estimators of µ 1 and µ 0 separately Estimate θ by ˆθ = ˆµ 1 ˆµ 0 Estimate the standard error of ˆθ through a bootstrap method Inference on θ = µ 1 µ 0 using (ˆθ θ)/se(ˆθ) Finding ˆµ 1 (and ˆµ 0 ) is the main focus of the HQF paper

12 12 / 32 An Imputation-based Two-Sample EL Approach Why imputation? More efficient use of baseline information! i 1 2 n 1 n n n Z Z 1 Z 2 Z n1 Z n1 +1 Z n1 +2 Z n Y 1 Y 11 Y 12 Y 1n1 Y 0 Y 0(n1 +1) Y 0(n1 +2) Y 0n Regression modelling: Y 1i = Z T i β 1 + ɛ 1i, i = 1,, n, (1) Y 0i = Z T i β 0 + ɛ 0i, i = 1,, n, (2)

13 13 / 32 An Imputation-based Two-Sample EL Approach Let R i = 1 if i is under treatment, R i = 0 otherwise, ˆβ 1 = ˆβ 0 = ( n ) 1 n R i Z i Z T i R i Z i Y 1i, i=1 i=1 ( n ) 1 n (1 R i )Z i Z T i (1 R i )Z i Y 0i i=1 i=1 Regression imputation: Y 1i = Z T i ˆβ 1, i = n 1 + 1,, n Y 0i = Z T i ˆβ 0, i = 1,, n 1

14 14 / 32 Imputation-basede EL Inference on θ = µ 1 µ 0 Two augmented samples after imputation: {Ỹ 1i = R i Y 1i + (1 R i )Y 1i, i = 1,, n} {Ỹ 0i = (1 R i )Y 0i + R i Y 0i, i = 1,, n}. Each sample has an enlarged sample size at n = n 1 + n 0 The imputed samples are no longer independent Inferences are under the assumed regression models (the imputation model)

15 15 / 32 1 Empirical Likelihood for Two-sample Problems 2 Survey Samples for Observational Studies 3 Concluding Remarks

16 16 / 32 Two Independent Surveys Two independent survey samples S 1 and S 0 from the same population Two (possibly different) designs: d 1i, i S 1 ; d 0i, i S 0 d 1i = d 1i k S 1 d 1k and d0i = d 0i k S 0 d 0k Response variables Y 1 and Y 0 ; Survey sample data: } } {(y 1i, z 1i ), i S 1 and {(y 0i, z 0i ), i S 0 Parameter of interest: θ = µ y1 µ y0 Design-based estimator of θ: ˆθ 1 = ˆµ y1 ˆµ y0 = i S1 d 1i y 1i i S0 d 0i y 0i

17 17 / 32 Two-sample Pseudo Empirical Likelihood Two-sample pseudo EL function l(p, q) = 1 d1i log(p 1i ) + 1 d0i log(q 0i ) 2 2 i S 1 i S 0 Normalization constraints i S 1 p 1i = 1 and i S 0 q 0i = 1 Parameter constraint p 1i y 1i q 0i y 0i = θ i S 1 i S 0 Additional constraints on z 1 and z 0 depending on what s available

18 18 / 32 The ITC Four Country Survey (ITC4) The International Tobacco Control Policy Evaluation Project (The ITC Project) ITC Four Country Survey: Waves 1-7 by telephone interview ITC Four Country Survey: Wave 8, respondents chose to complete the survey either by telephone interview or self-administered web survey Total number of respondents at wave 8: 4507 Number of respondents by telephone: 2709 Number of respondents by web: 1798 The research question: Examine the effect of two different modes for data collection Y 1 : response by web (treatment) Y 0 : response by telephone (control)

19 19 / 32 Mode Effect in Survey Data Collection Survey sample S of size n; design weights d i, i = 1,, n. Units i = 1,, n 1 chose treatment Units i = n 1 + 1,, n chose control Let R i = 1 if unit i chose treatment, R i = 0 if unit i chose control, i = 1,, n Baseline information z i observed for all i = 1,, n Response variable missing-by-design: Y 1i observed for i = 1,, n 1 Y 0i observed for i = n 1 + 1,, n Point estimate and confidence interval for θ = µ y1 µ y0

20 20 / 32 Mode Effect Under randomization to treatment and control: E(Y 1 R = 1) E(Y 0 R = 0) = E(Y 1 ) E(Y 0 ) = θ With self-selection of treatment and control: E(Y 1 R = 1) E(Y 1 ), E(Y 0 R = 0) E(Y 0 ) Ignorable treatment assignment (Rosenbaum and Rubin, 1983): (Y 1, Y 0 ) and R are independent given Z Test ignobility using a two-phase sampling technique? (Chen and Kim, 2014)

21 21 / 32 Mode Effect: Propensity Score Adjustment (PSA) Treatment assignment (self-selection) depends only on Z P(R = 1 Y 1, Y 0, Z = z) = P(R = 1 Z = z) = r(z) Fit a feasible model to obtain the propensity scores ˆr i = ˆr(z i ), i = 1,, n Available data {(y 1i, z i ), i = 1,, n 1 } and { } (y 0i, z i ), i = n 1 + 1,, n Point estimator for θ = µ y1 µ y0 using PSA: ˆθ 2 = n 1 i=1 d i ˆr i y 1i n i=n 1 +1 d i 1 ˆr i y 0i

22 22 / 32 Pseudo EL Under Propensity Score Adjustment Design weights d i = P(i S), i = 1,, n; d i = d i / k S d k Pseudo empirical likelihood function and constraints l(p, q) = n 1 i=1 n 1 d i ˆr i log(p i ) + p i = 1, i=1 n 1 p i y 1i i=1 n j=n 1 +1 n j=n 1 +1 n j=n 1 +1 d j 1 ˆr j log(q j ) (3) q j = 1 (4) q j y 0j = θ (5) ˆp(θ) and ˆq(θ): Maximizer of (3) under constraints (4) and (5) The maximum ( pseudo EL ) estimator ˆθ 3 : Maximize l ˆp(θ), ˆq(θ) w.r.t. θ

23 23 / 32 A Simulation Study N = 20, 000; n = 200; Single stage PPS sampling x i1 Bernoulli(0.5); x i2 U[0, 1]; x i exp(1) π i x i3 ; max π i / min π i = 45 Linear models for the responses y i0 and y i1 : y ik = β 0k + β 1k x i1 + β 2k x i2 + β 3k x i3 + ε i, i = 1, 2,..., N Logistic regression model for R i : r i = P(R i = 1 x i ) ( ri ) log = γ 0 + γ 1 x i1 + γ 2 x i2 + γ 3 x i3, 1 r i i = 1, 2,..., N Three point estimators of θ = µ y1 µ y0 : ˆθ1, ˆθ2, ˆθ3 B = 1000 repeated simulation runs

24 24 / 32 A Simulation Study Table : Absolute Relative Bias (ARB, %) and Mean Square Error (MSE) θ = µ y1 µ y0 = 1 θ = µ y1 µ y0 = 2 E(R) ˆθ1 ˆθ2 ˆθ3 ˆθ1 ˆθ2 ˆθ ARB MSE ARB MSE ARB MSE

25 25 / 32 Post-stratification by Propensity Score With self-selection of treatment and control, the distributions of Z R = 1 and Z R = 0 tend to be different Matching by propensity score is an effective way to balance the distribution of Z between the treated and the untreated (Rosenbaum and Rubin, 1983, 1984) Order the units based on fitted propensity scores ˆr (1) ˆr (2) ˆr (n) Form K strata based on suitable cut-off of the propensity scores (Popular choice: K = 5) Units within the same stratum have similar values of propensity scores

26 26 / 32 Post-stratification by Propensity Score Post-stratified samples: S = Q 1 Q K Estimated stratum weights ( ) ( ) Ŵ k = / d i, k = 1,, K Population means µ y1 = i Q k d i i S K W k µ y1k and µ y0 = k=1 K W k µ y0k k=1 Treatment and control groups within Q k = S 1k S 0k : R i = 1 if i S 1k and R i = 0 if i S 0k

27 27 / 32 Post-stratification by Propensity Score For each Q k, the distributions of Z over S 1k and S 0k are approximately the same Balance diagnostics tools are available (Austin, 2008, 2009) The post-stratified estimator of θ: ˆµ y1k = ˆθ = K k=1 Ŵ k ( ˆµ y1k ˆµ y0k ) i S 1k d i y 1i i S 1k d i, ˆµ y0k = i S 0k d i y 0i i S 0k d i Post-stratification provides an effective way of using baseline information on Z

28 28 / 32 Using Z in Pseudo EL Through Additional Constraints The pseudo EL function for the post-stratified samples l = K Ŵ k i S 1k dik log(p ik ) + k=1 k=1 K Ŵ k i S 0k dik log(q ik ) Constraints p ik = 1, q ik = 1, k = 1,, K i S 1k i S 0k K K Ŵ k p ik y 1i Ŵ k q ik y 0i = θ k=1 i S 1k k=1 i S 0k K K Ŵ k p ik z i = Ŵ k q ik z i i S 1k i S 0k k=1 k=1

29 29 / 32 Using Z in Pseudo EL Through Imputation For each Q k = S 1k S 0k : (y 1i, z i ), i S 1k plus z i, i S 0k (y 0i, z i ), i S 0k plus z i, i S 1k Build a model using {(y 1i, z i ), i S 1k } Predict y 1i for i S 0k using z i, i S 0k Build a model using {(y 0i, z i ), i S 0k } Predict y 0i for i S 1k using z i, i S 1k Using the two imputed stratified samples for pseudo EL inference

30 30 / 32 1 Empirical Likelihood for Two-sample Problems 2 Survey Samples for Observational Studies 3 Concluding Remarks

31 31 / 32 Concluding Remarks Confidence intervals or hypothesis tests using the EL ratio statistic have better performances than the conventional normal theory methods Performances of imputation-based EL approach to pretest-posttest studies depend on two crucial conditions: Prediction power of the baseline variables Z Reliability of the model used for imputation Linear regression models are convenient, but kernel regression models can also be used Efficient computational procedures for EL are available for practical implementations We are currently conducting further simulation studies on mode effects in survey data collection

32 32 / 32 Acknowledgement This talk is partially based on Min Chen s PhD dissertation research at the University of Waterloo Chen, M., Wu, C. and Thompson, M.E. (2015). An Imputation Based Empirical Likelihood Approach to Pretest-Posttest Studies. The Canadian Journal of Statistics, 43, Chen, M., Wu, C. and Thompson, M.E. (2015). Mann-Whitney Test with Empirical Likelihood Methods for Pretest-Posttest Studies. Revised for Journal of Nonparametric Statistics. Wu, C. and Yan, Y. (2012). Weighted Empirical Likelihood Inference for Two-sample Problems. Statistics and Its Interface, 5,

Empirical Likelihood Methods for Pretest-Posttest Studies

Empirical Likelihood Methods for Pretest-Posttest Studies Empirical Likelihood Methods for Pretest-Posttest Studies by Min Chen A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

An Introduction to Causal Analysis on Observational Data using Propensity Scores

An Introduction to Causal Analysis on Observational Data using Propensity Scores An Introduction to Causal Analysis on Observational Data using Propensity Scores Margie Rosenberg*, PhD, FSA Brian Hartman**, PhD, ASA Shannon Lane* *University of Wisconsin Madison **University of Connecticut

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

Propensity Score Analysis with Hierarchical Data

Propensity Score Analysis with Hierarchical Data Propensity Score Analysis with Hierarchical Data Fan Li Alan Zaslavsky Mary Beth Landrum Department of Health Care Policy Harvard Medical School May 19, 2008 Introduction Population-based observational

More information

Causal Inference for Mediation Effects

Causal Inference for Mediation Effects Causal Inference for Mediation Effects by Jing Zhang B.S., University of Science and Technology of China, 2006 M.S., Brown University, 2008 A Dissertation Submitted in Partial Fulfillment of the Requirements

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Yi-Hau Chen Institute of Statistical Science, Academia Sinica Joint with Nilanjan

More information

DATA-ADAPTIVE VARIABLE SELECTION FOR

DATA-ADAPTIVE VARIABLE SELECTION FOR DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

PROPENSITY SCORE MATCHING. Walter Leite

PROPENSITY SCORE MATCHING. Walter Leite PROPENSITY SCORE MATCHING Walter Leite 1 EXAMPLE Question: Does having a job that provides or subsidizes child care increate the length that working mothers breastfeed their children? Treatment: Working

More information

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys The Canadian Journal of Statistics Vol.??, No.?,????, Pages???-??? La revue canadienne de statistique Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys Zhiqiang TAN 1 and Changbao

More information

What s New in Econometrics. Lecture 1

What s New in Econometrics. Lecture 1 What s New in Econometrics Lecture 1 Estimation of Average Treatment Effects Under Unconfoundedness Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Potential Outcomes 3. Estimands and

More information

Empirical Likelihood Methods

Empirical Likelihood Methods Handbook of Statistics, Volume 29 Sample Surveys: Theory, Methods and Inference Empirical Likelihood Methods J.N.K. Rao and Changbao Wu (February 14, 2008, Final Version) 1 Likelihood-based Approaches

More information

Empirical likelihood methods in missing response problems and causal interference

Empirical likelihood methods in missing response problems and causal interference The University of Toledo The University of Toledo Digital Repository Theses and Dissertations 2017 Empirical likelihood methods in missing response problems and causal interference Kaili Ren University

More information

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk Causal Inference in Observational Studies with Non-Binary reatments Statistics Section, Imperial College London Joint work with Shandong Zhao and Kosuke Imai Cass Business School, October 2013 Outline

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

Semiparametric Regression Based on Multiple Sources

Semiparametric Regression Based on Multiple Sources Semiparametric Regression Based on Multiple Sources Benjamin Kedem Department of Mathematics & Inst. for Systems Research University of Maryland, College Park Give me a place to stand and rest my lever

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Advising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand

Advising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand Advising on Research Methods: A consultant's companion Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand Contents Preface 13 I Preliminaries 19 1 Giving advice on research methods

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

DOUBLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR IGNORABLE MISSING DATA

DOUBLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR IGNORABLE MISSING DATA Statistica Sinica 22 (2012), 149-172 doi:http://dx.doi.org/10.5705/ss.2010.069 DOUBLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR IGNORABLE MISSING DATA Qi Long, Chiu-Hsieh Hsu and Yisheng Li Emory University,

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Difference-in-Differences Methods

Difference-in-Differences Methods Difference-in-Differences Methods Teppei Yamamoto Keio University Introduction to Causal Inference Spring 2016 1 Introduction: A Motivating Example 2 Identification 3 Estimation and Inference 4 Diagnostics

More information

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group

A comparison of weighted estimators for the population mean. Ye Yang Weighting in surveys group A comparison of weighted estimators for the population mean Ye Yang Weighting in surveys group Motivation Survey sample in which auxiliary variables are known for the population and an outcome variable

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study

Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study MPRA Munich Personal RePEc Archive Mann-Whitney Test with Adjustments to Pre-treatment Variables for Missing Values and Observational Study Songxi Chen and Jing Qin and Chengyong Tang January 013 Online

More information

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics

More information

Strategy of Bayesian Propensity. Score Estimation Approach. in Observational Study

Strategy of Bayesian Propensity. Score Estimation Approach. in Observational Study Theoretical Mathematics & Applications, vol.2, no.3, 2012, 75-86 ISSN: 1792-9687 (print), 1792-9709 (online) Scienpress Ltd, 2012 Strategy of Bayesian Propensity Score Estimation Approach in Observational

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Propensity score modelling in observational studies using dimension reduction methods

Propensity score modelling in observational studies using dimension reduction methods University of Colorado, Denver From the SelectedWorks of Debashis Ghosh 2011 Propensity score modelling in observational studies using dimension reduction methods Debashis Ghosh, Penn State University

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Guanglei Hong. University of Chicago. Presentation at the 2012 Atlantic Causal Inference Conference

Guanglei Hong. University of Chicago. Presentation at the 2012 Atlantic Causal Inference Conference Guanglei Hong University of Chicago Presentation at the 2012 Atlantic Causal Inference Conference 2 Philosophical Question: To be or not to be if treated ; or if untreated Statistics appreciates uncertainties

More information

Propensity Score Methods for Estimating Causal Effects from Complex Survey Data

Propensity Score Methods for Estimating Causal Effects from Complex Survey Data Propensity Score Methods for Estimating Causal Effects from Complex Survey Data Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School

More information

Balancing Covariates via Propensity Score Weighting

Balancing Covariates via Propensity Score Weighting Balancing Covariates via Propensity Score Weighting Kari Lock Morgan Department of Statistics Penn State University klm47@psu.edu Stochastic Modeling and Computational Statistics Seminar October 17, 2014

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

Local Polynomial Wavelet Regression with Missing at Random

Local Polynomial Wavelet Regression with Missing at Random Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800

More information

Topics and Papers for Spring 14 RIT

Topics and Papers for Spring 14 RIT Eric Slud Feb. 3, 204 Topics and Papers for Spring 4 RIT The general topic of the RIT is inference for parameters of interest, such as population means or nonlinearregression coefficients, in the presence

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Statistica Sinica 24 (2014), 1097-1116 doi:http://dx.doi.org/10.5705/ss.2012.074 AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Sheng Wang 1, Jun Shao

More information

Lecture 12: Application of Maximum Likelihood Estimation:Truncation, Censoring, and Corner Solutions

Lecture 12: Application of Maximum Likelihood Estimation:Truncation, Censoring, and Corner Solutions Econ 513, USC, Department of Economics Lecture 12: Application of Maximum Likelihood Estimation:Truncation, Censoring, and Corner Solutions I Introduction Here we look at a set of complications with the

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary reatments Shandong Zhao David A. van Dyk Kosuke Imai July 3, 2018 Abstract Propensity score methods are

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Test of Association between Two Ordinal Variables while Adjusting for Covariates

Test of Association between Two Ordinal Variables while Adjusting for Covariates Test of Association between Two Ordinal Variables while Adjusting for Covariates Chun Li, Bryan Shepherd Department of Biostatistics Vanderbilt University May 13, 2009 Examples Amblyopia http://www.medindia.net/

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS 776 1 15 Outcome regressions and propensity scores Outcome Regression and Propensity Scores ( 15) Outline 15.1 Outcome regression 15.2 Propensity

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()

More information

Semiparametric Regression Based on Multiple Sources

Semiparametric Regression Based on Multiple Sources Semiparametric Regression Based on Multiple Sources Benjamin Kedem Department of Mathematics & Inst. for Systems Research University of Maryland, College Park Give me a place to stand and rest my lever

More information

Causal Sensitivity Analysis for Decision Trees

Causal Sensitivity Analysis for Decision Trees Causal Sensitivity Analysis for Decision Trees by Chengbo Li A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer

More information

High Dimensional Propensity Score Estimation via Covariate Balancing

High Dimensional Propensity Score Estimation via Covariate Balancing High Dimensional Propensity Score Estimation via Covariate Balancing Kosuke Imai Princeton University Talk at Columbia University May 13, 2017 Joint work with Yang Ning and Sida Peng Kosuke Imai (Princeton)

More information

Implementing Matching Estimators for. Average Treatment Effects in STATA

Implementing Matching Estimators for. Average Treatment Effects in STATA Implementing Matching Estimators for Average Treatment Effects in STATA Guido W. Imbens - Harvard University West Coast Stata Users Group meeting, Los Angeles October 26th, 2007 General Motivation Estimation

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial

On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial Biostatistics Advance Access published January 30, 2012 Biostatistics (2012), xx, xx, pp. 1 18 doi:10.1093/biostatistics/kxr050 On the covariate-adjusted estimation for an overall treatment difference

More information

Probabilistic Index Models

Probabilistic Index Models Probabilistic Index Models Jan De Neve Department of Data Analysis Ghent University M3 Storrs, Conneticut, USA May 23, 2017 Jan.DeNeve@UGent.be 1 / 37 Introduction 2 / 37 Introduction to Probabilistic

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Sampling Techniques. Esra Akdeniz. February 9th, 2016

Sampling Techniques. Esra Akdeniz. February 9th, 2016 Sampling Techniques Esra Akdeniz February 9th, 2016 HOW TO DO RESEARCH? Question. Literature research. Hypothesis. Collect data. Analyze data. Interpret and present results. HOW TO DO RESEARCH? Collect

More information

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small Bounds on Causal Effects in Three-Arm Trials with Non-compliance Jing Cheng Dylan Small Department of Biostatistics and Department of Statistics University of Pennsylvania June 20, 2005 A Three-Arm Randomized

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects

Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Ratio of Mediator Probability Weighting for Estimating Natural Direct and Indirect Effects Guanglei Hong University of Chicago, 5736 S. Woodlawn Ave., Chicago, IL 60637 Abstract Decomposing a total causal

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

The Effective Use of Complete Auxiliary Information From Survey Data

The Effective Use of Complete Auxiliary Information From Survey Data The Effective Use of Complete Auxiliary Information From Survey Data by Changbao Wu B.S., Anhui Laodong University, China, 1982 M.S. Diploma, East China Normal University, 1986 a thesis submitted in partial

More information