A Distributional Approach Using Propensity Scores


A Distributional Approach Using Propensity Scores. Zhiqiang Tan, Department of Biostatistics, Johns Hopkins School of Public Health, http://www.biostat.jhsph.edu/~ztan. June 20, 2005.

Outline: Introduction. Counterfactual framework. Illustration. Application: non-confounding case (known propensity score; parametric propensity score) and confounding case.

Introduction. Right heart catheterization (RHC) has been performed daily in hospitals since the 1970s, yet the benefit of RHC had NOT been demonstrated in a successful randomized clinical trial. Connors et al.'s (1996) observational study raised the concern that RHC might not benefit critically ill patients and might in fact cause harm. Data were collected on 5735 critically ill patients admitted to the ICUs of five medical centers: Treatment: No-RHC or RHC. Outcome: 30-day survival. Covariates: 75 covariates. HOW to evaluate the effect of RHC on survival?

Counterfactual framework. X: covariates measured. T: treatment variable taking value 0 or 1 if a patient actually receives No-RHC or RHC. (Y_0, Y_1): potential outcomes that would be observed if a patient received No-RHC or RHC. Y = (1 − T) Y_0 + T Y_1: observed outcome. We are interested in the average causal effect E(Y_1 − Y_0) = E(Y_1) − E(Y_0), or P({Y_1}) versus P({Y_0}). Assignment mechanism: No-confounding: T ⊥ (Y_0, Y_1) | X. Confounding: T ⊥ (Y_0, Y_1) | X fails. Propensity score: π(X) = P(T = 1 | X).
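For reference, the standard identification step that justifies the propensity-score weighting used on the following slides, under non-confounding and assuming 0 < π(X) < 1 (written out here; the slide leaves it implicit):

```latex
% Identification of E(Y_1) under T \perp (Y_0, Y_1) \mid X and 0 < \pi(X) < 1:
\begin{aligned}
E\!\left[\frac{T\,Y}{\pi(X)}\right]
  &= E\!\left[\frac{T\,Y_1}{\pi(X)}\right]
     && \text{(on } T = 1,\ Y = Y_1\text{)}\\
  &= E\!\left[\frac{E(T \mid X)\,E(Y_1 \mid X)}{\pi(X)}\right]
     && \text{(non-confounding)}\\
  &= E\big[E(Y_1 \mid X)\big] = E(Y_1),
\end{aligned}
% and similarly E[(1 - T)\,Y/(1 - \pi(X))] = E(Y_0).
```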

[Figure: thirty-day survival curves (RHC vs. No RHC, raw data), together with raw histograms of aps, meanbp, and pafi by treatment group.]

Illustration. Each cell shows (number surviving, number dying) and the cell total:

            RHC = 1           RHC = 0
BP = 1      (52, 28)   80     (11, 9)    20
BP = 0      (30, 10)   40     (37, 23)   60
Total       (82, 38)  120     (48, 32)   80

Patients get RHC at random: P(survival | RHC = 1) ≈ 82/120 = 68.3%, P(survival | RHC = 0) ≈ 48/80 = 60.0%.
Patients get RHC at random given blood pressure (80 of the 100 patients with BP = 1 and 40 of the 100 with BP = 0 receive RHC): weight each patient such that 80 w_1(1) = 1/2, 40 w_1(0) = 1/2, 20 w_0(1) = 1/2, 60 w_0(0) = 1/2. Compare the weighted probabilities: 52 w_1(1) + 30 w_1(0) = 70.0% and 11 w_0(1) + 37 w_0(0) = 58.3%.
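As a quick arithmetic check of the example above, a minimal Python sketch (the counts come from the table; the variable names and the dictionary layout are mine):

```python
# Cell counts from the illustration: (survived, died) by BP stratum and treatment.
counts = {
    ("BP1", "RHC"):   (52, 28),   # 80 patients
    ("BP1", "noRHC"): (11, 9),    # 20 patients
    ("BP0", "RHC"):   (30, 10),   # 40 patients
    ("BP0", "noRHC"): (37, 23),   # 60 patients
}

def cell_n(key):
    s, d = counts[key]
    return s + d

# Raw (unadjusted) survival proportions.
raw_rhc = (52 + 30) / (80 + 40)          # 82/120 = 68.3%
raw_no  = (11 + 37) / (20 + 60)          # 48/80  = 60.0%

# Weights chosen so each BP stratum contributes mass 1/2 within each arm:
# n_cell * w = 1/2  =>  w = 1 / (2 * n_cell).
w = {key: 1.0 / (2 * cell_n(key)) for key in counts}

weighted_rhc = 52 * w[("BP1", "RHC")] + 30 * w[("BP0", "RHC")]       # 70.0%
weighted_no  = 11 * w[("BP1", "noRHC")] + 37 * w[("BP0", "noRHC")]   # 58.3%

print(round(raw_rhc, 3), round(raw_no, 3), round(weighted_rhc, 3), round(weighted_no, 3))
```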

WHAT IF patients are NOT equally likely to get RHC at each level of blood pressure? Previous estimates (weighting over BP): P(obs survival | RHC = 1) = 70.0%, P(obs survival | RHC = 0) = 58.3%. Now allow each patient's weight to be perturbed by a factor λ within a band. With patients indexed i = 1, ..., 80 (BP = 1, RHC), i = 81, ..., 120 (BP = 0, RHC), i = 121, ..., 140 (BP = 1, No-RHC), and i = 141, ..., 200 (BP = 0, No-RHC), weight each patient such that

∑_{i=1}^{80} λ_{1i} w_1(1) = 1/2,   ∑_{i=81}^{120} λ_{1i} w_1(0) = 1/2,
∑_{i=121}^{140} λ_{0i} w_0(1) = 1/2,   ∑_{i=141}^{200} λ_{0i} w_0(0) = 1/2,

where Λ^{-1} ≤ λ_{1i}, λ_{0i} ≤ Λ (Λ = 1.5). Bound the weighted probabilities

∑_{i=1}^{120} λ_{1i} w_1(X_i) Y_{1i}   and   ∑_{i=121}^{200} λ_{0i} w_0(X_i) Y_{0i}

subject to the foregoing constraints. The bounds reach 72.2% for P(survival | RHC = 1) and 55.0% for P(survival | RHC = 0).


[Figure: thirty-day survival curves for RHC and No RHC, raw and weighted, together with raw and weighted histograms of aps, meanbp, and pafi by treatment group.]

[Figure: thirty-day survival curves, observed and counterfactual, for RHC and No RHC, together with weighted histograms of aps, meanbp, and pafi comparing observed and counterfactual distributions within each treatment group.]

Non-confounding case. Data: (X_i, Y_{T_i,i}, T_i), i = 1, 2, ..., n. Likelihood:

L = L_1 · L_2 = ∏_{i=1}^{n} [ (1 − π(X_i))^{1 − T_i} π(X_i)^{T_i} ] · ∏_{i=1}^{n} [ G_0({X_i, Y_{0i}})^{1 − T_i} G_1({X_i, Y_{1i}})^{T_i} ],

where G_0 is the joint distribution of (X, Y_0) and G_1 is the joint distribution of (X, Y_1). G_0 and G_1 induce the same marginal distribution on the covariate space X; equivalently, ∫ h(x) dG_0(x, y_0) = ∫ h(x) dG_1(x, y_1) for each bounded function h on X. Take finitely many constraints and find the MLE (Ĝ_0, Ĝ_1):

μ̂_1 = ∫ y_1 dĜ_1(x, y_1),   μ̂_0 = ∫ y_0 dĜ_0(x, y_0).

Known propensity score [Model S0: known π*]. Maximize the likelihood subject to the constraints

∫ π*(x) dG_0 = ∫ π*(x) dG_1,   ∫ h_j(x) dG_0 = ∫ h_j(x) dG_1,  j = 1, ..., m.

Let h* = (π*, 1 − π*, h_1, ..., h_m)'. Maximize (with the treated observations indexed i = 1, ..., n_1)

(1/n) ∑_{i=1}^{n_1} log(λ'h*(X_i)) + (1/n) ∑_{i=n_1+1}^{n} log(1 − λ'h*(X_i)).

Then

Ĝ_1{(X_i, Y_{1i})} = 1 / (n λ'h*(X_i)),  i = 1, ..., n_1,
Ĝ_0{(X_i, Y_{0i})} = 1 / (n (1 − λ'h*(X_i))),  i = n_1 + 1, ..., n.

First-order approximation:

μ̃_1 = (1/n) ∑_{i=1}^{n} Y_{1i} T_i / π*(X_i) − β̃_1' [ (1/n) ∑_{i=1}^{n} (h*(X_i) / (1 − π*(X_i))) (T_i / π*(X_i) − 1) ],
μ̃_0 = (1/n) ∑_{i=1}^{n} Y_{0i} (1 − T_i) / (1 − π*(X_i)) − β̃_0' [ (1/n) ∑_{i=1}^{n} (h*(X_i) / π*(X_i)) ((1 − T_i) / (1 − π*(X_i)) − 1) ],

where β̃_1 = B̃^{-1} C̃_1 and β̃_0 = B̃^{-1} C̃_0.
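A minimal numerical sketch of this constrained-MLE computation, based on the reconstruction above: simulated data, h* = (π*, 1 − π*), and a generic optimizer for λ. The data-generating process, the optimizer, and the self-normalization at the end are illustrative choices, not part of the slides:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000

# Simulated data (hypothetical): one covariate, known propensity score pi_star.
X = rng.normal(size=n)
pi_star = 1.0 / (1.0 + np.exp(-0.5 * X))        # known P(T = 1 | X)
T = rng.binomial(1, pi_star)
Y = 1.0 + X + rng.normal(size=n)                # observed outcome; equals Y_1 when T = 1

# h* = (pi*, 1 - pi*); columns are the constraint functions.
H = np.column_stack([pi_star, 1.0 - pi_star])

def neg_loglik(lam):
    s = H @ lam                                  # lambda' h*(X_i)
    if np.any(s <= 0) or np.any(s >= 1):         # stay inside the domain of the logs
        return np.inf
    return -(np.sum(np.log(s[T == 1])) + np.sum(np.log(1.0 - s[T == 0])))

# lambda = (1, 0) reproduces the standard masses 1/(n pi*), a natural starting point.
res = minimize(neg_loglik, x0=np.array([1.0, 0.0]), method="Nelder-Mead")
lam_hat = res.x
s_hat = H @ lam_hat

# Point masses of G1_hat on the treated observations, and the resulting estimate of E(Y_1).
g1_mass = 1.0 / (n * s_hat[T == 1])
mu1_hat = np.sum(Y[T == 1] * g1_mass) / g1_mass.sum()   # self-normalized (ratio) form
print(lam_hat, mu1_hat)
```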

The method of control variates:

(1/n) ∑_{i=1}^{n} Y_{1i} T_i / π*(X_i) − b_1' [ (1/n) ∑_{i=1}^{n} (h*(X_i) / (1 − π*(X_i))) (T_i / π*(X_i) − 1) ].

The optimal choice of b_1 is β_1 = B^{-1} C_1. A more general class of estimators:

(1/n) ∑_{i=1}^{n} Y_{1i} T_i / π*(X_i) − (1/n) ∑_{i=1}^{n} (T_i / π*(X_i) − 1) φ_1(X_i).

The optimal choice of φ_1(x) is E(Y_1 | X = x), which achieves semiparametric efficiency under S0. Choose h* such that E(Y_1 | X = x) is contained in the linear span of h*(x) / (1 − π*(x)), and E(Y_0 | X = x) is contained in the linear span of h*(x) / π*(x). Outcome regression [Model R]:

E(Y_1 | X) = Ψ(α_1' g_1(X)),   E(Y_0 | X) = Ψ(α_0' g_0(X)).

Choose h* = ( π*, 1 − π*, π* Ψ(α̂_0' g_0), (1 − π*) Ψ(α̂_1' g_1) ).
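A minimal sketch of the control-variates estimator for E(Y_1), with the coefficient b_1 estimated by a least-squares regression of the weighted-outcome terms on the control variates. The simulated data and this plug-in choice of b̂_1 are mine; the slide's optimal β_1 = B^{-1} C_1 is the population analogue of the regression coefficient used here:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated data with a known propensity score pi_star (illustrative only).
X = rng.normal(size=n)
pi_star = 1.0 / (1.0 + np.exp(-X))
T = rng.binomial(1, pi_star)
Y = 2.0 + 1.5 * X + rng.normal(size=n)      # observed outcome; equals Y_1 when T = 1

# IPW terms and control variates h*(X)/(1 - pi*) * (T/pi* - 1), which have mean zero.
ipw_terms = Y * T / pi_star
H = np.column_stack([pi_star, 1.0 - pi_star])            # h*(X) = (pi*, 1 - pi*)
cv = (H / (1.0 - pi_star)[:, None]) * (T / pi_star - 1.0)[:, None]

# Estimate b_1 by regressing the IPW terms on the (centered) control variates.
cv_c = cv - cv.mean(axis=0)
b1, *_ = np.linalg.lstsq(cv_c, ipw_terms - ipw_terms.mean(), rcond=None)

mu1_ipw = ipw_terms.mean()                   # plain IPW estimate of E(Y_1)
mu1_cv = mu1_ipw - cv.mean(axis=0) @ b1      # control-variates estimate (typically lower variance)
print(mu1_ipw, mu1_cv)
```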

Parametric propensity score [Model S: π(·; γ)]. Maximize the likelihood subject to the constraints

∫ π̂(x) dG_1 = ∫ π̂(x) dG_0,   ∫ ĥ_j(x) dG_1 = ∫ ĥ_j(x) dG_0,  j = 1, ..., m.

Let ĥ = (π̂, 1 − π̂, ĥ_1, ..., ĥ_m)'. Maximize

(1/n) ∑_{i=1}^{n_1} log(λ'ĥ(X_i)) + (1/n) ∑_{i=n_1+1}^{n} log(1 − λ'ĥ(X_i)).

Then

Ĝ_1{(X_i, Y_{1i})} = 1 / (n λ'ĥ(X_i)),  i = 1, ..., n_1,
Ĝ_0{(X_i, Y_{0i})} = 1 / (n (1 − λ'ĥ(X_i))),  i = n_1 + 1, ..., n.

First-order approximation:

μ̃_1 = (1/n) ∑_{i=1}^{n} Y_{1i} T_i / π̂(X_i) − β̃_1' [ (1/n) ∑_{i=1}^{n} (ĥ(X_i) / (1 − π̂(X_i))) (T_i / π̂(X_i) − 1) ],
μ̃_0 = (1/n) ∑_{i=1}^{n} Y_{0i} (1 − T_i) / (1 − π̂(X_i)) − β̃_0' [ (1/n) ∑_{i=1}^{n} (ĥ(X_i) / π̂(X_i)) ((1 − T_i) / (1 − π̂(X_i)) − 1) ],

where β̃_1 = B̃^{-1} C̃_1 and β̃_0 = B̃^{-1} C̃_0.

Our strategy is to build and check propensity score models to ensure consistency, and to use outcome regression models for variance and bias reduction. Propensity score models can be checked with the following idea. Pick a collection of test functions ĥ_j on X, for example (π̂, 1 − π̂, π̂X, (1 − π̂)X). Compute the sample average

Ẽ[ ĥ_j(X) ( T / π̂(X) − (1 − T) / (1 − π̂(X)) ) ],

i.e. the average difference in ĥ_j(X) between the treated and the controls after propensity score weighting. If model S is correct, then the sample averages relative to their standard errors, or z-ratios, should be statistically insignificant from zero. Examination of the z-ratios against the standard normal can reveal possible misspecification of model S.
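A minimal sketch of this balance check on simulated data. The fitted logistic working model, the particular test functions, and the naive standard error (which ignores the variability from estimating γ̂) are simplifying choices of mine:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 4000

# Simulated data; the true propensity depends on X nonlinearly, the working model is linear in X.
X = rng.normal(size=(n, 1))
p_true = 1.0 / (1.0 + np.exp(-(0.4 * X[:, 0] + 0.4 * X[:, 0] ** 2 - 0.2)))
T = rng.binomial(1, p_true)

pi_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]   # model S: pi(x; gamma_hat)

# Test functions h_j(X), as suggested on the slide.
tests = {
    "pi_hat": pi_hat,
    "1-pi_hat": 1.0 - pi_hat,
    "pi_hat*X": pi_hat * X[:, 0],
    "(1-pi_hat)*X": (1.0 - pi_hat) * X[:, 0],
}

# Weighted differences h_j(X) * (T/pi_hat - (1-T)/(1-pi_hat)); z = mean / (sd/sqrt(n)).
for name, h in tests.items():
    d = h * (T / pi_hat - (1 - T) / (1.0 - pi_hat))
    z = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    print(f"{name:>14s}  z-ratio = {z:6.2f}")
```

With the quadratic term omitted from the working model, at least some of these z-ratios would be expected to fall outside roughly ±2, flagging the misspecification.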

[Figure: z-ratios of the balance checks plotted against the candidate propensity score models (Models 1-4), two panels.]

Confounding case. Data: (X_i, Y_{T_i,i}, T_i), i = 1, 2, ..., n. Likelihood:

L = L_1 · L_2 = ∏_{i=1}^{n} [ (1 − π(X_i))^{1 − T_i} π(X_i)^{T_i} ] · ∏_{i=1}^{n} [ H_0({X_i, Y_{0i}})^{1 − T_i} H_1({X_i, Y_{1i}})^{T_i} ],

where H_0 is the distribution P({Y_0} | T = 0, X) P({X}) and H_1 is the distribution P({Y_1} | T = 1, X) P({X}). H_0 and H_1 induce the same marginal distribution on the covariate space X; equivalently, ∫ h(x) dH_0(x, y_0) = ∫ h(x) dH_1(x, y_1) for each bounded function h on X. Convergence of the previous estimates:

(Ĝ_0, Ĝ_1) → (H_0, H_1),
μ̂_1, μ̃_1 → E[E(Y_1 | T = 1, X)],
μ̂_0, μ̃_0 → E[E(Y_0 | T = 0, X)].

Unmeasured confounding: gaps between P({Y_0} | T = 0, X) and P({Y_0} | T = 1, X), and between P({Y_1} | T = 0, X) and P({Y_1} | T = 1, X), i.e. systematic differences between the treated and the untreated even if they received the same treatment. Define the Radon-Nikodym derivatives:

λ_0(Y_0; X) = P(dy_0 | T = 1, X) / P(dy_0 | T = 0, X),
λ_1(Y_1; X) = P(dy_1 | T = 0, X) / P(dy_1 | T = 1, X).

The case λ_0 = λ_1 ≡ 1 corresponds to no confounding, while deviations of λ_0 and λ_1 from 1 indicate unmeasured confounding. By Bayes' rule, λ_0 and λ_1 can be seen as odds ratios:

λ_0(Y_0; X) = [(1 − π(X)) / π(X)] · [P(T = 1 | Y_0, X) / P(T = 0 | Y_0, X)],
λ_1(Y_1; X) = [π(X) / (1 − π(X))] · [P(T = 0 | Y_1, X) / P(T = 1 | Y_1, X)].

A sensitivity analysis model: Λ^{-1} ≤ λ_0(Y_0; X), λ_1(Y_1; X) ≤ Λ, where Λ ≥ 1 indicates the degree of departure from no confounding.
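Written out, the Bayes'-rule step behind the odds-ratio form (consistent with the definitions above):

```latex
% Bayes' rule applied to P(dy_0 | T = t, X = x):
\lambda_0(y_0; x)
  = \frac{P(dy_0 \mid T=1, x)}{P(dy_0 \mid T=0, x)}
  = \frac{P(T=1 \mid y_0, x)\, P(dy_0 \mid x) / P(T=1 \mid x)}
         {P(T=0 \mid y_0, x)\, P(dy_0 \mid x) / P(T=0 \mid x)}
  = \frac{1-\pi(x)}{\pi(x)} \cdot
    \frac{P(T=1 \mid y_0, x)}{P(T=0 \mid y_0, x)},
% and symmetrically for \lambda_1(y_1; x).
```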

Let ĥ_c = (π̂, 1 − π̂, ĥ_1, ..., ĥ_{m_c})'. For a value of Λ, find bounds for ∫ y_t λ_t dH_t by linear programming:

min or max ∫ y_t λ_t dĜ_t
subject to ∫ λ_t dĜ_t = 1, ∫ π̂(x) λ_t dĜ_t = ∫ π̂(x) dĜ_t, ∫ ĥ_j(x) λ_t dĜ_t = ∫ ĥ_j(x) dĜ_t, j = 1, ..., m_c, and Λ^{-1} ≤ λ_t ≤ Λ.

Ĝ_1 is supported on {(X_i, Y_{1i})}_{i=1,...,n_1} and Ĝ_0 on {(X_i, Y_{0i})}_{i=n_1+1,...,n}, so each integral is a finite sum. The unknowns are the values of λ_t on the observed data: λ_{1i} = λ_1(Y_{1i}; X_i), i = 1, ..., n_1, and λ_{0i} = λ_0(Y_{0i}; X_i), i = n_1 + 1, ..., n. Comparisons of the distributions

Ĝ_0 (estimating [Y_0 | T = 0, X][X]) and Ĝ_1 (estimating [Y_1 | T = 1, X][X])

with the reweighted measures

λ_0 dĜ_0 (estimating [Y_0 | T = 1, X][X]) and λ_1 dĜ_1 (estimating [Y_1 | T = 0, X][X])

indicate (i) balance on covariates, (ii) hidden bias, and (iii) causal effects.
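A minimal sketch of the linear program for one bound, using scipy.optimize.linprog. The toy data, the uniform Ĝ_1 masses, and the single test function are simplifications of mine; in the actual method the masses come from the constrained MLE of the previous slides:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n1 = 500                                   # treated observations supporting G1_hat
Lam = 1.5                                  # sensitivity parameter Lambda

pi_hat = rng.uniform(0.2, 0.8, size=n1)    # fitted propensity scores at the treated points
Y1 = rng.binomial(1, 0.7, size=n1)         # observed outcomes for the treated
g = np.full(n1, 1.0 / n1)                  # G1_hat point masses (uniform here for simplicity)

# Unknowns: lambda_{1i}, i = 1..n1. Objective: integral of y_1 * lambda_1 dG1_hat.
c = Y1 * g

# Equality constraints: total mass 1 and the pi_hat moment preserved under reweighting.
A_eq = np.vstack([g, pi_hat * g])
b_eq = np.array([1.0, (pi_hat * g).sum()])

bounds = [(1.0 / Lam, Lam)] * n1           # Lambda^{-1} <= lambda_{1i} <= Lambda

lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
upper = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print("bounds on the reweighted mean of Y_1:", lower.fun, -upper.fun)
```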