Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths


Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

New Developments in Nonparametric and Semiparametric Statistics, Joint Statistical Meetings; Vancouver, BC, Canada; 02 Aug. 2018

Nima Hejazi
Group in Biostatistics and Center for Computational Biology, University of California, Berkeley
nimahejazi.org · Twitter: nshejazi · GitHub: nhejazi
slides: bit.ly/jsm_fairtmle_2018

Preview: Summary

- Recent work suggests that the widespread use of machine learning algorithms has had negative social and policy consequences.
- The widespread use of machine learning on policy questions conflicts with human intuitions about bias.
- We propose a general algorithm for constructing fair, optimal ensemble ML estimators via cross-validation.
- Constraints may be imposed as functionals defined over the target parameter of interest.
- Estimating constrained parameters may be seen as iteratively minimizing a loss function along a constrained path in the parameter space Ψ.

What's fair if machines aren't?

Fairness in machine learning?

"Another potential result: a more diverse workplace. The software relies on data to surface candidates from a wide variety of places... free of human biases. But software is not free of human influence. Algorithms are written and maintained by people... As a result... algorithms can reinforce human prejudices." - Miller (2015)

Addressing bias in a technical manner

- The careless use of machine learning may induce unjustified bias.
- Problematic discrimination by ML approaches leads to solutions of little practical relevance.
- Ill-considered discrimination by ML approaches leads to solutions that are morally problematic.

Two doctrines of discrimination:
1. Disparate treatment: formal or intentional discrimination
2. Disparate impact: unjustified or avoidable discrimination

Background, data, notation

An observational unit: O = (W, X, Y), where W denotes baseline covariates, X a sensitive characteristic, and Y an outcome of interest. Consider n i.i.d. copies O_1, ..., O_n of O ~ P_0 ∈ M. Here, M is an infinite-dimensional statistical model (i.e., one indexed by an infinite-dimensional parameter). We discuss the estimation of a target parameter Ψ : M → ℝ, where

Ψ(P_0) = argmin_{ψ ∈ Ψ} E_{P_0} L(ψ)

Just a few fairness criteria

Let C : (X, W) → Y ∈ {0, 1} be a classifier, with X ∈ {a, b}.

- Demographic parity: P_{X=a}(C = 1) = P_{X=b}(C = 1)
- Accuracy parity: P_{X=a}(C = Y) = P_{X=b}(C = Y)
- True positive parity: P_{X=a}(C = 1 | Y = 1) = P_{X=b}(C = 1 | Y = 1)
- False positive parity: P_{X=a}(C = 1 | Y = 0) = P_{X=b}(C = 1 | Y = 0)
- Positive predictive value parity: P_{X=a}(Y = 1 | C = 1) = P_{X=b}(Y = 1 | C = 1)
- Negative predictive value parity: P_{X=a}(Y = 1 | C = 0) = P_{X=b}(Y = 1 | C = 0)
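As a concrete illustration, criteria like these can be checked empirically from a classifier's predictions. A minimal sketch, assuming toy arrays for the group label X (coded {0, 1} in place of {a, b}), outcome Y, and classifier output C; all names and values are hypothetical:

```python
import numpy as np

# Toy data: group X in {0, 1}, outcome Y, classifier output C.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Y = np.array([0, 1, 0, 1, 0, 1, 1, 1])
C = np.array([0, 1, 1, 1, 0, 1, 0, 1])

def demographic_parity_gap(C, X):
    """|P(C = 1 | X = 0) - P(C = 1 | X = 1)|, estimated empirically."""
    return abs(C[X == 0].mean() - C[X == 1].mean())

def true_positive_parity_gap(C, Y, X):
    """|P(C = 1 | Y = 1, X = 0) - P(C = 1 | Y = 1, X = 1)|."""
    g0 = C[(X == 0) & (Y == 1)].mean()
    g1 = C[(X == 1) & (Y == 1)].mean()
    return abs(g0 - g1)

print(demographic_parity_gap(C, X))       # 0.25
print(true_positive_parity_gap(C, Y, X))  # ~0.333
```

A gap of zero indicates the corresponding parity criterion holds exactly in-sample.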

Wait, where did the fairness go?

Goal: estimate Ψ(P_0) = E_{P_0}(Y | X, W). Let Y ∈ {0, 1} and use the negative log-likelihood loss:

L(ψ) = −(Y log P(Y | X, W) + (1 − Y) log(1 − P(Y | X, W)))

Fairness criterion (equalized odds):

Θ_ψ(P_0) = Σ_y {E_{P_0}(L(ψ)(O) | X = 1, Y = y) − E_{P_0}(L(ψ)(O) | X = 0, Y = y)}²

Let Θ_ψ(P_0) : M → ℝ be a pathwise differentiable functional for each ψ ∈ Ψ.
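The equalized-odds functional above has a direct empirical analogue: plug estimated conditional probabilities into the loss and compare mean losses across groups within each outcome stratum. A minimal numpy sketch; the data and function names are illustrative, not from the talk:

```python
import numpy as np

def nll_loss(y, p):
    """Per-observation negative log-likelihood loss L(psi)(O)."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def equalized_odds_theta(p, Y, X):
    """Empirical analogue of Theta_psi(P_0): sum over y in {0, 1} of the
    squared difference in mean loss between groups X = 1 and X = 0."""
    L = nll_loss(Y, p)
    total = 0.0
    for y in (0, 1):
        diff = L[(X == 1) & (Y == y)].mean() - L[(X == 0) & (Y == y)].mean()
        total += diff ** 2
    return total

# If predicted probabilities are distributed identically across groups
# within each outcome stratum, the constraint is exactly zero.
X = np.array([0, 0, 1, 1, 0, 0, 1, 1])
Y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.2, 0.4, 0.2, 0.4, 0.8, 0.6, 0.8, 0.6])
print(equalized_odds_theta(p, Y, X))  # 0.0
```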

Constrained functional parameters

Estimate the target parameter under a constraint:

Ψ*(P_0) = argmin_{ψ ∈ Ψ, Θ_ψ(P_0) = 0} E_{P_0} L(ψ)

Goal: estimate Ψ*(P_0), the projection of Ψ(P_0) onto the subspace Ψ*(P_0) = {ψ ∈ Ψ : Θ_ψ(P_0) = 0}. Introduce a Lagrange multiplier λ and define

(Ψ*, λ) = (Ψ*(P_0), Λ(P_0)) = argmin_{ψ ∈ Ψ, λ} E_{P_0} L(ψ) + λ Θ_ψ(P_0).

Lemma: If Ψ̃(P_0) = (Ψ*(P_0), Λ(P_0)) is the minimizer of the Lagrange-multiplier-penalized loss, then

Ψ*(P_0) = argmin_{ψ ∈ Ψ, Θ_ψ(P_0) = 0} E_{P_0} L(ψ).

Learning with constrained parameters

Risk function: R(ψ̃ | P) = P_n L(ψ*) + λ Θ(ψ* | P), where ψ̃ = (ψ*, λ).

For an estimator ψ̃(P_n) = (Ψ̂*(P_n), Λ̂(P_n)) of Ψ̃(P_0), and a sample-splitting scheme B_n ∈ {0, 1}^n:

R_0(ψ̃, P_n) = E_{B_n} P_0 L(Ψ̂*(P^0_{n,B_n})) + Λ̂(P_n) E_{B_n} Θ(Ψ̂*(P^0_{n,B_n}) | P_0)

Learning with constrained parameters

Cross-validated risk:

R_{n,CV}(ψ̃, P_n) = E_{B_n} P^1_{n,B_n} L(Ψ̂*(P^0_{n,B_n}))  (1)
                  + Λ̂(P_n) E_{B_n} Θ(Ψ̂*(P^0_{n,B_n}) | P^1_{n,B_n})  (2)

Given candidate estimators ψ̃_j(P_n) = (Ψ̂*_j(P_n), Λ̂_j(P_n)), j = 1, ..., J, the CV selector is given by

J_n = argmin_j R_{n,CV}(ψ̃_j, P_n).

We may define an optimal estimate of Ψ̃ by ψ̃_n = ψ̃_{J_n}(P_n) = (Ψ̂*_{J_n}(P_n), Λ̂_{J_n}(P_n)).
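A toy version of this selector, assuming squared-error loss for a one-dimensional mean parameter and a hypothetical constraint functional Θ(ψ) = ψ²; all candidates and numbers are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=1.0, size=100)

# Candidate estimators psi_j paired with their Lagrange multipliers.
candidates = [
    (lambda ytr: ytr.mean(), 0.0),        # unconstrained sample mean
    (lambda ytr: 0.5 * ytr.mean(), 1.0),  # shrunken estimate
    (lambda ytr: 0.0, 5.0),               # fully constrained to zero
]

def cv_risk(fit, lam, y, n_folds=5):
    """Cross-validated penalized risk: validation-sample loss plus lam
    times the toy constraint Theta(psi) = psi**2, averaged over splits."""
    idx = np.arange(len(y))
    risks = []
    for val in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, val)
        psi = fit(y[train])
        risks.append(np.mean((y[val] - psi) ** 2) + lam * psi ** 2)
    return float(np.mean(risks))

# The CV selector J_n picks the candidate minimizing the CV risk.
J_n = min(range(len(candidates)),
          key=lambda j: cv_risk(candidates[j][0], candidates[j][1], y))
print(J_n)
```

Here the true mean is 1.0, so the unconstrained sample mean wins: shrinking toward zero incurs both bias and (for candidate 2's λ = 0 penalty value) an inflated validation loss.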

Mappings with constrained learners

A straightforward approach to generating estimators of the constrained parameter is the following simple mapping:

1. Generate an estimate ψ_n of the unconstrained parameter ψ_0.
2. Map an estimator Θ_{ψ_n,n} of the constraint Θ_{ψ_n}(P_0) into the path ψ_{n,λ}.

The corresponding solution ψ*_n = ψ_{n,λ_n}, where λ_n solves Θ_{ψ_{n,λ},n} = 0, yields an estimator of the constrained parameter.
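Step 2 amounts to one-dimensional root-finding in λ. A sketch via bisection, with a hypothetical monotone function standing in for the estimated constraint along the path, Θ_{ψ_{n,λ},n}:

```python
def solve_constraint(theta, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for the lambda_n solving theta(lambda) = 0, assuming
    theta is continuous and changes sign on [lo, hi]."""
    f_lo = theta(lo)
    assert f_lo * theta(hi) <= 0, "constraint must change sign on [lo, hi]"
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = theta(mid)
        if abs(f_mid) < tol:
            return mid
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)

# Toy constraint that shrinks linearly along the path: root at lambda = 2.
lam_n = solve_constraint(lambda lam: 1.0 - 0.5 * lam)
print(lam_n)  # ~2.0
```

In practice the constraint along the path would be estimated from data rather than given in closed form, but the mapping from an unconstrained fit to its constrained version reduces to exactly this kind of scalar solve.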

Constraint-specific paths

Consider ψ_{0,λ} = argmin_{ψ ∈ Ψ} E_{P_0} L(ψ) + λ Θ_0(ψ). The collection {ψ_{0,λ} : λ} represents a path in the parameter space Ψ through ψ_0 at λ = 0. This is a constraint-specific path, as it produces an estimate under the desired functional constraint. We leverage this construction to map an initial estimator of the unconstrained parameter ψ_0 into its corresponding constrained version ψ*_0.

Future work

- Further generalization of constraint-specific paths: the solution path {ψ_{0,λ} : λ} in the parameter space Ψ through ψ_0 at λ = 0.
- Further develop the relation between constraint-specific paths and universal least favorable submodels.
- Integration of the approach of constraint-specific paths with classical targeted maximum likelihood estimation; in particular, what, if any, are the implications for inference?

Review: Summary

- Recent work suggests that the widespread use of machine learning algorithms has had negative social and policy consequences.
- The widespread use of machine learning on policy questions conflicts with human intuitions about bias.
- We propose a general algorithm for constructing fair, optimal ensemble ML estimators via cross-validation.
- Constraints may be imposed as functionals defined over the target parameter of interest.
- Estimating constrained parameters may be seen as iteratively minimizing a loss function along a constrained path in the parameter space Ψ.

References I

Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962-973.

Breiman, L. (1996). Stacked regressions. Machine Learning, 24(1):49-64.

DiCiccio, T. J. and Romano, J. P. (1990). Nonparametric confidence limits by resampling methods and least favorable families. International Statistical Review / Revue Internationale de Statistique, pages 59-76.

Duchi, J., Glynn, P., and Namkoong, H. (2016). Statistics of robust optimization: A generalized empirical likelihood approach. arXiv e-prints.

References II

Dudoit, S. and van der Laan, M. J. (2005). Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Statistical Methodology, 2(2):131-154.

Hardt, M., Price, E., Srebro, N., et al. (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315-3323.

Severini, T. A. and Wong, W. H. (1992). Profile likelihood and conditionally parametric models. The Annals of Statistics, pages 1768-1802.

References III

Stein, C. (1956). Efficient nonparametric testing and estimation. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. The Regents of the University of California.

Tsiatis, A. (2007). Semiparametric Theory and Missing Data. Springer Science & Business Media.

van der Laan, M. J. and Dudoit, S. (2003). Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples.

References IV

van der Laan, M. J., Dudoit, S., and Keles, S. (2004). Asymptotic optimality of likelihood-based cross-validation. Statistical Applications in Genetics and Molecular Biology, 3(1):1-23.

van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1).

van der Laan, M. J. and Rubin, D. (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1).

Acknowledgments

Mark J. van der Laan, University of California, Berkeley
David C. Benkeser, Emory University

Funding source: National Institutes of Health, T32-LM012417-02

Thank you.

Slides: bit.ly/jsm_fairtmle_2018
nimahejazi.org · Twitter: nshejazi · GitHub: nhejazi