Counterfactual Evaluation and Learning

1 SIGIR 2016 Tutorial: Counterfactual Evaluation and Learning
Adith Swaminathan, Thorsten Joachims
Department of Computer Science & Department of Information Science, Cornell University
Website:
Funded in part through NSF Awards IIS , IIS-1217686, IIS

2 User Interactive Systems
Examples: search engines; entertainment media; e-commerce; smart homes, robots, etc.
Logs of user behavior for:
  Evaluating system performance
  Learning improved systems and gathering knowledge
  Personalization

3 Interactive System Schematic
System $\pi_0$ selects action $y$ for context $x$.
Utility: $U(\pi_0)$

4 Ad Placement
Context $x$: User and page
Action $y$: Ad that is placed
Feedback $\delta(x, y)$: Click / no-click

5 News Recommender
Context $x$: User
Action $y$: Portfolio of news articles
Feedback $\delta(x, y)$: Reading time in minutes

6 Search Engine
Context $x$: Query
Action $y$: Ranking
Feedback $\delta(x, y)$: Win/loss against baseline in interleaving

7 Log Data from Interactive Systems
Data: $S = ((x_1, y_1, \delta_1), \ldots, (x_n, y_n, \delta_n))$, where each triple holds the context, the action taken by $\pi_0$, and the reward / loss.
Partial-information (aka "contextual bandit") feedback.
Properties:
  Contexts $x_i$ drawn i.i.d. from unknown $P(X)$
  Actions $y_i$ selected by existing system $\pi_0: X \to Y$
  Feedback $\delta_i$ from unknown function $\delta: X \times Y \to \mathbb{R}$
[Zadrozny et al., 2003] [Langford & Li] [Bottou et al., 2014]

8 Goals for this Tutorial
Use interaction log data $S = ((x_1, y_1, \delta_1), \ldots, (x_n, y_n, \delta_n))$ for:
Evaluation: Estimate online measures of some system $\pi$ offline. System $\pi$ is typically different from the system $\pi_0$ that generated the log.
Learning: Find a new system $\pi$ that improves performance over $\pi_0$. Do not rely on interactive experiments as in online learning.

9 SIGIR 2016 Tutorial: Counterfactual Evaluation and Learning
Part 1: Evaluation

10 Evaluation: Outline
Evaluating Online Metrics Offline
  A/B Testing (on-policy)
  Counterfactual estimation from logs (off-policy)
Approach 1: "Model the world"
  Estimation via reward prediction
Approach 2: "Model the bias"
  Counterfactual Model
  Inverse propensity scoring (IPS) estimator
Advanced Estimators
  Self-normalized IPS estimator
  Doubly robust estimator
  Slates estimator
Case Studies
Summary & Demonstration with code samples

11 Online Performance Metrics
Example metrics: CTR, revenue, time-to-success, interleaving, etc.
The correct choice depends on the application and is not the focus of this tutorial.
This tutorial: metric encoded as $\delta(x, y)$ [click/payoff/time for the $(x, y)$ pair]

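12 System
Definition [Deterministic Policy]: Function $y = \pi(x)$ that picks action $y$ for context $x$.
Definition [Stochastic Policy]: Distribution $\pi(y \mid x)$ that samples action $y$ given context $x$.
[Figure: deterministic policies $\pi_1(x)$, $\pi_2(x)$ and stochastic policies $\pi_1(Y \mid x)$, $\pi_2(Y \mid x)$ over the actions $Y \mid x$]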

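13 System Performance
Definition [Utility of Policy]: The expected reward / utility $U(\pi)$ of policy $\pi$ is
$$U(\pi) = \int\!\!\int \delta(x, y)\, \pi(y \mid x)\, P(x)\, dx\, dy$$
where $\delta(x, y)$ is, e.g., the reading time of user $x$ for portfolio $y$.
[Figure: distributions $\pi(Y \mid x_i)$ and $\pi(Y \mid x_j)$ over the actions for two contexts $x_i$ and $x_j$]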
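A minimal sketch of this definition for finite context and action sets, where the double integral reduces to a double sum. The user types, portfolios, and reading times below are hypothetical values chosen for illustration; they do not come from the tutorial.

```python
def utility(p_x, policy, delta):
    """Exact U(pi) = sum_x P(x) * sum_y pi(y|x) * delta(x, y) for finite X, Y."""
    return sum(
        p_x[x] * sum(prob * delta[(x, y)] for y, prob in policy[x].items())
        for x in p_x
    )


# Hypothetical toy problem: two user types, two news portfolios.
P_X = {"sports_fan": 0.5, "politics_fan": 0.5}
DELTA = {  # assumed expected reading time in minutes
    ("sports_fan", "sports"): 4.0,
    ("sports_fan", "politics"): 1.0,
    ("politics_fan", "sports"): 1.0,
    ("politics_fan", "politics"): 3.0,
}

# Stochastic policy: ignore the context and pick a portfolio uniformly.
UNIFORM = {x: {"sports": 0.5, "politics": 0.5} for x in P_X}
# Deterministic policy written as a degenerate distribution.
MATCHED = {"sports_fan": {"sports": 1.0}, "politics_fan": {"politics": 1.0}}

u_uniform = utility(P_X, UNIFORM, DELTA)  # 2.25
u_matched = utility(P_X, MATCHED, DELTA)  # 3.5
```

In this toy setting the context-aware policy has the higher utility. In practice $\delta$ and $P(X)$ are unknown, which is why the rest of the tutorial estimates $U(\pi)$ from data.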

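14 Online Evaluation: A/B Testing
Given $S = ((x_1, y_1, \delta_1), \ldots, (x_n, y_n, \delta_n))$ collected under $\pi_1$,
$$\hat{U}(\pi_1) = \frac{1}{n} \sum_{i=1}^{n} \delta_i$$
A/B Testing:
  Deploy $\pi_1$: Draw $x \sim P(X)$, predict $y \sim \pi_1(Y \mid x)$, get $\delta(x, y)$
  Deploy $\pi_2$: Draw $x \sim P(X)$, predict $y \sim \pi_2(Y \mid x)$, get $\delta(x, y)$
  Deploy $\pi_H$: Draw $x \sim P(X)$, predict $y \sim \pi_H(Y \mid x)$, get $\delta(x, y)$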
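The on-policy procedure can be sketched as a small simulation: deploy a policy, draw contexts and actions, and average the observed rewards. The contexts, ads, and click probabilities are assumptions made up for this example, and `ab_test` is a hypothetical helper name.

```python
import random

# Hypothetical environment: two contexts drawn uniformly, two ads,
# and assumed click probabilities delta(x, y).
CONTEXTS = ["a", "b"]
CLICK_PROB = {
    ("a", "ad1"): 0.2, ("a", "ad2"): 0.6,
    ("b", "ad1"): 0.5, ("b", "ad2"): 0.1,
}


def ab_test(policy, n, rng):
    """Deploy `policy` for n interactions and return the mean observed reward."""
    total = 0.0
    for _ in range(n):
        x = rng.choice(CONTEXTS)                      # draw x ~ P(X)
        actions = list(policy[x])
        weights = [policy[x][a] for a in actions]
        y = rng.choices(actions, weights=weights)[0]  # draw y ~ pi(Y | x)
        total += 1.0 if rng.random() < CLICK_PROB[(x, y)] else 0.0
    return total / n


PI_1 = {"a": {"ad1": 1.0}, "b": {"ad1": 1.0}}  # always ad1; true utility 0.35
PI_2 = {"a": {"ad2": 1.0}, "b": {"ad1": 1.0}}  # context-aware; true utility 0.55

rng = random.Random(0)
u_hat_1 = ab_test(PI_1, 20000, rng)
u_hat_2 = ab_test(PI_2, 20000, rng)
```

Each policy needs its own deployment and its own fresh sample, which is the cost that off-policy estimation tries to avoid.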

15 Pros and Cons of A/B Testing
Pros:
  User-centric measure
  No need for manual ratings
  No user/expert mismatch
Cons:
  Requires interactive experimental control
  Risk of fielding a bad or buggy $\pi_i$
  Number of A/B tests limited
  Long turnaround time

16 Evaluating Online Metrics Offline
Online: on-policy A/B test. Each policy needs its own data: draw $S_1$ from $\pi_1$ to get $\hat{U}(\pi_1)$, draw $S_2$ from $\pi_2$ to get $\hat{U}(\pi_2)$, and so on through $S_7$ from $\pi_7$.
Offline: off-policy counterfactual estimates. Draw a single log $S$ from $\pi_0$ and reuse it to estimate $\hat{U}(h)$ for many candidate policies $h$.

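18 Approach 1: Reward Predictor
Idea: Use $S = ((x_1, y_1, \delta_1), \ldots, (x_n, y_n, \delta_n))$ from $\pi_0$ to estimate a reward predictor $\hat{\delta}(x, y)$.
Deterministic $\pi$: Simulated A/B testing with predicted $\hat{\delta}(x, y)$
  For actions $y'_i = \pi(x_i)$ from the new policy $\pi$, generate the predicted log
  $S' = ((x_1, y'_1, \hat{\delta}(x_1, y'_1)), \ldots, (x_n, y'_n, \hat{\delta}(x_n, y'_n)))$
  Estimate the performance of $\pi$ via
  $$\hat{U}_{rp}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \hat{\delta}(x_i, y'_i)$$
Stochastic $\pi$:
  $$\hat{U}_{rp}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \sum_{y} \hat{\delta}(x_i, y)\, \pi(y \mid x_i)$$
[Figure: predicted reward $\hat{\delta}(x, y)$ compared with true reward $\delta(x, y)$ over the actions $Y \mid x$]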

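19 Regression for Reward Prediction
Learn $\hat{\delta}: X \times Y \to \mathbb{R}$
1. Represent via features $\Psi(x, y)$
2. Learn regression based on $\Psi(x, y)$ from $S$ collected under $\pi_0$
3. Predict $\hat{\delta}(x, y)$ for $y = \pi(x)$ of new policy $\pi$
[Figure: $\hat{\delta}(x, y)$ and $\delta(x, y)$ plotted over feature axes $\Psi_1$ and $\Psi_2$]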
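The three steps above can be sketched with the simplest possible regressor: a per-pair mean of logged rewards, which is what least squares gives on one-hot features $\Psi(x, y)$. The log, the policy, and the fallback value for unseen pairs are all assumptions for illustration, not the tutorial's setup.

```python
from collections import defaultdict


def fit_reward_predictor(log, default=0.0):
    """Fit delta_hat(x, y) as the mean logged reward of each (x, y) pair.

    Pairs that never occur in the log fall back to `default` (an assumption).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y, delta in log:
        sums[(x, y)] += delta
        counts[(x, y)] += 1

    def delta_hat(x, y):
        n = counts.get((x, y), 0)
        return sums[(x, y)] / n if n else default

    return delta_hat


def u_rp(contexts, policy, delta_hat):
    """U_rp(pi) = (1/n) * sum_i sum_y delta_hat(x_i, y) * pi(y | x_i)."""
    total = 0.0
    for x in contexts:
        total += sum(prob * delta_hat(x, y) for y, prob in policy[x].items())
    return total / len(contexts)


# Hypothetical log of (context, action, reward) triples collected under pi_0.
LOG = [("q1", "A", 1.0), ("q1", "A", 0.0), ("q1", "B", 1.0), ("q2", "A", 0.0)]
# New deterministic policy, written as a degenerate distribution.
NEW_POLICY = {"q1": {"B": 1.0}, "q2": {"B": 1.0}}

delta_hat = fit_reward_predictor(LOG)
estimate = u_rp([x for x, _, _ in LOG], NEW_POLICY, delta_hat)  # 0.75
```

Note that the pair ("q2", "B") never appears in the log, so its predicted reward is just the fallback value. The estimate therefore depends on a guess for exactly the actions the logging policy avoided.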

20 News Recommender: Experiment Setup
Context $x$: User profile
Action $y$: Ranking. Pick from 7 candidates to place into 3 slots.
Reward $\delta$: Revenue. Complicated hidden function.
Logging policy $\pi_0$: Non-uniform randomized logging system. Plackett-Luce "explore around current production ranker" (see case study).

21 News Recommender: Results RP is inaccurate even with more training and logged data

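22 Problems of Reward Predictor
Modeling bias: choice of features and model
Selection bias: $\pi_0$'s actions are overrepresented
$$\hat{U}_{rp}(\pi) = \frac{1}{n} \sum_{i} \hat{\delta}(x_i, \pi(x_i))$$
Can be unreliable and biased
[Figure: $\hat{\delta}(x, y)$ and $\delta(x, \pi(x))$ plotted over feature axes $\Psi_1$ and $\Psi_2$]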

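24 Approach 2: "Model the Bias"
Idea: Fix the mismatch between the distribution $\pi_0(Y \mid x)$ that generated the data and the distribution $\pi(Y \mid x)$ we aim to evaluate.
The log is drawn from $\pi_0(y \mid x)\, P(x)$, while the quantity of interest is
$$U(\pi) = \int\!\!\int \delta(x, y)\, \pi(y \mid x)\, P(x)\, dx\, dy$$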

25 Counterfactual Model
Example: Treating Heart Attacks
Patients: $x_i$, $i \in \{1, \ldots, n\}$
Treatments $Y$: Bypass / Stent / Drugs
Chosen treatment for patient $x_i$: $y_i$
Outcomes $\delta_i$: 5-year survival: 0 / 1
Which treatment is best?

26 Counterfactual Model
Example: Treating Heart Attacks, shown alongside the analogous search example: Placing Vertical
Patients: $x_i$, $i \in \{1, \ldots, n\}$
Treatments $Y$: Bypass / Stent / Drugs (search: Pos 1 / Pos 2 / Pos 3)
Chosen treatment for patient $x_i$: $y_i$
Outcomes $\delta_i$: 5-year survival: 0 / 1 (search: Click / no Click on SERP)
Which treatment is best?

27 Counterfactual Model
Example: Treating Heart Attacks
Patients: $x_i$, $i \in \{1, \ldots, n\}$
Treatments $Y$: Bypass / Stent / Drugs
Chosen treatment for patient $x_i$: $y_i$
Outcomes $\delta_i$: 5-year survival: 0 / 1
Which treatment is best? Everybody Drugs? Everybody Stent? Everybody Bypass?
Survival rates among the patients who received each treatment: Drugs 3/4, Stent 2/3, Bypass 2/4. Really?

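28 Treatment Effects
Average Treatment Effect of Treatment $y$:
$$U(y) = \frac{1}{n} \sum_{i} \delta(x_i, y)$$
Example: $U(\text{bypass}) = 5/11$, $U(\text{stent}) = 7/11$, $U(\text{drugs}) = 4/11$
[Figure: per-patient table marking the one factual outcome (treatment received) and the counterfactual outcomes (treatments not received)]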

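29 Assignment Mechanism
Probabilistic Treatment Assignment
  For patient $i$: $\pi_0(Y_i = y \mid x_i)$
  Selection bias
Inverse Propensity Score Estimator
  Propensity: $p_i = \pi_0(Y_i = y_i \mid x_i)$
  $$\hat{U}_{ips}(y) = \frac{1}{n} \sum_{i} \frac{\mathbb{I}\{y_i = y\}}{p_i}\, \delta(x_i, y_i)$$
  Unbiased: $E[\hat{U}(y)] = U(y)$, if $\pi_0(Y_i = y \mid x_i) > 0$ for all $i$
Example: $\hat{U}(\text{drugs}) = \ldots = 0.36 < 0.75$
[Figure: per-patient assignment probabilities $\pi_0(Y_i = y \mid x_i)$]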
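The effect of selection bias, and how inverse propensity weighting corrects it, can be sketched with a small simulation. The patient types, survival probabilities, and assignment probabilities are hypothetical numbers chosen so that healthier patients are more likely to receive drugs; they are not the tutorial's example.

```python
import random

# Hypothetical survival probability under drugs, by patient type.
# With P(mild) = P(severe) = 0.5 the true U(drugs) is 0.6.
SURVIVAL_DRUGS = {"mild": 0.9, "severe": 0.3}
# Assumed assignment mechanism pi_0(drugs | x): mild patients mostly get drugs.
P_DRUGS = {"mild": 0.9, "severe": 0.1}


def simulate(n, rng):
    """Return (naive, ips) estimates of U(drugs) from n logged patients."""
    treated, treated_survived, ips_sum = 0, 0.0, 0.0
    for _ in range(n):
        x = rng.choice(["mild", "severe"])
        if rng.random() < P_DRUGS[x]:  # patient receives drugs
            delta = 1.0 if rng.random() < SURVIVAL_DRUGS[x] else 0.0
            treated += 1
            treated_survived += delta
            ips_sum += delta / P_DRUGS[x]  # I{y_i = drugs} / p_i * delta_i
        # Patients on other treatments contribute 0 to the IPS sum.
    naive = treated_survived / treated  # average over treated patients only
    ips = ips_sum / n                   # average over all n patients
    return naive, ips


naive, ips = simulate(50000, random.Random(0))
```

Under these assumptions the naive average lands near 0.84 because the treated group is dominated by mild cases, while the IPS estimate lands near the true value of 0.6.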

30 Experimental vs Observational
Controlled Experiment
  Assignment mechanism under our control
  Propensities $p_i = \pi_0(Y_i = y_i \mid x_i)$ are known by design
  Requirement: $\forall y: \pi_0(Y_i = y \mid x_i) > 0$ (probabilistic)
Observational Study
  Assignment mechanism not under our control
  Propensities $p_i$ need to be estimated
  Estimate $\hat{\pi}_0(Y_i \mid z_i) = \pi_0(Y_i \mid x_i)$ based on features $z_i$
  Requirement: $\hat{\pi}_0(Y_i \mid z_i) = \hat{\pi}_0(Y_i \mid \delta_i, z_i)$ (unconfounded)

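31 Conditional Treatment Policies
Policy (deterministic)
  Context $x_i$ describing patient
  Pick treatment $y_i$ based on $x_i$: $y_i = \pi(x_i)$
  Example policy: $\pi(A) = \text{drugs}$, $\pi(B) = \text{stent}$, $\pi(C) = \text{bypass}$
Average Treatment Effect
  $$U(\pi) = \frac{1}{n} \sum_{i} \delta(x_i, \pi(x_i))$$
IPS Estimator
  $$\hat{U}_{ips}(\pi) = \frac{1}{n} \sum_{i} \frac{\mathbb{I}\{y_i = \pi(x_i)\}}{p_i}\, \delta(x_i, y_i)$$
[Figure: 11 patients with contexts B, C, A, B, A, B, A, C, A, C, B]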

32 Stochastic Treatment Policies
Policy (stochastic)
  Context $x_i$ describing patient
  Pick treatment $y$ based on $x_i$: $\pi(y \mid x_i)$
Note: The assignment mechanism is a stochastic policy as well!
Average Treatment Effect
  $$U(\pi) = \frac{1}{n} \sum_{i} \sum_{y} \delta(x_i, y)\, \pi(y \mid x_i)$$
IPS Estimator
  $$\hat{U}(\pi) = \frac{1}{n} \sum_{i} \frac{\pi(y_i \mid x_i)}{p_i}\, \delta(x_i, y_i)$$
[Figure: 11 patients with contexts B, C, A, B, A, B, A, C, A, C, B]

33 Counterfactual Model = Logs
| | Medical | Search Engine | Ad Placement | Recommender |
| Context $x_i$ (recorded in log) | Diagnostics | Query | User + Page | User + Movie |
| Treatment $y_i$ (recorded in log) | BP/Stent/Drugs | Ranking | Placed Ad | Watched Movie |
| Outcome $\delta_i$ (recorded in log) | Survival | Click metric | Click / no Click | Star rating |
| Propensities $p_i$ (recorded in log) | controlled (*) | controlled | controlled | observational |
| New Policy $\pi$ | FDA Guidelines | Ranker | Ad Placer | Recommender |
| T-effect $U(\pi)$ | Average quality of new policy (all columns) |

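35 System Evaluation via Inverse Propensity Scoring
Definition [IPS Utility Estimator]: Given $S = ((x_1, y_1, \delta_1), \ldots, (x_n, y_n, \delta_n))$ collected under $\pi_0$,
$$\hat{U}_{ips}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \delta_i\, \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}$$
where the denominator $\pi_0(y_i \mid x_i)$ is the propensity $p_i$.
Unbiased estimate of utility for any $\pi$, if the propensity is nonzero whenever $\pi(y_i \mid x_i) > 0$.
Note: If $\pi = \pi_0$, this reduces to the online A/B test estimate $\hat{U}_{ips}(\pi_0) = \frac{1}{n} \sum_{i} \delta_i$. Off-policy vs. on-policy estimation.
[Horvitz & Thompson, 1952] [Rubin, 1983] [Zadrozny et al., 2003] [Li et al., 2011]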
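A minimal sketch of this estimator, assuming the log stores the propensity of each logged action alongside the context, action, and reward. The log entries and both policies are hypothetical, and `ips_estimate` is an illustrative name rather than a library function.

```python
def ips_estimate(log, target_policy):
    """U_ips(pi) = (1/n) * sum_i delta_i * pi(y_i | x_i) / p_i.

    `log` holds tuples (x, y, delta, p) where p = pi_0(y | x) is the logged
    propensity. `target_policy[x][y]` gives pi(y | x); missing actions mean 0.
    """
    total = 0.0
    for x, y, delta, p in log:
        if p <= 0.0:
            raise ValueError("logged propensity must be positive")
        total += delta * target_policy[x].get(y, 0.0) / p
    return total / len(log)


# Hypothetical log collected under a uniform logging policy pi_0.
LOG = [
    ("q1", "A", 1.0, 0.5),
    ("q1", "B", 0.0, 0.5),
    ("q2", "A", 0.0, 0.5),
    ("q2", "B", 1.0, 0.5),
]
PI_0 = {"q1": {"A": 0.5, "B": 0.5}, "q2": {"A": 0.5, "B": 0.5}}
PI_NEW = {"q1": {"A": 0.8, "B": 0.2}, "q2": {"A": 0.2, "B": 0.8}}

u_new = ips_estimate(LOG, PI_NEW)     # off-policy estimate: 0.8
u_logging = ips_estimate(LOG, PI_0)   # on-policy case: plain average, 0.5
```

When the target policy equals the logging policy, every weight is 1 and the estimate is the ordinary mean reward, matching the note on the slide. Raising an error on a non-positive propensity is a design choice of this sketch; it makes a violated support condition visible instead of silently dividing by zero.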

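36 Illustration of IPS
IPS Estimator:
$$\hat{U}_{IPS}(\pi) = \frac{1}{n} \sum_{i} \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}\, \delta_i$$
[Figure: logging distribution $\pi_0(Y \mid x)$ compared with target distribution $\pi(Y \mid x)$]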

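37 IPS Estimator is Unbiased
$$E[\hat{U}_{ips}(\pi)] = \frac{1}{n} \sum_{x_1, y_1} \cdots \sum_{x_n, y_n} \left[ \sum_{i} \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}\, \delta(x_i, y_i) \right] \pi_0(y_1 \mid x_1) \cdots \pi_0(y_n \mid x_n)\, P(x_1) \cdots P(x_n)$$
$$= \frac{1}{n} \sum_{x_1, y_1} \pi_0(y_1 \mid x_1) P(x_1) \cdots \sum_{x_n, y_n} \pi_0(y_n \mid x_n) P(x_n) \left[ \sum_{i} \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}\, \delta(x_i, y_i) \right]$$
$$= \frac{1}{n} \sum_{i} \sum_{x_1, y_1} \pi_0(y_1 \mid x_1) P(x_1) \cdots \sum_{x_n, y_n} \pi_0(y_n \mid x_n) P(x_n)\, \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}\, \delta(x_i, y_i)$$
Each sum over a sample $j \neq i$ equals 1, so only the sum over $(x_i, y_i)$ remains:
$$= \frac{1}{n} \sum_{i} \sum_{x_i, y_i} \pi_0(y_i \mid x_i) P(x_i)\, \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}\, \delta(x_i, y_i)$$
Probabilistic assignment guarantees $\pi_0(y_i \mid x_i) > 0$ wherever $\pi(y_i \mid x_i) > 0$, so the propensity cancels:
$$= \frac{1}{n} \sum_{i} \sum_{x_i, y_i} \pi(y_i \mid x_i)\, P(x_i)\, \delta(x_i, y_i) = \frac{1}{n} \sum_{i} U(\pi) = U(\pi)$$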
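The derivation can be checked numerically on a tiny finite problem by enumerating every possible log of length $n$, weighting each log's IPS estimate by its probability under $\pi_0$, and comparing the total with $U(\pi)$. All distributions and rewards below are hypothetical, and rewards are taken to be deterministic in $(x, y)$ to keep the enumeration small.

```python
from itertools import product

# Hypothetical finite problem with full support under pi_0.
P_X = {"x1": 0.3, "x2": 0.7}
PI_0 = {"x1": {"a": 0.9, "b": 0.1}, "x2": {"a": 0.4, "b": 0.6}}
PI = {"x1": {"a": 0.2, "b": 0.8}, "x2": {"a": 0.5, "b": 0.5}}
DELTA = {("x1", "a"): 1.0, ("x1", "b"): 0.0,
         ("x2", "a"): 0.0, ("x2", "b"): 2.0}


def true_utility():
    """U(pi) = sum_x P(x) * sum_y pi(y|x) * delta(x, y)."""
    return sum(P_X[x] * PI[x][y] * DELTA[(x, y)] for x in P_X for y in PI[x])


def ips(log):
    """IPS estimate for one log, given as a sequence of (x, y) pairs."""
    return sum(PI[x][y] / PI_0[x][y] * DELTA[(x, y)] for x, y in log) / len(log)


def expected_ips(n):
    """Exact expectation of the IPS estimate over all logs of length n."""
    pairs = [(x, y) for x in P_X for y in PI_0[x]]
    total = 0.0
    for log in product(pairs, repeat=n):
        prob = 1.0
        for x, y in log:
            prob *= P_X[x] * PI_0[x][y]  # i.i.d. draws under pi_0
        total += prob * ips(log)
    return total


u_true = true_utility()       # 0.76
u_expected = expected_ips(3)  # matches u_true up to rounding
```

The enumeration grows as $4^n$ here, so this is only a sanity check of the algebra, not a practical procedure.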

38 News Recommender: Results
IPS eventually beats RP; variance decays as $O(1/n)$

39 Adith takes over


More information

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret

More information

Importance Sampling for Fair Policy Selection

Importance Sampling for Fair Policy Selection Importance Sampling for Fair Policy Selection Shayan Doroudi Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 shayand@cs.cmu.edu Philip S. Thomas Computer Science Department

More information

Learning for Contextual Bandits

Learning for Contextual Bandits Learning for Contextual Bandits Alina Beygelzimer 1 John Langford 2 IBM Research 1 Yahoo! Research 2 NYC ML Meetup, Sept 21, 2010 Example of Learning through Exploration Repeatedly: 1. A user comes to

More information

Warm up. Regrade requests submitted directly in Gradescope, do not instructors.

Warm up. Regrade requests submitted directly in Gradescope, do not  instructors. Warm up Regrade requests submitted directly in Gradescope, do not email instructors. 1 float in NumPy = 8 bytes 10 6 2 20 bytes = 1 MB 10 9 2 30 bytes = 1 GB For each block compute the memory required

More information

Lecture 3: Causal inference

Lecture 3: Causal inference MACHINE LEARNING FOR HEALTHCARE 6.S897, HST.S53 Lecture 3: Causal inference Prof. David Sontag MIT EECS, CSAIL, IMES (Thanks to Uri Shalit for many of the slides) *Last week: Type 2 diabetes 1994 2000

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

Markov Models and Reinforcement Learning. Stephen G. Ware CSCI 4525 / 5525

Markov Models and Reinforcement Learning. Stephen G. Ware CSCI 4525 / 5525 Markov Models and Reinforcement Learning Stephen G. Ware CSCI 4525 / 5525 Camera Vacuum World (CVW) 2 discrete rooms with cameras that detect dirt. A mobile robot with a vacuum. The goal is to ensure both

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Statistical learning. Chapter 20, Sections 1 3 1

Statistical learning. Chapter 20, Sections 1 3 1 Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

EMERGING MARKETS - Lecture 2: Methodology refresher

EMERGING MARKETS - Lecture 2: Methodology refresher EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different

More information

Probabilistic Field Mapping for Product Search

Probabilistic Field Mapping for Product Search Probabilistic Field Mapping for Product Search Aman Berhane Ghirmatsion and Krisztian Balog University of Stavanger, Stavanger, Norway ab.ghirmatsion@stud.uis.no, krisztian.balog@uis.no, Abstract. This

More information

OPTIMIZING SEARCH ENGINES USING CLICKTHROUGH DATA. Paper By: Thorsten Joachims (Cornell University) Presented By: Roy Levin (Technion)

OPTIMIZING SEARCH ENGINES USING CLICKTHROUGH DATA. Paper By: Thorsten Joachims (Cornell University) Presented By: Roy Levin (Technion) OPTIMIZING SEARCH ENGINES USING CLICKTHROUGH DATA Paper By: Thorsten Joachims (Cornell University) Presented By: Roy Levin (Technion) Outline The idea The model Learning a ranking function Experimental

More information

A Novel Click Model and Its Applications to Online Advertising

A Novel Click Model and Its Applications to Online Advertising A Novel Click Model and Its Applications to Online Advertising Zeyuan Zhu Weizhu Chen Tom Minka Chenguang Zhu Zheng Chen February 5, 2010 1 Introduction Click Model - To model the user behavior Application

More information

Deep Reinforcement Learning: Policy Gradients and Q-Learning

Deep Reinforcement Learning: Policy Gradients and Q-Learning Deep Reinforcement Learning: Policy Gradients and Q-Learning John Schulman Bay Area Deep Learning School September 24, 2016 Introduction and Overview Aim of This Talk What is deep RL, and should I use

More information

Summary and discussion of The central role of the propensity score in observational studies for causal effects

Summary and discussion of The central role of the propensity score in observational studies for causal effects Summary and discussion of The central role of the propensity score in observational studies for causal effects Statistics Journal Club, 36-825 Jessica Chemali and Michael Vespe 1 Summary 1.1 Background

More information

Optimal ranking of online search requests for long-term revenue maximization. Pierre L Ecuyer

Optimal ranking of online search requests for long-term revenue maximization. Pierre L Ecuyer 1 Optimal ranking of online search requests for long-term revenue maximization Pierre L Ecuyer Patrick Maille, Nicola s Stier-Moses, Bruno Tuffin Technion, Israel, June 2016 Search engines 2 Major role

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation

More information

arxiv: v2 [cs.lg] 27 May 2016

arxiv: v2 [cs.lg] 27 May 2016 Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims Cornell University, Ithaca, NY, USA {TBS49, FA234, AS3354, NC475, TJ36}@CORNELL.EDU arxiv:602.05352v2 [cs.lg] 27 May

More information

Deep Counterfactual Networks with Propensity-Dropout

Deep Counterfactual Networks with Propensity-Dropout with Propensity-Dropout Ahmed M. Alaa, 1 Michael Weisz, 2 Mihaela van der Schaar 1 2 3 Abstract We propose a novel approach for inferring the individualized causal effects of a treatment (intervention)

More information

Factor Modeling for Advertisement Targeting

Factor Modeling for Advertisement Targeting Ye Chen 1, Michael Kapralov 2, Dmitry Pavlov 3, John F. Canny 4 1 ebay Inc, 2 Stanford University, 3 Yandex Labs, 4 UC Berkeley NIPS-2009 Presented by Miao Liu May 27, 2010 Introduction GaP model Sponsored

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide

More information

Learning to Bid Without Knowing your Value. Chara Podimata Harvard University June 4, 2018

Learning to Bid Without Knowing your Value. Chara Podimata Harvard University June 4, 2018 Learning to Bid Without Knowing your Value Zhe Feng Harvard University zhe feng@g.harvard.edu Chara Podimata Harvard University podimata@g.harvard.edu June 4, 208 Vasilis Syrgkanis Microsoft Research vasy@microsoft.com

More information

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007) Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random

More information

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i.

Weighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i. Weighting Unconfounded Homework 2 Describe imbalance direction matters STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University

More information

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE: #+TITLE: Data splitting INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+AUTHOR: Thomas Alexander Gerds #+INSTITUTE: Department of Biostatistics, University of Copenhagen

More information

Restricted Boltzmann Machines for Collaborative Filtering

Restricted Boltzmann Machines for Collaborative Filtering Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem

More information

15-889e Policy Search: Gradient Methods Emma Brunskill. All slides from David Silver (with EB adding minor modificafons), unless otherwise noted

15-889e Policy Search: Gradient Methods Emma Brunskill. All slides from David Silver (with EB adding minor modificafons), unless otherwise noted 15-889e Policy Search: Gradient Methods Emma Brunskill All slides from David Silver (with EB adding minor modificafons), unless otherwise noted Outline 1 Introduction 2 Finite Difference Policy Gradient

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Lecture Topic 4: Chapter 7 Sampling and Sampling Distributions

Lecture Topic 4: Chapter 7 Sampling and Sampling Distributions Lecture Topic 4: Chapter 7 Sampling and Sampling Distributions Statistical Inference: The aim is to obtain information about a population from information contained in a sample. A population is the set

More information