Counterfactual Evaluation and Learning
1 SIGIR 2016 Tutorial: Counterfactual Evaluation and Learning. Adith Swaminathan, Thorsten Joachims. Department of Computer Science & Department of Information Science, Cornell University. Website: Funded in part through NSF Awards IIS-1247637, IIS-1217686, IIS-1513692.
2 Examples. Search engines, entertainment media, e-commerce, smart homes, robots, etc. These user-interactive systems produce logs of user behavior that can be used for evaluating system performance, learning improved systems, gathering knowledge, and personalization.
3 Interactive System Schematic. The system π receives a context x, chooses an action y for x, and accrues utility U(π).
4 Ad Placement. Context x: user and page. Action y: ad that is placed. Feedback δ(x, y): click / no-click.
5 News Recommender. Context x: user. Action y: portfolio of news articles. Feedback δ(x, y): reading time in minutes.
6 Search Engine. Context x: query. Action y: ranking. Feedback δ(x, y): win/loss against a baseline in interleaving.
7 Log Data from Interactive Systems. Data: S = ((x₁, y₁, δ₁), …, (xₙ, yₙ, δₙ)) of contexts, actions, and rewards/losses, collected under the logging policy π₀. This is partial-information (aka contextual bandit) feedback. Properties: contexts xᵢ are drawn i.i.d. from an unknown P(X); actions yᵢ are selected by the existing system π₀: X → Y; feedback δᵢ comes from an unknown function δ: X × Y → ℝ. [Zadrozny et al., 2003] [Langford & Li] [Bottou et al., 2013]
8 Goals for this Tutorial. Use interaction log data S = ((x₁, y₁, δ₁), …, (xₙ, yₙ, δₙ)) for: Evaluation — estimate online measures of some system π offline, where π is typically different from the logging policy π₀ that generated the log. Learning — find a new system π that improves performance over π₀. Do not rely on interactive experiments as in online learning.
9 SIGIR 2016 Tutorial: Counterfactual Evaluation and Learning. PART 1: EVALUATION
10 Evaluation: Outline. Evaluating online metrics offline: A/B testing (on-policy); counterfactual estimation from logs (off-policy). Approach 1: Model the world — estimation via reward prediction. Approach 2: Model the bias — counterfactual model; inverse propensity scoring (IPS) estimator. Advanced estimators: self-normalized IPS estimator; doubly robust estimator; slates estimator. Case studies. Summary & demonstration with code samples.
11 Online Performance Metrics. Example metrics: CTR, revenue, time-to-success, interleaving, etc. The correct choice depends on the application and is not the focus of this tutorial. In this tutorial, the metric is encoded as δ(x, y) [click/payoff/time for the (x, y) pair].
12 System. Definition [Deterministic Policy]: a function y = π(x) that picks action y for context x. Definition [Stochastic Policy]: a distribution π(y|x) that samples action y given context x.
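The two policy types can be sketched in Python. This is a toy illustration only: the contexts, action space, and probabilities below are invented, not part of the tutorial.

```python
import random

ACTIONS = ["A", "B", "C"]  # hypothetical action space Y

def det_policy(x):
    """Deterministic policy: a function y = pi(x)."""
    return ACTIONS[len(x) % len(ACTIONS)]

def stoch_policy(x):
    """Stochastic policy: returns the distribution pi(y|x) over actions."""
    if x == "user1":
        return {"A": 0.7, "B": 0.2, "C": 0.1}
    return {"A": 0.2, "B": 0.3, "C": 0.5}

def sample_action(probs, rng):
    """Draw y ~ pi(Y|x) from the distribution returned by stoch_policy."""
    ys, ps = zip(*sorted(probs.items()))
    return rng.choices(ys, weights=ps, k=1)[0]

rng = random.Random(0)
y_det = det_policy("user1")                     # same y every call
y_stoch = sample_action(stoch_policy("user1"), rng)  # a sampled y
```

Note that the logging system π₀ in later slides is a stochastic policy of exactly this second form.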
13 System Performance. Definition [Utility of Policy]: the expected reward / utility U(π) of policy π is U(π) = ∫∫ δ(x, y) π(y|x) P(x) dx dy, e.g., the reading time of user x for portfolio y.
14 Online Evaluation: A/B Testing. Given S = ((x₁, y₁, δ₁), …, (xₙ, yₙ, δₙ)) collected under π, Û(π) = (1/n) Σᵢ δᵢ. A/B testing: deploy π₁ — draw x ∼ P(X), predict y ∼ π₁(Y|x), get δ(x, y); deploy π₂ — likewise; …; deploy π_H — likewise.
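A minimal on-policy simulation of this estimate. The world model here — contexts, reward function, and the policy — is invented for illustration; the point is only that Û(π) is the plain average of observed rewards under the deployed policy.

```python
import random

def ab_test(policy, delta, n, seed=0):
    """Deploy `policy`, observe rewards, return U_hat = (1/n) sum_i delta_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.choice(["u1", "u2"])                 # x ~ P(X)
        probs = policy(x)                            # pi(Y|x)
        ys, ps = zip(*sorted(probs.items()))
        y = rng.choices(ys, weights=ps, k=1)[0]      # y ~ pi(Y|x)
        total += delta(x, y)                         # observed feedback
    return total / n

# toy reward: user u1 responds to ad A, user u2 to ad B
delta = lambda x, y: 1.0 if (x, y) in {("u1", "A"), ("u2", "B")} else 0.0
pi1 = lambda x: {"A": 0.8, "B": 0.2}
u_hat = ab_test(pi1, delta, n=10000)   # true U(pi1) is 0.5 in this toy world
```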
15 Pros and Cons of A/B Testing. Pros: user-centric measure; no need for manual ratings; no user/expert mismatch. Cons: requires interactive experimental control; risk of fielding a bad or buggy πᵢ; the number of A/B tests is limited; long turnaround time.
16 Evaluating Online Metrics Offline. Online (on-policy A/B test): draw S₁ from π₁ to estimate U(π₁), S₂ from π₂ to estimate U(π₂), …, S₇ from π₇ to estimate U(π₇) — one deployment per policy. Offline (off-policy counterfactual estimates): draw a single log S from π₀ and compute estimates Û(h) for many candidate policies h from that one dataset.
17 Evaluation: Outline (repeated; same as slide 10).
18 Approach 1: Reward Predictor. Idea: use S = ((x₁, y₁, δ₁), …, (xₙ, yₙ, δₙ)) from π₀ to estimate a reward predictor δ̂(x, y). Deterministic π: simulated A/B testing with the predicted δ̂(x, y) — for actions yᵢ = π(xᵢ) from the new policy π, generate the predicted log S' = ((x₁, y₁, δ̂(x₁, y₁)), …, (xₙ, yₙ, δ̂(xₙ, yₙ))) and estimate the performance of π via Û_rp(π) = (1/n) Σᵢ δ̂(xᵢ, yᵢ). Stochastic π: Û_rp(π) = (1/n) Σᵢ Σ_y δ̂(xᵢ, y) π(y|xᵢ).
19 Regression for Reward Prediction. Learn δ̂: X × Y → ℝ. 1. Represent via features Ψ(x, y). 2. Learn a regression based on Ψ(x, y) from S collected under π₀. 3. Predict δ̂(x, y) for y = π(x) of the new policy π.
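A sketch of this recipe, using per-(x, y) averages as the simplest possible "regression" (a tabular stand-in for the featurized model on the slide; the log and policy are hypothetical):

```python
from collections import defaultdict

def fit_reward_predictor(log):
    """Tabular 'regression': delta_hat(x, y) = mean reward observed for (x, y).
    Cells never logged under pi0 fall back to the global mean -- this fallback
    is exactly where modeling and selection bias enter."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y, d in log:
        sums[(x, y)] += d
        counts[(x, y)] += 1
    total = sum(counts.values())
    gmean = sum(sums.values()) / total if total else 0.0
    def delta_hat(x, y):
        c = counts[(x, y)]
        return sums[(x, y)] / c if c else gmean
    return delta_hat

def u_rp(policy, delta_hat, contexts):
    """U_rp(pi) = (1/n) sum_i delta_hat(x_i, pi(x_i)) for deterministic pi."""
    return sum(delta_hat(x, policy(x)) for x in contexts) / len(contexts)

# hypothetical log of (x, y, delta) triples collected under pi0
log = [("u1", "A", 1.0), ("u1", "A", 0.0), ("u2", "B", 1.0)]
dh = fit_reward_predictor(log)
pi_new = lambda x: "B"          # pi_new picks "B" -- never logged for u1
est = u_rp(pi_new, dh, ["u1", "u2"])
```

For ("u1", "B") the predictor has no data and silently falls back, which previews the bias problems on the next slides.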
20 News Recommender: Experimental Setup. Context x: user profile. Action y: ranking — pick from 7 candidates to place into 3 slots. Reward δ: revenue, a complicated hidden function. Logging policy π₀: non-uniform randomized logging system — Plackett-Luce exploration around the current production ranker (see case study).
21 News Recommender: Results. RP is inaccurate even with more training and logged data.
22 Problems of the Reward Predictor. Modeling bias: the choice of features and model. Selection bias: π₀'s actions are overrepresented in the log. Û_rp(π) = (1/n) Σᵢ δ̂(xᵢ, π(xᵢ)) can be unreliable and biased.
23 Evaluation: Outline (repeated; same as slide 10).
24 Approach 2: Model the Bias. Idea: fix the mismatch between the distribution π₀(Y|x) that generated the data and the distribution π(Y|x) we aim to evaluate: U(π) = ∫∫ δ(x, y) π(y|x) P(x) dx dy.
25 Counterfactual Model. Example: treating heart attacks. Patients xᵢ, i = 1, …, n. Treatments Y: bypass / stent / drugs. Chosen treatment for patient xᵢ: yᵢ. Outcomes δᵢ: 5-year survival (0/1). Which treatment is best?
26 Counterfactual Model. Example: placing a vertical. The heart-attack setup carries over to search: the treatments are position 1 / position 2 / position 3 for the vertical on the SERP, and the outcome is click / no click.
27 Counterfactual Model. Example: treating heart attacks (continued). What if everybody got drugs? Everybody stent? Everybody bypass? Averaging only the factual outcomes gives drugs 3/4, stent 2/3, bypass 2/4 — really?
28 Treatment Effects. Average treatment effect of treatment y: U(y) = (1/n) Σᵢ δ(xᵢ, y), a sum over both the factual and the counterfactual outcomes. Example: U(bypass) = 0.5, U(stent) = 0.7, U(drugs) = 0.4.
29 Assignment Mechanism. Probabilistic treatment assignment: for patient i, treatment is assigned with probability π₀(Yᵢ = y | xᵢ), which creates selection bias. Inverse propensity score estimator: with propensity pᵢ = π₀(Yᵢ = yᵢ | xᵢ), Û_ips(y) = (1/n) Σᵢ 𝟙{yᵢ = y} δ(xᵢ, yᵢ) / pᵢ. Unbiased — E[Û(y)] = U(y) — if π₀(Yᵢ = y | xᵢ) > 0 for all i. Example: Û_ips(drugs) = 0.36 < 0.75 (the naive estimate).
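The naive average versus the IPS correction can be sketched on a synthetic log. The propensities and outcomes below are invented (not the slide's patient table); they are chosen so that the high-propensity treatment looks better than it is.

```python
def u_naive(log, y):
    """Average outcome among patients who actually received treatment y."""
    outcomes = [d for (yi, d, p) in log if yi == y]
    return sum(outcomes) / len(outcomes)

def u_ips(log, y):
    """IPS: U_hat(y) = (1/n) sum_i 1{y_i = y} * delta_i / p_i."""
    n = len(log)
    return sum(d / p for (yi, d, p) in log if yi == y) / n

# synthetic (chosen treatment, outcome, propensity p_i = pi0(y_i|x_i)) tuples;
# drugs were assigned with high probability, so the naive average over-counts them
log = [("drugs", 1, 0.9), ("drugs", 1, 0.9), ("drugs", 0, 0.9),
       ("stent", 1, 0.1), ("stent", 0, 0.1)]
naive = u_naive(log, "drugs")   # 2/3
ips = u_ips(log, "drugs")       # (1/0.9 + 1/0.9) / 5, about 0.44
```

Dividing each outcome by its propensity down-weights over-sampled treatments, mirroring the slide's Û_ips(drugs) < naive estimate.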
30 Experimental vs. Observational. Controlled experiment: the assignment mechanism is under our control; propensities pᵢ = π₀(Yᵢ = yᵢ | xᵢ) are known by design; requirement: π₀(Yᵢ = y | xᵢ) > 0 for all y (probabilistic). Observational study: the assignment mechanism is not under our control; propensities pᵢ need to be estimated — estimate π₀(Yᵢ | zᵢ) = π₀(Yᵢ | xᵢ) based on features zᵢ; requirement: π₀(Yᵢ | zᵢ) = π₀(Yᵢ | δᵢ, zᵢ) (unconfounded).
31 Conditional Treatment Policies. Policy (deterministic): context xᵢ describes the patient; pick treatment yᵢ based on xᵢ: yᵢ = π(xᵢ). Example policy: π(A) = drugs, π(B) = stent, π(C) = bypass. Average treatment effect: U(π) = (1/n) Σᵢ δ(xᵢ, π(xᵢ)). IPS estimator: Û_ips(π) = (1/n) Σᵢ 𝟙{yᵢ = π(xᵢ)} δ(xᵢ, yᵢ) / pᵢ.
32 Stochastic Treatment Policies. Policy (stochastic): context xᵢ describes the patient; pick treatment y based on xᵢ with probability π(y|xᵢ). Note: the assignment mechanism is a stochastic policy as well! Average treatment effect: U(π) = (1/n) Σᵢ Σ_y δ(xᵢ, y) π(y|xᵢ). IPS estimator: Û(π) = (1/n) Σᵢ (π(yᵢ|xᵢ) / pᵢ) δ(xᵢ, yᵢ).
33 Counterfactual Model = Logs. (xᵢ, yᵢ, δᵢ, pᵢ are recorded in the log.)

                    Medical             Search Engine    Ad Placement     Recommender
  Context xᵢ        diagnostics         query            user + page      user + movie
  Treatment yᵢ      bypass/stent/drugs  ranking          placed ad        watched movie
  Outcome δᵢ        5-year survival     click metric     click / no click star rating
  Propensities pᵢ   controlled (*)      controlled       controlled       observational
  New policy π      FDA guidelines      ranker           ad placer        recommender

Treatment effect U(π): the average quality of the new policy.
34 Evaluation: Outline (repeated; same as slide 10).
35 System Evaluation via Inverse Propensity Scoring. Definition [IPS Utility Estimator]: given S = ((x₁, y₁, δ₁), …, (xₙ, yₙ, δₙ)) collected under π₀, Û_ips(π) = (1/n) Σᵢ δᵢ π(yᵢ|xᵢ) / π₀(yᵢ|xᵢ), where π₀(yᵢ|xᵢ) is the propensity pᵢ. This is an unbiased estimate of utility for any π, if the propensity is nonzero whenever π(yᵢ|xᵢ) > 0. Note: if π = π₀, this reduces to the online A/B test with Û_ips(π₀) = (1/n) Σᵢ δᵢ — off-policy vs. on-policy estimation. [Horvitz & Thompson, 1952] [Rubin, 1983] [Zadrozny et al., 2003] [Li et al., 2011]
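The definition translates directly into code. The logged tuples and target policy below are hypothetical; the sanity check at the end exercises the slide's note that evaluating π₀ itself recovers the on-policy A/B average.

```python
def u_ips(log, pi):
    """IPS estimator: U_hat(pi) = (1/n) sum_i delta_i * pi(y_i|x_i) / p_i,
    where p_i = pi0(y_i|x_i) is the propensity recorded in the log."""
    return sum(d * pi(y, x) / p for (x, y, d, p) in log) / len(log)

# hypothetical logged tuples (x_i, y_i, delta_i, p_i) collected under pi0
log = [("u1", "A", 1.0, 0.5), ("u1", "B", 0.0, 0.5),
       ("u2", "B", 1.0, 0.8), ("u2", "A", 0.0, 0.2)]

pi_new = lambda y, x: {"A": 0.9, "B": 0.1}[y]        # target policy pi(y|x)
est = u_ips(log, pi_new)

# evaluating pi0 itself: weights cancel, leaving the on-policy mean of delta
pi0 = lambda y, x: {("u1", "A"): 0.5, ("u1", "B"): 0.5,
                    ("u2", "B"): 0.8, ("u2", "A"): 0.2}[(x, y)]
on_policy = u_ips(log, pi0)
```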
36 Illustration of IPS. IPS estimator: Û_ips(π) = (1/n) Σᵢ (π(yᵢ|xᵢ) / π₀(yᵢ|xᵢ)) δᵢ. [Figure: logging distribution π₀(Y|x) vs. target distribution π(Y|x).]
37 IPS Estimator is Unbiased.
E[Û_ips(π)] = Σ_{x₁,y₁} ⋯ Σ_{xₙ,yₙ} [ (1/n) Σᵢ (π(yᵢ|xᵢ) / π₀(yᵢ|xᵢ)) δ(xᵢ, yᵢ) ] π₀(y₁|x₁)P(x₁) ⋯ π₀(yₙ|xₙ)P(xₙ)
= (1/n) Σᵢ Σ_{xᵢ,yᵢ} (π(yᵢ|xᵢ) / π₀(yᵢ|xᵢ)) δ(xᵢ, yᵢ) π₀(yᵢ|xᵢ) P(xᵢ)   [the other n−1 samples marginalize to 1]
= (1/n) Σᵢ Σ_{xᵢ,yᵢ} π(yᵢ|xᵢ) P(xᵢ) δ(xᵢ, yᵢ)   [propensities cancel; requires probabilistic assignment, π₀(y|x) > 0]
= (1/n) Σᵢ U(π)
= U(π)
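The derivation can be checked empirically: in a tiny discrete world (all distributions invented for illustration), the average of many IPS estimates, each from its own small log drawn under π₀, converges to the exact U(π).

```python
import random

# tiny discrete world: 2 contexts, 2 actions (assumed for illustration)
P_X = {"u1": 0.5, "u2": 0.5}                                    # P(X)
DELTA = {("u1", "A"): 1.0, ("u1", "B"): 0.2,
         ("u2", "A"): 0.0, ("u2", "B"): 1.0}                    # delta(x, y)
PI0 = {"u1": {"A": 0.5, "B": 0.5}, "u2": {"A": 0.5, "B": 0.5}}  # logging pi0
PI  = {"u1": {"A": 0.9, "B": 0.1}, "u2": {"A": 0.1, "B": 0.9}}  # target pi

def true_utility(pi):
    """U(pi) = sum_x P(x) sum_y delta(x, y) pi(y|x), computed exactly."""
    return sum(px * sum(DELTA[(x, y)] * pyx for y, pyx in pi[x].items())
               for x, px in P_X.items())

def ips_estimate(rng, n=50):
    """Draw one log of size n under PI0 and return U_ips(PI) on that log."""
    total = 0.0
    for _ in range(n):
        x = "u1" if rng.random() < P_X["u1"] else "u2"           # x ~ P(X)
        y = "A" if rng.random() < PI0[x]["A"] else "B"           # y ~ pi0(Y|x)
        total += DELTA[(x, y)] * PI[x][y] / PI0[x][y]
    return total / n

rng = random.Random(1)
avg = sum(ips_estimate(rng) for _ in range(2000)) / 2000
# avg should approach true_utility(PI) = 0.91 up to Monte-Carlo noise
```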
38 News Recommender: Results. IPS eventually beats RP; its variance decays as O(1/n).
39 Adith takes over
More informationSummary and discussion of The central role of the propensity score in observational studies for causal effects
Summary and discussion of The central role of the propensity score in observational studies for causal effects Statistics Journal Club, 36-825 Jessica Chemali and Michael Vespe 1 Summary 1.1 Background
More informationOptimal ranking of online search requests for long-term revenue maximization. Pierre L Ecuyer
1 Optimal ranking of online search requests for long-term revenue maximization Pierre L Ecuyer Patrick Maille, Nicola s Stier-Moses, Bruno Tuffin Technion, Israel, June 2016 Search engines 2 Major role
More informationRecommendation Systems
Recommendation Systems Pawan Goyal CSE, IITKGP October 21, 2014 Pawan Goyal (IIT Kharagpur) Recommendation Systems October 21, 2014 1 / 52 Recommendation System? Pawan Goyal (IIT Kharagpur) Recommendation
More informationarxiv: v2 [cs.lg] 27 May 2016
Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims Cornell University, Ithaca, NY, USA {TBS49, FA234, AS3354, NC475, TJ36}@CORNELL.EDU arxiv:602.05352v2 [cs.lg] 27 May
More informationDeep Counterfactual Networks with Propensity-Dropout
with Propensity-Dropout Ahmed M. Alaa, 1 Michael Weisz, 2 Mihaela van der Schaar 1 2 3 Abstract We propose a novel approach for inferring the individualized causal effects of a treatment (intervention)
More informationFactor Modeling for Advertisement Targeting
Ye Chen 1, Michael Kapralov 2, Dmitry Pavlov 3, John F. Canny 4 1 ebay Inc, 2 Stanford University, 3 Yandex Labs, 4 UC Berkeley NIPS-2009 Presented by Miao Liu May 27, 2010 Introduction GaP model Sponsored
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationLearning to Bid Without Knowing your Value. Chara Podimata Harvard University June 4, 2018
Learning to Bid Without Knowing your Value Zhe Feng Harvard University zhe feng@g.harvard.edu Chara Podimata Harvard University podimata@g.harvard.edu June 4, 208 Vasilis Syrgkanis Microsoft Research vasy@microsoft.com
More informationDouble Robustness. Bang and Robins (2005) Kang and Schafer (2007)
Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random
More informationWeighting. Homework 2. Regression. Regression. Decisions Matching: Weighting (0) W i. (1) -å l i. )Y i. (1-W i 3/5/2014. (1) = Y i.
Weighting Unconfounded Homework 2 Describe imbalance direction matters STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University
More informationData splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:
#+TITLE: Data splitting INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+AUTHOR: Thomas Alexander Gerds #+INSTITUTE: Department of Biostatistics, University of Copenhagen
More informationRestricted Boltzmann Machines for Collaborative Filtering
Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem
More information15-889e Policy Search: Gradient Methods Emma Brunskill. All slides from David Silver (with EB adding minor modificafons), unless otherwise noted
15-889e Policy Search: Gradient Methods Emma Brunskill All slides from David Silver (with EB adding minor modificafons), unless otherwise noted Outline 1 Introduction 2 Finite Difference Policy Gradient
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationLecture Topic 4: Chapter 7 Sampling and Sampling Distributions
Lecture Topic 4: Chapter 7 Sampling and Sampling Distributions Statistical Inference: The aim is to obtain information about a population from information contained in a sample. A population is the set
More information