Outline. Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators. Case Studies & Demo Summary

Size: px

Start display at page:

Download "Outline. Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators. Case Studies & Demo Summary"

Marcus Nelson
6 years ago
Views:

1 Outline Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators 1. Self-Normalized Estimator 2. Doubly Robust Estimator 3. Slates Estimator Case Studies & Demo Summary

2 IPS: Issues U ips π = 1 n i π(y i x i ) π 0 (y i x i ) δ i Suppose δ 1 E[Constant] Constant Two variables in IPS δ π(y x) Τπ 0 (y x)

3 Fix: Control Variates Use correlated quantities to control Multiplicative θ E[s] E s i = θ known Τ π(y x) π 0 (y x) variability Additive π(y x) π y x E[ δ] E π 0 (y x) π 0 y x δ s + θ e.g. Self-Normalization e.g. Doubly Robust Estimator

4 Self-Normalized Estimator Use expected sample size as multiplicative control variate sƹ π = 1 n i π(y i x i ) π 0 (y i x i ) E sƹ π = 1 U SNips π = i π(y i x i ) π 0 (y i x i ) δ i i π(y i x i ) π 0 (y i x i ) [Trotter & Tukey, 1952] [Hesterberg, 1983] [Swaminathan & Joachims, 2016]

5 Self-Normalization: Properties E[Constant] = Constant Equivariant Asymptotically consistent Pr lim n U SNips π = U π = 1 Small bias which decays O( 1 n ) while variance decays O( 1 n )

6 News Recommender: Results snips often achieves a better bias-variance trade-off

7 Doubly Robust Estimator Reward Prediction U rp π = 1 E n y~π xi [ መδ x i, y ] i Low variance, High bias U ips π = 1 n i IPS π(y i x i ) π 0 (y i x i ) δ i High variance, No bias U dr π = 1 n i π y i x i π 0 y i x i δ i መδ x i, y i + E y~π xi [ መδ x i, y ] [Langford, Li, Dudik, 2011] [Jiang & Li, 2016]

8 Doubly Robust: Properties Useful when using estimated propensities p i π 0 (y i x i ) Unbiased if, either መδ(x, y) = δ(x, y) Or, p i = π 0 (y i x i ) Default in Vowpal Wabbit

9 News Recommender: Results DR dominates IPS even with a noisy መδ(x, y)

10 Evaluating rankings (slates) y i π(x i ) Exact match of composite actions in logs unlikely Idea: Count per-slot matches

11 Slates Estimator If π 0 samples l documents from a multinomial μ(d x), with replacement U slates π = 1 n i l I y i j = π x i 1 l + μ y i j x i ) j=1 j δ i For general π 0, need to record π 0 (y j = d, y k = d x)

12 Slates Estimator: General π 0 Define: Γ π0 x d, j ; d, k = π 0 y j = d, y k = d x 1 y d, j = I y j = d U slates π = 1 n i E y~π xi 1 T y Γ π0 (x i ) 1 yi δ i Can also develop self-normalized/doubly robust variants [Swaminathan et al, 2016]

13 Slates Estimator: Properties Typically, exponentially better sample complexity than IPS Unbiased if reward decomposes per-slot Φ x s. t. δ x, y = Φ x + Φ x + Φ x + Φ x Can capture higher-order interactions with suitable Γ π0 (x)

14 News Recommender: Results (sn)slates better than IPS but can have asymptotic bias

15 Outline Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators Case Studies & Demo 1. Yahoo Front Page [Li,Chu,Langford,Wang; 2011] 2. Bing Speller [Li,Chen,Kleban,Gupta; 2014] 3. Search Ranking [Swaminathan et al; 2016] Summary

16 Yahoo Front Page y~π(x) Pick STORY F1, F20 to highlight for different users Metric δ: CTR Logging π 0 : Uniform random Setup: Deploy policies, check online IPS correlation

17 Yahoo Front Page: Case Study IPS is quite accurate for several (spatio-temporal) policies

18 Yahoo Front Page: Case Study IPS indeed gives unbiased CTR estimates for different articles

19 Bing Speller Logging π 0 : Pick (possibly many) reformulation candidates Independent Bernoulli per rank Pr Pick d j = Τ α exp β score d j score(d 1 ) Setup: Deploy policies, check online IPS correlation

20 Bing Speller: Case Study IPS reliably estimates CTR, etc. despite non-uniform logging

21 Search Ranking Re-rank 5 out of 8 candidates Metric δ: Logging π 0 : Setup: Time-to-success Utility rate Bootstrap from Uniform/Plackett-Luce Report RMSE vs. bootstrap sample size

22 Plackett-Luce for Slates SoftMax/Multinomial without replacement (renormalize probabilities after each draw)

23 Search Ranking: Case Study Slates RP Slates estimator dominates IPS

24 Outline Offline Evaluation of Online Metrics Counterfactual Estimation Advanced Estimators Case Studies & Demo Summary

25 Evaluation: Key Questions What system π 0 should we deploy? What should we record in our logs? y~π 0 (x) π 0 How can we estimate U(π) using logged data from π 0?

26 What system π 0 to deploy? To enable reliable counterfactual estimation If possible, stochastic with logged randomization If not, log enough to estimate propensities p i Uniform? Around current system? How to explore? typically, bad less risk & better targeted [Also see Online contextual Bandit Algorithms ]

27 What should we record? Log EVERYTHING! x i, y i, δ i, p i To reliably replay π on logged data, Candidate set of actions Features for each candidate Action at the point of randomization Y i f(x i, y) y i [Bottou et al, 2013] [Li,Chen,Kleban,Gupta, 2015]

28 How can we evaluate U(π)? Offline Evaluation of Online Metrics Related: Test collections for offline metrics Model the world Reward Prediction Related: Click models; Collaborative filtering Model the bias in data Off-policy estimator Randomization is essential [Carterette et al, 2010] [Aslam et al, 2009] [Schnabel et al, 2016b] [Chuklin et al, 2015] [Schnabel et al, 2016a]

29 Summary: Off-policy Estimators IPS Estimator Simple, effective fix for non-uniform (biased) data Self-Normalized Estimator Doubly Robust Estimator Slates Estimator

30 Demo: Code Samples Visit Download (Recommend) Code_Data.zip Install Anaconda-Python3; joblib Run experiment: python EvalExperiment.py Plot results: python plot_sigir.py --mode [estimate/error] --path../../logs/ssynth

31 QUESTIONS?

Counterfactual Evaluation and Learning

SIGIR 26 Tutorial Counterfactual Evaluation and Learning Adith Swaminathan, Thorsten Joachims Department of Computer Science & Department of Information Science Cornell University Website: http://www.cs.cornell.edu/~adith/cfactsigir26/