Recommendations as Treatments: Debiasing Learning and Evaluation

1 ICML 2016, NYC
Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims
Cornell University, Google
Funded in part through NSF Awards IIS , IIS , IIS

2 Movie recommendation
[Figure: Romance and Horror movies; true rating matrix Y beside the observation indicator matrix O (observed yes/no).]
Data is Missing Not At Random (MNAR).
Example adapted from (Steck et al., 2010)

3 Selection Bias in Recommendation
Why is there selection bias?
o User-induced bias (e.g., browsing)
o System-induced bias (e.g., advertising)
Question: What happens if we ignore selection bias?
(Marlin et al., 2007; Steck, 2011; Hernández-Lobato et al., 2014)

4 Evaluating Recommendations under Selection Bias
[Figure: Romance and Horror movies; recommendations $\hat{Y}$, observation indicators O (observed yes/no), and true ratings Y.]
Observed ratings are misleading due to selection bias.

5 Evaluating Predicted Ratings under Selection Bias
[Figure: Romance and Horror movies; predicted ratings $\hat{Y}_1$ (worse) and $\hat{Y}_2$ (better).]

7 Evaluating Predicted Ratings under Selection Bias
[Figure: Romance and Horror movies; predicted ratings $\hat{Y}_1$ (worse) and $\hat{Y}_2$ (better).]
Observed losses are misleading due to selection bias.
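
A toy illustration of how observed losses can mislead. All numbers are invented for this sketch and are not taken from the talk: two hypothetical users rate only movies from the genre they like, so the average loss over observed entries prefers the predictor that is worse on the full matrix.

```python
import numpy as np

# Hypothetical 2 users x 4 movies; columns 0-1 are romance, 2-3 are horror.
Y = np.array([[5, 5, 1, 1],
              [1, 1, 5, 5]], dtype=float)   # true ratings
# Selection bias: each user only rates the genre they like.
O = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=bool)    # observation indicators

Y_hat_1 = np.full_like(Y, 5.0)              # predicts 5 everywhere (worse overall)
Y_hat_2 = np.where(Y == 5, 4.0, 1.0)        # off by one on liked movies only (better)

def true_mse(Y, Y_hat):
    """Mean squared error over all entries (needs the full matrix)."""
    return float(np.mean((Y - Y_hat) ** 2))

def naive_mse(Y, Y_hat, O):
    """Mean squared error over observed entries only."""
    return float(np.mean((Y[O] - Y_hat[O]) ** 2))

print(true_mse(Y, Y_hat_1), naive_mse(Y, Y_hat_1, O))  # 8.0 0.0
print(true_mse(Y, Y_hat_2), naive_mse(Y, Y_hat_2, O))  # 0.5 1.0
```

On the full matrix the second predictor is clearly better (0.5 versus 8.0), yet the naive observed-only average ranks the first one ahead (0.0 versus 1.0).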

8 Recommendations as Treatments Question: How can we fix the effects of selection bias? o Connection to potential outcomes framework Counterfactual Outcomes Y treatments movies Observed Outcomes Y treatments patients users patients Understand assignment mechanism (Imbens & Ruben, 2015) 8

9 Debiasing Evaluation
Assignment mechanism for recommendation: propensities P
o $P_{u,i} = P(O_{u,i} = 1)$
[Figure: propensity matrix P with entries p, p/10, and p/2.]
Use the Inverse-Propensity-Scoring (IPS) estimator to obtain an unbiased estimate:

$$\hat{R}_{IPS}(\hat{Y} \mid P) = \frac{1}{U \cdot I} \sum_{(u,i):\, O_{u,i}=1} \frac{1}{P_{u,i}} \left(Y_{u,i} - \hat{Y}_{u,i}\right)^2$$

(Little & Rubin, 2002; Cortes et al., 2008; Bickel et al., 2009; Sugiyama & Kawanabe, 2012)
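
A minimal NumPy sketch of the IPS estimator for squared loss, followed by an exact check of unbiasedness on a tiny matrix. The function name `ips_mse`, the 2x2 data, and the propensity values are hypothetical; the check assumes observations are drawn independently with probability $P_{u,i}$.

```python
from itertools import product

import numpy as np

def ips_mse(Y, Y_hat, O, P):
    """IPS estimate of the MSE over all U*I entries, using observed entries only."""
    Y, Y_hat, P = (np.asarray(a, dtype=float) for a in (Y, Y_hat, P))
    O = np.asarray(O, dtype=bool)
    U, I = Y.shape
    weighted = (Y[O] - Y_hat[O]) ** 2 / P[O]   # each observed loss times 1 / P_ui
    return float(weighted.sum() / (U * I))

# Hypothetical true ratings, predictions, and propensities.
Y = np.array([[5.0, 1.0], [1.0, 5.0]])
Y_hat = np.array([[4.0, 3.0], [1.0, 2.0]])
P = np.array([[0.8, 0.2], [0.1, 0.5]])

# Exact expectation: enumerate all 2^4 observation patterns, weight each by its probability.
expected_ips = 0.0
for bits in product([0, 1], repeat=Y.size):
    O = np.array(bits, dtype=bool).reshape(Y.shape)
    prob = np.prod(np.where(O, P, 1.0 - P))
    expected_ips += prob * ips_mse(Y, Y_hat, O, P)

true_mse = float(np.mean((Y - Y_hat) ** 2))
print(expected_ips, true_mse)  # the two agree up to floating-point error
```

The expectation of the estimate equals the true MSE even though most patterns observe only part of the matrix; the price is higher variance when some $P_{u,i}$ are small.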

10 Propensity estimation
Two settings:
o Experimental: propensities are under our control and known by design (e.g., ad placement)
o Observational: users self-select; $P_{u,i}$ must be estimated
Estimate the parameters of binary random variables: $P_{u,i} = P(O_{u,i} = 1 \mid X, Y)$
Variety of models: Logistic Regression, Naïve Bayes, etc.
[Figure: observation matrix O.]
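
One way the observational setting could look in code, using Naïve Bayes, one of the model families named above. This is a sketch under stated assumptions, not necessarily the estimator used in the talk: the propensity depends only on the rating value, and a small sample of ratings that is missing at random (`mar_ratings`, a hypothetical name) is available to estimate the marginal $P(Y = r)$.

```python
import numpy as np

def naive_bayes_propensities(observed_ratings, n_entries, mar_ratings, levels):
    """Estimate P(O=1 | Y=r) = P(Y=r | O=1) * P(O=1) / P(Y=r) for each level r.

    Assumes every level in `levels` occurs at least once in `mar_ratings`.
    """
    observed_ratings = np.asarray(observed_ratings)
    mar_ratings = np.asarray(mar_ratings)
    p_obs = observed_ratings.size / n_entries            # P(O=1)
    propensity = {}
    for r in levels:
        p_r_given_obs = np.mean(observed_ratings == r)   # P(Y=r | O=1)
        p_r = np.mean(mar_ratings == r)                  # P(Y=r), from the MAR sample
        propensity[r] = float(p_r_given_obs * p_obs / p_r)
    return propensity

# Hypothetical data: 20 of 100 entries observed, mostly high ratings.
observed = [5] * 15 + [1] * 5
mar_sample = [5] * 10 + [1] * 10        # MAR sample: both ratings equally common
props = naive_bayes_propensities(observed, n_entries=100,
                                 mar_ratings=mar_sample, levels=(1, 5))
print(props)  # roughly 0.1 for rating 1 and 0.3 for rating 5
```

In this example a rating of 5 is about three times as likely to be observed as a rating of 1, which is the kind of imbalance the propensity weights are meant to correct.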

11 Debiasing Evaluation
Robustness to selection bias:
[Figure: two panels; x-axis: severity of selection bias.]

12 Debiasing Evaluation
Robustness to inaccurate propensities:
[Figure: two panels; x-axis: more accurate propensities; label: IPS-est.]

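13 Debiasing Learning
Empirical Risk Minimization (ERM) is successful in many settings (Cortes & Vapnik, 1995).
Use ERM together with the Inverse-Propensity-Scoring (IPS) estimator:

$$\hat{Y}^{ERM} = \operatorname*{argmin}_{\hat{Y} \in \mathcal{H}} \hat{R}_{IPS}(\hat{Y} \mid P)$$

For matrix factorization with MSE loss:

$$\hat{Y}^{ERM} = \operatorname*{argmin}_{V, W} \sum_{O_{u,i}=1} \frac{1}{P_{u,i}} \left(Y_{u,i} - V_u^\top W_i\right)^2 + \lambda \left(\|V\|_F^2 + \|W\|_F^2\right)$$

where $1 / P_{u,i}$ is the propensity weight.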
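
A small sketch of the propensity-weighted matrix factorization objective, minimized with plain full-batch gradient descent. The optimizer, step size, rank, and the toy data are assumptions chosen for illustration, and `fit_ips_mf` is a hypothetical name; the talk does not specify these details.

```python
import numpy as np

def fit_ips_mf(Y, O, P, rank=2, lam=0.01, lr=0.002, steps=3000, seed=0):
    """Minimize sum_{O=1} (1/P) (Y - V W^T)^2 + lam (||V||_F^2 + ||W||_F^2)."""
    rng = np.random.default_rng(seed)
    U, I = Y.shape
    V = 0.1 * rng.standard_normal((U, rank))    # user factors
    W = 0.1 * rng.standard_normal((I, rank))    # item factors
    weights = np.where(O, 1.0 / P, 0.0)         # propensity weights, zero if unobserved
    Y_obs = np.where(O, Y, 0.0)                 # unobserved ratings are never used
    history = []
    for _ in range(steps):
        err = Y_obs - V @ W.T
        history.append(float(np.sum(weights * err ** 2)
                             + lam * (np.sum(V ** 2) + np.sum(W ** 2))))
        weighted_err = weights * err
        grad_V = -2.0 * weighted_err @ W + 2.0 * lam * V
        grad_W = -2.0 * weighted_err.T @ V + 2.0 * lam * W
        V -= lr * grad_V
        W -= lr * grad_W
    return V, W, history

# Hypothetical 4 users x 4 items with two taste groups.
Y = np.array([[5, 4, 1, 1],
              [4, 5, 2, 1],
              [1, 1, 5, 4],
              [1, 2, 4, 5]], dtype=float)
O = np.array([[1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=bool)
P = np.where(Y >= 4, 0.8, 0.4)   # assumed propensities: high ratings observed more often

V, W, history = fit_ips_mf(Y, O, P)
print(history[0], history[-1])   # objective at the first and last step
```

Entries with small propensity get large weights, so rarely observed kinds of ratings (here, low ratings) count more in the fit than their share of the observed data would suggest.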

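14 Generalization Error
Theoretical insights:
o Additional trade-off between bias and variance
With probability $1 - \eta$, for a hypothesis space of capacity $|\mathcal{H}|$, maximum loss $\Delta$, and estimated propensities $\hat{P}$:

$$R(\hat{Y}^{ERM}) \le \hat{R}_{IPS}(\hat{Y}^{ERM} \mid \hat{P}) + \underbrace{\frac{\Delta}{U \cdot I} \sum_{u,i} \left|1 - \frac{P_{u,i}}{\hat{P}_{u,i}}\right|}_{\text{Bias}} + \underbrace{\frac{\Delta}{U \cdot I} \sqrt{\frac{\log(2|\mathcal{H}|/\eta)}{2}} \sqrt{\sum_{u,i} \frac{1}{\hat{P}_{u,i}^2}}}_{\text{Variance}}$$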
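
To make the trade-off concrete, the sketch below evaluates the bias and variance terms of the bound for exact propensities and for propensities clipped from below. It assumes the bound has the form of an absolute-deviation bias term plus a square-root variance term in the estimated propensities; the function name `bound_terms`, the propensity matrix, and the values of $\Delta$, $|\mathcal{H}|$, and $\eta$ are hypothetical.

```python
import numpy as np

def bound_terms(P, P_hat, delta, h_size, eta):
    """Return the (bias, variance) terms of the bound for estimated propensities P_hat."""
    P = np.asarray(P, dtype=float)
    P_hat = np.asarray(P_hat, dtype=float)
    n = P.size                                            # U * I
    bias = delta / n * np.sum(np.abs(1.0 - P / P_hat))
    variance = (delta / n
                * np.sqrt(np.log(2.0 * h_size / eta) / 2.0)
                * np.sqrt(np.sum(1.0 / P_hat ** 2)))
    return float(bias), float(variance)

# Hypothetical true propensities with one very rarely observed entry.
P = np.array([[0.05, 0.5],
              [0.5, 1.0]])
P_clipped = np.maximum(P, 0.25)    # deliberately overestimate the smallest propensity

exact = bound_terms(P, P, delta=1.0, h_size=100, eta=0.05)
clipped = bound_terms(P, P_clipped, delta=1.0, h_size=100, eta=0.05)
print(exact)     # zero bias, larger variance term
print(clipped)   # bias term of about 0.2, smaller variance term
```

Using the exact propensities removes the bias term entirely, while clipping trades a small bias for a much smaller variance term, since the sum of $1/\hat{P}_{u,i}^2$ drops from 409 to 25 in this example.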

15 Propensity-scored ERM
Approach is modular and discriminative:
1. Pick and estimate a propensity model
2. Use the estimated propensities in the ERM objective
[Figure: discriminative pipeline: observations O, features X, and observed ratings Y feed propensity estimation, then ERM. Generative alternative: complete data model and missing data model with latent variables.]
(Marlin et al., 2007; Steck, 2011; Hernández-Lobato et al., 2014)

16 Debiasing Learning
Results on two real-world datasets:
o COAT: Shopping dataset (300 users; newly collected)
o YAHOO: Song rating dataset (15400 users; Marlin & Zemel, 2009)
Report performance on MAR test data:
o HL: Latest generative approach (Hernández-Lobato et al., 2014)

17 Conclusions
[Figure: observations O, features X, and observed ratings Y feed propensity estimation, then ERM.]
Discriminative propensity scoring:
o Modular
o Directly optimizes target loss
o No latent variables
o Scalable
Data and code:
o ~schnabts/mnar/

Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims
Cornell University, Ithaca, NY, USA
{TBS49, FA234, AS3354, NC475, TJ36}@CORNELL.EDU
arXiv:1602.05352v2 [cs.LG] 27 May 2016