Online Learning from User Interactions through Interventions

CS 7792 - Fall 2016
Thorsten Joachims
Department of Computer Science & Department of Information Science, Cornell University

References
  Y. Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. In COLT.
  P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning. ICML.

Interactive Learning Systems
  Examples:
    - Search engines
    - Entertainment media
    - E-commerce
    - Smart homes / robots
  Learning:
    - Gathering and maintenance of knowledge
    - Measure and optimize performance
    - Personalization
  Interventions

Information Elicitation from the User
  - Dueling Bandits: algorithm-driven exploration
  - Coactive Learning: user-driven exploration

Decide between two Ranking Functions
  Distribution P(x) of x = (user, query), e.g. (tj, "SVM")
  Retrieval Function 1: f_1(x) -> y_1
  Retrieval Function 2: f_2(x) -> y_2
  Which one is better?

  Ranking y_1, with utility U(tj, "SVM", y_1):
    2. SVM-Light Support Vector Machine (http://svmlight.joachims.org/)
    3. School of Veterinary Medicine at UPenn
    4. An Introduction to Support Vector Machines
    5. Service Master Company

  Ranking y_2, with utility U(tj, "SVM", y_2):
    1. School of Veterinary Medicine at UPenn (http://www.vet.upenn.edu/)
    2. Service Master Company (http://www.servicemaster.com/)
    3. Support Vector Machine
    4. Archives of SUPPORT-VECTOR-MACHINES
    5. SVM-Light Support Vector Machine

Measuring Utility

  Name                  | Description                                       | Aggregation | Hypothesized Change with Decreased Quality
  Abandonment Rate      | % of queries with no click                        | N/A         | Increase
  Reformulation Rate    | % of queries that are followed by reformulation   | N/A         | Increase
  Queries per Session   | Session = no interruption of more than 30 minutes | Mean        | Increase
  Clicks per Query      | Number of clicks                                  | Mean        | …
  …                     | % of queries with clicks at position 1            | N/A         | Decrease
  Max Reciprocal Rank*  | 1/rank for highest click                          | Mean        | Decrease
  Mean Reciprocal Rank* | Mean of 1/rank for all clicks                     | Mean        | Decrease
  Time to First Click*  | Seconds before first click                        | Median      | Increase
  Time to Last Click*   | Seconds before final click                        | Median      | Decrease

  (*) only queries with at least one click count

Arxiv.org: Results
  [Figure: absolute metrics for the ranking functions ORIG, FLAT, RAND and ORIG, SWAP2, SWAP4]
  Conclusions:
    - None of the absolute metrics reflects the expected order.
    - Most differences are not significant after one month of data.
    - Analogous results for Yahoo! Search with much more data [Chapelle et al., 2012].
  [Radlinski et al., 2008]
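As an illustration of the click-based metrics in the Measuring Utility table, the following minimal sketch computes four of them from a click log. The log format (one list of 1-based clicked ranks per query, empty if the query was abandoned) and the function name `absolute_metrics` are assumptions made for this example, not part of the original slides.

```python
from statistics import mean

def absolute_metrics(clicks_per_query):
    """Compute absolute click metrics from a hypothetical click log.

    clicks_per_query: one list of clicked ranks (1-based) per query;
    an empty list means the query received no click.
    """
    # The starred metrics only count queries with at least one click.
    with_clicks = [ranks for ranks in clicks_per_query if ranks]
    return {
        "abandonment_rate": 1 - len(with_clicks) / len(clicks_per_query),
        "clicks_per_query": mean(len(ranks) for ranks in clicks_per_query),
        # 1/rank of the highest (top-most) click, averaged over queries.
        "max_reciprocal_rank": mean(1 / min(ranks) for ranks in with_clicks),
        # Mean of 1/rank over all clicks of a query, averaged over queries.
        "mean_reciprocal_rank": mean(
            mean(1 / r for r in ranks) for ranks in with_clicks
        ),
    }

# Toy log: four queries, two of them abandoned.
log = [[1, 3], [], [2], []]
metrics = absolute_metrics(log)
```

The sketch assumes at least one query has a click; otherwise the starred metrics are undefined and `statistics.mean` raises an error.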

A Model of how Users Click in Search
  Model of clicking:
    - Users explore ranking to position k
    - Users click on most relevant (looking) links in top k
    - Users stop clicking when time budget up or other action more promising (e.g. reformulation)
    - Empirically supported by [Granka et al., 2004]
  Click: $\arg\max_{y \in \text{Top } k} U(y)$
  [Figure: example ranking with the entries Support Vector Machine, An Introduction to Support Vector Machines, Archives of SUPPORT-VECTOR-MACHINES, SVM-Light Support Vector Machine]

Balanced Interleaving
  x = (u = tj, q = "svm")
  f_1(x) -> y_1, f_2(x) -> y_2
  Model of User: The better retrieval function is more likely to get more clicks.
  [Figure: rankings y_1 and y_2 and the result of Interleaving(y_1, y_2); entries include Support Vector Machine, SVM-Light Support Vector Machine, An Introduction to Support Vector Machines, Support Vector Machine and Kernel... References, Archives of SUPPORT-VECTOR-MACHINES, Lucent Technologies: SVM demo applet, Royal Holloway Support Vector Machine]
  Invariant: For all k, the top k of the balanced interleaving is the union of the top k_1 of r_1 and the top k_2 of r_2 with $k_1 = k_2 \pm 1$.
  Interpretation: $(y_1 \succ y_2) \Leftrightarrow \text{clicks}(\text{topk}(y_1)) > \text{clicks}(\text{topk}(y_2))$
  see also [Radlinski, Craswell, 2012] [Hofmann, 2012]
  [Joachims, 2001] [Radlinski et al., 2008]

Arxiv.org: Interleaving Results
  [Figure: Percent Wins, with bars labeled "% wins ORIG" and "% wins RAND"]
  Conclusions:
    - All interleaving experiments reflect the expected order.
    - All differences are significant after one month of data.
    - Same results also for alternative data-preprocessing.

Yahoo and Bing: Interleaving Results
  Yahoo Web Search [Chapelle et al., 2012]
    - Four retrieval functions (i.e. 6 paired comparisons)
    - Balanced Interleaving
    - All paired comparisons consistent with ordering by NDCG.
  Bing Web Search [Radlinski & Craswell, 2010]
    - Five retrieval function pairs
    - Team-Game Interleaving
    - Consistent with ordering by NDCG when NDCG significant.

Efficiency: Interleaving vs. Explicit
  Bing Web Search
    - 4 retrieval function pairs
    - ~12k manually judged queries
    - ~200k interleaved queries
  Experiment
    - p = probability that NDCG is correct on subsample of size y
    - x = number of queries needed to reach same p-value with interleaving
  Ten interleaved queries are equivalent to one manually judged query.
  [Radlinski & Craswell, 2010]

Information Elicitation from the User
  - Dueling Bandits: algorithm-driven exploration
  - Coactive Learning: user-driven exploration
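The Balanced Interleaving slide states an invariant and a click-based interpretation; one way to realize both is sketched below. This follows the commonly described balanced interleaving procedure and is an assumption about the details, not code from the lecture: the function names, the `a_first` coin flip argument, and the rule for choosing k from the lowest click are illustrative choices.

```python
def balanced_interleave(a, b, a_first):
    """Merge rankings a and b so that every prefix of the result is the union
    of the top k_a of a and the top k_b of b with k_a and k_b differing by at
    most one. a_first says which ranking contributes first on ties."""
    result, ka, kb = [], 0, 0
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            if a[ka] not in result:
                result.append(a[ka])
            ka += 1
        else:
            if b[kb] not in result:
                result.append(b[kb])
            kb += 1
    return result

def preferred(a, b, interleaved, clicked):
    """Return 1 if a gets more clicks in its top k, -1 if b does, 0 on a tie.
    k is the smallest depth in a or b that contains the lowest clicked result."""
    clicked = set(clicked)
    positions = [i for i, doc in enumerate(interleaved) if doc in clicked]
    if not positions:
        return 0
    lowest = interleaved[max(positions)]
    k = min(r.index(lowest) + 1 for r in (a, b) if lowest in r)
    clicks_a = sum(doc in clicked for doc in a[:k])
    clicks_b = sum(doc in clicked for doc in b[:k])
    return (clicks_a > clicks_b) - (clicks_a < clicks_b)

# Toy rankings over hypothetical document ids.
y1 = ["a", "b", "c", "d"]
y2 = ["b", "e", "a", "f"]
mixed = balanced_interleave(y1, y2, a_first=True)
```

In a live experiment `a_first` would be chosen by a fair coin flip per query, and the per-query outcomes of `preferred` would be aggregated into the percentage of wins for each retrieval function.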

Learning on Operational System
  Example: 4 retrieval functions: A > B >> C > D
    - 10 possible pairs for interactive experiment
    - (A,B): low cost to user
    - (A,C): medium cost to user
    - (C,D): high cost to user
    - (A,A): zero cost to user
  Minimizing Regret
    - Don't present bad pairs more often than necessary
    - Trade off (long term) informativeness and (short term) cost
    - Definition: Probability of $(f_t, f_t')$ losing against the best $f^*$

Dueling Bandits Problem
  $R(A) = \sum_{t=1}^{T} \left[ P(f^* \succ f_t) - 0.5 \right] + \left[ P(f^* \succ f_t') - 0.5 \right]$
  [Yue, Broder, Kleinberg, Joachims, 2010]

First Thought: Tournament
  Noisy Sorting/Max Algorithms:
    - [Feige et al.]: Triangle Tournament Heap, $O(n/\epsilon^2 \log(1/\delta))$ with prob $1-\delta$
    - [Adler et al., Karp & Kleinberg]: optimal under weaker assumptions
  [Figure: tournament bracket over candidates X_i]

Algorithm: Interleaved Filter 2

  InterleavedFilter1(T, W = {f_1 ... f_K})
    Pick random f' from W
    δ = 1/(T K^2)
    WHILE |W| > 1
      FOR f ∈ W DO
        duel(f', f)
        update P_f
      t = t + 1
      c_t = (log(1/δ)/t)^0.5
      Remove all f from W with P_f < 0.5 - c_t       [WORSE WITH PROB 1-δ]
      IF there exists f'' with P_f'' > 0.5 + c_t     [BETTER WITH PROB 1-δ]
        Remove f' from W
        Remove all f from W that are empirically inferior to f'
        f' = f''; t = 0
    UNTIL T: duel(f', f')

  [Figure: example run showing win/loss counts against the incumbent.
   With f' = f_3 and candidates f_1, f_2, f_4, f_5: 0/0, 0/0, 0/0, 0/0; then 8/2, 7/3, 4/6, 1/9; then 13/2, 11/4, 7/8, XX.
   After the switch to f' = f_1, the counts of the remaining candidates restart at 0/0 and eliminated candidates are marked XX.]
  Related Algorithms: [Hofmann, Whiteson, Rijke, 2011] [Yue, Joachims, 2009] [Yue, Joachims, 2011]
  [Yue et al., 2009]

Assumptions
  Preference Relation: $f_i \succ f_j \Leftrightarrow P(f_i \succ f_j) = 0.5 + \epsilon_{i,j} > 0.5$
  Weak Stochastic Transitivity: $f_i \succ f_j$ and $f_j \succ f_k \Rightarrow f_i \succ f_k$
    Example: $f_1 \succ f_2 \succ f_3 \succ f_4 \succ f_5 \succ f_6 \succ \dots \succ f_K$
  Strong Stochastic Transitivity: $\epsilon_{i,k} \geq \max\{\epsilon_{i,j}, \epsilon_{j,k}\}$
    Example over $\epsilon_{1,4}, \epsilon_{2,4}, \epsilon_{3,4}, \dots, \epsilon_{6,4}, \dots, \epsilon_{K,4}$
  Stochastic Triangle Inequality: $f_i \succ f_j \succ f_k \Rightarrow \epsilon_{i,k} \leq \epsilon_{i,j} + \epsilon_{j,k}$
    Example: $\epsilon_{1,2} = 0.01$ and $\epsilon_{2,3} = \dots$
  $\epsilon$-Winner exists: $\epsilon = \max_i \{ P(f_1 \succ f_i) - 0.5 \} = \epsilon_{1,2} > 0$
  Theorem: IF2 incurs expected average regret bounded by ...

Who does the exploring? Example 1

Information Elicitation from the User
  - Dueling Bandits: algorithm-driven exploration
  - Coactive Learning: user-driven exploration
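The Interleaved Filter pseudocode above can be exercised with a small simulation. The following is a minimal sketch under several assumptions: the `duel` callback, the preference probabilities of the toy problem, and the choice to stop exploring once the horizon is used up are illustrative, and the final exploitation phase (dueling the winner against itself until T) is omitted.

```python
import math
import random

def interleaved_filter(num_arms, duel, horizon, rng):
    """Sketch of the exploration phase of Interleaved Filter.

    duel(f, g) must return True when f wins the comparison against g.
    Returns the surviving incumbent.
    """
    delta = 1.0 / (horizon * num_arms ** 2)
    active = set(range(num_arms))
    incumbent = rng.choice(sorted(active))
    wins = {f: 0 for f in active}  # wins of each challenger against the incumbent
    t = 0
    duels = 0
    while len(active) > 1 and duels < horizon:
        challengers = sorted(active - {incumbent})
        for f in challengers:
            wins[f] += duel(f, incumbent)
            duels += 1
        t += 1
        c = math.sqrt(math.log(1 / delta) / t)  # confidence radius c_t
        p = {f: wins[f] / t for f in challengers}
        # Drop challengers that are worse than the incumbent with high probability.
        for f in challengers:
            if p[f] < 0.5 - c:
                active.discard(f)
        better = [f for f in challengers if f in active and p[f] > 0.5 + c]
        if better:
            new_incumbent = max(better, key=p.get)
            active.discard(incumbent)
            # Pruning step: also drop everything empirically inferior
            # to the old incumbent.
            for f in challengers:
                if f != new_incumbent and p[f] < 0.5:
                    active.discard(f)
            incumbent = new_incumbent
            wins = {f: 0 for f in active}
            t = 0
    return incumbent

# Toy problem (assumed): arm i beats arm j with probability 0.9 when i < j.
rng = random.Random(0)

def noisy_duel(f, g):
    return rng.random() < (0.9 if f < g else 0.1)

winner = interleaved_filter(4, noisy_duel, horizon=10000, rng=rng)
```

With preferences this well separated, the confidence radius drops below the gap after roughly a hundred rounds, so arm 0 is returned with overwhelming probability.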

Who does the exploring? Example 2

Who does the exploring? Example 3

Coactive Feedback Model
  Interaction: given x
  [Figure: set of all y for context x, showing the algorithm's prediction y_t, the region explored by the user, the improved prediction ȳ_t, and the optimal prediction y_t*]
  Feedback: improved prediction ȳ_t with $U(\bar{y}_t | x_t) > U(y_t | x_t)$
  Supervised learning: optimal prediction y_t* with $y_t^* = \arg\max_y U(y | x_t)$

Machine Translation
  x_t: We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user.
  y_t: Wir schlagen vor, koaktive Learning als ein Modell der Wechselwirkung zwischen einem Lernsystem und menschlichen Benutzer, wobei sowohl die gemeinsame Ziel, die Ergebnisse der maximalen Nutzen für den Benutzer.
  ȳ_t: Wir schlagen vor, koaktive Learning als ein Modell des Dialogs zwischen einem Lernsystem und menschlichen Benutzer, wobei beide das gemeinsame Ziel haben, die Ergebnisse der maximalen Nutzen für den Benutzer zu liefern.

Coactive Preference Perceptron
  Model: Linear model of user utility: $U(y | x) = w^T \phi(x, y)$
  Algorithm:
    FOR t = 1 TO T DO
      Observe x_t
      Present y_t = argmax_y { w_t^T φ(x_t, y) }
      Obtain feedback ȳ_t from user
      Update w_{t+1} = w_t + φ(x_t, ȳ_t) - φ(x_t, y_t)
  This may look similar to a multi-class Perceptron, but
    - Feedback ȳ_t is different (the learner does not get the correct class label)
    - Regret is different (misclassifications vs. utility difference):
      $R(A) = \frac{1}{T} \sum_{t=1}^{T} \left[ U(y_t^* | x_t) - U(y_t | x_t) \right]$
  Never revealed:
    - cardinal feedback
    - optimal y*
  [Shivaswamy, Joachims, 2012]

Coactive Perceptron: Regret Bound
  Model: $U(y | x) = w^T \phi(x, y)$, where w is unknown
  Feedback: ξ-approximately α-informative
    $E[U(\bar{y}_t | x_t)] \geq U(y_t | x_t) + \alpha \left( U(y_t^* | x_t) - U(y_t | x_t) \right) - \xi_t$
    (left-hand side: user feedback; right-hand side: system prediction, plus α times the gap to optimal, minus the model error ξ_t)
  Theorem: For user feedback ȳ that is α-informative in expectation, the expected average regret of the Preference Perceptron is bounded by
    $E\left[ \frac{1}{T} \sum_{t=1}^{T} U(y_t^* | x_t) - U(y_t | x_t) \right] \leq \frac{1}{\alpha T} \sum_{t=1}^{T} \xi_t + \frac{2 R \|w\|}{\alpha \sqrt{T}}$
    (first term: model error; second term: goes to zero as T grows)
  [Shivaswamy, Joachims, 2012]
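To make the Preference Perceptron update and its regret bound concrete, here is a minimal simulation sketch. The setup is assumed for illustration: each context is a finite list of candidate feature vectors φ(x_t, y), the simulated user knows a hidden weight vector `w_star` and always returns the best candidate (so the feedback is α-informative with α = 1 and ξ_t = 0), and all names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def preference_perceptron(contexts, feedback, dim):
    """contexts: list of candidate sets, each a list of feature vectors.
    feedback(candidates, y) returns the index of the user's improved choice."""
    w = [0.0] * dim
    shown = []
    for candidates in contexts:
        # Present the candidate with the highest utility under the current model.
        y = max(range(len(candidates)), key=lambda i: dot(w, candidates[i]))
        y_bar = feedback(candidates, y)
        # Update: w_{t+1} = w_t + phi(x_t, y_bar_t) - phi(x_t, y_t)
        w = [wi + a - b for wi, a, b in zip(w, candidates[y_bar], candidates[y])]
        shown.append(y)
    return w, shown

# Simulated environment (all values are assumptions for this example).
rng = random.Random(0)
dim, T = 5, 2000
w_star = [rng.uniform(-1, 1) for _ in range(dim)]
contexts = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(10)]
            for _ in range(T)]

def best_response(candidates, y):
    # The simulated user returns the utility-maximizing candidate.
    return max(range(len(candidates)), key=lambda i: dot(w_star, candidates[i]))

w, shown = preference_perceptron(contexts, best_response, dim)

avg_regret = sum(
    dot(w_star, c[best_response(c, None)]) - dot(w_star, c[y])
    for c, y in zip(contexts, shown)
) / T
R = max(math.sqrt(dot(phi, phi)) for c in contexts for phi in c)
bound = 2 * R * math.sqrt(dot(w_star, w_star)) / math.sqrt(T)
```

With this noise-free feedback the model-error term vanishes, so `avg_regret` should stay below `bound`, which corresponds to the second term of the theorem with α = 1.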

Preference Perceptron: Experiment
  Experiment: Automatically optimize Arxiv.org Fulltext Search
  Model
    - Analogous to DCG
    - Utility of ranking y for query x: $U_t(y | x) = \sum_i \gamma_i \, w_t^T \phi(x, y^{(i)})$ [~1000 features]
    - Computing argmax ranking: sort by $w_t^T \phi(x, y^{(i)})$
  Feedback
    - Construct ȳ_t from y_t by moving clicked links one position higher.
    - Perturbation [Raman et al., 2013]
  Baseline
    - Handtuned w_base for U_base(y | x)
  Evaluation
    - Interleaving of ranking from U_t(y | x) and U_base(y | x)
  [Figure: Cumulative Win Ratio vs. Number of Feedback, Coactive Learning compared with Baseline]
  [Raman et al., 2013]

Information Elicitation, Interventions, Decisions, Feedback, Learning
  Interactive learning system:
    - y_t dependent on x_t
    - x_{t+1} dependent on y_t (e.g. click given ranking, new query)
    - Design!
    - Model!
  Dueling Bandits
    - Model: Pairwise comparison test $P(y_i \succ y_j \mid U(y_i) > U(y_j))$
    - Algorithm: Interleaved Filter 2, $O(|Y| \log(T))$ regret
  Coactive Learning
    - Model: for given y, user provides ȳ with $U(\bar{y} | x) > U(y | x)$
    - Algorithm: Preference Perceptron, $O(\|w\| T^{0.5})$ regret

Running Interactive Learning Experiments
  1) Build your own system and provide service
     - a lot of work
     - too little data
  2) Convince others to run your experiments on commercial system
     - good luck with that
  3) Use large-scale historical log data from commercial system

Learning from Human Decisions
  Design Space:
    - Decision Model
    - Utility Model
    - Interaction Experiments
    - Feedback Type
    - Regret
    - Applications
  [Figure: Decision Model, Application, Learning Algorithm]
  Contact: tj@cs.cornell.edu
  Software + Papers:
  Related Fields:
    - Micro Economics
    - Decision Theory
    - Econometrics
    - Psychology
    - Communications
    - Cognitive Science
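The ranking model and the click-based feedback construction of the Arxiv.org experiment can be sketched as follows. This is an illustrative sketch only: the position discount γ_i = 1/log2(i+1) is an assumption borrowed from the usual DCG definition, and the function names and toy features are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_by_score(w, doc_features):
    """Argmax ranking: sort documents by w^T phi(x, d), highest first."""
    return sorted(doc_features, key=lambda d: dot(w, doc_features[d]), reverse=True)

def dcg_utility(w, ranking, doc_features):
    """U(y|x) = sum_i gamma_i * w^T phi(x, y_i), with an assumed DCG discount."""
    return sum(
        dot(w, doc_features[d]) / math.log2(i + 1)
        for i, d in enumerate(ranking, start=1)
    )

def move_clicked_up(ranking, clicked):
    """Construct the feedback ranking by moving each clicked link one
    position higher, unless the link above it was clicked as well."""
    y_bar = list(ranking)
    for i in range(1, len(y_bar)):
        if y_bar[i] in clicked and y_bar[i - 1] not in clicked:
            y_bar[i - 1], y_bar[i] = y_bar[i], y_bar[i - 1]
    return y_bar

# Toy example with two features per document (values are made up).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
w_t = [1.0, 0.2]
y_t = rank_by_score(w_t, features)
y_bar_t = move_clicked_up(y_t, clicked={"b"})
```

Because the discount is positive and decreasing, sorting by score maximizes the utility under the current weights, so the feedback ranking `y_bar_t` has lower utility under `w_t` but, by the coactive assumption, higher utility for the user.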
