Web Search and Text Mining. Learning from Preference Data

Elisabeth Chase

1 Web Search and Text Mining Learning from Preference Data

2 Outline
1) Two-stage algorithm: learning a preference function, then finding a total order that best agrees with it; 2) learning ranking functions from preference data; 3) learning ranking functions from combined labeled and preference data.

3 Ranking Problem and Preference Judgments
Ranking problem: rank a list of items according to a certain underlying criterion. Preference judgments: statements that one item should be ranked higher than another. Problem set-up: 1) learn a preference function from a set of preference judgments (preference data); 2) for a new list of items, apply the learned preference function; 3) find a total order of the items that best agrees with the preference function.

4 Preference Functions
We assume each item is represented by a feature vector $x \in \mathcal{X}$. A preference function is a map $g : \mathcal{X} \times \mathcal{X} \to [0, 1]$, so $g(u, v) \in [0, 1]$. Interpretation: 1) $g(u, v)$ close to 1: u is ranked higher than v; 2) $g(u, v)$ close to 0: v is ranked higher than u; 3) $g(u, v)$ close to 1/2: no preference between u and v.

5 Learning a Preference Function
Given preference data $S = \{\langle x_i, y_i \rangle \mid x_i \succ y_i,\ i = 1, \dots, N\}$, we can turn it into a binary classification problem with training set $\{(\langle x_i, y_i \rangle, +1), (\langle y_i, x_i \rangle, -1) \mid i = 1, \dots, N\}$. Many choices of classifier are possible: SVM, AdaBoost, etc.
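A minimal Python sketch of this reduction, assuming preferences arrive as (x_i, y_i) tuples of feature vectors with x_i preferred. To stay dependency-free it swaps in a hand-rolled logistic regression on the difference u - v instead of an SVM or AdaBoost; that model choice and the names to_classification and train_preference_function are illustrative assumptions. Logistic regression conveniently returns g(u, v) in [0, 1], matching the definition of a preference function.

```python
import math

def to_classification(pairs):
    """Turn each preference x ≻ y into (<x, y>, +1) and (<y, x>, -1)."""
    examples = []
    for x, y in pairs:
        examples.append(((x, y), +1))
        examples.append(((y, x), -1))
    return examples

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_preference_function(pairs, epochs=200, lr=0.1):
    """Fit g(u, v) = sigmoid(w . (u - v)) by stochastic gradient steps on the log loss.

    Assumption: a linear model on u - v stands in for the slide's SVM / AdaBoost.
    """
    w = [0.0] * len(pairs[0][0])
    examples = to_classification(pairs)
    for _ in range(epochs):
        for (u, v), label in examples:
            diff = [a - b for a, b in zip(u, v)]
            target = 1.0 if label == +1 else 0.0  # map labels {+1, -1} to {1, 0}
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            w = [wi + lr * (target - p) * di for wi, di in zip(w, diff)]

    def g(u, v):
        return sigmoid(sum(wi * (a - b) for wi, a, b in zip(w, u, v)))

    return g

# Hypothetical preference data: the preferred item always has the larger first feature.
pairs = [((3.0, 0.2), (1.0, 0.5)),
         ((2.0, 0.9), (0.5, 0.1)),
         ((1.5, 0.3), (1.0, 0.8))]
g = train_preference_function(pairs)
print(g((2.5, 0.5), (0.5, 0.5)))  # above 1/2: the first item is preferred
```

Because this particular model depends only on u - v, it satisfies g(u, v) = 1 - g(v, u); a classifier fed the concatenated pair ⟨u, v⟩ need not be that consistent.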

6 From Preference Function to Total Order
Given a new list of items U, we run the binary classifier on every pair $\langle u, v \rangle \in U \times U$. In effect, we run a tournament on U and use the binary classifier to decide the outcome of each match between players u and v. Problem: find a total order (linear order) on U that agrees with the tournament results, for example one that minimizes the number of mistakes. A mistake occurs when a lower-ranked player beats a higher-ranked player. This problem is NP-hard: it is the minimum feedback arc set problem in tournaments.
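To make the tournament and the mistake count concrete, here is a small Python sketch. The beats predicate stands in for the learned classifier (for instance, u wins if g(u, v) > 1/2); the function names and the toy outcomes are hypothetical. best_order finds an optimal order by trying every permutation, which is only practical for a handful of items, consistent with the problem being NP-hard.

```python
from itertools import permutations

def tournament(items, beats):
    """Play each pair once; beats(u, v) returns True when u wins the match."""
    wins = set()
    for i, u in enumerate(items):
        for v in items[i + 1:]:
            wins.add((u, v) if beats(u, v) else (v, u))
    return wins

def mistakes(order, wins):
    """Matches in which the lower-ranked player beat the higher-ranked one."""
    position = {player: rank for rank, player in enumerate(order)}
    return sum(1 for winner, loser in wins if position[winner] > position[loser])

def best_order(items, wins):
    """Exact minimum by brute force over all n! orders; only for tiny U."""
    return list(min(permutations(items), key=lambda order: mistakes(order, wins)))

# Hypothetical outcomes: a 3-cycle (a beats b, b beats c, c beats a); everyone beats d.
cycle = {("a", "b"), ("b", "c"), ("c", "a")}
items = ["a", "b", "c", "d"]
wins = tournament(items, lambda u, v: (u, v) in cycle or v == "d")
order = best_order(items, wins)
print(order, mistakes(order, wins))
```

Because of the cycle among a, b and c, no order is mistake-free; the search returns ['a', 'b', 'c', 'd'] with one mistake (c beats a).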

7 A Heuristic
Rank the players by their number of wins, breaking ties arbitrarily. This algorithm gives a 5-approximation for the feedback arc set problem in tournaments (Coppersmith, Fleischer, and Rudra, SODA 2006).
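The win-count heuristic needs only one pass over the match results. A Python sketch, representing each match as a (winner, loser) tuple; the function name and the toy results are hypothetical:

```python
from collections import Counter

def rank_by_wins(items, wins):
    """Order players by number of wins, most first.

    Python's sort is stable, so tied players keep their input order,
    which is one arbitrary tie-break.
    """
    win_count = Counter(winner for winner, _ in wins)
    return sorted(items, key=lambda player: -win_count[player])

# Hypothetical tournament: a 3-cycle among a, b, c; all three beat d.
wins = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("b", "d"), ("c", "d")}
print(rank_by_wins(["a", "b", "c", "d"], wins))
```

Here a, b and c each have two wins, so their relative order comes from the tie-break; the resulting order makes exactly one mistake (c beats a), the minimum possible given the cycle.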

8 Ranking Functions from Preference Data
Preference data $S = \{\langle x_i, y_i \rangle \mid x_i \succ y_i,\ i = 1, \dots, N\}$. Learn a function $h \in \mathcal{H}$ that matches the set of preferences as much as possible, i.e., $h(x_i) \ge h(y_i)$ if $x_i \succ y_i$, $i = 1, \dots, N$. Objective function: $R(h) = \frac{1}{2} \sum_{i=1}^{N} \left(\max\{0,\ h(y_i) - h(x_i)\}\right)^2$.

9 Interpretation
1) If h matches the given preference, i.e., $h(x_i) \ge h(y_i)$, then h incurs no cost; 2) otherwise, the cost is $(h(y_i) - h(x_i))^2$. R(h) therefore serves as a proxy for the number of mistakes made by h.
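A short Python sketch of R(h) next to the mistake count it approximates, with h given as a lookup table of hypothetical scores:

```python
def ranking_risk(h, pairs):
    """R(h) = 1/2 * sum of max(0, h(y_i) - h(x_i))^2 over preferences x_i ≻ y_i."""
    return 0.5 * sum(max(0.0, h(y) - h(x)) ** 2 for x, y in pairs)

def contradicting_pairs(h, pairs):
    """Number of preferences x_i ≻ y_i that h orders the wrong way."""
    return sum(1 for x, y in pairs if h(x) < h(y))

scores = {"a": 2.0, "b": 1.0, "c": 1.5}   # hypothetical values of h
pairs = [("a", "b"), ("b", "c")]          # a ≻ b (matched), b ≻ c (violated)
h = scores.__getitem__
print(ranking_risk(h, pairs), contradicting_pairs(h, pairs))  # 0.125 1
```

A violated pair costs the square of its gap, so R(h) reflects how badly a pair is reversed, not only whether it is; unlike the mistake count, it is a differentiable function of the values h(x_i) and h(y_i).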

10 Functional Gradient Boosting Applied to R(h)
Consider $R(h) = \frac{1}{2} \sum_{i=1}^{N} \left(\max\{0,\ h(y_i) - h(x_i)\}\right)^2$ with $h(x_i), h(y_i)$, $i = 1, \dots, N$, as the unknowns, and compute the gradient of R(h). The components of the negative gradient corresponding to $h(x_i)$ and $h(y_i)$ are, respectively, $\max\{0,\ h(y_i) - h(x_i)\}$ and $-\max\{0,\ h(y_i) - h(x_i)\}$. For a matched pair both components are zero; otherwise they are $h(y_i) - h(x_i)$ and $h(x_i) - h(y_i)$.
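The negative gradient can be computed pair by pair. In this Python sketch (hypothetical names and values), an item that occurs in several pairs accumulates one term from each, since R(h) is a sum over pairs:

```python
def negative_gradient(values, pairs):
    """-dR/dh(item) for each item, treating the values h(item) as the unknowns."""
    grad = {item: 0.0 for item in values}
    for x, y in pairs:
        violation = max(0.0, values[y] - values[x])  # zero for a matched pair
        grad[x] += violation   # component for h(x_i): push it up
        grad[y] -= violation   # component for h(y_i): push it down
    return grad

values = {"a": 2.0, "b": 1.0, "c": 1.5}   # hypothetical current values of h
pairs = [("a", "b"), ("b", "c")]
print(negative_gradient(values, pairs))   # {'a': 0.0, 'b': 0.5, 'c': -0.5}
```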

11 With step size α along the negative gradient, the new function values at $x_i$ and $y_i$ are, respectively, $(x_i,\ h(x_i) + \alpha(h(y_i) - h(x_i)))$ and $(y_i,\ h(y_i) + \alpha(h(x_i) - h(y_i)))$. If we set α = 1, we get $(x_i, h(y_i))$ and $(y_i, h(x_i))$, i.e., we simply swap the function values at $x_i$ and $y_i$. One complication: if $x_i$ appears in multiple preference pairs, we may have contradicting requirements for the new function value at $x_i$. One solution: let the data tell us what to do, by fitting a regression function to all the requested values.
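A Python sketch of the step, recording every value each item is asked to take so that the contradiction described above becomes visible; values and names are hypothetical:

```python
def proposed_values(values, pairs, alpha=1.0):
    """Values requested for each item by a step of size alpha along the negative gradient.

    Matched pairs (h(x_i) >= h(y_i)) have zero gradient and request nothing.
    """
    proposals = {}
    for x, y in pairs:
        hx, hy = values[x], values[y]
        if hx >= hy:
            continue
        proposals.setdefault(x, []).append(hx + alpha * (hy - hx))
        proposals.setdefault(y, []).append(hy + alpha * (hx - hy))
    return proposals

# Hypothetical values: "a" is preferred to both "b" and "c" but is scored lowest.
values = {"a": 0.0, "b": 1.0, "c": 3.0}
print(proposed_values(values, [("a", "b"), ("a", "c")]))
# {'a': [1.0, 3.0], 'b': [0.0], 'c': [0.0]}
```

With α = 1 each violated pair swaps its two values, and a receives two different requests, 1.0 and 3.0; a regression fit over all requests has to trade them off.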

12 Algorithm (GBrank)
Start with an initial guess $h_0$. For $k = 1, 2, \dots$:
1) using $h_{k-1}$ as the current approximation of h, separate S into two disjoint sets, $S^+ = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) \ge h_{k-1}(y_i)\}$ and $S^- = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) < h_{k-1}(y_i)\}$;
2) fit $g_k(x)$ to the training data $\{(x_i, h_{k-1}(y_i)), (y_i, h_{k-1}(x_i)) \mid \langle x_i, y_i \rangle \in S^-\}$;
3) form $h_k(x) = h_{k-1}(x) + \mu g_k(x)$.
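A minimal Python sketch of the GBrank loop under stated assumptions. A least-squares regression stump stands in for whatever regressor fits $g_k$. Because the update in step 3 is additive, this sketch fits $g_k$ to the move from $h_{k-1}$ to each swapped target (the negative gradient from slide 10) rather than to the raw targets; that reading is an assumption. The names fit_stump and gbrank and the defaults n_iter = 20, mu = 0.5 are hypothetical.

```python
from statistics import mean

def fit_stump(points, targets):
    """Least-squares regression stump (depth-1 tree) over all features."""
    base = mean(targets)
    best_sse = sum((t - base) ** 2 for t in targets)
    best = (0, float("inf"), base, base)  # no split: predict the mean everywhere
    for j in range(len(points[0])):
        for thr in sorted({p[j] for p in points}):
            left = [t for p, t in zip(points, targets) if p[j] <= thr]
            right = [t for p, t in zip(points, targets) if p[j] > thr]
            if not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lm, rm)
    j, thr, lm, rm = best
    return lambda x: lm if x[j] <= thr else rm

def gbrank(pairs, h0, n_iter=20, mu=0.5):
    """GBrank-style boosting; pairs holds (x_i, y_i) with x_i preferred."""
    stages = []

    def h(x):
        # Step 3, applied cumulatively: h_k = h_{k-1} + mu * g_k.
        return h0(x) + mu * sum(g(x) for g in stages)

    for _ in range(n_iter):
        # Step 1: S^- holds the pairs the current h orders incorrectly.
        s_minus = [(x, y) for x, y in pairs if h(x) < h(y)]
        if not s_minus:
            break
        # Step 2, assumption: fit g_k to (swapped target - current value).
        points, moves = [], []
        for x, y in s_minus:
            hx, hy = h(x), h(y)
            points += [x, y]
            moves += [hy - hx, hx - hy]
        stages.append(fit_stump(points, moves))
    return h

a, b, c = (0.0,), (1.0,), (2.0,)
pairs = [(c, b), (c, a), (b, a)]        # c ≻ b ≻ a
h = gbrank(pairs, h0=lambda x: -x[0])   # hypothetical h0 that ranks them backwards
print([h(p) for p in (a, b, c)])
```

On this toy input the loop stops after three rounds with no reversed pair, but b and c end up tied: the slide's $S^+$ uses ≥, so ties count as matches. A margin, such as the τ in the combined algorithm later in these slides, is one way to separate such ties.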

13 Some Experimental Results
Data from a commercial search engine (SE): 4372 queries and query-document pairs. A grade from 0 to 4 is assigned to each query-document pair. Converting labeled data to preference data: for a query q and two documents $d_x$ and $d_y$, let the feature vectors for $(q, d_x)$ and $(q, d_y)$ be x and y. If $d_x$ has a higher grade than $d_y$, we include the preference $x \succ y$; if $d_y$ has a higher grade than $d_x$, we include the preference $y \succ x$.
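The conversion from graded labels to preferences can be sketched in Python as follows. Strings stand in for feature vectors, and the dict-of-queries input format and function names are assumptions; pairs are formed only between documents of the same query, and equal grades produce no preference.

```python
from itertools import combinations

def grades_to_preferences(docs):
    """docs: (feature_vector, grade) pairs for one query; returns (preferred, other)."""
    prefs = []
    for (x, grade_x), (y, grade_y) in combinations(docs, 2):
        if grade_x > grade_y:
            prefs.append((x, y))
        elif grade_y > grade_x:
            prefs.append((y, x))
        # Equal grades yield no preference.
    return prefs

def preferences_by_query(labeled):
    """labeled: dict query -> list of (feature_vector, grade); pairs never cross queries."""
    return [p for docs in labeled.values() for p in grades_to_preferences(docs)]

labeled = {"q1": [("x1", 4), ("x2", 2), ("x3", 2)],   # hypothetical grades
           "q2": [("x4", 0), ("x5", 3)]}
print(preferences_by_query(labeled))
# [('x1', 'x2'), ('x1', 'x3'), ('x5', 'x4')]
```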

14 Evaluation Metrics
Number of contradicting pairs. Precision at K%: for two documents x and y (w.r.t. the same query), it is reasonable to assume that x and y are easy to compare if $|h(x) - h(y)|$ is large, and that x and y have about the same rank if h(x) is close to h(y). Sort all the document pairs $\langle x, y \rangle$ by $|h(x) - h(y)|$ in decreasing order; precision at K% is the fraction of non-contradicting pairs in the top K% of the sorted list. Discounted Cumulative Gain (DCG): $DCG_N = \sum_{i=1}^{N} \frac{G_i}{\log_2(i + 1)}$, where $G_i$ is the gain of the document at rank position i.
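A Python sketch of the three metrics, with h given as a lookup table of hypothetical scores. The gains $G_i$ are passed in directly because the slide does not say how grades become gains, and precision at K% rounds the K% cut-off down while keeping at least one pair; both are assumptions, as are the function names.

```python
import math

def contradicting_pairs(h, pairs):
    """Preferences x ≻ y with h(x) < h(y)."""
    return sum(1 for x, y in pairs if h(x) < h(y))

def precision_at_k_percent(h, pairs, k):
    """Fraction of non-contradicting pairs among the top k% by |h(x) - h(y)|."""
    ranked = sorted(pairs, key=lambda p: abs(h(p[0]) - h(p[1])), reverse=True)
    top = ranked[:max(1, int(len(ranked) * k / 100))]
    return sum(1 for x, y in top if h(x) >= h(y)) / len(top)

def dcg(gains):
    """DCG_N for the gains G_1..G_N of the top N results, in rank order."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

scores = {"a": 3.0, "b": 1.0, "c": 2.5, "d": 2.0}   # hypothetical h values
pairs = [("a", "b"), ("d", "c"), ("c", "b"), ("b", "d")]
h = scores.__getitem__
print(contradicting_pairs(h, pairs))          # 2
print(precision_at_k_percent(h, pairs, 50))   # 1.0
print(precision_at_k_percent(h, pairs, 100))  # 0.5
```

The two contradicting pairs here have the smallest score gaps, so they fall outside the top 50%; this is the effect the metric is designed to expose.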

15 [Figure, three panels: number of contradicting pairs in training data vs. iterations; number of contradicting pairs in test data vs. iterations; DCG vs. iterations]

16 [Figure, two panels comparing GBrank and GBT: number of contradicting test pairs vs. training data size; DCG-5 vs. training set size; x-axis: % of training data used]

17 [Figure: DCG for GBRank, GBT, and RankSVM in 5-fold cross validation; x-axis: fold number]

18 [Figure: number of contradicting pairs for GBRank, GBT, and RankSVM in 5-fold cross validation; x-axis: fold number]

19 Combined Labeled Data and Preference Data
Preference judgments: $S = \{x_i \succ y_i,\ i = 1, \dots, N\}$. Additionally, there are labeled data $L = \{(z_i, l_i),\ i = 1, \dots, n\}$, where $z_i$ is the feature vector of an item and $l_i$ is the corresponding numerically coded label.

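20 Objective Functions
Find a ranking function h that minimizes $R(h, \alpha, \beta) = \frac{1}{2} \sum_{i=1}^{N} \left(\max\{0,\ h(y_i) - h(x_i)\}\right)^2 + \frac{1}{2} \sum_{i=1}^{n} \left(\alpha l_i + \beta - h(z_i)\right)^2$. Why α and β? The labels $l_i$ are fixed numeric codes, so it is not reasonable to require $h(z_i) \approx l_i$ directly; fitting $h(z_i)$ to an affine transform $\alpha l_i + \beta$ of the labels leaves the scale and offset free. Optimization problem: $\{h^*, \alpha^*, \beta^*\} = \arg\min_{h \in \mathcal{H},\ \alpha \ge 0,\ \beta} R(h, \alpha, \beta)$.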
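A Python sketch that evaluates R(h, α, β) for a given h (a lookup table of hypothetical scores) shows why the affine fit matters: scores that are exactly half the labels incur no label cost with α = 0.5, β = 0, but are penalized if the raw labels are used (α = 1, β = 0). The function name is hypothetical.

```python
def combined_risk(h, alpha, beta, pairs, labeled):
    """R(h, alpha, beta): preference term plus least-squares fit to alpha * l + beta."""
    preference_term = 0.5 * sum(max(0.0, h(y) - h(x)) ** 2 for x, y in pairs)
    label_term = 0.5 * sum((alpha * l + beta - h(z)) ** 2 for z, l in labeled)
    return preference_term + label_term

scores = {"u": 2.0, "v": 1.0, "z1": 1.0, "z2": 2.0}   # hypothetical h values
pairs = [("v", "u")]                                 # v ≻ u, but h(v) < h(u)
labeled = [("z1", 2), ("z2", 4)]                     # h(z) = 0.5 * l exactly
h = scores.__getitem__
print(combined_risk(h, 0.5, 0.0, pairs, labeled))    # 0.5
print(combined_risk(h, 1.0, 0.0, pairs, labeled))    # 3.0
```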

21 Algorithm (Combined Gradient Boosting Ranking, cGBrank)
Start with an initial guess $h_0$. For $m = 1, 2, \dots$:
1) compute $\alpha_m$ and $\beta_m$ such that $\{\alpha_m, \beta_m\} = \arg\min_{\alpha, \beta} \frac{1}{2} \sum_{i=1}^{n} \left(\alpha l_i + \beta - h_{m-1}(z_i)\right)^2$, and let $g_i^m = \alpha_m l_i + \beta_m$, $i = 1, \dots, n$;
2) using $h_{m-1}$ as the current approximation of h, separate S into two disjoint sets, $S^+ = \{(x_i, y_i) \in S \mid h_{m-1}(x_i) \ge h_{m-1}(y_i)\}$

22 and $S^- = \{(x_i, y_i) \in S \mid h_{m-1}(x_i) < h_{m-1}(y_i)\}$;
3) construct a training set for fitting $g_m(x)$ by adding $(x_i, h_{m-1}(y_i) + \tau)$ and $(y_i, h_{m-1}(x_i) - \tau)$ for each $(x_i, y_i) \in S^-$, where τ > 0 is a margin that pushes the preferred item $x_i$ above $y_i$, together with $\{(z_i, g_i^m),\ i = 1, \dots, n\}$. The fitting of $g_m(x)$ is done with GBT on this training set;
4) form $h_m(x) = h_{m-1}(x) + \mu g_m(x)$, where µ is a shrinkage factor.
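Steps 1 to 3 can be sketched in Python as the construction of the training set for one round; the regression fit itself (GBT on the slide) is left out. Assumptions: α is clipped at 0 to respect the constraint α ≥ 0 from the objective, which yields the constrained least-squares solution because, with β optimized out, the problem is a convex quadratic in α; the value tau = 0.5, the scores, and all function names are hypothetical.

```python
def fit_alpha_beta(labels, scores):
    """Step 1: least squares for alpha * l_i + beta ≈ h(z_i), keeping alpha >= 0."""
    n = len(labels)
    mean_l = sum(labels) / n
    mean_s = sum(scores) / n
    var_l = sum((l - mean_l) ** 2 for l in labels)
    cov = sum((l - mean_l) * (s - mean_s) for l, s in zip(labels, scores))
    alpha = max(0.0, cov / var_l) if var_l > 0 else 0.0  # assumption: clip at 0
    beta = mean_s - alpha * mean_l   # best beta for the chosen alpha
    return alpha, beta

def cgbrank_training_set(h, pairs, labeled, tau):
    """Steps 1-3: the (point, target) set that g_m is fit to in round m."""
    alpha, beta = fit_alpha_beta([l for _, l in labeled], [h(z) for z, _ in labeled])
    data = []
    for x, y in pairs:
        hx, hy = h(x), h(y)
        if hx < hy:                        # (x_i, y_i) is in S^-
            data.append((x, hy + tau))     # lift x_i above y_i's old value
            data.append((y, hx - tau))     # drop y_i below x_i's old value
    data += [(z, alpha * l + beta) for z, l in labeled]
    return data

scores = {"x1": 0.0, "y1": 1.0, "z1": 2.0, "z2": 4.0}   # hypothetical h_{m-1}
print(cgbrank_training_set(scores.__getitem__, [("x1", "y1")],
                           [("z1", 1), ("z2", 2)], tau=0.5))
# [('x1', 1.5), ('y1', -0.5), ('z1', 2.0), ('z2', 4.0)]
```

Here the labels fit exactly with α = 2, β = 0, so the labeled targets equal the current scores, while the reversed pair asks for x1 and y1 to swap with a margin.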
