Semestrial Project - Expedia Hotel Ranking


Gervase Burns

Many customers search for and purchase hotels online. Companies such as Expedia make their profit from purchases made through their sites. The ultimate goal: the hotels at the top of the list are those most likely to be purchased by the user.

Kaggle: predictive modeling competitions. Expedia hotel ranking challenge through Kaggle. The data set used was provided by Expedia.

Each query shows multiple samples. Each sample represents a hotel. A sample provides information on the hotel's cost, ratings, etc. There are features that describe the search query and user history; these are the same for all samples in a query.

Rank(query, hotel) = $q^T A h$, where $q$ is the query features vector and $h$ is the hotel features vector.
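The bilinear score above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names `rank_score` and `rank_hotels` and the toy numbers are assumptions made for the example.

```python
def rank_score(q, A, h):
    """Bilinear score q^T A h for one (query, hotel) pair.

    A is a list of rows with len(q) rows and len(h) columns.
    """
    return sum(q[i] * A[i][j] * h[j]
               for i in range(len(q))
               for j in range(len(h)))


def rank_hotels(q, A, hotels):
    """Return hotel indices ordered from highest to lowest score."""
    scores = [rank_score(q, A, h) for h in hotels]
    return sorted(range(len(hotels)), key=lambda k: scores[k], reverse=True)


# Toy data (hypothetical): one query vector, three hotel vectors.
q = [1.0, 0.0]
A = [[0.5, -1.0],
     [2.0, 0.0]]
hotels = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(rank_hotels(q, A, hotels))  # [1, 0, 2]
```

Because the query features are the same for every sample in a query, only the hotel vector changes between the scores being compared.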

Multi-class problem: purchased, clicked, neither. Non-coherent data: some examples might be missing details that others have. Different features have different representations. The evaluation metric is NDCG, so order is important. See Appendix 1.

Goals: Unified representation of different types of features. Compensating for missing data. Creating new features. Flexibility: easy to modify.

Implementation: Transforming into a unified binary representation. Limit: a predetermined value that defines the number of quantization levels. Boundaries: threshold values for quantization. New features: average, median, variance, abs stars diff.
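One way to picture the binary representation is a one-hot vector over quantization levels. The sketch below is an assumption-laden illustration: the slides do not say how the boundaries were chosen or how missing values were encoded, so the rank-based thresholds in `make_boundaries` and the leading "missing" flag in `to_binary` are hypothetical choices.

```python
from bisect import bisect_right


def make_boundaries(values, limit):
    """Pick up to limit-1 thresholds at evenly spaced ranks of the known values.

    Assumption: rank-based (quantile-like) thresholds; the original method is not stated.
    """
    known = sorted(v for v in values if v is not None)
    if not known or limit < 2:
        return []
    cuts = [known[(len(known) * k) // limit] for k in range(1, limit)]
    return sorted(set(cuts))


def to_binary(value, boundaries):
    """One-hot vector: bit 0 flags a missing value, the rest mark the level."""
    bits = [0] * (len(boundaries) + 2)
    if value is None:
        bits[0] = 1  # hypothetical way to compensate for missing data
    else:
        bits[1 + bisect_right(boundaries, value)] = 1
    return bits


prices = [50, 80, None, 120, 200]
bounds = make_boundaries(prices, limit=2)
print(bounds)                   # [120]
print(to_binary(80, bounds))    # [0, 1, 0]
print(to_binary(None, bounds))  # [1, 0, 0]
```

With every feature mapped to bits this way, numeric, categorical, and missing values share one representation, and changing `limit` changes the vector length without touching the rest of the pipeline.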

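Rank SVM. Solution with Matpower, a MATLAB package.

$$\arg\min_{A,\,\xi_i,\,\xi_j} \Big\{ \|A\| + C_1 \sum_{i=1}^{n} \xi_i + C_2 \sum_{j=1}^{m} \xi_j \Big\}$$

$$\text{s.t. } \forall i \in [1, n],\ j \in [1, m]: \quad q^T A h_2 - q^T A h_1 \ge 1 - \xi_i, \quad q^T A h_1 - q^T A h_0 \ge 1 - \xi_j, \quad \xi_i \ge 0,\ \xi_j \ge 0$$

[The slack symbols, the norm on $A$, and the margin constant did not survive extraction; they are written here in the standard Rank SVM form.]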
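To make the role of C1 and C2 concrete, the sketch below evaluates only the slack part of a Rank SVM objective for a fixed matrix. It assumes that h2, h1, h0 stand for purchased, clicked, and neither hotels, and that the margin is 1; both are assumptions, and the helper names are hypothetical. It does not solve the optimization problem.

```python
def score(q, A, h):
    """Bilinear score q^T A h."""
    return sum(q[i] * A[i][j] * h[j]
               for i in range(len(q))
               for j in range(len(h)))


def slack(q, A, h_hi, h_lo, margin=1.0):
    """Smallest slack s >= 0 with score(h_hi) - score(h_lo) >= margin - s."""
    return max(0.0, margin - (score(q, A, h_hi) - score(q, A, h_lo)))


def slack_cost(q, A, purchased, clicked, neither, c1, c2):
    """C1 * (purchased-vs-clicked slack) + C2 * (clicked-vs-neither slack)."""
    xi = sum(slack(q, A, h2, h1) for h2 in purchased for h1 in clicked)
    xj = sum(slack(q, A, h1, h0) for h1 in clicked for h0 in neither)
    return c1 * xi + c2 * xj


q = [1.0]
A = [[1.0, 0.0]]
cost = slack_cost(q, A,
                  purchased=[[3.0, 0.0]],
                  clicked=[[2.5, 0.0]],
                  neither=[[0.0, 0.0]],
                  c1=2.0, c2=1.0)
print(cost)  # 1.0
```

In this toy case only the purchased-vs-clicked pair violates the margin, so the cost is driven entirely by C1; raising the C1/C2 ratio makes the solver care more about separating purchases from clicks.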

Setbacks: Memory complexity of $O(n^2)$ at best. High time complexity (empirically). Does not optimize the NDCG evaluation metric directly.

Data Filtering: Max no. of unclicked hotels (Max_Hotels). Max no. of queries without bound was [value missing]. Max no. of queries with Max_Hotels=5 was [value missing]. Ignoring less effective features.
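A cap on unclicked hotels per query could look like the sketch below. The slides do not say which unclicked hotels were kept, so the random choice here is an assumption, as are the function name `filter_query` and the `click`/`purch` dictionary keys.

```python
import random


def filter_query(samples, max_hotels, rng=random):
    """Keep every clicked or purchased sample and at most max_hotels others."""
    positives = [s for s in samples if s["click"] or s["purch"]]
    unclicked = [s for s in samples if not (s["click"] or s["purch"])]
    if len(unclicked) > max_hotels:
        # Assumption: the surplus unclicked hotels are dropped at random.
        unclicked = rng.sample(unclicked, max_hotels)
    return positives + unclicked


query = [{"id": 0, "click": 1, "purch": 0}]
query += [{"id": k, "click": 0, "purch": 0} for k in range(1, 9)]
small = filter_query(query, max_hotels=5, rng=random.Random(0))
print(len(small))  # 6
```

Shrinking each query this way reduces the number of pairwise constraints, which is what allows more queries to fit in memory.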

Choosing C1/C2 Ratio

Choosing C1, C2 values

Limit

Divide & Conquer - Explained: Break into smaller, similar problems. Select a feature to divide the train set into disjoint sets. Solve for each set separately. Feature selected: Site ID.
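The split-then-solve idea can be sketched as follows. The key name `site_id` and the `train_fn` callback are placeholders invented for the example; any trainer that takes a list of samples would fit.

```python
from collections import defaultdict


def split_by_feature(samples, key="site_id"):
    """Partition samples into disjoint groups by the value of one feature."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)
    return dict(groups)


def train_per_group(samples, train_fn, key="site_id"):
    """Solve each sub-problem separately; returns {feature value: model}."""
    return {value: train_fn(group)
            for value, group in split_by_feature(samples, key).items()}


samples = [{"site_id": 5, "price": 100},
           {"site_id": 5, "price": 80},
           {"site_id": 14, "price": 120}]
# len stands in for a real trainer, just to show the shape of the result.
print(train_per_group(samples, train_fn=len))  # {5: 2, 14: 1}
```

At prediction time a query would be scored with the model trained on its own site, so each model only has to fit a smaller, more homogeneous set.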

Divide & Conquer Results. [Figure: NDCG vs Number Of Sites]

Concept: Perceptron-like algorithm, without classifying. Find non-complying hotels. Update the ranking matrix to reinforce correct ranking. Use the updated matrix if the NDCG value improves.
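A single pass of such a scheme might look like the sketch below. The exact update rule is not given in the slides; the rule used here (add `alpha * q * (h_better - h_worse)^T` for every mis-ordered pair) is an assumed perceptron-style choice, and all names are hypothetical. Relevance grades follow the competition's 0, 1, 5 scale.

```python
import math


def score(q, A, h):
    return sum(q[i] * A[i][j] * h[j]
               for i in range(len(q))
               for j in range(len(h)))


def dcg(rels):
    return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(rels))


def query_ndcg(q, A, hotels, rels):
    """NDCG of the ordering that A induces on one query."""
    order = sorted(range(len(hotels)),
                   key=lambda k: score(q, A, hotels[k]), reverse=True)
    ideal = dcg(sorted(rels, reverse=True))
    return dcg([rels[k] for k in order]) / ideal if ideal > 0 else 0.0


def update_step(q, A, hotels, rels, alpha=0.1):
    """One perceptron-like pass; keeps the new matrix only if NDCG improves."""
    before = query_ndcg(q, A, hotels, rels)
    B = [row[:] for row in A]
    for a in range(len(hotels)):
        for b in range(len(hotels)):
            # Non-complying pair: a is more relevant but does not outscore b.
            if rels[a] > rels[b] and score(q, A, hotels[a]) <= score(q, A, hotels[b]):
                for i in range(len(q)):
                    for j in range(len(hotels[a])):
                        B[i][j] += alpha * q[i] * (hotels[a][j] - hotels[b][j])
    return B if query_ndcg(q, B, hotels, rels) > before else A


q = [1.0]
A = [[0.0, 0.0]]
hotels = [[1.0, 0.0], [0.0, 1.0]]
rels = [0, 5]  # the second hotel was purchased
A_new = update_step(q, A, hotels, rels)
print(query_ndcg(q, A_new, hotels, rels))  # 1.0
```

The accept-only-if-better check is what ties the update to NDCG directly, in contrast to a margin-based objective that optimizes pairwise constraints.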

Full query ranking example. [Tables: the order after each of Iterations 1 through 9, with columns index, click, purch, rank; values not recovered]

Cross Training: Train on smaller parts. Calculate performance with the average matrix.
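Averaging the matrices trained on the separate parts is an element-wise mean. A minimal sketch, assuming every part yields a matrix of the same shape (the function name is hypothetical):

```python
def average_matrix(matrices):
    """Element-wise mean of equally shaped matrices (lists of rows)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(M[i][j] for M in matrices) / n for j in range(cols)]
            for i in range(rows)]


# Two matrices, as if trained on two different parts of the train set.
A1 = [[1.0, 2.0]]
A2 = [[3.0, 4.0]]
print(average_matrix([A1, A2]))  # [[2.0, 3.0]]
```

The averaged matrix is then used in place of any single part's matrix when computing NDCG.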

Choosing alpha factor

Choosing K

Deteriorate limit (K)

Algorithm | NDCG at Kaggle | Details
Best Score | [missing] | Best score reached on Kaggle by competitors.
Position Benchmark | [missing] | Properties ranked according to the position they were shown on Expedia.com.
SVM-Rank | [missing] | Algorithm 1.
SwitchRank | [missing] | Algorithm 2.
Basic Python Benchmark | [missing] | A basic benchmark created for competitors' use.
Random Order Benchmark | [missing] | Properties are recommended in a random order.

Combining different algorithms. Finding a better quantization method. Creating more features. Divide & Conquer with different features.

NDCG - Normalized Discounted Cumulative Gain

$$DCG_k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$

where $k$ is the maximum number of entities that can be recommended and $rel_i \in \{0, 1, 5\}$ is the graded relevance of entity $i$. $IDCG_k$ is the maximum possible (ideal) $DCG_k$ for a given set. The final score is calculated by:

$$nDCG_k = \frac{DCG_k}{IDCG_k}$$
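The metric is short enough to compute directly. The sketch below follows the definition above: gain 2^rel - 1, discount log2(i + 1) with positions starting at 1, and normalization by the ideal ordering. The function names are illustrative, and the input is assumed to be the relevance grades listed in ranked order.

```python
import math


def dcg_at_k(rels, k):
    """DCG of the first k relevance grades, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Grades: 5 = purchased, 1 = clicked, 0 = neither.
print(ndcg_at_k([5, 1, 0], k=3))  # 1.0, already in ideal order
print(ndcg_at_k([0, 5, 1], k=3))  # below 1.0, the purchase is not on top
```

Because the gain for a purchase (31) dwarfs the gain for a click (1), moving the purchased hotel up by one position changes the score far more than reordering clicked hotels.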
