A Novel Click Model and Its Applications to Online Advertising

Size: px

Start display at page:

Download "A Novel Click Model and Its Applications to Online Advertising"

Ursula Thomas
5 years ago
Views:

1 A Novel Click Model and Its Applications to Online Advertising Zeyuan Zhu Weizhu Chen Tom Minka Chenguang Zhu Zheng Chen February 5,

2 Introduction Click Model - To model the user behavior Application Predict CTR Improve NDCG AdPrediction Document relevance estimation Replace human judged data As ranking features. Clicks are biased presenting order February 5,

3 Related Works examination hypothesis (position model) Observation: The relevance of a document at position i should be further multiplied by a term x i. cascade model Observation: user scans from top to bottom a Bayesian network. February 5,

4 Related Works Examination Hypothesis if a displayed url is clicked, it must be both examined and relevant query q; url u; position i; binary click event C P C = 1 q, u, i = P C = 1 u, q, E = 1 P E = 1 i User Browsing Model previous clicked position l r u,q P C = 1 q, u, i, l = P C = 1 u, q, E = 1 r u,q x i P(E = 1 i, l) x i,l February 5,

5 Related Works Cascade Model Model for each queries separately E i, C i be the probabilistic events indicating whether the ith url is examined and clicked resp. P E 1 = 1 P(E i+1 = 1 E i = 0) = 0 P(E i+1 = 1 E i = 1, C i ) = 1 C i P C i = 1 E i = 1 = r ui,q where u i is the ith url i 1 P C i = 1 = r ui,q j=1 1 r uj,q February 5,

6 Related Works Cascade Model P(E i+1 = 1 E i = 1, C i ) = 1 C i Extension Click Chain Model (CCM) P E i+1 = 1 E i = 1, C i = 0 = α 1 P E i+1 = 1 E i = 1, C i = 1 = α 2 1 r ui,q + α 3 r ui,q Dynamic Bayesian Network (DBN) P E i+1 = 1 E i = 1, C i = 0 = γ P E i+1 = 1 E i = 1, C i = 1 = γ 1 s ui,q February 5,

7 Limitation Click Chain Model (CCM) P E i+1 = 1 E i = 1, C i = 0 = α 1 P E i+1 = 1 E i = 1, C i = 1 = α 2 1 r ui,q + α 3 r ui,q Dynamic Bayesian Network (DBN) P E i+1 = 1 E i = 1, C i = 0 = γ P E i+1 = 1 E i = 1, C i = 1 = γ 1 s ui,q Transition probability only considers the relevance. February 5,

8 click through rate Observation But a click is influenced by multiple bias: local hour user agent 0.05 Linux user The local hour non-linux user with FireFox non-linux user with Opera Other users, including IE users Click through rate February 5,

9 Big Challenge How to tolerate multiple-bias in the click model? February 5,

General Click Model We still need to keep E and C They

in which we assume users scan urls from top to bottom

the network to be a summation of parameters, each

10 General Click Model We still need to keep E and C They are good assumption The Outer Model Bayesian network, in which we assume users scan urls from top to bottom The Inner Model define the transition probability in the network to be a summation of parameters, each corresponding to a single attribute value February 5,

11 General Click Model We need to consider multiple bias into transition probability The Outer Model Bayesian network, in which we assume users scan urls from top to bottom The Inner Model define the transition probability in the network to be a summation of parameters, each corresponding to a single attribute value February 5,

12 GCM The Outer Model February 5,

13 GCM The Outer Model P E 1 = 1 P(E i+1 = 1 E i = 0) = 0 P E i+1 = 1 E i = 1, C i = 0, Β i = I Β i > 0 P E i+1 = 1 E i = 1, C i = 1, Α i = I A i > 0 P C i = 1 E i = 1, R i = I R i > 0 February 5,

14 Different with DBN/CCM Similar Bayesian Network GCM has a general notation of A i, B i and R i Our main contribution comes next: The inner model how to build A i, B i and R i February 5,

15 GCM The Inner Model We assume each attribute value f is associated with three parameters θ f A, θ f B and θ f R, each of which is a continuous random variable Α i = Β i = R i = s A θ fj user A j=1 + j=1 θ url fi,j s B θ fj user j=1 + j=1 θ url fi,j s Let Θ = R θ fj user t t B j=1 + j=1 θ url fi,j t R + err + err + err θ f A, θ f B, θ f R f be the parameter set. February 5,

keyword the length of the url f i,1 url, f

16 user-specific attributes GCM The Inner Model the query the location the browser type the local hour the IP address the query length f user 1, f user 2, fuser s the url the displayed position(=i) the classification of the url the matched keyword the length of the url f i,1 url, f i,2 url, f i,t url url-specific attributes February 5,

17 GCM The Inference Method Assume parameters in Θ are independent Gaussians. Bayesian Inference Expectation Propagation method by Tom Minka Given the structure of a Bayesian network with hidden variables, EP takes the observation values as input, and is capable of calculating the inference of any variable. For each training session, we use the current Gaussians as prior, do the EP, and then calculate the posterior Gaussians and update them in Θ. February 5,

18 GCM Algorithm February 5,

19 GCM Algorithm February 5,

20 GCM Algorithm February 5,

21 GCM Algorithm February 5,

22 GCM Algorithm February 5,

23 GCM Algorithm February 5,

24 GCM Algorithm February 5,

25 GCM Reductions Lemma: If we define an attribute value f to be the pair of query and url f = u i, q,the traditional transition probability P C i = 1 E i = 1 = r ui,q can reduce to P C i = 1 E i = 1, R i = I R i > 0 if we set R i = θ R f + err and θ R f is a point mass Gaussian centered at F 1 r ui,q,where F is the cumulative distribution function of N 0,1. Recall R i = s R θ fj user j=1 + j=1 θ url fi,j February 5, t R + err

26 GCM Reductions Examination Hypothesis P B i > 0 = P Α i > 0 = x i+1 P R i > 0 = r ui,q define two attributes f 1 = i + 1 and f 2 = (u i, q) A i = θ A f1 + err; B i = θ B f1 + err; R i = θ R f2 + err Similar for other prior works February 5,

27 Perplexity Log-Likelihood Experiment Set 1 Set 2 Set 3 Set 4 Set 5 Set 6 Set 7 Set 8 All Baseline Cascade CCM DBN GCM Set 1 Set 2 Set 3 Set 4 Set 5 Set 6 Set 7 Set 8 All Baseline Cascade CCM DBN GCM February 5,

28 Actual CTR Actual CTR Experiment GCM DBN Predicted CTR Cascade CCM Predicted CTR February 5,

29 Click Rate Experiment Actual GCM DBN CCM Local hour February 5,

30 Main Contribution Multi-bias aware. The transition probabilities between variables depend jointly on a list of attributes. This enables our model to explain bias terms other than the position-bias. Learning across queries. The model learns queries altogether and thus can predict clicks for one query even a new query using the learned data from other queries. Extensible: The user may actively add or remove attributes applied in our GCM model. In fact, all the prior works mentioned above can reduce to our GCM as special cases when only one or two attributes are incorporated. One-pass. Our click model is an on-line algorithm. The posterior distributions will be regarded as the prior knowledge for the next query session. Applicable to ads. We have demonstrated our click model in the CTR prediction of advertisements. Experimental results show that our click model outperforms the prior works. February 5,

31 Future work To learn Continuous attribute values Make use of the page structure Running time February 5,

Thanks! Questions: zhuzeyuan@hotmail.com wzchen@microsoft.

32 Thanks! Questions: Thanks to: Haixun Wang Gang Wang Dakan Wang February 5,

33 Introduction Implicit feedback Attributes Query text Timestamps Localities The click-or-not flag Etc February 5,

34 Definitions Query Microsoft Query session U = u 1, u 2, u M Urls impressions u 2 = research.microsoft.com Attribute IE 7am local time February 5,

35 Experiment Set Query Freq #Queries Train set Test set #Sessions #Urls #Sessions #Urls 1 1~ , , ~30 1,211 24,928 1,664,403 2,122 13, ~100 5, ,203 1,810,009 18, , ~300 3, ,654 3,148,826 40, , ~1000 1, ,722 3,011,482 54, , ,000~3, ,422 2,470,665 48, , ,000~10, ,645 1,508,985 42,067 92, ,000~30, , ,786 19,338 48, , ,835 1,046,948 37,796 64,236 All All of above 12,691 4,267,241 15,431, , ,245 February 5,

36 Perplexity Log-Likelihood Experiment Set 1 Set 2 Set 3 Set 4 Set 5 Set 6 Set 7 Set 8 All CCM DBN GCM Set 1 Set 2 Set 3 Set 4 Set 5 Set 6 Set 7 Set 8 All CCM DBN GCM February 5,

37 Related Works CCM February 5,

38 Related Works CCM February 5,

39 Related Works CCM February 5,

40 Related Works DBN P E i+1 = 1 E i = 1, C i = 0 = γ P E i+1 = 1 E i = 1, C i = 1 = γ 1 s ui,q February 5,

41 Related Works DBN February 5,

42 Related Works DBN February 5,

Behavioral Data Mining. Lecture 2

Behavioral Data Mining. Lecture 2 Behavioral Data Mining Lecture 2 Autonomy Corp Bayes Theorem Bayes Theorem P(A B) = probability of A given that B is true. P(A B) = P(B A)P(A) P(B) In practice we are most interested in dealing with events