CS 630 Lecture 8    2/2/06
Lecturer: Lillian Lee
Scribes: Peter Babinski, David Lin

Basic Language Modeling Approach

I. Special Case of LM-based Approach
   a. Recap of Formulas and Terms
   b. Fixing θ_d?
   c. About that Multinomial Model
II. A New Interpretation of our LM Approach
   a. A New Scenario
III. Related Questions

I. Special Case of LM-based Approach

a. Recap of Formulas and Terms

Recall our special case of a language-modeling-based approach. We decided to use multinomial topic models with corpus-dependent Dirichlet priors. We wanted to score a document d by the probability of the query q under a model based on the document, P(q|d). Using a multinomial model gives us the following equation for P(q|d), taken with respect to term sequences of the same length (q(v) denotes the number of occurrences of term v in q, and d(v) the number in d):

    P(q|d) = K ∏_v θ_d(v)^{q(v)},   where   K = |q|! / ∏_v q(v)!        (1)

K represents the number of ways we can rearrange the terms of the query. Recall the following equation for θ_d(v):

    θ_d(v) = ( d(v) + μ p_C(v) ) / ( |d| + μ )        (2)

Recall that we incorporated this smoothing term into (2):

    μ p_C(v)        (3)

where p_C(v) is the probability of v under a language model built from the corpus C. We added (3) to the equation for θ_d(v) in order to avoid situations where we would have zero counts for a particular term v in our vector representation of the document. A zero count in any factor of the product in (1) whose corresponding query count is non-zero would cause the whole probability to become zero, so we add counts to each term in proportion to its frequency of occurrence in the corpus.
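As a quick illustration (our own sketch, not part of the lecture), the Dirichlet-smoothed estimate θ_d(v) of (2) can be computed directly from token counts; the function name and toy strings below are our own, chosen to match the small example corpus used later in these notes.

```python
# Sketch (our own, not from the lecture) of the Dirichlet-smoothed
# estimate: theta_d(v) = (d(v) + mu * p_C(v)) / (|d| + mu).
from collections import Counter

def theta(term, doc_tokens, corpus_tokens, mu):
    d = Counter(doc_tokens)
    p_c = Counter(corpus_tokens)[term] / len(corpus_tokens)  # corpus model p_C(v)
    return (d[term] + mu * p_c) / (len(doc_tokens) + mu)

doc = "fishing bass for fun".split()
corpus = ("fishing bass for fun tips on fishing "
          "fishing for tips as a waiter").split()
print(theta("bass", doc, corpus, mu=0.5))  # term present in the document
print(theta("tips", doc, corpus, mu=0.5))  # absent from the document, yet nonzero
```

Note that a term absent from the document still receives a small but nonzero estimate, which is exactly the point of (3).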
We then remove the document-independent term K, because it does not affect our ranking of documents, in order to obtain the following scoring equation (=_rank denotes equality up to document-independent factors, which do not affect the ranking):

    P(q|d) =_rank ∏_v [ ( d(v) + μ p_C(v) ) / ( |d| + μ ) ]^{q(v)}        (4)

Notice that many familiar quantities appear in (4): d(v) is our term frequency component; |d| is our length-normalization adjustment; and p_C(v) looks like some form of a reciprocal IDF term.

This new equation seems a bit problematic. The smoothing function adds counts to a term in proportion to its frequency of occurrence in the corpus, the exact opposite of the way in which the IDF measure weights terms. A term such as "the" would gain many counts from this smoothing function. We would have preferred to see a quantity resembling IDF appear.

b. Fixing θ_d?

There are a few ways we could try to fix θ_d(v), shown in (2). We could remove the smoothing term (set μ = 0), but this would result in our estimate treating a document that is missing one query term identically to documents missing more than one, or even all, query terms: each would receive probability zero. We could also change the smoothing term to incorporate a uniform prior; the resulting LM estimate would be different, but it doesn't take advantage of all of the information we have available from the corpus, and in addition we have little justification for this choice. Or we could back off from the generative approach entirely.

However, before we decide to take any action, shouldn't we see whether there is any way to manipulate (4) to give us the IDF term we want? In other words, is the IDF term already in (4)?

Our approach, following Zhai and Lafferty (2001), will be to try to drive p_C(v) into the denominator. First we define the following norm:

    norm(d, μ) = |d| + μ        (5)
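The first proposed "fix" (dropping the smoothing term by setting μ = 0) fails in exactly the way described; the following short Python check (our own sketch) makes this concrete.

```python
# Sketch (our own): with mu = 0 (no smoothing), one missing query term
# zeroes the score, indistinguishable from missing every query term.
from collections import Counter
from math import prod

def score_unsmoothed(doc, query):
    d = Counter(doc.split())  # Counter returns 0 for absent terms
    return prod(d[v] / len(doc.split()) for v in query.split())

q = "tips on bass fishing"
print(score_unsmoothed("tips on bass bass", q))  # missing only "fishing": 0.0
print(score_unsmoothed("waiter fun for", q))     # missing all four terms: 0.0
```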
Using this quantity, we can rewrite (4):

    P(q|d) =_rank [ 1 / norm(d, μ)^{|q|} ] ∏_v [ d(v) + μ p_C(v) ]^{q(v)}        (6)

Recall our RSJ derivation. We can use some simple transformations to try to separate out the missing terms. We begin by splitting the product according to the indices: one product will include only those v such that d(v) > 0; the other will include only those v such that d(v) = 0. Thus we obtain the following from (6):

    P(q|d) =_rank [ 1 / norm(d, μ)^{|q|} ] ∏_{v: d(v)>0} [ d(v) + μ p_C(v) ]^{q(v)} · ∏_{v: d(v)=0} [ μ p_C(v) ]^{q(v)}        (7)

Notice that the right-hand product in (7) is independent of d except for its index set. We can eliminate this dependence by multiplying by the corresponding product over the remaining indices:

    ∏_{v: d(v)>0} [ μ p_C(v) ]^{q(v)}        (8)

Because we then have a product over all indices of a quantity that is independent of d, we can eliminate it under ranking:

    ∏_{v: d(v)=0} [ μ p_C(v) ]^{q(v)} · ∏_{v: d(v)>0} [ μ p_C(v) ]^{q(v)} = ∏_v [ μ p_C(v) ]^{q(v)}        (9)

The resulting quantity in (9) will not affect our ranking, so we can eliminate it. However, we also need to multiply by the reciprocal of (8) in order to maintain equality:

    P(q|d) =_rank [ 1 / norm(d, μ)^{|q|} ] · ∏_{v: d(v)>0} [ d(v) + μ p_C(v) ]^{q(v)} · ∏_{v: d(v)=0} [ μ p_C(v) ]^{q(v)} · ∏_{v: d(v)>0} [ μ p_C(v) ]^{q(v)} · ∏_{v: d(v)>0} [ μ p_C(v) ]^{−q(v)}        (10)

After multiplying through, and noting that the document-independent product (9) does not affect our ranking, we obtain the following:

    P(q|d) =_rank [ 1 / norm(d, μ)^{|q|} ] ∏_{v: d(v)>0} [ ( d(v) + μ p_C(v) ) / ( μ p_C(v) ) ]^{q(v)}        (11)
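The conclusion of this derivation — that the rewritten product differs from the full smoothed product only by a document-independent factor, and hence ranks documents identically — can be verified numerically. The sketch below is our own, using a hypothetical toy corpus and query.

```python
# Sketch (our own) checking that the full smoothed product and the
# TF/IDF-style rewrite differ only by a document-independent factor,
# hence rank documents identically.
from collections import Counter
from math import prod

corpus_docs = ["fishing bass for fun", "tips on fishing",
               "fishing for tips as a waiter"]
query = "tips on bass fishing".split()
mu = 0.5
corpus = Counter(w for doc in corpus_docs for w in doc.split())
total = sum(corpus.values())
p_c = {v: corpus[v] / total for v in corpus}  # corpus language model

def score_full(doc):
    # product over all query terms of the smoothed estimate theta_d(v)
    d = Counter(doc.split())
    n = len(doc.split()) + mu
    return prod((d[v] + mu * p_c[v]) / n for v in query)

def score_reduced(doc):
    # only terms present in d, each divided by mu * p_c(v)
    d = Counter(doc.split())
    n = len(doc.split()) + mu
    matched = prod((d[v] + mu * p_c[v]) / (mu * p_c[v])
                   for v in query if d[v] > 0)
    return matched / n ** len(query)

print(sorted(corpus_docs, key=score_full) ==
      sorted(corpus_docs, key=score_reduced))  # True
```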
In (11) we still have our term frequency, d(v), and our length normalization, norm(d, μ). In addition, we now have our IDF term, 1 / p_C(v).

c. About that Multinomial Model

Let us first review some of the aspects and motivations of our language model. We wanted to model a probabilistic text-generation process for the query, based somehow on the document. We wanted a method for assigning probabilities to text. While there are a variety of sophisticated models we could have chosen, we decided upon a multinomial one.

Aside: Our multinomial model is not necessarily the same as a unigram model, as considering the two equal creates some "checksum" issues. For example:

Assume parameters θ_1, ..., θ_m s.t. Σ_j θ_j = 1 and θ_j ≥ 0 for all j. Then

    P_U[q] = ∏_j θ_j^{q(v_j)},

where P_U is our probability of the query given some unigram model, assuming independence of occurrences for each term. For the probability of any single term v_j, we know the following:

    P_U[v_j] = θ_j.

Now, these probabilities already sum to one:

    Σ_j P_U[v_j] = Σ_j θ_j = 1.

However, we now note that there is no probability left for strings of multiple terms, even though

    P_U[v_1 v_2] = θ_1 θ_2 > 0,

an unexpected result. We expected P_U[v_1 v_2] = θ_1 θ_2 by our definition of P_U above; however, because the single-term strings already receive all of the available probability mass, the rules of probability would force P_U[v_1 v_2] = 0. Taken over strings of all lengths, then, P_U is not a proper distribution.
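The aside can be checked numerically. With any hypothetical parameter values summing to one (the values below are our own), the one-term strings already exhaust the probability mass, yet the two-term strings receive total mass one as well, so the "distribution" over all strings sums to more than one:

```python
# Sketch (our own, hypothetical parameter values): under the naive unigram
# reading, every string length receives total probability mass 1.
thetas = {"a": 0.5, "b": 0.25, "c": 0.25}  # parameters sum to 1

one_term_mass = sum(thetas.values())
two_term_mass = sum(t1 * t2 for t1 in thetas.values()
                    for t2 in thetas.values())
print(one_term_mass)  # 1.0 -- no mass left for longer strings
print(two_term_mass)  # 1.0 again, so the total over all strings exceeds 1
```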
How is our multinomial model different?

    P(q|d) = K · P_U[q]

The multinomial model restricts our query to a given length. However, the two quantities are equivalent under ranking, as the term K is document independent.

Let us see how we could fix the unigram model. We will consider a 2-state HMM (Hidden Markov Model). We incorporate s, the probability of stopping generation of the text. The observation probabilities are then calculated as the following:

    P_{H2}[v] = (1−s) θ_v s
    P_{H2}[v_1 v_2] = (1−s) θ_{v_1} (1−s) θ_{v_2} s
    P_{H2}[v_{i_1} v_{i_2} ... v_{i_k}] = (1−s)^k s ∏_{j=1}^{k} θ_{i_j}

Claim: Under this model, we obtain a proper distribution. (For each length k, the θ factors sum to one over all length-k strings, so the total mass is Σ_{k≥0} (1−s)^k s = 1, a geometric series, with the empty string receiving probability s.)

II. A New Interpretation of our LM Approach

a. A New Scenario

Recall that we originally wanted to rank documents by their relevance to the query using the following:

    P(R = y | D, Q)        (12)

Our special-case derivation resulted in a VSM-like model. During our derivation in previous lectures of the query-likelihood scoring function we used, we at some point finessed the concept of selection vs. generation. The reasoning we used there wasn't as justifiable as we would have liked. Let us try to backtrack a bit from our previous thinking in order to find a better explanation for how we got to the point we derived earlier in our special case.

In a new scenario, we want to try a new interpretation of what d means. Assume:

- The corpus contents are described by a set of topic LMs.
- Documents are generated by a choice of topic T_D, which then generates D.
- There is also a set of information-need LMs (these could be user-specific, perhaps?).
- The query is created by a choice of topic T_Q, which then creates Q.

Therefore the concept of relevance, R = y, requires that T_D = T_Q.
Some immediate questions come to mind. What happens if we have documents that are about multiple topics? But note that the notion of "topic" as used here is also somewhat flexible, and so we can partially accommodate such a situation.

The remaining portion of our LM discussion will be continued next lecture.

III. Related Questions

Let us examine how the ranking function we derived from our LM model behaves on some sample data. Recall that the equation we use for ranking is (11).

Query: "tips on bass fishing" (length 4)

Documents:
1. "fishing bass for fun" (length 4)
2. "tips on fishing" (length 3)
3. "fishing for tips as a waiter" (length 6)

We will consider the set of documents to be our corpus for the smoothing term. Let μ = 0.5.

1. Calculate P(q|d) using (1).

Answer: First, the norms for each document:

    norm(d_1, μ) = 4.5,    4.5^4 = 410.0625
    norm(d_2, μ) = 3.5,    3.5^4 = 150.0625
    norm(d_3, μ) = 6.5,    6.5^4 = 1785.0625

Terms: fishing, bass, for, fun, tips, on, as, a, waiter

Raw corpus term counts (13 tokens in all): fishing 3, bass 1, for 2, fun 1, tips 2, on 1, as 1, a 1, waiter 1
Now the products, with K = 4! / (1! 1! 1! 1!) = 24; in each line the four factors correspond to the query terms fishing, bass, tips, and on, in that order:

    P(q|d_1) = 24 · (14.5/13)/4.5 · (13.5/13)/4.5 · (1/13)/4.5 · (0.5/13)/4.5 ≈ 2.01 × 10^−4
    P(q|d_2) = 24 · (14.5/13)/3.5 · (0.5/13)/3.5 · (14/13)/3.5 · (13.5/13)/3.5 ≈ 7.67 × 10^−3
    P(q|d_3) = 24 · (14.5/13)/6.5 · (0.5/13)/6.5 · (14/13)/6.5 · (0.5/13)/6.5 ≈ 2.39 × 10^−5

For example, the first factor of P(q|d_1) is θ_{d_1}(fishing) = (1 + 0.5 · 3/13) / 4.5 = (14.5/13) / 4.5.

2. Recall that for our special case, we decided to use Dirichlet smoothing to aid us in estimating the quantity P(q|d). However, Dirichlet smoothing is not the only possibility. What if we were to try another smoothing method? Say we decide to use linear interpolation to smooth each θ_d(v). How would the formula in (2) change? Rewrite P(q|d) using your new formula for θ_d(v).

Answer: With linear interpolation (weight λ, 0 < λ < 1),

    θ_d(v) = (1 − λ) d(v)/|d| + λ p_C(v),

so

    P(q|d) = K ∏_v [ (1 − λ) d(v)/|d| + λ p_C(v) ]^{q(v)}.

3. Think about how you would calculate P(q|d) using the new smoothing method from question 2. We don't seem to have a term that acts like an IDF in our current formula for P(q|d). Show that the IDF term is in fact present when we use P(q|d) to rank documents.

Answer: When we examine P(q|d) from question 2 under ranking, we observe the following:

    P(q|d) =_rank ∏_{v: d(v)>0} [ (1−λ) d(v)/|d| + λ p_C(v) ]^{q(v)} · ∏_{v: d(v)=0} [ (1−λ) d(v)/|d| + λ p_C(v) ]^{q(v)}        (13)
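The arithmetic of question 1 can be reproduced with a short script (our own sketch; variable names are ours):

```python
# Sketch (our own) reproducing question 1: Dirichlet-smoothed query
# likelihoods with mu = 0.5 over the three-document toy corpus.
from collections import Counter
from math import prod, factorial

docs = ["fishing bass for fun", "tips on fishing",
        "fishing for tips as a waiter"]
query = "tips on bass fishing".split()
mu = 0.5
corpus = Counter(w for d in docs for w in d.split())
total = sum(corpus.values())       # 13 tokens in the corpus
K = factorial(len(query))          # 4! = 24, since each query term occurs once

def p_q_given_d(doc):
    d = Counter(doc.split())
    norm = len(doc.split()) + mu   # |d| + mu
    return K * prod((d[v] + mu * corpus[v] / total) / norm for v in query)

scores = {doc: p_q_given_d(doc) for doc in docs}
for doc, p in scores.items():
    print(f"{doc}: {p:.3g}")
```

As expected, the document matching three of the four query terms ("tips on fishing") scores highest.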
Since each count d(v) in the second product in equation (13) is equal to zero, that product simplifies as follows:

    ∏_{v: d(v)=0} [ λ p_C(v) ]^{q(v)}        (14)

Now the second factor of (13), shown in (14), is almost document independent, except for its index set. We can make (14) document independent by multiplying (13) by the same quantity over the remaining index set, {v : d(v) > 0}:

    ∏_{v: d(v)>0} [ λ p_C(v) ]^{q(v)}        (15)

However, in order to maintain equality, we must also multiply (13) by the reciprocal of (15). Here is our equation for P(q|d) after the aforementioned transformations:

    P(q|d) =_rank ∏_{v: d(v)>0} [ (1−λ) d(v)/|d| + λ p_C(v) ]^{q(v)} · ∏_{v: d(v)=0} [ λ p_C(v) ]^{q(v)} · ∏_{v: d(v)>0} [ λ p_C(v) ]^{q(v)} · ∏_{v: d(v)>0} [ λ p_C(v) ]^{−q(v)}        (16)

Simplifying: the middle two factors combine into ∏_v [ λ p_C(v) ]^{q(v)}, which is document independent, so we can remove it under ranking:

    P(q|d) =_rank ∏_{v: d(v)>0} [ ( (1−λ) d(v)/|d| + λ p_C(v) ) / ( λ p_C(v) ) ]^{q(v)}        (17)

The factor 1 / p_C(v) inside the product is our IDF-like term.
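As a numeric check of question 3 (our own sketch; the interpolation weight λ = 0.3 is a hypothetical setting): the full linearly-interpolated product and the reduced form restricted to terms present in the document rank documents identically, since they differ only by a document-independent factor.

```python
# Sketch (our own; lam = 0.3 is hypothetical) checking that Jelinek-Mercer
# (linear interpolation) smoothing also hides an IDF-like factor: the full
# product and the reduced matched-terms form rank documents identically.
from collections import Counter
from math import prod

docs = ["fishing bass for fun", "tips on fishing",
        "fishing for tips as a waiter"]
query = "tips on bass fishing".split()
lam = 0.3
corpus = Counter(w for d in docs for w in d.split())
total = sum(corpus.values())
p_c = {v: corpus[v] / total for v in corpus}

def full(doc):
    d, n = Counter(doc.split()), len(doc.split())
    return prod((1 - lam) * d[v] / n + lam * p_c[v] for v in query)

def reduced(doc):
    # only terms present in d; the IDF-like 1 / p_c(v) appears inside
    d, n = Counter(doc.split()), len(doc.split())
    return prod(((1 - lam) * d[v] / n + lam * p_c[v]) / (lam * p_c[v])
                for v in query if d[v] > 0)

print(sorted(docs, key=full) == sorted(docs, key=reduced))  # True
```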