CS 630 Lecture 8   2/20/2006
Lecturer: Lillian Lee
Scribes: Peter Babinski, David Lin

Basic Language Modeling Approach

I. Special Case of LM-based Approach
   a. Recap of Formulas and Terms
   b. Fixing θ_d?
   c. About that Multinomial Model
II. A New Interpretation of our LM Approach
   a. A New Scenario
III. Related Questions

I. Special Case of LM-based Approach

a. Recap of Formulas and Terms

Recall our special case of a language-modeling-based approach. We decided to use multinomial topic models with corpus-dependent Dirichlet priors. We wanted to score a document d by using the probability of the query q under a model based on the document, P(q | d). Using a multinomial model gives us the following equation for P(q | d), with respect to same-length term sequences:

(1)   P(q | d) = K_q · ∏_{v=1}^{m} θ_{d,v}^{c_{q,v}}

K_q represents the number of ways we can rearrange the terms of the query. Recall the following equation for θ_{d,v}:

(2)   θ_{d,v} = (c_{d,v} + µ · p(v | C)) / (|d| + µ)

where p(v | C) is the relative frequency of term v in the corpus C. Recall that we incorporated the smoothing term

(3)   µ · p(v | C)

into the equation for θ_{d,v} in order to avoid situations where we would have zero counts for a particular term v in our vector representation of the document. Having zero counts for any component of the product in the equation for P(q | d) for which the corresponding count in the query is non-zero would cause our probability to become zero, so we add counts to each term based upon its frequency of occurrence in the corpus.
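As a concrete sketch of equations (1) and (2): the function names below are our own, and we take p(v | C) to be the relative frequency of v in the corpus, as in the recap above.

```python
from collections import Counter
from math import factorial, prod

def dirichlet_theta(v, doc, corpus, mu):
    """theta_{d,v} = (c_{d,v} + mu * p(v|C)) / (|d| + mu)   -- equation (2)."""
    c_dv = Counter(doc)[v]                      # count of v in the document
    p_vC = Counter(corpus)[v] / len(corpus)     # relative frequency of v in C
    return (c_dv + mu * p_vC) / (len(doc) + mu)

def query_likelihood(query, doc, corpus, mu):
    """P(q|d) = K_q * prod_v theta_{d,v}^{c_{q,v}}   -- equation (1)."""
    counts = Counter(query)
    # K_q: multinomial coefficient, the number of rearrangements of q
    k_q = factorial(len(query))
    for c in counts.values():
        k_q //= factorial(c)
    return k_q * prod(dirichlet_theta(v, doc, corpus, mu) ** c
                      for v, c in counts.items())
```

Note that the smoothed θ_{d,·} remains a proper distribution over the vocabulary: the added pseudo-counts µ · p(v | C) sum to µ, which the denominator |d| + µ absorbs.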

We then remove the document-independent term K_q, because it does not affect our ranking of documents, in order to obtain the following scoring equation for P(q | d):

(4)   P(q | d) =_rank ∏_{v=1}^{m} [ (c_{d,v} + µ · p(v | C)) / (|d| + µ) ]^{c_{q,v}}

Notice that many familiar terms appear in (4). c_{d,v} is our term frequency component. 1 / (|d| + µ) is our length normalization adjustment. p(v | C) looks like some form of a reciprocal IDF term.

This new equation seems a bit problematic. The smoothing function in the equation adds counts to a term proportional to its frequency of occurrence, the exact opposite of the way in which the IDF measure weights terms. A term such as "the" would gain many counts from this smoothing function. We would have preferred to see a term resembling the IDF quantity appear.

b. Fixing θ_d?

There are a few ways we could try to fix θ_{d,v}, shown in (2). We could remove the smoothing term (set µ = 0), but this would result in our estimation treating a document that is missing one query term identically to how it treats documents missing more than one or even all query terms. We could also change the smoothing term to incorporate a uniform prior. The resulting LM estimation would be different, and it doesn't take advantage of all of the information we have available from the corpus. In addition, we have little justification for this choice. We could back off from the generative approach entirely.

However, before we decide to take any action, shouldn't we see if there is any way that we can manipulate (4) to give us the IDF term we want? In other words, is the IDF term already in (4)? Our approach, following Zhai and Lafferty, will be to try to drive p(v | C) into the denominator. First we define the following norm:

(5)   norm(d, µ) = (|d| + µ)^{|q|}

Using this quantity, we can rewrite (4):

(6)   P(q | d) =_rank (1 / norm(d, µ)) · ∏_{v=1}^{m} [ c_{d,v} + µ · p(v | C) ]^{c_{q,v}}

Recall our RSJ derivation. We can use some simple transformations to try to separate out missing terms. We begin by considering splitting the product according to the indices. One product will only include indices v such that c_{d,v} > 0. The other product will only include indices v such that c_{d,v} = 0. Thus we obtain the following equation from (6):

(7)   P(q | d) =_rank (1 / norm(d, µ)) · ∏_{v: c_{d,v} > 0} [ c_{d,v} + µ · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} = 0} [ µ · p(v | C) ]^{c_{q,v}}

Notice that the RHS of (7) is independent of d except for its index set. We can eliminate this dependence by multiplying by the same term taken over the complementary index set:

(8)   ∏_{v: c_{d,v} > 0} [ µ · p(v | C) ]^{c_{q,v}}

Because we now have a product over all indices of a term that is independent of d,

(9)   ∏_{v: c_{d,v} = 0} [ µ · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} > 0} [ µ · p(v | C) ]^{c_{q,v}} = ∏_{v=1}^{m} [ µ · p(v | C) ]^{c_{q,v}}

the resulting term in (9) will not affect our ranking, so we can eliminate it. However, we also need to multiply by the reciprocal of (8) in order to maintain equality:

(10)  P(q | d) = (1 / norm(d, µ)) · ∏_{v: c_{d,v} > 0} [ c_{d,v} + µ · p(v | C) ]^{c_{q,v}} · ∏_{v=1}^{m} [ µ · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} > 0} [ 1 / (µ · p(v | C)) ]^{c_{q,v}}

After multiplying through and noting that our document-independent term in (9) does not affect our ranking, we obtain the following:

(11)  P(q | d) =_rank (1 / norm(d, µ)) · ∏_{v: c_{d,v} > 0} [ (c_{d,v} + µ · p(v | C)) / (µ · p(v | C)) ]^{c_{q,v}}
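A quick numeric check of this derivation, using the sample query and documents from Section III below: the original smoothed score and its rewritten rank-equivalent form should differ only by a document-independent factor, and so should rank documents identically. Function names and the choice µ = 0.5 are ours.

```python
from collections import Counter
from math import prod

def score_eq6(query, doc, corpus, mu):
    """The smoothed score before the rewrite: (1/norm) * prod over all v."""
    cq, cd = Counter(query), Counter(doc)
    pC = {v: corpus.count(v) / len(corpus) for v in set(corpus)}
    norm = (len(doc) + mu) ** len(query)
    return prod((cd[v] + mu * pC[v]) ** c for v, c in cq.items()) / norm

def score_rewritten(query, doc, corpus, mu):
    """The rewritten rank-equivalent form: product only over terms in d,
    each factor divided by mu * p(v|C)."""
    cq, cd = Counter(query), Counter(doc)
    pC = {v: corpus.count(v) / len(corpus) for v in set(corpus)}
    norm = (len(doc) + mu) ** len(query)
    return prod(((cd[v] + mu * pC[v]) / (mu * pC[v])) ** c
                for v, c in cq.items() if cd[v] > 0) / norm
```

The ratio of the two scores for any document is the dropped document-independent product over all query terms, so rankings must agree.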

We still have our term frequency, c_{d,v}, and our length normalization, 1 / norm(d, µ). In addition, we now have our IDF term, 1 / p(v | C).

c. About that Multinomial Model

Let us first review some of the aspects and motivations of our language model. We wanted to model a probabilistic text-generation process for the query based somehow on the document. We wanted a method for assigning probabilities to text. While there are a variety of sophisticated models we could have chosen, we decided upon a multinomial one.

Aside: Our multinomial model is not necessarily the same as a unigram model, as considering the two equal creates some "checksum" issues. For example:

Assume parameters θ_1, ..., θ_m s.t. Σ_v θ_v = 1 and θ_v > 0 for all v. Thus

P_U(q) = ∏_{v=1}^{m} θ_v^{c_{q,v}}

where P_U(q) is our probability of the query under some unigram model, assuming independence of occurrences for each term. For the probability of any single term v, we know the following:

P_U(v) = θ_v

Now these probabilities must sum to 1:

Σ_v P_U(v) = Σ_v θ_v = 1

However, we now note that there is no probability left over for multiple terms occurring. We expected P_U(v_1 v_2) = θ_{v_1} · θ_{v_2} > 0 by our above definition of P_U. However, by the rules of probability, the single-term strings already account for all of the probability mass, which forces P_U(v_1 v_2) = 0, a contradiction.
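The checksum problem in the aside can be illustrated numerically (the θ values below are arbitrary): if we read the unigram formula as a single distribution over strings of every length at once, the total mass already exceeds 1 after just two lengths.

```python
# Arbitrary emission parameters that sum to 1, as the aside assumes.
theta = [0.5, 0.3, 0.2]

# Total unigram "probability" assigned to all length-1 strings.
mass_len1 = sum(theta)                                  # = 1.0: all mass used

# Total assigned to all length-2 strings: (sum theta)^2 = 1.0 as well.
mass_len2 = sum(a * b for a in theta for b in theta)

# Together they exceed 1, so this cannot be one distribution over strings.
total = mass_len1 + mass_len2
```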

How is our multinomial model different?

P(q | d) = K_q · P_U(q | d)

The quantity K_q restricts our query to a given length. However, the two quantities are equivalent under ranking, as the K_q term is document independent.

Let us see how we could fix the unigram model. We will consider a 2-state HMM (Hidden Markov Model). We incorporate s, the probability of stopping generation of the text. The observed probabilities are then calculated as the following:

P_H(v_1) = θ_{v_1} · s
P_H(v_1 v_2) = θ_{v_1} · (1 − s) · θ_{v_2} · s
P_H(v_{i_1} v_{i_2} ... v_{i_k}) = s · (1 − s)^{k−1} · ∏_{j=1}^{k} θ_{v_{i_j}}

Claim: Under this model, we obtain a proper distribution over strings of all lengths.

II. A New Interpretation of our LM Approach

a. A New Scenario

Recall we originally wanted our document's relevance to the query, using the following:

(12)  P(R = y | D = d, Q = q)

Our special-case derivation resulted in a VSM-like model. During our derivation in previous lectures of the query-likelihood scoring function we used, we at some point finessed the concept of selection vs. generation. The reasoning we used there wasn't as justifiable as we would have liked. Let us try to backtrack a bit from our previous thinking in order to find a better explanation for how we got to the point we derived earlier in our special case.

In a new scenario, we want to try a new interpretation of what P(q | d) means. Assume:

- The corpus contents are described by a set of topic LMs.
- Documents are generated by a choice of topic T_D, which then generates D.
- There is also a set of info-need LMs (these could be user-specific, perhaps?).
- The query is created by a choice of topic T_Q, which then creates Q.

Therefore the concept of relevance, R = y, requires that T_D = T_Q.

Some immediate questions come to mind. What happens if we have documents that are about multiple topics? But note that the notion of "topic" as used here is also somewhat flexible, and so we can partially accommodate such a situation. The remaining portion of our LM discussion will be continued next lecture.

III. Related Questions

Let us examine how the ranking function we derived from our LM model behaves with some sample data. Recall the equation we use for ranking is found in (11).

Query: "tips on bass fishing" (length 4)

Documents:
d_1: "fishing bass for fun" (length 4)
d_2: "tips on fishing" (length 3)
d_3: "fishing for tips as a waiter" (length 6)

We will consider the set of documents to be our corpus for the smoothing term. Let µ = 0.5.

1. Calculate P(q | d) for each document using (6).

Answer: First the norms for each document:

norm(d_1, µ) = (4 + 0.5)^4 = 410.0625
norm(d_2, µ) = (3 + 0.5)^4 = 150.0625
norm(d_3, µ) = (6 + 0.5)^4 = 1785.0625

Terms: fishing, bass, for, fun, tips, on, as, a, waiter
Raw corpus term counts: fishing 3, bass 1, for 2, fun 1, tips 2, on 1, as 1, a 1, waiter 1
Document lengths: |d_1| = 4, |d_2| = 3, |d_3| = 6; corpus length |C| = 13
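The bookkeeping above (norms and corpus statistics) can be checked quickly; variable names here are ours.

```python
# Quick check of the norms and corpus statistics for the sample data.
docs = {
    "d1": "fishing bass for fun".split(),
    "d2": "tips on fishing".split(),
    "d3": "fishing for tips as a waiter".split(),
}
query = "tips on bass fishing".split()
mu = 0.5

# The corpus for the smoothing term is just the three documents concatenated.
corpus = [w for d in docs.values() for w in d]

# norm(d, mu) = (|d| + mu)^|q|
norms = {name: (len(d) + mu) ** len(query) for name, d in docs.items()}
```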

Now the products, with one factor per query term (fishing, bass, tips, on):

P(q | d_1) = [ (1 + 0.5·3/13) · (1 + 0.5·1/13) · (0.5·2/13) · (0.5·1/13) ] / 410.0625 ≈ 8.36 × 10^−6

P(q | d_2) = [ (1 + 0.5·3/13) · (0.5·1/13) · (1 + 0.5·2/13) · (1 + 0.5·1/13) ] / 150.0625 ≈ 3.20 × 10^−4

P(q | d_3) = [ (1 + 0.5·3/13) · (0.5·1/13) · (1 + 0.5·2/13) · (0.5·1/13) ] / 1785.0625 ≈ 9.95 × 10^−7

2. Recall that for our special case, we decided to use Dirichlet smoothing to aid us in estimating the quantity P(q | d). However, Dirichlet smoothing is not the only possibility. What if we were to try another smoothing method? Say we decided to use linear interpolation to smooth each θ_{d,v}. How would the formula in (2) change? Rewrite P(q | d) using your new formula for θ_{d,v}.

Answer:

θ_{d,v} = λ · (c_{d,v} / |d|) + (1 − λ) · p(v | C)

P(q | d) = ∏_{v=1}^{m} [ λ · (c_{d,v} / |d|) + (1 − λ) · p(v | C) ]^{c_{q,v}}

3. Think about how you would calculate P(q | d) using the new smoothing method from question 2. We don't seem to have a term that acts like an IDF in our current formula for P(q | d). Show that the IDF term is in fact present when we use P(q | d) to rank documents.

Answer: When we examine P(q | d) from question 2 under ranking, we observe the following:

(13)  P(q | d) =_rank ∏_{v: c_{d,v} > 0} [ λ · (c_{d,v} / |d|) + (1 − λ) · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} = 0} [ λ · (c_{d,v} / |d|) + (1 − λ) · p(v | C) ]^{c_{q,v}}

Since each count c_{d,v} in the second term in equation (13) is equal to zero, we observe that that product simplifies as follows:

(14)  ∏_{v: c_{d,v} = 0} [ (1 − λ) · p(v | C) ]^{c_{q,v}}

Now the second term in equation (13), shown in (14), is almost document independent except for its index set. We can make (14) document independent by multiplying (13) by the same term over the remaining index set, {v : c_{d,v} > 0}:

(15)  ∏_{v: c_{d,v} > 0} [ (1 − λ) · p(v | C) ]^{c_{q,v}}

However, in order to maintain equality, we must also multiply (13) by the reciprocal of (15). Here is our equation for P(q | d) after the aforementioned transformations:

(16)  P(q | d) = ∏_{v: c_{d,v} > 0} [ λ · (c_{d,v} / |d|) + (1 − λ) · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} = 0} [ (1 − λ) · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} > 0} [ (1 − λ) · p(v | C) ]^{c_{q,v}} · ∏_{v: c_{d,v} > 0} [ 1 / ((1 − λ) · p(v | C)) ]^{c_{q,v}}

Simplifying, the middle two terms of the product combine into a document-independent product over all v, so we can remove them under ranking:

(17)  P(q | d) =_rank ∏_{v: c_{d,v} > 0} [ (λ · (c_{d,v} / |d|) + (1 − λ) · p(v | C)) / ((1 − λ) · p(v | C)) ]^{c_{q,v}}

Once again the reciprocal of p(v | C), our IDF-like term, appears.
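As with the Dirichlet case, the linearly interpolated score and its rewritten ranking form from question 3 can be checked to rank documents identically on the sample data; the function names and the choice λ = 0.5 are ours.

```python
from collections import Counter
from math import prod

LAM = 0.5  # interpolation weight, chosen arbitrarily for this check

def p_C(v, corpus):
    """Relative frequency of v in the corpus."""
    return corpus.count(v) / len(corpus)

def score_interp(query, doc, corpus):
    """Linearly interpolated score: product over all query terms."""
    cq, cd = Counter(query), Counter(doc)
    return prod((LAM * cd[v] / len(doc) + (1 - LAM) * p_C(v, corpus)) ** c
                for v, c in cq.items())

def score_interp_rewritten(query, doc, corpus):
    """Rewritten ranking form: product only over terms present in d,
    each factor divided by (1 - lambda) * p(v|C)."""
    cq, cd = Counter(query), Counter(doc)
    return prod(((LAM * cd[v] / len(doc) + (1 - LAM) * p_C(v, corpus))
                 / ((1 - LAM) * p_C(v, corpus))) ** c
                for v, c in cq.items() if cd[v] > 0)
```

The two forms differ by the document-independent product of (1 − λ) · p(v | C) over all query terms, so their rankings must coincide.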