Introduction To Machine Learning

1 Introduction To Machine Learning
David Sontag, New York University
Lecture 21, April 14, 2016

2 Expectation maximization
Algorithm is as follows:
1. Write down the complete log-likelihood $\log p(x, z; \theta)$ in such a way that it is linear in $z$
2. Initialize $\theta^0$, e.g. at random or using a good first guess
3. Repeat until convergence:
$$\theta^{t+1} = \arg\max_{\theta} \sum_{m=1}^{M} E_{p(z_m \mid x_m; \theta^t)}\left[\log p(x_m, Z; \theta)\right]$$
Notice that $\log p(x_m, Z; \theta)$ is a random function because $Z$ is unknown.
By linearity of expectation, the objective decomposes into expectation terms and data terms.
The E step corresponds to computing the objective (i.e., the expectations).
The M step corresponds to maximizing the objective.
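The loop in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: `run_em`, the two-coin toy problem, and the data in `HEADS` are all assumptions made for the example. Each trial is assumed to pick one of two coins with probability 1/2 and flip it 10 times; only the head counts are observed, and the coin identity plays the role of $Z$.

```python
import math

def run_em(theta0, e_step, m_step, max_iters=200, tol=1e-10):
    """Alternate E and M steps until the log-likelihood gain drops below tol."""
    theta = theta0
    history = []  # observed-data log-likelihood at each iterate
    for _ in range(max_iters):
        stats, log_lik = e_step(theta)   # expectations under p(z | x; theta^t)
        history.append(log_lik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        theta = m_step(stats)            # theta^{t+1} maximizes the expected objective
    return theta, history

# Hypothetical toy data: number of heads in each 10-flip trial.
FLIPS = 10
HEADS = [5, 9, 8, 4, 7]

def coin_e_step(theta):
    """Expected head/flip counts per coin, plus the log-likelihood (up to a constant)."""
    p_a, p_b = theta
    heads_a = flips_a = heads_b = flips_b = 0.0
    log_lik = 0.0
    for h in HEADS:
        lik_a = p_a ** h * (1 - p_a) ** (FLIPS - h)
        lik_b = p_b ** h * (1 - p_b) ** (FLIPS - h)
        r_a = lik_a / (lik_a + lik_b)    # posterior probability that coin A was used
        heads_a += r_a * h
        flips_a += r_a * FLIPS
        heads_b += (1 - r_a) * h
        flips_b += (1 - r_a) * FLIPS
        log_lik += math.log(0.5 * lik_a + 0.5 * lik_b)
    return (heads_a, flips_a, heads_b, flips_b), log_lik

def coin_m_step(stats):
    """Closed-form maximizer: expected heads divided by expected flips."""
    heads_a, flips_a, heads_b, flips_b = stats
    return heads_a / flips_a, heads_b / flips_b

theta_hat, history = run_em((0.6, 0.5), coin_e_step, coin_m_step)
```

The E step returns only expected counts because the complete log-likelihood is linear in the indicator of which coin was used, which is the point of step 1.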

3 Derivation of EM algorithm
[Figure 2, from the tutorial by Sean Borman: graphical interpretation of a single iteration of the EM algorithm. The curves $L(\theta)$ and $l(\theta \mid \theta_n)$ are plotted against $\theta$; $\theta_n$ and $\theta_{n+1}$ are marked on the horizontal axis, and $L(\theta_{n+1})$, $l(\theta_{n+1} \mid \theta_n)$, and $L(\theta_n) = l(\theta_n \mid \theta_n)$ on the vertical axis.]
The function $l(\theta \mid \theta_n)$ is bounded above by the likelihood function $L(\theta)$. The functions are equal at $\theta = \theta_n$. The EM algorithm chooses $\theta_{n+1}$ as the value of $\theta$ that maximizes $l(\theta \mid \theta_n)$.
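One standard way to obtain a lower bound with the properties described on this slide is Jensen's inequality. The sketch below assumes $L(\theta) = \log p(x; \theta)$ and takes $l(\theta \mid \theta_n)$ to be the bound built from the posterior at $\theta_n$; the tutorial may write its definition of $l$ differently.

```latex
\begin{align*}
L(\theta) &= \log \sum_{z} p(x, z; \theta)
           = \log \sum_{z} p(z \mid x; \theta_n)\,
             \frac{p(x, z; \theta)}{p(z \mid x; \theta_n)} \\
          &\ge \sum_{z} p(z \mid x; \theta_n)\,
             \log \frac{p(x, z; \theta)}{p(z \mid x; \theta_n)}
           \;=:\; l(\theta \mid \theta_n)
           && \text{(Jensen, $\log$ is concave)} \\
l(\theta_n \mid \theta_n) &= \sum_{z} p(z \mid x; \theta_n) \log p(x; \theta_n)
           = L(\theta_n)
           && \text{(the ratio is constant in $z$)} \\
L(\theta_{n+1}) &\ge l(\theta_{n+1} \mid \theta_n)
           \ge l(\theta_n \mid \theta_n) = L(\theta_n)
           && \text{(since $\theta_{n+1}$ maximizes $l(\cdot \mid \theta_n)$)}
\end{align*}
```

The last line is the chain of values marked on the vertical axis of the figure: each iteration cannot decrease the likelihood.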

4 Application to mixture models
[Plate diagram: $\theta$ = prior distribution over topics; $\beta$ = topic-word distributions; $z_d$ = topic of doc $d$; $w_{id}$ = word; plates $i = 1$ to $N$ and $d = 1$ to $D$.]
This model is a type of (discrete) mixture model.
Called multinomial naive Bayes (a word can appear multiple times).
Document is generated from a single topic.

5 EM for mixture models
[Plate diagram: $\theta$ = prior distribution over topics; $\beta$ = topic-word distributions; $z_d$ = topic of doc $d$; $w_{id}$ = word; plates $i = 1$ to $N$ and $d = 1$ to $D$.]
The complete likelihood is
$$p(w, Z; \theta, \beta) = \prod_{d=1}^{D} p(w_d, Z_d; \theta, \beta), \quad \text{where} \quad p(w_d, Z_d; \theta, \beta) = \theta_{Z_d} \prod_{i=1}^{N} \beta_{Z_d, w_{id}}$$
Trick #1: re-write this as
$$p(w_d, Z_d; \theta, \beta) = \prod_{k=1}^{K} \theta_k^{1[Z_d = k]} \prod_{i=1}^{N} \prod_{k=1}^{K} \beta_{k, w_{id}}^{1[Z_d = k]}$$
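A quick numerical check can make Trick #1 concrete: raising each factor to the indicator $1[Z_d = k]$ leaves only the factors for the active topic. The parameter values and the document below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical parameters: K = 3 topics, vocabulary of W = 3 word ids.
theta = [0.5, 0.3, 0.2]
beta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
doc = [0, 2, 2, 1]  # word ids w_{id} of one document

def direct_form(doc, z):
    """theta_{z} * prod_i beta_{z, w_i}: look up the active topic directly."""
    p = theta[z]
    for w in doc:
        p *= beta[z][w]
    return p

def indicator_form(doc, z):
    """prod_k theta_k^{1[z=k]} * prod_i prod_k beta_{k, w_i}^{1[z=k]}."""
    p = 1.0
    for k in range(len(theta)):
        indicator = 1 if z == k else 0
        p *= theta[k] ** indicator
        for w in doc:
            p *= beta[k][w] ** indicator
    return p

# The two forms agree for every possible topic assignment.
agree = all(
    math.isclose(direct_form(doc, z), indicator_form(doc, z))
    for z in range(len(theta))
)
```

The indicator form looks more complicated, but its logarithm is a sum in which the indicators appear linearly, which is what the E step needs.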

6 EM for mixture models
Thus, the complete log-likelihood is:
$$\log p(w, Z; \theta, \beta) = \sum_{d=1}^{D} \left( \sum_{k=1}^{K} 1[Z_d = k] \log \theta_k + \sum_{i=1}^{N} \sum_{k=1}^{K} 1[Z_d = k] \log \beta_{k, w_{id}} \right)$$
In the E step, we take the expectation of the complete log-likelihood with respect to $p(z \mid w; \theta^t, \beta^t)$, applying linearity of expectation, i.e.
$$E_{p(z \mid w; \theta^t, \beta^t)}\left[\log p(w, Z; \theta, \beta)\right] = \sum_{d=1}^{D} \left( \sum_{k=1}^{K} p(z_d = k \mid w; \theta^t, \beta^t) \log \theta_k + \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_d = k \mid w; \theta^t, \beta^t) \log \beta_{k, w_{id}} \right)$$
In the M step, we maximize this with respect to $\theta$ and $\beta$.
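The posterior that appears in the E step is not written out on the slide. Under the model above it follows from Bayes' rule; because documents are generated independently, conditioning on all of $w$ is assumed here to reduce to conditioning on $w_d$ alone.

```latex
\begin{align*}
p(z_d = k \mid w; \theta^t, \beta^t)
  &= p(z_d = k \mid w_d; \theta^t, \beta^t) \\
  &= \frac{\theta^t_k \prod_{i=1}^{N} \beta^t_{k, w_{id}}}
          {\sum_{k'=1}^{K} \theta^t_{k'} \prod_{i=1}^{N} \beta^t_{k', w_{id}}}
\end{align*}
```

In practice the numerator is usually computed as a sum of logs and normalized with a log-sum-exp, since the product over words underflows for long documents.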

7 EM for mixture models
Just as with complete data, this maximization can be done in closed form.
First, re-write the expected complete log-likelihood from
$$\sum_{d=1}^{D} \left( \sum_{k=1}^{K} p(z_d = k \mid w; \theta^t, \beta^t) \log \theta_k + \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_d = k \mid w; \theta^t, \beta^t) \log \beta_{k, w_{id}} \right)$$
to
$$\sum_{k=1}^{K} \log \theta_k \sum_{d=1}^{D} p(z_d = k \mid w_d; \theta^t, \beta^t) + \sum_{k=1}^{K} \sum_{w=1}^{W} \log \beta_{k, w} \sum_{d=1}^{D} N_{dw}\, p(z_d = k \mid w_d; \theta^t, \beta^t)$$
where $N_{dw}$ is the number of times word $w$ appears in document $d$.
We then have that
$$\theta_k^{t+1} = \frac{\sum_{d=1}^{D} p(z_d = k \mid w_d; \theta^t, \beta^t)}{\sum_{\hat{k}=1}^{K} \sum_{d=1}^{D} p(z_d = \hat{k} \mid w_d; \theta^t, \beta^t)}$$
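The closed-form updates can be sketched as a small EM routine in plain Python. Several pieces here are assumptions rather than slide content: the function names, the random initialization, the fixed iteration count, and the small `smooth` constant added to the word counts so that no $\beta_{k,w}$ becomes exactly zero. The $\beta$ update is not shown on the slide; the one used below, $\beta_{k,w} \propto \sum_d N_{dw}\, p(z_d = k \mid w_d)$, is what the same normalization argument gives for the second term of the objective.

```python
import math
import random

def e_step(docs, theta, beta):
    """Return resp[d][k] = p(z_d = k | w_d; theta, beta), computed in log space."""
    resp = []
    for doc in docs:
        log_joint = [
            math.log(theta[k]) + sum(math.log(beta[k][w]) for w in doc)
            for k in range(len(theta))
        ]
        top = max(log_joint)                       # log-sum-exp stabilization
        unnorm = [math.exp(v - top) for v in log_joint]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp

def m_step(docs, resp, n_topics, vocab_size, smooth=1e-3):
    """Closed-form maximizers of the expected complete log-likelihood."""
    n_docs = len(docs)
    # theta_k: each row of resp sums to 1, so the denominator on the slide equals D.
    theta = [sum(r[k] for r in resp) / n_docs for k in range(n_topics)]
    beta = []
    for k in range(n_topics):
        counts = [smooth] * vocab_size             # assumed smoothing, not from the slides
        for doc, r in zip(docs, resp):
            for w in doc:                          # adds N_dw * resp[d][k] in total
                counts[w] += r[k]
        total = sum(counts)
        beta.append([c / total for c in counts])
    return theta, beta

def fit_mixture(docs, n_topics, vocab_size, n_iters=50, seed=0):
    """Run a fixed number of EM iterations from a random starting point."""
    rng = random.Random(seed)
    theta = [1.0 / n_topics] * n_topics
    beta = []
    for _ in range(n_topics):
        row = [rng.uniform(0.5, 1.5) for _ in range(vocab_size)]
        row_sum = sum(row)
        beta.append([x / row_sum for x in row])
    for _ in range(n_iters):
        resp = e_step(docs, theta, beta)
        theta, beta = m_step(docs, resp, n_topics, vocab_size)
    return theta, beta
```

Documents are lists of integer word ids, for example `fit_mixture([[0, 0, 1], [2, 3, 3]], n_topics=2, vocab_size=4)`. A production version would also monitor the log-likelihood for convergence instead of running a fixed number of iterations.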

8 Latent Dirichlet allocation (LDA)
Topic models are powerful tools for exploring large data sets and for making inferences about the content of documents.
[Figure, from the poster "Complexity of Inference in Latent Dirichlet Allocation" (David Sontag and Daniel Roy; NYU, Cambridge): documents are linked to topics $\beta_t = p(w \mid z = t)$. Three example topics are shown with their top words: (politics, president, obama, washington, religion), (religion, hindu, judaism, ethics, buddhism), (sports, baseball, soccer, basketball, football).]
Many applications in information retrieval, document summarization, and classification.
[Figure: a new document with words $w_1, \ldots, w_N$ is mapped to a distribution of topics $\theta$ ("What is this document about?"), e.g. weather .50, finance .49, sports.]
LDA is one of the simplest and most widely used topic models.

9 Generative model for a document in LDA
1. Sample the document's topic distribution $\theta$ (aka topic vector):
$$\theta \sim \mathrm{Dirichlet}(\alpha_{1:T})$$
where the $\{\alpha_t\}_{t=1}^{T}$ are fixed hyperparameters. Thus $\theta$ is a distribution over $T$ topics with mean $E[\theta_t] = \alpha_t / \sum_{t'} \alpha_{t'}$.
2. For $i = 1$ to $N$, sample the topic $z_i$ of the $i$-th word:
$$z_i \mid \theta \sim \theta$$
3. ... and then sample the actual word $w_i$ from the $z_i$-th topic:
$$w_i \mid z_i \sim \beta_{z_i}$$
where $\{\beta_t\}_{t=1}^{T}$ are the topics (a fixed collection of distributions on words).
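The three sampling steps translate almost directly into code. The sketch below uses only the Python standard library; the function names and the example values of `alpha` and `beta` are assumptions for illustration. The Dirichlet draw uses the standard construction of normalizing independent Gamma($\alpha_t$, 1) variables.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_t, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def sample_lda_document(alpha, beta, n_words, rng):
    """Generate one document: topic vector, per-word topics, and word ids."""
    theta = sample_dirichlet(alpha, rng)                        # step 1
    topic_ids = range(len(alpha))
    topics = rng.choices(topic_ids, weights=theta, k=n_words)   # step 2: z_i | theta ~ theta
    words = [
        rng.choices(range(len(beta[z])), weights=beta[z], k=1)[0]  # step 3: w_i | z_i ~ beta_{z_i}
        for z in topics
    ]
    return theta, topics, words

# Hypothetical setup: T = 2 topics over a vocabulary of 4 word ids.
example_alpha = [0.5, 0.5]
example_beta = [
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
example_theta, example_topics, example_words = sample_lda_document(
    example_alpha, example_beta, n_words=10, rng=random.Random(0)
)
```

Note that every word gets its own topic draw, so a single document can mix words from several topics.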

10 Generative model for a document in LDA
1. Sample the document's topic distribution $\theta$ (aka topic vector):
$$\theta \sim \mathrm{Dirichlet}(\alpha_{1:T})$$
where the $\{\alpha_t\}_{t=1}^{T}$ are hyperparameters. The Dirichlet density, defined over $\Delta = \{\theta \in \mathbb{R}^T : \forall t\ \theta_t \ge 0,\ \sum_{t=1}^{T} \theta_t = 1\}$, is:
$$p(\theta_1, \ldots, \theta_T) \propto \prod_{t=1}^{T} \theta_t^{\alpha_t - 1}$$
For example, for $T = 3$ ($\theta_3 = 1 - \theta_1 - \theta_2$):
[Figure: two surface plots of $\log \Pr(\theta)$ over $(\theta_1, \theta_2)$, one for each of two symmetric settings $\alpha_1 = \alpha_2 = \alpha_3$.]
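To get a feel for how the hyperparameters shape the density, the log density can be evaluated at a few points of the simplex. The sketch below includes the standard normalizing constant $\Gamma(\sum_t \alpha_t) / \prod_t \Gamma(\alpha_t)$, which the slide leaves implicit in the $\propto$ sign. The $\alpha$ values and test points are assumptions and are not the ones used in the slide's plots.

```python
import math

def dirichlet_log_pdf(theta, alpha):
    """Log of the Dirichlet density at theta (entries must be > 0 and sum to 1)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))

center = [1 / 3, 1 / 3, 1 / 3]   # evenly mixed topic vector
corner = [0.8, 0.1, 0.1]         # topic vector dominated by one topic

# Hypothetical symmetric settings of alpha_1 = alpha_2 = alpha_3.
for a in (5.0, 1.0, 0.1):
    alpha = [a, a, a]
    print(a, dirichlet_log_pdf(center, alpha), dirichlet_log_pdf(corner, alpha))
```

With a symmetric $\alpha$ above 1 the density peaks at the center of the simplex, with $\alpha = 1$ it is uniform, and with $\alpha$ below 1 it favors sparse topic vectors near the corners.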

11 Generative model for a document in LDA
3. ... and then sample the actual word $w_i$ from the $z_i$-th topic:
$$w_i \mid z_i \sim \beta_{z_i}$$
where $\{\beta_t\}_{t=1}^{T}$ are the topics (a fixed collection of distributions on words).
[Figure, from the poster "Complexity of Inference in Latent Dirichlet Allocation" (David Sontag and Daniel Roy; NYU, Cambridge): three example topics $\beta_t = p(w \mid z = t)$ with their top words: (politics, president, obama, washington, religion), (religion, hindu, judaism, ethics, buddhism), (sports, baseball, soccer, basketball, football).]

12 Example of using LDA
[Figure with three panels: Topics; Documents; Topic proportions and assignments.]
Topics $\beta_1, \ldots, \beta_T$, each shown with its most probable words:
gene 0.04, dna 0.02, genetic 0.01, ...
life 0.02, evolve 0.01, organism 0.01, ...
brain 0.04, neuron 0.02, nerve ...
data 0.02, number 0.02, computer 0.01, ...
Topic proportions $\theta_d$ and assignments $z_{1d}, \ldots, z_{Nd}$.
(Blei, Introduction to Probabilistic Topic Models, 2011)

13 Plate notation for LDA model
[Plate diagram: $\alpha$ = Dirichlet hyperparameters; $\theta_d$ = topic distribution for document; $\beta$ = topic-word distributions; $z_{id}$ = topic of word $i$ of doc $d$; $w_{id}$ = word; plates $i = 1$ to $N$ and $d = 1$ to $D$.]
Variables within a plate are replicated in a conditionally independent manner.
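Read as a factorization, the plate diagram corresponds to the following joint distribution. This is a sketch written from the variable definitions on the slide; the notation $\theta_{d,t}$ and $\beta_{t,w}$ for individual entries is an assumption.

```latex
\begin{align*}
p(\theta_{1:D}, z, w \mid \alpha, \beta)
  &= \prod_{d=1}^{D} \left[ p(\theta_d \mid \alpha)
     \prod_{i=1}^{N} p(z_{id} \mid \theta_d)\, p(w_{id} \mid z_{id}, \beta) \right], \\
p(z_{id} = t \mid \theta_d) &= \theta_{d,t}, \qquad
p(w_{id} = w \mid z_{id} = t, \beta) = \beta_{t,w}.
\end{align*}
```

Each plate becomes a product: the outer product runs over documents and the inner product over the words of a document.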

14 Comparison of mixture and admixture models
[Left plate diagram: $\theta$ = prior distribution over topics; $\beta$ = topic-word distributions; $z_d$ = topic of doc $d$; $w_{id}$ = word; plates $i = 1$ to $N$ and $d = 1$ to $D$.]
[Right plate diagram: $\alpha$ = Dirichlet hyperparameters; $\theta_d$ = topic distribution for document; $\beta$ = topic-word distributions; $z_{id}$ = topic of word $i$ of doc $d$; $w_{id}$ = word; plates $i = 1$ to $N$ and $d = 1$ to $D$.]
Model on left is a mixture model. Called multinomial naive Bayes (a word can appear multiple times). Document is generated from a single topic.
Model on right (LDA) is an admixture model. Document is generated from a distribution over topics.
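The difference between the two models is easiest to see by generating a document from each with the same topics. The sketch below is illustrative only: the function names and parameter values are assumptions, and the Dirichlet draw uses normalized Gamma variables from the standard library.

```python
import random

def draw(weights, rng):
    """Sample an index with probability proportional to weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def sample_mixture_doc(theta, beta, n_words, rng):
    """Mixture model: one topic z_d for the whole document."""
    z_d = draw(theta, rng)
    words = [draw(beta[z_d], rng) for _ in range(n_words)]
    return [z_d] * n_words, words

def sample_admixture_doc(alpha, beta, n_words, rng):
    """Admixture model (LDA): a per-document theta_d and one topic z_id per word."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    theta_d = [g / total for g in gammas]
    topics = [draw(theta_d, rng) for _ in range(n_words)]
    words = [draw(beta[z], rng) for z in topics]
    return topics, words

# Hypothetical topics over a vocabulary of 4 word ids.
demo_beta = [
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
demo_rng = random.Random(1)
mix_topics, mix_words = sample_mixture_doc([0.5, 0.5], demo_beta, 8, demo_rng)
adm_topics, adm_words = sample_admixture_doc([1.0, 1.0], demo_beta, 8, demo_rng)
```

In the mixture sample every entry of `mix_topics` is the same topic, while `adm_topics` can vary from word to word.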
