Dirichlet Mixtures in Text Modeling


Mikio Yamamoto and Kugatsu Sadamitsu
CS Technical Report CS-TR-05-1
University of Tsukuba
May 30, 2005

Abstract

Word rates in text vary according to global factors such as genre, topic, author, and expected readership (Church and Gale 1995). Models that summarize such global factors in text at the document level are called text models. A finite mixture of Dirichlet distributions (Dirichlet Mixture, or DM for short) was investigated as a new text model. When the parameters of a multinomial are drawn from a DM, the compound distribution for discrete outcomes is a finite mixture of Dirichlet-multinomials. A Dirichlet-multinomial can be regarded as a multivariate version of the Poisson mixture, a reliable univariate model for global factors (Church and Gale 1995). In the present paper, the DM and its compounds are introduced, with parameter estimation methods derived from Minka's fixed-point methods (Minka 2003) and the EM algorithm. The method can estimate a considerable number of parameters for a large DM, i.e., a few hundred thousand parameters. After a discussion of the relationships between the DM and probabilistic latent semantic analysis (PLSA) (Hofmann 1999), the mixture of unigrams (Nigam et al. 2000), and latent Dirichlet allocation (LDA) (Blei et al. 2001, 2003), applications to statistical language modeling are discussed and their performance compared in perplexity. The DM model achieves the lowest perplexity level despite its unitopic nature.

1 Introduction

Word rates in text vary according to global factors such as genre, topic, author, and expected readership. Church and Gale (1995) examined the Brown corpus and showed that the English word "said" occurs with high frequency in the press and fiction genres, but relatively infrequently in the hobby and learned genres. This observation is basic to their model of word rate variation. Rosenfeld (1996) wrote that the occurrence of the word "winter" in a document is proportional to the occurrence of the word "summer" in the same document. This is the basic observation behind his trigger models. Church (2000) states that a document including the word "Noriega" has a high probability of another occurrence of "Noriega" in the same document, an observation basic to his adaptation model.

In the present paper, an attempt is made to model these global factors with a new generative text model (Dirichlet Mixture, or DM for short), and to apply it to improve conventional ngram language models, which reveal only local interdependency among words.

It is well known that global factors are important to language modeling in the three research communities involved in natural language processing, speech processing, and neural networks. In the natural language processing community, Church and Gale (1995) proposed that word rate distribution can be explained by the Poisson mixture, an infinite Poisson mixture model in which the Poisson parameter varies over the factors of a density function. For example, they demonstrated that an empirical word rate variation closely fits a special case of the Poisson mixture, the negative binomial, where the density function assumes a gamma distribution. However, because the Poisson mixture is univariate, it is difficult to manage word rates for every word simultaneously. Researchers in the speech processing community have proposed and tested a great number of multivariate models (cache models, trigger models, and topic-based models) to capture distant word dependency over the past two decades. Iyer and Ostendorf (1999) proposed an m-component mixture of unigram models with a parameter estimation method using the EM algorithm. Each unigram model in the mixture corresponds to a different topic and yields word rates for that topic. Using topic-based finite mixture models, language models can be greatly improved in perplexity, though parameter estimation for this kind of model tends to overfit the training data. Recently, generative text models such as latent Dirichlet allocation (LDA) have attracted people in the neural network community. Using generative text models, the probability of a whole document, rather than simply of sentences, can be computed. Probability computation in these models takes advantage of prior distributions of word rate variability garnered from large document collections. Generative models are statistically well defined and robust for parameter estimation and adaptation because they exploit (hierarchical) Bayesian frameworks, which rely heavily on prior and posterior distributions of word rates.

In this paper, a new generative text model is investigated which unifies the following concepts developed by the three communities related to language processing:

(1) summary of word rate variability as a stochastic distribution;
(2) finite topic mixture models of multivariate distributions;
(3) generative text models based on a (hierarchical) Bayesian framework.

Based on (1) and (2), it is assumed that word rate variability can be modeled with a finite mixture of Dirichlet distributions. Finite mixtures encapsulate rough topic structures, and each Dirichlet distribution yields clear word rate variability within each topic for all words simultaneously. From (3), a robust model is built by adopting a Bayesian framework, employing a prior and a posterior distribution to estimate the DM parameters and to adapt them to the context of documents processed thus far.

In Sections 2 and 3 of the paper, the DM model is described, as are its parameter estimation methods and the posterior and predictive distributions of the model. In Section 4, the relationship between the DM and other text models is discussed. In Section 5, experimental results of applications to statistical language models are presented. The DM model achieves lower perplexity levels than those employing the mixture of unigrams (MU) and LDA models.

2 The DM and parameter estimation

2.1 The DM and the Polya mixture

The Dirichlet distribution is defined for a random vector $p = (p_1, p_2, \ldots, p_V)$ on a simplex of $V$ dimensions. The elements of a random vector on a simplex sum to 1. We interpret $p$ as word occurrence probabilities over the $V$ words of a vocabulary, so that the Dirichlet distribution models word occurrence probabilities. The density function of the Dirichlet for $p$ is:

$$P_D(p; \alpha) = \frac{\Gamma(\alpha)}{\prod_{v=1}^{V} \Gamma(\alpha_v)} \prod_{v=1}^{V} p_v^{\alpha_v - 1}, \quad (1)$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_V)$ is a parameter vector with $\alpha_v > 0$ and $\alpha = \sum_{v=1}^{V} \alpha_v$.

The Dirichlet mixture distribution (Sjölander et al. 1996) with $M$ components is defined as follows:

$$P_{DM}(p; \lambda, \alpha_1^M) = \sum_{m=1}^{M} \lambda_m P_D(p; \alpha_m) = \sum_{m=1}^{M} \lambda_m \frac{\Gamma(\alpha_m)}{\prod_{v=1}^{V} \Gamma(\alpha_{mv})} \prod_{v=1}^{V} p_v^{\alpha_{mv} - 1}, \quad (2)$$

where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_M)$ is a weight vector over the component Dirichlet distributions and $\alpha_m = \sum_v \alpha_{mv}$.

When the random vector $p$, as the parameter vector of a multinomial, is drawn from the DM, the compound distribution for discrete outcomes $y = (y_1, y_2, \ldots, y_V)$ is:

$$P_{PM}(y; \lambda, \alpha_1^M) = \int P_{Mul}(y \mid p) P_{DM}(p; \lambda, \alpha_1^M) \, dp = \sum_{m=1}^{M} \lambda_m \int P_{Mul}(y \mid p) P_D(p; \alpha_m) \, dp = \sum_{m=1}^{M} \lambda_m \frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y)} \prod_{v=1}^{V} \frac{\Gamma(y_v + \alpha_{mv})}{\Gamma(\alpha_{mv})}, \quad (3)$$

where $y = \sum_v y_v$ and each $y_v$ is the occurrence frequency of the $v$-th word in a document. This distribution is called the Dirichlet-multinomial mixture, or the Polya mixture distribution. The Polya mixture is used to estimate the parameters of the DM, $\alpha$ and $\lambda$.
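For concreteness, the following is a minimal sketch in Python of how equation (3) can be evaluated in log space for numerical stability (the function name and array shapes are illustrative, not from the paper; NumPy and SciPy are assumed):

```python
# Sketch of equation (3): log-probability of a count vector y under the
# Polya (Dirichlet-multinomial) mixture, evaluated in log space.
import numpy as np
from scipy.special import gammaln, logsumexp

def log_polya_mixture(y, lam, alpha):
    """y: (V,) word counts; lam: (M,) mixture weights; alpha: (M, V), all > 0."""
    a_m = alpha.sum(axis=1)                  # alpha_m = sum_v alpha_mv
    n = y.sum()                              # y = sum_v y_v
    # log of Gamma(a_m)/Gamma(a_m + y) * prod_v Gamma(y_v + a_mv)/Gamma(a_mv)
    log_comp = (gammaln(a_m) - gammaln(a_m + n)
                + (gammaln(y + alpha) - gammaln(alpha)).sum(axis=1))
    return logsumexp(np.log(lam) + log_comp)  # log sum_m lam_m * component_m
```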

2.2 Parameter estimation

In this subsection, methods are introduced for estimating the parameters of the DM with a maximum likelihood estimator of the Polya mixture. The estimation methods are based on Minka's estimation methods for a Dirichlet distribution (Minka 2003) and the EM algorithm (Dempster et al. 1977).

The $i$-th datum, i.e., the $i$-th training document, determines the outcomes for words $y_i = (y_{i1}, y_{i2}, \ldots, y_{iV})$. For $N$ training documents, the log likelihood function of the training documents $D = (y_1, y_2, \ldots, y_N)$ is:

$$L(D; \lambda, \alpha_1^M) = \sum_{i=1}^{N} \log P_{PM}(y_i; \lambda, \alpha_1^M).$$

The $\lambda$ and $\alpha$ that maximize the above likelihood function are also the DM parameters. Let $Z = (z_1, z_2, \ldots, z_N)$, where $z_i$ is a hidden variable that denotes the component generating the $i$-th document; the log likelihood of the complete data is:

$$L(D, Z; \lambda, \alpha_1^M) = \sum_{i=1}^{N} \log P(y_i, z_i; \lambda, \alpha_1^M).$$

The Q-function of the EM algorithm, i.e., the conditional expectation of the above log likelihood, is:

$$Q(\theta \mid \bar{\theta}) = \sum_i \sum_m \bar{P}_{im} \log P(y_i, z_i = m; \lambda, \alpha_1^M) = \sum_i \sum_m \bar{P}_{im} \log \lambda_m + \sum_i \sum_m \bar{P}_{im} \log \frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y_i)} \prod_{v=1}^{V} \frac{\Gamma(y_{iv} + \alpha_{mv})}{\Gamma(\alpha_{mv})}, \quad (4)$$

where

$$\bar{P}_{im} = P(z_i = m \mid y_i; \bar{\lambda}, \bar{\alpha}_1^M), \qquad y_i = \sum_v y_{iv},$$

and $\bar{\lambda}, \bar{\alpha}_1^M$ are the current values of the parameters.

The first term of (4) can be maximized via the following update formula for $\lambda$:

$$\lambda_m \propto \sum_i \bar{P}_{im}. \quad (5)$$

The second term of (4) can be maximized via the following update formula for $\alpha$, where $\Psi(x)$ is the digamma function. The update formula is derived from Minka's fixed-point iteration (Minka 2003) and the EM algorithm (see Appendix A):

$$\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i \bar{P}_{im} \{\Psi(y_{iv} + \bar{\alpha}_{mv}) - \Psi(\bar{\alpha}_{mv})\}}{\sum_i \bar{P}_{im} \{\Psi(y_i + \bar{\alpha}_m) - \Psi(\bar{\alpha}_m)\}}. \quad (6)$$

If the leaving-one-out (LOO) likelihood is designated as the function to be maximized, a faster update formula is obtained. The following update formula is based on Minka's iteration for the LOO likelihood (Minka 2003) and the EM algorithm (see Appendix B):

$$\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i \bar{P}_{im} \{y_{iv} / (y_{iv} - 1 + \bar{\alpha}_{mv})\}}{\sum_i \bar{P}_{im} \{y_i / (y_i - 1 + \bar{\alpha}_m)\}}. \quad (7)$$

This LOO update formula is used to estimate the DM parameters in all of the following experiments.
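The E-step responsibilities of (4) and the two M-step updates (5) and (7) can be combined into a single iteration. Below is a hedged sketch under the same array conventions as the sketch after equation (3); `em_step_loo` and its shapes are illustrative, not from the paper. Terms with $y_{iv} = 0$ contribute nothing to the numerator of (7) and are masked out:

```python
# Sketch of one EM iteration for the DM with the LOO fixed-point M-step.
# Y: (N, V) document-term counts with non-empty rows; lam: (M,); alpha: (M, V).
import numpy as np
from scipy.special import gammaln, logsumexp

def em_step_loo(Y, lam, alpha):
    a_m = alpha.sum(axis=1)                          # (M,) alpha_m
    n = Y.sum(axis=1)                                # (N,) document lengths y_i
    # E-step: responsibilities P_im of equation (4), via the log of equation (3).
    log_comp = (gammaln(a_m) - gammaln(a_m + n[:, None])
                + (gammaln(Y[:, None, :] + alpha) - gammaln(alpha)).sum(axis=2))
    log_post = np.log(lam) + log_comp                # (N, M)
    P = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
    # M-step for lambda, equation (5): lam_m proportional to sum_i P_im.
    lam_new = P.sum(axis=0) / P.sum()
    # M-step for alpha, the LOO fixed point of equation (7); zero counts drop out.
    Yb = Y[:, None, :]                               # (N, 1, V) for broadcasting
    ratio = np.where(Yb > 0, Yb / np.where(Yb > 0, Yb - 1.0 + alpha, 1.0), 0.0)
    num = (P[:, :, None] * ratio).sum(axis=0)        # (M, V)
    den = (P * (n[:, None] / (n[:, None] - 1.0 + a_m))).sum(axis=0)  # (M,)
    return lam_new, alpha * num / den[:, None]
```

Each iteration needs only the counts and the responsibilities, so the cost per iteration is O(NMV); this is what makes the method practical for the few hundred thousand parameters mentioned in the abstract.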

3 Inference

3.1 A posterior and predictive distribution

The DM model, with its parameters estimated using the above methods, is regarded as a prior over the distribution of word occurrence probabilities. In this section, a method is described for computing the posterior and expectations of word occurrence probabilities, given a word history or a document. The following formula is the posterior distribution of word occurrence probabilities given the history data $y = (y_1, y_2, \ldots, y_V)$, assuming a multinomial distribution for the count data $y$, with parameter $p$ distributed according to a DM as a prior:

$$P(p \mid y) = \frac{P(y \mid p) P(p)}{\int P(y \mid p) P(p) \, dp} = \frac{P_{Mul}(y \mid p) P_{DM}(p; \lambda, \alpha_1^M)}{\int P_{Mul}(y \mid p) P_{DM}(p; \lambda, \alpha_1^M) \, dp} = \frac{\sum_{m=1}^{M} B_m \prod_v p_v^{\alpha_{mv} + y_v - 1}}{\sum_{m=1}^{M} C_m}, \quad (8)$$

where

$$B_m = \frac{\lambda_m \Gamma(\alpha_m)}{\prod_{v=1}^{V} \Gamma(\alpha_{mv})}, \qquad C_m = B_m \frac{\prod_{v=1}^{V} \Gamma(\alpha_{mv} + y_v)}{\Gamma(\alpha_m + y)}, \qquad \alpha_m = \sum_{v=1}^{V} \alpha_{mv}, \qquad y = \sum_{v=1}^{V} y_v.$$

The expectation of the occurrence probability of the $w$-th word in the vocabulary, $P(w \mid y)$, is:

$$P(w \mid y) = \int p_w P(p \mid y) \, dp = \frac{\sum_{m=1}^{M} B_m \int \prod_{v=1}^{V} p_v^{\alpha_{mv} + y_v + \delta(v-w) - 1} \, dp}{\sum_{m=1}^{M} C_m} = \frac{\sum_{m=1}^{M} B_m \frac{\prod_{v=1}^{V} \Gamma\{\alpha_{mv} + y_v + \delta(v-w)\}}{\Gamma(\alpha_m + y + 1)}}{\sum_{m=1}^{M} C_m} = \frac{\sum_{m=1}^{M} C_m \frac{\alpha_{mw} + y_w}{\alpha_m + y}}{\sum_{m=1}^{M} C_m}, \quad (9)$$

where $\delta(k)$ is Kronecker's delta:

$$\delta(k) = \begin{cases} 1, & \text{if } k = 0, \\ 0, & \text{otherwise}. \end{cases}$$

In contrast to LDA, the DM has a closed formula for computing the expectation of word occurrence probability.
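Equation (9) reduces prediction to a weighted average of the component posterior means, with weights proportional to $C_m$. A minimal sketch, again with illustrative names and assuming NumPy/SciPy, normalizing the $C_m$ in log space:

```python
# Sketch of equation (9): expected occurrence probability of every word
# given a history y, using the component posterior weights C_m.
import numpy as np
from scipy.special import gammaln, logsumexp

def predictive_word_probs(y, lam, alpha):
    """Returns a (V,) vector P(w | y) for each word w; it sums to 1."""
    a_m = alpha.sum(axis=1)
    n = y.sum()
    # log C_m = log lam_m + log Gamma(a_m) - sum_v log Gamma(a_mv)
    #           + sum_v log Gamma(a_mv + y_v) - log Gamma(a_m + n)
    log_C = (np.log(lam) + gammaln(a_m) - gammaln(alpha).sum(axis=1)
             + gammaln(alpha + y).sum(axis=1) - gammaln(a_m + n))
    w = np.exp(log_C - logsumexp(log_C))           # normalized posterior weights
    # Each component's posterior mean is (alpha_mv + y_v) / (alpha_m + n).
    return w @ ((alpha + y) / (a_m + n)[:, None])
```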

3.2 Model averaging

In the experiments section (Section 5), it is demonstrated that a statistical language model using the DM outperforms the other models even with fewer components, but that its performance does not rise in proportion to the number of components. This problem reflects the overfitting nature of DM models. To avoid the overfitting problem, a simple model averaging method is adopted, which computes a predictive distribution as the mean of the predictions of DMs with different numbers of components. Assume that there are $N$ different DM models and that $P_i(w \mid y)$, $i = 1, 2, \ldots, N$, is the predictive probability of the word $w$ under the $i$-th model. The following averaging equation is referred to as method 1, in which the evidence probability of the history is regarded as a credibility weight for each model:

$$P_{ma1}(w \mid y) = \sum_i \frac{P_{PM}^i(y; \lambda, \alpha)}{\sum_j P_{PM}^j(y; \lambda, \alpha)} P_i(w \mid y). \quad (10)$$

Method 2 is simpler, averaging the predictive probabilities with an arithmetic mean:

$$P_{ma2}(w \mid y) = \frac{1}{N} \sum_i P_i(w \mid y). \quad (11)$$
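Both schemes are straightforward to express once each of the $N$ models exposes its log evidence $\log P_{PM}^i(y)$ and its predictive vector $P_i(w \mid y)$, e.g., from the earlier sketches; the function names below are illustrative:

```python
# Sketch of the two averaging schemes over N DM models.
import numpy as np
from scipy.special import logsumexp

def average_method1(log_evidence, preds):
    """Equation (10): evidence-weighted average.
    log_evidence: (N,) log P_PM^i(y); preds: (N, V) predictive distributions."""
    w = np.exp(log_evidence - logsumexp(log_evidence))
    return w @ preds

def average_method2(preds):
    """Equation (11): simple arithmetic mean of the predictions."""
    return preds.mean(axis=0)
```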

4 Relationship with other topic-based models

The relationship of the DM with the other topic-based models is demonstrated using graphical representations of the models. The following are the evidence probabilities for data $y = (y_1, y_2, \ldots, y_V)$ in each topic-based model:

Mixture of unigrams: $P(y) = \sum_z p(z) P_{Mul}(y \mid z)$

LDA: $P(y) = \int P_D(\theta \mid \alpha) \prod_v P(w_v \mid \theta)^{y_v} \, d\theta$, where $P(w_v \mid \theta) = \sum_z p(z \mid \theta) p(w_v \mid z)$

DM: $P(y) = P_{PM}(y)$

The $z$'s of the mixture of unigrams model and LDA are latent variables representing topics. The $\theta$ of LDA holds the weights of the unigram models and is itself modeled with the Dirichlet distribution $P_D(\theta; \alpha)$. The evidence probability for the DM is the Polya mixture distribution described in Section 2.1.

Figure 1 is a graphical representation of the four models, including probabilistic LSA (PLSA). Outer and inner squares represent the $N$ documents ($d$) and the $L$ words ($w$) in each document, respectively. Circles are random variables, and double circles are model parameters. Arrows indicate conditional dependency relationships between variables.

The simplest model is MU. In this model, it is assumed that a document is generated from just one topic: the first chosen topic $z$ is used to generate all words in the document. The PLSA model relaxes this assumption. Under PLSA, each word in a document can be generated from a different topic. As a result, a document is assumed to have multiple topics, a realistic assumption. However, PLSA is not a well-defined generative model for documents, because the model has no natural method to assign probability to a new document. LDA is an extension of PLSA. By assuming a Dirichlet distribution over the weights of the unigram models, LDA can assign probability to new documents. The DM is another extension of MU. For the DM, the MU assumption of one topic per document remains, but the distribution of unigram probabilities is directly modeled by the Dirichlet distribution. The other models, including PLSA and LDA, can reveal multitopic structure, but each of their topic models is quite simple: a unigram model. Though the DM assumes a unitopic structure for each document, its topic models are richer than those of its strong rivals.
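For contrast with the DM evidence (the Polya mixture of equation (3)), the mixture-of-unigrams evidence can be sketched in a few lines; `p_z` and `phi` are illustrative names for the topic prior and the topic unigram models, which are assumed strictly positive (e.g., smoothed):

```python
# Sketch of the mixture-of-unigrams evidence P(y) = sum_z p(z) P_Mul(y | z).
# p_z: (M,) topic prior; phi: (M, V) unigram models, rows sum to 1.
import numpy as np
from scipy.special import gammaln, logsumexp

def log_mu_evidence(y, p_z, phi):
    # log multinomial coefficient n! / prod_v y_v!, shared across topics
    log_coef = gammaln(y.sum() + 1.0) - gammaln(y + 1.0).sum()
    return log_coef + logsumexp(np.log(p_z) + y @ np.log(phi).T)
```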

Figure 1: Graphical model expressions of generative text models

5 Experiments

The performance of the DM was compared with those of the LDA and MU models in test-set perplexity using adaptive ngram language models. All three models were trained on the same training data and evaluated with the same test data. The training data was a set of 98,211 Japanese newspaper articles; the test data was a set of 495 randomly selected articles of more than 40 words. The vocabulary comprised the 20,000 most frequent words in the training data and had a coverage rate of 97.1%.

The variational EM method was used to estimate the LDA parameters (Blei et al. 2003), but the Dirichlet parameter $\alpha$ of LDA was updated with Minka's fixed-point iteration method (Minka 2003) instead of the Newton-Raphson method. For the DM, the estimation method based on the LOO likelihood (equation (7)) was used. Training of both models used the same stopping criterion: training stops when the change in closed perplexity on the training data before and after one global loop of iteration is less than 0.1%. Models were constructed with both LDA and DM having 1, 2, 5, 10, 20, 50, 100, 200, and 500 components.

Figure 2: Comparison of test-set perplexity by document probability

Figure 2 presents the perplexity of each model for different numbers of components. Each perplexity is computed as the inverse of the probability per word, a geometric mean of the document probability, that is, the evidence probability of the data. For the DM, the Polya mixture probability of a document is the document probability. The DM consistently outperforms both of the other models. However, DM performance is best at 20 components, as it saturates with somewhat fewer components. The DM is a generalized version of MU, and the saturation suggests that the DM suffers from the MU overfitting problem.
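A sketch of the perplexity computation used for Figure 2, assuming a list of count vectors and a document log-probability function (names are illustrative):

```python
# Sketch of test-set perplexity: the inverse probability per word, i.e. exp of
# the negative total log document probability divided by the total word count.
import numpy as np

def test_set_perplexity(docs, log_prob):
    """docs: iterable of (V,) count vectors; log_prob: y -> log P(document)."""
    total_log = sum(log_prob(y) for y in docs)   # sum of log document probs
    total_words = sum(y.sum() for y in docs)
    return np.exp(-total_log / total_words)      # geometric-mean inverse
```

For the DM curve of Figure 2, `log_prob` would be the Polya-mixture sketch given after equation (3), evaluated with the trained parameters.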

Figure 3: Comparison of test-set perplexity by history adaptation probability (unigram)

Figure 3 presents the perplexity of the adaptive language models for different numbers of components. Adaptive language models predict the probability of the next word by using a history, as a conventional ngram language model does, but they employ a longer history, such as the entire previously processed section of text, rather than just the two words preceding the target word. In this experiment, the models were adapted to the section of a document from the first word to the current word, and then probabilities were computed for the next 20 words; this operation was repeated every 20 words. Figure 3 also shows that the DM outperforms LDA.

For the next experiment, a trigram language model was developed, dynamically adapted to longer histories through the topic-based models. A unigram rescaling method was used for the adaptation (Gildea and Hofmann 1999). Figure 4 shows the perplexity of the combined adaptive language models. As in the above experiments, the performance of a unigram-rescaled trigram model with the DM is better than that with LDA.

Figure 4: Comparison of test-set perplexity by history adaptation probability (trigram)
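A minimal sketch of unigram rescaling in the spirit of Gildea and Hofmann (1999), not their exact formulation: the static trigram probability of each candidate word is scaled by the ratio of its topic-adapted unigram probability (e.g., equation (9) under the DM) to its static unigram probability, and the result is renormalized over the vocabulary:

```python
# Sketch of unigram rescaling for combining a static trigram model with a
# topic-adapted unigram model. All arguments are (V,) probability vectors
# for one context; p_unigram is assumed strictly positive.
import numpy as np

def unigram_rescale(p_trigram, p_adapted, p_unigram):
    scores = p_trigram * (p_adapted / p_unigram)
    return scores / scores.sum()
```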

Table 1 shows the best perplexities for each model. The values in parentheses are perplexity reduction rates from the baseline models.

Table 1: Minimum perplexity of each aspect model

  model                 history adaptation    history adaptation    document
                        (unigram)             (trigram)             probability
  DM                    (32.0%)               60.74 (19.6%)
  DM ave                (36.1%)               (23.2%)               -
  LDA                   (29.9%)               (11.1%)
  Mixture of Unigrams

LDA is a multitopic text model, which reveals a mixture of topics in a document, while the DM is a unitopic text model, which assumes one topic per document. Generally speaking, there are few documents with just one topic. This raises the question of why the DM is better than LDA in perplexity in these experiments. While there is no clear answer, it is possible that, because the training and test materials for these experiments were newspaper articles, they were somewhat focused on a single topic. The DM may capture the detailed distribution of word probabilities over topics using multiple Dirichlets, whereas LDA captures the distribution only indirectly, as a topic proportion or mixture weight, using a single Dirichlet. The same experiments need to be conducted with web data, which contains more complex topical structures.

6 Conclusions

A finite mixture of Dirichlet distributions was investigated as a new text model. Parameter estimation methods were introduced, based on Minka's fixed-point methods (Minka 2003) and the EM algorithm, using the mixture of Dirichlet-multinomial distributions. Experimental results for applications to statistical language models were demonstrated and their performance compared in perplexity. The DM model achieved the lowest perplexity level despite its unitopic nature.

References

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. 2001. Latent Dirichlet allocation. Neural Information Processing Systems, Vol. 14.

[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, Vol. 3.

[3] Kenneth W. Church and William A. Gale. 1995. Poisson mixtures. Natural Language Engineering, Vol. 1, No. 1.

[4] Kenneth W. Church. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p^2. In Proc. of Coling 2000.

[5] A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, Vol. 39.

[6] T. Hofmann. 1999. Probabilistic latent semantic indexing. Proc. of the 22nd Annual ACM Conference on Research and Development in Information Retrieval, pp. 50-57, Berkeley, California.

[7] K. Sjölander, K. Karplus, M. Brown, R. Hughey, A. Krogh, I. S. Mian, and D. Haussler. 1996. Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Computer Applications in the Biosciences, Vol. 12, No. 4.

[8] D. Gildea and T. Hofmann. 1999. Topic-based language models using EM. Proc. of the 6th European Conference on Speech Communication and Technology (EUROSPEECH 99).

[9] R. M. Iyer and M. Ostendorf. 1999. Modeling long distance dependence in language: topic mixtures versus dynamic cache models. IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 1.

[10] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning, Vol. 39, No. 2/3.

[11] T. Minka. 2003. Estimating a Dirichlet distribution. minka/papers/dirichlet/

[12] Ronald Rosenfeld. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, Vol. 10, No. 3.

A Derivation of the update formula of α for the MLE

Minka's fixed-point iteration method (Minka 2003) and the EM algorithm were used. The following inequalities (Minka 2003) were used to obtain a lower bound of the second term of equation (4):

$$\frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y_i)} \geq \frac{\Gamma(\bar{\alpha}_m)}{\Gamma(\bar{\alpha}_m + y_i)} \exp\{(\bar{\alpha}_m - \alpha_m) b_{im}\},$$

and

$$\frac{\Gamma(\alpha_{mv} + y_{iv})}{\Gamma(\alpha_{mv})} \geq c_{imv} \, \alpha_{mv}^{a_{imv}} \quad (\text{if } y_{iv} \geq 1),$$

where $y_i = \sum_v y_{iv}$, $\alpha_m = \sum_v \alpha_{mv}$, and $\bar{\alpha}_m = \sum_v \bar{\alpha}_{mv}$. The lower bound is:

$$\sum_i \sum_m \bar{P}_{im} \log \frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y_i)} \prod_{v=1}^{V} \frac{\Gamma(y_{iv} + \alpha_{mv})}{\Gamma(\alpha_{mv})} \geq \sum_i \sum_m \bar{P}_{im} \left[ \log \frac{\Gamma(\bar{\alpha}_m) \exp\{(\bar{\alpha}_m - \alpha_m) b_{im}\}}{\Gamma(y_i + \bar{\alpha}_m)} + \sum_v \log c_{imv} \alpha_{mv}^{a_{imv}} \right] \equiv Q'(\alpha),$$

where

$$a_{imv} = \{\Psi(\bar{\alpha}_{mv} + y_{iv}) - \Psi(\bar{\alpha}_{mv})\} \, \bar{\alpha}_{mv}, \qquad b_{im} = \Psi(\bar{\alpha}_m + y_i) - \Psi(\bar{\alpha}_m), \qquad c_{imv} = \frac{\Gamma(\bar{\alpha}_{mv} + y_{iv})}{\Gamma(\bar{\alpha}_{mv})} \, \bar{\alpha}_{mv}^{-a_{imv}}.$$

This lower bound can be maximized with the following update formula for $\alpha_{mv}$. Setting

$$\frac{\partial Q'(\alpha)}{\partial \alpha_{mv}} = -\sum_i \bar{P}_{im} b_{im} + \frac{1}{\alpha_{mv}} \sum_i \bar{P}_{im} a_{imv} = 0$$

gives

$$\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i \bar{P}_{im} \{\Psi(y_{iv} + \bar{\alpha}_{mv}) - \Psi(\bar{\alpha}_{mv})\}}{\sum_i \bar{P}_{im} \{\Psi(y_i + \bar{\alpha}_m) - \Psi(\bar{\alpha}_m)\}}.$$
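For comparison with the LOO sketch given after equation (7), the MLE fixed point of equation (6) can be written directly with the digamma function; shapes are as in the earlier sketches, and `scipy.special.psi` computes $\Psi$:

```python
# Sketch of the MLE fixed-point update of equation (6)/Appendix A.
# Y: (N, V) counts; P: (N, M) responsibilities; alpha: (M, V).
import numpy as np
from scipy.special import psi   # psi is the digamma function

def alpha_update_mle(Y, P, alpha):
    a_m = alpha.sum(axis=1)
    n = Y.sum(axis=1)
    # Numerator terms with y_iv = 0 vanish, since psi(a) - psi(a) = 0.
    num = (P[:, :, None] * (psi(Y[:, None, :] + alpha) - psi(alpha))).sum(axis=0)
    den = (P * (psi(n[:, None] + a_m) - psi(a_m))).sum(axis=0)
    return alpha * num / den[:, None]
```

The LOO variant of equation (7) replaces these digamma differences with simple ratios, which is what makes it the faster of the two updates.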

B Derivation of the update formula of α for the maximum LOO likelihood

Given the datum $y_i^{-v}$, in which one occurrence of word $v$ is left out of a document, the predictive probability $P(v \mid y_i^{-v})$ of the word $v$ is, from equation (9):

$$p(v \mid y_i^{-v}) \approx \sum_m \bar{P}_{im} \frac{\alpha_{mv} + y_{iv} - 1}{\alpha_m + y_i - 1},$$

where the posterior component probability given $y_i^{-v}$ is approximated by $\bar{P}_{im}$. The LOO log-likelihood, $L_{loo}$, is

$$L_{loo}(y \mid \alpha) \approx \sum_i \sum_v y_{iv} \log \sum_m \bar{P}_{im} \frac{\alpha_{mv} + y_{iv} - 1}{\alpha_m + y_i - 1}.$$

Since $\bar{P}_{im}$ is very nearly 1 or 0 in almost all cases, the preceding equation can be transformed to:

$$L_{loo}(y \mid \alpha) \approx \sum_i \sum_v y_{iv} \sum_m \bar{P}_{im} \log \left( \frac{\alpha_{mv} + y_{iv} - 1}{\alpha_m + y_i - 1} \right).$$

Using the preceding equation, the likelihood can be maximized independently for the $\alpha_{mv}$ of the $m$-th Dirichlet component. A lower bound of the LOO log-likelihood $L_{loo}^m$ for the $m$-th component is:

$$L_{loo}^m \geq \sum_i \sum_v y_{iv} \bar{P}_{im} q_{imv} \log \alpha_{mv} - \sum_i y_i \bar{P}_{im} a_{im} \alpha_m + (\text{const.}),$$

where

$$q_{imv} = \frac{\bar{\alpha}_{mv}}{\bar{\alpha}_{mv} + y_{iv} - 1}, \qquad a_{im} = \frac{1}{\bar{\alpha}_m + y_i - 1}.$$

The following inequalities (Minka 2003) were used to obtain the above bound:

$$\log(n + x) \geq q \log x + (1 - q) \log n - q \log q - (1 - q) \log(1 - q),$$

where $q = \hat{x} / (n + \hat{x})$, and

$$\log(x) \leq a x - 1 + \log \hat{x},$$

where $a = 1/\hat{x}$.

The update formula for the fixed-point iteration is:

$$\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i \bar{P}_{im} \{ y_{iv} / (y_{iv} - 1 + \bar{\alpha}_{mv}) \}}{\sum_i \bar{P}_{im} \{ y_i / (y_i - 1 + \bar{\alpha}_m) \}}.$$
