Dirichlet Mixtures in Text Modeling
Mikio Yamamoto and Kugatsu Sadamitsu
CS Technical Report CS-TR-05-1
University of Tsukuba
May 30, 2005

Abstract

Word rates in text vary according to global factors such as genre, topic, author, and expected readership (Church and Gale 1995). Models that summarize such global factors in text at the document level are called text models. A finite mixture of Dirichlet distributions (Dirichlet Mixture, or DM for short) was investigated as a new text model. When the parameters of a multinomial are drawn from a DM, the compound distribution for discrete outcomes is a finite mixture of Dirichlet-multinomials. A Dirichlet-multinomial can be regarded as a multivariate version of the Poisson mixture, a reliable univariate model for global factors (Church and Gale 1995). In the present paper, the DM and its compounds are introduced, with parameter estimation methods derived from Minka's fixed-point methods (Minka 2003) and the EM algorithm. The method can estimate a considerable number of parameters for a large DM, i.e., a few hundred thousand parameters. After discussion of the relationships between the DM and probabilistic latent semantic analysis (PLSA) (Hofmann 1999), the mixture of unigrams (Nigam et al. 2000), and latent Dirichlet allocation (LDA) (Blei et al. 2001, 2003), applications to statistical language modeling are discussed and their performance compared in perplexity. The DM model achieves the lowest perplexity level despite its unitopic nature.

1 Introduction

Word rates in text vary according to global factors such as genre, topic, author, and expected readership. Church and Gale (1995) examined the Brown corpus and showed that the English word "said" occurs with high frequency in the press and fiction genres, but relatively infrequently in the hobby and learned genres. This observation is basic to their model of word rate variation. Rosenfeld (1999) wrote that the occurrence of the word "winter" in a document is proportional to the occurrence of the word "summer" in the same document. This is a basic observation for his trigger models.
Church (2000) states that a document including the word "Noriega" has a high probability of another occurrence of "Noriega" in the same document, an observation
basic to his adaptation model. In the present paper, an attempt is made to model these global factors with a new generative text model (Dirichlet Mixture, or DM for short), and to apply it to improve conventional ngram language models, which capture only local interdependency among words. It is well known that global factors are important to language modeling in three research communities involved in language processing: natural language processing, speech processing, and neural networks. In the natural language processing community, Church and Gale (1995) proposed that word rate distribution can be explained by the Poisson mixture, an infinite Poisson mixture model in which the Poisson parameter varies according to a density function over the factors. For example, they demonstrated that empirical word rate variation closely fits a special case of the Poisson mixture, the negative binomial, in which the density function is a gamma distribution. However, because the Poisson mixture is univariate, it is difficult to manage the rates of every word simultaneously. Researchers in the speech processing community have proposed and tested a great number of multivariate models (cache models, trigger models, and topic-based models) to capture distant word dependency over the past two decades. Iyer and Ostendorf (1999) proposed an m-component mixture of unigram models with a parameter estimation method using the EM algorithm. Each unigram model in the mixture corresponds to a different topic and yields word rates for that topic. Using topic-based finite mixture models, language models can be greatly improved in perplexity, though parameter estimation for such models tends to overfit the training data. Recently, generative text models such as latent Dirichlet allocation (LDA) have attracted attention in the neural network community. Using generative text models, the probability of a whole document, rather than simply of sentences, can be computed. Probability computation in these models takes advantage of prior distributions of word rate variability garnered from large document collections.
Generative models are statistically well defined and robust for parameter estimation and adaptation because they exploit (hierarchical) Bayesian frameworks, which rely heavily on prior and posterior distributions of word rates. In this paper, a new generative text model is investigated, which unifies the following concepts developed by the three communities related to language processing: (1) summarizing word rate variability as a stochastic distribution; (2) finite topic mixture models of multivariate distributions; (3) generative text models based on a (hierarchical) Bayesian framework. Based on (1) and (2), it is assumed that word rate variability can be modeled with a finite mixture of Dirichlet distributions. The finite mixture encapsulates a rough topic structure, and each Dirichlet distribution yields clear word rate variability within each topic for all words simultaneously. From (3), a robust model is built by adopting a Bayesian framework, employing prior and posterior distributions to estimate the DM parameters and to adapt them to the context of documents as they are processed.
In Sections 2 and 3 of the paper, the DM model is described, together with its parameter estimation methods and its posterior and predictive distributions. In Section 4, the relationship between the DM and other text models is discussed. In Section 5, experimental results for applications to statistical language models are presented. The DM model achieves lower perplexity levels than the mixture of unigrams (MU) and LDA models.

2 The DM and parameter estimation

2.1 The DM and the Polya mixture

The Dirichlet distribution is defined for a random vector p = (p_1, p_2, ..., p_V) on a simplex of V dimensions. The elements of a random vector on a simplex sum to 1. We interpret p as the occurrence probabilities of the V words of a vocabulary, so that the Dirichlet distribution models word occurrence probabilities. The density function of the Dirichlet for p is:

P_D(p; \alpha) = \frac{\Gamma(\alpha)}{\prod_{v=1}^{V} \Gamma(\alpha_v)} \prod_{v=1}^{V} p_v^{\alpha_v - 1},   (1)

where \alpha = (\alpha_1, \alpha_2, ..., \alpha_V) is a parameter vector, \alpha_v > 0, and \alpha = \sum_{v=1}^{V} \alpha_v. The Dirichlet mixture distribution (Sjölander et al. 1996) with M components is defined as follows:

P_{DM}(p; \lambda, \alpha_1^M) = \sum_{m=1}^{M} \lambda_m P_D(p; \alpha_m) = \sum_{m=1}^{M} \lambda_m \frac{\Gamma(\alpha_m)}{\prod_{v=1}^{V} \Gamma(\alpha_{mv})} \prod_{v=1}^{V} p_v^{\alpha_{mv} - 1},   (2)

where \lambda = (\lambda_1, \lambda_2, ..., \lambda_M) is a weight vector over the component Dirichlet distributions and \alpha_m = \sum_v \alpha_{mv}. When the random vector p, as the parameter of a multinomial, is drawn from the DM, the compound distribution for discrete outcomes y = (y_1, y_2, ..., y_V) is:

P_{PM}(y; \lambda, \alpha_1^M) = \int P_{Mul}(y|p) P_{DM}(p; \lambda, \alpha_1^M) \, dp
  = \sum_{m=1}^{M} \lambda_m \int P_{Mul}(y|p) P_D(p; \alpha_m) \, dp
  = \sum_{m=1}^{M} \lambda_m \frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y)} \prod_{v=1}^{V} \frac{\Gamma(y_v + \alpha_{mv})}{\Gamma(\alpha_{mv})},   (3)
where y = \sum_v y_v. Each y_v is the occurrence frequency of the v-th word in a document. This distribution is called the Dirichlet-multinomial mixture or the Polya mixture distribution. The Polya mixture is used to estimate the parameters of the DM, \alpha and \lambda.

2.2 Parameter estimation

In this subsection, methods are introduced for estimating the parameters of the DM with a maximum likelihood estimator of the Polya mixture. The estimation methods are based on Minka's estimation methods for a Dirichlet distribution (Minka 2003) and the EM algorithm (Dempster et al. 1977). The i-th datum, the i-th training document, determines the outcomes for words y_i = (y_{i1}, y_{i2}, ..., y_{iV}). For N training documents, the log likelihood function for the training documents D = (y_1, y_2, ..., y_N) is:

L(D; \lambda, \alpha_1^M) = \sum_{i=1}^{N} \log P_{PM}(y_i; \lambda, \alpha_1^M).

The \lambda and \alpha that maximize the above likelihood function are the DM parameters. Assuming Z = (z_1, z_2, ..., z_N), where z_i is a hidden variable denoting the component that generated the i-th document, the log likelihood for the complete data is:

L(D, Z; \lambda, \alpha_1^M) = \sum_{i=1}^{N} \log P(y_i, z_i; \lambda, \alpha_1^M).

The Q-function for the EM algorithm, the conditional expectation of the above log likelihood, is:

Q(\bar{\theta}, \theta) = \sum_i \sum_m P_{im} \log P(y_i, z_i = m; \lambda, \alpha_1^M)
  = \sum_i \sum_m P_{im} \log \lambda_m + \sum_i \sum_m P_{im} \log \frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y_i)} \prod_{v=1}^{V} \frac{\Gamma(y_{iv} + \alpha_{mv})}{\Gamma(\alpha_{mv})},   (4)

where

P_{im} = P(z_i = m | y_i; \bar{\lambda}, \bar{\alpha}_1^M), \quad y_i = \sum_v y_{iv}.
Here \bar{\lambda} and \bar{\alpha}_1^M are the current values of the parameters. The first term of (4) can be maximized via the following update formula for \lambda:

\lambda_m \propto \sum_i P_{im}.   (5)

The second term of (4) can be maximized via the following update formula for \alpha, where \Psi(x) is the digamma function. The update formula is derived from Minka's fixed-point iteration (Minka 2003) and the EM algorithm (see Appendix A):

\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i P_{im} \{\Psi(y_{iv} + \bar{\alpha}_{mv}) - \Psi(\bar{\alpha}_{mv})\}}{\sum_i P_{im} \{\Psi(y_i + \bar{\alpha}_m) - \Psi(\bar{\alpha}_m)\}}.   (6)

If the leave-one-out (LOO) likelihood is designated as the function to be maximized, a faster update formula is obtained. The following update formula is based on Minka's iteration for the LOO likelihood (Minka 2003) and the EM algorithm (see Appendix B):

\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i P_{im} \{y_{iv}/(y_{iv} - 1 + \bar{\alpha}_{mv})\}}{\sum_i P_{im} \{y_i/(y_i - 1 + \bar{\alpha}_m)\}}.   (7)

This LOO update formula is used to estimate the DM parameters in all following experiments.

3 Inference

3.1 Posterior and predictive distributions

The DM model, with parameters estimated using the above methods, is regarded as a prior for the distribution of word occurrence probabilities. In this section, a method is described for computing the posterior and the expectation of word occurrence probability, given a word history or a document. The following formula is the posterior distribution of word occurrence probability given the history data y = (y_1, y_2, ..., y_V), assuming a multinomial distribution for the count data y, with parameter p distributed according to a DM prior:

P(p|y) = \frac{P(y|p) P(p)}{\int P(y|p) P(p) \, dp} = \frac{P_{Mul}(y|p) P_{DM}(p; \lambda, \alpha_1^M)}{\int P_{Mul}(y|p) P_{DM}(p; \lambda, \alpha_1^M) \, dp} = \frac{\sum_{m=1}^{M} B_m \prod_v p_v^{\alpha_{mv} + y_v - 1}}{\sum_{m=1}^{M} C_m},   (8)
where

B_m = \lambda_m \frac{\Gamma(\alpha_m)}{\prod_{v=1}^{V} \Gamma(\alpha_{mv})}, \quad
C_m = B_m \frac{\prod_{v=1}^{V} \Gamma(\alpha_{mv} + y_v)}{\Gamma(\alpha_m + y)}, \quad
\alpha_m = \sum_{v=1}^{V} \alpha_{mv}, \quad
y = \sum_{v=1}^{V} y_v.

The expectation of the occurrence probability of the w-th word in the vocabulary, P(w|y), is:

P(w|y) = \int p_w P(p|y) \, dp
  = \frac{\sum_{m=1}^{M} B_m \int \prod_{v=1}^{V} p_v^{\alpha_{mv} + y_v + \delta(v - w) - 1} \, dp}{\sum_{m=1}^{M} C_m}
  = \frac{\sum_{m=1}^{M} B_m \prod_{v=1}^{V} \Gamma\{\alpha_{mv} + y_v + \delta(v - w)\} / \Gamma(\alpha_m + y + 1)}{\sum_{m=1}^{M} C_m}
  = \frac{\sum_{m=1}^{M} C_m \frac{\alpha_{mw} + y_w}{\alpha_m + y}}{\sum_{m=1}^{M} C_m},   (9)

where \delta(k) is Kronecker's delta:

\delta(k) = 1 if k = 0, and 0 otherwise.

In contrast to LDA, the DM has a closed-form formula for computing the expectation of word occurrence probability.

3.2 Model averaging

In the experiments section (Section 5), it is demonstrated that a statistical language model using the DM outperforms the other models with fewer components, but that performance does not rise in proportion to the number of components. This problem reflects the overfitting nature of DM models. To avoid the overfitting problem, a simple model averaging method is adopted, which computes a predictive distribution as the mean of the predictions of DMs with different numbers of components. It is assumed that there are N different DM models, and that P_i(w|y), i = 1, 2, ..., N, is the predictive probability of the word w under the i-th model. The following averaging
equation is referred to as method 1, in which the evidence probability of the history is regarded as a credibility weight for each model:

P_{ma1}(w|y) = \sum_i \frac{P_{PM}^i(y; \lambda, \alpha)}{\sum_j P_{PM}^j(y; \lambda, \alpha)} P_i(w|y).   (10)

Method 2 is simpler, averaging the predictive probabilities with a plain arithmetic mean:

P_{ma2}(w|y) = \frac{1}{N} \sum_i P_i(w|y).   (11)

4 Relationship with other topic-based models

The relationship of the DM with other topic-based models is demonstrated using graphical representations of the models. The following are the evidence probabilities for data y = (y_1, y_2, ..., y_V) under each topic-based model.

Mixture of unigrams: P(y) = \sum_z p(z) P_{Mul}(y|z)

LDA: P(y) = \int P_D(\theta|\alpha) \prod_v P(w_v|\theta)^{y_v} \, d\theta, where P(w_v|\theta) = \sum_z p(z|\theta) p(w_v|z)

DM: P(y) = P_{PM}(y)

The z's of the mixture of unigrams model and of LDA are latent variables representing topics. The \theta of LDA holds the weights of the unigram models and is itself modeled with the Dirichlet distribution P_D(\theta; \alpha). The evidence probability for the DM is the Polya distribution described in Section 2.1. Figure 1 is a graphical representation of four models, including probabilistic LSA (PLSA). The outer and inner squares represent the N documents (d) and the L words (w) in each document, respectively. Circles are random variables and double circles are model parameters. Arrows indicate conditional dependence relationships between variables. The simplest model is MU. In this model, it is assumed that a document is generated from just one topic: the first chosen topic z is used to generate all words in the document. The PLSA model relaxes this assumption. Under PLSA, each word in a document may be generated by a different topic. As a result, a document is assumed to have multiple topics, a realistic assumption. However, PLSA is not a well-defined generative model for documents, because the model has no natural method of assigning probability to a new document. LDA is an extension of PLSA. Assuming a Dirichlet distribution over the weights of the unigram models, LDA can assign probability to new documents. The DM is another MU extension.
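To make the DM side of this comparison concrete, the following sketch (plain Python with only `math.lgamma`; the function names are mine, not the paper's) computes the DM evidence of Eq. (3) and the closed-form predictive probability of Eq. (9). As in Eq. (3), the multinomial coefficient, which is constant in y, is omitted.

```python
import math

def log_polya_component(y, alpha):
    """Log of one Dirichlet-multinomial term of Eq. (3):
    Gamma(a)/Gamma(a+n) * prod_v Gamma(y_v + a_v)/Gamma(a_v)."""
    n, a = sum(y), sum(alpha)
    out = math.lgamma(a) - math.lgamma(a + n)
    for yv, av in zip(y, alpha):
        out += math.lgamma(yv + av) - math.lgamma(av)
    return out

def log_evidence(y, lam, alphas):
    """Log of the DM evidence P_PM(y), Eq. (3), via log-sum-exp for stability."""
    comps = [math.log(l) + log_polya_component(y, a) for l, a in zip(lam, alphas)]
    mx = max(comps)
    return mx + math.log(sum(math.exp(c - mx) for c in comps))

def predictive(y, lam, alphas):
    """P(w|y) of Eq. (9): posterior-weighted mean of (alpha_mw + y_w)/(alpha_m + n).
    The weights are the normalized C_m, i.e. lambda_m times the m-th Polya term."""
    logc = [math.log(l) + log_polya_component(y, a) for l, a in zip(lam, alphas)]
    mx = max(logc)
    c = [math.exp(v - mx) for v in logc]
    z = sum(c)
    n = sum(y)
    return [sum(cm * (a[w] + y[w]) / (sum(a) + n)
                for cm, a in zip(c, alphas)) / z
            for w in range(len(y))]
```

With a single component and a uniform Dirichlet, Eq. (9) reduces to Laplace smoothing, which gives a quick sanity check on the code.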
For the DM, the MU assumption of one topic per document remains, but the distribution of unigram probabilities is directly modeled by the Dirichlet distribution. Other models, including PLSA and LDA, can capture multitopic structure, but each of their topic models is quite simple, a unigram model. Though the DM assumes a unitopic structure for each document, its topic models are richer than those of its strong rivals.
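Before turning to the experiments, the estimation procedure of Section 2.2 can be sketched in code. This is a minimal illustration under my own naming, not the authors' implementation: one E-step computing the responsibilities P_im from the current Polya components, followed by the M-step updates of Eq. (5) for lambda and the LOO fixed-point update of Eq. (7) for each alpha_m.

```python
import math

def log_polya(y, alpha):
    """Log Dirichlet-multinomial term of Eq. (3) (multinomial coefficient omitted)."""
    n, a = sum(y), sum(alpha)
    out = math.lgamma(a) - math.lgamma(a + n)
    for yv, av in zip(y, alpha):
        out += math.lgamma(yv + av) - math.lgamma(av)
    return out

def em_step(docs, lam, alphas):
    """One EM iteration: E-step responsibilities, then Eq. (5) and Eq. (7)."""
    M, V = len(lam), len(docs[0])
    # E-step: P_im proportional to lambda_m * Polya_m(y_i), normalized over m
    resp = []
    for y in docs:
        lp = [math.log(lam[m]) + log_polya(y, alphas[m]) for m in range(M)]
        mx = max(lp)
        w = [math.exp(v - mx) for v in lp]
        s = sum(w)
        resp.append([x / s for x in w])
    # M-step for lambda, Eq. (5): lambda_m proportional to sum_i P_im
    new_lam = [sum(resp[i][m] for i in range(len(docs))) / len(docs) for m in range(M)]
    # M-step for alpha, Eq. (7): one LOO fixed-point update per component
    new_alphas = []
    for m in range(M):
        abar = alphas[m]
        den = sum(resp[i][m] * sum(y) / (sum(y) - 1.0 + sum(abar))
                  for i, y in enumerate(docs))
        comp = []
        for v in range(V):
            num = sum(resp[i][m] * y[v] / (y[v] - 1.0 + abar[v])
                      for i, y in enumerate(docs) if y[v] > 0)
            comp.append(abar[v] * num / den)
        new_alphas.append(comp)
    return new_lam, new_alphas
```

In practice this step would be repeated until the training-data perplexity stops improving, which is the stopping criterion used in Section 5.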
Figure 1: Graphical model representations of the generative text models

5 Experiments

The performance of the DM was compared with those of LDA and MU in test-set perplexity, using adaptive ngram language models. All three models were trained on the same training data and evaluated with the same test data. The training data was a set of 98,211 Japanese newspaper articles. The test data was a set of 495 randomly selected articles of more than 40 words each. The vocabulary comprised the 20,000 most frequent words in the training data and had a coverage rate of 97.1%. The variational EM method was used to estimate the LDA parameters (Blei et al. 2003), but the Dirichlet parameter \alpha of LDA was updated with Minka's fixed-point iteration method (Minka 2003) instead of the Newton-Raphson method. For the DM, the estimation method based on the LOO likelihood, Eq. (7), was used. Training of both models used the same stopping criterion: the change of closed perplexity on the training data before and after one global loop of iteration must be less than 0.1%. Models were constructed with both LDA and DM having 1, 2, 5, 10, 20, 50, 100, 200, and 500 components. Figure 2 presents the perplexity of each model for its different numbers of components. Each perplexity is computed as the inverse of the probability per word, the geometric mean of the document probability, that is, of the evidence probability for the data. For the DM, the Polya mixture probability of a document is the document probability. The DM consistently outperforms both other models. However, DM performance is best at 20 components, as it saturates with somewhat fewer components. The DM is a generalized version of MU, and the saturation suggests that the DM suffers from MU's overfitting problem. Figure 3 presents the perplexity of the adaptive language models for different numbers of components. Adaptive language models predict the probability of the next word by using a history, as a conventional ngram language model does.
They employ a longer history, such as the entire previously processed section, rather than just the two words preceding the target
word. In this experiment, the models were adapted to the section of a document from the first word to the current word, and then probabilities were computed for the next 20 words. This operation was repeated every 20 words.

Figure 2: Comparison of test-set perplexity by document probability

Figure 3 also shows that the DM outperforms LDA. For the next experiment, a trigram language model was developed, dynamically adapted to longer histories via the topic-based models. A unigram rescaling method was used for the adaptation (Gildea and Hofmann 1999). Figure 4 shows the perplexity of the combined adaptive language models. As in the above experiments, the performance of the unigram-rescaling trigram model with the DM is better than that with LDA. Table 1 shows the best perplexities for each model. The values in parentheses are the perplexity reduction rates from the baseline models. LDA is a multitopic text model, which captures a mixture of topics in a document, while the DM is a unitopic text model, which assumes one topic per document. Generally speaking, there are few documents with just one topic. This raises the question as to why the DM is better than LDA in perplexity in these experiments. While there is no clear answer, it is possible that the training and test materials for these experiments, being newspaper articles, were somewhat focused on a single topic. The DM may capture the detailed distribution of word probabilities over topics using multiple Dirichlets, whereas LDA captures the distribution only indirectly, as a topic proportion or mixture weight, using a single Dirichlet. The same experiments need to be conducted with web data, which contains more complex topical structures.

6 Conclusions

A finite mixture of Dirichlet distributions was investigated as a new text model. Parameter estimation methods were introduced, based on Minka's fixed-point methods (Minka 2003)
and the EM algorithm, using the mixture of Dirichlet-multinomial distributions. Experimental results for applications to statistical language models were demonstrated and their performance compared in perplexity. The DM model achieved the lowest perplexity level despite its unitopic nature.

Figure 3: Comparison of test-set perplexity by history adaptation probability (unigram)

Figure 4: Comparison of test-set perplexity by history adaptation probability (trigram)

References

[1] D.M. Blei, A.Y. Ng, and M.I. Jordan. 2001. Latent Dirichlet Allocation. Neural Information Processing Systems, vol. 14.

[2] D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, vol. 3.
Table 1: Minimum perplexity of each aspect model

                      history adaptation   history adaptation   document
                      (unigram)            (trigram)            probability
DM                    (32.0%)              60.74 (19.6%)
DM ave.               (36.1%)              (23.2%)              -
LDA                   (29.9%)              (11.1%)
Mixture of Unigrams

[3] Kenneth W. Church and William A. Gale. 1995. Poisson mixtures. Natural Language Engineering, vol. 1, no. 1.

[4] Kenneth W. Church. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p^2. In Proc. of Coling 2000.

[5] A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, vol. 39.

[6] T. Hofmann. 1999. Probabilistic latent semantic indexing. In Proc. of the 22nd Annual ACM Conference on Research and Development in Information Retrieval, pp. 50-57, Berkeley, California.

[7] K. Sjölander, K. Karplus, M. Brown, R. Hughey, A. Krogh, I.S. Mian, and D. Haussler. 1996. Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology. Computer Applications in the Biosciences, vol. 12, no. 4.

[8] D. Gildea and T. Hofmann. 1999. Topic-based language models using EM. In Proc. of the 6th European Conference on Speech Communication and Technology (EUROSPEECH '99).

[9] R.M. Iyer and M. Ostendorf. 1999. Modeling long distance dependence in language: topic mixtures versus dynamic cache models. IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1.

[10] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning, vol. 39, no. 2/3.

[11] T. Minka. 2003. Estimating a Dirichlet distribution. minka/papers/dirichlet/

[12] Ronald Rosenfeld. 1996. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, vol. 10, no. 3.
A Derivation of the update formula for \alpha for the MLE

Minka's fixed-point iteration method (Minka 2003) and the EM algorithm were used. The following inequalities (Minka 2003) give a lower bound on the second term of equation (4):

\frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y_i)} \ge \frac{\Gamma(\bar{\alpha}_m)}{\Gamma(\bar{\alpha}_m + y_i)} \exp\{(\bar{\alpha}_m - \alpha_m) b_{im}\},

and

\frac{\Gamma(\alpha_{mv} + y_{iv})}{\Gamma(\alpha_{mv})} \ge c_{imv} \alpha_{mv}^{a_{imv}} \quad (\text{if } y_{iv} \ge 1),

where y_i = \sum_v y_{iv}, \alpha_m = \sum_v \alpha_{mv}, and \bar{\alpha}_m = \sum_v \bar{\alpha}_{mv}. The lower bound is:

\sum_i \sum_m P_{im} \log \frac{\Gamma(\alpha_m)}{\Gamma(\alpha_m + y_i)} \prod_{v=1}^{V} \frac{\Gamma(y_{iv} + \alpha_{mv})}{\Gamma(\alpha_{mv})}
\ge \sum_i \sum_m P_{im} \left[ \log \frac{\Gamma(\bar{\alpha}_m)}{\Gamma(y_i + \bar{\alpha}_m)} + (\bar{\alpha}_m - \alpha_m) b_{im} + \sum_v \log c_{imv} \alpha_{mv}^{a_{imv}} \right] \equiv Q'(\alpha),

where

a_{imv} = \{\Psi(\bar{\alpha}_{mv} + y_{iv}) - \Psi(\bar{\alpha}_{mv})\} \bar{\alpha}_{mv},
b_{im} = \Psi(\bar{\alpha}_m + y_i) - \Psi(\bar{\alpha}_m),
c_{imv} = \frac{\Gamma(\bar{\alpha}_{mv} + y_{iv})}{\Gamma(\bar{\alpha}_{mv})} \bar{\alpha}_{mv}^{-a_{imv}}.

This lower bound can be maximized with the following update formula for \alpha_{mv}. Setting

\frac{\partial Q'(\alpha)}{\partial \alpha_{mv}} = -\sum_i P_{im} b_{im} + \frac{1}{\alpha_{mv}} \sum_i P_{im} a_{imv} = 0

gives

\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i P_{im} \{\Psi(y_{iv} + \bar{\alpha}_{mv}) - \Psi(\bar{\alpha}_{mv})\}}{\sum_i P_{im} \{\Psi(y_i + \bar{\alpha}_m) - \Psi(\bar{\alpha}_m)\}}.
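The first bound above can be checked numerically. The helper names below are mine; since document counts are integers, the digamma difference \Psi(x + n) - \Psi(x) reduces to a harmonic sum, so no special-function library is needed.

```python
import math

def digamma_diff(x, n):
    """Psi(x + n) - Psi(x) for integer n >= 0, via Psi(x + 1) = Psi(x) + 1/x."""
    return sum(1.0 / (x + k) for k in range(n))

def log_lhs(alpha, n):
    """log of Gamma(alpha) / Gamma(alpha + n)."""
    return math.lgamma(alpha) - math.lgamma(alpha + n)

def log_bound(alpha, alpha_bar, n):
    """log of Gamma(abar)/Gamma(abar + n) * exp{(abar - alpha) * b},
    with b = Psi(abar + n) - Psi(abar); a lower bound on log_lhs(alpha, n)."""
    b = digamma_diff(alpha_bar, n)
    return log_lhs(alpha_bar, n) + (alpha_bar - alpha) * b
```

The bound is tight at \alpha = \bar{\alpha} and lies below the true value elsewhere, which is what makes the fixed point of Eq. (6) an ascent step on the likelihood.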
B Derivation of the update formula for \alpha for the maximum LOO likelihood

Given the datum y_i^{\setminus v}, in which one occurrence of the word v is left out of the i-th document, the predictive probability P(v | y_i^{\setminus v}) of the word v is, following equation (9):

p(v | y_i^{\setminus v}) \approx \sum_m P_{im} \frac{\alpha_{mv} + y_{iv} - 1}{\alpha_m + y_i - 1},

where P_{im}^{\setminus v} \approx P_{im}. The LOO log-likelihood L_{loo} is:

L_{loo}(y|\alpha) = \sum_i \sum_v y_{iv} \log \sum_m P_{im} \frac{\alpha_{mv} + y_{iv} - 1}{\alpha_m + y_i - 1}.

Since P_{im} is very near to 1 or 0 in almost all cases, the preceding equation can be transformed:

L_{loo}(y|\alpha) \approx \sum_i \sum_v y_{iv} \sum_m P_{im} \log \frac{\alpha_{mv} + y_{iv} - 1}{\alpha_m + y_i - 1}.

Using the preceding equation, the likelihood can be maximized independently for the \alpha_{mv} of the m-th Dirichlet component. The lower bound of the LOO log-likelihood L_{loo}^m for the m-th component is:

L_{loo}^m \ge \sum_i \sum_v y_{iv} P_{im} q_{imv} \log \alpha_{mv} - \sum_i y_i P_{im} a_{im} \alpha_m + (\text{const.}),

where

q_{imv} = \frac{\bar{\alpha}_{mv}}{\bar{\alpha}_{mv} + y_{iv} - 1}, \quad a_{im} = \frac{1}{\bar{\alpha}_m + y_i - 1}.

The following inequalities (Minka 2003) were used to obtain the above bound:

\log(n + x) \ge q \log x + (1 - q) \log n - q \log q - (1 - q) \log(1 - q),

where q = \hat{x}/(n + \hat{x}), and

\log x \le a x - 1 + \log \hat{x},
where a = 1/\hat{x}. The update formula for the fixed-point iteration is:

\alpha_{mv} = \bar{\alpha}_{mv} \frac{\sum_i P_{im} \{y_{iv}/(y_{iv} - 1 + \bar{\alpha}_{mv})\}}{\sum_i P_{im} \{y_i/(y_i - 1 + \bar{\alpha}_m)\}}.
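The two inequalities used above can also be checked numerically (the helper names are mine). The first lower-bounds log(n + x) in x and is tight at x = x̂; the second is the tangent-line upper bound on the concave log, also tight at x̂.

```python
import math

def log_n_plus_x_bound(x, x_hat, n):
    """Lower bound on log(n + x): q*log(x) + (1-q)*log(n) - q*log(q) - (1-q)*log(1-q),
    with q = x_hat / (n + x_hat); equality holds at x = x_hat."""
    q = x_hat / (n + x_hat)
    return (q * math.log(x) + (1.0 - q) * math.log(n)
            - q * math.log(q) - (1.0 - q) * math.log(1.0 - q))

def log_tangent_bound(x, x_hat):
    """Upper bound on log(x) from the tangent at x_hat: x/x_hat - 1 + log(x_hat)."""
    return x / x_hat - 1.0 + math.log(x_hat)
```

Together, these turn the LOO objective into a sum of terms linear in \alpha_m and logarithmic in \alpha_{mv}, which is what permits the closed-form fixed-point step of Eq. (7).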
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationBasically, if you have a dummy dependent variable you will be estimating a probability.
ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy
More informationAppendix B: Resampling Algorithms
407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles
More informationCourse 395: Machine Learning - Lectures
Course 395: Machne Learnng - Lectures Lecture 1-2: Concept Learnng (M. Pantc Lecture 3-4: Decson Trees & CC Intro (M. Pantc Lecture 5-6: Artfcal Neural Networks (S.Zaferou Lecture 7-8: Instance ased Learnng
More informationLinear Regression Analysis: Terminology and Notation
ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationChapter Newton s Method
Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve
More informationAn Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation
An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More information4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA
4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationTracking with Kalman Filter
Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,
More informationComputing MLE Bias Empirically
Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.
More informationGaussian process classification: a message-passing viewpoint
Gaussan process classfcaton: a message-passng vewpont Flpe Rodrgues fmpr@de.uc.pt November 014 Abstract The goal of ths short paper s to provde a message-passng vewpont of the Expectaton Propagaton EP
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationSection 8.3 Polar Form of Complex Numbers
80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationMixture o f of Gaussian Gaussian clustering Nov
Mture of Gaussan clusterng Nov 11 2009 Soft vs hard lusterng Kmeans performs Hard clusterng: Data pont s determnstcally assgned to one and only one cluster But n realty clusters may overlap Soft-clusterng:
More informationFirst Year Examination Department of Statistics, University of Florida
Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationUncertainty as the Overlap of Alternate Conditional Distributions
Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationLecture 6: Introduction to Linear Regression
Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria
ECOOMETRICS II ECO 40S Unversty of Toronto Department of Economcs Wnter 07 Instructor: Vctor Agurregabra SOLUTIO TO FIAL EXAM Tuesday, Aprl 8, 07 From :00pm-5:00pm 3 hours ISTRUCTIOS: - Ths s a closed-book
More informationDiscussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek
Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson
More informationJoint Statistical Meetings - Biopharmaceutical Section
Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationStatistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals
Internatonal Journal of Scentfc World, 2 1) 2014) 1-9 c Scence Publshng Corporaton www.scencepubco.com/ndex.php/ijsw do: 10.14419/jsw.v21.1780 Research Paper Statstcal nference for generalzed Pareto dstrbuton
More informationXII.3 The EM (Expectation-Maximization) Algorithm
XII.3 The EM (Expectaton-Maxzaton) Algorth Toshnor Munaata 3/7/06 The EM algorth s a technque to deal wth varous types of ncoplete data or hdden varables. It can be appled to a wde range of learnng probles
More informationStatistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation
Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 13 The Smple Lnear Regresson Model and Correlaton 1999 Prentce-Hall, Inc. Chap. 13-1 Chapter Topcs Types of Regresson Models Determnng the Smple Lnear
More informationDepartment of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6
Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.
More informationI529: Machine Learning in Bioinformatics (Spring 2017) Markov Models
I529: Machne Learnng n Bonformatcs (Sprng 217) Markov Models Yuzhen Ye School of Informatcs and Computng Indana Unversty, Bloomngton Sprng 217 Outlne Smple model (frequency & profle) revew Markov chan
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experment-I MODULE VIII LECTURE - 34 ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS EFFECTS MODEL Dr Shalabh Department of Mathematcs and Statstcs Indan
More informationA be a probability space. A random vector
Statstcs 1: Probablty Theory II 8 1 JOINT AND MARGINAL DISTRIBUTIONS In Probablty Theory I we formulate the concept of a (real) random varable and descrbe the probablstc behavor of ths random varable by
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationPsychology 282 Lecture #24 Outline Regression Diagnostics: Outliers
Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.
More informationLecture Nov
Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances
More informationHow its computed. y outcome data λ parameters hyperparameters. where P denotes the Laplace approximation. k i k k. Andrew B Lawson 2013
Andrew Lawson MUSC INLA INLA s a relatvely new tool that can be used to approxmate posteror dstrbutons n Bayesan models INLA stands for ntegrated Nested Laplace Approxmaton The approxmaton has been known
More informationEPR Paradox and the Physical Meaning of an Experiment in Quantum Mechanics. Vesselin C. Noninski
EPR Paradox and the Physcal Meanng of an Experment n Quantum Mechancs Vesseln C Nonnsk vesselnnonnsk@verzonnet Abstract It s shown that there s one purely determnstc outcome when measurement s made on
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationNatural Images, Gaussian Mixtures and Dead Leaves Supplementary Material
Natural Images, Gaussan Mxtures and Dead Leaves Supplementary Materal Danel Zoran Interdscplnary Center for Neural Computaton Hebrew Unversty of Jerusalem Israel http://www.cs.huj.ac.l/ danez Yar Wess
More informationPower law and dimension of the maximum value for belief distribution with the max Deng entropy
Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng
More information