Topic Modeling: Beyond Bag-of-Words

Hanna M. Wallach
Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK

Abstract

Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the bag-of-words assumption, in which word order is ignored. Previously, these methods have not been combined. In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. The model hyperparameters are inferred using a Gibbs EM algorithm. On two data sets, each of 150 documents, the new model exhibits better predictive accuracy than either a hierarchical Dirichlet bigram language model or a unigram topic model. Additionally, the inferred topics are less dominated by function words than are topics discovered using unigram statistics, potentially making them more meaningful.

Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s).

1. Introduction

Recently, much attention has been given to generative probabilistic models of textual corpora, designed to identify representations of the data that reduce description length and reveal inter- or intra-document statistical structure. Such models typically fall into one of two categories: those that generate each word on the basis of some number of preceding words or word classes, and those that generate words based on latent topic variables inferred from word correlations independent of the order in which the words appear.

n-gram language models make predictions using observed marginal and conditional word frequencies. While such models may use conditioning contexts of arbitrary length, this paper deals only with bigram models, i.e., models that predict each word based on the immediately preceding word. To develop a bigram language model, marginal and conditional word counts are determined from a corpus w. The marginal count N_i is defined as the number of times that word i has occurred in the corpus, while the conditional count N_{i|j} is the number of times word i immediately follows word j. Given these counts, the aim of bigram language modeling is to develop predictions of word w_t given word w_{t-1}, in any document. Typically this is done by computing estimators of both the marginal probability of word i and the conditional probability of word i following word j, such as f_i = N_i / N and f_{i|j} = N_{i|j} / N_j, where N is the number of words in the corpus. If there were sufficient data available, the observed conditional frequency f_{i|j} could be used as an estimator for the predictive probability of i given j. In practice, this does not provide a good estimate: only a small fraction of possible (i, j) word pairs will have been observed in the corpus. Consequently, the conditional frequency estimator has too large a variance to be used by itself.

To alleviate this problem, the bigram estimator f_{i|j} is smoothed by the marginal frequency estimator f_i to give the predictive probability of word i given word j:

P(w_t = i | w_{t-1} = j) = λ f_i + (1 − λ) f_{i|j}.   (1)

The parameter λ may be fixed, or determined from the data using techniques such as cross-validation (Jelinek & Mercer, 1980). This procedure works well in practice, despite its somewhat ad hoc nature.

The hierarchical Dirichlet language model (MacKay & Peto, 1995) is a bigram model that is entirely driven by principles of Bayesian inference. This model has a similar predictive distribution to models based on equation (1), with one key difference: the bigram statistics f_{i|j} in MacKay and Peto's model are not smoothed with marginal statistics f_i, but are smoothed with a quantity related to the number of different contexts in which each word has occurred.
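To make equation (1) concrete, the sketch below computes the interpolated bigram estimate from raw counts. It is a minimal illustration rather than code from the paper; the toy corpus, the fixed value of λ, and the function names are my own assumptions.

```python
from collections import Counter

def bigram_interpolated(corpus_tokens, lam=0.5):
    """Return P(w_t = i | w_{t-1} = j) = lam * f_i + (1 - lam) * f_{i|j},
    estimated from marginal counts N_i and bigram counts N_{i|j} (equation 1)."""
    N_i = Counter(corpus_tokens)                                 # marginal counts N_i
    N_ij = Counter(zip(corpus_tokens[:-1], corpus_tokens[1:]))   # bigram counts, keyed (j, i)
    N = len(corpus_tokens)

    def prob(i, j):
        f_i = N_i[i] / N                                         # marginal frequency f_i
        f_ij = N_ij[(j, i)] / N_i[j] if N_i[j] else 0.0          # conditional frequency f_{i|j}
        return lam * f_i + (1 - lam) * f_ij

    return prob

# Toy usage: smoothing keeps unseen bigrams from receiving zero probability.
tokens = "the department chair couches offers the chair department offers couches".split()
p = bigram_interpolated(tokens, lam=0.3)
print(p("chair", "department"), p("offers", "department"))
```

Note that the marginal count of j is used as the context total N_j here, which differs from the true preceding-word count by at most one; for a sketch this distinction is ignored.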

Latent Dirichlet allocation (Blei et al., 2003) provides an alternative approach to modeling textual corpora. Documents are modeled as finite mixtures over an underlying set of latent topics inferred from correlations between words, independent of word order. This bag-of-words assumption makes sense from a point of view of computational efficiency, but is unrealistic. In many language modeling applications, such as text compression, speech recognition, and predictive text entry, word order is extremely important. Furthermore, it is likely that word order can assist in topic inference. The phrases "the department chair couches offers" and "the chair department offers couches" have the same unigram statistics, but are about quite different topics. When deciding which topic generated the word "chair" in the first sentence, knowing that it was immediately preceded by the word "department" makes it much more likely to have been generated by a topic that assigns high probability to words related to university administration.

In practice, the topics inferred using latent Dirichlet allocation are heavily dominated by function words, such as "in", "that", "of" and "for", unless these words are removed from corpora prior to topic inference. While removing these may be appropriate for tasks where word order does not play a significant role, such as information retrieval, it is not appropriate for many language modeling applications, where both function and content words must be accurately predicted.

In this paper, I present a hierarchical Bayesian model that integrates bigram-based and topic-based approaches to document modeling. This model moves beyond the bag-of-words assumption found in latent Dirichlet allocation by introducing properties of MacKay and Peto's hierarchical Dirichlet language model. In addition to exhibiting better predictive performance than either MacKay and Peto's language model or latent Dirichlet allocation, the topics inferred using the new model are typically less dominated by function words than are topics inferred from the same corpora using latent Dirichlet allocation.

2. Background

I begin with brief descriptions of MacKay and Peto's hierarchical Dirichlet language model and Blei et al.'s latent Dirichlet allocation.

2.1. Hierarchical Dirichlet Language Model

Bigram language models are specified by a conditional distribution P(w_t = i | w_{t-1} = j), described by W(W − 1) free parameters, where W is the number of words in the vocabulary. These parameters are denoted by the matrix Φ, with P(w_t = i | w_{t-1} = j) ≡ φ_{i|j}. Φ may be thought of as a transition probability matrix, in which the jth row, the probability vector for transitions from word j, is denoted by the vector φ_j. Given a corpus w, the likelihood function is

P(w | Φ) = ∏_{i,j} φ_{i|j}^{N_{i|j}},   (2)

where N_{i|j} is the number of times that word i immediately follows word j in the corpus. MacKay and Peto (1995) extend this basic framework by placing a Dirichlet prior over Φ:

P(Φ | βm) = ∏_j Dirichlet(φ_j | βm),   (3)

where β > 0 and m is a measure satisfying ∑_i m_i = 1. Combining equations (2) and (3), and integrating over Φ, yields the probability of the corpus given the hyperparameters βm, also known as the "evidence":

P(w | βm) = ∏_j [ ∏_i Γ(N_{i|j} + βm_i) / Γ(N_j + β) ] [ Γ(β) / ∏_i Γ(βm_i) ].   (4)

It is also easy to obtain a predictive distribution for each context j given the hyperparameters βm:

P(i | j, w, βm) = (N_{i|j} + βm_i) / (N_j + β).   (5)

To make the relationship to equation (1) explicit, P(i | j, w, βm) may be rewritten as

P(i | j, w, βm) = λ_j m_i + (1 − λ_j) f_{i|j},   (6)

where

f_{i|j} = N_{i|j} / N_j  and  λ_j = β / (N_j + β).   (7)

The hyperparameter m_i is now taking the role of the marginal statistic f_i in equation (1). Ideally, the measure βm should be given a proper prior and marginalized over when making predictions, yielding the true predictive distribution:

P(i | j, w) = ∫ P(βm | w) P(i | j, w, βm) d(βm).   (8)

However, if P(βm | w) is sharply peaked in βm, so that it is effectively a delta function, then the true predictive distribution may be approximated by P(i | j, w, [βm]^MP), where [βm]^MP is the maximum of P(βm | w).
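As an illustration of equations (4) and (5), the sketch below evaluates the log evidence and the predictive probabilities from a matrix of bigram counts using log-gamma functions. It is a hedged example of the computation described above, not the authors' code; the array layout, the toy counts, and the function names are assumptions.

```python
import numpy as np
from scipy.special import gammaln

def log_evidence(N_ij, beta, m):
    """Log of equation (4): P(w | beta*m) for bigram counts N_ij[j, i]."""
    N_j = N_ij.sum(axis=1)                               # context totals N_j
    per_context = (gammaln(N_ij + beta * m).sum(axis=1)  # sum_i log Gamma(N_{i|j} + beta*m_i)
                   - gammaln(N_j + beta)
                   + gammaln(beta)
                   - gammaln(beta * m).sum())
    return per_context.sum()

def predictive(N_ij, beta, m):
    """Equation (5): P(i | j, w, beta*m) for every context j (rows) and word i (columns)."""
    N_j = N_ij.sum(axis=1, keepdims=True)
    return (N_ij + beta * m) / (N_j + beta)

# Toy usage with a W = 3 word vocabulary: rows are previous words j, columns are words i.
W = 3
N_ij = np.array([[2, 1, 0], [0, 3, 1], [1, 0, 0]], dtype=float)
m = np.full(W, 1.0 / W)                                  # uniform measure m
print(log_evidence(N_ij, beta=2.0, m=m))
print(predictive(N_ij, beta=2.0, m=m).round(3))
```

Maximizing the log evidence with respect to βm, as described next, is what ties the smoothing measure m to the number of contexts in which each word appears.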

Additionally, the prior over βm may be assumed to be uninformative, yielding a minimal data-driven Bayesian model in which the optimal βm may be determined from the data by maximizing the evidence. MacKay and Peto show that each element of the optimal m, when estimated using this empirical Bayes procedure, is related to the number of contexts in which the corresponding word has appeared.

2.2. Latent Dirichlet Allocation

Latent Dirichlet allocation (Blei et al., 2003) represents documents as random mixtures over latent topics, where each topic is characterized by a distribution over words. Each word w_t in a corpus w is assumed to have been generated by a latent topic z_t, drawn from a document-specific distribution over T topics. Word generation is defined by a conditional distribution P(w_t = i | z_t = k), described by T(W − 1) free parameters, where T is the number of topics and W is the size of the vocabulary. These parameters are denoted by Φ, with P(w_t = i | z_t = k) ≡ φ_{i|k}. Φ may be thought of as an emission probability matrix, in which the kth row, the distribution over words for topic k, is denoted by φ_k. Similarly, topic generation is characterized by a conditional distribution P(z_t = k | d_t = d), described by D(T − 1) free parameters, where D is the number of documents in the corpus. These parameters form a matrix Θ, with P(z_t = k | d_t = d) ≡ θ_{k|d}. The dth row of this matrix is the distribution over topics for document d, denoted by θ_d.

The joint probability of a corpus w and a set of corresponding latent topics z is

P(w, z | Φ, Θ) = ∏_{i,k} φ_{i|k}^{N_{i|k}} ∏_{d,k} θ_{k|d}^{N_{k|d}},   (9)

where N_{i|k} is the number of times that word i has been generated by topic k, and N_{k|d} is the number of times topic k has been used in document d. Blei et al. place a Dirichlet prior over Φ,

P(Φ | βm) = ∏_k Dirichlet(φ_k | βm),   (10)

and another over Θ,

P(Θ | αn) = ∏_d Dirichlet(θ_d | αn).   (11)

Combining these priors with equation (9) and integrating over Φ and Θ gives the evidence for hyperparameters αn and βm, which is also the probability of the corpus given the hyperparameters:

P(w | αn, βm) = ∑_z { ∏_k [ ∏_i Γ(N_{i|k} + βm_i) / Γ(N_k + β) ] [ Γ(β) / ∏_i Γ(βm_i) ] ∏_d [ ∏_k Γ(N_{k|d} + αn_k) / Γ(N_d + α) ] [ Γ(α) / ∏_k Γ(αn_k) ] }.   (12)

N_k is the total number of times topic k occurs in z, while N_d is the number of words in document d. The sum over z cannot be computed directly because it does not factorize and involves T^N terms, where N is the total number of words in the corpus. However, it may be approximated using Markov chain Monte Carlo (Griffiths & Steyvers, 2004).

Given a corpus w, a set of latent topics z, and optimal hyperparameters [αn]^MP and [βm]^MP, approximate predictive distributions for each topic k and document d are given by the following pair of equations:

P(i | k, w, z, [βm]^MP) = (N_{i|k} + [βm_i]^MP) / (N_k + β^MP),   (13)

P(k | d, w, z, [αn]^MP) = (N_{k|d} + [αn_k]^MP) / (N_d + α^MP).   (14)

These may be rewritten as

P(i | k, z, w, βm) = λ_k f_{i|k} + (1 − λ_k) m_i,   (15)

P(k | d, z, w, αn) = γ_d f_{k|d} + (1 − γ_d) n_k,   (16)

where f_{i|k} = N_{i|k} / N_k, f_{k|d} = N_{k|d} / N_d and

λ_k = N_k / (N_k + β),   (17)

γ_d = N_d / (N_d + α).   (18)

f_{i|k} is therefore being smoothed by the hyperparameter m_i, while f_{k|d} is smoothed by n_k. Note the similarity of equations (15) and (16) to equation (6).

3. Bigram Topic Model

This section introduces a model that extends latent Dirichlet allocation by incorporating a notion of word order, similar to that employed by MacKay and Peto's hierarchical Dirichlet language model. Each topic is now represented by a set of W distributions. Word generation is defined by a conditional distribution P(w_t = i | w_{t-1} = j, z_t = k), described by W T (W − 1) free parameters. As before, these parameters form a matrix Φ, this time with W T rows.

Each row is a distribution over words for a particular context (j, k), denoted by φ_{j,k}. Each topic k is now characterized by the W distributions specific to that topic. Topic generation is the same as in latent Dirichlet allocation: topics are drawn from the conditional distribution P(z_t = k | d_t = d), described by D(T − 1) free parameters, which form a matrix Θ.

The joint probability of a corpus w and a single set of latent topic assignments z is

P(w, z | Φ, Θ) = ∏_{i,j,k} φ_{i|j,k}^{N_{i|j,k}} ∏_{d,k} θ_{k|d}^{N_{k|d}},   (19)

where N_{i|j,k} is the number of times word i has been generated by topic k when preceded by word j. As in latent Dirichlet allocation, N_{k|d} is the number of times topic k has been used in document d.

The prior over Θ is chosen to be the same as that used in latent Dirichlet allocation:

P(Θ | αn) = ∏_d Dirichlet(θ_d | αn).   (20)

However, the additional conditioning context j in the distribution that defines word generation affords greater flexibility in choosing a hierarchical prior for Φ than in either latent Dirichlet allocation or the hierarchical Dirichlet language model. The priors over Φ used in both MacKay and Peto's language model and Blei et al.'s latent Dirichlet allocation are coupled priors: learning the probability vector for a single context (φ_j in the case of MacKay and Peto's model and φ_k in Blei et al.'s) gives information about the probability vectors in other contexts, j′ and k′ respectively. This dependence comes from the hyperparameter vector βm, shared, in the case of the hierarchical Dirichlet language model, between all possible previous word contexts j and, in the case of latent Dirichlet allocation, between all possible topics. Since word generation is conditioned upon both j and k in the new model presented in this paper, there is more than one way in which hyperparameters for the prior over Φ might be shared in this model.

Prior 1: Most simply, a single hyperparameter vector βm may be shared between all (j, k) contexts:

P(Φ | βm) = ∏_{j,k} Dirichlet(φ_{j,k} | βm).   (21)

Here, knowledge about the probability vector for one context φ_{j,k} will give information about the probability vectors φ_{j′,k′} for all other (j′, k′) contexts.

Prior 2: Alternatively, there may be T hyperparameter vectors, one for each topic k:

P(Φ | {β_k m_k}) = ∏_{j,k} Dirichlet(φ_{j,k} | β_k m_k).   (22)

Information is now shared between only those probability vectors with topic context k. Intuitively, this is appealing. Learning about the distribution over words for a single context (j, k) yields information about the distributions over words for other contexts (j′, k) that share this topic, but not about distributions with other topic contexts. In other words, this prior encapsulates the notion of similarity between distributions over words for a given topic context.

Having defined the distributions that characterize word and topic generation in the new model and assigned priors over the parameters, the generative process for a corpus w is (a sampling sketch of this process follows the equations below):

1. For each topic k and word j:
   (a) Draw φ_{j,k} from the prior over Φ: either Dirichlet(φ_{j,k} | βm) (prior 1) or Dirichlet(φ_{j,k} | β_k m_k) (prior 2).

2. For each document d in the corpus:
   (a) Draw the topic mixture θ_d for document d from Dirichlet(θ_d | αn).
   (b) For each position t in document d:
      i. Draw a topic z_t ∼ Discrete(θ_d).
      ii. Draw a word w_t from the distribution over words for the context defined by the topic z_t and previous word w_{t-1}, Discrete(φ_{w_{t-1}, z_t}).

The evidence, or probability of a corpus w given the hyperparameters, is either (prior 1)

P(w | αn, βm) = ∑_z { ∏_{j,k} [ ∏_i Γ(N_{i|j,k} + βm_i) / Γ(N_{j,k} + β) ] [ Γ(β) / ∏_i Γ(βm_i) ] ∏_d [ ∏_k Γ(N_{k|d} + αn_k) / Γ(N_d + α) ] [ Γ(α) / ∏_k Γ(αn_k) ] }   (23)

or (prior 2)

P(w | αn, {β_k m_k}) = ∑_z { ∏_{j,k} [ ∏_i Γ(N_{i|j,k} + β_k m_{i|k}) / Γ(N_{j,k} + β_k) ] [ Γ(β_k) / ∏_i Γ(β_k m_{i|k}) ] ∏_d [ ∏_k Γ(N_{k|d} + αn_k) / Γ(N_d + α) ] [ Γ(α) / ∏_k Γ(αn_k) ] }.   (24)
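The sketch below implements the generative process listed above (prior 1) by ancestral sampling, which is useful for sanity-checking an implementation on synthetic data. It is illustrative only; the fixed start-of-document context, the toy dimensions, and the function names are assumptions not taken from the paper.

```python
import numpy as np

def generate_corpus(D, T, W, doc_len, alpha_n, beta_m, rng=np.random.default_rng(0)):
    """Ancestral sampling from the bigram topic model with prior 1.

    alpha_n: length-T Dirichlet parameter for document-topic mixtures (alpha * n).
    beta_m:  length-W Dirichlet parameter shared by all (j, k) contexts (beta * m).
    """
    # One word distribution phi_{j,k} per (previous word j, topic k) context.
    phi = rng.dirichlet(beta_m, size=(W, T))      # shape (W, T, W)
    theta = rng.dirichlet(alpha_n, size=D)        # document-topic mixtures theta_d
    docs = []
    for d in range(D):
        words, prev = [], 0                       # assume word 0 serves as the start-of-document context
        for _ in range(doc_len):
            z = rng.choice(T, p=theta[d])         # draw topic z_t ~ Discrete(theta_d)
            w = rng.choice(W, p=phi[prev, z])     # draw w_t ~ Discrete(phi_{w_{t-1}, z_t})
            words.append(w)
            prev = w
        docs.append(words)
    return docs, phi, theta

docs, phi, theta = generate_corpus(D=4, T=3, W=20, doc_len=30,
                                   alpha_n=np.full(3, 0.5), beta_m=np.full(20, 0.1))
print(docs[0][:10])
```

Under prior 2, the only change would be to draw each φ_{j,k} from Dirichlet(β_k m_k) for its topic k rather than from a single shared βm.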

As in latent Dirichlet allocation, the sum over z is intractable, but may be approximated using MCMC. For a single set of latent topics z, and optimal hyperparameters [βm]^MP or {[β_k m_k]^MP}, the approximate predictive distribution over words given previous word j and current topic k is either (prior 1)

P(i | j, k, w, z, [βm]^MP) = (N_{i|j,k} + [βm_i]^MP) / (N_{j,k} + β^MP)   (25)

or (prior 2)

P(i | j, k, w, z, {[β_k m_k]^MP}) = (N_{i|j,k} + [β_k m_{i|k}]^MP) / (N_{j,k} + β_k^MP).   (26)

In equation (25), the statistic N_{i|j,k} / N_{j,k} is always smoothed by the quantity m_i, regardless of the conditioning context (j, k). Meanwhile, in equation (26), N_{i|j,k} / N_{j,k} is smoothed by m_{i|k}, which will vary depending on the conditioning topic. Given [αn]^MP, the approximate predictive distribution over topics for document d is

P(k | d, w, z, [αn]^MP) = (N_{k|d} + [αn_k]^MP) / (N_d + α^MP).   (27)

4. Inference of Hyperparameters

Previous sampling-based treatments of latent Dirichlet allocation have not included any method for optimizing hyperparameters. However, the method described in this section may be applied to both latent Dirichlet allocation and the model presented in this paper.

Given an uninformative prior over αn and βm or {β_k m_k}, the optimal hyperparameters, [αn]^MP and [βm]^MP or {[β_k m_k]^MP}, may be found by maximizing the evidence, given in equation (23) or (24). The evidence contains latent variables z and must therefore be maximized with respect to the hyperparameters using an expectation-maximization (EM) algorithm. Unfortunately, the expectation with respect to the distribution over the latent variables involves a sum over T^N terms, where N is the number of words in the entire corpus. However, this sum may be approximated using a Markov chain Monte Carlo algorithm, such as Gibbs sampling, resulting in a Gibbs EM algorithm (Andrieu et al., 2003).

Given a corpus w, and denoting the set of hyperparameters as U = {αn, βm} or U = {αn, {β_k m_k}}, the optimal hyperparameters may be found by using the following steps:

1. Initialize z^(0) and U^(0) and set i = 1.

2. Iteration i:
   (a) E-step: Draw S samples {z^(s)}_{s=1}^S from P(z | w, U^(i−1)) using a Gibbs sampler.
   (b) M-step: Maximize U^(i) = argmax_U (1/S) ∑_{s=1}^S log P(w, z^(s) | U).

3. Set i ← i + 1 and go to the E-step.

Gibbs sampling involves sequentially sampling each variable of interest, z_t here, from the distribution over that variable given the current values of all other variables and the data. Letting the subscript −t denote a quantity that excludes data from the tth position, the conditional posterior for z_t is either (prior 1)

P(z_t = k | z_{−t}, w, αn, βm) ∝ [ {N_{w_t|w_{t−1},k}}_{−t} + βm_{w_t} ] / [ {N_{w_{t−1},k}}_{−t} + β ] × [ {N_{k|d_t}}_{−t} + αn_k ] / [ {N_{d_t}}_{−t} + α ]   (28)

or (prior 2)

P(z_t = k | z_{−t}, w, αn, {β_k m_k}) ∝ [ {N_{w_t|w_{t−1},k}}_{−t} + β_k m_{w_t|k} ] / [ {N_{w_{t−1},k}}_{−t} + β_k ] × [ {N_{k|d_t}}_{−t} + αn_k ] / [ {N_{d_t}}_{−t} + α ].   (29)

Drawing a single set of topics z takes time proportional to the size of the corpus N and the number of topics T. The E-step therefore takes time proportional to N, T, and the number of iterations for which the Markov chain is run in order to obtain the S samples. Note that the samples used to approximate the E-step must come from a single Markov chain. The model is unaffected by permutations of topic indices. Consequently, there is no correspondence between topic indices across samples from different Markov chains: topics that have index k in two different Markov chains need not have similar distributions over words.

4.1. M-Step

Given {z^(s)}_{s=1}^S, the optimal αn can be computed using the fixed-point iteration

[αn_k]^new = αn_k [ ∑_s ∑_d ( Ψ(N_{k|d}^(s) + αn_k) − Ψ(αn_k) ) ] / [ ∑_s ∑_d ( Ψ(N_d + α) − Ψ(α) ) ],   (30)

where N_{k|d}^(s) is the number of times topic k has been used in document d in the sth sample. Similar fixed-point iterations can be used to determine [βm_i]^MP and {[β_k m_k]^MP} (Minka, 2003).
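To make the Gibbs EM procedure above concrete, the sketch below shows a single collapsed Gibbs update for z_t under prior 1 (equation 28) and the fixed-point update for αn (equation 30). It is a simplified, hedged sketch: the count-array layout, the handling of the excluded position, and the function names are assumptions rather than details given in the paper.

```python
import numpy as np
from scipy.special import digamma

def resample_topic(t, w, prev, d, z, N_wjk, N_jk, N_kd, N_d, alpha_n, beta_m,
                   rng=np.random.default_rng(0)):
    """One collapsed Gibbs update for z[t] under prior 1 (equation 28)."""
    k_old = z[t]
    # Remove position t from the counts (the "-t" quantities).
    N_wjk[w, prev, k_old] -= 1
    N_jk[prev, k_old] -= 1
    N_kd[k_old, d] -= 1
    # Unnormalized conditional posterior over topics k.
    p = ((N_wjk[w, prev, :] + beta_m[w]) / (N_jk[prev, :] + beta_m.sum())
         * (N_kd[:, d] + alpha_n) / (N_d[d] - 1 + alpha_n.sum()))
    k_new = rng.choice(len(p), p=p / p.sum())
    # Add position t back with its new topic assignment.
    N_wjk[w, prev, k_new] += 1
    N_jk[prev, k_new] += 1
    N_kd[k_new, d] += 1
    z[t] = k_new
    return k_new

def update_alpha_n(alpha_n, N_kd_samples, N_d):
    """One fixed-point step for alpha*n (equation 30), given S sampled count matrices."""
    alpha = alpha_n.sum()
    num = sum((digamma(N_kd + alpha_n[:, None]) - digamma(alpha_n[:, None])).sum(axis=1)
              for N_kd in N_kd_samples)
    den = sum((digamma(N_d + alpha) - digamma(alpha)).sum() for _ in N_kd_samples)
    return alpha_n * num / den
```

In a full E-step these single-site updates would sweep over every position t for many iterations before the S samples are collected, and the M-step would iterate the fixed-point updates to convergence, as described above.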

In my implementation, each fixed-point iteration takes time that is proportional to S and (at worst) N. For latent Dirichlet allocation and the new model with prior 1, the time taken to perform the M-step is therefore at worst proportional to S, N and the number of iterations taken to reach convergence. For the new model with prior 2, the time taken is also proportional to T.

5. Experiments

To evaluate the new model, both variants were compared with latent Dirichlet allocation and MacKay and Peto's hierarchical Dirichlet language model. The topic models were trained identically: the Gibbs EM algorithm described in the previous section was used for both the new model (with either prior) and latent Dirichlet allocation. The hyperparameters of the hierarchical Dirichlet language model were inferred using the same fixed-point iteration used in the M-step. The results presented in this section are therefore a direct reflection of differences between the models.

Language models are typically evaluated by computing the information rate of unseen test data, measured in bits per word: the better the predictive performance, the fewer the bits per word. Information rate is a direct measure of text compressibility. Given corpora w and w^test, information rate is defined as

R = − log_2 P(w^test | w) / N^test,   (31)

where N^test is the number of words in the test corpus.

The information rate may be computed directly for the hierarchical Dirichlet language model. For the topic models, computing P(w^test | w) requires summing over z and z^test. As mentioned before, this is intractable. Instead, the information rate may be computed using a single set of topics z for the training data, in this case obtained by running a Gibbs sampler for additional iterations after the hyperparameters have been inferred. Given z, multiple sets of topics for the test data {z^(s)_test}_{s=1}^S may be obtained using the predictive distributions. Given hyperparameters U, P(w^test | w) may be approximated by taking the harmonic mean of {P(w^test | z^(s)_test, w, z, U)}_{s=1}^S (Kass & Raftery, 1995).

5.1. Corpora

The models were compared using two data sets. The first was constructed by drawing 150 abstracts (documents) at random from the Psychological Review Abstracts data provided by Griffiths and Steyvers (2005). A subset of 100 documents was used to infer the hyperparameters, while the remaining 50 were used for evaluating the models. The second data set consisted of 150 newsgroup postings, drawn at random from the 20 Newsgroups data (Rennie, 2005). Again, 100 documents were used for inference, while 50 were retained for evaluating predictive accuracy.

Punctuation characters, including hyphens and apostrophes, were treated as word separators, and each number was replaced with a special "number" token to reduce the size of the vocabulary. To enable evaluation using documents containing tokens not present in the training corpus, all words that occurred only once in the training corpus were replaced with an "unseen" token. Preprocessing the Psychological Review Abstracts data in this manner resulted in a vocabulary of 1374 words; these words occurred 6521 times in the documents used for testing. The 20 Newsgroups data ended up with a vocabulary of 2281 words. Despite consisting of the same number of documents, the 20 Newsgroups corpora are roughly twice the size of the Psychological Review Abstracts corpora.

5.2. Results

The experiments involving latent Dirichlet allocation and the new model were run with 1 to 120 topics, on an Opteron 254 (2.8 GHz).
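The sketch below shows how the information rate of equation (31) could be computed from per-sample test log likelihoods, using the harmonic-mean approximation described in the preceding section. It is a hedged illustration; the log-space arithmetic, the toy inputs, and the function name are my own choices, not details given in the paper.

```python
import numpy as np
from scipy.special import logsumexp

def information_rate(log_liks, n_test_words):
    """Bits per word, R = -log2 P(w_test | w) / N_test (equation 31).

    log_liks: natural-log likelihoods log P(w_test | z_test^(s), w, z, U), one per sample s.
    P(w_test | w) is approximated by the harmonic mean of the sampled likelihoods.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    S = len(log_liks)
    # Harmonic mean in log space: log S - logsumexp(-log_liks).
    log_p = np.log(S) - logsumexp(-log_liks)
    return -log_p / (np.log(2) * n_test_words)   # convert nats to bits per word

# Toy usage: five hypothetical sampled test log likelihoods for a 6521-word test set.
print(information_rate([-45200.0, -45150.0, -45310.0, -45280.0, -45190.0], 6521))
```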
These models all required at most 200 iterations of the Gibbs EM algorithm described in section 4. In the E-step, a Markov chain was run for 400 iterations. The first 200 iterations were discarded and 5 samples were taken from the remaining iterations. The mean time taken for each iteration is shown for both variants of the new model as a function of the number of topics in figure 2. As expected, the time taken is proportional to both the number of topics and the size of the corpus.

The information rates of the test data are shown in figure 1. On both corpora, latent Dirichlet allocation and the hierarchical Dirichlet language model achieve similar performance. With prior 1, the new model improves upon this by between 0.5 and 1 bits per word. However, with prior 2, it achieves an information rate reduction of between 1 and 2 bits per word. For latent Dirichlet allocation, the information rate is reduced most by the first 20 topics. The new model uses a larger number of topics and exhibits a greater information rate reduction as more topics are added.

In latent Dirichlet allocation, the latent topic for a given word is inferred using the identity of the word, the number of times the word has previously been assumed to be generated by each topic, and the number of times each topic has been used in the current document. In the new model, the previous word is also taken into account.

Figure 1. Information rates of the test data, measured in bits per word, under the different models versus the number of topics. Left: Psychological Review Abstracts data. Right: 20 Newsgroups data.

Figure 2. Mean time taken to perform a single iteration of the Gibbs EM algorithm described in section 4, as a function of the number of topics, for both variants of the new model. Left: prior 1. Right: prior 2.

This additional information means that words that were considered to be generated by the same topic in latent Dirichlet allocation may now be assumed to have been generated by different topics, depending on the contexts in which they are seen. Consequently, the new model tends to use a greater number of topics.

In addition to comparing predictive accuracy, it is instructive to look at the inferred topics. Table 1 shows the words most frequently assigned to a selection of topics extracted from the 20 Newsgroups training data by each of the models. The "unseen" token was omitted. The topics inferred using latent Dirichlet allocation contain many function words, such as "the", "in" and "to". In contrast, all but one of the topics inferred by the new model, especially with prior 2, typically contain fewer function words. Instead, these are largely collected into the single remaining topic, shown in the last column of rows 2 and 3 in table 1. This effect is similar, though less pronounced, to that achieved by Griffiths et al.'s composite model (2004), in which function words are handled by a hidden Markov model, while content words are handled by latent Dirichlet allocation.

6. Future Work

There is another possible prior over Φ, in addition to the two priors discussed in this paper. This prior has a hyperparameter vector for each previous word context j, resulting in W hyperparameter vectors:

P(Φ | {β_j m_j}) = ∏_{j,k} Dirichlet(φ_{j,k} | β_j m_j).   (32)

Here, information is shared between all distributions with previous word context j. This prior captures the notion of common bigrams: word pairs that always occur together. However, the number of hyperparameter vectors is extremely large, much larger than the number of hyperparameters in prior 2, with comparatively little data from which to infer them. To make effective use of this prior, each normalized measure m_j should itself be assigned a Dirichlet prior. This variant of the model could be compared with those presented in this paper.

To enable a direct comparison, Dirichlet hyperpriors could also be placed on the hyperparameters of the priors described in section 3.

7. Conclusions

Creating a single model that integrates bigram-based and topic-based approaches to document modeling has several benefits. Firstly, the predictive accuracy of the new model, especially when using prior 2, is significantly better than that of either latent Dirichlet allocation or the hierarchical Dirichlet language model. Secondly, the model automatically infers a separate topic for function words, meaning that the other topics are less dominated by these words.

Acknowledgments

Thanks to Phil Cowans, David MacKay and Fernando Pereira for useful discussions. Thanks to Andrew Suffield for providing sparse matrix code.

References

Andrieu, C., de Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101.

Griffiths, T. L., & Steyvers, M. (2005). Topic modeling toolbox. programs_data/toolbox.htm.

Griffiths, T. L., Steyvers, M., Blei, D. M., & Tenenbaum, J. B. (2004). Integrating topics and syntax. Advances in Neural Information Processing Systems.

Jelinek, F., & Mercer, R. (1980). Interpolated estimation of Markov source parameters from sparse data. In E. Gelsema and L. Kanal (Eds.), Pattern Recognition in Practice. North-Holland Publishing Company.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90.

MacKay, D. J. C., & Peto, L. C. B. (1995). A hierarchical Dirichlet language model. Natural Language Engineering, 1.

Minka, T. P. (2003). Estimating a Dirichlet distribution. papers/dirichlet/.

Rennie, J. (2005). 20 Newsgroups data set. people.csail.mit.edu/jrennie/20newsgroups/.

Table 1. Top: The most commonly occurring words in some of the topics inferred from the 20 Newsgroups training data by latent Dirichlet allocation. Middle: Some of the topics inferred from the same data by the new model with prior 1. Bottom: Some of the topics inferred by the new model with prior 2. Each column represents a single topic, and words appear in order of frequency of occurrence. Content words are in bold. Function words, which are not in bold, were identified by their presence on a standard list of stop words: resources/linguistic_utils/stop_words. All three sets of topics were taken from models with 90 topics.


More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS TIME-DEAY ESTIMATION USING FARROW-BASED FRACTIONA-DEAY FIR FITERS: FITER APPROXIMATION VS. ESTIMATION ERRORS Mattias Olsson, Håkan Johansson, an Per öwenborg Div. of Electronic Systems, Dept. of Electrical

More information

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,

More information

Quantile function expansion using regularly varying functions

Quantile function expansion using regularly varying functions Quantile function expansion using regularly varying functions arxiv:705.09494v [math.st] 9 Aug 07 Thomas Fung a, an Eugene Seneta b a Department of Statistics, Macquarie University, NSW 09, Australia b

More information

Structural Risk Minimization over Data-Dependent Hierarchies

Structural Risk Minimization over Data-Dependent Hierarchies Structural Risk Minimization over Data-Depenent Hierarchies John Shawe-Taylor Department of Computer Science Royal Holloway an Befor New College University of Lonon Egham, TW20 0EX, UK jst@cs.rhbnc.ac.uk

More information

Linear Regression with Limited Observation

Linear Regression with Limited Observation Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu

More information

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion

Hybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion Hybri Fusion for Biometrics: Combining Score-level an Decision-level Fusion Qian Tao Raymon Velhuis Signals an Systems Group, University of Twente Postbus 217, 7500AE Enschee, the Netherlans {q.tao,r.n.j.velhuis}@ewi.utwente.nl

More information

Multivariate Methods. Matlab Example. Principal Components Analysis -- PCA

Multivariate Methods. Matlab Example. Principal Components Analysis -- PCA Multivariate Methos Xiaoun Qi Principal Coponents Analysis -- PCA he PCA etho generates a new set of variables, calle principal coponents Each principal coponent is a linear cobination of the original

More information