Variational Bayes for generic topic models

Gregor Heinrich (Fraunhofer IGD and University of Leipzig) and Michael Goesele (TU Darmstadt)

Abstract. The article contributes a derivation of variational Bayes for a large class of topic models by generalising from the well-known model of latent Dirichlet allocation. For an abstraction of these models as systems of interconnected mixtures, variational update equations are obtained, leading to inference algorithms for models that so far have used Gibbs sampling exclusively.

1 Introduction

Topic models (TMs) are a set of unsupervised learning models used in many areas of artificial intelligence: In text mining, they allow retrieval and automatic thesaurus generation; computer vision uses TMs for image classification and content-based retrieval; in bioinformatics they are the basis for protein relationship models, etc. In all of these cases, TMs learn latent variables from co-occurrences of features in data. Following the seminal model of latent Dirichlet allocation (LDA [6]), this is done efficiently according to a model that exploits the conjugacy of Dirichlet and multinomial probability distributions. Although the original work by Blei et al. [6] has shown the applicability of variational Bayes (VB) for TMs with impressive results, inference especially in more complex models has not adopted this technique but remains the domain of Gibbs sampling (e.g., [12,9,8]).

In this article, we explore variational Bayes for TMs in general rather than specifically for some given model. We start with an overview of TMs and specify general properties (Sec. 2). Using these properties, we develop a generic approach to VB that can be applied to a large class of models (Sec. 3). We verify the variational algorithms on real data and several models (Sec. 4). This paper is therefore the VB counterpart to [7].

2 Topic models

We characterise topic models as a form of discrete mixture models. Mixture models approximate complex distributions by a convex sum of component distributions, p(x) = ∑_{k=1}^K p(x|z=k) p(z=k), where p(z=k) is the weight of a component with index k and distribution p(x|z=k). Latent Dirichlet allocation as the simplest TM can be considered a mixture model with two interrelated mixtures: It represents documents m as mixtures of latent variables z with components ϑ_m = p(z|m), and latent topics z as mixtures of words w with components β_k = p(w|z=k) and component weights ϑ_m, leading to a distribution over words w of p(w|m) = ∑_{k=1}^K ϑ_{m,k} β_{k,w}.
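To make the two interrelated mixtures concrete, the following minimal sketch (Python with NumPy; all variable names are ours, not taken from the paper) draws the LDA generative quantities for one document and evaluates the resulting word distribution p(w|m) = ∑_k ϑ_{m,k} β_{k,w}:

    import numpy as np

    rng = np.random.default_rng(0)

    K, V, N_m = 4, 1000, 50          # topics, vocabulary size, tokens in document m
    alpha, eta = 0.1, 0.01           # symmetric Dirichlet hyperparameters

    beta = rng.dirichlet(np.full(V, eta), size=K)   # topic-term multinomials beta_k
    theta_m = rng.dirichlet(np.full(K, alpha))      # document-topic multinomial theta_m

    # generative process: topic z_{m,n} ~ Mult(theta_m), word w_{m,n} ~ Mult(beta_z)
    z = rng.choice(K, size=N_m, p=theta_m)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])

    # the word distribution of document m is the convex combination
    # p(w | m) = sum_k theta_{m,k} * beta_{k,w}
    p_w_given_m = theta_m @ beta                    # shape (V,), sums to 1
    assert np.isclose(p_w_given_m.sum(), 1.0)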

Fig. 1. Dependencies of mixture levels (ellipses) via discrete variables (arrows) in examples from literature: (a) latent Dirichlet allocation [6], (b) author-topic model (ATM [12], using observed parameters a_m to label documents, see end of Sec. 3), (c) 4-level pachinko allocation (PAM [9], models semantic structure with a hierarchy of topics ϑ_m, ϑ_{m,x}, ϑ_y), (d) hierarchical pachinko allocation (hPAM [8], topic hierarchy; complex mixture structure).

The corresponding generative process is illustrative: For each text document m, a multinomial distribution ϑ_m is drawn from a Dirichlet prior Dir(ϑ_m|α) with hyperparameter α. For each word token w_{m,n} of that document, a topic z_{m,n}=k is drawn from the document multinomial ϑ_m, and finally the word observation w is drawn from a topic-specific multinomial over terms, β_k. Pursuing a Bayesian strategy with parameters handled as random variables, the topic-specific multinomial is itself drawn from another Dirichlet, Dir(β_k|η), similar to the document multinomial.

Generic TMs. As generalisations of LDA, topic models can be seen as a powerful yet flexible framework to model complex relationships in data that is based on only two modelling assumptions: (1) TMs are structured into Dirichlet-multinomial mixture levels to learn discrete latent variables (in LDA: z) and multinomial parameters (in LDA: β and ϑ). And (2) these levels are coupled via the values of discrete variables, similar to the coupling in LDA between ϑ and β via z. More specifically, topic models form graphs of mixture levels, with sets of multinomial components as nodes connected by discrete random values as directed edges. Conditioned on discrete inputs, each mixture level chooses one of its components to generate discrete output propagated to the next level(s), until one or more final levels produce observable discrete data. For some examples from literature, the corresponding mixture networks are shown in Fig. 1 (in example models, we use the symbols from the original literature), including the variant of observed multinomial parameters substituting the Dirichlet prior, which will be discussed further below.

For the following derivations, we introduce sets of discrete variables X, multinomial parameters Θ and Dirichlet hyperparameters A as model-wide quantities, and the corresponding level-specific quantities X^l, Θ^l, A^l, where superscript l indicates the mixture level. The constraint of connecting different mixture levels (ellipses in Fig. 1) via discrete variables (arrows in Fig. 1) can be expressed by an operator ↑x^l that yields all parent variables of a mixture level l ∈ L generating variable x^l. Here ↑x^l can refer to specific tokens ↑x_i^l or configurations ↑X^l.
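The abstraction of a model as coupled mixture levels can be illustrated with a small encoding sketch. This is our own encoding, not notation from the paper: the class name MixtureLevel, the mapping g and the array doc_of are hypothetical; g selects the component index k_i from the incoming parent value and the token index i. LDA (Fig. 1a) then consists of a document level feeding a topic level:

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class MixtureLevel:
        name: str
        n_components: int                       # number of multinomial components on this level
        n_values: int                           # size of the discrete output domain
        g: Callable[[Optional[int], int], int]  # k_i = g(parent value, token index i)

    # LDA as two coupled levels: doc_of[i] gives the document of token i
    doc_of = [0, 0, 1, 1, 1]                    # toy corpus with 5 tokens in 2 documents
    K, V, M = 4, 1000, 2

    lda_levels = [
        MixtureLevel("theta", n_components=M, n_values=K,
                     g=lambda parent, i: doc_of[i]),   # component chosen by document index m
        MixtureLevel("beta",  n_components=K, n_values=V,
                     g=lambda parent, i: parent),      # component chosen by incoming topic z_i
    ]

    k_i = lda_levels[0].g(None, 2)   # token 2 belongs to document 1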

Based on this and the definitions of the multinomial and Dirichlet distributions, the joint likelihood of any TM is:

p(X, Θ | A) = ∏_{l∈L} p(X^l, Θ^l | A^l, ↑X^l) = ∏_l [ ∏_i Mult(x_i | Θ, ↑x_i) ∏_k Dir(ϑ_k | A, ↑X) ]^l   (1)
            = ∏_l [ ∏_i ϑ_{k_i, x_i} ∏_k ( Γ(∑_t α_{j,t}) / ∏_t Γ(α_{j,t}) ) ∏_t ϑ_{k,t}^{α_{j,t}−1} ]^l ;   k_i^l = g^l(↑x_i^l, i),  j^l = f^l(k^l)
            = ∏_l [ ∏_k (1/Δ(α_j)) ∏_t ϑ_{k,t}^{n_{k,t}+α_{j,t}−1} ]^l ;   n^l_{k,t} = ∑_i δ(k_i − k) δ(x_i − t).   (2)

In this equation, some further notation is introduced: We use brackets [·]^l to indicate that the contained quantities are specific to level l. Moreover, the mappings from parent variables to component indices k_i are expressed by the (level-specific) k_i = g(↑x_i, i), and n^l_{k,t} is the number of times that a configuration {↑x_i, i} for level l led to component k^l. Further, models are allowed to group components by providing group-specific hyperparameters α_j with mapping j = f(k). Finally, Δ(α) is the normalisation function of the Dirichlet distribution, a K-dimensional beta function: Δ(α) ≜ ∏_t Γ(α_t) / Γ(∑_t α_t).
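The count statistics n^l_{k,t} and the normaliser Δ(α) of Eq. 2 translate directly into code. The following sketch uses our own helper names (log_delta, level_counts) and standard SciPy functions:

    import numpy as np
    from scipy.special import gammaln

    def log_delta(alpha):
        # log of the Dirichlet normaliser: sum_t log Gamma(alpha_t) - log Gamma(sum_t alpha_t)
        alpha = np.asarray(alpha, dtype=float)
        return gammaln(alpha).sum() - gammaln(alpha.sum())

    def level_counts(k_idx, x_vals, K, T):
        # n_{k,t} = sum_i delta(k_i - k) delta(x_i - t) for one mixture level
        n = np.zeros((K, T))
        np.add.at(n, (k_idx, x_vals), 1.0)
        return n

    # toy example: 6 tokens with component indices k_i and discrete outputs x_i
    k_idx = np.array([0, 0, 1, 2, 2, 2])
    x_vals = np.array([3, 1, 0, 3, 3, 2])
    n = level_counts(k_idx, x_vals, K=3, T=4)
    print(n, log_delta([0.1, 0.1, 0.1, 0.1]))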

3 Variational Bayes for topic models

As in many latent-variable models, determining the posterior distribution p(H, Θ | V) = p(V, H, Θ) / ∑_H ∫ p(V, H, Θ) dΘ with hidden and visible variables {H, V} = X is intractable in TMs because of excessive dependencies between the sets of latent variables H and parameters Θ in the marginal likelihood p(V) = ∑_H ∫ p(V, H, Θ) dΘ in the denominator. Variational Bayes [2] is an approximative inference technique that relaxes the structure of p(H, Θ | V) by a simpler variational distribution q(H, Θ | Ψ, Ξ), conditioned on sets of free variational parameters Ψ and Ξ to be estimated in lieu of H and Θ. Minimising the Kullback-Leibler divergence of the distribution q to the true posterior can be shown to be equivalent to maximising a lower bound on the log marginal likelihood:

log p(V) ≥ log p(V) − KL{q(H, Θ) ‖ p(H, Θ | V)} = ⟨log p(V, H, Θ)⟩_{q(H,Θ)} + H{q(H, Θ)} ≜ F{q(H, Θ)}   (3)

with entropy H{·}. F{q(H, Θ)} is the (negative) variational free energy, the quantity to be optimised using an EM-like algorithm that alternates between (E) maximising F w.r.t. the variational parameters to pull the lower bound towards the marginal likelihood and (M) maximising F w.r.t. the true parameters to raise the marginal likelihood.

Mean-field approximation. Following the variational mean-field approach [2], in the LDA model the variational distribution consists of fully factorised Dirichlet and multinomial distributions [6] (in [6] this refers to the smoothed version; it is described in more detail in [5]):

q(z, β, ϑ | φ, λ, γ) = ∏_{m=1}^M ∏_{n=1}^{N_m} Mult(z_{m,n} | φ_{m,n}) · ∏_{k=1}^K Dir(β_k | λ_k) · ∏_{m=1}^M Dir(ϑ_m | γ_m).   (4)

In [6], this approach proved very successful, which raises the question of how it can be transferred to more generic TMs. Our approach is to view Eq. 4 as a special case of a more generic variational structure that captures dependencies ↑X between multiple hidden mixture levels and includes LDA for the case of one hidden level (H = {z}):

q(H, Θ | Ψ, Ξ) = ∏_{l∈H} [ ∏_i Mult(x_i | ψ_i, ↑x_i) ∏_k Dir(ϑ_k | ξ_k, ↑X) ]^l,   (5)

where l ∈ H refers to all levels that produce hidden variables. In the following, we assume that the indicator i is identical for all levels l, e.g., words in documents, i^l = i ≜ (m, n). Further, tokens i in the corpus can be grouped into terms v, and (observable) document-specific term frequencies n_{m,v} are introduced. We use the shorthand u = (m, v) to refer to specific unique tokens or document-term pairs.

Topic field. The dependency between mixture levels, ↑x^l_u, can be expressed by the likelihood of a particular configuration of hidden variables x_u = t ≜ {x^l_u = t^l}_{l∈H} under the variational distribution: ψ_{u,t} = q(x_u = t | Ψ). The complete structure ψ_u (the joint distribution over all l ∈ H, with Ψ = {ψ_u}_u) is a multi-way array of likelihoods for all latent configurations of token u, with as many index dimensions as there are dependent variables. For instance, Fig. 1 reveals that LDA has one hidden variable with dimension K while PAM has two with dimensions s_1 × s_2. Because of its interpretation as a mean field of topic states in the model, we refer to ψ_u as a topic field (in underline notation). We further define ψ^l_{u,k,t} as the likelihood of configuration (k^l, t^l) for document-term pair u. This marginal of ψ_u depends on the mappings between parent variables ↑x_u and components k on each level. To obtain ψ^l_{u,k,t}, the topic field ψ_u is summed over all descendant paths that x_u = t causes and the ancestor paths that can cause k = g(↑x_u, u) on level l according to the generative process:

ψ^l_{u,k,t} = ∑_{ {t^l_A, t^l_D} } ψ_{u; (t^l_A, k^l, t^l, t^l_D)} ;   t^l_A = path causing k^l,  t^l_D = path caused by t^l.   (6)

Descendant paths t^l_D of t^l are obtained via recursion of k = g(↑x^d_u, u) over l's descendant levels d. Assuming bijective g(·) as in the TMs in Fig. 1, the ancestor paths t^l_A that correspond to components in parents leading to k^l are obtained via (↑x^a_u, u) = g^{−1}(k) on l's ancestor levels a, recursively. Each pair {t^l_A, t^l_D} corresponds to one element in ψ_u per {k^l, t^l}, at index vector t = (t^l_A, k^l, t^l, t^l_D).

Free energy. Using Eqs. 2, 3, 5 and 6, the free energy of the generic model becomes:

F = ∑_l [ ∑_k ( log Δ(ξ_k) − log Δ(α_j) + ∑_t ( (∑_u n_u ψ_{u,k,t}) + α_{j,t} − ξ_{k,t} ) μ_t(ξ_k) ) ]^l − ∑_u n_u ∑_t ψ_{u,t} log ψ_{u,t} = ∑_l F^l + H{Ψ},   (7)

where μ_t(ξ) ≜ Ψ(ξ_t) − Ψ(∑_t ξ_t) = ⟨log ϑ_t⟩_{Dir(ϑ|ξ)} = ∂/∂ξ_t log Δ(ξ), and Ψ(ξ) ≜ d/dξ log Γ(ξ) is the digamma function (note the distinction between the function Ψ(·) and the quantity Ψ).
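For a model with two hidden levels of sizes s_1 × s_2 (as in PAM), the topic field of one document-term pair is an s_1 × s_2 array, and the marginals of Eq. 6, as well as the digamma expectations μ_t(ξ) entering Eq. 7, can be sketched as follows. The names mu and psi_u are ours, and we assume, as for the models in Fig. 1, bijective mappings so that the marginal of one level is obtained by summing out the index dimensions of the others:

    import numpy as np
    from scipy.special import digamma

    def mu(xi):
        # expected log parameters under Dir(theta | xi): mu_t(xi) = Psi(xi_t) - Psi(sum_t xi_t)
        xi = np.asarray(xi, dtype=float)
        return digamma(xi) - digamma(xi.sum())

    print(mu([1.0, 2.0, 3.0]))

    # topic field for one document-term pair u in a PAM-like model with two hidden
    # levels of sizes s1 x s2: a joint distribution over latent configurations
    s1, s2 = 5, 10
    psi_u = np.random.default_rng(1).random((s1, s2))
    psi_u /= psi_u.sum()                 # q(x_u = t | Psi) sums to one over all configurations t

    # marginals of Eq. 6: sum over the index dimensions of the other levels
    psi_super = psi_u.sum(axis=1)        # likelihood of the super-topic value (component of the next level)
    psi_sub = psi_u.sum(axis=0)          # likelihood of the sub-topic value (component of the word level)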

Variational E-steps. In the E-step of each model, the variational distributions for the joint multinomial ψ_u for each token (its topic field) and the Dirichlet parameters ξ^l_k on each level need to be estimated. The updates can be derived from the generic Eq. 7 by setting derivatives with respect to the variational parameters to zero, which yields:

ψ_{u,t} ∝ exp( ∑_l [ μ_t(ξ_k) ]^l ),   (8)
ξ^l_{k,t} = [ ( ∑_u n_u ψ_{u,k,t} ) + α_{j,t} ]^l,   (9)

where the sum ∑_u n_u ψ^l_{u,k,t} for level l can be interpreted as the expected count ⟨n^l_{k,t}⟩_q of co-occurrence of the value pair (k^l, t^l). (In Eq. 8 we assume that t^l = v on the final mixture level(s), the "leaves", which ties observed terms v to the latent structure. For root levels where component indices are observed, μ_t(ξ_k) in Eq. 8 can be replaced by Ψ(ξ_{k,t}).) The result in Eqs. 8 and 9 perfectly generalises that for LDA in [5].

M-steps. In the M-step of each model, the Dirichlet hyperparameters α^l_j (or scalar α^l) are calculated from the variational expectations of the log model parameters ⟨log ϑ_{k,t}⟩_q = μ_t(ξ_k), which can be done per mixture level (Eq. 9 has no reference to α^l_j across levels). Each estimator for α_j (omitting level l) should see only the expected parameters μ_t(ξ_k) of the K_j components associated with its group j = f(k). We assume that components are associated a priori (e.g., PAM in Fig. 1c has ϑ_{m,x} ∼ Dir(α_x)) and K_j is known. Then the Dirichlet ML parameter estimation procedure given in [6,10] can be used in modified form. It is based on Newton's method with the Dirichlet log likelihood function f as well as its gradient and Hessian elements g_t and h_tu:

f(α_j) = −K_j log Δ(α_j) + ∑_t (α_{j,t} − 1) ∑_{k: f(k)=j} μ_t(ξ_k)
g_t(α_j) = −K_j μ_t(α_j) + ∑_{k: f(k)=j} μ_t(ξ_k)
h_tu(α_j) = K_j Ψ′(∑_s α_{j,s}) − δ(t−u) K_j Ψ′(α_{j,t}) ≜ z + δ(t−u) h_tt
α_{j,t} ← α_{j,t} − (H^{−1} g)_t = α_{j,t} − h_tt^{−1} ( g_t − ( ∑_s g_s h_ss^{−1} ) / ( z^{−1} + ∑_s h_ss^{−1} ) ).   (10)

A scalar α (without grouping) is found accordingly via the symmetric Dirichlet:

f = −K[ T log Γ(α) − log Γ(Tα) ] + (α − 1) s_α,   s_α = ∑_{k=1}^K ∑_{t=1}^T μ_t(ξ_k),
g = KT[ Ψ(Tα) − Ψ(α) ] + s_α,   h = KT[ T Ψ′(Tα) − Ψ′(α) ],   α ← α − g h^{−1}.   (11)

Variants. As an alternative to Bayesian estimation of all mixture level parameters, for some mixture levels ML point estimates may be used that are computationally less expensive (e.g., unsmoothed LDA [6]). By applying ML only to levels without document-specific components, the generative process for unseen documents is retained. The E-step with ML levels has a simplified Eq. 8, and ML parameters ϑ^c are estimated in the M-step (instead of hyperparameters):

ψ_{u,t} ∝ exp( ∑_{l\c} [ μ_t(ξ_k) ]^l ) · ϑ^c_{k,t},   ϑ^c_{k,t} = ⟨n^c_{k,t}⟩_q / ⟨n^c_k⟩_q ∝ ∑_u n_u ψ^c_{u,k,t}.   (12)
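The scalar update of Eq. 11 is easy to implement with standard special functions. The sketch below uses our own naming and a plain Newton iteration with only a positivity clamp as safeguard, so it is an illustration of the update rather than a production routine:

    import numpy as np
    from scipy.special import digamma, polygamma

    def update_symmetric_alpha(alpha, xi, n_iter=20):
        # Newton updates for a scalar (symmetric) Dirichlet hyperparameter, Eq. 11;
        # xi has shape (K, T) and holds the variational Dirichlet parameters xi_k.
        K, T = xi.shape
        # s_alpha = sum_k sum_t mu_t(xi_k), the summed expected log parameters
        s_alpha = np.sum(digamma(xi) - digamma(xi.sum(axis=1, keepdims=True)))
        for _ in range(n_iter):
            g = K * T * (digamma(T * alpha) - digamma(alpha)) + s_alpha
            h = K * T * (T * polygamma(1, T * alpha) - polygamma(1, alpha))
            alpha = alpha - g / h
            alpha = max(alpha, 1e-6)      # keep the hyperparameter positive
        return alpha

    alpha = update_symmetric_alpha(0.5, np.random.default_rng(2).random((10, 20)) + 0.1)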

Moreover, as an extension to the framework specified in Sec. 2, it is straightforward to introduce observed parameters that can, for instance, represent labels, as in the author-topic model, cf. Fig. 1. In the free energy in Eq. 7, the term with μ_t(ξ_k) is replaced by (∑_u n_u ψ_{u,k,t}) log ϑ_{k,t}, and consequently Eq. 8 takes the form of Eq. 12 (left) as well. Other variants, like specific distributions for priors (e.g., logistic-normal to model topic correlation [4] and non-parametric approaches [14]) and observations (e.g., Gaussian components to model continuous data [1]), will not be covered here.

Algorithm structure. The complete variational EM algorithm alternates between the variational E-step and M-step until the variational free energy F converges at an optimum. At convergence, the estimated document and topic multinomials can be obtained via the variational expectation log ϑ̂_{k,t} = μ_t(ξ_k). Initialisation plays an important role to avoid local optima, and a common approach is to initialise topic distributions with observed data, possibly using several such initialisations concurrently. The actual variational EM loop can be outlined in its generic form as follows (a concrete sketch for the LDA special case is given below):

1. Repeat E-step loop until convergence w.r.t. variational parameters:
   1. For each observed unique token u:
      1. For each configuration t: calculate var. multinomial ψ_{u,t} (Eq. 8 or 12 left).
      2. For each (k, t) on each level l: calculate var. Dirichlet parameters ξ^l_{k,t} based on topic field marginals ψ^l_{u,k,t} (Eqs. 6 and 9), which can be done differentially: ξ^l_{k,t} ← ξ^l_{k,t} + n_u Δψ^l_{u,k,t}, with Δψ^l_{u,k,t} the change of ψ^l_{u,k,t}.
   2. Finish variational E-step if free energy F (Eq. 7) converged.
2. Perform M-step:
   1. For each j on each level l: calculate hyperparameter α^l_{j,t} (Eqs. 10 or 11), inner iteration loop over t.
   2. For each (k, t) in point-estimated nodes l: estimate ϑ^l_{k,t} (Eq. 12 right).
3. Finish variational EM loop if free energy F (Eq. 7) converged.

In practice, similar to [5], this algorithm can be modified by separating levels with document-specific variational parameters Ξ^{l,m} from those with corpus-wide parameters Ξ^l. This allows a separate E-step loop for each document m that updates ψ_u and Ξ^{l,m} with Ξ^l fixed. Parameters Ξ^l are updated afterwards from the changes Δψ^l_{u,k,t} accumulated in the document-specific loops, and their contribution added to F.
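For the special case of LDA (one hidden level), the loop above reduces to the familiar coupled updates of Eqs. 8 and 9. The following compact sketch is our own illustration, not the authors' implementation: gamma and lam play the role of ξ on the document and topic levels, phi plays the role of the topic field, and the batch schedule follows the document-specific/corpus-wide separation just described.

    import numpy as np
    from scipy.special import digamma

    def lda_vb(n_mv, K, alpha=0.1, eta=0.01, n_outer=50, n_inner=20, seed=0):
        # Mean-field VB for smoothed LDA: n_mv is an (M, V) document-term count matrix;
        # returns variational Dirichlet parameters gamma (documents) and lam (topics).
        rng = np.random.default_rng(seed)
        M, V = n_mv.shape
        lam = rng.gamma(100.0, 0.01, size=(K, V))                          # xi of the beta level
        gamma = np.zeros((M, K)) + alpha + n_mv.sum(1, keepdims=True) / K  # xi of the theta level
        for _ in range(n_outer):
            e_log_beta = digamma(lam) - digamma(lam.sum(1, keepdims=True))   # mu_t(xi_k) per topic
            lam_stat = np.zeros((K, V))
            for m in range(M):
                ids = np.nonzero(n_mv[m])[0]                 # terms occurring in document m
                cts = n_mv[m, ids]
                for _ in range(n_inner):                     # document-specific E-step loop
                    e_log_theta = digamma(gamma[m]) - digamma(gamma[m].sum())
                    phi = np.exp(e_log_theta[:, None] + e_log_beta[:, ids])   # Eq. 8 (topic field)
                    phi /= phi.sum(axis=0, keepdims=True)
                    gamma[m] = alpha + phi @ cts             # Eq. 9, document level
                lam_stat[:, ids] += phi * cts                # accumulate expected counts
            lam = eta + lam_stat                             # Eq. 9, topic level
        return gamma, lam

    # toy usage: 20 documents over a 50-term vocabulary, 5 topics
    counts = np.random.default_rng(1).integers(0, 4, size=(20, 50))
    gamma, lam = lda_vb(counts, K=5)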

4 Experimental verification

In this section, we present initial validation results based on the algorithm in Sec. 3.

Setting. We chose models from Fig. 1, LDA, ATM and PAM, and investigated two versions of each: an unsmoothed version that performs ML estimation of the final mixture level (using Eq. 12) and a smoothed version that places variational distributions over all parameters (using Eq. 8). Except for the component grouping in PAM (the ϑ_{m,x} have vector hyperparameter α_x), we used scalar hyperparameters. As a baseline, we used Gibbs sampling implementations of the corresponding models. Two criteria are immediately useful: the ability to generalise to test data V′ given the model parameters Θ, and the convergence time (assuming single-threaded operation).

For the first criterion, because of its frequent usage with topic models we use the perplexity, the inverse geometric mean of the likelihood of test data tokens given the model: P(V′) = exp( −∑_u n_u log p(v_u | Θ′) / W′ ), where Θ′ are the parameters fitted to the test data V′ with W′ tokens. The log likelihood of test tokens log p(v_u | Θ′) is obtained by (1) running the inference algorithms on the test data, which yields Ξ′ and consequently Θ′, and (2) marginalising all hidden variables h_u in the likelihood p(v_u | h_u, Θ′) = ∏_l [ϑ_{k,t}]^l. (In contrast to [12], we also used this method to determine ATM perplexity, from the φ_k.) The experiments were performed on the NIPS corpus [11] with M = 1740 documents (174 held out), V terms, W tokens, and A = 2037 authors.

Fig. 2. Results of VB and Gibbs experiments: convergence time [h], iteration time [sec], number of iterations, and perplexity for LDA, ATM and PAM (dimension settings A, B: K = {25, 100} for LDA and ATM, (s_1, s_2) = {(5, 10), (25, 25)} for PAM) under Gibbs sampling (GS), unsmoothed VB (VB ML) and smoothed VB.

Results. The results of the experiments are shown in Fig. 2. It turns out that generally the VB algorithms were able to achieve perplexity reductions in the range of their Gibbs counterparts, which verifies the approach taken. Further, the full VB approaches tend to yield slightly improved perplexity reductions compared to the ML versions. However, these first VB results were consistently weaker compared to the baselines. This may be due to adverse initialisation of the variational distributions, causing the VB algorithms to become trapped at local optima. It may alternatively be a systematic issue due to the correlation between Ψ and Ξ, which are assumed independent in Eq. 5, a fact that has motivated the collapsed variant of variational Bayes in [13]. Considering the second evaluation criterion, the results show that the current VB implementations generally converge less than half as fast as the corresponding Gibbs samplers. This is why work is currently being undertaken on code optimisation, including parallelisation for multi-core CPUs, which, as opposed to (collapsed) Gibbs samplers, is straightforward for VB.
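The perplexity used above can be computed directly from the fitted test-data parameters. The sketch below covers the LDA-structured case, with our own names; theta_test and beta stand for the document and topic multinomials recovered from the variational parameters on the test data (e.g., by normalising the Dirichlet parameters or via the expectations log ϑ̂_{k,t} = μ_t(ξ_k)):

    import numpy as np

    def perplexity(n_test_mv, theta_test, beta):
        # P(V') = exp( - sum_u n_u log p(v_u | Theta') / W' ) for LDA-structured parameters:
        # n_test_mv is the (M', V) document-term count matrix of the held-out data,
        # theta_test the (M', K) document-topic multinomials, beta the (K, V) topic-term multinomials.
        p_w_given_m = theta_test @ beta                       # (M', V), hidden topics marginalised out
        log_lik = np.sum(n_test_mv * np.log(p_w_given_m + 1e-12))
        W = n_test_mv.sum()
        return np.exp(-log_lik / W)

    # toy usage with random parameters
    rng = np.random.default_rng(3)
    theta_test = rng.dirichlet(np.ones(5), size=4)            # 4 test documents, 5 topics
    beta = rng.dirichlet(np.ones(30), size=5)                 # 5 topics over 30 terms
    n_test = rng.integers(0, 3, size=(4, 30))
    print(perplexity(n_test, theta_test, beta))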

5 Conclusions

We have derived variational Bayes algorithms for a large class of topic models by generalising from the well-known model of latent Dirichlet allocation. By an abstraction of these models as systems of interconnected mixture levels, we could obtain variational update equations in a generic way, which are the basis for an algorithm that can easily be applied to specific topic models. Finally, we have applied the algorithm to a couple of example models, verifying the general applicability of the approach. So far, especially the more complex topic models have predominantly used inference based on Gibbs sampling. Therefore, this paper is a step towards exploring the possibility of variational approaches. However, as can be concluded from the experimental study in this paper, more work remains to be done in order to make VB algorithms as effective and efficient as their Gibbs counterparts.

Related work. Beside the relation to the original LDA model [6,5], especially the proposed representation of topic models as networks of mixture levels makes work on discrete DAG models relevant: In [3], a variational approach for structure learning in DAGs is provided, with an alternative derivation based on exponential families leading to a structure similar to the topic field. They do not discuss mapping of components or hyperparameters and restrict their implementations to structure learning in graphs bipartite between hidden and observed nodes. Also, the authors of [9] present their pachinko allocation models as DAGs, but formulate inference based on Gibbs sampling. In contrast to this, the novelty of the work presented here is that it unifies the theory of topic models in general, including labels, the option of point estimates and component grouping for variational Bayes, giving empirical results for real-world topic models. Future work will optimise the current implementations with respect to efficiency in order to improve the experimental results presented here, and an important aspect is to develop parallel algorithms for the models at hand. Another research direction is the extension of the framework of generic topic models, especially taking into consideration the variants of mixture levels outlined in Sec. 3. Finally, we will investigate a generalisation of collapsed variational Bayes [13].

References

1. K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan. Matching words and pictures. JMLR, 3(6):1107–1135, 2003.
2. M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
3. M. J. Beal and Z. Ghahramani. Variational Bayesian learning of directed graphical models with hidden variables. Bayesian Analysis, 1:793–832, 2006.
4. D. Blei and J. Lafferty. A correlated topic model of science. AOAS, 1:17–35, 2007.
5. D. Blei, A. Ng, and M. Jordan. Hierarchical Bayesian models for applications in information retrieval. Bayesian Statistics, 7:25–44, 2003.
6. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
7. G. Heinrich. A generic approach to topic models. In ECML/PKDD, 2009.
8. W. Li, D. Blei, and A. McCallum. Mixtures of hierarchical topics with pachinko allocation. In ICML, 2007.
9. W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML, 2006.
10. T. Minka. Estimating a Dirichlet distribution. Web, 2000.
11. NIPS corpus. roweis/data.html.
12. M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In ACM SIGKDD, 2004.
13. Y. W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In NIPS, volume 19, 2007.
14. Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Technical Report 653, Department of Statistics, University of California at Berkeley, 2004.
