Collapsed Variational Inference for LDA
B.T. Thomas Yeo

1 LDA

We shall follow the same notation as Blei et al. In other words, we consider the full LDA model with hyperparameters $\alpha$ and $\eta$ on $\theta$ and $\beta$ respectively, where $\theta_d$ parameterizes P(topic | document) and $\beta_k$ parameterizes P(word | topic). $w_{dn}$ refers to the $n$-th word of the $d$-th document; $z_{dn}$ refers to the topic that generates the $n$-th word of the $d$-th document.

1.1 Variational Inference Setup

Cost function: the aim is to maximize the likelihood of the observed words $w$ with respect to $\alpha, \eta$:

$$\log p(w \mid \alpha, \eta) = \log \int_\theta \int_\beta \sum_z p(\theta, \beta, z, w \mid \alpha, \eta) \, d\beta \, d\theta \tag{1}$$

$$= \log \int_\theta \int_\beta \sum_z q(\beta, \theta, z) \, \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{q(\beta, \theta, z)} \, d\beta \, d\theta \tag{2}$$

$$= \log E_q \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{q(\beta, \theta, z)} \tag{3}$$

$$\geq E_q \log p(\theta, \beta, z, w \mid \alpha, \eta) - E_q \log q(\beta, \theta, z) \quad \text{(by Jensen's inequality)} \tag{4}$$

$$= -F(q(\beta, \theta, z)), \quad \text{where } F \text{ is called the variational free energy.} \tag{5}$$

Note that

$$E_q \log p(\theta, \beta, z, w \mid \alpha, \eta) - E_q \log q(\beta, \theta, z) + \mathrm{KL}\big(q(\beta, \theta, z) \,\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big) \tag{6}$$

$$= E_q \log p(\theta, \beta, z, w \mid \alpha, \eta) - E_q \log q(\beta, \theta, z) + E_q \log \frac{q(\beta, \theta, z)}{p(\beta, \theta, z \mid w, \alpha, \eta)} \tag{7}$$

$$= E_q \log \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{p(\beta, \theta, z \mid w, \alpha, \eta)} \tag{8}$$

$$= E_q \log p(w \mid \alpha, \eta) = \log p(w \mid \alpha, \eta). \tag{9}$$

Therefore

$$\log p(w \mid \alpha, \eta) + F(q(\beta, \theta, z)) = \mathrm{KL}\big(q(\beta, \theta, z) \,\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big), \tag{10}$$

i.e.

$$-F(q(\beta, \theta, z)) = \log p(w \mid \alpha, \eta) - \mathrm{KL}\big(q(\beta, \theta, z) \,\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big). \tag{11}$$

In other words, for the negative free energy $-F$ to be equal to $\log p(w \mid \alpha, \eta)$, $q(\beta, \theta, z)$ should be equal to $p(\beta, \theta, z \mid w, \alpha, \eta)$. In EM, we compute $q(\beta, \theta, z) = p(\beta, \theta, z \mid w, \alpha, \eta)$ exactly in the E-step and then, conditioned on this particular $q$, we maximize with respect to $\alpha$ and $\eta$ in the M-step. In LDA, this is not tractable. In variational EM, $q(\beta, \theta, z)$ is approximated with a simpler class of functions. The best $q$ function within this class is optimized during the E-step, and then, conditioned on this particular $q$, we maximize with respect to $\alpha$ and $\eta$ in the M-step.
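Before moving on, Eq. (11) is easy to verify numerically. Below is a minimal sketch on a toy two-state model (the model and all variable names are illustrative, far simpler than LDA): the gap between the evidence and the negative free energy is exactly the KL divergence from $q$ to the true posterior.

```python
import numpy as np

# Toy model (hypothetical): latent h in {0, 1}, one fixed observation w.
p_h = np.array([0.3, 0.7])            # prior p(h)
p_w_given_h = np.array([0.9, 0.2])    # likelihood p(w | h) of the observed w
p_joint = p_h * p_w_given_h           # p(h, w)
log_pw = np.log(p_joint.sum())        # exact evidence log p(w)

q = np.array([0.5, 0.5])              # an arbitrary variational distribution
neg_F = np.sum(q * (np.log(p_joint) - np.log(q)))   # -F(q), Eqs. (4)-(5)
posterior = p_joint / p_joint.sum()                 # p(h | w)
kl = np.sum(q * (np.log(q) - np.log(posterior)))    # KL(q || p(h | w))

assert np.isclose(log_pw, neg_F + kl)               # Eq. (11)
```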
1.2 Motivation for Collapsed Variational Inference

In the original LDA paper (Blei et al., 2003), the class of proposal distributions was

$$q(\theta, z, \beta \mid \gamma, \phi, \lambda) = q(\beta \mid \lambda) \, q(\theta \mid \gamma) \, q(z \mid \phi) \tag{12}$$

$$= \prod_{k=1}^K q(\beta_k \mid \lambda_k) \prod_{d=1}^D q(\theta_d \mid \gamma_d) \, q(z_d \mid \phi_d). \tag{13}$$

In Teh et al. (2007), the class of proposal distributions was

$$q(\theta, z, \beta \mid \phi) = q(\theta, \beta \mid z) \, q(z \mid \phi) \tag{14}$$

$$= p(\theta, \beta \mid z, w, \alpha, \eta) \, q(z \mid \phi) \quad \text{(notice the first term is exact)} \tag{15}$$

$$= p(\theta, \beta \mid z, w, \alpha, \eta) \prod_{d=1}^D q(z_d \mid \phi_d). \tag{16}$$

Observe that Blei's class of distributions is a subset of Teh's:

$$\Big\{ \prod_{k=1}^K q(\beta_k \mid \lambda_k) \prod_{d=1}^D q(\theta_d \mid \gamma_d) \Big\} \;\subset\; \big\{ q(\theta, \beta \mid z) \big\},$$

since the former class of distributions assumes independence between $\beta$ and $\theta$ (and of both from $z$), while the latter makes no assumption about their independence. Therefore, intuitively, Teh's approximation is better. We can also prove this more formally:

$$\mathrm{KL}\big(q(\beta, \theta, z) \,\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big) = \mathrm{KL}\big(q(z)\, q(\beta, \theta \mid z) \,\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big) \tag{17}$$

$$= E_q \log \frac{q(z)\, q(\beta, \theta \mid z)}{p(\beta, \theta, z \mid w, \alpha, \eta)} \tag{18}$$

$$= E_q \log \frac{q(z)}{p(z \mid w, \alpha, \eta)} + E_q \log \frac{q(\beta, \theta \mid z)}{p(\beta, \theta \mid z, w, \alpha, \eta)} \tag{19}$$

$$= E_{q(z)} \log \frac{q(z)}{p(z \mid w, \alpha, \eta)} + E_{q(z)} E_{q(\beta, \theta \mid z)} \log \frac{q(\beta, \theta \mid z)}{p(\beta, \theta \mid z, w, \alpha, \eta)} \tag{20}$$

$$= \mathrm{KL}\big(q(z) \,\|\, p(z \mid w, \alpha, \eta)\big) + E_{q(z)} \, \mathrm{KL}\big(q(\beta, \theta \mid z) \,\|\, p(\beta, \theta \mid z, w, \alpha, \eta)\big) \tag{21}$$

$$\geq \mathrm{KL}\big(q(z) \,\|\, p(z \mid w, \alpha, \eta)\big), \quad \text{with equality when } q(\beta, \theta \mid z) = p(\beta, \theta \mid z, w, \alpha, \eta). \tag{22}$$

Therefore the KL divergence of Teh's approximation is at most equal to the KL divergence of Blei's approximation, because the second KL term is zero for Teh's approximation but non-zero for Blei's approximation. The question then is whether Teh's approximation is tractable.
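The decomposition in Eqs. (17)-(21) is just the chain rule for KL divergence, and it can also be checked numerically. A small sketch with arbitrary discrete distributions (the sizes and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4)); p /= p.sum()                 # target joint p(z, u)
qz = rng.random(3); qz /= qz.sum()                   # q(z)
qu_z = rng.random((3, 4))
qu_z /= qu_z.sum(axis=1, keepdims=True)              # q(u | z)

q = qz[:, None] * qu_z                               # joint q(z, u)
kl_joint = np.sum(q * np.log(q / p))                 # KL(q(z)q(u|z) || p(z, u))

pz = p.sum(axis=1)
pu_z = p / pz[:, None]                               # p(z) and p(u | z)
kl_z = np.sum(qz * np.log(qz / pz))                  # KL(q(z) || p(z))
kl_cond = np.sum(qz * np.sum(qu_z * np.log(qu_z / pu_z), axis=1))

assert np.isclose(kl_joint, kl_z + kl_cond)          # Eqs. (17)-(21)
# Setting q(u | z) = p(u | z) makes kl_cond zero, which is Teh's choice in Eq. (15).
```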
1.3 Simplifying the lower bound on log p(w | α, η) with Teh's approximation

We can plug $q(\beta, \theta, z) = p(\theta, \beta \mid z, w, \alpha, \eta)\, q(z \mid \phi)$ into Eq. (4) and simplify. However, it is simpler to restart from $\log p(w \mid \alpha, \eta)$:

$$\log p(w \mid \alpha, \eta) = \log \sum_z p(z, w \mid \alpha, \eta) \tag{23}$$

$$= \log \sum_z q(z) \, \frac{p(z, w \mid \alpha, \eta)}{q(z)} \tag{24}$$

$$= \log E_q \frac{p(z, w \mid \alpha, \eta)}{q(z)} \tag{25}$$

$$\geq E_q \log p(z, w \mid \alpha, \eta) - E_q \log q(z). \tag{26}$$

The class of proposal distributions we will consider is $q(z) = \prod_{d=1}^D q(z_d \mid \phi_d) = \prod_{d,n} q(z_{dn} \mid \phi_{dn})$. Note that $q(z_{dn} \mid \phi_{dn})$ is a categorical distribution with parameters $\phi_{dn}$. In other words, $\phi_{dn}$ is a vector of length $K$ (one entry per topic), such that $\sum_k \phi_{dnk} = 1$.

1.4 Variational E-step

The goal is to maximize Eq. (26) with respect to $\{\phi_{dn}\}$. We have to add the Lagrange multipliers corresponding to the constraints $\sum_k \phi_{dnk} = 1$. Furthermore, note that

$$E_q \log q(z) = \sum_z q(z) \log\big[ q(z_{11}) \cdots q(z_{D N_D}) \big] = \sum_z q(z) \big[ \log q(z_{11}) + \cdots + \log q(z_{D N_D}) \big] \tag{27}$$

$$= \sum_{d,n} \sum_{z_{dn}} q(z_{dn}) \log q(z_{dn}) = \sum_{d,n} \sum_k \phi_{dnk} \log \phi_{dnk}. \tag{28}$$

Therefore, Eq. (26) (with the Lagrange terms added) becomes

$$E_q \log p(z, w \mid \alpha, \eta) - E_q \log q(z) + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big) \tag{29}$$

$$= E_q \log p(z, w \mid \alpha, \eta) - \sum_{d,n,k} \phi_{dnk} \log \phi_{dnk} + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big) \tag{30}$$

$$= E_{q(z_{ij})} E_{q(z_{\neg ij})} \log p(z, w \mid \alpha, \eta) - \sum_{d,n,k} \phi_{dnk} \log \phi_{dnk} + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big) \tag{31}$$

$$= \sum_k \phi_{ijk} \, E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = k, w \mid \alpha, \eta) - \sum_{d,n,k} \phi_{dnk} \log \phi_{dnk} + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big), \tag{32}$$

where $i$ indexes a particular document, $j$ indexes the $j$-th word of that document, and $z_{\neg ij}$ denotes all topic assignments except $z_{ij}$. Differentiating with respect to $\phi_{ijl}$, we get

$$E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) - \log \phi_{ijl} - 1 + \mu_{ij}. \tag{33}$$

Equating the above to 0, we get

$$\log \phi_{ijl} = E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) + \mu_{ij} - 1, \tag{34}$$

$$\phi_{ijl} \propto \exp\Big( E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) \Big), \tag{35}$$

$$\phi_{ijl} = \frac{\exp\Big( E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) \Big)}{\sum_k \exp\Big( E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = k, w \mid \alpha, \eta) \Big)}. \tag{36}$$

1.4.1 Plugging in the counts

Using the standard fact about the Dirichlet-compound-multinomial distribution (see the Wikipedia entry on the Dirichlet-multinomial distribution), we get

$$p(z \mid \alpha) = \prod_d \int p(z_d \mid \theta_d) \, p(\theta_d \mid \alpha) \, d\theta_d = \prod_d \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_{d\cdot\cdot})} \prod_k \frac{\Gamma(\alpha + N_{dk\cdot})}{\Gamma(\alpha)}, \tag{37}$$

where $N_{dkw}$ is the number of words in the $d$-th document belonging to topic $k$ and corresponding to dictionary word $w$. A dot means the corresponding index is summed out; for example, $N_{\cdot kw} = \sum_d N_{dkw}$ is the number of words in the corpus belonging to topic $k$ and corresponding to dictionary word $w$. Similarly,

$$p(w \mid z, \eta) = \prod_k \frac{\Gamma(V\eta)}{\Gamma(V\eta + N_{\cdot k\cdot})} \prod_w \frac{\Gamma(\eta + N_{\cdot kw})}{\Gamma(\eta)}. \tag{38}$$

The way to think about the above equation is that, conditioned on the topics $z$, we can divide all the words in the corpus $w$ into words from topic 1 to topic $K$. Words generated from a particular topic are independent of words generated from another topic (hence the product over $k$). For a given topic, the words generated by the topic follow a Dirichlet-compound-multinomial distribution with hyperparameter $\eta$.

Using both equations, and the identity $\Gamma(a + n)/\Gamma(a) = \prod_{m=0}^{n-1} (a + m)$, we get

$$p(z, w \mid \alpha, \eta) = p(z \mid \alpha) \, p(w \mid z, \eta) \tag{39}$$

$$= \prod_d \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_{d\cdot\cdot})} \prod_{d,k} \frac{\Gamma(\alpha + N_{dk\cdot})}{\Gamma(\alpha)} \cdot \prod_k \frac{\Gamma(V\eta)}{\Gamma(V\eta + N_{\cdot k\cdot})} \prod_{k,w} \frac{\Gamma(\eta + N_{\cdot kw})}{\Gamma(\eta)} \tag{40}$$

$$= \prod_d \Big[ \prod_{m=0}^{N_{d\cdot\cdot}-1} (K\alpha + m) \Big]^{-1} \prod_{d,k} \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) \tag{41}$$

$$\qquad \times \prod_k \Big[ \prod_{m=0}^{N_{\cdot k\cdot}-1} (V\eta + m) \Big]^{-1} \prod_{k,w} \prod_{m=0}^{N_{\cdot kw}-1} (\eta + m). \tag{42}$$
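Eqs. (37)-(40) make the collapsed joint straightforward to evaluate from count matrices, which is useful for checking the derivations that follow. A sketch (the function name and array layout are assumptions of this note, not from Teh et al.):

```python
import numpy as np
from scipy.special import gammaln

def collapsed_log_joint(N_dk, N_kw, alpha, eta):
    """log p(z, w | alpha, eta) from Eqs. (37)-(40).

    N_dk: (D, K) words per document and topic; N_kw: (K, V) words per topic
    and dictionary word (both derived from a fixed assignment z)."""
    K = N_dk.shape[1]
    V = N_kw.shape[1]
    N_d = N_dk.sum(axis=1)                                  # N_{d..}
    N_k = N_kw.sum(axis=1)                                  # N_{.k.}
    log_p_z = (np.sum(gammaln(K * alpha) - gammaln(K * alpha + N_d))
               + np.sum(gammaln(alpha + N_dk) - gammaln(alpha)))   # Eq. (37)
    log_p_w = (np.sum(gammaln(V * eta) - gammaln(V * eta + N_k))
               + np.sum(gammaln(eta + N_kw) - gammaln(eta)))       # Eq. (38)
    return log_p_z + log_p_w
```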
We now come to a tricky part of the derivations! Consider Eq. (36). The numerator and denominator will each have four factors, corresponding to those in Eqs. (41)-(42). The first factor is the same for both numerator and denominator and will therefore cancel out. Let $N^{\neg ij}$ denote the counts computed with word $(i,j)$ excluded. The second factor of $p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta)$ (i.e., of the numerator of Eq. (36), excluding the exponential and expectation) corresponds to

$$\prod_{d,k} \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) = \prod_{d \neq i} \prod_k \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) \cdot \prod_k \prod_{m=0}^{N_{ik\cdot}-1} (\alpha + m) \tag{43}$$

$$= \prod_{d \neq i} \prod_k \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) \cdot \prod_k \prod_{m=0}^{N^{\neg ij}_{ik\cdot}-1} (\alpha + m) \cdot \big( \alpha + N^{\neg ij}_{il\cdot} \big), \tag{44}$$

where the first two terms will cancel with the denominator because they are independent of $l$ (note that $N_{ik\cdot} = N^{\neg ij}_{ik\cdot}$ for $k \neq l$, while $N_{il\cdot} = N^{\neg ij}_{il\cdot} + 1$ when $z_{ij} = l$). Similarly, the third factor of $p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta)$ corresponds to

$$\prod_k \Big[ \prod_{m=0}^{N_{\cdot k\cdot}-1} (V\eta + m) \Big]^{-1} = \prod_k \Big[ \prod_{m=0}^{N^{\neg ij}_{\cdot k\cdot}-1} (V\eta + m) \Big]^{-1} \cdot \big( V\eta + N^{\neg ij}_{\cdot l\cdot} \big)^{-1}, \tag{45}$$

where the first term will cancel with the denominator because it is independent of $l$. Finally, the fourth factor corresponds to

$$\prod_{k,w} \prod_{m=0}^{N_{\cdot kw}-1} (\eta + m) = \prod_{k,w} \prod_{m=0}^{N^{\neg ij}_{\cdot kw}-1} (\eta + m) \cdot \big( \eta + N^{\neg ij}_{\cdot l w_{ij}} \big), \tag{46}$$

where the first term will cancel with the denominator because it is independent of $l$. Therefore Eq. (36) becomes

$$\phi_{ijl} = \frac{\exp\Big( E_{q(z_{\neg ij})} \log\big[ (\alpha + N^{\neg ij}_{il\cdot}) \, (V\eta + N^{\neg ij}_{\cdot l\cdot})^{-1} \, (\eta + N^{\neg ij}_{\cdot l w_{ij}}) \big] \Big)}{\sum_k \exp\Big( E_{q(z_{\neg ij})} \log\big[ (\alpha + N^{\neg ij}_{ik\cdot}) \, (V\eta + N^{\neg ij}_{\cdot k\cdot})^{-1} \, (\eta + N^{\neg ij}_{\cdot k w_{ij}}) \big] \Big)}. \tag{47}$$

Renaming $i$ to $d$, $j$ to $n$, and $l$ to $k$, we get

$$\phi_{dnk} = \frac{\exp\Big( E_{q(z_{\neg dn})} \log\big[ (\alpha + N^{\neg dn}_{dk\cdot}) \, (V\eta + N^{\neg dn}_{\cdot k\cdot})^{-1} \, (\eta + N^{\neg dn}_{\cdot k w_{dn}}) \big] \Big)}{\sum_{k'} \exp\Big( E_{q(z_{\neg dn})} \log\big[ (\alpha + N^{\neg dn}_{dk'\cdot}) \, (V\eta + N^{\neg dn}_{\cdot k'\cdot})^{-1} \, (\eta + N^{\neg dn}_{\cdot k' w_{dn}}) \big] \Big)}. \tag{48}$$

1.4.2 Approximating the counts

$N^{\neg dn}_{dk\cdot} = \sum_{n' \neq n} I(z_{dn'} = k)$, which is a sum of independent (by the mean-field approximation) Bernoulli variables with probability of heads equal to $\phi_{dn'k}$. Therefore the mean and variance of $N^{\neg dn}_{dk\cdot}$ are given by

$$E_q\big[ N^{\neg dn}_{dk\cdot} \big] = \sum_{n' \neq n} \phi_{dn'k}, \qquad \mathrm{Var}_q\big[ N^{\neg dn}_{dk\cdot} \big] = \sum_{n' \neq n} \phi_{dn'k} (1 - \phi_{dn'k}). \tag{49}$$

$N^{\neg dn}_{\cdot k\cdot} = \sum_{(d',n') \neq (d,n)} I(z_{d'n'} = k)$, with mean and variance given by

$$E_q\big[ N^{\neg dn}_{\cdot k\cdot} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k}, \qquad \mathrm{Var}_q\big[ N^{\neg dn}_{\cdot k\cdot} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k} (1 - \phi_{d'n'k}). \tag{50}$$

$N^{\neg dn}_{\cdot k w_{dn}} = \sum_{(d',n') \neq (d,n)} I(z_{d'n'} = k) \, I(w_{d'n'} = w_{dn})$, with mean and variance given by

$$E_q\big[ N^{\neg dn}_{\cdot k w_{dn}} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k} \, I(w_{d'n'} = w_{dn}), \qquad \mathrm{Var}_q\big[ N^{\neg dn}_{\cdot k w_{dn}} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k} (1 - \phi_{d'n'k}) \, I(w_{d'n'} = w_{dn}). \tag{51}$$
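These moments are cheap to compute from the φ's. A sketch, under the simplifying assumption (of this sketch, not of the notes) that all documents are padded to a common length N so that φ can be stored as one dense array:

```python
import numpy as np

def count_moments(phi, words, V):
    """Means/variances of the count variables in Eqs. (49)-(51).

    phi: (D, N, K) variational parameters; words: (D, N) integer word ids in
    [0, V)."""
    var = phi * (1.0 - phi)                      # Bernoulli variances per (d, n, k)
    E_Ndk, V_Ndk = phi.sum(1), var.sum(1)        # N_{dk.}: shape (D, K)
    E_Nk, V_Nk = phi.sum((0, 1)), var.sum((0, 1))   # N_{.k.}: shape (K,)
    onehot = np.eye(V)[words]                    # (D, N, V) indicators I(w_dn = w)
    E_Nkw = np.einsum('dnk,dnv->kv', phi, onehot)   # N_{.kw}: shape (K, V)
    V_Nkw = np.einsum('dnk,dnv->kv', var, onehot)
    return (E_Ndk, V_Ndk), (E_Nk, V_Nk), (E_Nkw, V_Nkw)

# The leave-one-out moments (superscript "-dn") follow by subtracting word
# (d, n)'s own contribution, e.g. E[N^{-dn}_{dk.}] = E_Ndk[d] - phi[d, n].
```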
As suggested by Teh et al. (2007), we will approximate the random variables $N^{\neg dn}_{dk\cdot}$, $N^{\neg dn}_{\cdot k\cdot}$ and $N^{\neg dn}_{\cdot k w_{dn}}$ as Gaussian random variables with means and variances as discussed above. First, note that a second-order Taylor expansion of $f(x) = \log(b + x)$ about a point $a$ gives

$$f(x) \approx f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(a)(x - a)^2 \tag{52}$$

$$= \log(b + a) + \frac{x - a}{b + a} - \frac{(x - a)^2}{2(b + a)^2}, \tag{53}$$

so, taking expectations with $a = E_q[x]$ (the linear term vanishes),

$$E_q[\log(b + x)] \approx \log(b + E_q[x]) - \frac{\mathrm{Var}_q[x]}{2(b + E_q[x])^2}. \tag{54}$$

Therefore the terms in the numerator of Eq. (48) become:

1. Let $b = \alpha$, $x = N^{\neg dn}_{dk\cdot}$ and $a = E_q[N^{\neg dn}_{dk\cdot}]$:

$$E_{q(z_{\neg dn})} \log\big( \alpha + N^{\neg dn}_{dk\cdot} \big) \approx \log\big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big) - \frac{\mathrm{Var}_q[N^{\neg dn}_{dk\cdot}]}{2\big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big)^2}. \tag{57}$$

2. Let $b = V\eta$, $x = N^{\neg dn}_{\cdot k\cdot}$ and $a = E_q[N^{\neg dn}_{\cdot k\cdot}]$:

$$E_{q(z_{\neg dn})} \log\big( V\eta + N^{\neg dn}_{\cdot k\cdot} \big) \approx \log\big( V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}] \big) - \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k\cdot}]}{2\big( V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}] \big)^2}. \tag{58}$$

3. Let $b = \eta$, $x = N^{\neg dn}_{\cdot k w_{dn}}$ and $a = E_q[N^{\neg dn}_{\cdot k w_{dn}}]$:

$$E_{q(z_{\neg dn})} \log\big( \eta + N^{\neg dn}_{\cdot k w_{dn}} \big) \approx \log\big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big) - \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k w_{dn}}]}{2\big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big)^2}. \tag{59}$$

Plugging the above into Eq. (48), we get

$$\phi_{dnk} \propto \frac{\big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big) \big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big)}{V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}]} \exp\left( -\frac{\mathrm{Var}_q[N^{\neg dn}_{dk\cdot}]}{2(\alpha + E_q[N^{\neg dn}_{dk\cdot}])^2} + \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k\cdot}]}{2(V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}])^2} - \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k w_{dn}}]}{2(\eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}])^2} \right). \tag{60}$$
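Eqs. (49)-(51) and (60) together define one sweep of the collapsed variational E-step. A sketch under the same padded-array assumption as above; it recomputes all statistics for clarity, whereas a practical implementation would update them incrementally:

```python
import numpy as np

def cvb_update(phi, words, alpha, eta, V):
    """One full sweep of the collapsed variational update, Eq. (60).

    phi: (D, N, K), words: (D, N) integer word ids; dense padded arrays as in
    the earlier sketches (an assumption for clarity, not efficiency)."""
    var = phi * (1.0 - phi)
    # Leave-one-out Gaussian moments of Eqs. (49)-(51), for all (d, n) at once.
    E_dk = phi.sum(1, keepdims=True) - phi           # E[N^{-dn}_{dk.}], (D, N, K)
    V_dk = var.sum(1, keepdims=True) - var
    E_k = phi.sum((0, 1)) - phi                      # E[N^{-dn}_{.k.}], (D, N, K)
    V_k = var.sum((0, 1)) - var
    onehot = np.eye(V)[words]                        # (D, N, V)
    E_kw = np.einsum('dnk,dnv->kv', phi, onehot)     # E[N_{.kw}] incl. own term
    V_kw = np.einsum('dnk,dnv->kv', var, onehot)
    E_w = E_kw[:, words].transpose(1, 2, 0) - phi    # moments at w = w_dn, own term removed
    V_w = V_kw[:, words].transpose(1, 2, 0) - var
    # Eq. (60) in log space, then normalize over topics.
    log_phi = (np.log(alpha + E_dk) - V_dk / (2 * (alpha + E_dk) ** 2)
               - np.log(V * eta + E_k) + V_k / (2 * (V * eta + E_k) ** 2)
               + np.log(eta + E_w) - V_w / (2 * (eta + E_w) ** 2))
    log_phi -= log_phi.max(axis=2, keepdims=True)    # numerical stabilization
    new_phi = np.exp(log_phi)
    return new_phi / new_phi.sum(axis=2, keepdims=True)
```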
1.5 M-step

Teh et al. (2007) stopped at the variational E-step. Here, we will consider how to optimize $\alpha$ and $\eta$. In the M-step, we seek to maximize Eq. (26) with respect to $\alpha$ and $\eta$. Keeping only the first term (the entropy term contains neither $\alpha$ nor $\eta$), and taking the log of Eqs. (41)-(42), we get

$$E_q \log p(z, w \mid \alpha, \eta) = E_q \bigg[ -\sum_d \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{m=0}^{N_{dk\cdot}-1} \log(\alpha + m) \tag{61}$$

$$\qquad\qquad - \sum_k \sum_{m=0}^{N_{\cdot k\cdot}-1} \log(V\eta + m) + \sum_{k,w} \sum_{m=0}^{N_{\cdot kw}-1} \log(\eta + m) \bigg]. \tag{62}$$

1.5.1 Optimize α

Let's consider the terms with $\alpha$:

$$L(\alpha) = E_q \bigg[ -\sum_d \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{m=0}^{N_{dk\cdot}-1} \log(\alpha + m) \bigg] \tag{63}$$

$$= -\sum_d \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} E_q \sum_{m=0}^{N_{dk\cdot}-1} \log(\alpha + m) \tag{64}$$

(the first term is deterministic because the document lengths $N_{d\cdot\cdot}$ are observed). Expanding the expectation over the distribution of the count $N_{dk\cdot}$ and then swapping the order of the two sums,

$$= -\sum_d \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{n=1}^{N_{d\cdot\cdot}} q(N_{dk\cdot} = n) \sum_{m=0}^{n-1} \log(\alpha + m) \tag{65}$$

$$= -\sum_d \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{n=0}^{N_{d\cdot\cdot}-1} q(N_{dk\cdot} > n) \log(\alpha + n). \tag{66}$$

Differentiating with respect to $\alpha$, we get

$$\frac{\partial L}{\partial \alpha} = -\sum_d \sum_{n=0}^{N_{d\cdot\cdot}-1} \frac{K}{K\alpha + n} + \sum_{d,k} \sum_{n=0}^{N_{d\cdot\cdot}-1} \frac{q(N_{dk\cdot} > n)}{\alpha + n}. \tag{67}$$

Differentiating one more time, we get

$$\frac{\partial^2 L}{\partial \alpha^2} = \sum_d \sum_{n=0}^{N_{d\cdot\cdot}-1} \frac{K^2}{(K\alpha + n)^2} - \sum_{d,k} \sum_{n=0}^{N_{d\cdot\cdot}-1} \frac{q(N_{dk\cdot} > n)}{(\alpha + n)^2}. \tag{68}$$

We can use the Hessian and gradient to compute the Newton-Raphson update. To compute $q(N_{dk\cdot} > n)$, note that $N_{dk\cdot} = \sum_n I(z_{dn} = k)$, which we can approximate by a Gaussian with mean $\sum_n \phi_{dnk}$ and variance $\sum_n \phi_{dnk}(1 - \phi_{dnk})$.

1.5.2 Optimize η

Let's consider the terms with $\eta$. Denoting by $N = \sum_d N_{d\cdot\cdot}$ the total number of words in all the documents, we get

$$L(\eta) = E_q \bigg[ -\sum_k \sum_{m=0}^{N_{\cdot k\cdot}-1} \log(V\eta + m) + \sum_{k,w} \sum_{m=0}^{N_{\cdot kw}-1} \log(\eta + m) \bigg] \tag{69}$$

$$= -\sum_k \sum_{n=1}^{N} q(N_{\cdot k\cdot} = n) \sum_{m=0}^{n-1} \log(V\eta + m) + \sum_{k,w} \sum_{n=1}^{N} q(N_{\cdot kw} = n) \sum_{m=0}^{n-1} \log(\eta + m) \tag{70}$$

$$= -\sum_k \sum_{n=0}^{N-1} q(N_{\cdot k\cdot} > n) \log(V\eta + n) + \sum_{k,w} \sum_{n=0}^{N-1} q(N_{\cdot kw} > n) \log(\eta + n). \tag{71}$$

Differentiating with respect to $\eta$, we get

$$\frac{\partial L}{\partial \eta} = -\sum_k \sum_{n=0}^{N-1} q(N_{\cdot k\cdot} > n) \frac{V}{V\eta + n} + \sum_{k,w} \sum_{n=0}^{N-1} \frac{q(N_{\cdot kw} > n)}{\eta + n}. \tag{72}$$

Differentiating one more time, we get

$$\frac{\partial^2 L}{\partial \eta^2} = \sum_k \sum_{n=0}^{N-1} q(N_{\cdot k\cdot} > n) \frac{V^2}{(V\eta + n)^2} - \sum_{k,w} \sum_{n=0}^{N-1} \frac{q(N_{\cdot kw} > n)}{(\eta + n)^2}. \tag{73}$$

We can use the Hessian and gradient to compute the Newton-Raphson update. To compute $q(N_{\cdot k\cdot} > n)$, note that $N_{\cdot k\cdot} = \sum_{d,n} I(z_{dn} = k)$, which we can approximate by a Gaussian with mean $\sum_{d,n} \phi_{dnk}$ and variance $\sum_{d,n} \phi_{dnk}(1 - \phi_{dnk})$. To compute $q(N_{\cdot kw} > n)$, note that $N_{\cdot kw} = \sum_{d,n} I(z_{dn} = k) I(w_{dn} = w)$, which we can approximate by a Gaussian with mean $\sum_{d,n} \phi_{dnk} I(w_{dn} = w)$ and variance $\sum_{d,n} \phi_{dnk}(1 - \phi_{dnk}) I(w_{dn} = w)$.
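A sketch of the resulting Newton-Raphson step for α follows (the η step is analogous, using the corpus-level moments of §1.5.2). The Gaussian tail probabilities stand in for $q(N_{dk\cdot} > n)$ as suggested above; the +0.5 continuity correction and the function name are illustrative choices of this sketch:

```python
import numpy as np
from scipy.stats import norm

def alpha_newton_step(alpha, E_Ndk, V_Ndk, N_d, K):
    """One Newton-Raphson step for alpha using Eqs. (67)-(68).

    E_Ndk, V_Ndk: (D, K) Gaussian moments of N_{dk.} computed from the phis;
    N_d: array of document lengths N_{d..}."""
    grad, hess = 0.0, 0.0
    for d, Nd in enumerate(N_d):
        n = np.arange(Nd)                               # n = 0, ..., N_d - 1
        grad -= np.sum(K / (K * alpha + n))             # deterministic first term
        hess += np.sum(K**2 / (K * alpha + n) ** 2)
        sd = np.sqrt(np.maximum(V_Ndk[d], 1e-12))[:, None]
        # q(N_{dk.} > n) under N_{dk.} ~ Normal(E, V), shape (K, N_d)
        tail = norm.sf(n[None, :] + 0.5, loc=E_Ndk[d][:, None], scale=sd)
        grad += np.sum(tail / (alpha + n))              # sum over k and n
        hess -= np.sum(tail / (alpha + n) ** 2)
    return alpha - grad / hess                          # Newton-Raphson update
```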
1.6 Alternative M-step

As an alternative approach, we can use $\phi_{dnk}$ to compute

$$\gamma_{dk} = \alpha + \sum_{n=1}^{N_{d\cdot\cdot}} \phi_{dnk}, \tag{74}$$

$$\lambda_{kw} = \eta + \sum_d \sum_{n=1}^{N_{d\cdot\cdot}} \phi_{dnk} \, I(w_{dn} = w), \tag{75}$$

which we can then use to update $\alpha$ and $\eta$ in exactly the same way as in the original LDA (Blei et al., 2003). This is theoretically not as good as the previous section, because we are using point estimates of $\theta$ and $\beta$ instead of integrating them out. But the estimates of $\phi$ should be better than those of the original LDA (Blei et al., 2003), so maybe it will perform better? However, there are no theoretical guarantees, unlike for the variational E-step.
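A sketch of these statistics, reusing the padded array layout of the earlier sketches; γ and λ can then be fed to the standard hyperparameter updates of Blei et al. (2003):

```python
import numpy as np

def alternative_m_step_stats(phi, words, alpha, eta, V):
    """gamma and lambda of Eqs. (74)-(75).

    phi: (D, N, K) variational parameters; words: (D, N) integer word ids."""
    gamma = alpha + phi.sum(axis=1)                      # (D, K)
    onehot = np.eye(V)[words]                            # (D, N, V)
    lam = eta + np.einsum('dnk,dnv->kv', phi, onehot)    # (K, V)
    return gamma, lam
```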