Collapsed Variational Inference for LDA

B.T. Thomas Yeo

1 LDA

We shall follow the same notation as Blei et al. (2003). In other words, we consider the full LDA model with hyperparameters $\alpha$ and $\eta$ on $\theta$ and $\beta$ respectively, where $\theta$ parameterizes $P(\text{topic} \mid \text{document})$ and $\beta$ parameterizes $P(\text{word} \mid \text{topic})$. $w_{dn}$ refers to the $n$-th word of the $d$-th document, and $z_{dn}$ refers to the topic that generates the $n$-th word of the $d$-th document.

1.1 Variational Inference Setup

Cost function: the aim is to maximize the likelihood of the observed words $w$ with respect to $\alpha, \eta$:

\[ p(w \mid \alpha, \eta) = \int_\theta \int_\beta \sum_z p(\theta, \beta, z, w \mid \alpha, \eta) \, d\beta \, d\theta. \tag{1} \]

Note that

\begin{align*}
\log p(w \mid \alpha, \eta) &= \log \int_\theta \int_\beta \sum_z q(\beta, \theta, z)\, \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{q(\beta, \theta, z)} \, d\beta \, d\theta \tag{2} \\
&= \log E_q \left[ \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{q(\beta, \theta, z)} \right] \tag{3} \\
&\geq E_q \log p(\theta, \beta, z, w \mid \alpha, \eta) - E_q \log q(\beta, \theta, z) \tag{4} \\
&= F(q(\beta, \theta, z)), \tag{5}
\end{align*}

where $F$ is called the (negative) variational free energy. Furthermore,

\begin{align*}
& E_q \log p(\theta, \beta, z, w \mid \alpha, \eta) - E_q \log q(\beta, \theta, z) + \mathrm{KL}\big(q(\beta, \theta, z) \,\big\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big) \tag{6} \\
&= E_q \log p(\theta, \beta, z, w \mid \alpha, \eta) - E_q \log q(\beta, \theta, z) + E_q \log \frac{q(\beta, \theta, z)}{p(\beta, \theta, z \mid w, \alpha, \eta)} \tag{7} \\
&= E_q \log \frac{p(\theta, \beta, z, w \mid \alpha, \eta)}{p(\beta, \theta, z \mid w, \alpha, \eta)} \tag{8} \\
&= E_q \log p(w \mid \alpha, \eta) = \log p(w \mid \alpha, \eta). \tag{9}
\end{align*}

Therefore

\[ \log p(w \mid \alpha, \eta) - F(q(\beta, \theta, z)) = \mathrm{KL}\big(q(\beta, \theta, z) \,\big\|\, p(\beta, \theta, z \mid w, \alpha, \eta)\big). \tag{10} \]

In other words, for the negative free energy $F$ to be equal to $\log p(w \mid \alpha, \eta)$, $q(\beta, \theta, z)$ should be equal to $p(\beta, \theta, z \mid w, \alpha, \eta)$. In EM, we compute $q(\beta, \theta, z) = p(\beta, \theta, z \mid w, \alpha, \eta)$ exactly in the E-step and then, conditioned on this particular $q$, we maximize with respect to $\alpha$ and $\eta$ in the M-step. In LDA, this is not tractable. In variational EM, $q(\beta, \theta, z)$ is approximated with a simpler class of functions. The best $q$ within this class is found during the E-step, and then, conditioned on this particular $q$, we maximize with respect to $\alpha$ and $\eta$ in the M-step.

1.2 Motivation for Collapsed Variational Inference

In the original LDA paper (Blei et al., 2003), the class of proposal distributions was

\begin{align*}
q(\theta, z, \beta \mid \gamma, \phi, \lambda) &= q(\beta \mid \lambda)\, q(\theta \mid \gamma)\, q(z \mid \phi) \tag{11} \\
&= \prod_{k=1}^{K} q(\beta_k \mid \lambda_k) \prod_{d=1}^{D} q(\theta_d \mid \gamma_d)\, q(z \mid \phi) \tag{12} \\
&= \prod_{k=1}^{K} q(\beta_k \mid \lambda_k) \prod_{d=1}^{D} q(\theta_d \mid \gamma_d) \prod_{d=1}^{D} \prod_{n} q(z_{dn} \mid \phi_{dn}). \tag{13}
\end{align*}
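Before comparing the two proposal families, it may help to see the identity in Eq. (10) numerically. Below is a minimal NumPy sketch (added here for illustration; it is not part of the original note) that checks $\log p(w) = F(q) + \mathrm{KL}(q \,\|\, p)$ on a toy model with a single discrete latent variable. The joint table `p_joint` and the variational distribution `q` are arbitrary choices.

```python
import numpy as np

# Toy model: one discrete latent variable z in {0, 1, 2}, one fixed observation w.
p_joint = np.array([0.10, 0.25, 0.05])   # p(z, w) for the observed w (arbitrary values)
log_p_w = np.log(p_joint.sum())          # log p(w) = log sum_z p(z, w)
p_post = p_joint / p_joint.sum()         # exact posterior p(z | w)

q = np.array([0.5, 0.3, 0.2])            # an arbitrary variational distribution q(z)

# F(q) = E_q[log p(z, w)] - E_q[log q(z)]   (the bound of Eq. (4) for this toy model)
F = np.sum(q * np.log(p_joint)) - np.sum(q * np.log(q))
kl = np.sum(q * np.log(q / p_post))      # KL(q || p(z | w))

print(log_p_w, F + kl)                   # equal up to floating-point error
assert np.isclose(log_p_w, F + kl)
```

The gap between $\log p(w)$ and $F(q)$ is exactly the KL term, which is why maximizing $F$ over $q$ amounts to minimizing the KL divergence to the true posterior.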

In Teh et al. (2007), the class of proposal distributions was

\begin{align*}
q(\theta, z, \beta \mid \phi) &= q(\theta, \beta \mid z)\, q(z \mid \phi) \tag{14} \\
&= p(\theta, \beta \mid z, w, \alpha, \eta)\, q(z \mid \phi) \qquad \text{(notice the first term is exact)} \tag{15} \\
&= p(\theta, \beta \mid z, w, \alpha, \eta) \prod_{d=1}^{D} q(z_d \mid \phi_d). \tag{16}
\end{align*}

Observe that Blei's class of distributions is a subset of Teh's: the factorized $\prod_{k=1}^{K} q(\beta_k \mid \lambda_k) \prod_{d=1}^{D} q(\theta_d \mid \gamma_d)$ is a special case of the unrestricted $q(\theta, \beta \mid z, w, \alpha, \eta)$, since the former class of distributions assumes independence between $\beta$ and $\theta$, while the latter makes no assumption about their independence. Therefore, intuitively, Teh's approximation is better. We can also prove this more formally:

\begin{align*}
\mathrm{KL}\big(q(\beta,\theta,z) \,\big\|\, p(\beta,\theta,z \mid w,\alpha,\eta)\big) &= \mathrm{KL}\big(q(z)\, q(\beta,\theta \mid z) \,\big\|\, p(\beta,\theta,z \mid w,\alpha,\eta)\big) \tag{17} \\
&= E_q \log \frac{q(z)\, q(\beta,\theta \mid z)}{p(\beta,\theta,z \mid w,\alpha,\eta)} \tag{18} \\
&= E_q \left[ \log \frac{q(z)}{p(z \mid w,\alpha,\eta)} + \log \frac{q(\beta,\theta \mid z)}{p(\beta,\theta \mid z,w,\alpha,\eta)} \right] \tag{19} \\
&= E_{q(z)} \log \frac{q(z)}{p(z \mid w,\alpha,\eta)} + E_{q(z)}\, E_{q(\beta,\theta \mid z)} \log \frac{q(\beta,\theta \mid z)}{p(\beta,\theta \mid z,w,\alpha,\eta)} \tag{20} \\
&= E_{q(z)} \log \frac{q(z)}{p(z \mid w,\alpha,\eta)} + E_{q(z)}\, \mathrm{KL}\big(q(\beta,\theta \mid z) \,\big\|\, p(\beta,\theta \mid z,w,\alpha,\eta)\big) \tag{21} \\
&\geq \mathrm{KL}\big(q(z) \,\big\|\, p(z \mid w,\alpha,\eta)\big), \quad \text{with equality when } q(\beta,\theta \mid z) = p(\beta,\theta \mid z,w,\alpha,\eta). \tag{22}
\end{align*}

Therefore, the KL divergence of Teh's approximation is at most equal to the KL divergence of Blei's approximation, because the second KL divergence term in Eq. (21) is zero for Teh's approximation but non-zero for Blei's approximation. The question then is whether Teh's approximation is tractable.

1.3 Simplifying the lower bound on $p(w \mid \alpha, \eta)$ with Teh's approximation

We can plug $q(\beta,\theta,z) = p(\theta,\beta \mid z,w,\alpha,\eta)\, q(z \mid \phi)$ into Eq. (4) and simplify. However, it is simpler to restart from $p(w \mid \alpha, \eta)$:

\begin{align*}
p(w \mid \alpha, \eta) &= \sum_z p(z, w \mid \alpha, \eta) \tag{23} \\
&= \sum_z q(z)\, \frac{p(z, w \mid \alpha, \eta)}{q(z)} \tag{24} \\
\log p(w \mid \alpha, \eta) &= \log E_q \left[ \frac{p(z, w \mid \alpha, \eta)}{q(z)} \right] \tag{25} \\
&\geq E_q \log p(z, w \mid \alpha, \eta) - E_q \log q(z). \tag{26}
\end{align*}

The class of proposal distributions we will consider is $q(z) = \prod_{d=1}^{D} q(z_d \mid \phi_d) = \prod_{d,n} q(z_{dn} \mid \phi_{dn})$. Note that $q(z_{dn} \mid \phi_{dn})$ is a categorical distribution with parameters $\phi_{dn}$. In other words, $\phi_{dn}$ is a vector of length $K$ corresponding to topics, such that $\sum_{k=1}^{K} \phi_{dnk} = 1$.

1.4 Variational E-step

The goal is to maximize Eq. (26) with respect to $\{\phi_{dn}\}$. We have to add Lagrange multipliers corresponding to the constraints $\sum_k \phi_{dnk} = 1$. Furthermore, note that

\begin{align*}
E_q \log q(z) &= \sum_z q(z) \log q(z) = \sum_z q(z) \big[ \log q(z_{11}) + \cdots + \log q(z_{D N_D}) \big] \tag{27} \\
&= \sum_{d,n} \sum_{z_{dn}} q(z_{dn}) \log q(z_{dn}) = \sum_{d,n} \sum_{k=1}^{K} \phi_{dnk} \log \phi_{dnk}. \tag{28}
\end{align*}

Therefore, adding the Lagrange multipliers, Eq. (26) becomes

\begin{align*}
& E_q \log p(z, w \mid \alpha, \eta) - E_q \log q(z) + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big) \tag{29} \\
&= E_q \log p(z, w \mid \alpha, \eta) - \sum_{d,n,k} \phi_{dnk} \log \phi_{dnk} + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big) \tag{30} \\
&= E_{q(z_{ij})}\, E_{q(z_{\neg ij})} \log p(z, w \mid \alpha, \eta) - \sum_{d,n,k} \phi_{dnk} \log \phi_{dnk} + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big) \tag{31--32} \\
&= \sum_k \phi_{ijk}\, E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = k, w \mid \alpha, \eta) - \sum_{d,n,k} \phi_{dnk} \log \phi_{dnk} + \sum_{d,n} \mu_{dn} \Big( \sum_k \phi_{dnk} - 1 \Big), \tag{33}
\end{align*}

where $i$ indexes a particular document and $j$ indexes the $j$-th word of that document. Differentiating with respect to $\phi_{ijl}$, we get

\[ E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) - \log \phi_{ijl} - 1 + \mu_{ij}. \tag{34} \]

Equating the above to 0, we get

\[ \phi_{ijl} \propto \exp\!\big( E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) \big), \tag{35} \]

\[ \phi_{ijl} = \frac{\exp\!\big( E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta) \big)}{\sum_{k} \exp\!\big( E_{q(z_{\neg ij})} \log p(z_{\neg ij}, z_{ij} = k, w \mid \alpha, \eta) \big)}. \tag{36} \]

1.4.1 Plugging in the counts

Using the fact about the Dirichlet-compound-multinomial distribution (see the Wikipedia entry), we get

\[ p(z \mid \alpha) = \int p(z \mid \theta)\, p(\theta \mid \alpha)\, d\theta = \prod_{d=1}^{D} \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_{d\cdot\cdot})} \prod_{k=1}^{K} \frac{\Gamma(\alpha + N_{dk\cdot})}{\Gamma(\alpha)}, \tag{37} \]

where $N_{dkv}$ is the number of words in the $d$-th document belonging to topic $k$ and corresponding to dictionary word $v$. A dot means the corresponding index is summed out; for example, $N_{\cdot kv} = \sum_d N_{dkv}$ is the number of words in the corpus belonging to topic $k$ and dictionary word $v$. Similarly,

\[ p(w \mid z, \eta) = \prod_{k=1}^{K} \frac{\Gamma(V\eta)}{\Gamma(V\eta + N_{\cdot k\cdot})} \prod_{v=1}^{V} \frac{\Gamma(\eta + N_{\cdot kv})}{\Gamma(\eta)}. \tag{38} \]

The way to think about the above equation is that, conditioned on the topics $z$, we can divide all the words in the corpus $w$ into words from topic $1$ to topic $K$. Words generated from a particular topic are independent of words generated from other topics (hence the product over $k$). For a given topic, the words generated by that topic follow a Dirichlet-compound-multinomial distribution with hyperparameter $\eta$.

Using both equations, we get

\begin{align*}
p(z, w \mid \alpha, \eta) &= p(z \mid \alpha)\, p(w \mid z, \eta) \tag{39} \\
&= \prod_{d} \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_{d\cdot\cdot})} \prod_{k} \frac{\Gamma(\alpha + N_{dk\cdot})}{\Gamma(\alpha)} \;\cdot\; \prod_{k} \frac{\Gamma(V\eta)}{\Gamma(V\eta + N_{\cdot k\cdot})} \prod_{v} \frac{\Gamma(\eta + N_{\cdot kv})}{\Gamma(\eta)} \tag{40} \\
&= \prod_{d} \Big[ \prod_{m=0}^{N_{d\cdot\cdot}-1} (K\alpha + m) \Big]^{-1} \prod_{d,k} \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) \tag{41} \\
&\qquad\times \prod_{k} \Big[ \prod_{m=0}^{N_{\cdot k\cdot}-1} (V\eta + m) \Big]^{-1} \prod_{k,v} \prod_{m=0}^{N_{\cdot kv}-1} (\eta + m). \tag{42}
\end{align*}

We now come to a tricky part of the derivation! Consider Eq. (36). The numerator and denominator will each have four terms, corresponding to the four factors in Eqs. (41)--(42).

The first term will be the same for both numerator and denominator and will therefore cancel out. The second term of $p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta)$ (i.e., of the numerator of Eq. (36), excluding the exponential and expectation) corresponds to (writing $N^{\neg ij}$ for the counts computed with the $j$-th word of document $i$ excluded):

\begin{align*}
\prod_{d,k} \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) &= \prod_{d \neq i} \prod_{k} \prod_{m=0}^{N_{dk\cdot}-1} (\alpha + m) \;\cdot\; \prod_{k} \prod_{m=0}^{N^{\neg ij}_{ik\cdot}-1} (\alpha + m) \tag{43} \\
&\qquad\times \big( \alpha + N^{\neg ij}_{il\cdot} \big), \tag{44}
\end{align*}

where the first two factors will cancel with the denominator because they are independent of $l$. Similarly, the third term of $p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta)$ (i.e., of the numerator of Eq. (36), excluding the exponential and expectation) corresponds to

\[ \prod_{k} \Big[ \prod_{m=0}^{N_{\cdot k\cdot}-1} (V\eta + m) \Big]^{-1} = \prod_{k} \Big[ \prod_{m=0}^{N^{\neg ij}_{\cdot k\cdot}-1} (V\eta + m) \Big]^{-1} \cdot \big( V\eta + N^{\neg ij}_{\cdot l\cdot} \big)^{-1}, \tag{45} \]

where the first factor will cancel with the denominator because it is independent of $l$. Finally, the fourth term of $p(z_{\neg ij}, z_{ij} = l, w \mid \alpha, \eta)$ corresponds to

\[ \prod_{k,v} \prod_{m=0}^{N_{\cdot kv}-1} (\eta + m) = \prod_{k,v} \prod_{m=0}^{N^{\neg ij}_{\cdot kv}-1} (\eta + m) \cdot \big( \eta + N^{\neg ij}_{\cdot l w_{ij}} \big), \tag{46} \]

where the first factor will cancel with the denominator because it is independent of $l$. Therefore Eq. (36) becomes

\[ \phi_{ijl} = \frac{\exp E_{q(z_{\neg ij})} \big[ \log(\alpha + N^{\neg ij}_{il\cdot}) - \log(V\eta + N^{\neg ij}_{\cdot l\cdot}) + \log(\eta + N^{\neg ij}_{\cdot l w_{ij}}) \big]}{\sum_{k} \exp E_{q(z_{\neg ij})} \big[ \log(\alpha + N^{\neg ij}_{ik\cdot}) - \log(V\eta + N^{\neg ij}_{\cdot k\cdot}) + \log(\eta + N^{\neg ij}_{\cdot k w_{ij}}) \big]}. \tag{47} \]

Renaming $i$ to $d$, $j$ to $n$, and $l$ to $k$, we get

\[ \phi_{dnk} = \frac{\exp E_{q(z_{\neg dn})} \big[ \log(\alpha + N^{\neg dn}_{dk\cdot}) - \log(V\eta + N^{\neg dn}_{\cdot k\cdot}) + \log(\eta + N^{\neg dn}_{\cdot k w_{dn}}) \big]}{\sum_{k'} \exp E_{q(z_{\neg dn})} \big[ \log(\alpha + N^{\neg dn}_{dk'\cdot}) - \log(V\eta + N^{\neg dn}_{\cdot k'\cdot}) + \log(\eta + N^{\neg dn}_{\cdot k' w_{dn}}) \big]}. \tag{48} \]

1.4.2 Approximating the counts

$N^{\neg dn}_{dk\cdot} = \sum_{n' \neq n} I(z_{dn'} = k)$, which is a sum of independent (by the mean-field approximation) Bernoulli variables with probability of heads equal to $\phi_{dn'k}$. Therefore the mean and variance of $N^{\neg dn}_{dk\cdot}$ are given by

\[ E_q\big[ N^{\neg dn}_{dk\cdot} \big] = \sum_{n' \neq n} \phi_{dn'k}, \qquad \mathrm{Var}_q\big[ N^{\neg dn}_{dk\cdot} \big] = \sum_{n' \neq n} \phi_{dn'k} (1 - \phi_{dn'k}). \tag{49} \]

$N^{\neg dn}_{\cdot k\cdot} = \sum_{(d',n') \neq (d,n)} I(z_{d'n'} = k)$, with mean and variance given by

\[ E_q\big[ N^{\neg dn}_{\cdot k\cdot} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k}, \qquad \mathrm{Var}_q\big[ N^{\neg dn}_{\cdot k\cdot} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k} (1 - \phi_{d'n'k}). \tag{50} \]

$N^{\neg dn}_{\cdot k w_{dn}} = \sum_{(d',n') \neq (d,n)} I(z_{d'n'} = k)\, I(w_{d'n'} = w_{dn})$, with mean and variance given by

\[ E_q\big[ N^{\neg dn}_{\cdot k w_{dn}} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k}\, I(w_{d'n'} = w_{dn}), \qquad \mathrm{Var}_q\big[ N^{\neg dn}_{\cdot k w_{dn}} \big] = \sum_{(d',n') \neq (d,n)} \phi_{d'n'k} (1 - \phi_{d'n'k})\, I(w_{d'n'} = w_{dn}). \tag{51} \]
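The quantities in Eqs. (49)--(51) are what an implementation actually caches. The sketch below is my own illustration (the `phi`/`words` list-of-arrays representation and all names are assumptions, not from the note): it accumulates the document-level and corpus-level sums of $\phi$ once and then obtains the leave-one-out means and variances for each token by subtracting that token's own $\phi_{dn}$.

```python
import numpy as np

def count_statistics(phi, words, V):
    """Means/variances of the leave-one-out counts in Eqs. (49)-(51).

    phi:   list over documents; phi[d] has shape (N_d, K), each row sums to 1
    words: list over documents; words[d] has shape (N_d,), integer ids in [0, V)
    Returns a dict keyed by token (d, n).
    """
    K = phi[0].shape[1]
    # Corpus-wide sums over all tokens; the current token is subtracted below.
    E_k = np.zeros(K); V_k = np.zeros(K)              # E[N_{.k.}], Var[N_{.k.}]
    E_vk = np.zeros((V, K)); V_vk = np.zeros((V, K))  # E[N_{.kv}], Var[N_{.kv}] (indexed [v, k])
    for p, w in zip(phi, words):
        E_k += p.sum(axis=0)
        V_k += (p * (1 - p)).sum(axis=0)
        np.add.at(E_vk, w, p)                 # E_vk[v] accumulates phi rows of tokens with word v
        np.add.at(V_vk, w, p * (1 - p))
    stats = {}
    for d, (p, w) in enumerate(zip(phi, words)):
        E_dk = p.sum(axis=0); V_dk = (p * (1 - p)).sum(axis=0)   # document-level sums
        for n in range(p.shape[0]):
            pn = p[n]; vn = pn * (1 - pn); v = w[n]
            stats[(d, n)] = {
                "E_dk": E_dk - pn,     "Var_dk": V_dk - vn,       # Eq. (49)
                "E_k":  E_k - pn,      "Var_k":  V_k - vn,        # Eq. (50)
                "E_kw": E_vk[v] - pn,  "Var_kw": V_vk[v] - vn,    # Eq. (51)
            }
    return stats
```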

As suggested by Teh et al. (2007), we will approximate the random variables $N^{\neg dn}_{dk\cdot}$, $N^{\neg dn}_{\cdot k\cdot}$ and $N^{\neg dn}_{\cdot k w_{dn}}$ as Gaussian random variables with the means and variances discussed above. First, note that the second-order Taylor expansion of $f(x) = \log(b + x)$ about $a$ is

\[ f(x) \approx f(a) + f'(a)(x - a) + \tfrac{1}{2} f''(a)(x - a)^2 = \log(b + a) + \frac{x - a}{b + a} - \frac{(x - a)^2}{2 (b + a)^2}. \tag{52--55} \]

Therefore the terms in the numerator of Eq. (48) become:

Let $b = \alpha$, $x = N^{\neg dn}_{dk\cdot}$ and $a = E_q[N^{\neg dn}_{dk\cdot}]$:

\[ E_{q(z_{\neg dn})} \log\big( \alpha + N^{\neg dn}_{dk\cdot} \big) \approx \log\big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big) - \frac{\mathrm{Var}_q[N^{\neg dn}_{dk\cdot}]}{2 \big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big)^2}. \tag{56--57} \]

Let $b = V\eta$, $x = N^{\neg dn}_{\cdot k\cdot}$ and $a = E_q[N^{\neg dn}_{\cdot k\cdot}]$:

\[ E_{q(z_{\neg dn})} \log\big( V\eta + N^{\neg dn}_{\cdot k\cdot} \big) \approx \log\big( V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}] \big) - \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k\cdot}]}{2 \big( V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}] \big)^2}. \tag{58} \]

Let $b = \eta$, $x = N^{\neg dn}_{\cdot k w_{dn}}$ and $a = E_q[N^{\neg dn}_{\cdot k w_{dn}}]$:

\[ E_{q(z_{\neg dn})} \log\big( \eta + N^{\neg dn}_{\cdot k w_{dn}} \big) \approx \log\big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big) - \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k w_{dn}}]}{2 \big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big)^2}. \tag{59} \]

Plugging the above into Eq. (48), we get

\[ \phi_{dnk} \propto \frac{\big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big) \big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big)}{V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}]} \exp\!\left( - \frac{\mathrm{Var}_q[N^{\neg dn}_{dk\cdot}]}{2 \big( \alpha + E_q[N^{\neg dn}_{dk\cdot}] \big)^2} + \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k\cdot}]}{2 \big( V\eta + E_q[N^{\neg dn}_{\cdot k\cdot}] \big)^2} - \frac{\mathrm{Var}_q[N^{\neg dn}_{\cdot k w_{dn}}]}{2 \big( \eta + E_q[N^{\neg dn}_{\cdot k w_{dn}}] \big)^2} \right). \tag{60} \]

1.5 M-step

Teh et al. (2007) stopped at the variational E-step. Here, we will consider how to optimize $\alpha$ and $\eta$. In the M-step, we seek to maximize Eq. (26) with respect to $\alpha$ and $\eta$. Keeping only the first term of Eq. (26), $E_q \log p(z, w \mid \alpha, \eta)$, which contains all the terms involving $\alpha$ and $\eta$ (the entropy term $E_q \log q(z)$ does not depend on them), we get

\begin{align*}
E_q \log p(z, w \mid \alpha, \eta) = E_q \bigg[ & - \sum_{d} \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{m=0}^{N_{dk\cdot}-1} \log(\alpha + m) \tag{61} \\
& - \sum_{k} \sum_{m=0}^{N_{\cdot k\cdot}-1} \log(V\eta + m) + \sum_{k,v} \sum_{m=0}^{N_{\cdot kv}-1} \log(\eta + m) \bigg]. \tag{62}
\end{align*}
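Before turning to the Newton updates for $\alpha$ and $\eta$, here is a minimal sketch (mine, not the note's) of the E-step update in Eq. (60) for a single token, assuming the count statistics of Eqs. (49)--(51) have already been computed, e.g. as in the earlier sketch; the function and variable names are my own.

```python
import numpy as np

def cvb_update_token(E_dk, Var_dk, E_k, Var_k, E_kw, Var_kw, alpha, eta, V):
    """One update of phi_{dn.} following Eq. (60).

    All arguments except alpha, eta, V are length-K arrays holding the
    means/variances of N^{-dn}_{dk.}, N^{-dn}_{.k.}, N^{-dn}_{.k w_dn}.
    """
    a = alpha + E_dk           # alpha + E[N^{-dn}_{dk.}]
    b = V * eta + E_k          # V*eta + E[N^{-dn}_{.k.}]
    c = eta + E_kw             # eta  + E[N^{-dn}_{.k w_dn}]
    # Zeroth-order terms plus the second-order (variance) corrections of Eq. (60),
    # all computed in log space.
    log_phi = (np.log(a) - np.log(b) + np.log(c)
               - Var_dk / (2 * a**2) + Var_k / (2 * b**2) - Var_kw / (2 * c**2))
    log_phi -= log_phi.max()   # numerical stability before exponentiating
    phi = np.exp(log_phi)
    return phi / phi.sum()     # the normalization of Eq. (48)
```

In a full E-step one would sweep over all tokens, updating the cached count statistics incrementally after each $\phi_{dn}$ update so that the leave-one-out means and variances stay current.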

1.5.1 Optimize α

Let us consider the terms with $\alpha$:

\begin{align*}
L_\alpha &= E_q \bigg[ - \sum_{d} \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{m=0}^{N_{dk\cdot}-1} \log(\alpha + m) \bigg] \tag{63} \\
&= - \sum_{d} \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} E_q \bigg[ \sum_{m=0}^{N_{dk\cdot}-1} \log(\alpha + m) \bigg] \tag{64} \\
&= - \sum_{d} \sum_{m=0}^{N_{d\cdot\cdot}-1} \log(K\alpha + m) + \sum_{d,k} \sum_{n=1}^{N_{d\cdot\cdot}} q(N_{dk\cdot} = n) \sum_{m=0}^{n-1} \log(\alpha + m), \tag{65--66}
\end{align*}

where the first sum is deterministic because $N_{d\cdot\cdot}$ (the number of words in document $d$) does not depend on $z$. Differentiating with respect to $\alpha$, we get

\[ \frac{\partial L_\alpha}{\partial \alpha} = - \sum_{d} \sum_{m=0}^{N_{d\cdot\cdot}-1} \frac{K}{K\alpha + m} + \sum_{d,k} \sum_{n=1}^{N_{d\cdot\cdot}} q(N_{dk\cdot} = n) \sum_{m=0}^{n-1} \frac{1}{\alpha + m}. \tag{67} \]

Differentiating one more time, we get

\[ \frac{\partial^2 L_\alpha}{\partial \alpha^2} = \sum_{d} \sum_{m=0}^{N_{d\cdot\cdot}-1} \frac{K^2}{(K\alpha + m)^2} - \sum_{d,k} \sum_{n=1}^{N_{d\cdot\cdot}} q(N_{dk\cdot} = n) \sum_{m=0}^{n-1} \frac{1}{(\alpha + m)^2}. \tag{68} \]

We can use the gradient and Hessian to compute the Newton-Raphson update. To compute $q(N_{dk\cdot} = n)$, note that $N_{dk\cdot} = \sum_n I(z_{dn} = k)$, which we can approximate by a Gaussian with mean $\sum_n \phi_{dnk}$ and variance $\sum_n \phi_{dnk}(1 - \phi_{dnk})$.

1.5.2 Optimize η

Let us consider the terms with $\eta$. Denoting by $N = \sum_d N_{d\cdot\cdot}$ the total number of words in all the documents, we get

\begin{align*}
L_\eta &= E_q \bigg[ - \sum_{k} \sum_{m=0}^{N_{\cdot k\cdot}-1} \log(V\eta + m) + \sum_{k,v} \sum_{m=0}^{N_{\cdot kv}-1} \log(\eta + m) \bigg] \tag{69} \\
&= - \sum_{k} \sum_{n=1}^{N} q(N_{\cdot k\cdot} = n) \sum_{m=0}^{n-1} \log(V\eta + m) + \sum_{k,v} \sum_{n=1}^{N} q(N_{\cdot kv} = n) \sum_{m=0}^{n-1} \log(\eta + m). \tag{70--71}
\end{align*}

Differentiating with respect to $\eta$, we get

\[ \frac{\partial L_\eta}{\partial \eta} = - \sum_{k} \sum_{n=1}^{N} q(N_{\cdot k\cdot} = n) \sum_{m=0}^{n-1} \frac{V}{V\eta + m} + \sum_{k,v} \sum_{n=1}^{N} q(N_{\cdot kv} = n) \sum_{m=0}^{n-1} \frac{1}{\eta + m}. \tag{72} \]

Differentiating one more time, we get

\[ \frac{\partial^2 L_\eta}{\partial \eta^2} = \sum_{k} \sum_{n=1}^{N} q(N_{\cdot k\cdot} = n) \sum_{m=0}^{n-1} \frac{V^2}{(V\eta + m)^2} - \sum_{k,v} \sum_{n=1}^{N} q(N_{\cdot kv} = n) \sum_{m=0}^{n-1} \frac{1}{(\eta + m)^2}. \tag{73} \]

We can use the gradient and Hessian to compute the Newton-Raphson update. To compute $q(N_{\cdot k\cdot} = n)$, note that $N_{\cdot k\cdot} = \sum_{d,n} I(z_{dn} = k)$, which we can approximate by a Gaussian with mean $\sum_{d,n} \phi_{dnk}$ and variance $\sum_{d,n} \phi_{dnk}(1 - \phi_{dnk})$. To compute $q(N_{\cdot kv} = n)$, note that $N_{\cdot kv} = \sum_{d,n} I(z_{dn} = k)\, I(w_{dn} = v)$, which we can approximate by a Gaussian with mean $\sum_{d,n} \phi_{dnk}\, I(w_{dn} = v)$ and variance $\sum_{d,n} \phi_{dnk}(1 - \phi_{dnk})\, I(w_{dn} = v)$.
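As a rough illustration of how the Newton-Raphson update for $\alpha$ might be implemented (this sketch is mine, not from the note), the expectation over $q(N_{dk\cdot} = n)$ can be taken against a Gaussian discretized on the integers $n = 0, \dots, N_{d\cdot\cdot}$, with mean and variance as given above. The $\eta$ update in Eqs. (72)--(73) follows the same pattern with the corpus-level counts.

```python
import numpy as np

def newton_alpha(phi, alpha0=0.1, iters=20):
    """Newton-Raphson for alpha (Eqs. 67-68), with q(N_{dk.} = n) approximated by a
    Gaussian (discretized over integers n) whose mean/variance come from phi."""
    K = phi[0].shape[1]
    mu = np.stack([p.sum(axis=0) for p in phi])               # (D, K) means of N_{dk.}
    var = np.stack([(p * (1 - p)).sum(axis=0) for p in phi])  # (D, K) variances

    alpha = alpha0
    for _ in range(iters):
        grad, hess = 0.0, 0.0
        for d, p in enumerate(phi):
            Nd = p.shape[0]
            m = np.arange(Nd)
            # Deterministic first terms of Eqs. (67)-(68): N_{d..} does not depend on z.
            grad -= np.sum(K / (K * alpha + m))
            hess += np.sum(K**2 / (K * alpha + m) ** 2)
            # h1[n] = sum_{m<n} 1/(alpha+m),  h2[n] = sum_{m<n} 1/(alpha+m)^2
            h1 = np.concatenate(([0.0], np.cumsum(1.0 / (alpha + m))))
            h2 = np.concatenate(([0.0], np.cumsum(1.0 / (alpha + m) ** 2)))
            n = np.arange(Nd + 1)
            for k in range(K):
                # Discretized Gaussian approximation of q(N_{dk.} = n).
                logw = -0.5 * (n - mu[d, k]) ** 2 / (var[d, k] + 1e-6)
                w = np.exp(logw - logw.max())
                w /= w.sum()
                grad += np.dot(w, h1)
                hess -= np.dot(w, h2)
        alpha = max(alpha - grad / hess, 1e-6)   # Newton step, kept positive
    return alpha
```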

1.6 Alternative M-step

As an alternative approach, we can use $\phi_{dnk}$ to compute

\begin{align*}
\gamma_{dk} &= \alpha + \sum_{n=1}^{N_{d\cdot\cdot}} \phi_{dnk}, \\
\lambda_{kv} &= \eta + \sum_{d} \sum_{n=1}^{N_{d\cdot\cdot}} \phi_{dnk}\, I(w_{dn} = v),
\end{align*}

which we can then use to update $\alpha$ and $\eta$ in exactly the same way as in the original LDA paper (Blei et al., 2003). This is theoretically not as good as the previous section, because we are using point estimates of $\theta$ and $\beta$ instead of integrating them out. But the estimates of $\phi$ should be better than those of the original LDA (Blei et al., 2003), so maybe it will perform better? However, there are no theoretical guarantees, unlike for the variational E-step.
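A short sketch of these two formulas, using the same (assumed) `phi`/`words` representation as the earlier sketches:

```python
import numpy as np

def gamma_lambda_from_phi(phi, words, V, alpha, eta):
    """gamma[d, k] = alpha + sum_n phi_{dnk};  lambda[k, v] = eta + sum over tokens with w_dn = v of phi_{dnk}."""
    gamma = alpha + np.stack([p.sum(axis=0) for p in phi])   # shape (D, K)
    lam_vk = np.full((V, phi[0].shape[1]), eta)
    for p, w in zip(phi, words):
        np.add.at(lam_vk, w, p)        # accumulate phi rows of tokens whose word id is v
    return gamma, lam_vk.T             # lambda with shape (K, V)
```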
