
On Topic Evolution

Eric P. Xing
School of Computer Science
Carnegie Mellon University
Technical Report CMU-CALD-05-115, December 2005

Abstract

I introduce topic evolution models for longitudinal epochs of word documents. The models employ marginally dependent latent state-space models for evolving topic proportion distributions and topic-specific word distributions, and either a logistic-normal-multinomial or a logistic-normal-Poisson model for document likelihood. These models allow posterior inference of latent topic themes over time, and topical clustering of longitudinal document epochs. I derive a variational inference algorithm for non-conjugate generalized linear models based on truncated Taylor approximation, and I also outline formulae for parameter estimation based on the variational EM principle.

1 Introduction

Text information, such as media documents, journal articles and emails, often comes as temporal streams. Current information retrieval systems working on corpora collected over time make little use of the time stamps associated with the documents. They often merely pool all the documents into a single collection, in which each document is treated as an iid sample from some topical distribution [Hofmann, 1999; Blei et al., 2003; Griffiths and Steyvers, 2004]; or they model the topics of each time-specific epoch separately and then examine relationships among the independently inferred time-specific topics [Steyvers et al., 2004]. In practice, the topic themes that generate the documents can evolve over time, and there exist dependencies among documents over time. In this report, I develop a principled statistical framework for modeling topic evolution and extracting high-level insights into the topic history based on latent-space dynamic processes, and I derive the formulae for posterior inference and parameter estimation.

2 Topic Evolution

Let $D_1, \ldots, D_T$ represent a temporal series of corpora, where $D_t = \{x_d\}_{d=1}^{N_t}$ denotes the set of $N_t$ documents available at time $t$; $x_d$ denotes a document consisting of a word sequence $(x_{d,1}, \ldots, x_{d,N_d})$; and $n_d = (n_{d,1}, \ldots, n_{d,M})$ denotes an $M$-dimensional count vector recording the frequencies, in document $d$, of the $M$ words defined by a fixed vocabulary. We assume that every document $x_d$ can express multiple topics coming from a predefined topic space, and that the weights of the topics can be represented by a normalized vector $\theta_d$ of fixed dimension. Furthermore, we assume that each topic can be represented by a set of parameters that determine how words from a fixed vocabulary are drawn in a topic-specific manner to compose the document (for simplicity, here we assume a bag-of-words model for the word-to-document relationship, so that topic-specific semantics translate only to measures on word rates, not to non-trivial syntactic grammars). Under a topic evolution model, the prior distributions of the topic proportions of every document, and the representations of each of the topics themselves, evolve over time. In the following, I present two topic evolution models defined on two different kinds of topic representations, and derive the variational inference formulas in each case.

2.1 A Dynamic Logistic-Normal-Multinomial Model

In this model we assume that each document is an admixture of topics, resulting from a bag of topic-specific instances of words, each of which is marginally a mixture of topics. Each topic, say topic $k$, is represented by an $M$-dimensional normalized word frequency vector $\beta_k$, which parameterizes a topic-specific multinomial distribution over words. Here is an outline of a generative process under such a model (a graphical model representation of this model is illustrated in Figure 1). We assume that the topic proportion vector $\theta_d$ for each document follows a time-specific logistic normal prior $\mathcal{LN}(\mu_t, \Sigma_t)$, whose mean $\mu_t$ evolves over time according to a linear Gaussian model (for simplicity, we assume that the $\Sigma_t$'s capturing time-specific topic correlations are independent across time):

- $\mu_1 \sim \mathrm{Normal}(\nu, \Phi)$: sample the mean of the topic mixing prior at time 1.
- $\mu_t \sim \mathrm{Normal}(A \mu_{t-1}, \Phi)$: sample the means of the topic mixing priors over subsequent time points.
- $\theta_d \sim \mathrm{LogisticNormal}(\mu_t, \Sigma_t)$: for each document, sample a topic proportion vector.

Notice that the last step above can be broken into two sub-steps (for simplicity, in the sequel we will omit the time index $t$ and/or the document index $d$ when describing a general law that applies to all time points and/or all documents):

- $\gamma_d \sim \mathrm{Normal}(\mu_t, \Sigma_t)$;
- $\theta_{d,k} = \exp(\gamma_{d,k}) / \sum_{l} \exp(\gamma_{d,l})$, for $k = 1, \ldots, K$.

Furthermore, due to the normalizability constraint on the multinomial parameters, $\theta_d$ has only $K-1$ degrees of freedom. Thus, as described in detail in the sequel, we only need to draw the first $K-1$ components of $\gamma_d$ from a $(K-1)$-dimensional multivariate Gaussian, and fix $\gamma_{d,K} = 0$. For simplicity, we omit this technicality in the forthcoming general description of the model.
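To make the moving parts concrete, here is a minimal illustrative sketch of the dynamic logistic-normal prior over topic proportions (all function and variable names are my own, not the report's; random-walk dynamics $A = I$ and isotropic covariances are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_topic_proportions(T, N_t, K, sigma_mu=0.1, sigma_gamma=0.5):
    """Sample document topic proportions from the dynamic logistic-normal prior.

    mu_t follows a Gaussian random walk (A = I, Phi = sigma_mu^2 I); each
    document's gamma_d is drawn around mu_t and mapped to the simplex by the
    logistic (softmax) transformation, with the K-th component of gamma pinned
    to 0 to remove the extra degree of freedom.
    """
    mu = np.zeros(K - 1)                      # mean of the topic-mixing prior
    thetas = []                               # thetas[t][d] is a point on the K-simplex
    for t in range(T):
        mu = mu + sigma_mu * rng.standard_normal(K - 1)   # mu_t ~ N(mu_{t-1}, Phi)
        docs = []
        for d in range(N_t):
            gamma = mu + sigma_gamma * rng.standard_normal(K - 1)
            gamma = np.append(gamma, 0.0)     # gamma_K = 0
            theta = np.exp(gamma) / np.exp(gamma).sum()   # logistic transform
            docs.append(theta)
        thetas.append(np.array(docs))
    return thetas

thetas = simulate_topic_proportions(T=5, N_t=10, K=4)
assert np.allclose(thetas[0].sum(axis=1), 1.0)            # valid simplex points
```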

We further assume that the representation of each topic, in this case a topic-specific multinomial vector $\beta_k$ of word frequencies, also evolves over time. By defining $\beta_k$ as a logistic transformation of a multivariate normal random vector $\eta_k$, we can model the temporal evolution of $\beta_k$ in a simplex via a linear Gaussian dynamics model:

- $\eta_k^{(1)} \sim \mathrm{Normal}(\iota, \Psi)$: sample topic $k$ at time 1.
- $\eta_k^{(t)} \sim \mathrm{Normal}(B \eta_k^{(t-1)}, \Psi)$: sample topic $k$ over subsequent time points.
- $\beta_{k,w} = \exp(\eta_{k,w}) / \sum_{w'} \exp(\eta_{k,w'})$, for $w = 1, \ldots, M$: compute word probabilities via the logistic transformation.

Now we assume that each occurrence of a word, e.g., the $n$th word in document $d$ at time $t$, $x_{d,n}$, is drawn from a topic-specific word distribution $\beta_k$ specified by a latent topic indicator $z_{d,n}$:

- $z_{d,n} \sim \mathrm{Multinomial}(\theta_d)$: sample the latent topic indicator (again, for simplicity, the indices $t$ and $d$ will be omitted in the sequel where no confusion arises).
- $x_{d,n} \mid z_{d,n} = k \sim \mathrm{Multinomial}(\beta_k)$: sample the word from a topic-specific word distribution.

In principle, we can use the above topic evolution model to capture not only topic correlation among documents at a specific time (as did [Blei and Lafferty, 2006]), but also dynamic coupling (i.e., co-evolution) of topics via the covariance matrix $\Phi$, and topic-specific word coupling via the covariance matrices $\Psi$. In the simplest scenario, when $A = I$, $B = I$, $\Phi = \sigma I$, and $\Psi = \rho I$, this model reduces to a random walk in both the topic space and the topic-mixing space. Since in most realistic temporal series of corpora both the proportions of topics and the semantic representations of topics are unlikely to be invariant over time, we expect that even a random-walk topic evolution model can provide a better fit to the data than a static model that ignores the time stamps of all documents.

2.2 A Dynamic Log-Normal-Poisson Model

The above topic evolution process assumes an admixture likelihood model for documents belonging to a specific time interval, and the admixing is realized at the word level, i.e., the marginal probability of each word in the document is defined by a mixture of topic-specific word distributions. Now we present another text likelihood model, employing a different topic-mixing mechanism, which can also be plugged into the topic evolution model. Note that in a bag-of-words model all we observe are counts of words in the documents. Instead of assuming that each occurrence of a word is sampled from a topic-specific word distribution, we can directly assume that the total count $n_w$ of word $w$ is made up of fractions, each contributed by a specific topic according to a topic-specific Poisson distribution $\mathrm{Poisson}(\omega \theta_k \tau_{w,k})$, where $\omega$ denotes the length of the document, $\theta_k$ denotes the proportion of topic $k$ in the document as defined before, and $\tau_{w,k}$ is a rate measure for word $w$ associated with topic $k$. Specifically, $n_w = \sum_k n_{w,k}$, where $n_{w,k} \sim \mathrm{Poisson}(\omega \theta_k \tau_{w,k})$. It can be shown that under this model we have:

$$n_w \sim \mathrm{Poisson}\Big(\omega \sum_k \theta_k \tau_{w,k}\Big), \qquad p(n_w) = \frac{\exp\Big\{ n_w \log\big(\omega \sum_k \theta_k \tau_{w,k}\big) - \omega \sum_k \theta_k \tau_{w,k} \Big\}}{\Gamma(n_w + 1)}.$$
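The last identity is the superposition property of the Poisson distribution; a quick Monte Carlo check (illustrative code with made-up rates, not from the report) confirms that summing the topic-specific counts is distributionally equivalent to a single Poisson draw with the pooled rate:

```python
import numpy as np

rng = np.random.default_rng(1)

K, omega = 4, 200                            # topics, document length
theta = np.array([0.4, 0.3, 0.2, 0.1])       # topic proportions
tau_w = np.array([0.02, 0.05, 0.01, 0.08])   # per-topic rates for word w

# Sum of per-topic counts n_{w,k} ~ Poisson(omega * theta_k * tau_{w,k}) ...
samples_sum = rng.poisson(omega * theta * tau_w, size=(100_000, K)).sum(axis=1)
# ... has the same law as a single Poisson with the pooled rate.
samples_pooled = rng.poisson(omega * theta @ tau_w, size=100_000)

print(samples_sum.mean(), samples_pooled.mean())  # both ~ omega * theta @ tau_w
print(samples_sum.var(), samples_pooled.var())    # Poisson: variance == mean
```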

[Figure 1: A graphical model representation of the dynamic logistic-normal-multinomial model for topic evolution.]

Note that in the above setting, for each word $w$ we have a row vector of rates, each associated with a specific topic: $\vec{\tau}_w = (\tau_{w,1}, \ldots, \tau_{w,K})$. For each topic $k$, we have a column vector of rates, each associated with a specific word: $\vec{\tau}_k = (\tau_{1,k}, \ldots, \tau_{M,k})$. Unlike the multinomial topic model, which is parameterized by the column-normalized topic matrix $\beta = [\vec{\beta}_1, \ldots, \vec{\beta}_K]$, the Poisson topic model is parameterized by a matrix $\tau = [\vec{\tau}_1, \ldots, \vec{\tau}_K]$ that does not have to be column- or row-normalized. Thus we can directly use a log-normal distribution to model $\tau$, which is simpler than the logistic-normal distribution. This leads to the following generative model for topic evolution (assuming we are interested in modeling cross-topic coupling of word rates):

- $\mu_1 \sim \mathrm{Normal}(\nu, \Phi)$: sample the mean of the topic mixing prior at time 1.
- $\mu_t \sim \mathrm{Normal}(A \mu_{t-1}, \Phi)$: sample the means of the topic mixing priors over subsequent time points.
- $\theta_d \sim \mathrm{LogisticNormal}(\mu_t, \Sigma_t)$: for each document, sample a topic proportion vector.
- $\zeta_w^{(1)} \sim \mathrm{Normal}(0, \Psi_w)$: sample rates for word $w$ at time 1.
- $\zeta_w^{(t)} \sim \mathrm{Normal}(B_w \zeta_w^{(t-1)}, \Psi_w)$: sample rates for word $w$ over subsequent time points.
- $\tau_{w,k} = \exp(\zeta_{w,k})$, for $k = 1, \ldots, K$ and $w = 1, \ldots, M$: compute word rates.
- $n_{d,w} \sim \mathrm{Poisson}(\omega_d\, \vec{\theta}_d^{\top} \vec{\tau}_w)$: sample the word counts.

Figure 2 illustrates a graphical model representation of such a dynamic log-normal-Poisson model.
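For concreteness, a minimal end-to-end sketch of this generative process (hypothetical names; random-walk dynamics $A = I$, $B_w = I$ and isotropic noise assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_poisson_corpus(T, N_t, K, M, omega=150, s_mu=0.1, s_zeta=0.1, s_gamma=0.5):
    """Sample word-count vectors from the dynamic log-normal-Poisson model
    with random-walk dynamics (A = I, B_w = I) and isotropic noise."""
    mu = np.zeros(K - 1)
    zeta = 0.5 * rng.standard_normal((M, K))            # zeta_w^(1) ~ N(0, Psi_w)
    corpus = []
    for t in range(T):
        mu = mu + s_mu * rng.standard_normal(K - 1)         # topic-mixing mean walk
        zeta = zeta + s_zeta * rng.standard_normal((M, K))  # per-word rate walk
        tau = np.exp(zeta)                                  # M x K word rates
        counts_t = np.empty((N_t, M), dtype=int)
        for d in range(N_t):
            gamma = np.append(mu + s_gamma * rng.standard_normal(K - 1), 0.0)
            theta = np.exp(gamma) / np.exp(gamma).sum()     # logistic-normal theta_d
            counts_t[d] = rng.poisson(omega * tau @ theta)  # n_{d,w} ~ Poisson(omega theta' tau_w)
        corpus.append(counts_t)
    return corpus

corpus = simulate_poisson_corpus(T=3, N_t=5, K=4, M=30)
print(corpus[0].shape)   # (5, 30): documents x vocabulary counts at time 1
```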

[Figure 2: A graphical model representation of the dynamic log-normal-Poisson model for topic evolution. Note that the topic representations evolve as $M$ independent word-rate vectors, each of which defines the rates of one word under a fixed set of topics.]

3 Variational Inference

3.1 Variational Inference for the Logistic-Normal-Multinomial Model

Under the logistic-normal-multinomial topic evolution model, the complete likelihood function can be written as follows:

$$
\begin{aligned}
p\big(D, \{\mu_t\}, \{\theta_d\}, \{\eta_k\}, \{z_{d,n}\}\big)
&= \prod_{t} p(\mu_t \mid \mu_{t-1}) \prod_{t} \prod_{d=1}^{N_t} p(\theta_d \mid \mu_t) \prod_{n} p(z_{d,n} \mid \theta_d)\, p(x_{d,n} \mid z_{d,n}, \{\eta_k\}) \prod_{k} p\big(\{\eta_k^{(t)}\}\big) \\
&= \mathcal{N}(\mu_1 \mid \nu, \Phi) \prod_{t=2}^{T} \mathcal{N}(\mu_t \mid A\mu_{t-1}, \Phi) \times \prod_{t=1}^{T} \prod_{d=1}^{N_t} \mathcal{LN}(\theta_d \mid \mu_t, \Sigma_t) \\
&\quad \times \prod_{k=1}^{K} \Big[ \mathcal{N}(\eta_k^{(1)} \mid \iota, \Psi) \prod_{t=2}^{T} \mathcal{N}\big(\eta_k^{(t)} \mid B\eta_k^{(t-1)}, \Psi\big) \Big] \\
&\quad \times \prod_{t,d,n} \mathrm{Multinomial}(z_{d,n} \mid \theta_d)\, \mathrm{Multinomial}\big(x_{d,n} \mid z_{d,n}, \mathrm{Logistic}(\{\eta_k^{(t)}\})\big). \qquad (1)
\end{aligned}
$$

The posterior of $\{\mu_t\}, \{\theta_d\}, \{\eta_k\}, \{z_{d,n}\}$ under the above model is intractable; we therefore approximate it with a product of simpler marginals, each over a cluster of latent variables:

$$q = q_\mu(\{\mu_t\})\; q_\theta(\{\theta_d\})\; q_\eta(\{\eta_k\})\; q_z(\{z_{d,n}\}). \qquad (2)$$

Based on the generalized mean field theorem [Xing et al., 2003], the optimal parameterization of each marginal can be derived by plugging the generalized mean field (GMF) messages received by the corresponding cluster of variables (say, $X_C$) into the original conditional distribution of that cluster given its Markov blanket (MB). The GMF messages can be thought of as surrogates of the dependent variables $X_{MB}$ in the Markov blanket of the cluster, and they replace the original values of those variables in $p(X_C \mid X_{MB})$. [Xing et al., 2003] showed that, in the case of generalized linear models, the GMF message corresponds to an expectation of the sufficient statistics of the relevant Markov blanket variables under their associated GMF cluster marginals. In the sequel, we use $\langle S_x \rangle_{q(x)}$ to denote the GMF message due to latent variable $x$; the optimal GMF approximation to $p(X_C \mid X_{MB})$ is:

$$q(X_C) \propto p\big(X_C \mid \{\langle S_y \rangle_{q(y)} : y \in X_{MB}\}\big).$$

As a prelude to the detailed derivations, we first rearrange some relevant local conditional distributions of our model into the canonical form of generalized linear models. As mentioned before, the multinomial parameters $\theta$ are logistic transformations of the elements of a multivariate normal vector $\gamma$: $\theta_k = e^{\gamma_k} / \sum_l e^{\gamma_l}$. In fact, since $\theta$ is a multinomial parameter vector, it has only $K-1$ degrees of freedom. Therefore, we only need to model a $(K-1)$-dimensional normal vector and pad it with a vacuous element $\gamma_K = 0$. Under this parameterization, the logistic transformation from $\gamma$ to $\theta$ remains the same, but the inverse of the transformation takes a simple form:

$$\gamma_k = \ln \frac{\theta_k}{1 - \sum_{i=1}^{K-1} \theta_i} = \ln \frac{\theta_k}{\theta_K}.$$

Assuming that $z$ is a normalized $K$-dimensional random binary vector (that is, when $z$ indicates the $k$th event, $z_k = 1$, $z_i = 0$ for $i \neq k$, and $\sum_i z_i = 1$), the exponential family representation of a multinomial distribution for a topic indicator $z$ is:

$$p(z \mid \theta) = \exp\Big\{ \sum_{k=1}^{K} z_k \ln \theta_k \Big\} = \exp\Big\{ \sum_{k=1}^{K-1} z_k \gamma_k - \ln\Big(1 + \sum_{k=1}^{K-1} e^{\gamma_k}\Big) \Big\} = \exp\big\{ z^{\top} \gamma - c(\gamma) \big\}, \qquad (3)$$

where $c(\gamma) = \ln\big(1 + \sum_{k=1}^{K-1} e^{\gamma_k}\big)$ is a scalar determined by $\gamma$. For a collection of topic indicators $\{z_{d,n}\}$, we have the following conditional likelihood at time $t$:

$$p\big(\{z_{d,n}\} \mid \{\theta_d\}\big) = \exp\Big\{ \sum_{d} \Big( \sum_{k=1}^{K-1} n_{d,k}\, \gamma_{d,k} - n_d\, c(\gamma_d) \Big) \Big\} = \exp\Big\{ \sum_{d} \big( m_d^{\top} \gamma_d - n_d\, c(\gamma_d) \big) \Big\}, \qquad (4)$$

where $n_{d,k} = \sum_n z_{d,n,k}$ is the number of words from topic $k$ in document $d$ at time $t$; $m_d = (n_{d,1}, \ldots, n_{d,K-1})$ denotes the row vector of total word counts from topics 1 to $K-1$ in document $d$ at time $t$; $\vec{n}_d = (n_{d,1}, \ldots, n_{d,K}) = (m_d, n_{d,K})$ denotes the row vector of word counts in document $d$ from all topics; and $n_d = \vec{n}_d \mathbf{1}$ is the total word count, with $\mathbf{1}$ a column vector of all ones.

Similarly, the local conditional probability of the data $x_{d,n}$, where $x$ is also defined as an $M$-dimensional one-hot indicator vector, can be written as:

$$p\big(\{x_{d,n}\} \mid \{z_{d,n}\}, \{\eta_k\}\big) = \exp\Big\{ \sum_{k=1}^{K} \Big( \sum_{w=1}^{M-1} n_{k,w}\, \eta_{k,w} - N_k\, c(\eta_k) \Big) \Big\} = \exp\Big\{ \sum_{k=1}^{K} \big( m_k^{\top} \eta_k - N_k\, c(\eta_k) \big) \Big\}, \qquad (5)$$

where $n_{k,w} = \sum_{d,n} x_{d,n,w}\, z_{d,n,k}$ is the count of word $w$ from topic $k$ at time $t$; $m_k = (n_{k,1}, \ldots, n_{k,M-1})$ denotes the row vector of total counts of all but the last word under topic $k$ at time $t$; $\vec{n}_k = (m_k, n_{k,M})$ denotes the row vector of counts of every word generated from topic $k$ at time $t$; $N_k = \vec{n}_k \mathbf{1}$; and $c(\eta_k) = \ln\big(1 + \sum_{w=1}^{M-1} e^{\eta_{k,w}}\big)$ is a scalar determined by $\eta_k$. Note that we have the identity $\sum_k N_k = \sum_d n_d$: both equal the total number of words at time $t$.

With the above specifications of the local conditional probability distributions, in the following we write down, one by one, the GMF approximations to the marginal posteriors of the subsets of latent variables.
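These conventions are easy to sanity-check numerically; the snippet below (illustrative, not from the report) verifies that the inverse map $\gamma_k = \ln(\theta_k / \theta_K)$ recovers $\gamma$, and that $c(\gamma)$ is exactly the log-normalizer of the padded softmax:

```python
import numpy as np

rng = np.random.default_rng(3)

K = 5
gamma = np.append(rng.standard_normal(K - 1), 0.0)   # gamma_K pinned to 0
theta = np.exp(gamma) / np.exp(gamma).sum()          # logistic transform

# Inverse transformation: gamma_k = ln(theta_k / theta_K), k = 1..K-1.
gamma_rec = np.log(theta[:-1] / theta[-1])
assert np.allclose(gamma_rec, gamma[:-1])

# c(gamma) = ln(1 + sum_{k<K} exp(gamma_k)) is the log-normalizer:
# log p(z = k) = gamma_k - c(gamma) for every k (including gamma_K = 0).
c = np.log1p(np.exp(gamma[:-1]).sum())
assert np.allclose(np.log(theta), gamma - c)
```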

7 where n,w,n x,n,w z,n, is the count for wor w from topic at time t; m n,,..., n enotes the row vector of total wor-counts of all but the last wor of topic at time t; n enotes the row vector of counts of every wor generate from topic at time t; an c η,m,m m, n is a scalar etermine by η. Note that we have the following ientity: n t n n. With the above specifications of local conitional probability istributions, in the following we can write own one by one the GMF approximations to marginal posteriors of subsets of latent variables. 3.. We first show that the marginal posterior of µ t can be approximate by a re-parameterize state-space moel. q µ µ t p µ t S θ qθ p µ t S γ qγ π K / Φ exp / µ ν Φ µ ν T exp π K / Φ / N t t π K / Σ t / exp γ µ t Σ t T µ t A µ t Φ µ t A µ t t γ µ t 6 where γ γ,,..., γ,k is the expecte topic vector of ocument at time t, in which γ, enotes the expectation of γ, ln θ, θ,k uner variational marginal q γ γ. For simplicity, we efine y t γ as a short han for the expecte topic vector, an Y t γ Nt as a short han for all such vectors at time t. Note that the above Eq. 6 is a linear Gaussian SSM, except that at each time the output is not a single observation γ, but a set of observations γ Nt. It is well nown that uner a stanar SSM, the posterior istribution of the centroi µ t given the entire observation sequence is still a normal istribution, of which the mean an covariance matrix can be reaily estimate using the Kalman filtering KF an Rauch- Tung-Striebel RTS smoothing algorithms. Here we give the moifie Kalman filter measurement-upate equations that tae into account multiple rather than single output ata points. The RTS equations an the time-upate equations of KF is ientical to the stanar case for single output. Let ˆµ t t enote the mean of µ t conitione on partial sequence Y,..., Y t. The convariance matrix of µ t conitione on partial sequence Y,..., Y t is enote P t t ; that is: ˆµ t t E[ µ t Y,..., Y t ] P t t E[ µ t ˆµ t t µ t ˆµ t t Y,..., Y t ]. Similarly, we let ˆµ t+ t enotes the mean of µ t+ conitione on the partial sequence Y,..., Y t ; P t+ t enotes the covariance matrices of µ t+ t conitione of partial sequences Y,..., Y t ; an so on. Thus, the SSM inference formulae are as follows: Time upate: ˆµ t+ t Aˆµ t t 8 P t+ t AP t t + Φ 9 This can be erive using the fact that the posterior mean an covariance matrix of the mean of a normal istribution N µ, Σ given ata Y an prior of the mean N µ 0, Σ 0 is: Σ p nσ + Σ 0, µ p nσ + Σ 0 nσ ỹ + Σ 0 µ 0 7 6

3.1.2 The variational marginal $q_\gamma$

Now we move on to the variational marginal $q_\gamma(\gamma_d)$:

$$
q_\gamma(\gamma_d) \propto p\big(\gamma_d \mid \langle S_\mu \rangle_{q_\mu}, \langle S_z \rangle_{q_z}\big)
\propto \exp\Big\{ -\tfrac{1}{2}\big(\gamma_d - \langle \mu_t \rangle\big)^{\top}\Sigma_t^{-1}\big(\gamma_d - \langle \mu_t \rangle\big) \Big\}\,
\exp\Big\{ \langle m_d \rangle^{\top}\gamma_d - n_d\, c(\gamma_d) \Big\}, \qquad (16)
$$

where $\langle m_d \rangle = (\langle n_{d,1} \rangle, \ldots, \langle n_{d,K-1} \rangle)$ (cf. $m_d$), and $\langle n_{d,k} \rangle = \sum_n \langle z_{d,n,k} \rangle$ denotes the sum of expected topic-specific counts for the words in document $d$ under $q_z(\{z_{d,n}\})$, which will be specified in the sequel.

Due to the complexity of $c(\gamma) = \ln\big(1 + \sum_{k=1}^{K-1} e^{\gamma_k}\big)$, the $q_\gamma$ defined above is not integrable during inference (e.g., for computing an expectation of $\gamma$). In [Blei and Lafferty, 2006], a variational approximation based on optimizing a relaxed bound of the KL divergence between $q$ and $p$ was used to approximate $q_\gamma$. In the following, we present a different approach that overcomes the non-conjugacy between the multinomial likelihood and the logistic-normal prior and makes the joint tractable: we seek a normal approximation to $q_\gamma$ using a Taylor expansion technique. Differentiating $c(\gamma)$ with respect to $\gamma$ gives the gradient and Hessian elements

$$
g_i = \frac{\partial c(\gamma)}{\partial \gamma_i} = \frac{e^{\gamma_i}}{1 + \sum_k e^{\gamma_k}}, \qquad
h_{ii} = \frac{e^{\gamma_i}}{1 + \sum_k e^{\gamma_k}} - \Big(\frac{e^{\gamma_i}}{1 + \sum_k e^{\gamma_k}}\Big)^{2}, \qquad
h_{ij} = -\frac{e^{\gamma_i}\, e^{\gamma_j}}{\big(1 + \sum_k e^{\gamma_k}\big)^{2}} \;\; (i \neq j). \qquad (17)
$$
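In softmax terms, the gradient is the vector of padded-softmax probabilities of the first $K-1$ categories, and the Hessian is $H = \mathrm{diag}(g) - g g^{\top}$. A small numerically stable sketch (illustrative) with a finite-difference check:

```python
import numpy as np

def c_grad_hess(gamma):
    """Gradient and Hessian of c(gamma) = log(1 + sum_k exp(gamma_k)).

    Equivalent to softmax probabilities over (gamma, 0) restricted to the
    first K-1 coordinates: g_i = e^{gamma_i}/(1 + sum e^{gamma}), and
    H = diag(g) - g g^T (cf. Eq. 17).
    """
    a = np.append(gamma, 0.0)                  # pad the pinned K-th logit
    p = np.exp(a - a.max()); p /= p.sum()      # numerically stable softmax
    g = p[:-1]
    return g, np.diag(g) - np.outer(g, g)

gamma = np.array([0.3, -1.2, 0.8])
g, H = c_grad_hess(gamma)

# Finite-difference check of the gradient.
eps = 1e-6
c = lambda x: np.log1p(np.exp(x).sum())
g_fd = np.array([(c(gamma + eps * np.eye(3)[i]) - c(gamma - eps * np.eye(3)[i]))
                 / (2 * eps) for i in range(3)])
assert np.allclose(g, g_fd, atol=1e-6)
```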

Therefore, the second-order Taylor series of $c(\gamma) = \ln\big(1 + \sum_{k=1}^{K-1} e^{\gamma_k}\big)$ with respect to some $\hat{\gamma}$ is:

$$c(\gamma) = c(\hat{\gamma}) + g_{\hat{\gamma}}^{\top}(\gamma - \hat{\gamma}) + \tfrac{1}{2}(\gamma - \hat{\gamma})^{\top}H_{\hat{\gamma}}(\gamma - \hat{\gamma}) + R_2, \qquad (18)$$

where $g_{\hat{\gamma}} = \nabla c(\gamma)|_{\hat{\gamma}} = (g_1, \ldots, g_{K-1})$ denotes the gradient vector of $c(\gamma)$, $H_{\hat{\gamma}} = (h_{ij})$ denotes the Hessian matrix, and $R_2$ is the Lagrange remainder. Assuming that $\hat{\gamma}$ is close enough to the true $\gamma$ for each document at each time (e.g., the posterior mean of $\gamma$), we have:

$$c(\gamma) \approx c(\hat{\gamma}) + g_{\hat{\gamma}}^{\top}(\gamma - \hat{\gamma}) + \tfrac{1}{2}(\gamma - \hat{\gamma})^{\top}H_{\hat{\gamma}}(\gamma - \hat{\gamma}). \qquad (19)$$

It can be shown that, since $c(\gamma)$ is convex with respect to $\gamma$, the above approximation is a second-order polynomial lower bound of $c(\gamma)$ [Jordan et al., 1999]. Now we have:

$$
\begin{aligned}
p\big(\gamma_d \mid \langle S_\mu \rangle_{q_\mu}, \langle S_z \rangle_{q_z}\big)
&\propto \exp\Big\{ -\tfrac{1}{2}(\gamma - \langle \mu_t \rangle)^{\top}\Sigma_t^{-1}(\gamma - \langle \mu_t \rangle) + \langle m \rangle^{\top}\gamma - n\,c(\gamma) \Big\} \\
&\approx \exp\Big\{ -\tfrac{1}{2}\gamma^{\top}\big(\Sigma_t^{-1} + n H_{\hat{\gamma}}\big)\gamma + \gamma^{\top}\big(\Sigma_t^{-1}\langle \mu_t \rangle + \langle m \rangle - n\,g_{\hat{\gamma}} + n H_{\hat{\gamma}}\hat{\gamma}\big) + \mathrm{const} \Big\}. \qquad (20)
\end{aligned}
$$

Rearranging the terms, and setting $\hat{\gamma} = \langle \mu_t \rangle = \hat{\mu}_{t|T}$ from Section 3.1.1, we have the following multivariate normal approximation:

$$p\big(\gamma_d \mid \langle S_\mu \rangle_{q_\mu}, \langle S_z \rangle_{q_z}\big) \approx \mathcal{N}\big(\gamma_d \mid \tilde{\mu}_t, \tilde{\Sigma}_t\big), \qquad (21)$$

where

$$\tilde{\Sigma}_t = \big(\Sigma_t^{-1} + n_d\,H_{\hat{\mu}_{t|T}}\big)^{-1}, \qquad (22)$$

$$\tilde{\mu}_t = \tilde{\Sigma}_t\Big(\Sigma_t^{-1}\hat{\mu}_{t|T} + n_d\,H_{\hat{\mu}_{t|T}}\hat{\mu}_{t|T} + \langle m_d \rangle - n_d\,g_{\hat{\mu}_{t|T}}\Big) = \hat{\mu}_{t|T} + \tilde{\Sigma}_t\big(\langle m_d \rangle - n_d\,g_{\hat{\mu}_{t|T}}\big). \qquad (23)$$

3.1.3 The variational marginal $q_\beta$

Now we compute the variational marginal $q_\beta(\{\beta_k\})$:

$$q_\beta(\{\beta_k\}) \propto \prod_{k=1}^{K} p\big(\beta_k^{(1)}, \ldots, \beta_k^{(T)} \mid \langle S_z \rangle_{q_z}\big). \qquad (24)$$

This is a product of conditionally independent SSMs, given the sufficient statistics $\langle S_\gamma \rangle_{q_\gamma}$ and $\langle S_z \rangle_{q_z}$, the model parameters $\iota, \Psi, B$, and the data $D$. The variational marginal of a single chain of an evolving topic, represented by the pre-transformed normal vectors $\eta_k^{(1)}, \ldots, \eta_k^{(T)}$, is:

$$
\begin{aligned}
p\big(\eta_k^{(1)}, \ldots, \eta_k^{(T)} \mid \langle S_z \rangle_{q_z}\big)
&\propto \exp\Big\{ -\tfrac{1}{2}\big(\eta_k^{(1)} - \iota\big)^{\top}\Psi^{-1}\big(\eta_k^{(1)} - \iota\big) - \tfrac{1}{2}\sum_{t=2}^{T}\big(\eta_k^{(t)} - B\eta_k^{(t-1)}\big)^{\top}\Psi^{-1}\big(\eta_k^{(t)} - B\eta_k^{(t-1)}\big) \Big\} \\
&\quad \times \exp\Big\{ \sum_{t=1}^{T}\Big( \big\langle m_k^{(t)} \big\rangle^{\top}\eta_k^{(t)} - \big\langle N_k^{(t)} \big\rangle\,c\big(\eta_k^{(t)}\big) \Big) \Big\}. \qquad (25)
\end{aligned}
$$

Recall that we can approximate $c(\eta)$ by its second-order truncated Taylor series with respect to an estimate $\hat{\eta}$ of $\eta$:

$$c(\eta) \approx c(\hat{\eta}) + g_{\hat{\eta}}^{\top}(\eta - \hat{\eta}) + \tfrac{1}{2}(\eta - \hat{\eta})^{\top}H_{\hat{\eta}}(\eta - \hat{\eta}). \qquad (26)$$

In the following we first outline a normal approximation to a multinomial distribution over a count vector $n$, assuming that the multinomial parameters are logistic transformations of a real vector $\eta$:

$$
\begin{aligned}
p(n \mid \eta) &= \exp\big\{ n^{\top}\eta - N\,c(\eta) \big\} \\
&\approx \exp\Big\{ -\tfrac{1}{2}\eta^{\top}\big(N H_{\hat{\eta}}\big)\eta + \big(n - N g_{\hat{\eta}} + N H_{\hat{\eta}}\hat{\eta}\big)^{\top}\eta + \mathrm{const} \Big\} \\
&\propto \mathcal{N}\big(v \mid \eta, (N H_{\hat{\eta}})^{-1}\big), \qquad (27)
\end{aligned}
$$

where $g_{\hat{\eta}} = \nabla c(\eta)|_{\hat{\eta}}$, $H_{\hat{\eta}} = \nabla^2 c(\eta)|_{\hat{\eta}}$, and $v = \hat{\eta} + (N H_{\hat{\eta}})^{-1}(n - N g_{\hat{\eta}})$ plays the role of a Gaussian pseudo-observation; the Taylor expansion point $\hat{\eta}$ can be set to an empirical estimate, or just a guess, of $\eta$. With this approximation, we can approximate Eq. (25) by an SSM with linear Gaussian emission models:

$$
p\big(\eta_k^{(1)}, \ldots, \eta_k^{(T)} \mid \langle S_z \rangle_{q_z}\big)
\approx \mathcal{N}\big(\eta_k^{(1)} \mid \iota, \Psi\big) \prod_{t=2}^{T} \mathcal{N}\big(\eta_k^{(t)} \mid B\eta_k^{(t-1)}, \Psi\big) \prod_{t=1}^{T} \mathcal{N}\Big( v_k^{(t)} \,\Big|\, \eta_k^{(t)},\, \big(\langle N_k^{(t)} \rangle H_{\hat{\eta}_k^{(t)}}\big)^{-1} \Big), \qquad (28)
$$

where the observation is $v_k^{(t)} = \hat{\eta}_k^{(t)} + \big(\langle N_k^{(t)} \rangle H_{\hat{\eta}_k^{(t)}}\big)^{-1}\big(\langle m_k^{(t)} \rangle - \langle N_k^{(t)} \rangle g_{\hat{\eta}_k^{(t)}}\big)$, with $\hat{\eta}_k^{(t)}$ set to its estimate from the previous round of the GMF iteration (see Section 3.1.5). The expected count vector $\langle m_k^{(t)} \rangle$ and total word count $\langle N_k^{(t)} \rangle$ associated with topic $k$ at time $t$ are computed using the variational marginal $q_z$: specifically, $\langle n_{k,w} \rangle = \sum_{d,n} x_{d,n,w}\,\langle z_{d,n,k} \rangle$ is the expected count of word $w$ from topic $k$ at time $t$; $\langle m_k \rangle = (\langle n_{k,1} \rangle, \ldots, \langle n_{k,M-1} \rangle)$ denotes the expected row vector of total word counts of all but the last word under topic $k$ at time $t$; $\langle \vec{n}_k \rangle = (\langle m_k \rangle, \langle n_{k,M} \rangle)$ denotes the row vector of expected counts of every word generated from topic $k$ at time $t$; and $\langle N_k \rangle = \langle \vec{n}_k \rangle \mathbf{1}$.

Now the posterior of $\eta_k^{(t)}$ can be approximated by a multivariate Gaussian $\mathcal{N}\big(\hat{\eta}_{k,t|T}, P_{k,t|T}\big)$; here we give the formulae for the KF time/measurement updates and the RTS smoothing of the topic distribution parameters at time $t$.

Time update:

$$\hat{\eta}_{k,t+1|t} = B\,\hat{\eta}_{k,t|t}, \qquad P_{k,t+1|t} = B\,P_{k,t|t}\,B^{\top} + \Psi. \qquad (29)$$

Measurement update:

$$\hat{\eta}_{k,t+1|t+1} = \hat{\eta}_{k,t+1|t} + P_{k,t+1|t}\Big(P_{k,t+1|t} + \big(\langle N_k \rangle H_{\hat{\eta}_k}\big)^{-1}\Big)^{-1}\big(v_k^{(t+1)} - \hat{\eta}_{k,t+1|t}\big),$$

$$P_{k,t+1|t+1} = P_{k,t+1|t} - P_{k,t+1|t}\Big(P_{k,t+1|t} + \big(\langle N_k \rangle H_{\hat{\eta}_k}\big)^{-1}\Big)^{-1}P_{k,t+1|t}. \qquad (30)$$

RTS smoothing:

$$L_{k,t} = P_{k,t|t}\,B^{\top}\,P_{k,t+1|t}^{-1}, \qquad \hat{\eta}_{k,t|T} = \hat{\eta}_{k,t|t} + L_{k,t}\big(\hat{\eta}_{k,t+1|T} - \hat{\eta}_{k,t+1|t}\big),$$

$$P_{k,t|T} = P_{k,t|t} + L_{k,t}\big(P_{k,t+1|T} - P_{k,t+1|t}\big)L_{k,t}^{\top}, \qquad P_{k,t,t-1|T} = P_{k,t|t}\,L_{k,t-1}^{\top} + L_{k,t}\big(P_{k,t+1,t|T} - B\,P_{k,t|t}\big)L_{k,t-1}^{\top}. \qquad (31)$$

We can estimate the parameters $\iota$, $\Psi$ and $B$ using an EM algorithm (see Section 4).
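The conversion of counts into the Gaussian pseudo-observation $v$ of Eq. (27) is the only nonstandard step here; an illustrative sketch follows (hypothetical names; the small ridge term for numerical stability is my addition, not the report's):

```python
import numpy as np

def pseudo_observation(m_counts, N_total, eta_hat):
    """Gaussian pseudo-observation for the linearized multinomial emission
    (cf. Eq. 27): v = eta_hat + (N H)^{-1} (m - N g), with covariance (N H)^{-1}.

    m_counts: expected counts of the first M-1 words under this topic;
    N_total:  expected total word count for the topic (all M words);
    eta_hat:  Taylor expansion point (e.g., the previous smoothed estimate).
    """
    a = np.append(eta_hat, 0.0)
    p = np.exp(a - a.max()); p /= p.sum()
    g = p[:-1]                                   # gradient of c at eta_hat
    H = np.diag(g) - np.outer(g, g)              # Hessian of c at eta_hat
    NH_inv = np.linalg.inv(N_total * H + 1e-8 * np.eye(len(g)))  # ridge for stability
    v = eta_hat + NH_inv @ (m_counts - N_total * g)
    return v, NH_inv                             # observation and its covariance

m = np.array([12.0, 7.0, 3.0])                   # counts of words 1..M-1 (M = 4)
v, R = pseudo_observation(m, N_total=30.0, eta_hat=np.zeros(3))
```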

3.1.4 The variational marginal $q_z$

Now we compute the variational marginal $q_z(\{z_{d,n}\})$:

$$q_z(\{z_{d,n}\}) \propto p\big(\{z_{d,n}\} \mid D, \langle S_\gamma \rangle_{q_\gamma}, \langle S_\eta \rangle_{q_\eta}\big) = \prod_{d,n,t} p\big(z_{d,n} \mid x_{d,n}, \langle S_\gamma \rangle_{q_\gamma}, \langle S_\eta \rangle_{q_\eta}\big). \qquad (32)$$

For notational simplicity, we omit the indices $d$, $n$ and $t$, and give below a generic formula for the variational approximation to a singleton marginal. Recall that $z$ is a unit-base (one-hot) vector, so $z^{\top} z = 1$; a similar definition applies to $x$:

$$p(z \mid x, \langle S_\gamma \rangle, \langle S_\eta \rangle) \propto p(z \mid \langle S_\gamma \rangle)\, p(x \mid z, \langle S_\eta \rangle) \propto \exp\big\{ z^{\top} \langle \gamma \rangle - \langle c(\gamma) \rangle + x^{\top} \langle \Xi \rangle z - z^{\top} \langle c_\eta \rangle \big\}, \qquad (33)$$

where $\gamma$ follows a Gaussian distribution, $\Xi$ is an $M \times K$ matrix whose column vectors $\eta_k$ also follow Gaussian distributions, and $c_\eta = \big(c(\eta_1), \ldots, c(\eta_K)\big)^{\top}$. Closed-form solutions for $\langle c(\gamma) \rangle$ and $\langle c(\eta_k) \rangle$ under normal distributions are not available. Note that the multinomial parameter vector $\theta = \pi / (\mathbf{1}^{\top}\pi)$, where $\pi = (\pi_1, \ldots, \pi_{K-1}, \pi_K) = (e^{\gamma_1}, \ldots, e^{\gamma_{K-1}}, 1)$: its first $K-1$ components follow a multivariate log-normal distribution, and $\pi_K = 1$. To better approximate $\langle c(\gamma) \rangle$ (and similarly $\langle c(\eta) \rangle$), we rewrite $c(\gamma) = \ln\big(1 + \sum_{k=1}^{K-1} e^{\gamma_k}\big)$ as $c(\pi) = \ln(\mathbf{1}^{\top}\pi)$, where $\pi$ is the unnormalized version of the multinomial parameter vector $\theta$. Now we expand $c(\pi)$ around the mean of $\pi$ up to second order. The gradient of $c(\pi)$ with respect to $\pi$ is:

$$\frac{\partial \ln(\mathbf{1}^{\top}\pi)}{\partial \pi_i} = \frac{1}{\mathbf{1}^{\top}\pi}, \qquad \text{i.e.,} \qquad g_\pi = \nabla_\pi \ln(\mathbf{1}^{\top}\pi) = \frac{\mathbf{1}}{\mathbf{1}^{\top}\pi}. \qquad (34)$$

The Hessian of $c(\pi)$ with respect to $\pi$ is:

$$\frac{\partial^2 \ln(\mathbf{1}^{\top}\pi)}{\partial \pi_i \partial \pi_j} = -\frac{1}{(\mathbf{1}^{\top}\pi)^2}, \qquad \text{i.e.,} \qquad H_\pi = -\frac{\mathbf{1} \otimes \mathbf{1}}{(\mathbf{1}^{\top}\pi)^2}, \qquad (35)$$

where $\mathbf{1} \otimes \mathbf{1}$ represents the outer product of the two one-vectors. Therefore, letting $\hat{\pi} = E[\pi]$ under $\pi \sim \mathcal{LN}_{K-1}(\mu, \Sigma)$, we have:

$$c(\pi) \approx \ln(\mathbf{1}^{\top}\hat{\pi}) + g_\pi^{\top}(\pi - \hat{\pi}) + \tfrac{1}{2}(\pi - \hat{\pi})^{\top}H_\pi(\pi - \hat{\pi}), \qquad (36)$$

and hence

$$\langle c(\pi) \rangle \approx \ln\big(\mathbf{1}^{\top}E[\pi]\big) + \tfrac{1}{2}\mathrm{Tr}\Big(H_\pi\big(E[\pi]\big)\,E\big[(\pi - E[\pi])(\pi - E[\pi])^{\top}\big]\Big) = \ln\big(\mathbf{1}^{\top}E[\pi]\big) + \tfrac{1}{2}\mathrm{Tr}\Big(H_\pi\big(E[\pi]\big)\,\hat{\Sigma}_\pi\Big), \qquad (37)$$

where $\hat{\Sigma}_\pi$ is the covariance of $\pi$ under the multivariate log-normal distribution. It can be shown that [Kleiber and Kotz, 2003]:

$$\mathrm{cov}(\pi_i, \pi_j) = \exp\big\{\mu_i + \mu_j + \tfrac{1}{2}(\sigma_{ii} + \sigma_{jj})\big\}\,\big(\exp\{\sigma_{ij}\} - 1\big), \qquad (38)$$

$$E[\pi_i] = \exp\big\{\mu_i + \tfrac{1}{2}\sigma_{ii}\big\}, \qquad (39)$$

$$E[\pi] = \exp\big\{\mu + \tfrac{1}{2}\mathrm{Diag}(\Sigma)\big\}. \qquad (40)$$

This computation can be applied to the expectations of both the pre-normalized topic proportion vector $\pi$ and the pre-normalized topic-specific word frequency vector $\xi_k$ corresponding to $\beta_k$.
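The log-normal moment formulas (38)-(40) are easy to verify by Monte Carlo (illustrative snippet):

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([0.2, -0.5, 0.1])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T / 3 + 0.1 * np.eye(3)         # a valid covariance matrix

# Closed-form log-normal moments (Eqs. 38-40).
E_pi = np.exp(mu + 0.5 * np.diag(Sigma))
Cov_pi = np.outer(E_pi, E_pi) * (np.exp(Sigma) - 1.0)

# Monte Carlo check.
samples = np.exp(rng.multivariate_normal(mu, Sigma, size=500_000))
print(np.max(np.abs(samples.mean(axis=0) - E_pi)))    # ~ 0
print(np.max(np.abs(np.cov(samples.T) - Cov_pi)))     # ~ 0
```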

So, we have:

$$
p(z \mid x, \langle S_\gamma \rangle, \langle S_\eta \rangle) \propto \exp\Big\{ z^{\top}E[\gamma] - \ln\big(\mathbf{1}^{\top}E[\pi]\big) - \tfrac{1}{2}\mathrm{Tr}\big(H_\pi(E[\pi])\,\hat{\Sigma}_\pi\big) + x^{\top}E[\Xi]\,z - \sum_k z_k\Big( \ln\big(\mathbf{1}^{\top}E[\xi_k]\big) + \tfrac{1}{2}\mathrm{Tr}\big(H_\xi(E[\xi_k])\,\hat{\Sigma}_{\xi_k}\big) \Big) \Big\}, \qquad (41)
$$

where $H_\xi(E[\xi_k])$, $E[\xi_k]$ and $\hat{\Sigma}_{\xi_k}$ are the Hessian, mean and covariance of $\xi_k$ under a log-normal distribution, as in Eqs. (35) and (38)-(40), and $E[\Xi]$ consists of the column-by-column expectations $E[\eta_k]$ under the normal distributions of the $\eta_k$. Note that in the above computation, one must be careful to appropriately recover the $K$-dimensional multinomial distribution of $z$ from the $(K-1)$-dimensional pre-transformed natural parameter vector $\gamma$ and the $(M-1) \times K$-dimensional pre-transformed natural parameter matrix $\Xi = [\eta_1, \ldots, \eta_K]$; I omit the details of such manipulations. We need to compute the above singleton marginal for each $z_{d,n}$, given $\gamma \sim \mathcal{N}(\tilde{\mu}_t, \tilde{\Sigma}_t)$ and hence $\pi \sim \mathcal{LN}(\tilde{\mu}_t, \tilde{\Sigma}_t)$, and $\eta_k \sim \mathcal{N}(\hat{\eta}_{k,t|T}, P_{k,t|T})$ and hence $\xi_k \sim \mathcal{LN}(\hat{\eta}_{k,t|T}, P_{k,t|T})$.

3.1.5 Summary

The above four variational marginals are coupled and thus constitute a set of fixed-point equations: computing the GMF message for one marginal requires the marginals of the other sets of variables. Thus, we iteratively update each marginal until convergence (i.e., until all the GMF messages stop changing). This approximation scheme can be shown to minimize the KL divergence between the variational posterior and the true posterior of the latent variables. We can then use a variational EM scheme to estimate the parameters of our model, which are essentially the SSM parameters. Operationally, VEM is no different from a standard EM for SSMs: we have the observation sequences $\{\langle \gamma_d \rangle\}_{d=1}^{N_t}$, $t = 1, \ldots, T$, for the topic-mixing SSM as defined by $q_\mu$, and an observation sequence $\{v_k^{(t)}\}$ for each of the $K$ topic representation (i.e., word-frequency) SSMs as defined by $q_\eta$; and we can use the standard learning rules for SSM parameter estimation.
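As a concrete illustration of the singleton update in Eq. (41), the snippet below (illustrative; toy Gaussian moments in place of the actual variational marginals) computes the variational topic posterior for one token, using the log-normal correction of Eqs. (37)-(40) for the $\langle c(\cdot) \rangle$ terms:

```python
import numpy as np

def expected_c(mu, Sigma):
    """Approximate <ln(1^T pi)> for pi ~ LN(mu, Sigma), padded with the
    constant component pi_K = 1 (Eqs. 36-40): ln(1^T E[pi]) + Tr(H Cov)/2,
    where H = -(1 1^T)/(1^T E[pi])^2."""
    E = np.exp(mu + 0.5 * np.diag(Sigma))           # E[pi_i], Eq. 39
    Cov = np.outer(E, E) * (np.exp(Sigma) - 1.0)    # cov(pi_i, pi_j), Eq. 38
    s = 1.0 + E.sum()                               # 1^T E[pi], incl. pi_K = 1
    return np.log(s) - 0.5 * Cov.sum() / s ** 2

K, M = 4, 6
mu_gamma, Sig_gamma = np.zeros(K - 1), 0.1 * np.eye(K - 1)
etas = [(np.zeros(M - 1), 0.1 * np.eye(M - 1)) for _ in range(K)]  # q_eta moments

w = 2                                               # index of the observed word
log_q = np.append(mu_gamma, 0.0) - expected_c(mu_gamma, Sig_gamma)
for k, (m_k, P_k) in enumerate(etas):
    eta_kw = m_k[w] if w < M - 1 else 0.0           # padded natural parameter
    log_q[k] += eta_kw - expected_c(m_k, P_k)
q_z = np.exp(log_q - log_q.max()); q_z /= q_z.sum() # normalize over topics
print(q_z)                                          # variational topic posterior
```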

In the E step, we use the KF/RTS recursions above (Eqs. (13)-(14) and (30)-(31)) to estimate the expected sufficient statistics of the latent states; in the M step, we update the parameters $\Phi, A, \Sigma_t$ of the topic-mixing SSM, and $\Psi, B$ of the topic representation SSM (see [Ghahramani and Hinton, 1996; Ghahramani and Hinton, 1998] for details).

3.2 Variational Inference for the Log-Normal-Poisson Model

Now we need to approximate

$$p(n_w) = \mathrm{Poisson}\big(n_w \mid N\,\theta^{\top}\tau_w\big) = \frac{\exp\big\{ n_w \ln(N\,\theta^{\top}\tau_w) - N\,\theta^{\top}\tau_w \big\}}{\Gamma(n_w + 1)}.$$

Again, let $\pi$ denote the unnormalized version of the multinomial parameter vector $\theta$. Note that the Jacobian of the vector $\theta$ with respect to $\pi$ is:

$$\frac{\partial \theta_i}{\partial \pi_i} = \frac{1}{\mathbf{1}^{\top}\pi} - \frac{\pi_i}{(\mathbf{1}^{\top}\pi)^2}, \qquad \frac{\partial \theta_i}{\partial \pi_j} = -\frac{\pi_i}{(\mathbf{1}^{\top}\pi)^2} \;(j \neq i), \qquad J_\theta = \frac{I}{\mathbf{1}^{\top}\pi} - \frac{\pi \otimes \mathbf{1}}{(\mathbf{1}^{\top}\pi)^2}. \qquad (42)$$

From this derivation, we know that, similarly,

$$\frac{\partial \theta}{\partial \pi_i} = \frac{e_i - \theta}{\mathbf{1}^{\top}\pi}, \qquad \frac{\partial (\theta^{\top}\tau_w)}{\partial \pi_i} = \frac{\tau_{w,i} - \theta^{\top}\tau_w}{\mathbf{1}^{\top}\pi}, \qquad \frac{\partial \ln(N\theta^{\top}\tau_w)}{\partial \pi_i} = \frac{\tau_{w,i}}{\pi^{\top}\tau_w} - \frac{1}{\mathbf{1}^{\top}\pi}, \qquad (43)$$

and

$$\frac{\partial^2 \ln(N\theta^{\top}\tau_w)}{\partial \pi_i\, \partial \pi_j} = \frac{1}{(\mathbf{1}^{\top}\pi)^2} - \frac{\tau_{w,i}\,\tau_{w,j}}{(\pi^{\top}\tau_w)^2}. \qquad (44)$$

In matrix form, the gradient and Hessian of the Poisson log-likelihood with respect to $\pi$ are:

$$\nabla_\pi \ln(N\theta^{\top}\tau_w) = \frac{\tau_w}{\pi^{\top}\tau_w} - \frac{\mathbf{1}}{\mathbf{1}^{\top}\pi}, \qquad H_\pi \ln(N\theta^{\top}\tau_w) = \frac{\mathbf{1} \otimes \mathbf{1}}{(\mathbf{1}^{\top}\pi)^2} - \frac{\tau_w \otimes \tau_w}{(\pi^{\top}\tau_w)^2}, \qquad (45)$$

where $\tau_w \otimes \tau_w$ represents the outer product of the two vectors. The gradient and Hessian with respect to $\tau_w$ are:

$$\frac{\partial \ln(N\theta^{\top}\tau_w)}{\partial \tau_{w,i}} = \frac{\theta_i}{\theta^{\top}\tau_w}, \qquad \frac{\partial^2 \ln(N\theta^{\top}\tau_w)}{\partial \tau_{w,i}\, \partial \tau_{w,j}} = -\frac{\theta_i\,\theta_j}{(\theta^{\top}\tau_w)^2}; \qquad (46)$$

in matrix form:

$$\nabla_\tau \ln(N\theta^{\top}\tau_w) = \frac{\theta}{\theta^{\top}\tau_w}, \qquad H_\tau \ln(N\theta^{\top}\tau_w) = -\frac{\theta \otimes \theta}{(\theta^{\top}\tau_w)^2}. \qquad (47)$$

Assuming that $\theta$ and $\tau$ are independent, i.e., $\mathrm{cov}(\tau, \theta) = 0$, we have the following approximation of $\ln(N\theta^{\top}\tau_w)$:

$$
\ln(N\theta^{\top}\tau_w) \approx \ln(N\hat{\theta}^{\top}\hat{\tau}_w) + \nabla_\tau[\hat{\tau}_w]^{\top}(\tau_w - \hat{\tau}_w) + \nabla_\pi[\hat{\pi}]^{\top}(\pi - \hat{\pi}) + \tfrac{1}{2}(\tau_w - \hat{\tau}_w)^{\top}H_\tau[\hat{\tau}_w](\tau_w - \hat{\tau}_w) + \tfrac{1}{2}(\pi - \hat{\pi})^{\top}H_\pi[\hat{\pi}](\pi - \hat{\pi}), \qquad (48)
$$

where $\hat{\pi}$ and $\hat{\tau}_w$ are some estimates of the true $\pi$ and $\tau_w$. Note that under this approximation, computing the expectation of $\ln(N\theta^{\top}\tau_w)$ under $q_\theta$ and $q_{\tau_w}$ can be done approximately in closed form by using the variational marginals of $\gamma$ and $\zeta_w$. Now, the variational marginals for $\zeta_w$ (the log-domain representation of $\tau_w$, i.e., its natural parameter) and $\gamma$ (the inverse logistic transformation of $\theta$) can be derived from the following GMF approximations to the marginal posteriors of $\zeta_w$ and $\gamma$, respectively:

$$
\begin{aligned}
p\big(\zeta_w^{(1)}, \ldots, \zeta_w^{(T)} \mid \langle S_\theta \rangle_{q_\theta}\big)
&\propto \exp\Big\{ -\tfrac{1}{2}\big(\zeta_w^{(1)}\big)^{\top}\Psi_w^{-1}\zeta_w^{(1)} - \tfrac{1}{2}\sum_{t=2}^{T}\big(\zeta_w^{(t)} - B_w\zeta_w^{(t-1)}\big)^{\top}\Psi_w^{-1}\big(\zeta_w^{(t)} - B_w\zeta_w^{(t-1)}\big) \Big\} \\
&\quad \times \prod_{t}\prod_{d=1}^{N_t} \frac{\exp\Big\{ n_{d,w}\,\big\langle \ln\big(\omega_d\,\theta_d^{\top}\tau_w\big) \big\rangle_{q_\theta} - \omega_d\,\langle \theta_d \rangle_{q_\theta}^{\top}\tau_w \Big\}}{\Gamma(n_{d,w} + 1)}; \qquad (49)
\end{aligned}
$$
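A quick finite-difference check (illustrative) of the $\tau$-gradient in Eqs. (46)-(47), $\partial \ln(\theta^{\top}\tau_w)/\partial \tau_{w,i} = \theta_i / (\theta^{\top}\tau_w)$; the constant factor $N$ drops out of the gradient of the logarithm:

```python
import numpy as np

rng = np.random.default_rng(5)

K = 4
theta = rng.dirichlet(np.ones(K))        # a point on the simplex
tau_w = rng.gamma(2.0, 0.05, size=K)     # positive per-topic rates for word w

f = lambda tau: np.log(theta @ tau)      # log-rate, up to the constant ln N
grad = theta / (theta @ tau_w)           # Eq. (46)/(47), gradient w.r.t. tau_w

eps = 1e-7
grad_fd = np.array([(f(tau_w + eps * np.eye(K)[i]) - f(tau_w - eps * np.eye(K)[i]))
                    / (2 * eps) for i in range(K)])
assert np.allclose(grad, grad_fd, atol=1e-5)
```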

Similarly, the GMF approximation for $q_\gamma$ is:

$$
\begin{aligned}
q_\gamma(\gamma_d) &\propto p\big(\gamma_d \mid \langle S_\mu \rangle_{q_\mu}, \langle S_\tau \rangle_{q_\tau}\big) \\
&\propto \exp\Big\{ -\tfrac{1}{2}\big(\gamma_d - \langle \mu_t \rangle\big)^{\top}\Sigma_t^{-1}\big(\gamma_d - \langle \mu_t \rangle\big) \Big\} \prod_{w} \frac{\exp\Big\{ n_{d,w}\,\big\langle \ln\big(\omega_d\,\theta_d^{\top}\tau_w\big) \big\rangle_{q_\tau} - \omega_d\,\theta_d^{\top}\langle \tau_w \rangle_{q_\tau} \Big\}}{\Gamma(n_{d,w} + 1)}. \qquad (50)
\end{aligned}
$$

Note that by introducing the Taylor approximation (48) to $\ln(N\theta^{\top}\tau_w)$, and using the laws for computing the means and covariances of $\tau_w$ and $\pi$ under the multivariate log-normal distribution (i.e., Eqs. (38)-(40)), the expectation terms in the above equations can be approximately solved. Using techniques similar to those employed in Section 3.1, these marginals can then be approximated by standard SSMs with Gaussian emissions.

4 Parameter Estimation

As mentioned before, we can use a variational EM scheme to estimate the parameters of our model, which are essentially the SSM parameters. Operationally, VEM is no different from a standard EM for SSMs: we have the observation sequences $\{\langle \gamma_d \rangle\}_{d=1}^{N_t}$ for the topic-mixing SSM as defined by $q_\mu$, and an observation sequence $\{v_k^{(t)}\}$ for each of the $K$ topic representation (i.e., word-frequency) SSMs as defined by $q_\eta$; and we can use the standard learning rules for SSM parameter estimation. In the E step, we use the KF/RTS recursions (Eqs. (13)-(14) and (30)-(31)) to estimate the expected sufficient statistics of the latent states; in the M step, we update the parameters $\Phi, A, \Sigma_t$ of the topic-mixing SSM, and $\Psi, B$ of the topic representation SSM. Following [Ghahramani and Hinton, 1996; Ghahramani and Hinton, 1998], which give detailed derivations of maximum likelihood estimation for the standard SSM and the switching SSM, below we give the relevant MLE formulae for the parameters of our model. Each estimate can be derived by taking the corresponding partial derivative of the expected log-likelihood under our variational approximation to the true posterior, setting it to zero, and solving.

Topic-mixing dynamics matrix:

$$A = \Big( \sum_{t=2}^{T} V_{t,t-1}^{\mu} \Big) \Big( \sum_{t=2}^{T} V_{t-1}^{\mu} \Big)^{-1}, \qquad (51)$$

where $V_t^{\mu} = E[\mu_t \mu_t^{\top} \mid Y_1, \ldots, Y_T]$ (not to be confused with the centered moment $P_{t|T}$), and $V_{t,t-1}^{\mu} = E[\mu_t \mu_{t-1}^{\top} \mid Y_1, \ldots, Y_T]$. From the RTS smoother, it is easy to see that:

$$V_t^{\mu} = P_{t|T} + \hat{\mu}_{t|T}\,\hat{\mu}_{t|T}^{\top}, \qquad V_{t,t-1}^{\mu} = P_{t,t-1|T} + \hat{\mu}_{t|T}\,\hat{\mu}_{t-1|T}^{\top}, \qquad (52)$$

where the posterior estimates of the self- and cross-time covariance matrices $P_{t|T}$ and $P_{t,t-1|T}$ are computed from Eqs. (14) and (15).

Noise covariance matrix for the topic-mixing state:

$$\Phi = \frac{1}{T-1} \Big( \sum_{t=2}^{T} V_t^{\mu} - A \sum_{t=2}^{T} \big(V_{t,t-1}^{\mu}\big)^{\top} \Big). \qquad (53)$$
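An illustrative M-step fragment for the topic-mixing dynamics (Eqs. (51)-(53)), taking the smoothed moments produced by the E-step (hypothetical array layout):

```python
import numpy as np

def m_step_dynamics(mu_s, P_s, P_cross):
    """MLE of the dynamics matrix A and state noise Phi (Eqs. 51-53).

    mu_s:    (T, K) smoothed means mu_hat_{t|T}
    P_s:     (T, K, K) smoothed covariances P_{t|T}
    P_cross: (T, K, K) cross covariances P_{t,t-1|T} (entry 0 unused)
    """
    T, K = mu_s.shape
    V = P_s + np.einsum('ti,tj->tij', mu_s, mu_s)        # V_t = P_{t|T} + mu mu^T
    V_cross = P_cross[1:] + np.einsum('ti,tj->tij', mu_s[1:], mu_s[:-1])
    A = V_cross.sum(axis=0) @ np.linalg.inv(V[:-1].sum(axis=0))       # Eq. 51
    Phi = (V[1:].sum(axis=0) - A @ V_cross.sum(axis=0).T) / (T - 1)   # Eq. 53
    return A, Phi

# Toy usage with placeholder moments.
T, K = 6, 3
rng = np.random.default_rng(6)
mu_s = rng.standard_normal((T, K))
P_s = np.stack([np.eye(K)] * T); P_cross = np.stack([0.5 * np.eye(K)] * T)
A, Phi = m_step_dynamics(mu_s, P_s, P_cross)
```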

Output covariance matrix for the topic-mixing vectors:

$$\Sigma_t = \frac{1}{N_t} \sum_{d=1}^{N_t} \big(\langle \gamma_d \rangle - \hat{\mu}_{t|T}\big)\big(\langle \gamma_d \rangle - \hat{\mu}_{t|T}\big)^{\top}. \qquad (54)$$

Topic representation (i.e., topic-specific word frequency vector) dynamics matrix:

$$B = \Big( \sum_{t=2}^{T} V_{k,t,t-1}^{\eta} \Big) \Big( \sum_{t=2}^{T} V_{k,t-1}^{\eta} \Big)^{-1}, \qquad (55)$$

where

$$V_{k,t}^{\eta} = P_{k,t|T} + \hat{\eta}_{k,t|T}\,\hat{\eta}_{k,t|T}^{\top}, \qquad V_{k,t,t-1}^{\eta} = P_{k,t,t-1|T} + \hat{\eta}_{k,t|T}\,\hat{\eta}_{k,t-1|T}^{\top}. \qquad (56)$$

Noise covariance matrix for the topic representation vectors:

$$\Psi = \frac{1}{T-1} \Big( \sum_{t=2}^{T} V_{k,t}^{\eta} - B \sum_{t=2}^{T} \big(V_{k,t,t-1}^{\eta}\big)^{\top} \Big). \qquad (57)$$

We set the initial vectors $\iota$ and $\nu$ to zero instead of estimating them from the data. Finally, note that the formulas given above are the most general forms of the transitions and correlations of topics and words. In practice, to avoid over-parameterization, we can choose to restrict, for example, the transition matrices $B$ and the covariance matrices $\Psi$ of the topic representations to be sparse or diagonal matrices, so as to model only random-walk effects.

5 Conclusion

In this report I introduced topic evolution models for longitudinal epochs of word documents. The models employ marginally dependent latent state-space models for evolving topic proportion distributions and topic-specific word distributions, and both a logistic-normal-multinomial and a logistic-normal-Poisson model for document likelihood. These models allow posterior inference of latent topic themes over time, and topical clustering of longitudinal document epochs. I derived a variational inference algorithm for non-conjugate generalized linear models based on truncated Taylor approximation, and I also outlined formulae for parameter estimation based on the variational EM principle. In the current model, I assume that all topics coexist over time and that no new topic emerges over time. In a companion report, I present a birth-death process model that captures more complicated and realistic behaviors of topic evolution, such as aggregation, emergence, extinction, and splitting of topics over time.

References

[Blei and Lafferty, 2006] D. Blei and J. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems 18, 2006.

[Blei et al., 2003] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

[Ghahramani and Hinton, 1996] Z. Ghahramani and G. E. Hinton. Parameter estimation for linear dynamical systems. University of Toronto Technical Report CRG-TR-96-2, 1996.

[Ghahramani and Hinton, 1998] Z. Ghahramani and G. E. Hinton. Variational learning for switching state-space models. Neural Computation, 1998.

[Griffiths and Steyvers, 2004] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences USA, 101(Suppl. 1):5228-5235, 2004.

[Hofmann, 1999] Thomas Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd International ACM SIGIR Conference, pages 50-57, 1999.

[Jordan et al., 1999] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer Academic Publishers, 1999.

[Kleiber and Kotz, 2003] C. Kleiber and S. Kotz. Statistical Size Distributions in Economics and Actuarial Sciences. Wiley-Interscience, 2003.

[Steyvers et al., 2004] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In The Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.

[Xing et al., 2003] E. P. Xing, M. I. Jordan, and S. Russell. A generalized mean field algorithm for variational inference in exponential families. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence, 2003.


More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

22 : Hilbert Space Embeddings of Distributions

22 : Hilbert Space Embeddings of Distributions 10-708: Probabilistic Graphical Models 10-708, Spring 2014 22 : Hilbert Space Embeddings of Distributions Lecturer: Eric P. Xing Scribes: Sujay Kumar Jauhar and Zhiguang Huo 1 Introduction and Motivation

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

The Ehrenfest Theorems

The Ehrenfest Theorems The Ehrenfest Theorems Robert Gilmore Classical Preliminaries A classical system with n egrees of freeom is escribe by n secon orer orinary ifferential equations on the configuration space (n inepenent

More information

SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS

SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS GEORGE A HAGEDORN AND CAROLINE LASSER Abstract We investigate the iterate Kronecker prouct of a square matrix with itself an prove an invariance

More information

The Hamiltonian particle-mesh method for the spherical shallow water equations

The Hamiltonian particle-mesh method for the spherical shallow water equations ATMOSPHERIC SCIENCE LETTERS Atmos. Sci. Let. 5: 89 95 (004) Publishe online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.100/asl.70 The Hamiltonian particle-mesh metho for the spherical

More information

New Statistical Test for Quality Control in High Dimension Data Set

New Statistical Test for Quality Control in High Dimension Data Set International Journal of Applie Engineering Research ISSN 973-456 Volume, Number 6 (7) pp. 64-649 New Statistical Test for Quality Control in High Dimension Data Set Shamshuritawati Sharif, Suzilah Ismail

More information

EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES

EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES MATHEMATICS OF COMPUTATION Volume 69, Number 231, Pages 1117 1130 S 0025-5718(00)01120-0 Article electronically publishe on February 17, 2000 EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION

More information

arxiv: v2 [math.st] 29 Oct 2015

arxiv: v2 [math.st] 29 Oct 2015 EXPONENTIAL RANDOM SIMPLICIAL COMPLEXES KONSTANTIN ZUEV, OR EISENBERG, AND DMITRI KRIOUKOV arxiv:1502.05032v2 [math.st] 29 Oct 2015 Abstract. Exponential ranom graph moels have attracte significant research

More information

INDEPENDENT COMPONENT ANALYSIS VIA

INDEPENDENT COMPONENT ANALYSIS VIA INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION Truth Rotate S 2 2 1 0 1 2 3 4 X 2 2 1 0 1 2 3 4 4 2 0 2 4 6 4 2 0 2 4 6 S 1 X 1 Reconstructe S^ 2 2 1 0 1 2 3 4 Marginal

More information

LeChatelier Dynamics

LeChatelier Dynamics LeChatelier Dynamics Robert Gilmore Physics Department, Drexel University, Philaelphia, Pennsylvania 1914, USA (Date: June 12, 28, Levine Birthay Party: To be submitte.) Dynamics of the relaxation of a

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10 Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information