Analyzing Burst of Topics in News Stream


Yusuke Takahashi,†1 Daisuke Yokomoto,†1 Takehito Utsuro†1 and Masaharu Yoshioka†2

Among the various forms of the recent information explosion, the explosion in news streams is a serious problem. This paper studies two types of modeling of the information flow in a news stream, namely burst analysis and topic modeling. First, when one wants to detect topics that receive much more attention than usual, it is usually necessary to carefully watch every article in the news stream at every moment. In such a situation, it is well known in the field of time-series analysis that Kleinberg's modeling of bursts is quite effective in detecting bursts of keywords. Second, topic models such as LDA (latent Dirichlet allocation) and DTM (dynamic topic model) are also quite effective in estimating the distribution of topics over a document collection such as the articles in a news stream. This paper focuses on the fact that Kleinberg's modeling of bursts is usually applied only to bursts of keywords, not to bursts of topics. Based on Kleinberg's modeling of bursts of keywords, we then propose how to measure the bursts of topics estimated by a topic model such as LDA or DTM.

1. Introduction

This paper combines Kleinberg's burst-detection model 5) with topic models such as DTM (dynamic topic model) 3) in order to measure bursts of topics in a news stream.

†1 Graduate School of Systems and Information Engineering, University of Tsukuba
†2 Graduate School of Information Science and Technology, Hokkaido University

2. Kleinberg's Modeling of Bursts 5)

2.1 The Two-State Automaton for Enumerating Bursts

Kleinberg's "enumerating bursts" model describes a batched document stream with a two-state automaton A^2 consisting of a base state q_0 and a burst state q_1. The stream is divided into batches B_1, ..., B_m, where batch B_t contains d_t documents, r_t of which are relevant (i.e., contain the target keyword). Let

    D = \sum_{t=1}^{m} d_t, \qquad R = \sum_{t=1}^{m} r_t .

In the base state q_0, a document is relevant with probability p_0 = R/D; in the burst state q_1, with probability p_1 = p_0 s for a scaling parameter s > 1 (p_1 is capped at 1 if p_0 s exceeds 1). A state sequence q = (q_{i_1}, ..., q_{i_m}) assigns one state to each batch. Since the number of relevant documents generated in state q_i (i = 0, 1) follows the binomial distribution B(d_t, p_i), the cost of generating r_t relevant documents out of d_t in state q_i is

    \sigma(i, r_t, d_t) = -\ln \left[ \binom{d_t}{r_t} p_i^{r_t} (1 - p_i)^{d_t - r_t} \right] .

The cost of a transition from state q_i to state q_j is

    \tau(i, j) =
    \begin{cases}
      (j - i)\,\gamma & (j > i) \\
      0               & (j \le i) ,
    \end{cases}

where the parameter \gamma penalizes moves into the burst state; the default value \gamma = 1 is used.
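As an illustration only (this sketch is not part of the original paper, and the function and variable names are our own), the per-state probabilities and the two cost functions above can be written directly as:

```python
import math

def state_probabilities(r, d, s=2.0):
    """Relevance probabilities of the two-state automaton:
    p_0 = R/D for the base state q_0, p_1 = p_0 * s for the burst state q_1."""
    D, R = sum(d), sum(r)
    p0 = R / D
    p1 = min(p0 * s, 1.0 - 1e-12)  # cap p_1 just below 1 so the logs stay finite
    return p0, p1

def sigma(p_i, r_t, d_t):
    """Generation cost sigma(i, r_t, d_t): negative log binomial probability of
    observing r_t relevant documents among d_t when the state emits with rate p_i."""
    return -(math.log(math.comb(d_t, r_t))
             + r_t * math.log(p_i)
             + (d_t - r_t) * math.log(1.0 - p_i))

def tau(i, j, gamma=1.0):
    """Transition cost tau(i, j): (j - i) * gamma when moving up into the burst
    state, zero when staying in the same state or moving down."""
    return (j - i) * gamma if j > i else 0.0
```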

Given the count sequences r = (r_1, ..., r_m) and d = (d_1, ..., d_m), the total cost of a state sequence q = (q_{i_1}, ..., q_{i_m}) combines the transition costs \tau and the generation costs \sigma:

    c(q \mid r, d) = \left( \sum_{t=0}^{m-1} \tau(i_t, i_{t+1}) \right) + \left( \sum_{t=1}^{m} \sigma(i_t, r_t, d_t) \right) ,

where i_0 = 0, i.e. the automaton starts in the base state. The optimal state sequence is the one of minimum cost, and batches labeled with q_1 are regarded as bursting. The automaton with parameters s and \gamma is denoted A^2_{s,\gamma}; this paper uses s = 2 and \gamma = 1, i.e. A^2_{2,1}.

2.2 Weight of a Burst

Following Kleinberg 5), the weight of a burst of keyword w over a burst interval t_k, ..., t_l is

    bw(t_k, t_l, w) = \sum_{t=t_k}^{t_l} \bigl( \sigma(0, r_t, d_t) - \sigma(1, r_t, d_t) \bigr) ,

i.e. the total reduction in generation cost obtained by labeling the interval with the burst state rather than the base state. When the interval consists of a single batch, t_k = t_l (= t), we abbreviate bw(t, w) = bw(t, t, w).

3. DTM (Dynamic Topic Model) 3)

A topic model represents each of the K topics z_n (n = 1, ..., K) by a word distribution p(w \mid z_n) (w \in V) over the vocabulary V, and each document b by a topic distribution p(z_n \mid b) (n = 1, ..., K). DTM extends LDA (latent Dirichlet allocation) 4) by letting these distributions evolve across time slices, which suits the time-stamped articles of a news stream.
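Continuing the sketch above (still illustrative rather than taken from the paper, and reusing state_probabilities, sigma, and tau defined there), the minimum-cost state sequence can be found with a small dynamic program, and bw(t, w) then follows directly from the two generation costs:

```python
def min_cost_state_sequence(r, d, s=2.0, gamma=1.0):
    """State sequence minimizing c(q | r, d) over the automaton A^2_{s,gamma},
    computed by dynamic programming; the automaton starts in the base state q_0.
    Reuses state_probabilities, sigma and tau from the previous sketch."""
    m = len(r)
    p = state_probabilities(r, d, s)
    INF = float("inf")
    cost = [[INF, INF] for _ in range(m)]   # cost[t][j]: best cost ending in state j at batch t
    back = [[0, 0] for _ in range(m)]
    for j in (0, 1):                         # first batch: transition out of the initial state q_0
        cost[0][j] = tau(0, j, gamma) + sigma(p[j], r[0], d[0])
    for t in range(1, m):
        for j in (0, 1):
            for i in (0, 1):
                c = cost[t - 1][i] + tau(i, j, gamma) + sigma(p[j], r[t], d[t])
                if c < cost[t][j]:
                    cost[t][j], back[t][j] = c, i
    states = [0] * m                         # backtrack the optimal labeling
    states[-1] = 0 if cost[-1][0] <= cost[-1][1] else 1
    for t in range(m - 1, 0, -1):
        states[t - 1] = back[t][states[t]]
    return states, p

def burst_weight(r, d, t, p):
    """Single-batch burst weight bw(t, w) = sigma(0, r_t, d_t) - sigma(1, r_t, d_t)."""
    return sigma(p[0], r[t], d[t]) - sigma(p[1], r[t], d[t])

# hypothetical per-day counts for one keyword: d_t documents, r_t containing the keyword
d = [100] * 7
r = [3, 2, 4, 16, 19, 15, 3]
states, p = min_cost_state_sequence(r, d)
for t, state in enumerate(states):
    if state == 1:                           # batches labeled as bursting
        print(t, round(burst_weight(r, d, t, p), 2))
```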

The distributions p(w \mid z_n) (w \in V) and p(z_n \mid b) (n = 1, ..., K) are estimated with Blei's DTM implementation, with the hyperparameter set to \alpha = 0.01. For each document b in the batch B_t, the topic with the largest p(z_u \mid b) is taken as its topic label, so the set of documents assigned to topic z_n is

    D(z_n) = \{\, b \in B_t \mid z_n = \operatorname{argmax}_{z_u\,(u=1,...,K)} p(z_u \mid b) \,\} .

4. Measuring Bursts of Topics

Combining the keyword burst weight bw(t, w) of Section 2.2 with the topic-word distributions p(w \mid z_n) of Section 3, we define the burst of topic z_n at time t as the burst weights of the individual words aggregated according to their probabilities within the topic:

    bz(t, z_n) = \sum_{w} bw(t, w)\, p(w \mid z_n) .

Figures 2(a) and 2(b) show examples of topics z_n together with their document sets D(z_n). The evaluation uses news articles collected from Yomiuri Online (yomiuri.co.jp/) (56,503 / 38,758 / 62,… / …,945 articles).
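As a final illustrative sketch (again not from the paper; the word lists, weights, and probabilities below are made-up inputs, with p(w | z_n) assumed to come from a trained DTM or LDA model), the topic burst measure is simply the expectation of the per-word burst weights under the topic's word distribution:

```python
def topic_burst(bw_t, p_w_given_z):
    """bz(t, z_n) = sum_w bw(t, w) * p(w | z_n).
    bw_t:        dict mapping word -> burst weight bw(t, w) at time t
                 (words that are not bursting at time t contribute 0).
    p_w_given_z: dict mapping word -> p(w | z_n) for one topic z_n."""
    return sum(p * bw_t.get(w, 0.0) for w, p in p_w_given_z.items())

# made-up example: a topic dominated by earthquake-related words
bw_t = {"earthquake": 41.3, "tsunami": 27.8, "magnitude": 12.5}
p_w_given_topic = {"earthquake": 0.12, "tsunami": 0.09, "magnitude": 0.04, "weather": 0.02}
print(round(topic_burst(bw_t, p_w_given_topic), 2))   # 4.96 + 2.50 + 0.50 = 7.96
```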


References

1) AlSumait, L., Barbará, D. and Domeniconi, C.: On-Line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking, Proc. 8th ICDM, pp. 3-12 (2008).
2) AlSumait, L., Barbará, D., Gentle, J. and Domeniconi, C.: Topic Significance Ranking of LDA Generative Models, Proc. ECML/PKDD (2009).
3) Blei, D.M. and Lafferty, J.D.: Dynamic Topic Models, Proc. 23rd ICML (2006).
4) Blei, D.M., Ng, A.Y. and Jordan, M.I.: Latent Dirichlet Allocation, Journal of Machine Learning Research, Vol. 3, pp. 993-1022 (2003).
5) Kleinberg, J.: Bursty and Hierarchical Structure in Streams, Proc. 8th SIGKDD (2002).
6) Mane, K. and Börner, K.: Mapping topics and topic bursts in PNAS, Proc. Natl. Acad. Sci. USA, Vol. 101, Suppl. 1 (2004).
7) Proc. 3rd DEIM Forum (2011) (in Japanese).

© 2011 Information Processing Society of Japan
