Unied Presentation Contents Retrieval using Voice Information

Size: px

Start display at page:

Download "Unied Presentation Contents Retrieval using Voice Information"

Harriet Flynn
5 years ago
Views:

1 DEWS2006 6c-o1,, UPRISE (Unified Presentation slide Retrieval by Impression Search Engine) UPRISE UPRISE 4 e-learning Unied Presentation Contents Retrieval using Voice Information Hiroaki OKAMOTO, Wataru NAKANO, Takashi KOBAYASHI, Satoshi NAOI,, Haruo YOKOTA,, Koji IWANO, and Sadaoki FURUI Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology Global Scientic Information & Computing Center, Tokyo Institute of Technology FUJITSU LABORATORIES LTD {okamoto,wnakano}@decstitechacjp, tkobaya@gsictitechacjp,, naoisatoshi@jpfujitsucom, {yokota,iwano,furui}@cstitechacjp Abstract We have proposed the UPRISE (Unified Presentation slide Retrieval by Impression Search Engine) to store lecture videos and presentation slides used in the lectures It provides dedicated scene search functions for the metadata-based unified presentation contents, using the slide structure, scene duration, and context information In this paper, we use the voice information in the lecture videos to improve the precision of scene retrieval using the speech recognition technique We propose four ways to add the influence of voice to the scene search functions of the UPRISE, and evaluate the effect of them The experimental results indicate that the voice information is effective to improve the precision Key words information integration, information retrieval, e-learning 1 [1] [3] e-learning e-learning

1 3 UPRISE 4 5 2 UPRISE UPRISE 2 1 UPRISE UPRISE 1 ( CSJ) [12], [13] UPRISE [11]

2 UPRISE(Unied Presentation slide Retrieval by Impression Search Engine) [4] [11] UPRISE / "!$#%! 1 3 UPRISE UPRISE UPRISE 2 1 UPRISE UPRISE 1 ( CSJ) [12], [13] UPRISE [11] [14] UPRISE [15] [7] UPRISE UPRISE [16], [17] [6] 2 2 UPRISE I p I p UPRISE L I p (s, k) = P(s, l) C(s, k, l) 2 UPRISE UPRISE l=1 2 2 s k l P(s, l) &('*)

3 s l C(s, k, l) s l [11] k P(s, l) I d I d I p ( ) θ T(s) I d (s, k, θ, u) = I p (s, k) u UPRISE T(s) s θ u I c I c I d 3 1 I c (s, k, θ, u, δ, ε 1, ε 2 ) = s+δ γ=s δ E(γ s, ε 1, ε 2 ) I d (s, k, θ, u) δ E(γ s, ε 1, ε 2 ) E(γ s, ε 1, ε 2 ) exp(ε 1 x) (x < 0) E(x, ε 1, ε 2 ) = [19], [20] exp( ε 2 x) (x > = 0) [21] δ [21] (CSJ) [12], [13] ε δ = 4, ε 1 = 50, ε 2 = 05 4 CSJ 967 ( 300 ) UPRISE I p I d I c ipadic 3 ( ) ( 300 ) 22,860 [9] [10] [9], [10] ( 90 ) 3 UPRISE Julius 1 [18] Julius 2 2 UPRISE, [21] 3 2 Julius ( ) ( ) XML forgejp/ 2 UPIRISE 3

XML XML fragment fragment ( ) string voice I p 0 voice 1 1 voice string in ( ) constimemilli I c ( ) constimesec ( ) 0 cmscore 3 4 4 cmscore IDF [22] 2 XML IVF 3 3 3 4 3 3 4 I vas (VA if keyword 3 2

4 XML XML fragment fragment ( ) string voice I p 0 voice 1 1 voice string in ( ) constimemilli I c ( ) constimesec ( ) 0 cmscore cmscore IDF [22] 2 XML IVF I vas (VA if keyword 3 2 is in the Scene) I vas (s, k, δ, θ, u, ε 1, ε 2, α) s k I c (s, k, δ, θ, u, ε 1, ε 2 ) + α ( ) T(s) θ u VA(s, k) (Ip 0) ( VA(s, k)(voice Appearance) VA T(s) ) θ = I c (s, k, δ, θ, u, ε 1, ε 2 ) (I p = 0) u α α = 6 I c 1 1 α = 0 I c VC(Voice I vas and Context) I c 0 s+δ ( ) θ T(s) I vac (VA if keyword is in Contexts) VC(s, k, θ, u, δ, ε 1, ε 2 ) = E(γ s, ε 1, ε 2 ) VA(s, k) u γ=s δ IVF(Inverse Voice Frequency) IVF(k) = log k I vac (s, k, δ, θ, u, ε 1, ε 2, α) I c (s, k, δ, θ, u, ε 1, ε 2 ) + α ( ) T(s) θ u VA(s, k) (Ic 0) = 0 (I c = 0) I vac ( (VC) ) I vcs

5 I vcs (s, k, δ, θ, u, ε 1, ε 2, α) I c (s, k, δ, θ, u, ε 1, ε 2 ) + αvc(s, k, δ, θ, u, ε 1, ε 2 ) (I p 0) = I c (s, k, δ, θ, u, ε 1, ε 2 ) (I p = 0) ( ) 254% (I c ) 0, (VC) I vcc,, I vcc (s, k, δ, θ, u, ε 1, ε 2, α) I c (s, k, δ, θ, u, ε 1, ε 2 ) + αvc(s, k, δ, θ, u, ε 1, ε 2 ) (I c 0) = 0 (I c = 0) 3 3 IVF [9] I vas I vac I vcs I vcc IS FP IVF IS FC IS FC IVF IS FC(k) = log k θ = 05 u = 60 δ = 4 ε 1 = 50 ε 2 = 05 4 I c 124 I x IS FC IVF x vas vac vcs vcc I p I c I c I x IS FC IVF (s, k, δ, θ, u, ε 1, ε 2, α) I c IS FC(k) + αi x IVF(k) = I c IS FC(k) 4 (I p 0 I c 0) (I p = 0 I c = 0) IS FC IVF UPRISE IS FC α % 267% I vas I vac I vcs I vcc α UPRISE ( 11 ) ( 12 ) [21] , (recall) (pre ,036 cision) [23] 1 ( ) 1

6 %& ' $ +, - *!#" ')(! "#!$ "#%$& "#%$&$ 3 4 α (5 ) α (1 ) UPRISE 0 1 α = I p 1 N = 1 N 1 N i i= α α α I vas I vac I vcs I vcc 0=I c I c 0560(α = 0 ) I c I vas 0035 I vac I vcs 0059 I vcc 005 I vcs α I vac α = 9, 10 2 ' ' ' ' ' I c ' 3 α α ' ' I c 14 I vcs α α I c 5 2 (I vcs ) α = 0 α = 10 α = 100 α =

7 3 I vcs I vcs IS FC I vcs IS FC IVF α = α = α = α = α = ' ' I vcs IS FC I p 4 (α = 5) I vcs I vcs IS FC IVF I vcs IS FC ' ' ( ) ( ) α α = 5 α = I vcs ' ' 4 ' ' ' ' ' 7 ' 3 α = 5 70 ' ' ' ' I vcs IS FC IVF I vcs IS FC IVF I vcs IS FC IVF IS FC 2 I vcs IS FC 3 α = 0( ) 0 ( IS FC ) IS FC I vcs IS FC α = 5 I vcs α ' ' I vcs α = 20 I vcs IS FC IVF 0623 ' ' ' ' I vcs IS FC IVF I vcs IS FC I vcs α = 15 I c I vcs IS FC IVF 0063(63%) 0623

8 K "! # $ % & "' (*) +-,& 6!#"%$'&(!)*$ +-, 0/ ":$'&;<=)*$ +?>0@BA!#"%$'&E+->2@BA GF!>0@HA CBDJI CBD*3 7 ( ) CBD* ( ) [11] 3 3 IDF [22] 1 IVF UPRISE UPRISE I 4 2 c ( 0560) I I vcs vcs 0059(59%) 0619 α ( I vcs IS FC I vcs IS FC IVF ) α 1 I 5 1 c ( 0560) 0063(63%) 0623 I vcs 6 ( ) α = α = 3

9 In Proc of WIRI2005, pp 47 52, April 2005 [10] Wataru Nakano, Yuta Ochi, Takashi Kobayashi, Yutaka Katsuyama, Satoshi Naoi, and Haruo Yokota Unied Presentation Contents Retrieval Using Laser Pointer Information In Proc of SWOD, pp , April 2005 [11],,,, 2005-dbs-137 (ii) (78), [24] pp ,, July 2005 ( ) [12] gojp/ csj/public/ [25] [13] Kikuo Maekawa, Hanae Koiso, Sadaoki Furui, and Hitoshi Isahara Spontaneous speech corpus of japanese In Proc LREC2000, Vol 2, pp , Athens, Greece, May 2000 [14] S Johnson, P Jourlin, G Moore, K S Jones, and PWoodland The 4 3 Cambridge University spoken document retrieval system In Proceedings of ICASSP'99, pp 49 52, [15],, 8, March 2002 [16],, Technical Report 2005-SLP ,, February 2005 [17],, e-learning, March 2004 [18], Julius 1, Vol 20, No 1, pp 41 49, 2005 [19],,, SP ,, Dec 2000 [20] Takahiro Shinozaki, Chiori Hori, and Sadaoki Furui Towards automatic transcription of spontaneous presentations In Proc Eu- rospeech2001, Vol 1, pp , Aalborg, Denmark, Sep 2001 Julius / [21], February 2005 [22] G Salton Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer Addison-Wesley, 1988 [23] D A Grossman and O Frieder Information Retrieval Algorithm and ( , ) Heuristics Kluwer, 1998 CREST 21 [24] A Singhal and F Pereria Document expansion for speech retrieval COE In Proc of ACM SIGIR 99, pp 34 41, 1999 [25] S Srinivasan and D Petkovic Phonetic confusion matrix based spoken document retrieval In Proc of ACM SIGIR2000, pp 81 87, 2000 [1] R Müller and T Ottmann The Authoring on the Fly system for automated recording and replay of (tele)presentations Multimedia Syst, Vol 8, No 3, pp , 2000 [2] Carnegie Mellon University The Informedia Project Informedia ii digital video library [3] G D Abowd Classroom 2000: an experiment with the instrumentation of a living edu cational environment IBM Syst J, Vol 38, No 4, pp , 1999 [4] DBS ,, July 2001 [5],,,, In Proc of DBWeb2002, pp , 2002 [6] Haruo Yokota, Takashi Kobayashi, Taichi Muraki, and Satoshi Naoi UPRISE: Unied Presentation Slide Retrieval by Impression Search Engine IEICE Transactions on Information and Systems, Vol E87- D, No 2, pp , February 2004 [7],,,, Vol J88-D-I, No 3, pp , March 2005 [8],,, pp DEWS B 3 DE, March 2004 [9] Hiroaki Okamoto, Takashi Kobayashi, and Haruo Yokota Presentation Retrieval Method Considering the Scope of Targets and Outputs

Considering Granularity of Laser Pointer and Speech in Information Integration for Lecture Scene Retrieval

Considering Granularity of Laser Pointer and Speech in Information Integration for Lecture Scene Retrieval DEWS2007 E1-3,, 152 8552 2 12 1 152 8550 2 12 1 211 8588 4 1 1 E-mail: wnakano@de.cs.titech.ac.jp, tkobaya@gsic.titech.ac.jp,, naoi.satoshi@jp.fujitsu.com, {yokota,furui}@cs.titech.ac.jp UPRISE e-learning