Language Information RetrievalÓ j

± i jschang@cs.nthu.edu.tw o ± k j ± j z ± wyw j ± j ÒCross Language Information RetrievalÓ j h ±ÒQuery TranslationÓ ± k h ± ± ± k ± h ± j ± ± ± 1. ± j y k j ± j z ± y p wywj ± 1

j ÒCross-Language Information RetrievalÓ q z ± ± j ä ±åòdocument translationó ä hi ±åòquery translationó ÒMcCarley 1999Ó h ± NTCIR-2 i ÒKando et al. 2001Ó h h Assembly Parade Law Parade and Demonstration Constitution Freedom of speech Indemnification Communism Country separation Council of Grand Justices Legislation Amendments h ± dg ÒWord Sense DisambiguationÓ ÒIde and Veronis 1998, Chen and Chang 1998Ó ±ÒPhrase TranslationÓ ± ± ÒText CollectionÓ ± d g ± ÒLexical ChoiceÓ demonstration ± ä å ± ä å 2

j u 1. ± ÒGey and Chen 1997, Kwok 2001Ó 2. ÒOard 1999, Kwok 2001Ó u ± ± Kwok ± Michael Jordan ±ä å j ± Chen Ò1999Ó Òoccurrence statisticsó ± µ ± j ± ÒKnight and Graehl 1997, Chang et al. 2001Ó j j p j h ± ± ± ÒExample-based ApproachÓ ± ji Òdata-drivenÓ j i ± IBM Watson Brown Ò1988, 1990, 1993Ó ± j p j k u ± ÒStatistical Alignment and Machine TranslationÓ h ± j h ± j g u ± ÒAlignment ProbabilityÓ 3

± 2. ± ± ± ÒDirect ApproachÓ r ÒTransfer ApproachÓ 1980 j ÒEmpirical ApproachÓ ± ± Brown p Brown S ± T ± ÒTranslation ProbabilityÓ Pr( T S ) g (a) ± ÒLexical Translation ProbabilityÓ Pr( S i T j ) (b) ÒFertility ProbabilityÓ Pr( a b ) (c) ÒDistortion ProbabilityÓ Pr(i j, k, m ) S i S i T j T j a S i b T j k S m T Brown ± k ä å ä åqv ÒExpectation and Maximization 4

AlgorithmÓ ä å º ± ä å º ± k EM qv ± ± k ÒNoisy Channel ModelÓ ± N-gram ÒLanguage ModelÓ qv ÒBeam SearchÓ ± 3. ± Brown ± ± ÒalignmentÓ y ± ± S i, i i T j S i j 1 Pr(j i, k, m) 1 Pr(j i, k, m) =0,i i k ± ± A* S 1 S 2 S 3 ± S 1 {T 1, T 2 } S 2 {T 3, T 4 } S 3 {T 5 } A* = (0, 12, 34, 5)Ò 0 Ó k =3 m =5 ± A* 35% A* ÒMaximum Likelihood EstimationÓ 5

Pr MLE ( A* )=0.35 Pr( A* )=P(1 1,3,5) P(2 1,3,5) P(3 2,3,5) P(4 2,3,5) P(5 3,3,5) j Ò0.6Ó P(j i,3,5) k Pr( A* ) < (0.6) 5 = 0.046656 << 0.35 Pr(A*)<<Pr MLE ( A* ) w ± ± ± ÒAlignment ProbabilityÓ ± ± S ± T ± PrÒT SÓ g (a) ± ÒLexical Translation ProbabilityÓ Pr( T(A i ) S i ) (b) ± ÒAlignment ProbabilityÓ Pr ( A k, m )=Pr ( A 0, A 1, A 2,,A k k, m ) S i S i T(A i ) T S i A 0 T S f A i T S i f, i>0 k S m T S i T 6

B(A, i) E(A, i) k S i T(A i )={T B(A,i), T B(A,i)+1, T E(A,i) } A i ={B(A,i), B(A,i)+1,, E(A,i) } 4. ± k g 1. ± j 2. ± u ± k kn 3. ± j k 4. ± j 4.1 BDCr ÒBDC 1992Ó 3 ± ± i i k 96,156 ± ÒP n, Q n Ó,n=1,N h EM qv ª ± ± Och Ò2000Ó 7

IBM 1. Brown EM qv ± 2. ± j jv Pr(j i, k, m )=1/m j 0.5 i 0.5 3I ( i j, k, m) = 1 [1] m k i = k = j = m = S T i k j M Pr( j i,k,m) flight 1 2 1 4 0.875 flight 1 2 2 4 0.875 flight 1 2 3 4 0.625 flight 1 2 4 4 0.375 eight 2 2 1 4 0.375 eight 2 2 2 4 0.625 eight 2 2 3 4 0.875 eight 2 2 4 4 0.875 1 ± 2 ± 4 8 1 Pr( j i,k,m) Òflight eight, Ó 1 v 1 8

E C ± Pr (C E) Pr( C E) N k m n= 1 i= 1 j= 1 = N δ ( E, P ( i)) δ ( C, Q ( j)) Pr( j i, k, m) k m n= 1 i= 1 j= 1 n δ ( E, P ( i)) Pr( j i, k, m) P n (i) P n i Q n (j) Q n j k = P n m = Q n δ(x, y) =1 x = y, δ(x, y) =0 x y n n [2] S i T j i k j m Pr(j i,k,m) Pr(T j S i ) Pr(T j S i ) Pr(j i,k,m) flight 1 2 1 4 0.875 0.00797 0.00697 flight 1 2 2 4 0.875 0.00797 0.00697 flight 1 2 3 4 0.625 0.25770 0.16106 flight 1 2 4 4 0.375 0.16901 0.06338 eight 2 2 1 4 0.375 0.02903 0.01089 eight 2 2 2 4 0.625 0.04839 0.03024 eight 2 2 3 4 0.875 0.06774 0.05927 eight 2 2 4 4 0.875 0.06774 0.05927 2 ± 2 E C E Pr (C E) 0 1 2 2 1 ± 9

4.2 EM qv v EM qv º ÒGreedy MethodÓ ÒP n, Q n Ó 0 ± Ò 2Ó º (threshold) 0 1 0 k y ¼ 0.008 flight eight 2 3 Ò0, 34, 12Ó S i T i i j k m Pr(j i,k,m) Pr(T j S i ) Pr(T j S i ) Pr(j i,k,m) flight 1 3 2 4 0.625 0.25770 0.16106 flight 1 4 2 4 0.375 0.16901 0.06338 eight 2 1 2 4 0.375 0.02903 0.01089 eight 2 2 2 4 0.625 0.04839 0.03024 3Òflight eight, Ó Ò0, 34, 12Ó v - ± k 10

± k m u A count( A ( S, T) ) Pr( A k, m) = [3] count( k = S, m = T ) 4 k m A A 0 A 1 A 2 Pr(A k,m) 2 4 0 12 34 0.572025052 2 4 0 123 4 0.121317560 2 4 0 1 234 0.085479007 2 4 0 1234 0 0.078056136 2 4 0 0 1234 0.065066110 2 4 0 124 3 0.020992809 2 4 0 2 134 0.016585479 2 4 0 3 124 0.007886801 2 4 0 34 12 0.005915101 2 4 0 13 24 0.004059383 2 4 0 134 2 0.003363489 2 4 0 23 14 0.002551612 2 4 0 4 123 0.002319647 2 4 1 0 234 0.002087683 2 4 0 234 1 0.001855718 ± 15 EM qv 601 u 38 u 4 15 4 ¼ 11

1. ± ± ± 2 2. 90% 3. 5 2 4 5 y S T T(A 0 ) S 1 T(A 1 ) S 2 T(A 2 ) T-shaped antenna ƒ T-shaped antenna ƒ X-ray examination X-ray examination irresistible force irresistible force Unwritten law unwritten law Central Asia Central Asia mutual non-interference mutual non-interference undesirable element undesirable element unalterable truth unalterable truth come soon come soon a desperado a desperado 5 5 u v ± 4.2 ± g 12

Òbound morphemeó $empty$ i Òdata sparsenessó $any$ Good-Turing Òsmoothing methodó $any$ ± E C PrÒC EÓ flight 0.6480231012 flight 0.1411528654 flight 0.0602616768 flight $empty$ 0.0296114718 flight 0.0296114718 flight 0.0041786956 flight 0.0041786956 flight 0.0041786956 flight 0.0041786956 flight 0.0041786956 flight 0.0041786956 flight 0.0041786956 flight d 0.0041786956 6 flight $any$ 0.0000009248 flight ± 6 flight ± $empty$ $any$ flight $empty$ 0.0296114718 k ± 4 Ò0,0,1234Ó 13

Ò1,0,234Ó $any$ EM qv $empty$ r 4.3 EM qv v jv 1 ± S T A 0 A 1 A 2 T(A 0 ) T(A 1 ) T(A 2 ) Pr(T S,A) flight eight 0 34 12 0.0000788100 flight eight 0 3 124 0.0000000051 flight eight 12 34 0 $empty$ 0.0000000007 flight eight 12 3 4 0.0000000003 flight eight 2 3 14 0.0000000001 7 Pr( flight eight) 5 jv ÒS, TÓ A v ± Pr(T S, A) A Pr(T S, A) A A (S i, T(A i )) Pr( T S) = max Pr( T S, A) = max Pr( A k, m) A A* A k i= 1 Pr( T ( A ) S ) i i 14

A * = arg max Pr( T S, A) A = arg max Pr( A k, m) A k i= 1 Pr( T( A ) S ) i i [4] k = S, m = T (S, T) =Òflight eight, Ó A ± v A = (0, 12, 34) Pr(T S,A) =Pr(0,12,34 2,4) Pr( flight) Pr( eight) A = (0, 34, 12) Pr(T S,A) =Pr(0,34,12 2,4) Pr( flight) Pr( eight) A = (0, 3, 124) Pr(T S,A) =Pr(0, 3, 124 2,4) Pr( flight) Pr( eight) A = (2, 34, 1) Pr(T S,A) =Pr(2, 34, 1 2,4) Pr( flight) Pr( eight) A = (12, 34, 0) Pr(T S,A) =Pr(12, 34, 0 2,4) Pr( flight) Pr($empty$ eight) 7 Òflight eight, Ó ± 7 A* = (0, 34, 12) 5. ± ± k 10 i EM qv j j 15

S T T(A 0 ) T(A 1 ) T(A 2 ) T(A 0 ) T(A 1 ) T(A 2 ) association football delay flip-flop I demodulator fg fg f g Disgraceful act disregard to secret ballot bearer stock false retrieval used car infix operation jv jv jv 8 jv ± ¼ ± j j 1. j n ± association ± ä å ä å association football ± ± äa å 2. ± ± Òlight verbó make take to 3. ± Òflight eight, Ó ¼ u j Brown u 16

8 Och (2000) 100 ( 2 3 50 ) 2 u S (sure) P (possible) S P S P j (recall) (precision) (error rate) A Ι S recall = S 185 = = 212 0.873 A Ι P precision = A 226 = = 273 0.828 A Ι S + AΙ P 185+ 226 AER( S, P; A) = 1 = 1 = 0.153 A + S 273+ 212 6 ± 6.1 ± ± i k i Òflight attendant Ó Òflight, Ó Òflight attendant Ó flight $any$ $any$ j 17

j œ~ d d œ~ Òflight, Ó ±Òflight, Ó Òflight, Ó ± Òflight, Ó 0 Òattendant, Ó ±Òattendant, Ó Òattendant, Ó Òattendant, Ó Òattendant, $any$ó ± j Òpreservation, Ó Òpreservation, Ó Òpreservation, Ó ± Òflight attendant Ó ± Pr($any$ flight) Pr($any$ eight) m ± Pr(A 2,3) u Ò0, 12, 3Ó Òflight, Ó Òattendant, Ó ± Òflight, Ó Òattendant, Ó j j Òflight, Ó Òattendant, Ó 18

LTP º k LTP j 6.2 ± ± ±j ± ± v 6.2.1 j j h ± ± ± h ± u 1. ± Òdocument frequencyó ± ± 2. k k ± N-gram qv ± ± ª ± ± h S ± T* Pr( T S) = max Pr( T S, A) A 19

T * = arg max Pr( T S) Pr( T ) T = arg max max Pr( T S, A) Pr( T ) T = arg max max Pr( A k, m) T A A k i= 1 Pr( T ( A ) S ) Pr( T ) Pr(T) = ± T k = S m = T Pr(A k, m) Pr(T S) v qv ÒBranch and Bound AlgorithmÓ T* Pr(A k, *) Pr(* S i ) gòsolutionó Òupper boundó Pr(* S i ) Pr(T) N-gram k r g œ 6.2.2 ± ƒ ± ÒKupiec 1993Ó ± k º P v ± A ± T j+1, j+m i i arg max Pr( A n( i), m) Pr( T A, j, m k= 1 j+ 1, j+ m = arg max Pr( A n( i), m) Pr( T A, j, m n( i ) = arg max Pr( A n( i), m) Pr( T A, j, m j+ 1, j+ m P, A) S b, e, A) j+ B( A, K ), j+ E( A, k ) 20 S b+ k 1 A n m )

Pr( A n, m) A S b+k-1 P k T j+b(a,k),j+e(a,k) P k ± A ± qv S T k (P 1, Q 1 ), (P 2, Q 2 ),,(P k, Q k ) (1). ÒPart of speech taggeró S ÒlemmaÓ (2). Òshallow parsingó Òbasic phraseó P 1, P 2,,P k P I = S b(i), e(i), P b =n(i) (3). P I v ± A ± T j ð (i) 1, j m*(i) ÒBranch and Bound AlgorithmÓ A*(i), j*(i), m*(i) argmax Pr( A n( i), m) A, j, m n( i) k = 1 Pr( T j+ B( A, k ), j+ E ( A, k ) S b+ k 1 ) (4). P I (P I, T j ð (i) 1, j m*(i) ) 7. ± ± h ± j 21

Brown v ±ª ƒ ± ± ± j h 1. ± 2. dg h œ ± 3. d Wordnet ± dg ƒf 892420H007001 g ~ ± 1. BDC 1992 The BDC Chinese-English electronic dictionary (version 2.0), Behavior Design Corporation, Taiwan. 2. Brown, P. F., Cocke J., Della Pietra S. A., Della Pietra V. J., Jelinek F., Mercer R. L., and Roosin P. S. 1988 A Statistical Approach to Language Translation, In Proceedings of the 12th International Conference on Computational Linguistics, Budapest, Hungary, pp. 71-76. 22

3. Brown, P. F., Cocke J., Della Pietra S. A., Della Pietra V. J., Jelinek F., Lafferty J. D., Mercer R. L., and Roosin P. S. 1990 A Statistical Approach to Machine Translation, Computational Linguistics, 16/2, pp. 79-85. 4. Brown, P. F., Della Pietra S. A., Della Pietra V. J., and Mercer R. L. 1993 The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 19/2, pp. 263-311. 5. Chang, J. S. et al. 2001. Nathu IR System at NTCIR-2. In Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese and Japanese Text Retrieval and Text Summarization, pp. (5) 49-52, National Institute of Informatics, Japan. 6. Chang, J. S., Ker S. J., and Chen M. H. 1998 Taxonomy and Lexical Semantics From the Perspective of Machine Readable Dictionary, In Proceedings of the third Conference of the Association for Machine Translation in the Americas (AMTA), pp. 199-212. 7. Chen, H.H., G.W. Bian and W.C. Lin. 1999. Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval. In Proceedings of the 37 th Annual Meeting of the Association for Computation Linguistics, pp 215-222. 8. Dagan, I., Church K. W. and Gale W. A. 1993 Robust Bilingual Word Alignment or Machine Aided Translation, In Proceedings of the Workshop on Very Large Corpora Academic and Industrial Perspectives, pp. 1-8. 9. Fung, P. and McKeown K. 1994 Aligning Noisy Parallel Corpora across Language Groups: Word Pair Feature Matching by Dynamic Time Warping, In Proceedings of the First Conference of the Association for Machine Translation in the Americas (AMTA), pp. 81-88, Columbia, Maryland, USA. 10. Gale, W. A. and Church K. W. 1991 Identifying Word Correspondences in Parallel Texts, In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, pp. 152-157. 11. Gey, F C and A. Chen. 1997. Phrase Discovery for English and Cross-Language Retrieval at TREC-6. In Proceedings of the 6 th Text Retrieval Evaluation Conference, pp 637-648. 23

12. Ide, N. and J Veronis. 1998. Special Issue on Word Sense Disambiguation, editors, Computational Linguistics, 24/1. 13. Isabelle, P. 1987 Machine Translation at the TAUM Group, In M. King, editor, Machine Translation Today: The State of the Art, Proceedings of the Third Lugano Tutorial, pp. 247-277. 14. Kando, Noriko, Kenro Aihara, Koji Eguchi and Hiroyuki Kato. 2001. Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese and Japanese Text Retrieval and Text Summarization, National Institute of Informatics, Japan. 15. Kay, M. and Röscheisen M. 1988 Text-Translation Alignment, Technical Report P90-00143, Xerox Palo Alto Research Center. 16. Ker, S. J. and Chang J. S. 1997 A Class-base Approach to Word Alignment, Computational Linguistics, 23/2, pp. 313-343. 17. Knight, K. and J Graehl. 1997. Machine Transliteration, In Proceedings of the 35 th Annual Meeting of the Association for Computational Linguistics and 8 th Conference of ACL European Chapter, pp. 128-135. 18. Kupiec, Julian. 1993 An Algorithm for finding noun phrase correspondence in bilingual corpus, In ACL 31, 23/2, pp. 17-22. 19. Kwok, K L. 2001. NTCIR-2 Chinese, Cross-Language Retrieval Experiments Using PIRCS. In Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese and Japanese Text Retrieval and Text Summarization, pp. (5) 14-20, National Institute of Informatics, Japan. 20. Longman Group 1992 Longman English-Chinese Dictionary of Contemporary English, Published by Longman Group (Far East) Ltd., Hong Kong. 21. McCarley, J. Scott. 1999. Should we Translate the Documents or the Queries in Cross-Language Information Retrieval? In Proceedings of the 37 th Annual Meeting of the Association for Computation Linguistics, pp 208-214. 22. Melamed, I. D. 1996 Automatic Construction of Clean Broad-Coverage Translation Lexicons, In Proceedings of the second Conference of the Association for Machine Translation in the Americas (AMTA), pp. 125-134. 23. Nagao, M. 1986 Machine Translation: How Far Can it Go? Oxford University Press, Oxford. 24

24. Oard, D W and J. Wang. 1999. Effect of Term Segmentation on Chinese/English Cross-Language Information Retrieval. In Proceedings of the Symposium on String and Processing and Information Retrieval. http://www.glue.umd.edu/~oard/research.html. 25. Och, Franz Josef and Hermann Ney. 2000. Improved Statistical Alignment Models. In Proceedings of the 38 th Annual Meeting of the Association for Computation Linguistics. 26. Pirkola, A. 1998. The Effect of Query Structure and Dictionary Setups in Dictionary-based Cross-Language Retrieval. In Proceedings of the 21 st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 55-63. 27. Smadja, F., McKeown K., and Hatzivassiloglou V. 1996 Translating Collocations for Bilingual Lexicons: A Statistical Approach, Computational Linguistics, 22/1, pp. 1-38. 28. Utsuro, T., Ikeda H., Yamane M., Matsumoto M., and Nagao M. 1994 Bilingual Text Matching Using Bilingual Dictionary and Statistics, In Proceedings of the 15th International Conference on Computational Linguistics, pp. 1076-1082. 29. Wu, D. and Xia X. 1994 Learning an English-Chinese Lexicon from a Parallel Corpus, In Proceedings of the first Conference of the Association for Machine Translation in the Americas (AMTA), pp. 206-213. 25