Cross-language Retrieval Experiments at CLEF-2002
1 Cross-language Retrieval Experiments at CLEF-2002. Aitao Chen, School of Information Management and Systems, University of California at Berkeley. CLEF 2002 Workshop: 19-20 September 2002, Rome, Italy.
2 Talk Outline: overview of our CLEF-2002 experiments; evaluation of merging strategies; German/Dutch decompounding; query expansion; conclusions.
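3 Overview of Multilingual Information Retrieval Experiments. [Diagram] The English query is run directly against the English docs (IR/RF). Babelfish and L&H translate the query into French, German, Italian and Spanish (labelled French 1/2, German 1/2, Italian 1/2, Spanish 1/2). Each translated query is run against the docs in its own language (IR/RF; IR/RF/DC for German). A merger combines the ranked lists from the English, French, German, Italian and Spanish docs into one combined ranked list of documents.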
4 Multilingual Information Retrieval: Direct Merging. Each collection (English, French, Italian, German, Spanish docs) returns a ranked list of documents with raw scores: E1 e1, E2 e2, ..., E1000 e1000; F1 f1, F2 f2, ..., F1000 f1000; I1 i1, I2 i2, ..., I1000 i1000; G1 g1, G2 g2, ..., G1000 g1000; S1 s1, S2 s2, ..., S1000 s1000. (1) Combine the ranked lists; (2) sort by raw score; (3) take the top 1000 docs. Weakness: the raw relevance scores in the individual ranked lists may not be comparable.
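The three direct-merging steps can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments: the `RankedList` type, the function name `direct_merge` and the toy scores are all assumptions made for the example.

```python
import heapq
from typing import Dict, List, Tuple

# A ranked list is a sequence of (doc_id, raw_score) pairs, best first.
RankedList = List[Tuple[str, float]]


def direct_merge(runs: Dict[str, RankedList], k: int = 1000) -> RankedList:
    """Pool the per-collection lists, sort by raw score, keep the top k."""
    # (1) combine the ranked lists
    pooled = [pair for run in runs.values() for pair in run]
    # (2) sort by raw score and (3) take the top k documents
    return heapq.nlargest(k, pooled, key=lambda pair: pair[1])


# Toy input (hypothetical scores).
toy_runs = {
    "english": [("E1", 0.9), ("E2", 0.4)],
    "french": [("F1", 0.7), ("F2", 0.6)],
}
print(direct_merge(toy_runs, k=3))
```

Because the scores are compared as they are, a collection whose scores happen to run higher will dominate the merged list, which is the weakness the slide names.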
5 Multilingual Information Retrieval: Normalized Merging. Each collection (English, French, Italian, German, Spanish docs) returns a ranked list with raw scores: E1 e1, E2 e2, ..., E1000 e1000; F1 f1, ..., F1000 f1000; I1 i1, ..., I1000 i1000; G1 g1, ..., G1000 g1000; S1 s1, ..., S1000 s1000. Every score is divided by the top score of its own list: E1 e1/e1, E2 e2/e1, ..., E1000 e1000/e1; F1 f1/f1, F2 f2/f1, ..., F1000 f1000/f1; I1 i1/i1, ..., I1000 i1000/i1; G1 g1/g1, ..., G1000 g1000/g1; S1 s1/s1, ..., S1000 s1000/s1. (1) Combine the ranked lists; (2) sort by normalized score; (3) take the top 1000 docs. Weakness: sensitive to a skewed distribution of relevant documents over the document subcollections.
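A minimal Python sketch of normalized merging, dividing every score by the top score of its own list before pooling. The function name `normalized_merge`, the assumption that top scores are positive, and the toy scores are illustrative choices, not details taken from the talk.

```python
from typing import Dict, List, Tuple

# A ranked list is a sequence of (doc_id, raw_score) pairs, best first.
RankedList = List[Tuple[str, float]]


def normalized_merge(runs: Dict[str, RankedList], k: int = 1000) -> RankedList:
    """Divide each score by its list's top score, pool, sort, keep the top k."""
    pooled: RankedList = []
    for run in runs.values():
        if not run:
            continue
        top_score = run[0][1]
        if top_score <= 0:
            # Assumption of this sketch: scores are positive.
            raise ValueError("normalization expects a positive top score")
        pooled.extend((doc, score / top_score) for doc, score in run)
    # sorted() is stable, so tied documents keep their pooled order.
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)[:k]


# Toy input (hypothetical scores on very different scales).
toy_runs = {
    "english": [("E1", 10.0), ("E2", 5.0)],
    "french": [("F1", 2.0), ("F2", 1.5)],
}
print(normalized_merge(toy_runs, k=3))
```

After normalization the top document of every list scores 1.0, so each collection contributes documents near the top of the merged list even when it holds few relevant documents, which is the weakness the slide names.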
6 Optimal Merging (Known Relevance) (red = relevant doc; black = irrelevant doc)
Table 1 (ranked lists): Rank 1: A1, B1, C1. Rank 2: A2, B2, C2. Rank 3: A3, B3, C3. Rank 4: A4, B4, C4.
Table 2 (optimal ranking, built set by set): the first set is (0,1) {A1}.
Table 3 (sets per run, written as (number of irrelevant docs, number of relevant docs) {members}): Run A: (0,1) {A1}; (1,1) {A2,A3}; (1,0) {A4}. Run B: (2,1) {B1,B2,B3}; (1,0) {B4}. Run C: (1,3) {C1,C2,C3,C4}.
From the set of active sets, choose the set with the smallest number of irrelevant documents but the largest number of relevant documents.
7 Optimal Merging (Known Relevance) (red = relevant doc; black = irrelevant doc)
Sets per run: Run A: (0,1) {A1}; (1,1) {A2,A3}; (1,0) {A4}. Run B: (2,1) {B1,B2,B3}; (1,0) {B4}. Run C: (1,3) {C1,C2,C3,C4}.
Sets in optimal order: (0,1) {A1}; (1,3) {C1,C2,C3,C4}; (1,1) {A2,A3}; (2,1) {B1,B2,B3}; (1,0) {A4}; (1,0) {B4}.
Optimal ranking (ranks 1 to 12): A1, C1, C2, C3, C4, A2, A3, B1, B2, B3, A4, B4.
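The set-by-set construction can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the slides: a set is taken to be a run of irrelevant documents followed by the relevant documents that come right after it, and "smallest number of irrelevant, largest number of relevant" is read as "highest share of relevant documents, ties broken by more relevant documents". That reading reproduces the ordering in the slide's example; the relevance judgments below are the ones implied by the set labels.

```python
from collections import deque
from typing import Dict, List, Set


def split_into_sets(run: List[str], relevant: Set[str]) -> List[List[str]]:
    """Cut a ranked list into sets: irrelevant docs followed by relevant docs."""
    sets: List[List[str]] = []
    current: List[str] = []
    seen_relevant = False
    for doc in run:
        is_relevant = doc in relevant
        # An irrelevant doc after a relevant one starts a new set.
        if current and seen_relevant and not is_relevant:
            sets.append(current)
            current, seen_relevant = [], False
        current.append(doc)
        seen_relevant = seen_relevant or is_relevant
    if current:
        sets.append(current)
    return sets


def optimal_merge(runs: Dict[str, List[str]], relevant: Set[str]) -> List[str]:
    """Repeatedly take the best active set (the first unused set of each run)."""
    queues = {name: deque(split_into_sets(docs, relevant))
              for name, docs in runs.items()}

    def quality(name: str):
        head = queues[name][0]
        n_rel = sum(doc in relevant for doc in head)
        return (n_rel / len(head), n_rel)

    merged: List[str] = []
    while any(queues.values()):
        active = [name for name, queue in queues.items() if queue]
        best = max(active, key=quality)  # first run wins exact ties
        merged.extend(queues[best].popleft())
    return merged


runs = {
    "A": ["A1", "A2", "A3", "A4"],
    "B": ["B1", "B2", "B3", "B4"],
    "C": ["C1", "C2", "C3", "C4"],
}
relevant = {"A1", "A3", "B3", "C2", "C3", "C4"}
print(optimal_merge(runs, relevant))
```

Since the procedure needs the relevance judgments, it serves as an upper bound for comparing merging strategies, not as a strategy that could be run on new queries.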
8 Performances of MLIR with Different Merging Strategies (Topics: English, TD)
Per-collection runs, in the slide's order: English docs (.4705), French docs (.4773), German docs (.4479), Italian docs (.4008), Spanish docs (.4567).
Direct Merging (.3762): 72.67% of optimal. Normalized Merging (.3570): 68.96% of optimal. Optimal Merging (.5177).
No. of topics with no relevant docs, per language (English, French, German, Italian, Spanish): values not preserved.
9 Why decompounding?
Topic 09: Computersicherheit (computer security) appears in the title and desc fields, but not in the German document collection.
Topic 88: the Dutch compound gekkekoeienziekte (mad cow disease) is not in the Dutch document collection.
Topic 3: Fussballeuropameisterschaft in the title, but Europameisterschaft im Fußball in the desc (ß/ss).
Topic 5: Scheidungsstatistiken (divorce statistics) in the title, but Statistiken über die Scheidungsraten in the desc.
In Der Spiegel: Literaturnobelpreisträger, Literatur-Nobelpreisträger, Literaturnobelpreis-Trägerin; Literaturnobelpreis vs. Nobelpreis für Literatur.
Babelfish translated Latin America into German as lateinischem Amerika, not Lateinamerika.
Babelfish did not translate Bronchialasthma into English, but it did translate Bronchial and Asthma.
10 German/Dutch Decompounding Procedure. (1) Create a German/Dutch base dictionary consisting of single words only (compounds are excluded). (2) Decompose a compound into component words found in the German/Dutch base dictionary. (3) Choose the decomposition with the minimum number of component words. (4) If more than one decomposition has the minimum number of component words, choose the one with the highest probability.
11 German Decompounding: Example 1. Compound: fussballeuropameisterschaft (European Football Cup).
1. Base dictionary: ball, europa, fuss, fussball, meisterschaft, s.
2. Decompose the compound with respect to the base dictionary: (1) fuss ball europa meisterschaft; (2) fussball europa meisterschaft.
3. Choose the decomposition with the smallest number of component words: fussballeuropameisterschaft = fussball europa meisterschaft.
12 German Decompounding: Example 2. Compound: wintersports (winter sports).
1. Base dictionary: port, ports, s, sport, sports, winter, winters.
2. Decompose the compound with respect to the base dictionary. Decompositions (log p(D) values not preserved): (1) winter s ports; (2) winter sports; (3) winters ports.
3. Choose the most likely decomposition: wintersports = winter sports.
13 Decompounding: Probability of Decomposition. For a compound $C = W_1 W_2 W_3 W_4$: $p(C) = p(W_1)\,p(W_2)\,p(W_3)\,p(W_4)$, where $p(w) = \frac{tfc(w)}{\sum_{i=1}^{n} tfc(w_i)}$ is the relative frequency of $w$ in a collection. $tfc(w)$ is the number of times word $w$ occurs in a corpus; $n$ is the number of unique words (including compounds) in the corpus.
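The decompounding procedure and the probability tie-break can be sketched together in Python. This is a hedged illustration: the function names are invented for the example, the word counts are hypothetical (the slides do not give them), and the relative frequencies are computed over the toy dictionary only, not over all unique words of a corpus. For decompositions with the same number of components this changes every candidate by the same factor, so the choice is unaffected.

```python
from functools import lru_cache
from math import log
from typing import Dict, List, Tuple


def decompositions(word: str, tfc: Dict[str, int]) -> List[List[str]]:
    """Enumerate every way to split `word` into base-dictionary entries."""

    @lru_cache(maxsize=None)
    def split(start: int) -> Tuple[Tuple[str, ...], ...]:
        if start == len(word):
            return ((),)
        found = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in tfc:
                found.extend((piece,) + rest for rest in split(end))
        return tuple(found)

    return [list(parts) for parts in split(0)]


def log_probability(parts: List[str], tfc: Dict[str, int]) -> float:
    """log p(C) = sum of log relative frequencies of the component words."""
    total = sum(tfc.values())
    return sum(log(tfc[part] / total) for part in parts)


def decompound(word: str, tfc: Dict[str, int]) -> List[str]:
    """Fewest components first; ties broken by the highest probability."""
    candidates = decompositions(word, tfc)
    if not candidates:
        return [word]  # not decomposable with this dictionary
    fewest = min(len(parts) for parts in candidates)
    shortest = [parts for parts in candidates if len(parts) == fewest]
    return max(shortest, key=lambda parts: log_probability(parts, tfc))


# Base dictionaries from the two examples, with hypothetical counts.
football = {"ball": 50, "europa": 40, "fuss": 30, "fussball": 60,
            "meisterschaft": 20, "s": 5}
winter = {"port": 10, "ports": 5, "s": 3, "sport": 80, "sports": 40,
          "winter": 100, "winters": 2}

print(decompound("fussballeuropameisterschaft", football))
print(decompound("wintersports", winter))
```

With these made-up counts, "winter sports" beats "winters ports" because both have two components and the product of relative frequencies is larger for the former.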
14 Evaluation of Decompounding. Columns: test collection; run type; without decompounding; with decompounding; change (%). Rows (values not preserved):
CLEF-2002, German-German
CLEF-2002, Dutch-Dutch
CLEF-2001, Dutch-Dutch
CLEF-2002, English-German (L&H)
CLEF-2002, English-German (Babelfish)
CLEF-2002, French-German (Babelfish)
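15 Document Ranking. [Diagram: the terms of a query are matched against the terms of the documents.] Matching statistics: $n$; $\{qtf_i, dtf_i, ctf_i\}$ for $i = 1..n$; $ql$, $dl$, $cl$.
$x_1 = \sum_{i=1}^{n} \frac{qtf_i}{ql + c}$, $x_2 = \sum_{i=1}^{n} \log \frac{dtf_i}{dl + c}$, $x_3 = \sum_{i=1}^{n} \log \frac{ctf_i}{cl}$, $x_4 = n$
$\mathrm{logit}(p(q, d)) = \log \frac{p}{1 - p} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$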
16 Query Expansion. (1) Select terms from the top-ranked documents after the initial search. (2) Assign weights to the selected terms. (3) Combine the selected terms with the original query terms.
17 Query Expansion (2)
Step 1: term selection. Contingency table for a term (columns: relevant, irrelevant): indexed: $n_1$, $n_2$; not indexed: $n_3$, $n_4$. Term weight: $w = \frac{n_1 / n_3}{n_2 / n_4}$.
1) Rank the terms in the presumed relevant documents by $w$ in descending order; 2) choose the top-ranked $m$ terms, where $m$ = 2 * average number of unique query terms. Alternative weighting schemes include a) maximum likelihood ratio; b) chi-square statistics; c) mutual information.
Steps 2 & 3: term weighting and merging.
Initial query: T1 (1), T2 (2), T3 (1).
Selected terms: T2 (2*0.5), T3 (1*0.5), T4 (0.5).
Expanded query: T1 (1.0), T2 (3.0), T3 (1.5), T4 (0.5).
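A minimal Python sketch of the selection and merging steps. Several details are assumptions of the sketch, not statements from the slide: the term weight is read as the odds ratio of the 2x2 table (n1, n2 = relevant and irrelevant documents indexed by the term; n3, n4 = relevant and irrelevant documents not indexed by it), a smoothing constant `eps` is added to avoid division by zero, and a selected term is weighted at 0.5 times its initial-query weight (or 0.5 if it is new). That weighting reproduces the slide's worked example.

```python
from typing import Dict, List, Tuple


def selection_weight(n1: int, n2: int, n3: int, n4: int,
                     eps: float = 0.5) -> float:
    """Odds-ratio weight (n1/n3)/(n2/n4); eps smoothing is an assumption."""
    return ((n1 + eps) / (n3 + eps)) / ((n2 + eps) / (n4 + eps))


def select_terms(tables: Dict[str, Tuple[int, int, int, int]],
                 m: int) -> List[str]:
    """Step 1: rank candidate terms by weight, descending; keep the top m."""
    ranked = sorted(tables, key=lambda term: selection_weight(*tables[term]),
                    reverse=True)
    return ranked[:m]


def expand_query(initial: Dict[str, float], selected: List[str],
                 factor: float = 0.5) -> Dict[str, float]:
    """Steps 2 and 3: weight the selected terms and merge with the query."""
    expanded = {term: float(weight) for term, weight in initial.items()}
    for term in selected:
        base = initial.get(term, 1.0)  # a new term counts as weight 1
        expanded[term] = expanded.get(term, 0.0) + factor * base
    return expanded


# Hypothetical contingency tables for two candidate terms.
tables = {"a": (8, 2, 2, 88), "b": (1, 20, 9, 70)}
print(select_terms(tables, m=1))

# The worked example: T1 (1), T2 (2), T3 (1) plus selected T2, T3, T4.
initial_query = {"T1": 1, "T2": 2, "T3": 1}
print(expand_query(initial_query, ["T2", "T3", "T4"]))
```

Terms already in the query gain weight and new terms enter at a reduced weight, so the original query keeps its dominant role in the expanded query.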
18 Evaluation of Query Expansion (10 terms/10 docs). Columns: run id; run type; no query expansion; query expansion; change (%). Rows (values not preserved):
bky2monl, Dutch-Dutch
bky2mofr, French-French
bky2mode, German-German
bky2moit, Italian-Italian
bky2moes, Spanish-Spanish
bky2bienfr, English-French
bky2bienfr2, English-French
bky2bidefr, German-French
bky2biende, English-German
bky2bifrde, French-German
bky2bienit, English-Italian
bky2bienes, English-Spanish
bky2biennl, English-Dutch
19 Evaluation of Decompounding, Stemming and Query Expansion in Monolingual Retrieval (Topics: German, TD). Values with the change over the baseline in parentheses, in the slide's order:
decomp+stem+expan: .5234 (51.18%)
decomp+stem, decomp+expan, stem+expan: .4393 (26.89%), .4517 (30.47%), .4393 (26.89%)
decomp, stem, expan: .3859 (11.47%), .3633 (4.94%), .4145 (19.73%)
baseline: .3465
20 English-to-French Dictionary Built from Parallel Texts. Example phrases and the translation candidates for the ambiguous word, with probabilities as transcribed:
1. fall in sale of cars: autome 0.32, tomber 0.08, telever 0.06
2. ski race; car race: race 0.60, courser 0.8, racial 0.05
3. pop star; galaxy star: star 0.62, etoile 0.3, etoiler
4. rock music: rock 0.89, rocher 0.02, pierre 0.0
5. lead singer: mener 0.3, conduire 0.08, amener 0.07, principal
Runs and resources, in the slide's order (AP values not preserved): bky2enfr3, bky2enfr4, bky2enfr5; Babelfish, L&H, parallel texts.
21 Conclusions. (1) The simplest direct merging method worked better than the score-normalized method when the intermediate ranked lists were produced under similar conditions (e.g., roughly the same query length, the same number of terms selected from the same number of documents). (2) Decompounding improved the performance of German/Dutch monolingual retrieval and of cross-language retrieval to German; the margin of improvement varies from one topic set to another. (3) Query expansion substantially improved the performance of monolingual, cross-language, and multilingual retrieval.
22 THANK YOU
MA211 : Calculus, Part 1 Lecture 2: Sets and Functions Dr Niall Madden (Mathematics, NUI Galway) Wednesday, 10 September 2008 MA211 Lecture 2: Sets and Functions 1/33 Outline 1 Short review of sets 2 Sets
More informationPivoted Length Normalization I. Summary idf II. Review
2 Feb 2006 1/11 COM S/INFO 630: Representing and Accessing [Textual] Digital Information Lecturer: Lillian Lee Lecture 3: 2 February 2006 Scribes: Siavash Dejgosha (sd82) and Ricardo Hu (rh238) Pivoted
More informationOutline. Wednesday, 10 September Schedule. Welcome to MA211. MA211 : Calculus, Part 1 Lecture 2: Sets and Functions
Outline MA211 : Calculus, Part 1 Lecture 2: Sets and Functions Dr Niall Madden (Mathematics, NUI Galway) Wednesday, 10 September 2008 1 Short review of sets 2 The Naturals: N The Integers: Z The Rationals:
More informationInternational Semester Modeling and Simulation in Chemical and Process Engineering
International Semester Modeling and Simulation in Chemical and Process Engineering October March Internationales Zentrum Clausthal (IZC) International Center Clausthal (IZC) Clausthal University of Technology
More informationSets and Venn Diagrams
1) Sets and Venn Diagrams In a survey, 100 students are asked if they like basketball (), football (F) and swimming (S). The Venn diagram shows the results. F 20 25 q 12 17 p 8 r S 42 students like swimming.
More informationExercise 1: Basics of probability calculus
: Basics of probability calculus Stig-Arne Grönroos Department of Signal Processing and Acoustics Aalto University, School of Electrical Engineering stig-arne.gronroos@aalto.fi [21.01.2016] Ex 1.1: Conditional
More informationUniversity of Pittsburgh at Johnstown. Morehead State University (Kentucky) COURSE COURSE NUMBER PITT JOHNSTOWN COURSE TITLE CREDITS TRANSFER SUBJECT
TITLE PITT JOHNSTOWN TITLE ACCT 281 Principles of Financial Accounting 3 BUS 0000 Non-Equivalent* 3 ACCT 282 Principles of Managerial Accounting 3 BUS 0000 Non-Equivalent* 3 ACCT 381 Intermediate Accounting
More informationBlog Distillation via Sentiment-Sensitive Link Analysis
Blog Distillation via Sentiment-Sensitive Link Analysis Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani, and Fabrizio Silvestri Istituto di Scienza e Tecnologie dell Informazione, Consiglio Nazionale
More information