Cross-Language Information Retrieval (CLIR)


1 Cross-Language Information Retrieval (CLIR). Ananthakrishnan R, Computer Science & Engg., IIT Bombay, anand@cse. April. Natural Language Processing/Language Technology for the Web

2 Cross-Language Information Retrieval (CLIR): a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. E.g., using Hindi queries to retrieve English documents. Also called multi-lingual, cross-lingual, or trans-lingual IR.

3 Why CLIR? E.g., on the web we have: documents in different languages; multilingual documents; images with captions in different languages. A single query should retrieve all such resources.

4 Approaches to CLIR
- Query translation (most efficient; commonly used)
  - Knowledge-based: dictionary/thesaurus-based
  - Corpus-based: pseudo-relevance feedback (PRF)
- Document translation (infeasible for large collections)
  - MT (rule-based)
  - MT (EBMT/StatMT)
- Intermediate representation
  - UNL (AgroExplorer)
  - Latent Semantic Indexing
Most effective approaches are hybrid: a combination of knowledge-based and corpus-based methods.

5 Dictionary-based Query Translation. Hindi query (आयरलैंड शांति वार्ता) -> phrase identification; words to be transliterated -> Hindi-English dictionaries -> English query (Ireland peace talks) -> search -> collection
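The dictionary lookup step of this pipeline can be sketched as follows. This is a minimal illustration rather than the system described on the slide: the function name `translate_query`, the romanized query terms, and the toy dictionary entries are all hypothetical, and phrase identification is skipped entirely.

```python
from typing import Dict, List, Tuple


def translate_query(
    terms: List[str], dictionary: Dict[str, List[str]]
) -> Tuple[List[List[str]], List[str]]:
    """Look up each source-language term in a bilingual dictionary.

    Returns one list of candidate translations per term found, plus the
    terms that have no entry and would need to be transliterated.
    """
    candidates: List[List[str]] = []
    to_transliterate: List[str] = []
    for term in terms:
        if term in dictionary:
            candidates.append(list(dictionary[term]))
        else:
            to_transliterate.append(term)
    return candidates, to_transliterate


# Hypothetical romanized entries, for illustration only.
toy_dictionary = {
    "shanti": ["peace", "calm", "tranquility", "silence", "quietude"],
    "varta": ["conversation", "talk", "negotiation", "tale"],
}

candidates, to_transliterate = translate_query(
    ["aayarlaind", "shanti", "varta"], toy_dictionary
)
```

Every dictionary sense is kept at this stage, which is why a later filtering step is needed.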

6 The problem with dictionary-based CLIR: ambiguity. Queries: अंतरिक्षीय घटना; जाली धन; आयरलैंड शांति वार्ता. Candidate translations:
अंतरिक्षीय: cosmic, outer-space
घटना: incident, event, occurrence, lessen, subside, decrease, lower, diminish, ebb, decline, reduce
जाली: lattice, mesh, net, wire_netting, meshed_fabric, counterfeit, forged, false, fabricated, small_net, network, gauze, grating, sieve
धन: money, riches, wealth, appositive, property
आयरलैंड: Ireland
शांति: peace, calm, tranquility, silence, quietude
वार्ता: conversation, talk, negotiation, tale

7 Filtering/disambiguation is required after query translation.

8 Disambiguation using co-occurrence statistics. Hypothesis: correct translations of query terms will co-occur, and incorrect translations will tend not to co-occur.

9 Problem with counting co-occurrences: data sparsity. freq(Marathi, Shallow, Parsing, CRFs), freq(Marathi, Shallow, Structuring, CRFs), and freq(Marathi, Shallow, Analyzing, CRFs) are all zero. How do we choose between parsing, structuring, and analyzing?

10 Pair-wise co-occurrence
अंतरिक्षीय: cosmic, outer-space
घटना: incident, event, occurrence, lessen, subside, decrease, lower, diminish, ebb, decline, reduce
freq(cosmic, incident); freq(cosmic, event); freq(cosmic, lessen) = 7130; freq(cosmic, subside) = 3120
freq(outer-space, incident); freq(outer-space, event); freq(outer-space, lessen) = 2600; freq(outer-space, subside) = 980

11 Shallow Parsing, Structuring, or Analyzing? shallow parsing; shallow structuring; shallow analyzing; CRFs parsing 540; CRFs structuring 125; CRFs analyzing 765. But: analyzing, parsing, structuring; shallow; Marathi parsing; Marathi structuring 511; Marathi analyzing; shallow parsing; shallow structuring 11; shallow analyzing 2. Collocation?

12 Ranking senses using co-occurrence statistics. Use co-occurrence scores to calculate the similarity between two words, $sim(x, y)$: point-wise mutual information (PMI), Dice coefficient, PMI-IR.
$$\text{PMI-IR}(x, y) = \log \frac{\mathrm{hits}(x \text{ AND } y)}{\mathrm{hits}(x)\,\mathrm{hits}(y)}$$
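The PMI-IR score can be computed directly from hit counts. The sketch below assumes the counts have already been obtained from a search engine or corpus index; the numbers used are made up for illustration, and returning negative infinity for a pair that never co-occurs is a choice made here, not something the slides specify.

```python
import math


def pmi_ir(hits_xy: int, hits_x: int, hits_y: int) -> float:
    """PMI-IR(x, y) = log( hits(x AND y) / (hits(x) * hits(y)) )."""
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        # The pair never co-occurs: treat it as the lowest possible score.
        return float("-inf")
    return math.log(hits_xy / (hits_x * hits_y))


# Hypothetical hit counts, for illustration only.
hits = {"cosmic": 50_000, "event": 400_000, "lessen": 20_000}
pair_hits = {("cosmic", "event"): 9_000, ("cosmic", "lessen"): 40}

score_event = pmi_ir(pair_hits[("cosmic", "event")], hits["cosmic"], hits["event"])
score_lessen = pmi_ir(pair_hits[("cosmic", "lessen")], hits["cosmic"], hits["lessen"])
```

With these toy counts, "event" scores higher than "lessen" as a companion of "cosmic", which is the behaviour the co-occurrence hypothesis relies on.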

13 Disambiguation algorithm. User's query: $q^s = \{q^s_1, q^s_2, \ldots, q^s_m\}$. For each $q^s_i$, the set of translations is $S_i = \{t_{i,j}\}$.

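14
1. $sim(t_{i,j}, S_{i'}) = \sum_{t_{i',l} \in S_{i'}} sim(t_{i,j}, t_{i',l})$
2. $score(t_{i,j}) = \sum_{i' \neq i} sim(t_{i,j}, S_{i'})$
3. Translated query: $q^t = \{q^t_1, q^t_2, \ldots, q^t_m\}$, where $q^t_i = \arg\max_{t_{i,j}} score(t_{i,j})$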

15 Example: अंतरिक्षीय घटना
अंतरिक्षीय: cosmic, outer-space
घटना: incident, event, lessen, subside, decrease, lower, diminish, ebb, decline, reduce
score(cosmic) = PMI-IR(cosmic, incident) + PMI-IR(cosmic, event) + PMI-IR(cosmic, lessen) + PMI-IR(cosmic, subside)
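The scoring in this example, summing a candidate's similarity to every translation of the other query words and keeping the best candidate per word, can be sketched as below. The similarity values are hypothetical stand-ins for PMI-IR scores, and `disambiguate` is an illustrative name, not one taken from the slides.

```python
from typing import Callable, List


def disambiguate(
    translation_sets: List[List[str]], sim: Callable[[str, str], float]
) -> List[str]:
    """Pick, for each query word, the translation with the highest score.

    A candidate's score is the sum of its similarities to all candidate
    translations of the *other* query words.
    """
    chosen: List[str] = []
    for i, candidates in enumerate(translation_sets):
        def score(t: str, i: int = i) -> float:
            return sum(
                sim(t, other)
                for k, others in enumerate(translation_sets)
                if k != i
                for other in others
            )
        chosen.append(max(candidates, key=score))
    return chosen


# Hypothetical similarity values standing in for PMI-IR scores.
toy_sim = {
    ("cosmic", "incident"): 1.0,
    ("cosmic", "event"): 3.0,
    ("cosmic", "lessen"): 0.2,
    ("cosmic", "subside"): 0.1,
    ("outer-space", "incident"): 0.5,
    ("outer-space", "event"): 1.5,
}


def sim(x: str, y: str) -> float:
    """Symmetric lookup; unseen pairs get similarity 0."""
    return toy_sim.get((x, y), toy_sim.get((y, x), 0.0))


best = disambiguate(
    [["cosmic", "outer-space"], ["incident", "event", "lessen", "subside"]], sim
)
```

Here `best` comes out as `["cosmic", "event"]`, because those two candidates have the largest summed similarity to the other word's translations.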

16 Disambiguation algorithm: sample outputs
आयरलैंड शांति वार्ता: Ireland peace talks
अंतरिक्षीय घटना: cosmic events
जाली धन: net money?

17 Results on TREC8 (disks 4 and 5). English topics manually translated to Hindi. Assumption: relevance judgments for English topics hold for the translated queries. Results (all TF-IDF), as technique: MAP
Monolingual: 23
All-translations: 16
PMI-based disambiguation: 20.5
Manual filtering: 21.5

18 Pseudo-Relevance Feedback for CLIR

19 User Relevance Feedback (mono-lingual)
1. Retrieve documents using the user's query
2. The user marks relevant documents
3. Choose the top N terms from these documents (IDF is one option for scoring the top terms)
4. Add these N terms to the user's query to form a new query
5. Use this new query to retrieve a new set of documents

20 Pseudo-Relevance Feedback (PRF) (mono-lingual)
1. Retrieve documents using the user's query
2. Assume that the top M documents retrieved are relevant
3. Choose the top N terms from these M documents
4. Add these N terms to the user's query to form a new query
5. Use this new query to retrieve a new set of documents
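The five PRF steps can be sketched on a toy collection. Two things here are assumptions made for the sake of a runnable example: `retrieve` is a deliberately crude term-count ranker, and expansion terms are scored by raw frequency in the top M documents rather than by a weighting such as IDF.

```python
from collections import Counter
from typing import Dict, List


def retrieve(query: List[str], docs: Dict[str, List[str]]) -> List[str]:
    """Toy ranker: order documents by how often they contain query terms."""
    scores = {d: sum(tokens.count(t) for t in query) for d, tokens in docs.items()}
    matching = [d for d in scores if scores[d] > 0]
    return sorted(matching, key=lambda d: (-scores[d], d))


def prf_expand(query: List[str], docs: Dict[str, List[str]], m: int, n: int) -> List[str]:
    """Steps 1-4: retrieve, assume the top m are relevant, add the top n terms."""
    top_docs = retrieve(query, docs)[:m]
    counts = Counter(t for d in top_docs for t in docs[d] if t not in query)
    ranked_terms = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query) + [term for term, _ in ranked_terms[:n]]


# Hypothetical mini-collection, for illustration only.
docs = {
    "d1": "ireland peace talks resume belfast".split(),
    "d2": "peace talks belfast agreement".split(),
    "d3": "belfast agreement signed".split(),
    "d4": "football match results".split(),
}

expanded = prf_expand(["peace", "talks"], docs, m=2, n=1)
second_pass = retrieve(expanded, docs)  # step 5
```

The expansion term "belfast" lets the second pass reach `d3`, which shares no word with the original query.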

21 PRF for CLIR: corpus-based query translation. Uses a parallel corpus of documents: a Hindi collection $H = \{H_1, H_2, \ldots, H_m\}$ and an English collection $E = \{E_1, E_2, \ldots, E_m\}$, where each $H_i$ is aligned with $E_i$.

22 PRF for CLIR
1. Retrieve documents in H using the user's query
2. Assume that the top M documents retrieved are relevant
3. Select the M documents in E that are aligned to the top M retrieved documents
4. Choose the top N terms from these documents
5. These N terms are the translated query
6. Use this query to retrieve from the target collection (which is in the same language as E)
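Steps 1 to 5 can be sketched with a tiny aligned corpus. Everything concrete below is hypothetical: the romanized Hindi tokens, the three document pairs, the term-count ranker, and the use of raw frequency to pick the top N terms.

```python
from collections import Counter
from typing import Dict, List


def retrieve(query: List[str], docs: Dict[int, List[str]]) -> List[int]:
    """Toy ranker: order document ids by how often they contain query terms."""
    scores = {d: sum(tokens.count(t) for t in query) for d, tokens in docs.items()}
    matching = [d for d in scores if scores[d] > 0]
    return sorted(matching, key=lambda d: (-scores[d], d))


def translate_by_prf(
    query_h: List[str],
    hindi_docs: Dict[int, List[str]],
    english_docs: Dict[int, List[str]],
    m: int,
    n: int,
) -> List[str]:
    """Translate a query via the English sides of the top-m Hindi documents."""
    top_ids = retrieve(query_h, hindi_docs)[:m]           # steps 1-2
    aligned = [english_docs[i] for i in top_ids]          # step 3
    counts = Counter(t for doc in aligned for t in doc)   # step 4
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]               # step 5


# Hypothetical parallel corpus: document i in H is aligned with document i in E.
hindi_docs = {
    1: ["shanti", "varta", "aayarlaind"],
    2: ["shanti", "varta"],
    3: ["cricket"],
}
english_docs = {
    1: ["ireland", "peace", "talks"],
    2: ["peace", "talks", "progress"],
    3: ["cricket", "score"],
}

translated = translate_by_prf(["shanti", "varta"], hindi_docs, english_docs, m=2, n=2)
```

No dictionary is consulted at any point; the translation comes entirely from the document alignment.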

23 Cross-Lingual Relevance Models - Estimate relevance models using a parallel corpus

24 Ranking with Relevance Models. The relevance model (or query model) distribution $\Theta_R$ encodes the information need. $P(w \mid \Theta_R)$: probability of word occurrence in a relevant document. $P(w \mid D)$: probability of word occurrence in the candidate document. Ranking function: relative entropy (KL divergence):
$$KL(D \,\|\, \Theta_R) = \sum_{w} P(w \mid D) \, \log \frac{P(w \mid D)}{P(w \mid \Theta_R)}$$
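Ranking by relative entropy can be sketched as follows, assuming the divergence is taken from the document model to the relevance model and that both distributions are already smoothed so that no needed probability is zero. The distributions are toy values chosen for illustration.

```python
import math
from typing import Dict, List


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)); terms with p(w) = 0 contribute 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def rank_by_kl(
    doc_models: Dict[str, Dict[str, float]], relevance_model: Dict[str, float]
) -> List[str]:
    """Order documents by increasing divergence from the relevance model."""
    return sorted(doc_models, key=lambda d: kl_divergence(doc_models[d], relevance_model))


# Hypothetical smoothed distributions over a three-word vocabulary.
relevance_model = {"peace": 0.5, "talks": 0.4, "cricket": 0.1}
doc_models = {
    "d1": {"peace": 0.5, "talks": 0.4, "cricket": 0.1},
    "d2": {"peace": 0.1, "talks": 0.1, "cricket": 0.8},
}

ranking = rank_by_kl(doc_models, relevance_model)
```

A document whose word distribution matches the relevance model exactly has divergence 0 and is ranked first.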

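25 Estimating Mono-Lingual Relevance Models
$$P(w \mid \Theta_R) \approx P(w \mid Q) = P(w \mid h_1 \ldots h_m) = \frac{P(w, h_1 \ldots h_m)}{P(h_1 \ldots h_m)}$$
$$P(w, h_1 \ldots h_m) = \sum_{M \in \mathcal{M}} P(M) \, P(w \mid M) \prod_{i=1}^{m} P(h_i \mid M)$$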

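26 Estimating Cross-Lingual Relevance Models
$$P(w, h_1 \ldots h_m) = \sum_{\{M_H, M_E\} \in \mathcal{M}} P(\{M_H, M_E\}) \, P(w \mid M_E) \prod_{i=1}^{m} P(h_i \mid M_H)$$
$$P(v \mid M_X) = \lambda \, \frac{\mathrm{freq}(v, X)}{\sum_{v'} \mathrm{freq}(v', X)} + (1 - \lambda) \, P(v)$$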
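The estimation can be sketched numerically under several assumptions that the slide does not state: a uniform prior over aligned document pairs, a background distribution playing the role of P(v) in the smoothing formula, λ = 0.6, and normalization over the English vocabulary to turn the joint probability into a conditional one. All data below is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def smoothed_prob(v: str, doc: List[str], background: Dict[str, float], lam: float) -> float:
    """P(v | M_X) = lam * freq(v, X) / |X| + (1 - lam) * P(v)."""
    return lam * doc.count(v) / len(doc) + (1.0 - lam) * background.get(v, 0.0)


def cross_lingual_relevance_model(
    query_h: List[str],
    pairs: List[Tuple[List[str], List[str]]],
    bg_h: Dict[str, float],
    bg_e: Dict[str, float],
    lam: float = 0.6,
) -> Dict[str, float]:
    """Estimate P(w | h_1 ... h_m) for every English word w."""
    prior = 1.0 / len(pairs)  # assumption: uniform prior over document pairs
    joint = {w: 0.0 for w in bg_e}
    for h_doc, e_doc in pairs:
        query_likelihood = math.prod(smoothed_prob(h, h_doc, bg_h, lam) for h in query_h)
        for w in joint:
            joint[w] += prior * smoothed_prob(w, e_doc, bg_e, lam) * query_likelihood
    total = sum(joint.values())
    return {w: p / total for w, p in joint.items()}


# Hypothetical aligned pairs (Hindi side romanized) and uniform backgrounds.
pairs = [
    (["shanti", "varta", "aayarlaind"], ["ireland", "peace", "talks"]),
    (["shanti", "varta"], ["peace", "talks", "progress"]),
    (["cricket"], ["cricket", "score"]),
]
bg_h = {v: 0.25 for v in ["shanti", "varta", "aayarlaind", "cricket"]}
bg_e = {w: 1.0 / 6 for w in ["ireland", "peace", "talks", "progress", "cricket", "score"]}

model = cross_lingual_relevance_model(["shanti", "varta"], pairs, bg_h, bg_e)
```

English words from pairs whose Hindi side explains the query well, such as "peace", receive more probability mass than words from unrelated pairs, such as "cricket".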

27 CLIR Evaluation: TREC (Text REtrieval Conference). TREC CLIR track (2001 and 2002): retrieval of Arabic-language newswire documents from topics in English. Arabic documents (896 MB) with SGML markup; 50 topics. Use of provided resources (stemmers, bilingual dictionaries, MT systems, parallel corpora) is encouraged to minimize variability.

28 CLIR Evaluation: CLEF (Cross Language Evaluation Forum). Major CLIR evaluation forum. Tracks include: multilingual retrieval on news collections (topics will be provided in many languages, including Hindi); multiple-language question answering; ImageCLEF; cross-language speech retrieval; WebCLEF.

29 Summary. CLIR techniques: query translation-based, document translation-based, intermediate representation-based. Query translation using dictionaries followed by disambiguation is a simple and effective technique for CLIR. PRF uses a parallel corpus for query translation. Parallel corpora can also be used to estimate cross-lingual relevance models. CLEF and TREC: important CLIR evaluation conferences.


32 Thank You

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Informaton Search and Management Probablstc Retreval Models Prof. Chrs Clfton 7 September 2018 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group 14 Why probabltes

More information

MSU at ImageCLEF: Cross Language and Interactive Image Retrieval

MSU at ImageCLEF: Cross Language and Interactive Image Retrieval MSU at ImageCLEF: Cross Language and Interactve Image Retreval Vneet Bansal, Chen Zhang, Joyce Y. Cha, Rong Jn Department of Computer Scence and Engneerng, Mchgan State Unversty East Lansng, MI48824, U.S.A.

More information

Retrieval Models: Language models

Retrieval Models: Language models CS-590I Informaton Retreval Retreval Models: Language models Luo S Department of Computer Scence Purdue Unversty Introducton to language model Ungram language model Document language model estmaton Maxmum

More information

Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology

Probabilistic Information Retrieval CE-324: Modern Information Retrieval Sharif University of Technology Probablstc Informaton Retreval CE-324: Modern Informaton Retreval Sharf Unversty of Technology M. Soleyman Fall 2016 Most sldes have been adapted from: Profs. Mannng, Nayak & Raghavan (CS-276, Stanford)

More information

Probabilistic Structured Query Methods

Probabilistic Structured Query Methods Probablstc Structured Query Methods Kareem Darwsh Electrcal and Computer Engneerng Department and UMIACS Unversty of Maryland, College Park, MD 20742 {kareem,oard}@glue.umd.edu Douglas W. Oard College

More information

Probabilistic Structured Query Methods

Probabilistic Structured Query Methods Probablstc Structured Query Methods Kareem Darwsh and Douglas W. Oard 1 Insttute for Advanced Computer Studes Unversty of Maryland, College Park, MD 20742 {kareem,oard}@glue.umd.edu ABSTRACT Structured

More information

Extending Relevance Model for Relevance Feedback

Extending Relevance Model for Relevance Feedback Extendng Relevance Model for Relevance Feedback Le Zhao, Chenmn Lang and Jame Callan Language Technologes Insttute School of Computer Scence Carnege Mellon Unversty {lezhao, chenmnl, callan}@cs.cmu.edu

More information

Similar Sentence Retrieval for Machine Translation Based on Word-Aligned Bilingual Corpus

Similar Sentence Retrieval for Machine Translation Based on Word-Aligned Bilingual Corpus Smlar Sentence Retreval for Machne Translaton Based on Word-Algned Blngual Corpus Wen-Han Chao and Zhou-Jun L School of Computer Scence, Natonal Unversty of Defense Technology, Chna, 40073 cwhk@63.com

More information

Lecture 13: More uses of Language Models

Lecture 13: More uses of Language Models Lecture 13: More uses of Language Models William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 13 What we ll learn in this lecture Comparing documents, corpora using LM approaches

More information

Information Retrieval Language models for IR

Information Retrieval Language models for IR Informaton Retreval Language models for IR From Mannng and Raghavan s course [Borros sldes from Vktor Lavrenko and Chengxang Zha] 1 Recap Tradtonal models Boolean model Vector space model robablstc models

More information

The BBN Crosslingual Topic Detection and Tracking System

The BBN Crosslingual Topic Detection and Tracking System The BBN Crosslngual Topc Detecton and Trackng System Tm Leek, Hubert Jn, Sreenvasa Ssta, Rchard Schwartz BBN Technologes, Cambrdge, MA ABSTRACT Ths was the frst year that the TDT program ncluded a requred

More information

International Journal of Mathematical Archive-3(3), 2012, Page: Available online through ISSN

International Journal of Mathematical Archive-3(3), 2012, Page: Available online through   ISSN Internatonal Journal of Mathematcal Archve-3(3), 2012, Page: 1136-1140 Avalable onlne through www.ma.nfo ISSN 2229 5046 ARITHMETIC OPERATIONS OF FOCAL ELEMENTS AND THEIR CORRESPONDING BASIC PROBABILITY

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Corpora and Statistical Methods Lecture 6. Semantic similarity, vector space models and wordsense disambiguation

Corpora and Statistical Methods Lecture 6. Semantic similarity, vector space models and wordsense disambiguation Corpora and Statstcal Methods Lecture 6 Semantc smlarty, vector space models and wordsense dsambguaton Part 1 Semantc smlarty Synonymy Dfferent phonologcal/orthographc words hghly related meanngs: sofa

More information

Evaluation for sets of classes

Evaluation for sets of classes Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton

More information

Multigradient for Neural Networks for Equalizers 1

Multigradient for Neural Networks for Equalizers 1 Multgradent for Neural Netorks for Equalzers 1 Chulhee ee, Jnook Go and Heeyoung Km Department of Electrcal and Electronc Engneerng Yonse Unversty 134 Shnchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea ABSTRACT

More information

Complex Question Answering with ASQA at NTCIR 7 ACLIA

Complex Question Answering with ASQA at NTCIR 7 ACLIA Complex Queston Answerng wth ASQA at NTCIR 7 ACLIA Y-Hsun Lee 1, Cheng-We Lee 12, Cheng-Lung Sung 1, Mon-Tn Tzou 1, Chh-Chen Wang 1, Shh-Hung Lu 1, Cheng-We Shh 1, Pe-Yn Yang 1, Wen-Lan Hsu 1 1 Insttute

More information

Search sequence databases 2 10/25/2016

Search sequence databases 2 10/25/2016 Search sequence databases 2 10/25/2016 The BLAST algorthms Ø BLAST fnds local matches between two sequences, called hgh scorng segment pars (HSPs). Step 1: Break down the query sequence and the database

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

On the Effectiveness of Relevance Profiling

On the Effectiveness of Relevance Profiling On the Effectveness of Relevance Proflng Davd J Harper Smart Web Technologes Centre The Robert Gordon Unversty Aberdeen AB25 1HG UK d.harper@rgu.ac.uk Davd Lee Smart Web Technologes Centre The Robert Gordon

More information

EM and Structure Learning

EM and Structure Learning EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder

More information

xp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ

xp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ CSE 455/555 Sprng 2013 Homework 7: Parametrc Technques Jason J. Corso Computer Scence and Engneerng SUY at Buffalo jcorso@buffalo.edu Solutons by Yngbo Zhou Ths assgnment does not need to be submtted and

More information

Selecting Good Expansion Terms for Pseudo-Relevance Feedback

Selecting Good Expansion Terms for Pseudo-Relevance Feedback Guhong Cao, Jan-Yun Ne Department of Computer Scence and Operatons Research Unversty of Montreal, Canada {caogu, ne}@ro.umontreal.ca Selectng Good Expanson erms for Pseudo-Relevance Feedback ABSRAC Pseudo-relevance

More information

Classification. Representing data: Hypothesis (classifier) Lecture 2, September 14, Reading: Eric CMU,

Classification. Representing data: Hypothesis (classifier) Lecture 2, September 14, Reading: Eric CMU, Machne Learnng 10-701/15-781, 781, Fall 2011 Nonparametrc methods Erc Xng Lecture 2, September 14, 2011 Readng: 1 Classfcaton Representng data: Hypothess (classfer) 2 1 Clusterng 3 Supervsed vs. Unsupervsed

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

This excerpt from. Foundations of Statistical Natural Language Processing. Christopher D. Manning and Hinrich Schütze The MIT Press.

This excerpt from. Foundations of Statistical Natural Language Processing. Christopher D. Manning and Hinrich Schütze The MIT Press. Ths excerpt from Foundatons of Statstcal Natural Language Processng. Chrstopher D. Mannng and Hnrch Schütze. 1999 The MIT Press. s provded n screen-vewable form for personal use only by members of MIT

More information

Machine Learning for IR. Outline. Learning to Rank. MAP vs Accuracy. Mean Average Precision 3/9/2010. Information Retrieval as Structured Prediction

Machine Learning for IR. Outline. Learning to Rank. MAP vs Accuracy. Mean Average Precision 3/9/2010. Information Retrieval as Structured Prediction /9/00 Informaton Retreval as Structured Predcton S 6784 March 4 th, 00 Ysong Yue ornell Unverst Jont or th: horsten Joachms, Flp Radlns, and homas Fnle Machne Learnng for IR Machne learnng often used (learnng

More information

Automatic Object Trajectory- Based Motion Recognition Using Gaussian Mixture Models

Automatic Object Trajectory- Based Motion Recognition Using Gaussian Mixture Models Automatc Object Trajectory- Based Moton Recognton Usng Gaussan Mxture Models Fasal I. Bashr, Ashfaq A. Khokhar, Dan Schonfeld Electrcal and Computer Engneerng, Unversty of Illnos at Chcago. Chcago, IL,

More information

A Particle Filter Algorithm based on Mixing of Prior probability density and UKF as Generate Importance Function

A Particle Filter Algorithm based on Mixing of Prior probability density and UKF as Generate Importance Function Advanced Scence and Technology Letters, pp.83-87 http://dx.do.org/10.14257/astl.2014.53.20 A Partcle Flter Algorthm based on Mxng of Pror probablty densty and UKF as Generate Importance Functon Lu Lu 1,1,

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

Web-Mining Agents Probabilistic Information Retrieval

Web-Mining Agents Probabilistic Information Retrieval Web-Mnng Agents Probablstc Informaton etreval Prof. Dr. alf Möller Unverstät zu Lübeck Insttut für Informatonssysteme Karsten Martny Übungen Acknowledgements Sldes taken from: Introducton to Informaton

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

The Synchronous 8th-Order Differential Attack on 12 Rounds of the Block Cipher HyRAL

The Synchronous 8th-Order Differential Attack on 12 Rounds of the Block Cipher HyRAL The Synchronous 8th-Order Dfferental Attack on 12 Rounds of the Block Cpher HyRAL Yasutaka Igarash, Sej Fukushma, and Tomohro Hachno Kagoshma Unversty, Kagoshma, Japan Emal: {garash, fukushma, hachno}@eee.kagoshma-u.ac.jp

More information

Why BP Works STAT 232B

Why BP Works STAT 232B Why BP Works STAT 232B Free Energes Helmholz & Gbbs Free Energes 1 Dstance between Probablstc Models - K-L dvergence b{ KL b{ p{ = b{ ln { } p{ Here, p{ s the eact ont prob. b{ s the appromaton, called

More information

Uncertainty in measurements of power and energy on power networks

Uncertainty in measurements of power and energy on power networks Uncertanty n measurements of power and energy on power networks E. Manov, N. Kolev Department of Measurement and Instrumentaton, Techncal Unversty Sofa, bul. Klment Ohrdsk No8, bl., 000 Sofa, Bulgara Tel./fax:

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic

More information

Handling Uncertain Spatial Data: Comparisons between Indexing Structures. Bir Bhanu, Rui Li, Chinya Ravishankar and Jinfeng Ni

Handling Uncertain Spatial Data: Comparisons between Indexing Structures. Bir Bhanu, Rui Li, Chinya Ravishankar and Jinfeng Ni Handlng Uncertan Spatal Data: Comparsons between Indexng Structures Br Bhanu, Ru L, Chnya Ravshankar and Jnfeng N Abstract Managng and manpulatng uncertanty n spatal databases are mportant problems for

More information

Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach

Building a Bilingual Dictionary with Scarce Resources: A Genetic Algorithm Approach Buldng a Blngual Dctonary wth carce Resources: A Genetc Algorthm Approach Benamn Han Language Technologes Insttute, Carnege Mellon Unversty 5000 Forbes Avenue Pttsburgh, PA 523-3702, UA benhd@cs.cmu.edu

More information

An Improved multiple fractal algorithm

An Improved multiple fractal algorithm Advanced Scence and Technology Letters Vol.31 (MulGraB 213), pp.184-188 http://dx.do.org/1.1427/astl.213.31.41 An Improved multple fractal algorthm Yun Ln, Xaochu Xu, Jnfeng Pang College of Informaton

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Midterm Review. Hongning Wang

Midterm Review. Hongning Wang Mdterm Revew Hongnng Wang CS@UVa Core concepts Search Engne Archtecture Key components n a modern search engne Crawlng & Text processng Dfferent strateges for crawlng Challenges n crawlng Text processng

More information

A New Scrambling Evaluation Scheme based on Spatial Distribution Entropy and Centroid Difference of Bit-plane

A New Scrambling Evaluation Scheme based on Spatial Distribution Entropy and Centroid Difference of Bit-plane A New Scramblng Evaluaton Scheme based on Spatal Dstrbuton Entropy and Centrod Dfference of Bt-plane Lang Zhao *, Avshek Adhkar Kouch Sakura * * Graduate School of Informaton Scence and Electrcal Engneerng,

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

1 Information retrieval fundamentals

1 Information retrieval fundamentals CS 630 Lecture 1: 01/26/2006 Lecturer: Lillian Lee Scribes: Asif-ul Haque, Benyah Shaparenko This lecture focuses on the following topics Information retrieval fundamentals Vector Space Model (VSM) Deriving

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Synchronized Multi-sensor Tracks Association and Fusion

Synchronized Multi-sensor Tracks Association and Fusion Synchronzed Mult-sensor Tracks Assocaton and Fuson Dongguang Zuo Chongzhao an School of Electronc and nformaton Engneerng X an Jaotong Unversty Xan 749, P.R. Chna Zlz_3@sna.com.cn czhan@jtu.edu.cn Abstract

More information

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,

More information

Pattern Classification

Pattern Classification Pattern Classfcaton All materals n these sldes ere taken from Pattern Classfcaton (nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wley & Sons, 000 th the permsson of the authors and the publsher

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Detecting Attribute Dependencies from Query Feedback

Detecting Attribute Dependencies from Query Feedback Detectng Attrbute Dependences from Query Feedback Peter J. Haas 1, Faban Hueske 2, Volker Markl 1 1 IBM Almaden Research Center 2 Unverstät Ulm VLDB 2007 Peter J. Haas The Problem: Detectng (Parwse) Dependent

More information

A Multimodal Fusion Algorithm Based on FRR and FAR Using SVM

A Multimodal Fusion Algorithm Based on FRR and FAR Using SVM Internatonal Journal of Securty and Its Applcatons A Multmodal Fuson Algorthm Based on FRR and FAR Usng SVM Yong L 1, Meme Sh 2, En Zhu 3, Janpng Yn 3, Janmn Zhao 4 1 Department of Informaton Engneerng,

More information

Evaluation of classifiers MLPs

Evaluation of classifiers MLPs Lecture Evaluaton of classfers MLPs Mlos Hausrecht mlos@cs.ptt.edu 539 Sennott Square Evaluaton For any data set e use to test the model e can buld a confuson matrx: Counts of examples th: class label

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu and Aln Dobra CISE Department Unversty of Florda March 31, 2009 Motvaton & Goal Motvaton Multcore processors How to use all the processng power? Parallel algorthms

More information

CS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras

CS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras CS4495/6495 Introducton to Computer Vson 3C-L3 Calbratng cameras Fnally (last tme): Camera parameters Projecton equaton the cumulatve effect of all parameters: M (3x4) f s x ' 1 0 0 0 c R 0 I T 3 3 3 x1

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

A Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR)

A Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR) A Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR) Chirag Shah Dept. of CSE IIT Bombay, Powai Mumbai - 400 076, Maharashtra, India. Email: chirag@cse.iitb.ac.in

More information

ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE)

ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE) ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE) June 7, 016 15:30 Frst famly name: Name: DNI/ID: Moble: Second famly Name: GECO/GADE: Instructor: E-mal: Queston 1 A B C Blank Queston A B C Blank Queston

More information

A Network Intrusion Detection Method Based on Improved K-means Algorithm

A Network Intrusion Detection Method Based on Improved K-means Algorithm Advanced Scence and Technology Letters, pp.429-433 http://dx.do.org/10.14257/astl.2014.53.89 A Network Intruson Detecton Method Based on Improved K-means Algorthm Meng Gao 1,1, Nhong Wang 1, 1 Informaton

More information

Some basic statistics and curve fitting techniques

Some basic statistics and curve fitting techniques Some basc statstcs and curve fttng technques Statstcs s the dscplne concerned wth the study of varablty, wth the study of uncertanty, and wth the study of decsonmakng n the face of uncertanty (Lndsay et

More information

Computational Biology Lecture 8: Substitution matrices Saad Mneimneh

Computational Biology Lecture 8: Substitution matrices Saad Mneimneh Computatonal Bology Lecture 8: Substtuton matrces Saad Mnemneh As we have ntroduced last tme, smple scorng schemes lke + or a match, - or a msmatch and -2 or a gap are not justable bologcally, especally

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Part-of-Speech Tagging with Hidden Markov Models

Part-of-Speech Tagging with Hidden Markov Models Part-of-Speech Taggng wth Hdden Markov Models Jonathon Read October 7, 20 Last week: probablty theory and n-gram language models Last week we dscussed some concepts from probablty theory, such as condtonal

More information

EDMS Modern Measurement Theories. Multidimensional IRT Models. (Session 6)

EDMS Modern Measurement Theories. Multidimensional IRT Models. (Session 6) EDMS 74 - Modern Measurement Theores Multdmensonal IRT Models (Sesson 6) Sprng Semester 8 Department of Measurement, Statstcs, and Evaluaton (EDMS) Unversty of Maryland Dr. André A. Rupp, (3) 45 363, ruppandr@umd.edu

More information

Chapter 15 Student Lecture Notes 15-1

Chapter 15 Student Lecture Notes 15-1 Chapter 15 Student Lecture Notes 15-1 Basc Busness Statstcs (9 th Edton) Chapter 15 Multple Regresson Model Buldng 004 Prentce-Hall, Inc. Chap 15-1 Chapter Topcs The Quadratc Regresson Model Usng Transformatons

More information
