Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model

Size: px
Start display at page:

Download "Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model"

Transcription

1 Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model Andreas Spitz, Jannik Strötgen, Thomas Bögel and Michael Gertz Heidelberg University Institute of Computer Science Database Systems Research Group spitz@informatik.uni-heidelberg.de 5th Temporal Web Analytics Workshop Florence, May 18, 2015

2 What happened on June 15, 1215? A simple question. How simple is the answer? Terms in Time and Times in Context Andreas Spitz 1 of 24

3 Motivation Co-ooccurrence Graphs With structured data: quite simple Terms in Time and Times in Context Term-Ranking Projection Application Summary Based on unstructured text data: much more challenging Andreas Spitz 2 of 24

4 Data Set and Approach A corpus of all English Wikipedia articles: Only text is considered, no info-boxes 3, 079, 620 documents with time expressions Problem statement, given such a corpus: Extract and normalize temporal expressions (dates) Find key terms that best summarize a given date Terms in Time and Times in Context Andreas Spitz 3 of 24

5 Outline Outline of the approach: Represent date-term co-occurrences efficiently Extract and normalize temporal expressions (dates) Extract content words that co-occur with dates Generate an efficient data structure Based on this representation Identify relevant terms for any given date Identify similar dates for any given date Example applications Terms in Time and Times in Context Andreas Spitz 4 of 24

6 Extraction of Temporal Expressions Normalization, e.g., May 18, Handling relative temporal expressions, e.g., in May Considering the document type Source: Strötgen, Gertz Multilingual and Cross-domain Temporal Tagging (2013) Terms in Time and Times in Context Andreas Spitz 5 of 24

7 Coverage of Dates We use a combination of dates of three granularities: YYYY-MM-DD (day) YYYY-MM (month) YYYY (year) Percentage of dates that are included in the data per year year coverage in % Terms in Time and Times in Context Andreas Spitz 6 of 24

8 Extraction of Terms and Representation For all sentences s in any Wikipedia document: Terms in Time and Times in Context Andreas Spitz 7 of 24

9 Extraction of Terms and Representation Identify/normalize dates and remove stop words Terms in Time and Times in Context Andreas Spitz 7 of 24

10 Extraction of Terms and Representation Create a bipartite graph G s = (T s D s, E s ) with weights ω s Terms in Time and Times in Context Andreas Spitz 7 of 24

11 Extraction of Terms and Representation Satisfy the inclusion condition for dates Terms in Time and Times in Context Andreas Spitz 7 of 24

12 Extraction of Terms and Representation Satisfy the inclusion condition for dates Terms in Time and Times in Context Andreas Spitz 7 of 24

13 Graph aggregation Aggregate the sentence-graphs G s : T := T s D := D s E := E s ω(e) := ω s (e) We obtain G = (T D, E, ω) with: T = 3, 748, 730 terms D = 210, 375 dates E = 110, 639, 525 edges Terms in Time and Times in Context Andreas Spitz 8 of 24

14 Formalising the Question What happened on June 15, 1215? Which terms in the graph co-occur in a significant manner with the date ? Terms in Time and Times in Context Andreas Spitz 9 of 24

15 Ranking We need a ranking-function from dates D to a list of terms T r : D R T r(d) := ranking of terms t T by their significance for d Terms in Time and Times in Context Andreas Spitz 10 of 24

16 Ranking We need a ranking-function from dates D to a list of terms T r : D R T r(d) := ranking of terms t T by their significance for d Idea: adapt tf-idf to the bipartite graph tf-idf := tf log 1 df tf : frequency of term in document df : fraction of documents that contain the term Terms in Time and Times in Context Andreas Spitz 10 of 24

17 Adapting tf-idf How does this relate to the graph? Identify dates with documents, i.e., dates contain terms Term frequency given by edge weights: tf(d, t) ω(d, t) Inverse document frequency given by number of neighbours: idf(t) D deg(t) tf-idf := tf log 1 df tf-idf(d, t) := ω(d, t) log D deg(t) Terms in Time and Times in Context Andreas Spitz 11 of 24

18 June 15, 1215 Terms in Time and Times in Context Andreas Spitz 12 of 24

19 June 15, 1215 Terms in Time and Times in Context Andreas Spitz 12 of 24

20 A Ranking for Dates Ranking dates by term works analogously: tf-idf(t, d) := ω(t, d) log T deg(d) Terms in Time and Times in Context Andreas Spitz 13 of 24

21 A Ranking for Dates Ranking dates by term works analogously: tf-idf(t, d) := ω(t, d) log T deg(d) Terms in Time and Times in Context Andreas Spitz 13 of 24

22 Ranking Nodes by Similarity Within a Set Can we create a ranking for dates by dates?... or for terms by terms? Terms in Time and Times in Context Andreas Spitz 14 of 24

23 Ranking Nodes by Similarity Within a Set Can we create a ranking for dates by dates?... or for terms by terms? Formally this is a one-mode projection of the bipartite graph: Reduce graph to a single set of nodes T or D Connect nodes that share neighbours in the bipartite graph This results in a very dense graph How can we identify relevant edges in the projection? Terms in Time and Times in Context Andreas Spitz 14 of 24

24 Cosine Similarity of Adjacency Vectors In a lesson from Collaborative Filtering : use a cosine similarity of adjacency vectors cos(t a, t b ) := tai t bi t 2 ai t 2 bi Terms in Time and Times in Context Andreas Spitz 15 of 24

25 Evaluation Ground truth: U.S. Election Days ( ) Recurs annually Always on Tuesday after the first Monday in November (Nov 2 - Nov 8) Every four years: presidential election Terms in Time and Times in Context Andreas Spitz 16 of 24

26 Evaluation Ground truth: U.S. Election Days ( ) Recurs annually Always on Tuesday after the first Monday in November (Nov 2 - Nov 8) Every four years: presidential election Expectation: For a given election day, election days in other years are ranked highly For presidential election days, other presidential election days are ranked highly Terms in Time and Times in Context Andreas Spitz 16 of 24

27 Precision at k 0.6 precision k election day presidential general prec k := Election days in top k ranks k Terms in Time and Times in Context Andreas Spitz 17 of 24

28 Area Under the ROC Curve year AUC election day general presidential Terms in Time and Times in Context Andreas Spitz 18 of 24

29 Practical Application: Hot Spots & Key Players Here: approximation of countries activity during given months For each European country c, define its name, e.g. t n (c) = italy, define the countries adjectival form, e.g. t a (c) = italian, compute individual tf-idf scores for terms and combine. act(c, d) := tf-idf(d, t n(c)) + tf-idf(d, t a (c)) max[tf-idf(d, )] Terms in Time and Times in Context Andreas Spitz 19 of 24

30 Activity by Country During World War II March 1939 September 1939 November 1939 German occupation German invasion Soviet invasion of Czechoslovakia of Poland of Finland activity 1.5 April 1940 May 1940 October 1940 German assault German invasion Battle of Britain and on Norway of France Greco Italian War Terms in Time and Times in Context Andreas Spitz 20 of 24

31 Activity by Country During World War II (2) April 1941 June 1941 September 1943 Invasion of Yugoslavia German offensive Allied invasion and Greece against the Soviets of Italy June 1944 August 1944 May 1945 Allied forces land in Liberation of Paris German Capitulation Normandy (DDay) activity Terms in Time and Times in Context Andreas Spitz 21 of 24

32 Summary Approach: Extract dates and terms from unstructured text Construct a bipartite date-term graph Allows ranking dates / terms according to co-occurrences Benefits: Simple measures already yield good results Efficient: 4GB Memory and real-time queries Flexibility of ranking methods Terms in Time and Times in Context Andreas Spitz 22 of 24

33 Ongoing Work Terms in Time and Times in Context Andreas Spitz 23 of 24

34 Thank you! Questions? Terms in Time and Times in Context Andreas Spitz 24 of 24

So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks

So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks Andreas Spitz Johanna Geiß Michael Gertz Institute of Computer Science, Heidelberg University Im

More information

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test Discrete distribution Fitting probability models to frequency data A probability distribution describing a discrete numerical random variable For example,! Number of heads from 10 flips of a coin! Number

More information

Vector Space Model. Yufei Tao KAIST. March 5, Y. Tao, March 5, 2013 Vector Space Model

Vector Space Model. Yufei Tao KAIST. March 5, Y. Tao, March 5, 2013 Vector Space Model Vector Space Model Yufei Tao KAIST March 5, 2013 In this lecture, we will study a problem that is (very) fundamental in information retrieval, and must be tackled by all search engines. Let S be a set

More information

What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured.

What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured. What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured. Text mining What can be used for text mining?? Classification/categorization

More information

Variable Latent Semantic Indexing

Variable Latent Semantic Indexing Variable Latent Semantic Indexing Prabhakar Raghavan Yahoo! Research Sunnyvale, CA November 2005 Joint work with A. Dasgupta, R. Kumar, A. Tomkins. Yahoo! Research. Outline 1 Introduction 2 Background

More information

Latent Geographic Feature Extraction from Social Media

Latent Geographic Feature Extraction from Social Media Latent Geographic Feature Extraction from Social Media Christian Sengstock* Michael Gertz Database Systems Research Group Heidelberg University, Germany November 8, 2012 Social Media is a huge and increasing

More information

PYROGEOGRAPHY OF THE IBERIAN PENINSULA

PYROGEOGRAPHY OF THE IBERIAN PENINSULA PYROGEOGRAPHY OF THE IBERIAN PENINSULA Teresa J. Calado (1), Carlos C. DaCamara (1), Sílvia A. Nunes (1), Sofia L. Ermida (1) and Isabel F. Trigo (1,2) (1) Instituto Dom Luiz, Universidade de Lisboa, Lisboa,

More information

Natural Language Processing. Topics in Information Retrieval. Updated 5/10

Natural Language Processing. Topics in Information Retrieval. Updated 5/10 Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element

More information

Data Mining Recitation Notes Week 3

Data Mining Recitation Notes Week 3 Data Mining Recitation Notes Week 3 Jack Rae January 28, 2013 1 Information Retrieval Given a set of documents, pull the (k) most similar document(s) to a given query. 1.1 Setup Say we have D documents

More information

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Reputation Systems I HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte Yury Lifshits Wiki Definition Reputation is the opinion (more technically, a social evaluation) of the public toward a person, a

More information

CAIM: Cerca i Anàlisi d Informació Massiva

CAIM: Cerca i Anàlisi d Informació Massiva 1 / 21 CAIM: Cerca i Anàlisi d Informació Massiva FIB, Grau en Enginyeria Informàtica Slides by Marta Arias, José Balcázar, Ricard Gavaldá Department of Computer Science, UPC Fall 2016 http://www.cs.upc.edu/~caim

More information

Retrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1

Retrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1 Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency Srihari: CSE 626 1 Text Retrieval Retrieval of text-based information is referred to as Information Retrieval (IR)

More information

STATISTICAL FORECASTING and SEASONALITY (M. E. Ippolito; )

STATISTICAL FORECASTING and SEASONALITY (M. E. Ippolito; ) STATISTICAL FORECASTING and SEASONALITY (M. E. Ippolito; 10-6-13) PART I OVERVIEW The following discussion expands upon exponential smoothing and seasonality as presented in Chapter 11, Forecasting, in

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 6: Numerical Linear Algebra: Applications in Machine Learning Cho-Jui Hsieh UC Davis April 27, 2017 Principal Component Analysis Principal

More information

Cross-lingual and temporal Wikipedia analysis

Cross-lingual and temporal Wikipedia analysis MTA SZTAKI Data Mining and Search Group June 14, 2013 Supported by the EC FET Open project New tools and algorithms for directed network analysis (NADINE No 288956) Table of Contents 1 Link prediction

More information

Chap 2: Classical models for information retrieval

Chap 2: Classical models for information retrieval Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic

More information

HELCOM-VASAB Maritime Spatial Planning Working Group Twelfth Meeting Gdansk, Poland, February 2016

HELCOM-VASAB Maritime Spatial Planning Working Group Twelfth Meeting Gdansk, Poland, February 2016 HELCOM-VASAB Maritime Spatial Planning Working Group Twelfth Meeting Gdansk, Poland, 24-25 February 2016 Document title HELCOM database for the coastal and marine Baltic Sea protected areas (HELCOM MPAs).

More information

Predicates and Quantifiers

Predicates and Quantifiers Predicates and Quantifiers Lecture 9 Section 3.1 Robb T. Koether Hampden-Sydney College Wed, Jan 29, 2014 Robb T. Koether (Hampden-Sydney College) Predicates and Quantifiers Wed, Jan 29, 2014 1 / 32 1

More information

Boolean and Vector Space Retrieval Models

Boolean and Vector Space Retrieval Models Boolean and Vector Space Retrieval Models Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) 1

More information

Spatializing time in a history text corpus

Spatializing time in a history text corpus Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2014 Spatializing time in a history text corpus Bruggmann, André; Fabrikant,

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

A Plot of the Tracking Signals Calculated in Exhibit 3.9

A Plot of the Tracking Signals Calculated in Exhibit 3.9 CHAPTER 3 FORECASTING 1 Measurement of Error We can get a better feel for what the MAD and tracking signal mean by plotting the points on a graph. Though this is not completely legitimate from a sample-size

More information

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding

More information

Argo data management November 2nd, 2005 Ref : cordo/dti-rap/ Version 1.1 ARGO DATA MANAGEMENT REPORT FRENCH DAC

Argo data management November 2nd, 2005 Ref : cordo/dti-rap/ Version 1.1 ARGO DATA MANAGEMENT REPORT FRENCH DAC Argo data management November 2nd, 2005 Ref : cordo/dti-rap/05-145 Version 1.1 ARGO DATA MANAGEMENT REPORT FRENCH DAC 2 Argo National Data Management Report of France November 2005 Introduction This document

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic

More information

DISTRIBUTIONAL SEMANTICS

DISTRIBUTIONAL SEMANTICS COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.

More information

Semantic Similarity from Corpora - Latent Semantic Analysis

Semantic Similarity from Corpora - Latent Semantic Analysis Semantic Similarity from Corpora - Latent Semantic Analysis Carlo Strapparava FBK-Irst Istituto per la ricerca scientifica e tecnologica I-385 Povo, Trento, ITALY strappa@fbk.eu Overview Latent Semantic

More information

APPENDIX M LAKE ELEVATION AND FLOW RELEASES SENSITIVITY ANALYSIS RESULTS

APPENDIX M LAKE ELEVATION AND FLOW RELEASES SENSITIVITY ANALYSIS RESULTS APPENDIX M LAKE ELEVATION AND FLOW RELEASES SENSITIVITY ANALYSIS RESULTS Appendix M Lake Elevation and Flow Releases Sensitivity Analysis Results Figure M-1 Lake Jocassee Modeled Reservoir Elevations (Current

More information

Essential Maths Skills. for AS/A-level. Geography. Helen Harris. Series Editor Heather Davis Educational Consultant with Cornwall Learning

Essential Maths Skills. for AS/A-level. Geography. Helen Harris. Series Editor Heather Davis Educational Consultant with Cornwall Learning Essential Maths Skills for AS/A-level Geography Helen Harris Series Editor Heather Davis Educational Consultant with Cornwall Learning Contents Introduction... 5 1 Understanding data Nominal, ordinal and

More information

WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and Rainfall For Selected Arizona Cities

WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and Rainfall For Selected Arizona Cities WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and 2001-2002 Rainfall For Selected Arizona Cities Phoenix Tucson Flagstaff Avg. 2001-2002 Avg. 2001-2002 Avg. 2001-2002 October 0.7 0.0

More information

Computation of Large Sparse Aggregated Areas for Analytic Database Queries

Computation of Large Sparse Aggregated Areas for Analytic Database Queries Computation of Large Sparse Aggregated Areas for Analytic Database Queries Steffen Wittmer Tobias Lauer Jedox AG Collaborators: Zurab Khadikov Alexander Haberstroh Peter Strohm Business Intelligence and

More information

Densest subgraph computation and applications in finding events on social media

Densest subgraph computation and applications in finding events on social media Densest subgraph computation and applications in finding events on social media Oana Denisa Balalau advised by Mauro Sozio Télécom ParisTech, Institut Mines Télécom December 4, 2015 1 / 28 Table of Contents

More information

As it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation

As it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation Graphs and Networks Page 1 Lecture 2, Ranking 1 Tuesday, September 12, 2006 1:14 PM I. II. I. How search engines work: a. Crawl the web, creating a database b. Answer query somehow, e.g. grep. (ex. Funk

More information

World Book Online: World War II

World Book Online: World War II World Book Online: The trusted, student-friendly online reference tool. World Book Kids Database Name: Date: World War II was the most destructive war in world history. It killed more people, destroyed

More information

Chapter 10: Information Retrieval. See corresponding chapter in Manning&Schütze

Chapter 10: Information Retrieval. See corresponding chapter in Manning&Schütze Chapter 10: Information Retrieval See corresponding chapter in Manning&Schütze Evaluation Metrics in IR 2 Goal In IR there is a much larger variety of possible metrics For different tasks, different metrics

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Retrieval Models and Implementation Ulf Leser Content of this Lecture Information Retrieval Models Boolean Model Vector Space Model Inverted Files Ulf Leser: Maschinelle

More information

Drought News August 2014

Drought News August 2014 European Drought Observatory (EDO) Drought News August 2014 (Based on data until the end of July) http://edo.jrc.ec.europa.eu August 2014 EDO (http://edo.jrc.ec.europa.eu) Page 2 of 8 EDO Drought News

More information

Turn: Summer Economic Trend +1

Turn: Summer Economic Trend +1 Turn: Summer 95 Random events Economic Climate European Aggression Index Support Effects Factories Income Expenditures Activity Counters Mobilizations Axis and Allied Forces Balance of Power Spy Rings

More information

Temporal Evolution of the ScientificCollaboration Network in Europe

Temporal Evolution of the ScientificCollaboration Network in Europe Temporal Evolution of the ScientificCollaboration Network in Europe Jori Jämsä 23.3.2015 Instructor: Ph.D Raj Kumar Pan Supervisor: Prof. Harri Ehtamo This work can be saved and published in Aalto University

More information

CQC. Composing Quantum Channels

CQC. Composing Quantum Channels CQC Composing Quantum Channels Michael Wolf echnical University of Munich (UM) Matthias Christandl EH Zurich (EH) David Perez-Garcia Universidad Complutense de Madrid (UCM) Presenter: David Reeb, postdoc

More information

Cell- Each box of information in a chart or table. A cell is named by its column and row

Cell- Each box of information in a chart or table. A cell is named by its column and row Lesson 7 Cell- Each box of information in a chart or table. A cell is named by its column and row Steps to read a chart or table: Check the title to find out what type of information is being presented.

More information

The observed global warming of the lower atmosphere

The observed global warming of the lower atmosphere WATER AND CLIMATE CHANGE: CHANGES IN THE WATER CYCLE 3.1 3.1.6 Variability of European precipitation within industrial time CHRISTIAN-D. SCHÖNWIESE, SILKE TRÖMEL & REINHARD JANOSCHITZ SUMMARY: Precipitation

More information

Ontology-Based News Recommendation

Ontology-Based News Recommendation Ontology-Based News Recommendation Wouter IJntema Frank Goossen Flavius Frasincar Frederik Hogenboom Erasmus University Rotterdam, the Netherlands frasincar@ese.eur.nl Outline Introduction Hermes: News

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Paper Reference. 1312/1F Edexcel GCSE Geography A Paper 1F. Foundation Tier. Monday 5 June 2006 Morning Time: 1 hour 45 minutes

Paper Reference. 1312/1F Edexcel GCSE Geography A Paper 1F. Foundation Tier. Monday 5 June 2006 Morning Time: 1 hour 45 minutes Centre No. Paper Reference Surname Initial(s) Candidate No. 1 3 1 2 1 F Signature Paper Reference(s) 1312/1F Edexcel GCSE Geography A Paper 1F Foundation Tier Monday 5 June 2006 Morning Time: 1 hour 45

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Assistant Professor Associate Director, MS

More information

Text Mining. March 3, March 3, / 49

Text Mining. March 3, March 3, / 49 Text Mining March 3, 2017 March 3, 2017 1 / 49 Outline Language Identification Tokenisation Part-Of-Speech (POS) tagging Hidden Markov Models - Sequential Taggers Viterbi Algorithm March 3, 2017 2 / 49

More information

Design of Indicators to monitor Strategic priorities: Mapping Knowledge Clusters

Design of Indicators to monitor Strategic priorities: Mapping Knowledge Clusters Design of Indicators to monitor Strategic priorities: Mapping Knowledge Clusters 5 th International Workshop Sharing the Best Practices in R&D and Education Statistics - Informing National and Institutional

More information

Students will be able to simplify numerical expressions and evaluate algebraic expressions. (M)

Students will be able to simplify numerical expressions and evaluate algebraic expressions. (M) Morgan County School District Re-3 August What is algebra? This chapter develops some of the basic symbolism and terminology that students may have seen before but still need to master. The concepts of

More information

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model Ranked retrieval Thus far, our queries have all been Boolean. Documents either

More information

WHO EpiData. A monthly summary of the epidemiological data on selected Vaccine preventable diseases in the European Region

WHO EpiData. A monthly summary of the epidemiological data on selected Vaccine preventable diseases in the European Region A monthly summary of the epidemiological data on selected Vaccine preventable diseases in the European Region Table : Reported cases for the period September 207 August 208 (data as of 0 October 208) Population

More information

A Convolutional Neural Network-based

A Convolutional Neural Network-based A Convolutional Neural Network-based Model for Knowledge Base Completion Dat Quoc Nguyen Joint work with: Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung April 16, 2018 Introduction Word vectors learned

More information

Topic Models and Applications to Short Documents

Topic Models and Applications to Short Documents Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text

More information

INTRODUCTION. In March 1998, the tender for project CT.98.EP.04 was awarded to the Department of Medicines Management, Keele University, UK.

INTRODUCTION. In March 1998, the tender for project CT.98.EP.04 was awarded to the Department of Medicines Management, Keele University, UK. INTRODUCTION In many areas of Europe patterns of drug use are changing. The mechanisms of diffusion are diverse: introduction of new practices by new users, tourism and migration, cross-border contact,

More information

MapOSMatic, free city maps for everyone!

MapOSMatic, free city maps for everyone! MapOSMatic, free city maps for everyone! Thomas Petazzoni thomas.petazzoni@enix.org Libre Software Meeting 2012 http://www.maposmatic.org Thomas Petazzoni () MapOSMatic: free city maps for everyone! July

More information

ASSESSMENT OF DIFFERENT WATER STRESS INDICATORS BASED ON EUMETSAT LSA SAF PRODUCTS FOR DROUGHT MONITORING IN EUROPE

ASSESSMENT OF DIFFERENT WATER STRESS INDICATORS BASED ON EUMETSAT LSA SAF PRODUCTS FOR DROUGHT MONITORING IN EUROPE ASSESSMENT OF DIFFERENT WATER STRESS INDICATORS BASED ON EUMETSAT LSA SAF PRODUCTS FOR DROUGHT MONITORING IN EUROPE G. Sepulcre Canto, A. Singleton, J. Vogt European Commission, DG Joint Research Centre,

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

WHO EpiData. A monthly summary of the epidemiological data on selected Vaccine preventable diseases in the WHO European Region

WHO EpiData. A monthly summary of the epidemiological data on selected Vaccine preventable diseases in the WHO European Region A monthly summary of the epidemiological data on selected Vaccine preventable diseases in the WHO European Region Table : Reported cases for the period November 207 October 208 (data as of 30 November

More information

Aalborg Universitet. FuzzyPR Christensen, Hans Ulrich; Ortiz-Arroyo, Daniel. Published in: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing

Aalborg Universitet. FuzzyPR Christensen, Hans Ulrich; Ortiz-Arroyo, Daniel. Published in: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing Aalborg Universitet FuzzyPR Christensen, Hans Ulrich; Ortiz-Arroyo, Daniel Published in: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing DOI (link to publication from Publisher): 10.1007/978-3-540-72530-5_23

More information

John Conway s Doomsday Algorithm

John Conway s Doomsday Algorithm 1 The algorithm as a poem John Conway s Doomsday Algorithm John Conway introduced the Doomsday Algorithm with the following rhyme: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 The last of Feb., or of Jan. will do

More information

Chapter 1 0+7= 1+6= 2+5= 3+4= 4+3= 5+2= 6+1= 7+0= How would you write five plus two equals seven?

Chapter 1 0+7= 1+6= 2+5= 3+4= 4+3= 5+2= 6+1= 7+0= How would you write five plus two equals seven? Chapter 1 0+7= 1+6= 2+5= 3+4= 4+3= 5+2= 6+1= 7+0= If 3 cats plus 4 cats is 7 cats, what does 4 olives plus 3 olives equal? olives How would you write five plus two equals seven? Chapter 2 Tom has 4 apples

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-2018 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 2 1 Boolean Retrieval Thus

More information

Dimensionality reduction

Dimensionality reduction Dimensionality reduction ML for NLP Lecturer: Kevin Koidl Assist. Lecturer Alfredo Maldonado https://www.cs.tcd.ie/kevin.koidl/cs4062/ kevin.koidl@scss.tcd.ie, maldonaa@tcd.ie 2017 Recapitulating: Evaluating

More information

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged. Web Data Mining PageRank, University of Szeged Why ranking web pages is useful? We are starving for knowledge It earns Google a bunch of money. How? How does the Web looks like? Big strongly connected

More information

Modelling and projecting the postponement of childbearing in low-fertility countries

Modelling and projecting the postponement of childbearing in low-fertility countries of childbearing in low-fertility countries Nan Li and Patrick Gerland, United Nations * Abstract In most developed countries, total fertility reached below-replacement level and stopped changing notably.

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 11: Probabilistic Information Retrieval 1 Outline Basic Probability Theory Probability Ranking Principle Extensions 2 Basic Probability Theory For events A

More information

Multivariate Analysis

Multivariate Analysis Prof. Dr. J. Franke All of Statistics 3.1 Multivariate Analysis High dimensional data X 1,..., X N, i.i.d. random vectors in R p. As a data matrix X: objects values of p features 1 X 11 X 12... X 1p 2.

More information

Fundamentals of Similarity Search

Fundamentals of Similarity Search Chapter 2 Fundamentals of Similarity Search We will now look at the fundamentals of similarity search systems, providing the background for a detailed discussion on similarity search operators in the subsequent

More information

Propositional Calculus. Problems. Propositional Calculus 3&4. 1&2 Propositional Calculus. Johnson will leave the cabinet, and we ll lose the election.

Propositional Calculus. Problems. Propositional Calculus 3&4. 1&2 Propositional Calculus. Johnson will leave the cabinet, and we ll lose the election. 1&2 Propositional Calculus Propositional Calculus Problems Jim Woodcock University of York October 2008 1. Let p be it s cold and let q be it s raining. Give a simple verbal sentence which describes each

More information

SDI in Lombardia (Italy(

SDI in Lombardia (Italy( SDI in Lombardia (Italy( Italy) Andrea Piccin European SDI Best Practice Awards 2009 - Learning from Best Practices Turin, 26th and 27th November 2009 Lombardia, in Italy, is 4 th Region for territorial

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-017 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 1 Boolean Retrieval Thus far,

More information

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 ) Part A 1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds

More information

P7.7 A CLIMATOLOGICAL STUDY OF CLOUD TO GROUND LIGHTNING STRIKES IN THE VICINITY OF KENNEDY SPACE CENTER, FLORIDA

P7.7 A CLIMATOLOGICAL STUDY OF CLOUD TO GROUND LIGHTNING STRIKES IN THE VICINITY OF KENNEDY SPACE CENTER, FLORIDA P7.7 A CLIMATOLOGICAL STUDY OF CLOUD TO GROUND LIGHTNING STRIKES IN THE VICINITY OF KENNEDY SPACE CENTER, FLORIDA K. Lee Burns* Raytheon, Huntsville, Alabama Ryan K. Decker NASA, Marshall Space Flight

More information

How Well Are Recessions and Recoveries Forecast? Prakash Loungani, Herman Stekler and Natalia Tamirisa

How Well Are Recessions and Recoveries Forecast? Prakash Loungani, Herman Stekler and Natalia Tamirisa How Well Are Recessions and Recoveries Forecast? Prakash Loungani, Herman Stekler and Natalia Tamirisa 1 Outline Focus of the study Data Dispersion and forecast errors during turning points Testing efficiency

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 12: Language Models for IR Outline Language models Language Models for IR Discussion What is a language model? We can view a finite state automaton as a deterministic

More information

Lab Activity: Climate Variables

Lab Activity: Climate Variables Name: Date: Period: Water and Climate The Physical Setting: Earth Science Lab Activity: Climate Variables INTRODUCTION:! The state of the atmosphere continually changes over time in response to the uneven

More information

Blazar Optical Sky Survey BOSS Project ( ) and the long-term optical variability monitoring

Blazar Optical Sky Survey BOSS Project ( ) and the long-term optical variability monitoring Blazar Optical Sky Survey BOSS Project (2013-2016) and the long-term optical variability monitoring Kosmas Gazeas University of Athens Observatory - UOAO Department of Astrophysics, Astronomy and Mechanics

More information

Space Research at Hvar Observatory

Space Research at Hvar Observatory Space Research at Hvar Observatory Roman Brajša Hvar Observatory Faculty of Geodesy, University of Zagreb http://oh.geof.unizg.hr SCERIN-6 Capacity Building Workshop on Earth System Observations Zagreb,

More information

Sustainable City Index SCI-2014

Sustainable City Index SCI-2014 Sustainable City Index SCI-2014 Summary Since its successful launch in 2006, the Sustainable Society Index (SSI) still remains the only index that shows in quantitative terms the level of sustainability

More information

Corroborating Information from Disagreeing Views

Corroborating Information from Disagreeing Views Corroboration A. Galland WSDM 2010 1/26 Corroborating Information from Disagreeing Views Alban Galland 1 Serge Abiteboul 1 Amélie Marian 2 Pierre Senellart 3 1 INRIA Saclay Île-de-France 2 Rutgers University

More information

Machine Learning CPSC 340. Tutorial 12

Machine Learning CPSC 340. Tutorial 12 Machine Learning CPSC 340 Tutorial 12 Random Walk on Graph Page Rank Algorithm Label Propagation on Graph Assume a strongly connected graph G = (V, A) Label Propagation on Graph Assume a strongly connected

More information

Determine the trend for time series data

Determine the trend for time series data Extra Online Questions Determine the trend for time series data Covers AS 90641 (Statistics and Modelling 3.1) Scholarship Statistics and Modelling Chapter 1 Essent ial exam notes Time series 1. The value

More information

Public Library Use and Economic Hard Times: Analysis of Recent Data

Public Library Use and Economic Hard Times: Analysis of Recent Data Public Library Use and Economic Hard Times: Analysis of Recent Data A Report Prepared for The American Library Association by The Library Research Center University of Illinois at Urbana Champaign April

More information

Central Coast Tracking Trash. Trash Webinar #3 September 19, 2017

Central Coast Tracking Trash. Trash Webinar #3 September 19, 2017 Central Coast Tracking Trash Trash Webinar #3 September 19, 2017 Webinar #3: Agenda Trash App v0.2 release Intro to tracking annual progress towards compliance Intro to prioritizing actions Next webinar

More information

Bathing water results 2011 Slovakia

Bathing water results 2011 Slovakia Bathing water results Slovakia 1. Reporting and assessment This report gives a general overview of water in Slovakia for the season. Slovakia has reported under the Directive 2006/7/EC since 2008. When

More information

Dealing with Text Databases

Dealing with Text Databases Dealing with Text Databases Unstructured data Boolean queries Sparse matrix representation Inverted index Counts vs. frequencies Term frequency tf x idf term weights Documents as vectors Cosine similarity

More information

Identification of Very Shallow Groundwater Regions in the EU to Support Monitoring

Identification of Very Shallow Groundwater Regions in the EU to Support Monitoring Identification of Very Shallow Groundwater Regions in the EU to Support Monitoring Timothy Negley Paul Sweeney Lucy Fish Paul Hendley Andrew Newcombe ARCADIS Syngenta Ltd. Syngenta Ltd. Phasera Ltd. ARCADIS

More information

JANUARY MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY

JANUARY MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY Vocabulary (01) The Calendar (012) In context: Look at the calendar. Then, answer the questions. JANUARY MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY 1 New 2 3 4 5 6 Year s Day 7 8 9 10 11

More information

Three right directions and three wrong directions for tensor research

Three right directions and three wrong directions for tensor research Three right directions and three wrong directions for tensor research Michael W. Mahoney Stanford University ( For more info, see: http:// cs.stanford.edu/people/mmahoney/ or Google on Michael Mahoney

More information

An Implementation of Mobile Sensing for Large-Scale Urban Monitoring

An Implementation of Mobile Sensing for Large-Scale Urban Monitoring An Implementation of Mobile Sensing for Large-Scale Urban Monitoring Teerayut Horanont 1, Ryosuke Shibasaki 1,2 1 Department of Civil Engineering, University of Tokyo, Meguro, Tokyo 153-8505, JAPAN Email:

More information

A re-sampling based weather generator

A re-sampling based weather generator A re-sampling based weather generator Sara Martino 1 Joint work with T. Nipen 2 and C. Lussana 2 1 Sintef Energy Resources 2 Norwegian Metereologic Institute Berlin 19th Sept. 2017 Sara Martino Joint work

More information

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval

More information

carbon dioxide +... (+ light energy) glucose +...

carbon dioxide +... (+ light energy) glucose +... Photosynthesis 1. (i) Complete the word equation for photosynthesis. (ii) carbon dioxide +... (+ light energy) glucose +... Most of the carbon dioxide that a plant uses during photosynthesis is absorbed

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

REMOTELY SENSED INFORMATION FOR CROP MONITORING AND FOOD SECURITY

REMOTELY SENSED INFORMATION FOR CROP MONITORING AND FOOD SECURITY LEARNING OBJECTIVES Lesson 4 Methods and Analysis 2: Rainfall and NDVI Seasonal Graphs At the end of the lesson, you will be able to: understand seasonal graphs for rainfall and NDVI; describe the concept

More information

Online Appendix for Cultural Biases in Economic Exchange? Luigi Guiso Paola Sapienza Luigi Zingales

Online Appendix for Cultural Biases in Economic Exchange? Luigi Guiso Paola Sapienza Luigi Zingales Online Appendix for Cultural Biases in Economic Exchange? Luigi Guiso Paola Sapienza Luigi Zingales 1 Table A.1 The Eurobarometer Surveys The Eurobarometer surveys are the products of a unique program

More information

INTEROPERABILITY AND DATA HOMOGENIZATION

INTEROPERABILITY AND DATA HOMOGENIZATION Page 1 of 5 The Transnational Geo-portal Italian-Slovenian of the Cross-Border Park Area Eva Savina Malinverni DARDUS Università Politecnica delle Marche Ancona, IT e.s.malinverni@univpm.it INTRODUCTION

More information