Text and multimedia languages and properties

Text and multimedia languages and properties (Modern IR, Ch. 6)

  Introduction
  Metadata
  Text
  Markup Languages
  Multimedia
  Trends and Research Issues
  Bibliographical Discussion

Introduction
  Document: single unit of info
    Text
    Other media, too
  Aspects of a document

Aspects of a document
  Syntax
  Semantics (also pragmatics)
  Presentation style
    Application dependent
    Print vs. WWW vs. handheld, etc.
  Metadata

Syntax
  Implicit / explicit / declarative

Formatting style
  Separate from content
  Manipulate independently

Metadata
  Data about data
  Descriptive
    How created
    Dublin Core Metadata Element Set
      15 fields to describe doc
  Semantic
    About content → ontologies

MARC
  Machine Readable Cataloging Record
  Most used for library records
  Fields for bibliographic info
  USMARC: US specific

BibTeX
  Originally, TeX
  Now, general cataloging

RDF
  Resource Description Framework
  Web metadata standard
  XML-based → machine interoperability
  Nodes (URI) + attribute=value pairs

Text
  Formats
    ASCII, EBCDIC, RTF, PDF, MIME, etc.
  Information Theory
  Modeling Natural Language
  Similarity Models

Information Theory
  Entropy: amount of info measure
    Fewer symbols → less meaning
    $E = -\sum_{i=1}^{\sigma} p_i \log_2 p_i$
    $\sigma$ = # symbols of alphabet, coded in binary
    $p_i$ = probability of symbol i
  E.g., $\sigma = 2$ → $E = 1$ if both symbols appear same # times
        $E = 0$ if only one symbol appears
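
The two cases on the entropy slide can be checked numerically. The following is a minimal sketch, assuming that each p_i is estimated as the relative frequency of symbol i in the text; the function name `entropy` is chosen here for illustration and does not come from the slides.

```python
import math
from collections import Counter


def entropy(text: str) -> float:
    """Entropy in bits per symbol, with p_i taken as observed relative frequencies."""
    counts = Counter(text)
    total = len(text)
    # -sum(p * log2(p)) written as sum(p * log2(1/p)) so the result is never -0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(entropy("abababab"))  # 1.0: two symbols, equally frequent
print(entropy("aaaaaaaa"))  # 0.0: only one symbol appears
```

With four equally frequent symbols the same function gives 2 bits per symbol, matching the intuition that a larger, evenly used alphabet carries more information per symbol.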

Modeling Natural Language
  Text: symbols from finite alphabet
    Belong to words
      Letters
    Separate words
  Vowels more frequent
    e: highest frequency (English)

Binomial model
  Each symbol → given probability
  But, dependence on previous symbols
    E.g., English → no "cf" → prob of symbol depends on previous
  Finite-context Markovian model

Finite-context Markovian model
  Consider 1, 2, or more letters to generate next
  k-order model: consider previous k for next
  Binomial = 0-order
  More complex

More complex
  Finite-state machines → regular languages
  Grammars → context-free languages → context-sensitive languages
  Natural, but a correct grammar is hard to define
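
The k-order finite-context model described above can be sketched in a few lines. This is an illustrative sketch only: the names `train` and `generate` are hypothetical, the model is estimated from raw follower counts in a sample text, and the seed is assumed to contain at least k symbols.

```python
import random
from collections import defaultdict


def train(text: str, k: int) -> dict:
    """Map every k-symbol context to the list of symbols that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - k):
        model[text[i:i + k]].append(text[i + k])
    return model


def generate(model: dict, k: int, seed: str, length: int, rng: random.Random) -> str:
    """Extend seed one symbol at a time, sampling from followers of the last k symbols."""
    out = seed
    while len(out) < length:
        context = out[len(out) - k:]      # empty string when k == 0
        followers = model.get(context)
        if not followers:                 # context never seen in training text
            break
        out += rng.choice(followers)
    return out


sample = "the theory of the thing is that the thesis thrives"
model = train(sample, 2)
print(generate(model, 2, "th", 40, random.Random(0)))
```

With k = 0 every symbol is drawn from the same overall distribution, which corresponds to the binomial (0-order) case on the slide.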

Word distribution models
  Zipf's law
    $f(i) = f(1) / i^{\theta}$, where $f(i)$ = frequency of the i-th most frequent word
    → text: n words, vocabulary: V words
    → $f(i) = \frac{n}{i^{\theta} H_V(\theta)}$
    $H_V(\theta)$ = harmonic number of order $\theta$ of V
    $H_V(\theta) = \sum_{j=1}^{V} \frac{1}{j^{\theta}}$

Word frequency distribution
  Words arranged in decreasing f order
  $\theta = 1$ → $H_V(\theta) = O(\log n)$
  $\theta > 1$: better fit for real data → $H_V(\theta) = O(1)$
  Mandelbrot distribution: $k / (c + i)^{\theta}$
    c = additional parameter
    k = such that all f's add to n
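
As a quick check of the Zipf formulas above, the sketch below computes the generalized harmonic number and the predicted frequency of the i-th most frequent word. The function names and the sample values of n, V and θ are assumptions made for illustration.

```python
def harmonic(V: int, theta: float) -> float:
    """Harmonic number of order theta of V: sum of 1 / j**theta for j = 1..V."""
    return sum(1.0 / j ** theta for j in range(1, V + 1))


def zipf_frequency(i: int, n: int, V: int, theta: float) -> float:
    """Predicted number of occurrences of the i-th most frequent word."""
    return n / (i ** theta * harmonic(V, theta))


n, V, theta = 100_000, 5_000, 1.0   # illustrative values only
for rank in (1, 2, 10, 100):
    print(rank, round(zipf_frequency(rank, n, V, theta)))
```

Dividing by the harmonic number is what makes the predicted frequencies of all V words add up to n.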

Word distribution skewed
  Few hundred words = ~50% of text → can ignore if too frequent
    e.g., stopwords
      Don't carry meaning
      Most frequent → can drop → reduce index space overhead

E.g., TREC-2 collection
  Most frequent: the, of, and, a, to, in
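
The skew described above can be measured on any tokenized text. The sketch below is illustrative only: `coverage_of_top_words` and `drop_stopwords` are hypothetical helper names, and the tiny word list stands in for a real collection.

```python
from collections import Counter


def coverage_of_top_words(words: list, k: int) -> float:
    """Fraction of all word occurrences accounted for by the k most frequent words."""
    counts = Counter(words)
    return sum(c for _, c in counts.most_common(k)) / len(words)


def drop_stopwords(words: list, stopwords: set) -> list:
    """Remove stopwords before indexing to reduce index space."""
    return [w for w in words if w not in stopwords]


words = "the cat and the dog and the bird".split()
print(coverage_of_top_words(words, 2))          # share of text covered by top 2 words
print(drop_stopwords(words, {"the", "and"}))    # remaining content words
```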

Word distribution in collection
  Simple model: same in all docs
    But not true
  Better: negative binomial distribution
    Fraction of docs containing word k times:
      $F(k) = \binom{\alpha + k - 1}{k} p^{k} (1 + p)^{-\alpha - k}$
    Parameters $p$, $\alpha$: depend on word and doc collection

  e.g., Brown Corpus, "said" → $p = 9.24$, $\alpha = 0.42$
  Other models from Poisson distribution
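
The negative binomial formula can be evaluated for the Brown Corpus parameters quoted above. This is a sketch under one assumption: because α is not an integer, the binomial coefficient is computed through the gamma function, using Python's `math.lgamma`. The function name `neg_binomial` is illustrative.

```python
import math


def neg_binomial(k: int, p: float, alpha: float) -> float:
    """F(k): fraction of documents that contain the word exactly k times."""
    # log of binom(alpha + k - 1, k) = lgamma(alpha + k) - lgamma(k + 1) - lgamma(alpha)
    log_coeff = math.lgamma(alpha + k) - math.lgamma(k + 1) - math.lgamma(alpha)
    return math.exp(log_coeff + k * math.log(p) - (alpha + k) * math.log(1 + p))


# Parameters for "said" in the Brown Corpus, as given on the slide
p, alpha = 9.24, 0.42
for k in range(4):
    print(k, neg_binomial(k, p, alpha))
```

Summing F(k) over all k gives 1, which is a useful sanity check that the reconstructed exponent on (1 + p) is negative.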

Vocabulary size
  # distinct words in doc / collection
  Heaps' law
    $V = K n^{\beta} = O(n^{\beta})$
    n = text size (# words in text)
    Text dependent: $10 \le K \le 100$, $0 < \beta < 1$
  Finite but high → better model than O(1)
  Also growth due to typos

Similarity Models
  Syntactic similarity between strings / docs
  Distance function
    E.g., Hamming distance
      # positions w/ diff chars
      Equal → = 0
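
Returning to Heaps' law on this page, the sketch below compares the predicted vocabulary size with the vocabulary actually observed while scanning a text. The function names and the values K = 10 and β = 0.5 are assumptions chosen for illustration; real values depend on the text.

```python
def heaps_vocabulary(n: int, K: float, beta: float) -> float:
    """Predicted vocabulary size V = K * n**beta for a text of n words."""
    return K * n ** beta


def vocabulary_growth(words: list) -> list:
    """Observed number of distinct words after each word of the text."""
    seen = set()
    sizes = []
    for w in words:
        seen.add(w)
        sizes.append(len(seen))
    return sizes


print(heaps_vocabulary(10_000, K=10, beta=0.5))   # prediction for illustrative K, beta
print(vocabulary_growth("a b a c b d".split()))   # [1, 2, 2, 3, 3, 4]
```

Plotting the observed growth against n on log-log axes and fitting a line is one common way to estimate K and β for a given collection.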

Distance function properties
  Symmetric: $d(a, b) = d(b, a)$
  Triangle inequality: $d(a, c) \le d(a, b) + d(b, c)$

Edit (Levenshtein) distance
  Min # of char inserts, deletes, subs needed to make a → b
    e.g., color → colour = 1
          survey → surgery = 2
  Considered superior to other models
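
Both distance functions mentioned so far fit in a short sketch. The Levenshtein function below uses the standard dynamic-programming formulation with one row of the table kept at a time; the function names are chosen here for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions with different characters (equal-length strings only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))            # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("color", "colour"))    # 1
print(levenshtein("survey", "surgery"))  # 2
```

The running time is proportional to the product of the two string lengths, and swapping the arguments gives the same result, in line with the symmetry property.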

LCS: longest common subsequence
  Only char deletion allowed
  LCS = what's left after all noncommon chars deleted
  E.g., survey + surgery → surey
  Docs → longest common sequence of lines
    Unix diff command
    Time consuming

Also visual tools
  E.g., Dotplot
    Rectangular map
    Coordinates = file lines
    Each coordinate entry = gray pixel
      Depends on edit distance between associated lines
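
The LCS example above can be reproduced with the usual dynamic-programming table followed by a backtracking pass. This is a minimal sketch for strings; the function name `lcs` is illustrative, and when several subsequences of maximal length exist it returns just one of them.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence itself
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("survey", "surgery"))  # surey
```

The quadratic table is the reason the slide calls the approach time consuming when the units are the lines of two large documents.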

Markup Languages
  SGML
  DTD
  Schema
  HTML
  XML

Multimedia
  Formats
  Textual Images
  Graphics and Virtual Reality
    CGM: computer graphics metafile
    VRML
  HyTime: Hypermedia/Time-based Structuring Language
    Multimedia document markup standard
    SGML architecture
  Fig. 6.6, p. 161: taxonomy of Web languages


Data Compression. Limit of Information Compression. October, Examples of codes 1 Data Compression Limit of Information Compression Radu Trîmbiţaş October, 202 Outline Contents Eamples of codes 2 Kraft Inequality 4 2. Kraft Inequality............................ 4 2.2 Kraft inequality

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information

Compiling Techniques

Compiling Techniques Lecture 3: Introduction to 22 September 2017 Reminder Action Create an account and subscribe to the course on piazza. Coursework Starts this afternoon (14.10-16.00) Coursework description is updated regularly;

More information

Slides for CIS 675. Huffman Encoding, 1. Huffman Encoding, 2. Huffman Encoding, 3. Encoding 1. DPV Chapter 5, Part 2. Encoding 2

Slides for CIS 675. Huffman Encoding, 1. Huffman Encoding, 2. Huffman Encoding, 3. Encoding 1. DPV Chapter 5, Part 2. Encoding 2 Huffman Encoding, 1 EECS Slides for CIS 675 DPV Chapter 5, Part 2 Jim Royer October 13, 2009 A toy example: Suppose our alphabet is { A, B, C, D }. Suppose T is a text of 130 million characters. What is

More information

UNIVERSITY OF CALGARY. The Automatic Grouping of Sensor Data Layers

UNIVERSITY OF CALGARY. The Automatic Grouping of Sensor Data Layers UNIVERSITY OF CALGARY The Automatic Grouping of Sensor Data Layers Using Semantic Clustering and Classification to Group Semantically Similar Sensor Data Layers by Ben Charles Knoechel A THESIS SUBMITTED

More information

Visualisations of Gussian and Mean Curvatures by Using Mathematica and webmathematica

Visualisations of Gussian and Mean Curvatures by Using Mathematica and webmathematica 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Visualisations of Gussian and Mean Curvatures by Using Mathematica and webmathematica Vladimir Benič, Sonja Gorjanc

More information

Open Government Data One Year Later

Open Government Data One Year Later Open Government Data One Year Later Build to Share 13 October 2010 U.S. Federal Data Architecture Subcommittee (DAS) 1 Agenda Origins Features of Data.gov Applications, Community, Semantic Web Highlights

More information

Outline. Similarity Search. Outline. Motivation. The String Edit Distance

Outline. Similarity Search. Outline. Motivation. The String Edit Distance Outline Similarity Search The Nikolaus Augsten nikolaus.augsten@sbg.ac.at Department of Computer Sciences University of Salzburg 1 http://dbresearch.uni-salzburg.at WS 2017/2018 Version March 12, 2018

More information

Resource Description Framework (RDF) A basis for knowledge representation on the Web

Resource Description Framework (RDF) A basis for knowledge representation on the Web Resource Description Framework (RDF) A basis for knowledge representation on the Web Simple language to capture assertions (as statements) Captures elements of knowledge about a resource Facilitates incremental

More information

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism Peter Krusche Department of Computer Science University of Warwick June 2006 Outline 1 Introduction Motivation The BSP

More information

Geographic Information Systems (GIS) - Hardware and software in GIS

Geographic Information Systems (GIS) - Hardware and software in GIS PDHonline Course L153G (5 PDH) Geographic Information Systems (GIS) - Hardware and software in GIS Instructor: Steve Ramroop, Ph.D. 2012 PDH Online PDH Center 5272 Meadow Estates Drive Fairfax, VA 22030-6658

More information

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code Chapter 3 Source Coding 3. An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code 3. An Introduction to Source Coding Entropy (in bits per symbol) implies in average

More information

2016 OCLC Online Computer Library Center, Inc Kilgour Place Dublin, OH USA

2016 OCLC Online Computer Library Center, Inc Kilgour Place Dublin, OH USA 2016 OCLC Online Computer Library Center, Inc. 6565 Kilgour Place Dublin, OH 43017-3395 USA The following OCLC product, service and business names are trademarks or service marks of OCLC, Inc.: CatExpress,

More information

Describing Geographical Objects

Describing Geographical Objects Describing Geographical Objects Web Architecture and Information Management [./] Spring 2009 INFO 190-02 (CCN 42509) Erik Wilde, UC Berkeley School of Information Contents Abstract Geodata on the Web 1

More information

Presenting Tree Inventory. Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University

Presenting Tree Inventory. Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University Presenting Tree Inventory Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University Suggested Options 1. Print out a Google Maps satellite image of the inventoried block

More information

Graphical Models for Text Mining: Knowledge Extraction and Performance Estimation

Graphical Models for Text Mining: Knowledge Extraction and Performance Estimation UNIVERSITÀ DEGLI STUDI DI MILANO - BICOCCA Facoltà di Scienze Matematiche, Fisiche e Naturali Dipartimento di Informatica Sistemistica e Comunicazione Dottorato di Ricerca in Informatica - XXIII Ciclo

More information

Computation Theory Finite Automata

Computation Theory Finite Automata Computation Theory Dept. of Computing ITT Dublin October 14, 2010 Computation Theory I 1 We would like a model that captures the general nature of computation Consider two simple problems: 2 Design a program

More information

Exam: Synchronous Grammars

Exam: Synchronous Grammars Exam: ynchronous Grammars Duration: 3 hours Written documents are allowed. The numbers in front of questions are indicative of hardness or duration. ynchronous grammars consist of pairs of grammars whose

More information

Chap 2: Classical models for information retrieval

Chap 2: Classical models for information retrieval Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic

More information