Text and multimedia languages and properties
- Dominic Baldwin
- 5 years ago
Transcription
Text and multimedia languages and properties (Modern IR, Ch. 6)

Outline
- Introduction
- Metadata
- Text
- Markup Languages
- Multimedia
- Trends and Research Issues
- Bibliographical Discussion

Introduction
- Document: single unit of info
  - Text
  - Other media, too
- Aspect of document

Aspect of document
- Syntax
- Semantics
  - (Also pragmatics)
- Presentation style
  - Application dependent
  - Print vs. WWW vs. handheld etc.
- Metadata

Syntax
- Implicit / explicit / declarative

Formatting style
- Separate from content
- Manipulate independently

Metadata
- Data about data
- Descriptive
  - How created
  - Dublin Core Metadata Element Set: 15 fields to describe doc
- Semantic
  - About content → ontologies

MARC
- Machine Readable Cataloging Record
- Most used for library records
- Fields for bibliographic info
- USMARC: US specific

BibTeX
- Originally, TeX
- Now, general cataloging

RDF
- Resource Description Framework
- Web metadata standard
- XML-based → machine interoperability
- Nodes (URI) + attribute=value pairs

Text
- Formats: ASCII, EBCDIC, RTF, PDF, MIME, etc.
- Information Theory
- Modeling Natural Language
- Similarity Models

Information Theory
- Entropy: amount of info measure
  - Fewer symbols → less meaning
  - $E = -\sum_{i=1}^{\sigma} p_i \log_2 p_i$
  - $\sigma$ = number of symbols of the alphabet, $p_i$ = probability of symbol $i$
  - $\log_2$: symbols coded in binary
- E.g., $\sigma = 2$
  - $E = 1$ if both symbols appear the same # of times
  - $E = 0$ if only one symbol appears
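
The entropy measure on the slide above can be sketched in a few lines. This is a minimal illustration, assuming the probabilities $p_i$ are estimated from the symbol counts of the text itself; the function name `entropy` is a hypothetical helper, not something from the slides.

```python
import math
from collections import Counter


def entropy(text: str) -> float:
    """Entropy in bits per symbol: E = -sum(p_i * log2(p_i)).

    Assumption: p_i is estimated as (count of symbol i) / (text length).
    """
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    # -p * log2(p) is written as p * log2(1/p) = (c/n) * log2(n/c)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy("abab"))  # two symbols, equally frequent
print(entropy("aaaa"))  # only one symbol appears
```

The two calls mirror the slide's example for $\sigma = 2$: equally frequent symbols give 1 bit per symbol, and a single repeated symbol gives 0.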

Modeling Natural Language
- Text: symbols from finite alphabet
  - Symbols that belong to words: letters
  - Symbols that separate words
- Vowels more frequent
  - "e": highest frequency (English)

Binomial model
- Each symbol → given probability
- But, dependence on previous symbols
  - E.g., English → no "cf" → prob of symbol depends on previous
- Finite-context Markovian model

Finite-context Markovian model
- Consider 1, 2, or more letters to generate the next
- k-order model: consider previous k for next
- Binomial = 0-order
- More complex

More complex
- Finite-state machines → regular languages
- Grammars → context-free languages → context-sensitive languages
- Natural language
  - But correct grammar, hard
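
A k-order finite-context model can be sketched as a table from each context of k symbols to the counts of the symbols that follow it. The code below is only an illustration under that assumption; `train` and `generate` are hypothetical names, and the training string is a toy example.

```python
import random
from collections import Counter, defaultdict


def train(text: str, k: int) -> dict:
    """Count which symbol follows each context of k previous symbols."""
    model = defaultdict(Counter)
    for i in range(len(text) - k):
        model[text[i:i + k]][text[i + k]] += 1
    return model


def generate(model: dict, k: int, seed: str, length: int,
             rng: random.Random) -> str:
    """Extend seed by up to `length` symbols drawn from the model."""
    out = seed
    for _ in range(length):
        context = out[max(0, len(out) - k):]  # last k symbols ("" if k == 0)
        followers = model.get(context)
        if not followers:
            break  # context never seen in training text
        symbols, weights = zip(*followers.items())
        out += rng.choices(symbols, weights=weights)[0]
    return out


model = train("abababab", k=1)
print(generate(model, 1, "a", 5, random.Random(0)))
```

With k = 0 the context is always empty, so every symbol is drawn from one global distribution, which corresponds to the 0-order (binomial) case on the slide.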

Word distribution models
- Zipf's law
  - $f(i\text{-th most frequent word}) = \frac{1}{i^{\theta}} \cdot f(\text{most frequent word})$
  - → text of $n$ words, vocabulary of $V$ words → $f(i\text{-th most frequent word}) = \frac{n}{i^{\theta} H_V(\theta)}$
  - $H_V(\theta)$ = harmonic number of order $\theta$ of $V$: $H_V(\theta) = \sum_{j=1}^{V} \frac{1}{j^{\theta}}$

Word frequency distribution (figure: frequency F against words, arranged in decreasing order of frequency)
- $\theta = 1$ → $H_V(\theta) = O(\log n)$
- $\theta > 1$: better fit for real data → $H_V(\theta) = O(1)$
- Mandelbrot distribution: $\frac{k}{(c + i)^{\theta}}$
  - $c$ = additional parameter
  - $k$ = such that all frequencies add to $n$
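
As a rough check of the Zipf formula, the expected frequency of the i-th most frequent word can be computed directly from $n$, $V$ and $\theta$. The helper names and the sample values (n = 1000, V = 100) are assumptions chosen for illustration only.

```python
def harmonic(V: int, theta: float) -> float:
    """Harmonic number of order theta of V: sum over j = 1..V of 1 / j**theta."""
    return sum(1.0 / j ** theta for j in range(1, V + 1))


def zipf_frequency(i: int, n: int, V: int, theta: float = 1.0) -> float:
    """Expected occurrences of the i-th most frequent word: n / (i**theta * H_V(theta))."""
    return n / (i ** theta * harmonic(V, theta))


n, V = 1000, 100  # illustrative text size and vocabulary size
freqs = [zipf_frequency(i, n, V) for i in range(1, V + 1)]
print(freqs[:3])
print(sum(freqs))  # the frequencies add up to n, up to rounding
```

Dividing by $H_V(\theta)$ is what makes the frequencies of all $V$ words sum to the text length $n$.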

Word distribution skewed
- Few hundred words = ~50% of text
  - → can ignore if too frequent, e.g., stopwords
- Doesn't carry meaning
- Most frequent → can drop → reduce index space overhead

E.g., TREC-2 collection
- Most frequent: the, of, and, a, to, in

Word distribution in collection
- Simple model: same in all docs
  - But not true
- Better: negative binomial distribution
  - Fraction of docs containing a word $k$ times: $F(k) = \binom{\alpha + k - 1}{k} p^{k} (1 + p)^{-\alpha - k}$
  - Parameters $p$, $\alpha$: depend on word and doc collection
  - E.g., Brown Corpus, "said" → $p = 9.24$, $\alpha = 0.42$
- Other models from Poisson distribution
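
The negative binomial fraction can be evaluated numerically with the slide's Brown Corpus parameters for "said". This is a sketch that assumes the binomial coefficient with non-integer $\alpha$ is taken in its gamma-function form, $\Gamma(\alpha + k) / (\Gamma(k + 1)\,\Gamma(\alpha))$; the function name is hypothetical.

```python
import math


def neg_binomial(k: int, p: float, alpha: float) -> float:
    """F(k) = C(alpha + k - 1, k) * p**k * (1 + p)**(-alpha - k).

    Assumption: the generalized binomial coefficient is computed through
    log-gamma, which also avoids overflow for large k.
    """
    log_coeff = (math.lgamma(alpha + k)
                 - math.lgamma(k + 1)
                 - math.lgamma(alpha))
    log_value = log_coeff + k * math.log(p) - (alpha + k) * math.log(1 + p)
    return math.exp(log_value)


p, alpha = 9.24, 0.42  # values quoted on the slide for "said"
for k in range(4):
    print(k, neg_binomial(k, p, alpha))
```

Summing $F(k)$ over all $k \ge 0$ gives 1, so the values can be read as the fraction of documents containing the word exactly $k$ times.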

Vocabulary size
- # distinct words in doc / collection
- Finite but high → better model than O(1)
  - Also growth due to typos
- Heaps' law: $V = K n^{\beta} = O(n^{\beta})$
  - $n$ = # words in text (text size)
  - $K$, $\beta$: text dependent; $10 \le K \le 100$, $0 < \beta < 1$

Similarity Models
- Syntactic similarity between strings / docs
- Distance function
- E.g., Hamming distance
  - # positions with different chars
  - Equal → distance = 0
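
Both ideas on these two slides are short enough to sketch together. The constants passed to `heaps_vocabulary` (K = 10, β = 0.5) are illustrative assumptions, not values fitted to any collection, and both function names are hypothetical.

```python
def heaps_vocabulary(n: int, K: float, beta: float) -> float:
    """Heaps' law estimate of vocabulary size: V = K * n**beta."""
    return K * n ** beta


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


# Vocabulary grows sublinearly with text size (K and beta are assumed here).
for n in (10_000, 1_000_000):
    print(n, heaps_vocabulary(n, K=10, beta=0.5))

print(hamming("karolin", "kathrin"))
print(hamming("text", "text"))  # equal strings → distance 0
```

Note that the Hamming distance is only defined for strings of the same length, which is why the sketch raises an error otherwise.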

Distance function properties
- Symmetric: $d(a, b) = d(b, a)$
- Triangle inequality: $d(a, c) \le d(a, b) + d(b, c)$

Edit (Levenshtein) distance
- Min # of char insertions, deletions, substitutions needed for a → b
- E.g.,
  - color → colour = 1
  - survey → surgery = 2
- Considered superior to other models
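
The edit distance examples above can be reproduced with the standard dynamic-programming recurrence. This is one common way to compute it, shown as a sketch; the slides do not prescribe an algorithm, and `edit_distance` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("color", "colour"))    # 1
print(edit_distance("survey", "surgery"))  # 2
```

Only two rows of the table are kept at a time, so memory is proportional to the length of `b` while time stays proportional to the product of the two lengths.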

LCS: longest common subsequence
- Only char deletion allowed
- LCS: what's left after all non-common chars are deleted
- E.g., survey + surgery → surey
- Docs: lines as symbols → longest common subsequence of lines
  - Unix diff command
  - Time consuming

Also visual tools
- E.g., Dotplot
  - Rectangular map
  - Coordinates = file lines
  - Each coordinate entry = gray pixel
  - Depends on edit distance between associated lines
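
A longest common subsequence can be recovered with a dynamic-programming table followed by a backtracking pass. The sketch below works on characters; the same idea applies to lists of lines. The function name `lcs` is a hypothetical helper.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back from the bottom-right corner to collect the characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("survey", "surgery"))  # surey
```

The table has one cell per pair of positions, which is the source of the "time consuming" remark when the inputs are long documents.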

Markup Languages
- SGML
- DTD
- Schema
- HTML
- XML

Multimedia
- Formats
- Textual Images
- Graphics and Virtual Reality
  - CGM: computer graphics metafile
  - VRML
- HyTime: Hypermedia/Time-based Structuring Language
  - Multimedia document markup standard
  - SGML architecture
- Fig. 6.6, p. 161: taxonomy of Web languages