MDLP approach to clustering
|
|
- Roland Newman
- 6 years ago
- Views:
Transcription
1 MDLP approach to clustering Jacek Tabor Jagiellonian University Theoretical Foundations of Machine Learning 2015 Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
2 Computer science Ockham s razor Ockham s razor Ockham s razor Among competing hypotheses, the one with the fewest assumptions should be selected. Other, more complicated solutions may ultimately prove correct, but in the absence of certainty the fewer assumptions that are made, the better. Figure: William of Ocham Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
3 Computer science Ockham s razor Morse code Morse code Morse Code/telegraph [1836 S. Morse, A. Vail]: Vail determined the frequency of use of letters in the English language by counting the movable type he found in the type-cases of a local newspaper in Morristown: the code of typical text should be as short as possible, i.e.: the more common the letter, the shorter should be its code. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
4 Computer science Ockham s razor Morse code Morse code Morse Code/telegraph [1836 S. Morse, A. Vail]: Vail determined the frequency of use of letters in the English language by counting the movable type he found in the type-cases of a local newspaper in Morristown: the code of typical text should be as short as possible, i.e.: the more common the letter, the shorter should be its code. ANECDOTE: Ernest Hemingway [newspaper notes from war in Spain] Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
5 Computer science Ockham s razor Morse code Morse code Morse Code/telegraph [1836 S. Morse, A. Vail]: Vail determined the frequency of use of letters in the English language by counting the movable type he found in the type-cases of a local newspaper in Morristown: the code of typical text should be as short as possible, i.e.: the more common the letter, the shorter should be its code. ANECDOTE: Ernest Hemingway [newspaper notes from war in Spain] READER (with admiration): how is it possible to learn to write the notes so short, but so wonderfully informative? Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
6 Computer science Ockham s razor Morse code Morse code Morse Code/telegraph [1836 S. Morse, A. Vail]: Vail determined the frequency of use of letters in the English language by counting the movable type he found in the type-cases of a local newspaper in Morristown: the code of typical text should be as short as possible, i.e.: the more common the letter, the shorter should be its code. ANECDOTE: Ernest Hemingway [newspaper notes from war in Spain] READER (with admiration): how is it possible to learn to write the notes so short, but so wonderfully informative? HEMINGWAY: easy you just have to pay one dollar for each word send by the telegraph Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
7 Computer science Ockham s razor Morse code Morse code Morse Code/telegraph [1836 S. Morse, A. Vail]: Vail determined the frequency of use of letters in the English language by counting the movable type he found in the type-cases of a local newspaper in Morristown: the code of typical text should be as short as possible, i.e.: the more common the letter, the shorter should be its code. ANECDOTE: Ernest Hemingway [newspaper notes from war in Spain] READER (with admiration): how is it possible to learn to write the notes so short, but so wonderfully informative? HEMINGWAY: easy you just have to pay one dollar for each word send by the telegraph Information=money Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
8 Computer science Ockham s razor Entropy Entropy C. Shannon [1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27] Precise formulation of the idea of Morse leads to the formal definition of Shannon s entropy: in the optimal coding (that is those with the shortest code-length), if the signal appears with probability p, its code-length should equal to log 2 p. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
9 Computer science Ockham s razor Entropy Entropy C. Shannon [1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27] Precise formulation of the idea of Morse leads to the formal definition of Shannon s entropy: in the optimal coding (that is those with the shortest code-length), if the signal appears with probability p, its code-length should equal to log 2 p. Thus if the symbols x 1,..., x n appear with probabilities p 1,..., p n : entropy = minimal code length = p 1 log 2 p p n log 2 p n. In practice leads to Huffman s coding (used for example in jpg). Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
10 Computer science Ockham s razor Minimum description length (MDL) principle Minimum description length (MDL) principle Formulation of the MDL principle Jorma Rissanen [1978. "Modeling by shortest data description". Automatica 14]: the best hypothesis for a given set of data is the one that leads to the best compression of the data. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
11 Computer science Ockham s razor Minimum description length (MDL) principle Minimum description length (MDL) principle Formulation of the MDL principle Jorma Rissanen [1978. "Modeling by shortest data description". Automatica 14]: the best hypothesis for a given set of data is the one that leads to the best compression of the data. P. Grünwald, [1998]: "any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally." Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
12 Computer science Ockham s razor Minimum description length (MDL) principle Minimum description length (MDL) principle Formulation of the MDL principle Jorma Rissanen [1978. "Modeling by shortest data description". Automatica 14]: the best hypothesis for a given set of data is the one that leads to the best compression of the data. P. Grünwald, [1998]: "any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally." Connected to the notion of Kolmogorov complexity [1963. "On Tables of Random Numbers". Sankhya Ser. A. 2]: the complexity of a string is the length of the shortest possible description of the string in some fixed universal description language. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
13 Clustering How many clusters? How many clusters? In most clustering methods, one has to specify the number of clusters. This implies, that the procedure does not return the right number of clusters (or reduce unnecessary clusters on-line during the clustering procedure). Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
14 Clustering How many clusters? How many clusters? In most clustering methods, one has to specify the number of clusters. This implies, that the procedure does not return the right number of clusters (or reduce unnecessary clusters on-line during the clustering procedure). Advantages of the use of MDL principle in clustering: automatically reduces the complexity of the model (number of clusters) has high adaptability (often) small requirements on the the data: (we do not require vector or metric structures, but only the existence of encoding or compression methods) easily allows potential modifications Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
15 Clustering MDL principle in clustering MDL principle in clustering Requirements Data type which we want to cluster, available compression methods W. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
16 Clustering MDL principle in clustering MDL principle in clustering Requirements Data type which we want to cluster, available compression methods W. Determination of the overall cost of the memory Let us assume that message X = (x 1,..., x N ) and the compression methods w 1,..., w K W are given. We denote the algorithm that encodes point x l (defines the cluster it belongs to) with w kl, where k l {1,..., K }. Then the memory cost of coding the message X equals K memory cost of w k + N (cost of identification of k l + i=1 l=1 the amount of memory algorithm w kl uses to code x l ). Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
17 Clustering MDL principle in clustering MDL principle in clustering Requirements Data type which we want to cluster, available compression methods W. Determination of the overall cost of the memory Let us assume that message X = (x 1,..., x N ) and the compression methods w 1,..., w K W are given. We denote the algorithm that encodes point x l (defines the cluster it belongs to) with w kl, where k l {1,..., K }. Then the memory cost of coding the message X equals K memory cost of w k + N (cost of identification of k l + i=1 l=1 the amount of memory algorithm w kl uses to code x l ). Memory minimization step We seek K, compression methods w 1,..., w K and indices k l, such that the total cost needed to encode the message X is minimal. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
18 Clustering MDL principle in clustering MDL principle in clustering Requirements Data type which we want to cluster, available compression methods W. Determination of the overall cost of the memory Let us assume that message X = (x 1,..., x N ) and the compression methods w 1,..., w K W are given. We denote the algorithm that encodes point x l (defines the cluster it belongs to) with w kl, where k l {1,..., K }. Then the memory cost of coding the message X equals K memory cost of w k + N (cost of identification of k l + i=1 l=1 the amount of memory algorithm w kl uses to code x l ). Memory minimization step We seek K, compression methods w 1,..., w K and indices k l, such that the total cost needed to encode the message X is minimal. Construction of the clustering Points which are coded by the same algorithm are assigned to the same cluster. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
19 Clustering MDL principle in clustering MDL principle in clustering Observe that in the above approach the use of any encoding method takes memory (needed for its determination), as a consequence we obtain the upper limit for the amount of possible clusters. Moreover, in practice when the amount of elements encoded by a given algorithm is small it will be worthwhile to give up the use of this algorithm, which leads to a reduction of the complexity of the constructed model (understood as a number of clusters). Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
20 Gaussian models Differential entropy, cross-entropy Differential entropy The coder adapted to the data generated by the continuous random variable with the density f. Differential entropy is the limiting case of entropy (smaller and smaller bins): h(f ) := f (x) log 2 (f (x))dx. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
21 Gaussian models Differential entropy, cross-entropy Differential entropy The coder adapted to the data generated by the continuous random variable with the density f. Differential entropy is the limiting case of entropy (smaller and smaller bins): h(f ) := f (x) log 2 (f (x))dx. Cross-entropy h(f ) := g(x) log 2 (f (x))dx. Gives the memory cost of compressing the data with the density g by the coder optimally adapted to the density f. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
22 Gaussian models Gaussian models Gaussian models In practice, we can compute the above for the Gaussian models! Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
23 Gaussian models Gaussian models Gaussian models In practice, we can compute the above for the Gaussian models! There is only the need for the knowledge of the mean of the data and the covariance matrix. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
24 Papers Papers Papers TABOR, SPUREK: Cross-entropy clustering, Pattern Recognition 2014 TABOR, MISZTAL: Detection of elliptical shapes via cross-entropy clustering (IbPRIA 2013) SPUREK, TABOR, ZAJAC: Detection of disk-like particles in electron microscopy images (CORES 2013) ŚMIEJA, TABOR: Image segmentation with use of cross-entropy clustering (CORES 2013) Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
25 Papers EM vs CEC EM vs CEC [Geoffrey McLachlan,Thriyambakam Krishnan The EM Algorithm and Extensions] Fit the data X by X p 1 f p k f k, where f i belong to the fixed density family. CEC: X max(p 1 f 1,..., p k f k ), Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
26 Papers EM vs CEC Cluster reduction Figure: Cluster reduction. Jacek Tabor (UJ) MDLP approach to clustering TFML / 13
Kolmogorov complexity ; induction, prediction and compression
Kolmogorov complexity ; induction, prediction and compression Contents 1 Motivation for Kolmogorov complexity 1 2 Formal Definition 2 3 Trying to compute Kolmogorov complexity 3 4 Standard upper bounds
More information3F1 Information Theory, Lecture 3
3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2011, 28 November 2011 Memoryless Sources Arithmetic Coding Sources with Memory 2 / 19 Summary of last lecture Prefix-free
More information3F1 Information Theory, Lecture 3
3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output
More informationBandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet)
Compression Motivation Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Storage: Store large & complex 3D models (e.g. 3D scanner
More informationDATA MINING LECTURE 9. Minimum Description Length Information Theory Co-Clustering
DATA MINING LECTURE 9 Minimum Description Length Information Theory Co-Clustering MINIMUM DESCRIPTION LENGTH Occam s razor Most data mining tasks can be described as creating a model for the data E.g.,
More informationInformation and Entropy. Professor Kevin Gold
Information and Entropy Professor Kevin Gold What s Information? Informally, when I communicate a message to you, that s information. Your grade is 100/100 Information can be encoded as a signal. Words
More informationA Mathematical Theory of Communication
A Mathematical Theory of Communication Ben Eggers Abstract This paper defines information-theoretic entropy and proves some elementary results about it. Notably, we prove that given a few basic assumptions
More informationData Compression Techniques
Data Compression Techniques Part 2: Text Compression Lecture 5: Context-Based Compression Juha Kärkkäinen 14.11.2017 1 / 19 Text Compression We will now look at techniques for text compression. These techniques
More informationCISC 876: Kolmogorov Complexity
March 27, 2007 Outline 1 Introduction 2 Definition Incompressibility and Randomness 3 Prefix Complexity Resource-Bounded K-Complexity 4 Incompressibility Method Gödel s Incompleteness Theorem 5 Outline
More informationCS4800: Algorithms & Data Jonathan Ullman
CS4800: Algorithms & Data Jonathan Ullman Lecture 22: Greedy Algorithms: Huffman Codes Data Compression and Entropy Apr 5, 2018 Data Compression How do we store strings of text compactly? A (binary) code
More informationMurray Gell-Mann, The Quark and the Jaguar, 1995
Although [complex systems] differ widely in their physical attributes, they resemble one another in the way they handle information. That common feature is perhaps the best starting point for exploring
More informationAn Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme
An Information Theoretic Interpretation of Variational Inference based on the MDL Principle and the Bits-Back Coding Scheme Ghassen Jerfel April 2017 As we will see during this talk, the Bayesian and information-theoretic
More informationPackage CEC. R topics documented: August 29, Title Cross-Entropy Clustering Version Date
Title Cross-Entropy Clustering Version 0.9.4 Date 2016-04-23 Package CEC August 29, 2016 Author Konrad Kamieniecki [aut, cre], Przemyslaw Spurek [ctb] Maintainer Konrad Kamieniecki
More informationOutline of the Lecture. Background and Motivation. Basics of Information Theory: 1. Introduction. Markku Juntti. Course Overview
: Markku Juntti Overview The basic ideas and concepts of information theory are introduced. Some historical notes are made and the overview of the course is given. Source The material is mainly based on
More informationIntro to Information Theory
Intro to Information Theory Math Circle February 11, 2018 1. Random variables Let us review discrete random variables and some notation. A random variable X takes value a A with probability P (a) 0. Here
More informationCompressing Tabular Data via Pairwise Dependencies
Compressing Tabular Data via Pairwise Dependencies Amir Ingber, Yahoo! Research TCE Conference, June 22, 2017 Joint work with Dmitri Pavlichin, Tsachy Weissman (Stanford) Huge datasets: everywhere - Internet
More informationInformation Theory and Coding Techniques: Chapter 1.1. What is Information Theory? Why you should take this course?
Information Theory and Coding Techniques: Chapter 1.1 What is Information Theory? Why you should take this course? 1 What is Information Theory? Information Theory answers two fundamental questions in
More informationBayesian Learning Extension
Bayesian Learning Extension This document will go over one of the most useful forms of statistical inference known as Baye s Rule several of the concepts that extend from it. Named after Thomas Bayes this
More informationMultimedia Communications. Mathematical Preliminaries for Lossless Compression
Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when
More informationExercise 1: Basics of probability calculus
: Basics of probability calculus Stig-Arne Grönroos Department of Signal Processing and Acoustics Aalto University, School of Electrical Engineering stig-arne.gronroos@aalto.fi [21.01.2016] Ex 1.1: Conditional
More informationChapter 6 The Structural Risk Minimization Principle
Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University March 23, 2004 Objectives Structural risk minimization
More informationMinimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions Parthan Kasarapu & Lloyd Allison Monash University, Australia September 8, 25 Parthan Kasarapu
More informationImage and Multidimensional Signal Processing
Image and Multidimensional Signal Processing Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ Image Compression 2 Image Compression Goal: Reduce amount
More informationInformation & Correlation
Information & Correlation Jilles Vreeken 11 June 2014 (TADA) Questions of the day What is information? How can we measure correlation? and what do talking drums have to do with this? Bits and Pieces What
More informationA Tutorial Introduction to the Minimum Description Length Principle
A Tutorial Introduction to the Minimum Description Length Principle Peter Grünwald Centrum voor Wiskunde en Informatica Kruislaan 413, 1098 SJ Amsterdam The Netherlands pdg@cwi.nl www.grunwald.nl Abstract
More informationComplex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity
Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity Eckehard Olbrich MPI MiS Leipzig Potsdam WS 2007/08 Olbrich (Leipzig) 26.10.2007 1 / 18 Overview 1 Summary
More informationSource Coding Techniques
Source Coding Techniques. Huffman Code. 2. Two-pass Huffman Code. 3. Lemple-Ziv Code. 4. Fano code. 5. Shannon Code. 6. Arithmetic Code. Source Coding Techniques. Huffman Code. 2. Two-path Huffman Code.
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationComplexity 6: AIT. Outline. Dusko Pavlovic. Kolmogorov. Solomonoff. Chaitin: The number of wisdom RHUL Spring Complexity 6: AIT.
Outline Complexity Theory Part 6: did we achieve? Algorithmic information and logical depth : Algorithmic information : Algorithmic probability : The number of wisdom RHUL Spring 2012 : Logical depth Outline
More information! Where are we on course map? ! What we did in lab last week. " How it relates to this week. ! Compression. " What is it, examples, classifications
Lecture #3 Compression! Where are we on course map?! What we did in lab last week " How it relates to this week! Compression " What is it, examples, classifications " Probability based compression # Huffman
More information17.1 Binary Codes Normal numbers we use are in base 10, which are called decimal numbers. Each digit can be 10 possible numbers: 0, 1, 2, 9.
( c ) E p s t e i n, C a r t e r, B o l l i n g e r, A u r i s p a C h a p t e r 17: I n f o r m a t i o n S c i e n c e P a g e 1 CHAPTER 17: Information Science 17.1 Binary Codes Normal numbers we use
More informationSIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding
SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.
More informationSource Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria
Source Coding Master Universitario en Ingeniería de Telecomunicación I. Santamaría Universidad de Cantabria Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal
More informationNotes on Information, Reference, and Machine Thirdness
Notes on Information, Reference, and Machine Thirdness Rafael Alvarado Data Science Institute Human & Machine Intelligence Group Humanities Informatics Lab 24 July 2018 Overview Paul proposes that neural
More informationThe memory centre IMUJ PREPRINT 2012/03. P. Spurek
The memory centre IMUJ PREPRINT 202/03 P. Spurek Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Kraków, Poland J. Tabor Faculty of Mathematics and Computer
More information4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd
4.8 Huffman Codes These lecture slides are supplied by Mathijs de Weerd Data Compression Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we
More informationShannon-Fano-Elias coding
Shannon-Fano-Elias coding Suppose that we have a memoryless source X t taking values in the alphabet {1, 2,..., L}. Suppose that the probabilities for all symbols are strictly positive: p(i) > 0, i. The
More informationUniversal probability distributions, two-part codes, and their optimal precision
Universal probability distributions, two-part codes, and their optimal precision Contents 0 An important reminder 1 1 Universal probability distributions in theory 2 2 Universal probability distributions
More informationUNIT I INFORMATION THEORY. I k log 2
UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper
More informationSIGNAL COMPRESSION Lecture Shannon-Fano-Elias Codes and Arithmetic Coding
SIGNAL COMPRESSION Lecture 3 4.9.2007 Shannon-Fano-Elias Codes and Arithmetic Coding 1 Shannon-Fano-Elias Coding We discuss how to encode the symbols {a 1, a 2,..., a m }, knowing their probabilities,
More information( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r C h a p t e r 1 7 : I n f o r m a t i o n S c i e n c e P a g e 1
( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2 0 1 6 C h a p t e r 1 7 : I n f o r m a t i o n S c i e n c e P a g e 1 CHAPTER 17: Information Science In this chapter, we learn how data can
More informationEntropy Coding. Connectivity coding. Entropy coding. Definitions. Lossles coder. Input: a set of symbols Output: bitstream. Idea
Connectivity coding Entropy Coding dd 7, dd 6, dd 7, dd 5,... TG output... CRRRLSLECRRE Entropy coder output Connectivity data Edgebreaker output Digital Geometry Processing - Spring 8, Technion Digital
More informationIntroduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.
L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate
More informationMultimedia. Multimedia Data Compression (Lossless Compression Algorithms)
Course Code 005636 (Fall 2017) Multimedia Multimedia Data Compression (Lossless Compression Algorithms) Prof. S. M. Riazul Islam, Dept. of Computer Engineering, Sejong University, Korea E-mail: riaz@sejong.ac.kr
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationIntroduction to Information Theory
Introduction to Information Theory Impressive slide presentations Radu Trîmbiţaş UBB October 2012 Radu Trîmbiţaş (UBB) Introduction to Information Theory October 2012 1 / 19 Transmission of information
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission
More informationarxiv: v2 [cs.it] 11 Dec 2012
Cross-Entropy Clustering J. Tabor Faculty of Mathematics and Computer Science, Jagiellonian University, Lojasiewicza 6, 30-348 Kraków, Poland arxiv:1210.5594v2 [cs.it] 11 Dec 2012 P. Spurek Faculty of
More informationMDL Histogram Density Estimation
MDL Histogram Density Estimation Petri Kontkanen, Petri Myllymäki Complex Systems Computation Group (CoSCo) Helsinki Institute for Information Technology (HIIT) University of Helsinki and Helsinki University
More information2 Plain Kolmogorov Complexity
2 Plain Kolmogorov Complexity In this section, we introduce plain Kolmogorov Complexity, prove the invariance theorem - that is, the complexity of a string does not depend crucially on the particular model
More informationInformation Theory. Week 4 Compressing streams. Iain Murray,
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]
More information10-704: Information Processing and Learning Fall Lecture 10: Oct 3
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More informationU Logo Use Guidelines
COMP2610/6261 - Information Theory Lecture 15: Arithmetic Coding U Logo Use Guidelines Mark Reid and Aditya Menon logo is a contemporary n of our heritage. presents our name, d and our motto: arn the nature
More informationApplied Logic. Lecture 4 part 2 Bayesian inductive reasoning. Marcin Szczuka. Institute of Informatics, The University of Warsaw
Applied Logic Lecture 4 part 2 Bayesian inductive reasoning Marcin Szczuka Institute of Informatics, The University of Warsaw Monographic lecture, Spring semester 2017/2018 Marcin Szczuka (MIMUW) Applied
More informationMotivation for Arithmetic Coding
Motivation for Arithmetic Coding Motivations for arithmetic coding: 1) Huffman coding algorithm can generate prefix codes with a minimum average codeword length. But this length is usually strictly greater
More informationCSE 421 Greedy: Huffman Codes
CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits better: 2.52 bits/char 74%*2 +26%*4:
More informationThe Expectation-Maximization Algorithm
The Expectation-Maximization Algorithm Francisco S. Melo In these notes, we provide a brief overview of the formal aspects concerning -means, EM and their relation. We closely follow the presentation in
More informationContext tree models for source coding
Context tree models for source coding Toward Non-parametric Information Theory Licence de droits d usage Outline Lossless Source Coding = density estimation with log-loss Source Coding and Universal Coding
More informationLecture 10 : Basic Compression Algorithms
Lecture 10 : Basic Compression Algorithms Modeling and Compression We are interested in modeling multimedia data. To model means to replace something complex with a simpler (= shorter) analog. Some models
More informationThe Regularized EM Algorithm
The Regularized EM Algorithm Haifeng Li Department of Computer Science University of California Riverside, CA 92521 hli@cs.ucr.edu Keshu Zhang Human Interaction Research Lab Motorola, Inc. Tempe, AZ 85282
More informationPERFECT SECRECY AND ADVERSARIAL INDISTINGUISHABILITY
PERFECT SECRECY AND ADVERSARIAL INDISTINGUISHABILITY BURTON ROSENBERG UNIVERSITY OF MIAMI Contents 1. Perfect Secrecy 1 1.1. A Perfectly Secret Cipher 2 1.2. Odds Ratio and Bias 3 1.3. Conditions for Perfect
More informationEntropies & Information Theory
Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information
More informationRun-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE
General e Image Coder Structure Motion Video x(s 1,s 2,t) or x(s 1,s 2 ) Natural Image Sampling A form of data compression; usually lossless, but can be lossy Redundancy Removal Lossless compression: predictive
More informationMinimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
446 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 46, NO 2, MARCH 2000 Minimum Description Length Induction, Bayesianism, Kolmogorov Complexity Paul M B Vitányi Ming Li Abstract The relationship between
More informationChapter 9 Fundamental Limits in Information Theory
Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For
More informationKotebe Metropolitan University Department of Computer Science and Technology Multimedia (CoSc 4151)
Kotebe Metropolitan University Department of Computer Science and Technology Multimedia (CoSc 4151) Chapter Three Multimedia Data Compression Part I: Entropy in ordinary words Claude Shannon developed
More informationDecision Tree Learning - ID3
Decision Tree Learning - ID3 n Decision tree examples n ID3 algorithm n Occam Razor n Top-Down Induction in Decision Trees n Information Theory n gain from property 1 Training Examples Day Outlook Temp.
More informationBITS F464: MACHINE LEARNING
BITS F464: MACHINE LEARNING Lecture-09: Concept Learning Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems Engineering, BITS Pilani, Rajasthan-333031 INDIA Jan
More informationMinimum Message Length Clustering of Spatially-Correlated Data with Varying Inter-Class Penalties
Minimum Message Length Clustering of Spatially-Correlated Data with Varying Inter-Class Penalties Gerhard Visser Clayton School of I.T., Monash University, Clayton, Vic. 3168, Australia, (gvis1@student.monash.edu.au)
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationImplementation of Lossless Huffman Coding: Image compression using K-Means algorithm and comparison vs. Random numbers and Message source
Implementation of Lossless Huffman Coding: Image compression using K-Means algorithm and comparison vs. Random numbers and Message source Ali Tariq Bhatti 1, Dr. Jung Kim 2 1,2 Department of Electrical
More informationOutline. Introduction. Bayesian Probability Theory Bayes rule Bayes rule applied to learning Bayesian learning and the MDL principle
Outline Introduction Bayesian Probability Theory Bayes rule Bayes rule applied to learning Bayesian learning and the MDL principle Sequence Prediction and Data Compression Bayesian Networks Copyright 2015
More informationAlgorithmic Probability
Algorithmic Probability From Scholarpedia From Scholarpedia, the free peer-reviewed encyclopedia p.19046 Curator: Marcus Hutter, Australian National University Curator: Shane Legg, Dalle Molle Institute
More informationGaussian Mixture Distance for Information Retrieval
Gaussian Mixture Distance for Information Retrieval X.Q. Li and I. King fxqli, ingg@cse.cuh.edu.h Department of omputer Science & Engineering The hinese University of Hong Kong Shatin, New Territories,
More informationDistribution of Environments in Formal Measures of Intelligence: Extended Version
Distribution of Environments in Formal Measures of Intelligence: Extended Version Bill Hibbard December 2008 Abstract This paper shows that a constraint on universal Turing machines is necessary for Legg's
More informationChapter 2: Source coding
Chapter 2: meghdadi@ensil.unilim.fr University of Limoges Chapter 2: Entropy of Markov Source Chapter 2: Entropy of Markov Source Markov model for information sources Given the present, the future is independent
More informationFormal Modeling in Cognitive Science Lecture 29: Noisy Channel Model and Applications;
Formal Modeling in Cognitive Science Lecture 9: and ; ; Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Proerties of 3 March, 6 Frank Keller Formal Modeling in Cognitive
More informationSelective Use Of Multiple Entropy Models In Audio Coding
Selective Use Of Multiple Entropy Models In Audio Coding Sanjeev Mehrotra, Wei-ge Chen Microsoft Corporation One Microsoft Way, Redmond, WA 98052 {sanjeevm,wchen}@microsoft.com Abstract The use of multiple
More informationA Simple Introduction to Information, Channel Capacity and Entropy
A Simple Introduction to Information, Channel Capacity and Entropy Ulrich Hoensch Rocky Mountain College Billings, MT 59102 hoenschu@rocky.edu Friday, April 21, 2017 Introduction A frequently occurring
More informationStatistical and Inductive Inference by Minimum Message Length
C.S. Wallace Statistical and Inductive Inference by Minimum Message Length With 22 Figures Springer Contents Preface 1. Inductive Inference 1 1.1 Introduction 1 1.2 Inductive Inference 5 1.3 The Demise
More informationDecision Tree Learning Mitchell, Chapter 3. CptS 570 Machine Learning School of EECS Washington State University
Decision Tree Learning Mitchell, Chapter 3 CptS 570 Machine Learning School of EECS Washington State University Outline Decision tree representation ID3 learning algorithm Entropy and information gain
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More informationLecture 11: Continuous-valued signals and differential entropy
Lecture 11: Continuous-valued signals and differential entropy Biology 429 Carl Bergstrom September 20, 2008 Sources: Parts of today s lecture follow Chapter 8 from Cover and Thomas (2007). Some components
More informationFormal Definition of Computation. August 28, 2013
August 28, 2013 Computation model The model of computation considered so far is the work performed by a finite automaton Finite automata were described informally, using state diagrams, and formally, as
More information10-704: Information Processing and Learning Fall Lecture 9: Sept 28
10-704: Information Processing and Learning Fall 2016 Lecturer: Siheng Chen Lecture 9: Sept 28 Note: These notes are based on scribed notes from Spring15 offering of this course. LaTeX template courtesy
More informationTTIC 31230, Fundamentals of Deep Learning David McAllester, April Information Theory and Distribution Modeling
TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Information Theory and Distribution Modeling Why do we model distributions and conditional distributions using the following objective
More informationLecture 1: Introduction, Entropy and ML estimation
0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual
More informationCS154, Lecture 12: Kolmogorov Complexity: A Universal Theory of Data Compression
CS154, Lecture 12: Kolmogorov Complexity: A Universal Theory of Data Compression Rosencrantz & Guildenstern Are Dead (Tom Stoppard) Rigged Lottery? And the winning numbers are: 1, 2, 3, 4, 5, 6 But is
More informationLecture 4 : Adaptive source coding algorithms
Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv
More informationThe Minimum Message Length Principle for Inductive Inference
The Principle for Inductive Inference Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health University of Melbourne University of Helsinki, August 25,
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More information1 Introduction to information theory
1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through
More information6.02 Fall 2012 Lecture #1
6.02 Fall 2012 Lecture #1 Digital vs. analog communication The birth of modern digital communication Information and entropy Codes, Huffman coding 6.02 Fall 2012 Lecture 1, Slide #1 6.02 Fall 2012 Lecture
More information(Classical) Information Theory III: Noisy channel coding
(Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way
More informationLecture 1: Shannon s Theorem
Lecture 1: Shannon s Theorem Lecturer: Travis Gagie January 13th, 2015 Welcome to Data Compression! I m Travis and I ll be your instructor this week. If you haven t registered yet, don t worry, we ll work
More information1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.
Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without
More informationAnnouncements. CompSci 102 Discrete Math for Computer Science. Chap. 3.1 Algorithms. Specifying Algorithms
CompSci 102 Discrete Math for Computer Science Announcements Read for next time Chap. 3.1-3.3 Homework 3 due Tuesday We ll finish Chapter 2 first today February 7, 2012 Prof. Rodger Chap. 3.1 Algorithms
More informationKolmogorov complexity
Kolmogorov complexity In this section we study how we can define the amount of information in a bitstring. Consider the following strings: 00000000000000000000000000000000000 0000000000000000000000000000000000000000
More informationlossless, optimal compressor
6. Variable-length Lossless Compression The principal engineering goal of compression is to represent a given sequence a, a 2,..., a n produced by a source as a sequence of bits of minimal possible length.
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2014-2015 Jakob Verbeek, ovember 21, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15
More information