Kolmogorov complexity: a primer. Alexander Shen, LIF CNRS & Univ. Aix Marseille. June 26, 2009

Information density
The same English text uses four times more space in UTF-32 compared to ASCII
density = weight / volume
information density = amount of information / length

Amount of information
Kolmogorov, 1965: "Three approaches to the definition of the amount of information"
Shannon approach: entropy of a random variable
a 0.1 : 0.9 random variable: about 0.47 bits/value
a 0.5 : 0.5 random variable: 1 bit/value
Not usable for individual finite objects (e.g., DNA, files, etc.)
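
A quick check of the figures above using the binary entropy formula; a minimal sketch in Python (the function name binary_entropy is ours):

```python
from math import log2

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a biased coin with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(round(binary_entropy(0.1), 2))  # ~0.47 bits per value
print(round(binary_entropy(0.5), 2))  # 1.0 bit per value
```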

Compressed size
Original file F contains f bits
Compressed file G = C(F) contains g bits (we hope that g < f)
Decompressing produces the original file: F = D(G) = D(C(F))
Example: gzip, gunzip
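
As an illustration of the compress/decompress round trip with an off-the-shelf pair C, D; a sketch using Python's standard zlib module (the sample text is made up):

```python
import zlib

original = b"the quick brown fox jumps over the lazy dog " * 100  # F, f bits
compressed = zlib.compress(original)                              # G = C(F), g bits
restored = zlib.decompress(compressed)                            # D(G)

assert restored == original            # F = D(C(F))
print(len(original), len(compressed))  # here g < f, since the text is repetitive
```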

Choosing a (de)compressor
How good can compression be?
Is it possible that every string (of length, say, 100000) is compressed by at least 10%?
A compressor should be injective: X ≠ Y ⇒ C(X) ≠ C(Y)
Trade-off: for any single string there is a compressor/decompressor pair that compresses this string well
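
The answer to the 10% question is no, by a pigeonhole count; a quick sanity check of the two cardinalities (Python integers handle the exact values):

```python
n = 100_000
strings_of_length_n = 2 ** n                       # candidates to be compressed
outputs_up_to_90pct = 2 ** (9 * n // 10 + 1) - 1   # all strings of length <= 0.9n
print(outputs_up_to_90pct < strings_of_length_n)   # True: an injective C cannot
                                                   # shrink every string by 10%
```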

Comparing (de)compressors
C1/D1 is better than C2/D2 if l(C1(F)) ≤ l(C2(F)) for every F
C1/D1 is asymptotically better than C2/D2 if l(C1(F)) ≤ l(C2(F)) + c for every F and some c (not depending on F)
Combining two compressors: for every two pairs C1/D1 and C2/D2 there exists a pair C/D asymptotically better than both
Proof: use C1 or C2 (whichever is better for the given file) and use the first bit of the output to record this choice.
Does an optimal compressor/decompressor pair exist?
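
A sketch of the combination step in the proof, with the two compressor/decompressor pairs passed in as hypothetical Python functions over bytes; the one-byte tag plays the role of the recorded bit:

```python
def combine(c1, d1, c2, d2):
    """Build C/D that is never more than one symbol worse than the better of C1/C2."""
    def c(data: bytes) -> bytes:
        out1, out2 = c1(data), c2(data)
        # record which compressor was used in the first symbol of the output
        return b"1" + out1 if len(out1) <= len(out2) else b"2" + out2

    def d(data: bytes) -> bytes:
        tag, rest = data[:1], data[1:]
        return d1(rest) if tag == b"1" else d2(rest)

    return c, d
```

For instance, combine(zlib.compress, zlib.decompress, lambda x: x, lambda x: x) yields a compressor that never loses more than one byte against the better of "zlib" and "store as is".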

No optimal compressor
For every C/D there is a C′/D′ that is much better than C/D on some strings.
Why?
For any length n there exists a C-incompressible string of length n (pigeonhole principle).
The new C′ compresses the first C-incompressible string of length n into 0·binary(n), i.e., into about log n bits, and any other string x of length n into 1x.

Kolmogorov complexity
No compressors, only decompressors
Decompressor = any partial computable function (= algorithm) that maps strings to strings
K_D(x) = min{ l(p) : D(p) = x }
D1 is (asymptotically) better than D2 if K_D1(x) ≤ K_D2(x) + c for all x and some c.
There exists an optimal decompressor (= asymptotically better than any other one)

Optimal decompressor exists
Idea: self-extracting archive
Decompressor: read the input from left to right until a self-delimiting program is found; run this program on the rest of the input
Universal interpreter U
K_U(x) ≤ K_D(x) + (the length of a program for D)
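
A toy sketch of such a universal decompressor: here a "program" is Python source defining a function decompress, and the self-delimiting header is simply its length followed by a newline (all names and conventions are ours, for illustration only):

```python
def universal_decompress(archive: bytes) -> bytes:
    """Toy universal decompressor U: a self-delimiting program, then its input."""
    header, rest = archive.split(b"\n", 1)     # self-delimiting header: "<length>\n"
    prog_len = int(header)
    program, payload = rest[:prog_len], rest[prog_len:]

    namespace = {}
    exec(program.decode(), namespace)          # the program must define decompress()
    return namespace["decompress"](payload)

# A trivial "decompressor" D packed in front of its own input:
prog = b"def decompress(p):\n    return p * 2\n"
archive = str(len(prog)).encode() + b"\n" + prog + b"abc"
print(universal_decompress(archive))  # b'abcabc'
```

Whatever decompressor D we describe as such a program, U pays only the length of that program on top of D's own compressed inputs, which is the inequality on the slide.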

Is Kolmogorov complexity well defined?
Fix some optimal decompressor D
Call K_D(x) the Kolmogorov complexity K(x) of x
K(x) is not really defined for an individual x, but
the function K is defined up to an O(1) additive term
natural choices of D lead to functions K_D that do not differ very much

Berry paradox and non-computability
D′: n ↦ the first string of complexity > n
The D′-complexity of this string is about log n: a contradiction?
Theorem: the complexity function is not computable
Theorem: there is no algorithmic way to generate strings of high complexity.

Most strings of a given length are incompressible
A random string is incompressible with high probability: there are fewer than 2^(n-c) programs shorter than n - c bits, so at most a 2^(-c) fraction of the strings of length n can be compressed by c bits or more
A lot of information, but not very useful

Information in X about Y
Conditional complexity of Y when X is known: K_D(Y|X) = min{ l(p) : D(p, X) = Y }
An optimal conditional decompressor exists; it defines the conditional Kolmogorov complexity K(Y|X)
K(X|X) ≈ 0
K(XX|X) ≈ 0
K(Y|X) ≤ K(Y) + O(1)
I(X:Y) = K(Y) − K(Y|X)
I(X:Y) ≈ I(Y:X) (symmetry of information)
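
For example, K(XX|X) ≈ 0 because one constant-size program maps any X to XX; a toy conditional decompressor D(p, X) in the same spirit (an illustration with made-up conventions, not a real definition):

```python
def conditional_decompress(program: str, condition: bytes) -> bytes:
    """Toy conditional decompressor D(p, X): run the program p on the known string X."""
    namespace = {}
    exec(program, namespace)
    return namespace["f"](condition)

# The same short program works for every X, so l(p) -- and hence K(XX|X) -- is O(1).
p = "def f(x):\n    return x + x\n"
print(conditional_decompress(p, b"0110"))  # b'01100110'
```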

Randomness = incompressibility
Why don't we believe in a fair coin that produces 0101010101...01?
Random string = incompressible string
Theorem: a random (incompressible) string has a frequency of ones of about 50%
Other laws of probability theory hold as well (e.g., frequencies of groups of bits)

Probabilistic proofs
To prove the existence of something: show that the corresponding event has positive probability
Example: a bit string of length 1000 that is square-free for substrings of size 100 (no block ww with l(w) ≥ 100)
Complexity version: prove that an incompressible string of length 1000 is square-free.
A large square means compressibility: the repeated half needs to be described only once
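
A sketch that tests the property directly on a pseudorandom string, with the threshold 100 from the example above (the function name is ours):

```python
import random

def has_large_square(s: str, min_half: int = 100) -> bool:
    """Return True if s contains a substring ww with len(w) >= min_half."""
    n = len(s)
    for half in range(min_half, n // 2 + 1):
        for i in range(n - 2 * half + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                return True
    return False

s = "".join(random.choice("01") for _ in range(1000))
print(has_large_square(s))  # almost surely False: such a square would compress s
```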

Best paper of STOC 2009 (Robin Moser)
N bit variables b_1, ..., b_N
some number of clauses of size m: (¬)b_{i1} ∨ (¬)b_{i2} ∨ ... ∨ (¬)b_{im}
Neighbor clauses: clauses that share some variable
Each clause has at most r neighbors; r = o(2^m)
Then there exists a satisfying assignment

Proof inspired by Kolmogorov complexity
Start with an incompressible source of random bits
Use them to initialize the variables
If all clauses are satisfied, then OK
If not, fix the unsatisfied clauses sequentially
Fixing a clause: reinitialize its variables with fresh random bits until the clause is true; then fix the neighbor clauses that were damaged
If this does not terminate for a long time, the random source is compressible (it is determined by the list of corrected clauses)
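
A sketch of the fixing procedure described on this slide, for clauses given as lists of (variable index, required value) literals; this illustrates the idea rather than Moser's exact pseudocode, and termination with high probability is what the compressibility argument establishes:

```python
import random

def moser_fix(clauses, num_vars, rng=random.Random(0)):
    """Resampling procedure: returns an assignment satisfying all clauses
    (terminates quickly in expectation when every clause has few neighbors)."""
    assignment = [rng.randrange(2) for _ in range(num_vars)]

    def satisfied(clause):
        return any(assignment[v] == val for v, val in clause)

    def neighbors(clause):
        vars_in = {v for v, _ in clause}
        return [c for c in clauses if c is not clause and any(v in vars_in for v, _ in c)]

    def fix(clause):
        # reinitialize the clause's variables with fresh random bits until it is true
        while not satisfied(clause):
            for v, _ in clause:
                assignment[v] = rng.randrange(2)
        # fixing may have damaged neighbor clauses: fix them too
        for other in neighbors(clause):
            if not satisfied(other):
                fix(other)

    for clause in clauses:
        if not satisfied(clause):
            fix(clause)
    return assignment

# tiny example: (b0 OR b1) AND (NOT b0 OR b2) AND (NOT b1 OR NOT b2)
clauses = [[(0, 1), (1, 1)], [(0, 0), (2, 1)], [(1, 0), (2, 0)]]
print(moser_fix(clauses, 3))
```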

Other applications
Definition of an infinite random sequence
Connection with recursion theory
Shannon information theory and algorithmic information theory
Combinatorial interpretation. Example: |V|² ≤ |S1| · |S2| · |S3| for a finite set V of triples and its projections S1, S2, S3 onto the coordinate planes