
Kolmogorov Complexity

Davide Basilio Bartolini
University of Illinois at Chicago, Politecnico di Milano
dbarto3@uic.edu, davide.bartolini@mail.polimi.it

Abstract

What follows is a survey of the field of Kolmogorov complexity. Kolmogorov complexity, named after the mathematician Andrej Nikolaevič Kolmogorov, is a measure of the algorithmic complexity of an object (represented as a string of symbols) in terms of how hard it is to describe. This measure dispenses with the need to know the probability distribution that governs the object (in general, the distribution may not be defined at all) and is thus more general than Shannon's entropy, while describing essentially the same fact from a slightly different point of view. This paper gives an overview of the field, from its origins to some of the latest developments and applications.

I. INTRODUCTION

The idea of measuring the complexity inherent in an object in an algorithmic way was independently developed in the mid-1960s by Ray Solomonoff [12,13], Gregory John Chaitin [4] and Andrej Nikolaevič Kolmogorov [7]. This new measure (known today as Kolmogorov complexity, or algorithmic complexity) rests on a novel algorithmic way of giving a quantitative definition of information, in contrast with the combinatorial and probabilistic approaches that were already known at the time [7]. Roughly speaking, the new approach defines the complexity of an object according to how hard it is to describe. Such an approach needs a universal measure of the inherent difficulty of giving a description; the theory of computation provides one, namely the length of the shortest program that, run on a universal Turing machine, gives the studied object (represented as a string of symbols) as its output.
The solid basis provided by the theory of computation makes it possible to apply algorithmic complexity to virtually any computable object, without the need for it to be governed by a known probability distribution.

A. Structure of the survey

First, some useful concepts from the theory of computation are presented (Section I-B); then a brief introduction to the main topic is given (Section II), together with the basic definitions needed to understand the theory (Section II-A); some of the most interesting results are reported (Section II-B) and parallels are drawn with classical information theory (Section III). Then, some works extending Kolmogorov complexity theory to different areas, namely game theory (Section IV-A), quantum information theory (Section IV-B) and computational complexity theory (Section IV-C), are described, to give an idea of some developments of the original theory and of its links to other research areas. Last (Section V), a few applications of Kolmogorov complexity to real-life problems are shown, to highlight how this theory has proved useful in understanding real phenomena and in solving practical problems.

B. Notions from computability theory

To understand the definition of algorithmic complexity, some notions from computability theory are needed; they are briefly recalled in this section. In particular, the existence of a universal model for evaluating the difficulty of describing a string is crucial to the definition of algorithmic complexity. This model is provided by the Turing machine, described by Alan Turing in 1937, and its universality comes from Church's thesis, which states that all (sufficiently complex) computational models are equivalent (i.e. they can compute the same family of functions) and that each of them can be simulated by a universal Turing machine [5]. Now that the role of the Turing machine in this context is clear, let us define this computational model.

1) Turing Machine: Many equivalent definitions (more or less formal) are possible for a Turing machine; a formal definition is not necessary here and an intuitive idea should suffice. Intuitively, a 3-tape-symbol bounded transfer Turing machine [4] is a logical device formed by a control module (which is a finite-state automaton) and three tapes that can contain symbols or blanks; the tapes can be read or written, one symbol at a time, through dedicated heads. The machine executes the instructions written on its input tape (which is read-only) by changing its internal state, reading and writing symbols on its work tape, and providing output by writing on its output tape. A graphical representation of such a device is shown in Figure 1, where the arrows indicate the symbols over which the heads are placed. From a mathematical point of view, a Turing machine can be viewed as a map from a set of finite-length strings (the programs to be written on the input tape) to the set of finite- or infinite-length strings (retrieved from the output tape). In this representation, the work tape is not considered, as it is used only internally by the machine. In this context, when considering a binary Turing machine ("binary computing machine" [4]), the set of functions $f : \{0,1\}^* \to \{0,1\}^* \cup \{0,1\}^\infty$ computable by a Turing machine is called the set of partial recursive functions.

Fig. 1. Graphical representation of a Turing machine (adapted from [5]). The arrows indicate the position of the read/write heads.

2) Church's thesis: Church's thesis (also known as the Church-Turing thesis) is a statement about the nature of computable (meaning computable by a machine) functions. As with the definition of the Turing machine, many equivalent formulations of the thesis exist; the best-known version is due to Kleene and reads: "Every computable numerical partial function is partial recursive."
One of the main consequences of the thesis is that there exists a universal Turing machine able to simulate (with proper emulation instructions provided at the beginning of its input tape) any other Turing machine. The next section shows that this result is crucial, since the difficulty of describing an object is defined in terms of the length of the shortest program (i.e. sequence of input symbols) needed for the universal Turing machine to halt having written the wanted string on its output tape.

II. MEASURING COMPLEXITY

As already stated, the concept of algorithmic complexity emerged from three different authors around 1965. Each of these authors (Chaitin, Solomonoff and Kolmogorov) arrives at the definition of complexity from a different background and using a different notation. In particular, Chaitin [4] tries to apply information-theoretic and probabilistic ideas to recursive function theory; Solomonoff [12,13] arrives at complexity while trying to obtain the prior probability of strings from an inductive-inference point of view; Kolmogorov [7] directly proposes a novel algorithmic way of quantitatively defining information. It is very interesting to see how the three of them converged on the same results while starting from very different areas of research. The following treatment tries to unify the notation of the three sources, in order to provide a coherent and understandable overview of the main ideas of the theory.

A. Definitions

1) Kolmogorov complexity: Given a finite-length binary string x of length l(x), denoting a universal Turing machine as M and writing M(P) to indicate the output of M when executing the program P (which, in turn, is a finite binary sequence of length l(P)), we can define Kolmogorov (or algorithmic) complexity as the minimum length of a program that, evaluated by M, makes it return x as the output and then halt. (This same definition can be found, with different notations, in all of [4], [12,13] and [7].)

Definition 1 (Kolmogorov complexity):

$K_M(x) \triangleq \min_{P : M(P) = x} l(P)$    (1)

An interesting interpretation of $K_M(x)$ is given by Solomonoff [12], where the a priori probability of a binary string x is defined as $2^{-K_M(x)}$. This shows that algorithmic complexity can be used as a measure of randomness: the higher the complexity, the more random the string. To convey the idea, Solomonoff [13] states that this method of inductive inference (i.e. of finding the prior probability of a string) can be seen, in a sense, as an inversion of Huffman coding, where the minimal code of the string is given and, from this code, it is possible to derive the probability of the string. Note that these operations can be performed without the need for the string to be governed by a known probability distribution of any kind.

2) Conditional Kolmogorov complexity: For any program (i.e. algorithmic description) to correctly yield the wanted string x when processed by a universal Turing machine M, it needs to include information about the length l(x) of the string, if this information is not already present in the machine. When this is the case, the definition of complexity provided in Definition 1 can be used. On the other hand, it is possible to consider the case in which the length of the string is hardcoded in, or separately given to, the Turing machine used for the computation; in this case, the needed program will be shorter than the one including information about l(x), and the conditional Kolmogorov complexity [5,7] is defined as:

Definition 2 (Conditional Kolmogorov complexity):

$K_M(x \mid l(x)) \triangleq \min_{P : M(P, l(x)) = x} l(P)$    (2)

3) Examples of complexity: The definition of algorithmic complexity provides an elegant way of quantifying how difficult it is to describe an object, and it should be clear that this is related to the randomness of the object and to how much information it can convey.
To get an intuitive idea of how this works, a couple of examples of more or less complex objects follow. Consider the following objects chosen from the domain of alphanumeric strings:

1) KolmogorovKolmogorovKolmogorovKolmogorovKolmogorovKolmogorov
2) acj73fd3hw24f3dj4s6e2dsg457hew46fsda34701sths45fa554nary5782

Intuitively, the complexity of the second string is much greater than the complexity of the first one. Keeping in mind that any computable function, with proper transformations, may be calculated with a Turing machine, the following considerations stay at a high level of abstraction, to give an intuitive feeling for the matter. The first string is simply the word Kolmogorov repeated six times. So, its algorithmic complexity can be no greater than the length of the substring Kolmogorov plus a little more information about the length of the whole string. Its conditional complexity is even lower, since the information about the length of the string does not need to be included in the description. Since describing an extension of this string (i.e. the word Kolmogorov repeated n times, with n > 6) adds little to the length of the description, it is easy to see that the complexity of such a string, for $n \to \infty$, is less than the length of the string. The second string looks quite random and, probably, there is no better description, in terms of an algorithm to derive it, than reporting the exact characters of the string, plus the overhead of having to indicate its length. In this case, the complexity of the string is greater than its length and the conditional complexity is about the same as the length of the string. Other interesting examples may be found when considering objects that can be efficiently described using mathematics.
For instance, the binary encoding of π up to the n-th decimal digit can be algorithmically described by a program that has almost constant size for any n. Hence, this string of bits, which could look pretty random at first glance, has in fact a fairly low algorithmic complexity. The following definitions and theorems give a formal shape to the intuitions provided by the above examples.

4) Algorithmic randomness and incompressibility: The notions of algorithmic complexity defined above make it possible to state the conditions under which a string $x = (x_1 x_2 \ldots x_n)$ of length n can be considered random. A definition of this concept of algorithmic randomness [3,5] follows:

Definition 3 (Algorithmic randomness):

x is algorithmically random iff $K(x \mid n) \geq n$    (3)

Definition 3 states that a string is to be considered random if its conditional Kolmogorov complexity (i.e. the length of its minimal description, given the length of the string) is at least the length of the string itself. This makes sense when considering the examples of Section II-A3, where the string that looks less random has an algorithmic complexity that is less than its length, while the opposite holds for the random string. The concept of conditional Kolmogorov complexity makes it possible to define another property of a string $x = (x_1 x_2 \ldots x_n)$, related to its algorithmic randomness. This property, defined below, is incompressibility [5]:

Definition 4 (Incompressibility):

x is incompressible iff $\lim_{n \to \infty} \frac{K(x \mid n)}{n} = 1$    (4)

Roughly speaking, a string is incompressible if the length of its minimal description, given the length of the string, tends to equal the length of the string as n goes to infinity. This definition gives an interesting interpretation of Kolmogorov complexity in terms of how much a string can be compressed. In fact, a string with low Kolmogorov complexity can be computed by a short program, which can be seen as a compressed version of the string. The original string can be reconstructed by running the program on an appropriate Turing machine, which plays the role of a decompressor. In the next section, some of the most interesting and meaningful results based on the definitions given above are presented.

B. Main results

The original papers that first introduced the concept of algorithmic complexity, and subsequent work by the same authors and others, provide a wide range of results based on the definitions given in Section II-A. Probably the most fundamental one is that Kolmogorov complexity is not computable, but it can be approximated from above by a computable process (hence, an upper bound exists, as shown below).
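The statement that complexity can be approximated from above has a practical reading: any lossless compressor gives a computable upper bound, up to the constant-size description of its decompressor. The following minimal Python sketch is not taken from the cited works; it assumes zlib as a stand-in compressor, and the name `complexity_upper_bound` is hypothetical. It is applied to the two example strings of Section II-A3.

```python
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Upper bound (in bits) on the description length of `data`.

    Assumption: the fixed "machine" is a decoder that reads one flag bit
    and then either copies the payload verbatim or inflates it with zlib.
    The true Kolmogorov complexity can only be smaller, up to the
    constant cost of describing this decoder.
    """
    raw_bits = 8 * len(data)
    compressed_bits = 8 * len(zlib.compress(data, 9))
    return 1 + min(raw_bits, compressed_bits)  # +1 for the flag bit


# The two example strings from Section II-A3.
repetitive = b"Kolmogorov" * 6
irregular = b"acj73fd3hw24f3dj4s6e2dsg457hew46fsda34701sths45fa554nary5782"

if __name__ == "__main__":
    for name, s in (("repetitive", repetitive), ("irregular", irregular)):
        print(name, len(s) * 8, "raw bits ->", complexity_upper_bound(s), "bits")
```

The bound can only be loose in one direction: a better compressor may lower it, but no computable procedure can certify that the minimum has been reached.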
The following treatment presents the upper bound together with some other interesting results.

1) Universality: The first important result should look almost obvious after the brief discussion of computability theory provided in Section I-B and is presented in the following theorem [5]:

Theorem 1 (Computer independence of Kolmogorov complexity): If U is a universal Turing machine, then for any other Turing machine (and, more generally, for any computing machine) A there exists a constant $c_A$ such that:

$K_U(x) \leq K_A(x) + c_A$

The constant $c_A$ in the theorem is due to the overhead needed in the program for the universal Turing machine U to provide the instructions on how to emulate the other computer A, and it can be safely neglected when the length of x is large enough. Hence, because there exists a universal Turing machine able to simulate any other computing machine, the algorithmic complexity of an object is independent of any specific computer except for a constant term that becomes negligible as $l(x) = n \to \infty$. This result is crucial in making Kolmogorov complexity a useful measure without the need to refer to a particular computing device. For the following two results, the field is restricted to binary strings and binary computers; note that this restriction does not really entail a loss of generality, since any string may be represented in a binary alphabet just by translating it with a proper encoding. Also, the notations K(x) and $K(x \mid l(x))$ are used without specifying the Turing machine, thanks to the result shown in Theorem 1.

2) Upper bound on Kolmogorov complexity: An upper bound on the algorithmic complexity can easily be stated first for conditional complexity:

Theorem 2 (Upper bound for conditional complexity):

$K(x \mid l(x)) \leq l(x) + c$

where c is a non-negative constant.
The proof is quite simple, as an effective program for obtaining a string x, when l(x) is known, is simply formed by the string itself with at most a constant overhead dependent on the computer the program is written for.
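The argument can be checked exhaustively on a small scale. The sketch below assumes a hypothetical toy decoder, `run`, which is deliberately simple and not universal: the first bit of a program selects either "copy the payload" or "repeat the payload cyclically up to the known length n". Both the decoder and the function `toy_conditional_complexity` are illustrative names introduced here, not constructions from the cited works. The literal program plays the role of "the string itself plus a constant overhead", with c = 1 bit on this machine.

```python
from itertools import product
from typing import Optional


def run(program: str, n: int) -> Optional[str]:
    """Toy decoder (hypothetical, NOT universal) that is given the length n.

    '0' + payload : output the payload literally (valid only if it has n bits)
    '1' + pattern : output the pattern repeated cyclically, cut to n bits
    Returns None for programs that are not valid on this machine.
    """
    if not program:
        return None
    opcode, payload = program[0], program[1:]
    if opcode == "0":
        return payload if len(payload) == n else None
    if not payload:  # the repeat instruction needs a non-empty pattern
        return None
    repetitions = n // len(payload) + 1
    return (payload * repetitions)[:n]


def toy_conditional_complexity(x: str) -> int:
    """Length of the shortest program p with run(p, len(x)) == x."""
    n = len(x)
    # The literal program '0' + x always works, so the search stops at n + 1.
    for length in range(1, n + 2):
        for bits in product("01", repeat=length):
            if run("".join(bits), n) == x:
                return length
    raise AssertionError("unreachable: the literal program always succeeds")
```

On this machine a periodic string such as `"01010101"` has a 3-bit description (`"101"`), while a string with no short period stays close to the l(x) + 1 ceiling guaranteed by the literal program.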

The above result can be extended to Kolmogorov complexity, when the length of the string is not known (i.e. a program to compute the string must be self-delimiting [3]), by adding a term that accounts for the inclusion of the information about l(x):

Theorem 3 (Upper bound for Kolmogorov complexity):

$K(x) \leq K(x \mid l(x)) + 2 \log l(x) + c$

The upper bound comes from the fact that a trivial method of making the program self-delimiting requires at most $2 \log l(x) + c$ bits [5], but it can be refined by finding an optimized way to represent l(x) (i.e. by using iterated logarithms [4,5]).

3) Lower bound on Kolmogorov complexity: A probably more interesting result is a lower bound on algorithmic complexity or, more properly, a bound on the number of strings within a certain complexity. This is interesting because it gives a flavor of how likely it is for a string to be either complex or simple to describe.

Theorem 4 (Lower bound on Kolmogorov complexity [3,5]):

$|\{x \in \{0,1\}^* : K(x) < k\}| < 2^k$

The theorem shows that there are not many strings with low complexity and is simply proven by the fact that the number of binary programs of length less than k is $2^k - 1 < 2^k$.

4) Information-theoretic version of Gödel's theorem: A more advanced result shown by Chaitin [3] is briefly reported here (without any claim of being exhaustive), as an aside to show how algorithmic information theory can provide an alternative version of Gödel's famous theorem. Chaitin shows that, in an axiomatic theory, a lower bound n on the algorithmic complexity of a certain string of symbols defined in the theory can be established only if n is less than the algorithmic complexity of the axioms of the formal theory (i.e. the axioms used to prove the bound). This shows an inherent limitation of axiomatic theories that is comparable to Gödel's incompleteness theorem, by using an information-theoretic argument based on Kolmogorov complexity.

III. COMPLEXITY AND ENTROPY

Algorithmic complexity and entropy are defined from very different backgrounds and, apparently, measure different aspects of an object. Despite these apparent differences, it is possible to show that complexity and entropy are both measures of the randomness of a string and, under some hypotheses, a relationship between the two can be proven, as shown in Theorem 5.

Theorem 5 (Relation between Kolmogorov complexity and entropy [5,8]): Let $\{X_i\}$ be a stochastic process drawn i.i.d. according to $f(x)$, $x \in \mathcal{X}$, $|\mathcal{X}| < \infty$, and let $f(x^n) = \prod_{i=1}^{n} f(x_i)$. Under these conditions, it can be proven that

$E\left[\frac{1}{n} K(X^n \mid n)\right] \to H(X)$ as $n \to \infty$.

The above theorem shows that algorithmic complexity and entropy turn out to be very similar measures. Of course, this result holds only under the specified hypotheses, which ensure that both complexity and entropy are well defined. This result makes Kolmogorov complexity an even more meaningful tool, since it is comparable to entropy when both are defined, but can be used in a more general context where the probability distribution (or a good estimate of it) is not available for the studied object.

IV. FURTHER DEVELOPMENTS

Section II showed the main definitions and the most interesting results of the original theory of algorithmic complexity. In the roughly ten years following the first papers (published in the mid-1960s), the theory grew into a complete theoretical system and, in 1977, Chaitin published a review of what he called algorithmic information theory [3]. Beyond these direct developments of the theory, Kolmogorov complexity has been extended to very different areas. The following sections show some of these branches, giving an idea of how wide the scope of the concept of algorithmic complexity is.

A. Game interpretation

One of the most fascinating areas where Kolmogorov complexity has been applied is game theory. The link between the two theories is provided by Muchnik's theorem, which relates the theory of recursive sequences (on which algorithmic complexity theory is based) and game theory. In particular, this theorem associates every statement $\varphi$ of recursion theory with a two-player game $G_\varphi$ with infinitely many moves represented by 0s and 1s. Considering this game, it is shown that if one of the players (called the Mathematician) has a computable winning strategy, then the statement $\varphi$ is true, while if the other player (called Nature) has a computable winning strategy, then $\varphi$ is false. Since Kolmogorov complexity is defined upon recursion theory, it is possible to exploit Muchnik's theorem to analyse the truth value of statements about the theory by building a proper game, as defined in the theorem. This provides an interesting and unconventional point of view on Kolmogorov complexity and is extensively treated (with several examples) by Vereshchagin [15]. Other results in the field of algorithmic complexity are obtained with a game-theoretic approach by Muchnik and others in a very recent paper [10]. Here, again, it is shown that it is possible to prove statements about Kolmogorov complexity by constructing a special game and a winning strategy in the game.

B. Quantum complexity

The increasing interest in quantum computing has encouraged some studies aimed at extending Kolmogorov complexity to the domain of quantum computation. This extension requires first redefining the relevant concepts of computation theory in the new setting. Quite a lot of work has been done in this direction, and Vitanyi gives an overview of the results in this field in a paper [16] whose title recalls the original 1965 paper by Kolmogorov.
In particular, it is shown that Kolmogorov complexity can be quite naturally extended based on quantum Turing machines and that it can be used to describe the amount of information contained in a pure quantum state (i.e. a set of variables that fully describe a quantum system in probabilistic terms). One analogy between quantum algorithmic complexity and the classical version is that the former, like the latter, is upper bounded and can be approximated from above by a computable process with arbitrarily small probability of error.

C. Complexity and problem spaces

One of the fields where an extension of the ideas of Kolmogorov complexity seems most natural is complexity theory. There is an interesting work by Allender, Buhrman and Koucký [1] that investigates whether it is possible to characterize the complexity class PSPACE of problems solvable in polynomial space with a Turing machine by efficiently reducing it to the set $R_K$ of algorithmically random strings (as defined in Definition 3). The paper is quite technical and leaves many open problems; the interesting fact here is that the ideas behind Kolmogorov complexity can be employed in trying a new approach to the study of problems in a very wide variety of research areas.

V. APPLICATIONS

Kolmogorov complexity has so far been extended and exploited in a number of different research areas, as shown in Section IV. Beyond this, the concepts of this theory have been used in a variety of practical applications, where they helped to provide a unique approach to the analysis of real-world problems. The following sections illustrate a few recent applications of Kolmogorov complexity.

A. Information assurance

Kolmogorov complexity can be applied to retrieve important information about the state of a system; in particular, it has been used by Evans, Bush and Hershey [6] to design an approach to monitoring an information system against security flaws in both data and processes.
In this work, the apparent complexity K(S) is defined as the best estimate of Kolmogorov complexity available to a party which is analysing the system, and two metrics are proposed to evaluate the vulnerability of a process with input X and output Y:

- K(X.Y) is defined as the complexity of the concatenated input and output of a process;
- K(X | Y) represents the relative complexity of a process.

Based on these measures, the vulnerability of both processes and data is analysed, showing that the higher both quantities are, the less vulnerable the system is. This result states, intuitively, that the more complex the operations carried out by a certain process (from the point of view of a potential attacker), the harder it is for an attacker to understand what the process is doing in order to harm the system. Thanks to these results, a method to monitor the security level of a system, based on the ability to estimate the apparent complexity of the system to an attacker, is proposed, and the conclusion is that Kolmogorov complexity can be a good candidate for further developments in this area.

B. Spam filtering

Another clever application of Kolmogorov complexity has been published by Spracklin and Saxton [14], who apply it in an email spam filter. The underlying idea is that, when highlighting in a text the words linked with spam, their distribution will be quite random in normal mail, while it will follow some ordered pattern in spam. This idea is applied by encoding the text of an email into a binary string where the words linked with spam are represented with a 1, while the other words are represented with a 0. Then, Kolmogorov complexity can be used to evaluate how random the resulting binary string is. The issue here is that Kolmogorov complexity is not computable; so, the idea is to apply a compression algorithm (run-length compression in this specific case) to get an estimate of Kolmogorov complexity. Based on this estimate, messages with high complexity are classified as non-spam, while messages yielding a low complexity are identified as spam. The authors show that this approach ensures high accuracy (80% to 96%) and is much faster than other approaches used for the same purpose (for instance, Bayesian filters).

C. Mental fatigue

Because Kolmogorov complexity can be applied to any object encodable as a binary string, it can be exploited on virtually any data. For instance, Kolmogorov complexity is used in a medical context by Lian-yi and Chong-xun [9] to evaluate the level of mental fatigue of a person based on the signals coming from the EEG (electroencephalogram).
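The run-length estimate used by the spam filter of Section V-B can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [14]: the tokenization, the "number of runs divided by length" score, the threshold value and all function names are hypothetical choices made here for the example.

```python
from itertools import groupby
from typing import Iterable, List


def indicator_bits(text: str, spam_words: Iterable[str]) -> str:
    """Encode a message as '1' for spam-linked words and '0' for the others."""
    vocabulary = {w.lower() for w in spam_words}
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return "".join("1" if t in vocabulary else "0" for t in tokens)


def run_lengths(bits: str) -> List[int]:
    """Run-length code of a bit string, e.g. '0001' gives [3, 1]."""
    return [sum(1 for _ in group) for _, group in groupby(bits)]


def rle_complexity_estimate(bits: str) -> float:
    """Assumed score: runs per symbol. Few long runs give a low value."""
    if not bits:
        return 0.0
    return len(run_lengths(bits)) / len(bits)


def looks_like_spam(bits: str, threshold: float = 0.3) -> bool:
    """Low estimated complexity is read as 'ordered', hence spam.

    Assumptions: the threshold is arbitrary, and a message with no
    spam-linked word at all is never flagged.
    """
    return "1" in bits and rle_complexity_estimate(bits) < threshold
```

Run-length coding is a crude estimator: a strictly alternating string such as `"0101010101"` scores as maximally complex although it has a very short description, which is why the choice of compressor matters in practice.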
As in Section V-B [14], an estimate of Kolmogorov complexity must be computed to be able to analyse the data; in this work, this is done by using a modified version of the Lempel-Ziv algorithm applied to the discrete sequences representing the EEG signals. The results are encouraging, as it is shown that the estimate of the Kolmogorov complexity of the EEG decreases as mental fatigue increases. This indicates that the EEG signal is somehow less random when a person is in a state of mental fatigue.

D. Complexity in machine learning

Machine learning is an applied research field where randomness is often exploited in algorithms (for example, in the training algorithm of a neural network). One of the notions used in this field is known as Occam's razor (referring to the 14th-century English logician, theologian and Franciscan friar William of Ockham) and it can be, a little improperly, summarized as: the simplest explanation is the best. In machine learning terms, this means that the simpler the rules found to explain the training data, the better the generalization on the testing data. One problem in classical machine learning applications is the lack of a general and well-founded way of determining which rules are the simplest ones for a given data set. Schmidhuber [11] discusses this issue and proposes a method based on Levin's complexity (a time-bounded extension of Kolmogorov complexity) to impose an ordering on the complexity of sets of rules and to choose the best ones efficiently. In this work, some experiments on simple problems (chosen to be computationally feasible) are described, which show that the proposed method is able to yield solutions with generalization performance that other training algorithms cannot match.

VI. CONCLUSION

Kolmogorov complexity is a fascinating concept which captures how the complexity of an object can be described by scientific and rigorous means.
This paper should have given an idea of how powerful and meaningful the ideas first introduced by Solomonoff, Chaitin and Kolmogorov are, and of how useful these ideas have been in exploring new possibilities in a great number of research areas and applications. Beyond the achieved results, many open problems remain, and today the idea of Kolmogorov complexity can help in understanding complex problems that are still unsolved. For instance, it is believed that strong, not yet fully understood links exist between the concept of Kolmogorov complexity and different areas of physics, from thermodynamics to black holes. In conclusion, the field of Kolmogorov complexity is a mature but still very active research area, able to provide many useful results and answers, while leaving some appealing and challenging problems still to be solved.

REFERENCES

[1] E. Allender, H. Buhrman, and M. Koucký, "What can be efficiently reduced to the Kolmogorov-random strings?" Annals of Pure and Applied Logic, vol. 138, no. 1-3, pp. 2-19, 2006.
[2] G. J. Chaitin, "A theory of program size formally identical to information theory," 1975.
[3] G. J. Chaitin, "Algorithmic information theory," IBM Journal of Research and Development, vol. 21, pp. 350-359, 1977.
[4] G. J. Chaitin, "On the length of programs for computing finite binary sequences: Statistical considerations," Journal of the ACM, vol. 13, pp. 547-569, 1969.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006, ch. 14.
[6] S. Evans, S. Bush, and J. Hershey, "Information assurance through Kolmogorov complexity," DARPA Information Survivability Conference and Exposition II, 2001. DISCEX '01. Proceedings, vol. 2, pp. 322-331, 2001.
[7] A. N. Kolmogorov, "Three approaches to the quantitative definition of information," Problems of Information Transmission, vol. 1, no. 1, pp. 1-7, 1965.
[8] S. Leung-Yan-Cheong and T. Cover, "Some equivalences between Shannon entropy and Kolmogorov complexity," IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 331-338, May 1978.
[9] Z. Lian-yi and Z. Chong-xun, "Analysis of Kolmogorov complexity in spontaneous EEG signal and it's application to assessment of mental fatigue," The 2nd International Conference on Bioinformatics and Biomedical Engineering, 2008. ICBBE 2008, pp. 2192-2194, May 2008.
[10] A. A. Muchnik, I. Mezhirov, A. Shen, and N. Vereshchagin, "Game interpretation of Kolmogorov complexity," ArXiv e-prints, Mar. 2010.
[11] J. Schmidhuber, "Discovering solutions with low Kolmogorov complexity and high generalization capability," in Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann Publishers, 1995, pp. 488-496.
[12] R. Solomonoff, "A preliminary report on a general theory of inductive inference," 1960.
[13] R. Solomonoff, "A formal theory of inductive inference, part I," Information and Control, vol. 7, pp. 1-22, 1964.
[14] L. Spracklin and L. Saxton, "Filtering spam using Kolmogorov complexity estimates," 21st International Conference on Advanced Information Networking and Applications Workshops, 2007, AINAW '07, vol. 1, pp. 321-328, May 2007.
[15] N. Vereshchagin, "Kolmogorov complexity and games," 2008.
[16] P. Vitanyi, "Three approaches to the quantitative definition of information in an individual pure quantum state," Proceedings of the 15th Annual IEEE Conference on Computational Complexity, 2000, pp. 263-270, 2000.