A Characterization of Guesswork on Swiftly Tilting Curves


Ahmad Beirami, Robert Calderbank, Mark Christiansen, Ken Duffy, and Muriel Médard

This work was presented in part at the 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton 2015) [1]. This work was supported in part by the Air Force Office of Scientific Research (AFOSR) under awards FA… and FA…; the work of K. Duffy and M. Médard was also supported in part by Netapp through a Faculty Fellowship. A. Beirami is with the Research Laboratory of Electronics, Massachusetts Institute of Technology, USA (beirami@mit.edu). R. Calderbank is with the Department of Electrical and Computer Engineering, Duke University, USA (robert.calderbank@duke.edu). M. Christiansen and K. Duffy are with the Hamilton Institute, National University of Ireland Maynooth, Ireland (mark.christiansen, ken.duffy@nuim.ie). M. Médard is with the Research Laboratory of Electronics, Massachusetts Institute of Technology, USA (medard@mit.edu).

arXiv preprint [cs.IT], 27 Jan 2018

Abstract

Given a collection of strings, each with an associated probability of occurrence, the guesswork of each of them is their position in an ordered list from most likely to least likely, breaking ties arbitrarily. Guesswork is central to several applications in information theory: average guesswork provides a lower bound on the expected computational cost of a sequential decoder to decode successfully the intended message; the complementary cumulative distribution function of guesswork gives the error probability in list decoding; the logarithm of guesswork is the number of bits needed in optimal lossless one-to-one source coding; and guesswork is the number of trials required of an adversary to breach a password-protected system in a brute-force attack. In this paper, we consider memoryless string-sources that generate strings consisting of i.i.d. characters drawn from a finite alphabet, and characterize their corresponding guesswork. Our main tool is the tilt operation on a memoryless string-source. We show that the tilt operation on a memoryless string-source parametrizes an exponential family of memoryless string-sources, which we refer to as the tilted family of the string-source. We provide an operational meaning to the tilted families by proving that two memoryless string-sources result in the same guesswork on all strings of all lengths if and only if their respective categorical distributions belong to the same tilted family. Establishing some general properties of the tilt operation, we generalize the notions of weakly typical set and asymptotic equipartition property to tilted weakly typical sets of different orders.

We use this new definition to characterize the large deviations for all atypical strings and characterize the volume of weakly typical sets of different orders. We subsequently build on this characterization to prove large deviation bounds on guesswork and provide an accurate approximation of its probability mass function.

Index Terms: Tilting; Ordering; Guesswork; One-to-One Codes; Rényi Entropy; Information Geometry; Weakly Typical Set.

I. INTRODUCTION

A. Motivation

An early motivation for the study of guesswork is to provide lower bounds on the computational complexity of sequential decoding [2], where the decoder sequentially examines several paths until it finds the correct code sequence. Guesswork was first studied by Massey [3], who showed that average guesswork is lower bounded in terms of the Shannon entropy of the source. Arıkan [4] considered guesswork on memoryless sources and proved that the moments of guesswork are related to the Rényi entropy rates of the source in the limit as the strings become long. This has been generalized to ergodic Markov chains [5] and a wide range of stationary sources [6]. It has also been studied subject to an allowable distortion [7], and subject to constrained Shannon entropy [8]. Hanawal and Sundaresan [9] rederived the moments of guesswork assuming that the logarithm of guesswork satisfies a large deviations principle (LDP). Christiansen and Duffy [10] established that guesswork satisfies an LDP and characterized the rate function. They also provided an approximation to the distribution of guesswork using the rate function associated with the LDP.

In list decoding, rather than declaring one codeword message, the decoder returns a list of size L of possible codewords [11]-[13]. The error event in list decoding is the event where the message codeword is not returned in the list of the L returned codewords. For strings of fixed length, as the size of the list grows larger, the probability of error decreases. The list decoder can be used as an inner code for an outer code that eventually decides the intended message from the list of messages returned by the inner list decoder. The optimal list decoder returns the L most likely messages given the received codeword. Therefore, the probability of the error event in the optimal list decoder is characterized by the complementary cumulative distribution function of guesswork conditioned on the received codeword (see [14], [15]).

Guesswork can also be used to quantify computational security against brute-force attack [16]. Suppose that a secret string is drawn from a given process on a finite alphabet, and is used to secure a password-protected system.

Further suppose that the system only grants access if the correct secret string is provided exactly, and does not reveal any information otherwise. If a brute-force attacker (adversary) decides to guess the secret string by query, the best strategy at an adversary's disposal, in terms of minimizing the average number of queries or maximizing the probability of success using a finite number of queries, is to query the possible secret strings from the most likely to the least likely. In this case, guesswork gives the number of queries that it would take an attacker to find the secret string using the best strategy. Christiansen et al. [17] studied guesswork over the weakly typical set and proved that the exponent is strictly smaller than that of a uniform set with the same support size; they showed that the average guesswork of a password over an erasure channel does not relate to the average noise in general [18]; Christiansen et al. also considered the setting where an attacker wants to guess one or more out of many secret strings drawn independently from not necessarily identical string-sources [16]. Finally, the idea of guesswork has been extended to the setup where the probability distribution is unknown [18], [19], [20].

In the context of source coding, it is known that the length of the optimal one-to-one source code (which need not satisfy Kraft's inequality) is within one bit of the logarithm of guesswork. The average length of a one-to-one code is related to the normalized zeroth moment of guesswork, or alternatively the expected logarithm of guesswork (see [20]). The source coding problem without the prefix constraint dates back to Wyner [21], who showed that the average codeword length of one-to-one codes is upper bounded by the entropy. Alon and Orlitsky derived a lower bound on the average codeword length in terms of the Shannon entropy [22], which was recently revisited for other moments of guesswork [23]. Szpankowski [24] derived the asymptotic average codeword length of one-to-one codes on binary memoryless sources, which was subsequently generalized to finite-alphabet independent and identically distributed (i.i.d.) processes [25], [26], and later studied under a universal setup [20], [27].

B. Problem setup, definitions, and notation

Let X := {a_1, ..., a_{|X|}} be an ordered set that denotes a finite alphabet of size |X|. We will subsequently use the ordering on X to break ties in guesswork and reverse guesswork. Denote an n-string over X by x_k^{n+k-1} = x_k x_{k+1} ... x_{n+k-1} ∈ X^n. Further, let x^n = x_1^n, and for i > n let x_i^n = ∅, where ∅ denotes the null string. Let μ^n denote a multinomial probability distribution on the sample space X^n, where μ^1 is the corresponding categorical distribution on X, i.e., μ^n(x^n) = ∏_{i=1}^n μ^1(x_i). We refer to μ := {μ^n : n ∈ N} as a memoryless string-source, also called a source. Let θ = (θ_1, ..., θ_{|X|}) be the parameter vector corresponding to the categorical distribution μ^1. That is, θ_i := μ^1(a_i) = P[X_1 = a_i] for all i ∈ [|X|]. By definition, θ is an element of the (|X|-1)-dimensional simplex of all stochastic vectors of size |X|. Denote by Θ_X the |X|-ary probability simplex, i.e., Θ_X := {θ ∈ R_+^{|X|} : Σ_{i ∈ [|X|]} θ_i = 1}.

We reserve upper case letters for random variables, while lower case letters correspond to their realizations. For example, X^n denotes an n-string randomly drawn from μ, while x^n ∈ X^n is a realization of the random n-string.

Note that as string lengths become large, the logarithm of the guesswork normalized by string length has been shown to satisfy an LDP for a more general set of sources [10]. While existing results on guesswork consider the ordering of strings from most likely to least likely, either unconditionally or conditionally (starting at [4]), or universal guessing orders [19], the more restrictive assumptions used here enable us to treat, for the first time, the reverse guesswork process. In reverse guessing, strings are ordered from least likely to most likely, breaking ties arbitrarily. At first glance, consideration of the worst possible strategy seems to be of limited value, but we will show that is not the case. Forward guesswork provides good estimates of the likelihood of quickly discovered strings, but we shall demonstrate that reverse guesswork focuses on better estimates of the likelihood that large numbers of guesses occur. We also establish an inherent geometrical correspondence between string-sources whose orders are reversed. Amongst other results, we prove that the logarithm of reverse guesswork scaled by string length satisfies an LDP with a concave rate function. The concavity captures the fact that, as one moves along the reverse order, the rate of acquisition of probability increases initially. Due to it, the approach employed in [10], which leveraged earlier results on the scaling behavior of moments of guesswork, has no hope of success, and we instead use distinct means.

Definition 1 (information): Let information ı_μ^n : X^n → R_+ be defined as

    ı_μ^n(x^n) := log(1/μ^n(x^n)).    (1)

Note that throughout this paper we use log to denote the natural logarithm, and hence information is measured in nats as defined above.

Definition 2 (Shannon entropy): Let H^n(μ) denote the entropy, which is defined as the expected information of a random n-string, and quantifies the average uncertainty that is revealed when a random n-string drawn from μ is observed:

    H^n(μ) := E_μ[ı_μ^n(X^n)] = Σ_{x^n ∈ X^n} μ^n(x^n) ı_μ^n(x^n),    (2)

where E_μ denotes the expectation operator with respect to the measure μ. The entropy rate is denoted by H(μ) and is given by H(μ) := lim_{n→∞} (1/n) H^n(μ) = H^1(μ).

Definition 3: For any n ∈ N, we define [n] := {1, ..., n} and [n]_0 := {0, 1, ..., n}. R_+ := {x ∈ R : x > 0}.
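To make Definitions 1 and 2 concrete, here is a minimal Python/NumPy sketch (ours, not part of the paper) that computes the information of a realization and the entropy of an i.i.d. string-source using the factorization μ^n(x^n) = ∏_i μ^1(x_i); the parameter vector is the ternary working example θ = (0.2, 0.3, 0.5) used later in the text.

```python
import numpy as np

theta = np.array([0.2, 0.3, 0.5])          # categorical distribution mu^1 on X = {a1, a2, a3}

def information(x, theta):
    """Information (nats) of an n-string x: i^n_mu(x) = -log mu^n(x),
    with mu^n(x) = prod_i mu^1(x_i) for a memoryless string-source."""
    return -np.sum(np.log(theta[np.asarray(x)]))

def entropy_rate(theta):
    """Shannon entropy rate H(mu) = H^1(mu) in nats; H^n(mu) = n * H^1(mu) for i.i.d. sources."""
    return -np.sum(theta * np.log(theta))

x = [2, 2, 0, 1, 2]                         # a realization x^5, symbols indexed 0..2
print(information(x, theta))                # -log mu^5(x^5)
print(5 * entropy_rate(theta))              # H^5(mu) = 5 * H^1(mu)
```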

In what follows we put forth two assumptions on the string-source that allow us to prove that the logarithm of reverse guesswork satisfies an LDP. We will also later characterize the set of string-sources that satisfy the assumptions.

Assumption 1: We assume that μ^1(x) > 0 for all x ∈ X, i.e., θ_i > 0 for all i ∈ [|X|]. Define

    ε := min_{x ∈ X} μ^1(x),    (3)
    ε̄ := max_{x ∈ X} μ^1(x).    (4)

Assumption 1 excludes the boundaries of the probability simplex and ensures that the likelihood of the least likely n-string scales exponentially with n. In other words, for all n ∈ N and all x^n ∈ X^n,

    (1/n) log(1/μ^n(x^n)) ≤ log(1/ε) < ∞.    (5)

Assumption 2: We assume that arg min_{x ∈ X} μ^1(x) and arg max_{x ∈ X} μ^1(x) are unique.

Assumption 2 ensures that the most likely and the least likely words are unambiguously determined, so that tilting the string-source with large positive and negative orders would lead to a deterministic string-source that only outputs a single string with high probability. As such, for each distinct i, j ∈ [|X|], the excluded set of parameter vectors will be a subset of the set of all (|X|-2)-dimensional hyperplanes that satisfy θ_i = θ_j, and the number of such hyperplanes is (|X| choose 2). As such, the set of parameter vectors that do not satisfy Assumptions 1 and 2 can be characterized by (|X|-2)-dimensional hyperplanes in the (|X|-1)-dimensional probability simplex Θ_X.² The probability simplex for a ternary alphabet, Θ_3, is depicted in Fig. 1, where the green dashed and green solid lines represent the set of string-sources in Θ_3 that do not satisfy Assumption 2. Note that in this case, |X|-2 = 1 and (|X| choose 2) = 3, resulting in three 1-dimensional hyperplanes.

Definition 4 (set M_X of string-sources): We denote by M_X the set of all memoryless string-sources on alphabet X that satisfy Assumptions 1 and 2. In this paper, we focus our attention on string-sources in M_X.

Definition 5 (guesswork/optimal ordering for μ): A one-to-one function G_μ^n : X^n → [|X|^n] is said to be the optimal ordering for μ on n-strings if

    μ^n(x^n) > μ^n(y^n)  ⟹  G_μ^n(x^n) < G_μ^n(y^n),    (6)

and further, if μ^n(x^n) = μ^n(y^n), then G_μ^n(x^n) < G_μ^n(y^n) if and only if x^n appears prior to y^n in lexicographic ordering.

² In other words, the excluded set is of zero Lebesgue volume in the probability simplex.

Definition 6 (reverse guesswork/reverse ordering for μ): We define R_μ^n(x^n) as the reverse guesswork, given by

    R_μ^n(x^n) := |X|^n - G_μ^n(x^n) + 1.    (7)

While guesswork corresponds to the best guessing strategy for recovering the string x^n, reverse guesswork corresponds to the worst guessing strategy for the string-source μ, as it proceeds from the least likely string all the way to the most likely. Hence, the statistical performance of any other guessing strategy (ordering) on n-strings lies somewhere between that of guesswork and reverse guesswork. As will become clear later in the paper, our interest in reverse guesswork is mainly due to its usefulness in characterizing the PMF of guesswork.

Definition 7 (logarithm of forward/reverse guesswork): We define the logarithm of forward guesswork g_μ^n and the logarithm of reverse guesswork r_μ^n on X^n as

    g_μ^n(x^n) := log G_μ^n(x^n),    (8)
    r_μ^n(x^n) := log R_μ^n(x^n),    (9)

where G_μ^n and R_μ^n are the guesswork and the reverse guesswork defined in (6) and (7), respectively.

Definition 8 (varentropy [28]): Let the varentropy, denoted by V^n(μ), be defined as

    V^n(μ) := var_μ(ı_μ^n(X^n)),    (10)

where var_μ denotes the variance operator with respect to the measure μ. The varentropy rate is denoted by V(μ) and is given by V(μ) := lim_{n→∞} (1/n) V^n(μ) = V^1(μ).

Definition 9 (relative entropy): Denote by D^n(μ ∥ ρ) the relative entropy, also referred to as the Kullback-Leibler (KL) divergence, between the measures μ^n and ρ^n, given by

    D^n(μ ∥ ρ) := Σ_{x^n ∈ X^n} μ^n(x^n) log(μ^n(x^n)/ρ^n(x^n)) = n D^1(μ ∥ ρ).    (12)

Definition 10 (cross entropy): Let the cross entropy, denoted by H^n(μ ∥ ρ), be defined as

    H^n(μ ∥ ρ) := E_μ[log(1/ρ^n(X^n))] = Σ_{x^n ∈ X^n} μ^n(x^n) log(1/ρ^n(x^n))    (13)
                = H^n(μ) + D^n(μ ∥ ρ)    (14)
                = n H^1(μ) + n D^1(μ ∥ ρ).    (15)
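For short strings the orderings of Definitions 5-7 can be computed exhaustively. The following sketch (ours, not from the paper; it assumes NumPy) sorts X^n by probability with lexicographic tie-breaking as in Definition 5 and applies (7) to obtain reverse guesswork.

```python
from itertools import product
import numpy as np

def guesswork_table(theta, n):
    """Brute-force optimal ordering G^n_mu of Definition 5: n-strings sorted from most likely
    to least likely, ties broken lexicographically. Returns {n-string: guesswork}."""
    theta = np.asarray(theta)
    strings = list(product(range(len(theta)), repeat=n))   # lexicographic order of index tuples
    strings.sort(key=lambda x: -np.prod(theta[list(x)]))   # stable sort keeps ties lexicographic
    return {x: k + 1 for k, x in enumerate(strings)}

theta, n = [0.2, 0.3, 0.5], 3
G = guesswork_table(theta, n)
N = len(theta) ** n
R = {x: N - g + 1 for x, g in G.items()}   # reverse guesswork of (7)
print(G[(2, 2, 2)], R[(2, 2, 2)])          # most likely 3-string: guesswork 1, reverse guesswork 27
print(G[(0, 0, 0)], R[(0, 0, 0)])          # least likely 3-string: guesswork 27, reverse guesswork 1
```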

Observe that H^n(μ ∥ μ) = H^n(μ) recovers the Shannon entropy. As a natural generalization of varentropy and cross entropy, we define the cross varentropy in the following definition.

Definition 11 (cross varentropy): Let the cross varentropy, denoted by V^n(μ ∥ ρ), be defined as

    V^n(μ ∥ ρ) := var_μ(log(1/ρ^n(X^n))) = Σ_{x^n ∈ X^n} μ^n(x^n) [log(1/ρ^n(x^n)) - H^n(μ ∥ ρ)]².    (16)

Note that V^n(μ ∥ μ) = V^n(μ) simplifies to the varentropy. Further note that V^n(μ ∥ ρ) = n V^1(μ ∥ ρ) because of our restriction to i.i.d. string-sources.

An operational interpretation of the cross entropy H^n(μ ∥ ρ) arises in prefix-free source coding, where it is known that, ignoring the integer constraints on the codeword length, the length of the codeword for the string x^n that minimizes the average codeword length equals ı_μ^n(x^n) = log(1/μ^n(x^n)). In this case, the resulting average codeword length is given by

    E_μ[ı_μ^n(X^n)] = E_μ[log(1/μ^n(X^n))] = H^n(μ).    (17)

Therefore, if a source code is designed for the mismatched string-source ρ, the length of the codeword for x^n would be chosen to be ı_ρ^n(x^n). Hence, the average codeword length of the optimal prefix-free source code that is designed to minimize the average codeword length under the mismatched distribution ρ is equal to the cross entropy:

    E_μ[ı_ρ^n(X^n)] = E_μ[log(1/ρ^n(X^n))] = H^n(μ ∥ ρ).    (18)

Definition 12 (Rényi entropy): For any distribution μ^n, let the Rényi entropy of order α, denoted by H_α^n(μ), be defined as [29]

    H_α^n(μ) := (1/(1-α)) log Σ_{x^n ∈ X^n} [μ^n(x^n)]^α.    (19)

Further, define the Rényi entropy rate of order α (if it exists) as

    H_α(μ) := lim_{n→∞} (1/n) H_α^n(μ) = H_α^1(μ).    (20)

Note that lim_{α→1} H_α(μ) = H(μ), which recovers the Shannon entropy.
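As a quick numerical illustration of Definitions 10 and 12 (a sketch of ours, not from the paper), the following evaluates the single-letter Rényi entropy of (19), checks that it approaches the Shannon entropy as α → 1, and computes the mismatched per-character codeword length of (18) for the working example θ = (0.2, 0.3, 0.5).

```python
import numpy as np

def shannon_entropy(theta):
    theta = np.asarray(theta)
    return -np.sum(theta * np.log(theta))               # H^1(mu), in nats

def renyi_entropy(theta, alpha):
    """Renyi entropy H^1_alpha(mu) of (19) for a categorical distribution (alpha != 1).
    For i.i.d. string-sources the rate of (20) equals this single-letter value."""
    theta = np.asarray(theta)
    return np.log(np.sum(theta ** alpha)) / (1.0 - alpha)

def cross_entropy(theta_mu, theta_rho):
    """Per-character cross entropy H^1(mu || rho) of (13); by (18) it is the average
    codeword length per character of a prefix-free code designed for rho but used on mu."""
    theta_mu, theta_rho = np.asarray(theta_mu), np.asarray(theta_rho)
    return -np.sum(theta_mu * np.log(theta_rho))

theta = [0.2, 0.3, 0.5]
print(shannon_entropy(theta))                            # about 1.0297 nats
print(renyi_entropy(theta, 1.0 + 1e-6))                  # approaches the Shannon entropy as alpha -> 1
print(renyi_entropy(theta, 0.5), renyi_entropy(theta, 2.0))
print(cross_entropy(theta, [1/3, 1/3, 1/3]))             # log 3: mismatch with the uniform source
```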

C. The tilt

The primary objective of this paper is to develop a tool, called the tilt, for the characterization of guesswork, its moments, and its large deviations behavior. The tilt operation T on any string-source μ is formally defined below.

Definition 13 (tilted string-source of order α): Let the map T operate on a string-source μ and a real-valued parameter α and output a string-source, called tilted μ of order α, which is denoted by T(μ, α) and is defined as

    T(μ, α) := {T(μ^n, α) : n ∈ N},    (21)

where T(μ^n, α), with overloading of the notation, for all n ∈ N and all x^n ∈ X^n is defined as

    T(μ^n, α)(x^n) := [μ^n(x^n)]^α / Σ_{y^n ∈ X^n} [μ^n(y^n)]^α.    (22)

Note that the tilt is a real analytic function of α in R for all μ ∈ M_X and all n ∈ N, as formally stated in Lemma 7. Tilted μ of order 1 is the string-source μ itself, i.e., T(μ, 1) = μ. Further, for any μ ∈ M_X, T(μ, 0) = u_X, where u_X is defined as the uniform string-source on alphabet X such that for all x^n ∈ X^n,

    u_X^n(x^n) := 1/|X|^n.    (23)

Notice that by definition u_X ∉ M_X.

Observe that for all n ∈ N, T(μ^n, α) is a multinomial distribution with the corresponding categorical distribution given by τ(θ, α) = (τ_1(θ, α), ..., τ_{|X|}(θ, α)), where τ : Θ_X × R → Θ_X and, for all i ∈ [|X|], τ_i is given by

    τ_i(θ, α) := θ_i^α / Σ_{j ∈ [|X|]} θ_j^α.    (24)

Let Γ^+(θ) ⊂ Θ_X denote the tilted family of θ, given by

    Γ^+(θ) := {τ(θ, α) : α ∈ R_+}.    (25)

Observe that Γ^+(θ) ⊂ Θ_X is a set of stochastic vectors in the probability simplex. Thus, the tilted family of a memoryless string-source with parameter vector θ is comprised of the set of memoryless string-sources whose parameter vectors belong to the tilted family of the vector θ, i.e., Γ^+(θ).

Definition 14 (reverse string-source): We define the reverse of the string-source μ, also called μ-reverse, as T(μ, -1). Note that the guesswork corresponding to μ-reverse is exactly the reverse guesswork. As an example, the reverse of a binary memoryless string-source with parameter vector θ = (p, 1-p) is a binary memoryless string-source with τ(θ, -1) = (1-p, p). Observe that R_μ^n is the optimal ordering (guesswork) for the reverse string-source T(μ, -1). Further, G_μ^n is the reverse ordering for T(μ, -1).

Definition 15 (tilted/reverse family of a string-source): Let the set T_μ^+ denote the tilted family of the string-source μ, given by

    T_μ^+ := {T(μ, α) : α ∈ R_+}.    (26)

Further, define the reverse family of the string-source as

    T_μ^- := {T(μ, α) : α ∈ R_-}.    (27)
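For memoryless string-sources, the tilt of Definition 13 reduces to tilting the categorical distribution as in (24), because the normalizer in (22) factorizes over characters. A minimal sketch (ours, not the paper's) illustrating T(μ, 1) = μ, T(μ, 0) = u_X, and the μ-reverse source of Definition 14:

```python
import numpy as np

def tilt(theta, alpha):
    """Categorical distribution tau(theta, alpha) of (24): tau_i proportional to theta_i^alpha.
    Tilting theta and taking the product measure is the same as tilting mu^n directly, for every n."""
    w = np.asarray(theta, dtype=float) ** alpha
    return w / w.sum()

theta = [0.2, 0.3, 0.5]
print(tilt(theta, 1.0))    # T(mu, 1) = mu itself
print(tilt(theta, 0.0))    # T(mu, 0) = the uniform source u_X
print(tilt(theta, -1.0))   # mu-reverse; for a binary (p, 1-p) source this gives (1-p, p)
```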

Fig. 1: The solid black lines depict the boundaries of the probability simplex Θ_3 of all ternary stochastic vectors, which are excluded from our analysis as they do not satisfy Assumption 1. The dashed green lines represent the string-sources that do not satisfy (29) in Assumption 2, and the solid green lines represent the string-sources that do not satisfy (30) in Assumption 2. The red pentagon marker is the ternary source parameter vector θ = (0.2, 0.3, 0.5); the blue solid curve depicts Γ^+(θ), i.e., the tilted family of θ; the red dotted curve depicts Γ^-(θ), i.e., the reverse family of θ; the yellow diamond marker corresponds to the uniform parameter vector θ_0 = (1/3, 1/3, 1/3); and θ^- and θ^+ mark the two corner points (1, 0, 0) and (0, 0, 1), which are only achieved in the limit and do not belong to either Γ^+(θ) or Γ^-(θ).

The first notable property of the tilted family of μ is that it parametrizes a standard exponential family of string-sources with α. In other words, for any n ∈ N, the family of distributions {T(μ^n, α) : α ∈ R}, parametrized by α, characterizes an exponential family of distributions on X^n.

As a working example throughout the paper, we consider a memoryless string-source on a ternary alphabet (|X| = 3) with the parameter vector θ = (0.2, 0.3, 0.5). The ternary alphabet is general enough to encompass the difficulties of the problem while simple enough for us to demonstrate the geometric techniques developed in this paper. The ternary probability simplex is depicted in Fig. 1, where the tilted family Γ^+(θ) is plotted for θ = (0.2, 0.3, 0.5). As can be seen, the tilted family of θ is a parametric curve that lives in the 2-dimensional simplex of stochastic vectors.
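To complement Fig. 1 numerically (an illustrative sketch under the working example, not part of the paper), one can sweep α and observe the Shannon entropy along the tilted family fall from log 3 near α = 0 toward 0 for large α, matching the description of the curve in the surrounding text.

```python
import numpy as np

def tilt(theta, alpha):
    w = np.asarray(theta, dtype=float) ** alpha
    return w / w.sum()                      # tau(theta, alpha) of (24)

def entropy(p):
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

theta = [0.2, 0.3, 0.5]
for alpha in [0.01, 0.5, 1.0, 2.0, 5.0, 20.0]:
    point = tilt(theta, alpha)              # a point on the tilted family Gamma^+(theta)
    print(alpha, np.round(point, 4), round(entropy(point), 4))
# entropy is close to log 3 = 1.0986 near alpha = 0, and close to 0 for large alpha,
# as the curve approaches the corner of the most likely symbol, here (0, 0, 1)
```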

The curve starts arbitrarily close to the maximum-entropy uniform parameter vector θ_0 for α = 0^+ and then moves towards one of the zero-entropy corner points of the simplex (corresponding to the most likely symbol) as α → ∞. We shall show that these properties hold generally for all string-sources in M_X.

As a final note, the tilt is intimately related to the concept of information geometry (see [30]), where several geometric properties of the Kullback-Leibler divergence (relative entropy) have been studied and their relevance to problems in information theory has been investigated. In particular, Borade and Zheng applied concepts from information geometry to the analysis of error exponents [31], and Huang et al. performed a geometric study of hypothesis testing [32]. This work also builds upon the insights on the geometry of the probability simplex in the context of information theory and statistics pioneered by Barron, Clarke, and Cover [33], [34]. The preliminary version of this work relied on the method of types (see [1]), while in this paper we build on a generalized notion of the weakly typical set, called the tilted weakly typical sets. We believe that the current treatment may be generalized to a class of stationary ergodic string-sources, as discussed in the concluding remarks.

D. Organization and contribution

In Section II, we provide the main properties of the tilt operation. In particular, we characterize the entropy H^n(T(μ, α)), the cross entropy H^n(T(μ, α) ∥ μ), and the relative entropy D^n(T(μ, α) ∥ μ) of the tilted string-source, as well as the Rényi entropy H_α^n(μ). These quantities are used in Section V to implicitly characterize the large deviations rate function of the logarithm of reverse guesswork, the logarithm of forward guesswork, and information. We further show that the derivatives with respect to the parameter α of these quantities can be expressed in terms of the varentropy V^n(T(μ, α)) and the cross varentropy V^n(T(μ, α) ∥ μ) of the tilted string-source.

In Section III, we define the equivalent order class of a string-source as the set of all string-sources that result in the same guesswork for all n, and denote this set of string-sources by C(μ). In Theorem 1, we show that the equivalent order class of μ is equal to the tilted family, i.e., C(μ) = T_μ^+, providing an operational meaning to the tilted family. Further, the set of all string-sources whose optimal orderings are the inverse of that for μ is equal to T_μ^-, i.e., C(T(μ, -1)) = T_μ^-. This result draws an important distinction between information and guesswork. The string-source μ is uniquely determined if the information ı_μ^n(x^n) is provided for all x^n ∈ X^n. On the other hand, it is not possible to recover uniquely the string-source μ if one is provided with the guesswork G_μ^n(x^n) for all strings x^n ∈ X^n. Our result shows that, given the guesswork for all x^n ∈ X^n, one can only determine the string-source up to equivalence within the tilted family that the source belongs to, leading to one less degree of freedom. This result has implications in terms of universal source coding [20], [27] and universal guesswork [18], [19], where the degrees of freedom in the model class determine the penalty of universality, which is called redundancy in source coding.
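The operational content of Theorem 1 can be sanity-checked by brute force on short strings: every tilt with α > 0 induces exactly the same optimal ordering as μ, while the μ-reverse source inverts it. A small sketch (ours; the parameter vector is the paper's working example):

```python
from itertools import product
import numpy as np

def ordering(theta, n):
    """All n-strings sorted from most likely to least likely under the categorical
    distribution theta, with lexicographic tie-breaking as in Definition 5."""
    theta = np.asarray(theta)
    strings = list(product(range(len(theta)), repeat=n))    # already in lexicographic order
    return sorted(strings, key=lambda x: -np.prod(theta[list(x)]))

def tilt(theta, alpha):
    w = np.asarray(theta, dtype=float) ** alpha
    return w / w.sum()

theta, n = [0.2, 0.3, 0.5], 4
base = ordering(theta, n)
print(ordering(tilt(theta, 0.5), n) == base)      # True: any order alpha > 0 gives the same ordering
print(ordering(tilt(theta, 3.0), n) == base)      # True
rev = ordering(tilt(theta, -1.0), n)              # mu-reverse orders strings from least to most likely
print(rev[0] == base[-1] and rev[-1] == base[0])  # True (ties in the middle may break differently)
```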

In Section IV, we generalize the notion of the weakly typical set. The weakly typical set comprises all strings x^n ∈ X^n such that e^{-H^n(μ) - nε} < μ^n(x^n) < e^{-H^n(μ) + nε}. An important implication of the weakly typical set, known as the asymptotic equipartition property, is that the typical set roughly contains e^{H^n(μ)} strings, each of probability e^{-H^n(μ)}. For sufficiently large n, the probability of the typical set is close to one, i.e., if a string is randomly drawn from the source, it is in the weakly typical set with high probability. The weakly typical set is used to prove key results in information theory that are dominated by the typical events, such as the source coding theorem and the channel coding theorem. Despite the great intuition that typicality provides on such quantities, the arguments based on typicality fall short in providing intuition on other key quantities that are dominated by atypical events, such as the error exponents in decoding and the cut-off rate in sequential decoding. In Section IV, we consider all strings x^n ∈ X^n such that e^{-H^n(T(μ,α) ∥ μ) - nε} < μ^n(x^n) < e^{-H^n(T(μ,α) ∥ μ) + nε} for α ∈ R, which we define as the tilted weakly typical set of order α and denote by A^n_{μ,α,ε} (see Definition 18). We show that, for a fixed ε and any string x^n ∈ X^n, there exists an α such that x^n is contained in the tilted weakly typical set of order α. Hence, this notion could be used to cluster/partition all of the strings x^n ∈ X^n based on their likelihoods. In Theorem 2, we derive tight bounds on the probability, size, guesswork, and reverse guesswork for all elements of the tilted weakly typical set of any order α. Roughly speaking, for any α ∈ R_+, we show that A^n_{μ,α,ε} contains e^{H^n(T(μ,α))} strings, where the likelihood of each string is e^{-H^n(T(μ,α) ∥ μ)}, the probability of the entire set is e^{-D^n(T(μ,α) ∥ μ)}, and the guesswork of each element in the set is e^{H^n(T(μ,α))}. For any α ∈ R_-, we show that A^n_{μ,α,ε} contains e^{H^n(T(μ,α))} strings, where the likelihood of each string is e^{-H^n(T(μ,α) ∥ μ)}, the probability of the entire set is e^{-D^n(T(μ,α) ∥ μ)}, and the reverse guesswork of each element in the set is e^{H^n(T(μ,α))}. Note that α = 1 recovers the weakly typical set, as H^n(T(μ,1) ∥ μ) = H^n(T(μ,1)) = H^n(μ), and for α ≠ 1 this provides a generalization of the well-known weakly typical set. This generalization could be used to provide new insights on understanding and proving results on information theoretic quantities that are dominated by atypical events.

In Section V, we prove that the logarithm of reverse guesswork r_μ^n(X^n), the logarithm of forward guesswork g_μ^n(X^n), and information ı_μ^n(X^n) satisfy an LDP, and we provide an implicit formula for the rate function in each case based on entropy, cross entropy, and relative entropy. Although some of the results in this section have been previously proved in more generality, we believe that the machinery built upon the tilt operation in this paper, as well as the information theoretic characterization of the rate functions, provides useful intuition in characterizing the large deviations behavior of these quantities. In particular, the LDP for the logarithm of reverse guesswork could not have been established using the previous methods of Christiansen and Duffy [10], as the resulting rate function here is concave.
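The defining condition of the tilted weakly typical set admits a direct membership test: x^n lies in A^n_{μ,α,ε} exactly when its information ı_μ^n(x^n) is within nε of the cross entropy H^n(T(μ,α) ∥ μ). The sketch below (ours, not the paper's) scans a grid of orders α and reports those whose tilted weakly typical set contains a given string, illustrating that every string is captured by some order.

```python
import numpy as np

def tilt(theta, alpha):
    w = np.asarray(theta, dtype=float) ** alpha
    return w / w.sum()

def cross_entropy(p, q):
    return -np.sum(np.asarray(p) * np.log(np.asarray(q)))

def in_tilted_typical_set(x, theta, alpha, eps):
    """Membership in A^n_{mu,alpha,eps}: the information of x^n under mu must be within
    n*eps of the cross entropy H^n(T(mu,alpha) || mu) = n * H^1(T(mu,alpha) || mu)."""
    theta = np.asarray(theta)
    n = len(x)
    info = -np.sum(np.log(theta[np.asarray(x)]))            # i^n_mu(x^n)
    target = n * cross_entropy(tilt(theta, alpha), theta)   # H^n(T(mu,alpha) || mu)
    return abs(info - target) < n * eps

theta, eps = [0.2, 0.3, 0.5], 0.05
x = [2] * 12 + [1] * 5 + [0] * 3                 # a 20-character string, atypical for mu
# alpha = 1 recovers the ordinary weakly typical set; other orders cover atypical strings
print([a for a in np.linspace(-2, 4, 25) if in_tilted_typical_set(x, theta, a, eps)])
```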

The most common way to prove an LDP is the Gärtner-Ellis Theorem [35, Theorem 2.3.6]. There, one first determines the scaled cumulant generating function of the process of interest, which identifies how moments scale, and then leverages properties of it to establish the LDP. This approach can only work if the resulting rate function is convex. Here, that is not the case.

In Section VI, we provide an approximation of the distribution of guesswork that captures the behavior even for very small string lengths. The approximation is built based on the tilt operation, guesswork, and reverse guesswork. We empirically show that the approximation can be applied to a variety of string-sources to characterize the PMF of guesswork.

Finally, the conclusion is provided in Section VII, where we also present ideas on how to generalize these results beyond the setting considered in this paper.

II. PROPERTIES OF THE TILTED STRING-SOURCES

In this section, we provide the main properties of the tilt operation that shall be used in proving our results on the characterization of guesswork. This section may be skipped by the reader and only referred to whenever a particular lemma is needed.

Lemma 1: For any α, β ∈ R,

    T(T(μ, α), β) = T(T(μ, β), α) = T(μ, αβ).    (28)

Proof: This is obtained from the definition using light algebraic manipulation.

Lemma 2: The optimal ordering for μ is the reverse ordering for T(μ, -1), and the reverse ordering for μ is the optimal ordering for T(μ, -1).

Proof: The statement is a consequence of the definition.

Lemma 3: For any μ ∈ M_X,

    lim_{α→∞} H(T(μ, α)) = 0,    (29)
    lim_{α→-∞} H(T(μ, α)) = 0.    (30)

Proof: The Lemma is a direct consequence of Assumption 2.

Next, we provide a lemma that relates Rényi entropy, Shannon entropy, relative entropy, and cross entropy.

Lemma 4: For any μ ∈ M_X and any α ∈ R, the following relations hold:

    H^n(T(μ, α) ∥ μ) - H_α^n(μ) = (1/(1-α)) D^n(T(μ, α) ∥ μ),    (31)

13 2 where α Dn T µ, α µ := 0. α= Proof: To establish 3, H n T µ, α H n αµ = H n T µ, α µ H n αµ α Dn T µ, α µ = α α α α Dn T µ, α µ, 32 x n X n T µ n, αx n log µ n x n 33 [µ n x n ] α 34 x n X n [µ n T µ n, αx n x n ] α / y log n X n[µn y n ] α µ n x n x n X n 35 = 0, 36 where 35 follows from the efinition, an 36 hols because the term y n X n[µn y n ] α in 35 is a constant, an hence 35 coul be ecompose into two terms one of which cancels the right han sie of 33, an the other cancels 34. Finally, 32 is establishe by putting together 4 an 3. Lemma 5: For any µ M X, an any α R, the following hols: log T µ n, αx n Hn T µ, α = α log µ n x n Hn T µ, α µ. 37 Proof: The lemma is prove by consiering the following chain of equations: log T µ n, αx n Hn T µ, α = α log µ n x n + log [µ n x n ] α x n X n T µ n, αx n x n X n log = α α log µ n x n Hn T µ, α µ µ n x n + log [µ n x n ] α x n X n. Lemma 6: For any µ M X, an any α R, the following hols: α 2 V n T µ, α µ = V n T µ, α. 38

14 3 Proof: We have α 2 V n T µ, α µ = x n X n T µ n, αx n = x n X n T µ n, αx n = V n T µ, α, [ α log µ n x n ] 2 H n T µ, α µ [ ] 2 log T µ n, αx n Hn T µ, α 39 where 39 follows from Lemma 5. Lemma 7: For any string-source µ M X, for any n N, T µ n, α is a real analytic function of α R. In particular, α T µn, α exists; an α 0 T µ n, α = u n X, where un X X n. is the uniform istribution on Proof: The statement follows from the facts that the exponential function is real analytic in its argument an that the enominator in 22 in the efinition of the tilt is non-zero for all α R. The it as α 0 hols since µ n x n > 0 for all x n X n by Assumption. Lemma 8: For any string-source µ M X, for any n N, H n T µ, α, D n T µ, α µ, H n T µ, α µ, V n T µ, α, an V n T µ, α µ are continuous an ifferentiable functions of α R. Proof: Note that H n ρ, D n ρ µ, H n ρ µ, V n ρ, an V n ρ µ are continuously ifferentiable with respect to ρ M X, an by Lemma 7, T µ, α is also continuous an ifferentiable with respect to α R, an hence the overall composition by consiering ρ = T µ, α is continuous an ifferentiable with respect to α R. Lemma 9: For any µ M X, an any n N, an any α R: α T µn, αx n = T µ n, αx n H n T µ, α µ log µ n x n. 40 Proof: Let T µ n, α 0 x n := α T µn, αx n. 4 α=α0 Note that by Lemma, we know that for any β > 0: T µ n, αx n = T T µ n, β, α x n. 42 β

15 4 Differentiating both sies with respect to α, for all α 0, β > 0 we have α T µn, αx n = T µ n, α 0 x n = α=α0 α T T µ n, β, α β = β T T µ n, β, α 0 β x n α=α0 43 x n 44 = α 0 T T µ n, α 0, x n, 45 where 45 follows by setting β = α 0. In other wors, 45 implies that the erivative of a tilt with respect to α can be erive by taking the erivative at α = an rescaling the erivative evaluate at the tilte istribution. Consequently, to prove the lemma it suffices to prove that α T µn, αx n = µ n x n H n µ log α= µ n x n, 46 an then it follows from 45 that for any α 0, we will arrive at the following: H n T µ, α log α T µn, αx n = α T µn, αx n T µ n, αx n. 47 On the other han, the esire result is obtaine by consiering Lemma 5, an noting the continuity an ifferentiability of T µ n, α at 0 from Lemma 7 an that of HT µ n, α at 0 from Lemma 8. To establish 46, note the efinition of the tilt in 22 an the fact that the erivative of the numerator with respect to α is [µ n x n ] α log µ n x n an the erivative of the enominator with respect to α is -H n µ. Lemma 0: For any µ M X, we have α 0 α 2 V n T µ, α = V n u X µ. 48 Proof: Observe that α 0 α 2 V n T µ, α = α 0 V n T µ, α µ 49 = V n u X µ, where 49 follows from Lemma 6, completing the proof. Lemma : For any µ M X, an any α R, α Hn T µ, α = αv n T µ, α µ. 50 Proof: First, owing to Lemma 8, the erivative of H n T µ n, α with respect to α exists.

16 5 To prove the result for all α 0, it suffices to show that α Hn T µ, α = V n µ 5 α= for any given µ M X. Then, by Lemma, T, α + α = T T, + α/α, α which for α 0 leas to see 45 in the proof of Lemma 9 for a more formal statement of this part of the claim: α Hn T µ, α = α V n T µ, α. 52 This, in conjunction with Lemma 6 an continuity an ifferentiability of H n T µ, α with respect to all α R from Lemma 8 leas to the esire result. To show that 5 hols, we have α Hn T µ, α α= = α T µn, αx n log x n X n α= µ n x n + µ n x n α log T µ n, αx n x n X n α= = µ n x n H n µ log µ n x n log µ n x n x n X n + µ n x n log µ n x n Hn µ x n X n = E µ H n µ log µ n X n log µ n X n 53 = var µ ı n µx n = V n µ, 54 where 53 follows from Lemma 9, an 54 follows from the efinition of the varentropy in 0. The it as α 0 also follow from Lemma 0. Lemma 2: For any µ M X an any α R, α Hn T µ, α µ = V n T µ, α µ. 55

17 6 Proof: We have α Hn T µ, α µ = x n X n = α T µn, αx n log x n X n T µ n, αx n µ n x n H n T µ, α µ log = E T µ,α H n T µ, α µ log µ n X n µ n x n log log µ n X n µ n x n 56 = var T µ,α ı n µx n 57 = V n T µ, α µ, where 56 follows from Lemma 9, an 57 follows from the efinition of information in an cross entropy in 3 leaing to the fact that E T µ,α log µ n X n = H n T µ, α µ. 58 Lemma 3: For any µ M X an any α R, α Dn T µ, α µ = α V n T µ, α µ. 59 Proof: Note that from 4, D n T µ, α µ = H n T µ, α µ H n T µ, α. 60 The lemma then follows by ifferentiating both sies with respect to α an applying Lemmas an 2 on the right han sie. Lemma 4: For any µ M X an any α, α Hn αµ = α 2 Dn T µ, α µ, 6 an α Hn αµ = α= 2 V n µ. 62 Proof: To establish 6, consier: α Hn αµ = α Hn T µ, α µ α α Dn T µ, α µ = α 2 V n T µ, α + α 2 V n T µ, α α 2 Dn T µ, α µ 64 = α 2 Dn T µ, α µ, 65 63

18 7 where 63 follows by invoking 3 in Lemma 4, an 64 follows from the chain rule of erivatives. To establish 62, we have α Hn αµ = α= α α Hn αµ 66 = α D n T µ, α µ α 2 67 = α α α V n T µ, α 2 2α 68 = 2 V n µ, 69 where 66 follows from the continuity of the erivative of H n αµ with respect to α, 67 follows from 6, an 68 follows from L Hôpital s rule an Lemma 3. This result recovers the fact that Rényi entropies are monotonically ecreasing with respect to the orer, an further characterizes their rate of change with respect to the parameterization. Lemma 5: For any µ M X, Further, for all α R, n n n n V n T µ, α µ = V T µ, α µ. 70 n Hn T µ, α = HT µ, α, 7 n n Hn T µ, α µ = HT µ, α µ, 72 n Dn T µ, α µ = DT µ, α µ, 73 n n Hn αµ = H α µ, 74 HT µ, α = αv T µ, α µ, α 75 HT µ, α µ = V T µ, α µ, α 76 DT µ, α µ = α V T µ, α µ. α 77 α H αµ = DT µ, α µ, 78 α 2 where DT µ, α µ is ifferentiable with respect to α, an α DT µ, α µ := 2 α= 2 V µ. Proof: The first part of the claim is straightforwar as the equalities hol for all n an hence also hol in the it as n. The claim in 75 hence follows by invoking Lemma. Equations 76 an 77 an 78 are establishe similarly consiering Lemmas 2, 3, an 4, respectively. Lemma 6: If µ M X, then, for all α 0, we have T µ, α M X. In other wors, M X is close uner all non-zero tilts, an hence T + µ, T µ M X.

19 8 Proof: We nee to show that T µ, α satisfies Assumptions an 2. To check Assumption, for α R +, we have: Further, consiering α, an for all 0 < α, n log T µ n, αx n = y n log n X n[µn y n ] α [µ n x n ] α = log [µ n y n ] α + log n [µ n x n ] α. 79 y n X n α log X α log X n log 0 n log y n X n [µ n y n ] α 0, 80 y n X n [µ n y n ] α α log X log X. 8 Thus, for all α R +, by taking the maximum of the upper bouns an the minimum of the lower bouns in 80 an 8, an applying them to 79, we have α log X + α log ε n log T µ n, αx n log X + α log ε, 82 where ε an ε are efine in 3 an 4, respectively. On the other han, for all α R, we have n log T µ n, αx n = log [µ n y n ] α + log n [µ n x n ] α y n X n Thus, for all α R, we have Next, we nee to show that for all α 0, n log y n X n ε nα 83 = log X α log ε. 84 n log T µ n, αx n log X + α log <. 85 ε HT µ, α > Due to Lemma 5, HT µ, α is ifferentiable with respect to α an its erivative is given by αv T µ, α µ. On the other han, for α > 0, the erivative is strictly positive implying that HT µ, α is strictly ecreasing with respect to α, an hence it is non-zero for all α > 0. This together with 85 emonstrate that for all α R, T µ, α satisfies Assumption.

20 9 To show that Assumption 2 hols for the tilte source, we note that for any α > 0, we have arg min T µ x α x X µ, αx = arg min x X y X µ y α 87 = arg min x X µ x α 88 = arg min x X µ x 89 which is unique by the fact that µ satisfies Assumption 2. Similarly, we can argue that arg max x X T µ, αx is also unique. On the other han, for α < 0, we have arg min T µ x α x X µ, αx = arg min x X y X µ y α 90 = arg min x X µ x α 9 = arg max x X µ x 92 which is again unique by the fact that µ satisfies Assumption 2. Finally, arg max x X T µ, αx is unique by the fact that arg min x X µ x is unique completing the proof. III. EQUIVALENT ORDER CLASS In this section, we characterize the set of all string-sources that inuce the same optimal orering of the strings from the most likely to the least likely on all strings of all lengths. Definition 6 orer equivalent string-sources: We say two string-sources µ, ρ M X are orer equivalent an enote by µ ρ, if for all n N, G n µ = G n ρ, i.e., the optimal orering for µ is also the optimal orering for ρ. Definition 7 equivalent orer class: The equivalent orer class of a string-source µ M X is enote by C µ an is given by C µ = ρ M X : ρ µ. We also nee another efinition that generalizes the weakly typical set. Definition 8 tilte weakly typical set of orer α: For any α R, enote by A n µ,α,ɛ the tilte weakly typical set of orer α with respect to µ, which is the set of all x n such that for an ɛ R + : e Hn T µ,α µ nɛ < µ n x n < e Hn T µ,α µ+nɛ, 93 where H is the cross entropy efine in 3. Note that in the case of α =, A n µ,,ɛ is the familiar weakly typical set see 3.6 in [36]; an Definition 8 generalizes the efinition of the weakly typical set for α. We shall establish some useful properties of the tilte weakly typical sets in the next section.

21 20 Our interest in fining the equivalent orer class of a string-source µ lies in the fact that such stringsources are subject to the same optimal guessing proceure, which reners them equivalent in terms of many problems, such as brute-force passwor guessing, list ecoing, sequential ecoing, an one-to-one source coing. Theorem : For any string-source µ M X, if ρ C µ, then there exists a unique α R + such that ρ = T µ, α. Consequently, the equivalent orer class of µ is equal to the tilte family, i.e., C µ = T µ +. Further, the set of all string-sources whose optimal orerings are inverse for µ is equal to Tµ, i.e., C T µ, = Tµ. Theorem characterizes the set of all string-sources that woul result in the same orering of strings for any length. As iscusse above, these string-sources are calle orer equivalent as their optimal guesswork proceure is exactly the same. Theorem establishes that any such orer equivalent stringsource is necessarily the result of a tilt proviing an operational meaning to the tilte family. To prove the theorem, we state two intermeiate lemmas an a efinition. Lemma 7: For any string-source µ, the equivalent orer class of µ contains T µ +, i.e., T µ + C µ. Proof: To prove the lemma, we nee to show that for any α R +, we have T α µ µ, i.e., T α µ C µ. To this en, we show that G n µ is also the optimal orering on n-strings for µ. We have T µ n, αx n < T µ n, αy n G n µx n > G n µy n. 94 By efinition, this establishes that G n µ is also the optimal orering for T µ n, α. It suffices to show that for any α R +, we have µ n x n > µ n y n T µ n, αx n > T µ n, αy n. 95 We show here that µ n x n > µ n y n [µ n x n ] α > [µ n y n ] α 96 [µ n x n ] α z n X n[µn z n ] α > [µnyn]α z n X n[µn z n ] α T µ n, αx n > T µ n, αy n, 97 where 96 hols because α > 0, an 97 follows from the efinition of the tilt in 22. Next, we state a simple lemma about the weakly typical set which is a special case of the tilte weakly typical sets.

22 2 Lemma 8: For any µ M X, P µ X n A n V n µ µ,,ɛ n 2 ɛ 2, 98 A n µ,,ɛ > e Hn µ nɛ V n µ n 2 ɛ 2, 99 A n µ,,ɛ < e Hn µ+nɛ. 00 Proof: P µ X n A n µ,,ɛ = Pµ ı µ X n H n µ nɛ 0 V n µ n 2 ɛ 2, 02 where 02 follows from Chebyshev s inequality. Note that Chernoff bouns coul be use to establish an exponential ecay in 02 but the weaker boun above is sufficient for our purposes. To erive the lower boun in 99 on the size of the set A µ,,ɛ, observe that The boun in 00 is establishe similarly. A µ,,ɛ P µx n A µ,,ɛ max xn A µ,,ɛ µ n x n > V n µ n 2 ɛ 2 e Hn µ+nɛ = V n µ n 2 ɛ 2 e Hn µ nɛ. Definition 9 Shannon entropy contour: Let S h M X enote the Shannon entropy contour h, which is the set of all string-sources in M X whose entropy rate equals h 0, log X, i.e., S h := µ M X : Hµ = h. 03 Next, we provie the proof of the main result of this section on the equivalent orer class of a stringsource. Proof of Theorem : To prove the theorem, we take two steps. We first show that one element of the tilte family T µ + is also containe in the intersection of the Shannon entropy contour an the equivalent orer class, i.e., T µ + C µ S h M X is not empty. We also show that for any string-source µ M X an any h 0, log X, if ρ, ρ 2 M X are in the intersection of S h see Definition 9 an C µ enote by C µ S h, then ρ = ρ 2. Putting these two steps together ensures that any other element in the intersection C µ S h is equal to a member of the tilte family, an completes the proof.

23 22 We first show that C µ S h is not empty an contains one element from T + µ. This is carrie out by noting the continuity an monotonicity of HT µ, α in the parameter α prove in Lemma 7 an noting that Hu X = log X an α H n T µ, α = 0. Next, we procee the proof of uniqueness by contraiction. Let ρ, ρ 2 C µ S h be two string-sources that are containe in the intersection of S h an C µ. Note that by Definition 7, ρ, ρ 2 M X. Therefore, by efinition as ρ, ρ 2 S h, we have Hρ = Hρ 2 = h. 04 The claim is negate to assume that ρ an ρ 2 are not equal, i.e., there exists δ R + such that for all N N there exists n > N such that E ρ2 log ρ n 2 X n ρ n > 2nδ. 05 Xn Then, we can ecompose the expectation over the typical set an the rest of the strings as follows: ρ n E ρ2 log 2 X n ρ n = ρ ρ n Xn 2 x n n log 2 x n ρ n 06 x n A n xn ρ 2,,ɛ + ρ ρ n 2 x n n log 2 x n ρ n. 07 xn x n A n ρ 2,,ɛ By Assumption for the string-source ρ M X, there exists ε such that On the other han, by 98 in Lemma 8, max log x n X n ρ n xn < n log ε. 08 x n A n ρ 2,,ɛ Putting together 08 an 09, we have ρ ρ n 2 x n n log 2 x n ρ n xn x n A n ρ 2,,ɛ x n A n ρ 2,,ɛ ρ n 2 x n V n ρ 2 n 2 ɛ 2, 09 ρ n 2 x n log ρ n V n ρ 2 xn nɛ 2 log ε = V ρ 2 ɛ 2 log ε, 0 leaing to the fact that the term in 07 is O n. Using the above in conjunction with the fact that P µ X n A n ρ 2,,ɛ V n µ n 2 ɛ, we conclue that for any N 2 0 N, there exists n > N 0 such that: ρ n E ρ2 log 2 X n ρ n X n A n Xn ρ 2,,ɛ > nδ. Consequently, there exists y n A n ρ 2,,ɛ such that log ρ n yn > log ρ n + nδ. 2 2 yn

24 23 Since y n A n ρ 2,,ɛ, by efinition, we have which in conjunction with 2 leas to log By swapping ρ an ρ 2 an noting that nh nɛ log ρ n nh + nɛ, 2 yn ρ n yn > log ρ n + nδ nh nɛ + nδ. 3 2 yn Dρ 2 ρ = 0 Dρ 2 ρ = 0, 4 an repeating similar arguments, we coul pick a ifferent sequence ŷ n X n such that an log ρ n > nh nɛ + nδ. 5 2 ŷn log ρ n nh + nɛ. ŷn Note that the above hols for any ɛ R +. By choosing ɛ to be small enough smaller than δ/2 for any N, there exists n > N such that ρ n 2 yn > ρ n 2 ŷn an hence G n ρ 2 y n < G n ρ 2 ŷ n. Similarly, we can show that for sufficiently large n, we have ρ n yn < ρ n ŷn, which leas to G n ρ y n > G n ρ ŷ n. Therefore, ρ an ρ 2 cannot be orer equivalent, which is a contraiction. Hence, it must be true that for any δ R +, there exists N 0 N such that for all n > N 0 ρ n E ρ2 log 2 X n ρ n < nδ, 6 Xn which irectly implies that Dρ ρ 2 = Dρ 2 ρ = 0, an ρ = ρ 2, i.e., any two string-sources in the intersection of C µ an S h must be equal. IV. TILTED WEAKLY TYPICAL SET The main result in this section is a generalization of the asymptotic equipartition property on the weakly typical set concerning information an guesswork. We call this generalize notion the tilte weakly typical set an efine it in Definition 8 in the previous section. Although these results resemble the asymptotic equipartition property on the weakly typical set, they are concerne with the large eviations, i.e., atyptical behavior, rather than the typical behavior. We will consequently use these results in establishing the large eviation properties of guesswork an information in the next section. We nee two more efinitions before we can provie the statement of the main result in this section. Definition 20: For any µ M X, let Bµ,α,ɛ n A n µ,α,ɛ be a subset such that B n µ,α,ɛ = 2 A n µ,α,ɛ an for any x n B n µ,α,ɛ an y n A n µ,α,ɛ B n µ,α,ɛ we have T µ n, αx n < T µ n, αy n.

25 24 Definition 2: For any µ M X, let Dµ,α,ɛ n A n µ,α,ɛ be efine as follows: Dµ,α,ɛ n := x n X n : T µ n, αx n > e Hn T µ,α n α ɛ. 7 Definition 22: For any µ M X, let Eµ,α,ɛ n A n µ,α,ɛ be efine as follows: Eµ,α,ɛ n := x n X n : T µ n, αx n < e Hn T µ,α+n α ɛ. 8 Theorem 2: For any µ M X, any ɛ R +, an any α 0, a For any x n A n µ,α,ɛ, µ n x n > e Hn T µ,α µ nɛ, 9 µ n x n < e Hn T µ,α µ+nɛ ; 20 b For any n N, A n µ,α,ɛ > V n T µ, α µ n 2 ɛ 2 e Hn T µ,α α nɛ, 2 A n µ,α,ɛ < e Hn T µ,α+ α nɛ ; 22 c For any n N, an α, 0 [,, P µ X n Dµ,α,ɛ n Pµ X n A n µ,α,ɛ > V n T µ, α µ n 2 ɛ 2 e Dn T µ,α µ α nɛ, 23 P µ X n A n µ,α,ɛ Pµ X n D n µ,α,ɛ e D n T µ,α µ+ α nɛ, 24 P µ X n E n µ,α,ɛ e D n T µ,α µ+ α nɛ ; 25 For any n N, an α 0, ], P µ X n Eµ,α,ɛ n Pµ X n A n µ,α,ɛ > V n T µ, α µ n 2 ɛ 2 e Dn T µ,α µ α nɛ, 26 P µ X n A n µ,α,ɛ Pµ X n E n µ,α,ɛ e D n T µ,α µ+ α nɛ, 27 e For any n N an any α > 0: P µ X n D n µ,α,ɛ e D n T µ,α µ+ α nɛ ; 28 x n B n µ,α,ɛ G n µx n > 2 V n T µ, α µ n 2 ɛ 2 e Hn T µ,α αnɛ, 29 x n Dµ,α,ɛ n G n µx n e Hn T µ,α+αnɛ, 30 x n Dµ,α,ɛ n G n µx n V n T µ, α µ n 2 ɛ 2 e Hn T µ,α αnɛ, 3 x n E n µ,α,ɛ G n µx n > e Hn T µ,α+αnɛ. 32

26 25 f For any n N an any α < 0: x n B n µ,α,ɛ R n µx n > 2 V n T µ, α µ n 2 ɛ 2 e Hn T µ,α α nɛ, 33 x n Dµ,α,ɛ n Rµx n n e Hn T µ,α+ α nɛ, 34 x n Dµ,α,ɛ n Rµx n n V n T µ, α µ n 2 ɛ 2 e Hn T µ,α α nɛ, 35 x n E n µ,α,ɛ R n µx n > e Hn T µ,α+ α nɛ. 36 Recall that Shannon entropy, cross entropy, an relative entropy are measure in nats throughout this paper. We first provie a few lemmas that specialize to the typical set, which we use to prove the theorem. Lemma 9: For any µ M X, let x n Bµ,,ɛ n see Definition 20. Then, G n µx n > V n µ 2 n 2 ɛ 2 e Hn µ nɛ. 37 Proof: Note that by efinition for all x n Bµ,,ɛ n, we have Gn µx n > Bµ,,ɛ n 2 An µ,,ɛ. The proof is complete noticing the boun in 99. Lemma 20: For any µ M X, let x n D n µ,,ɛ see Definition 22. Then, G n µx n e Hn µ+nɛ. 38 Proof: We have G n µx n D n µ,,ɛ 39 min xn D n µ,,ɛ µn x n 40 = e Hn µ+nɛ. 4 Lemma 2: Suppose the following hols: Then, x n D n µ,,ɛ. G n µx n Proof: We procee the proof with contraiction. Suppose that x n V n µ n 2 ɛ 2 e Hn µ nɛ. 42 D n µ,,ɛ. Hence, µn x n e nhµ nɛ. Thus, µ n x n < µ n y n for all y n A n µ,,ɛ. Consequently, G n µx n > A n µ,,ɛ > V n µ n 2 ɛ 2 e Hn µ nɛ, 43

27 26 which contraicts the supposition of the lemma in 42. Lemma 22: Suppose the following hols: G n µx n > e Hn µ+nɛ. 44 Then, x n E n µ,,ɛ. Proof: We procee the proof with contraiction. Suppose that x n e nhµ+nɛ. Thus, µ n x n > µ n y n for all y n A n µ,,ɛ. Consequently, E n µ,,ɛ. Hence, µn x n G n µx n < G n µy n e Hn µ+nɛ, 45 which contraicts the supposition of the lemma in 42. Next, we provie a lemma of equivalence between tilte weakly typical sets an weakly typical sets of a tilte source, which we will rely on in proving the main theorem of this section. Lemma 23: Let µ M X. Then, for all α 0, the following equivalences hol: A n µ,α,ɛ = A n T µ,α,, α ɛ, 46 Bµ,α,ɛ n = BT n µ,α,, α ɛ, 47 Dµ,α,ɛ n = DT n µ,α,, α ɛ, 48 Eµ,α,ɛ n = ET n µ,α,, α ɛ. 49 Proof: We have A n T µ,α,, α ɛ x = n X n : log T µ n, αx n Hn T T µ, α, T µ, α < α nɛ = x n X n : α log T µ n, αx n α Hn T µ, α < nɛ = x n X n : log µ n x n Hn T µ, α µ < nɛ = A n µ,α,ɛ, where 50 follows from Definition 8, 5 follows by noting that T µ, = µ an H n µ µ = H n µ an the fact that α 0, an 52 follows from Lemma 5. Next, note that 47 follows irectly from 46 an Definition 20.

28 27 To establish 48 note that: DT n µ,α,, α ɛ x = n X n : log T µ n, αx n Hn T T µ, α, T µ, α < α nɛ = x n X n : log T µ n, αx n Hn T µ, α < α nɛ = D n µ,α,ɛ, where 54 follows from Definition 8, 55 follows by noting that T µ, = µ an H n µ µ = H n µ an the fact that α 0. Finally, the proof for 49 is very similar to that of 48 with all the inequalities reverse, an is omitte for brevity. Next, we procee to the proof of Theorem 2. Proof of Theorem 2: Part a is simply a reiteration of Definition 8, which is presente here for the completeness of the characterization. To establish Part b, notice that by Lemma 23, for all α 0, we have A n T µ,α,, α ɛ = An µ,α,ɛ. Part b is establishe by applying the bouns in 99 an 00 in Lemma 8 to A n T µ,α,, α ɛ, an noting that α 2 V T µ, α µ = V T µ, α from Lemma 6. In Parts c an, first notice that by efinition A n µ,α,ɛ D n µ,α,ɛ an A n µ,α,ɛ E n µ,α,ɛ an hence P µ X n A n µ,α,ɛ Pµ X n D n µ,α,ɛ an Pµ X n A n µ,α,ɛ Pµ X n E n µ,α,ɛ. Hence, we only nee to prove the right han sie inequalities in the claims. To prove 23 in Part c, consier 98 in Lemma 8 applie to A n T µ,α,, α ɛ, in conjunction with 46 in Lemma 23: P T µ,α X n A n µ,α,ɛ = PT µ,α A n T µ,α,, α ɛ On the other han, we have P T µ,α X n A n µ,α,ɛ = x n A n µ,α,ɛ V n T µ, α µ n 2 ɛ T µ n, αx n 58 x = n A [µ n x n ] α n µ,α,ɛ x n X n[µn x n ] α 59 x = n A µ n x n [µ n x n ] α n µ,α,ɛ e αhnµ 60 α < e αhn T µ,α µ Hα nµ+ α nɛ µ n x n 6 x n A n µ,α,ɛ = e Dn T µ,α µ+ α nɛ P µ X n A n µ,α,ɛ, 62 where 59 follow from the efinition of the tilt in 22, 60 follows from the efinition of the Rényi entropy in 9, 6 follows from 9 or 20 applie to [µ n x n ] α epening on the sign of


More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

Mark J. Machina CARDINAL PROPERTIES OF "LOCAL UTILITY FUNCTIONS"

Mark J. Machina CARDINAL PROPERTIES OF LOCAL UTILITY FUNCTIONS Mark J. Machina CARDINAL PROPERTIES OF "LOCAL UTILITY FUNCTIONS" This paper outlines the carinal properties of "local utility functions" of the type use by Allen [1985], Chew [1983], Chew an MacCrimmon

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Lyapunov Functions. V. J. Venkataramanan and Xiaojun Lin. Center for Wireless Systems and Applications. School of Electrical and Computer Engineering,

Lyapunov Functions. V. J. Venkataramanan and Xiaojun Lin. Center for Wireless Systems and Applications. School of Electrical and Computer Engineering, On the Queue-Overflow Probability of Wireless Systems : A New Approach Combining Large Deviations with Lyapunov Functions V. J. Venkataramanan an Xiaojun Lin Center for Wireless Systems an Applications

More information

Regular tree languages definable in FO and in FO mod

Regular tree languages definable in FO and in FO mod Regular tree languages efinable in FO an in FO mo Michael Beneikt Luc Segoufin Abstract We consier regular languages of labele trees. We give an effective characterization of the regular languages over

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

On combinatorial approaches to compressed sensing

On combinatorial approaches to compressed sensing On combinatorial approaches to compresse sensing Abolreza Abolhosseini Moghaam an Hayer Raha Department of Electrical an Computer Engineering, Michigan State University, East Lansing, MI, U.S. Emails:{abolhos,raha}@msu.eu

More information

ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS

ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS MARK SCHACHNER Abstract. When consiere as an algebraic space, the set of arithmetic functions equippe with the operations of pointwise aition an

More information

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim QF101: Quantitative Finance September 5, 2017 Week 3: Derivatives Facilitator: Christopher Ting AY 2017/2018 I recoil with ismay an horror at this lamentable plague of functions which o not have erivatives.

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

Ramsey numbers of some bipartite graphs versus complete graphs

Ramsey numbers of some bipartite graphs versus complete graphs Ramsey numbers of some bipartite graphs versus complete graphs Tao Jiang, Michael Salerno Miami University, Oxfor, OH 45056, USA Abstract. The Ramsey number r(h, K n ) is the smallest positive integer

More information

II. First variation of functionals

II. First variation of functionals II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides Reference 1: Transformations of Graphs an En Behavior of Polynomial Graphs Transformations of graphs aitive constant constant on the outsie g(x) = + c Make graph of g by aing c to the y-values on the graph

More information

Lecture 6: Calculus. In Song Kim. September 7, 2011

Lecture 6: Calculus. In Song Kim. September 7, 2011 Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

Stopping-Set Enumerator Approximations for Finite-Length Protograph LDPC Codes

Stopping-Set Enumerator Approximations for Finite-Length Protograph LDPC Codes Stopping-Set Enumerator Approximations for Finite-Length Protograph LDPC Coes Kaiann Fu an Achilleas Anastasopoulos Electrical Engineering an Computer Science Dept. University of Michigan, Ann Arbor, MI

More information

u!i = a T u = 0. Then S satisfies

u!i = a T u = 0. Then S satisfies Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace

More information

1. Aufgabenblatt zur Vorlesung Probability Theory

1. Aufgabenblatt zur Vorlesung Probability Theory 24.10.17 1. Aufgabenblatt zur Vorlesung By (Ω, A, P ) we always enote the unerlying probability space, unless state otherwise. 1. Let r > 0, an efine f(x) = 1 [0, [ (x) exp( r x), x R. a) Show that p f

More information

A Review of Multiple Try MCMC algorithms for Signal Processing

A Review of Multiple Try MCMC algorithms for Signal Processing A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications

More information

Differentiability, Computing Derivatives, Trig Review. Goals:

Differentiability, Computing Derivatives, Trig Review. Goals: Secants vs. Derivatives - Unit #3 : Goals: Differentiability, Computing Derivatives, Trig Review Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an

More information

The chromatic number of graph powers

The chromatic number of graph powers Combinatorics, Probability an Computing (19XX) 00, 000 000. c 19XX Cambrige University Press Printe in the Unite Kingom The chromatic number of graph powers N O G A A L O N 1 an B O J A N M O H A R 1 Department

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

Lecture 12: November 6, 2013

Lecture 12: November 6, 2013 Information an Coing Theory Autumn 204 Lecturer: Mahur Tulsiani Lecture 2: November 6, 203 Scribe: Davi Kim Recall: We were looking at coes of the form C : F k q F n q, where q is prime, k is the message

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

Necessary and Sufficient Conditions for Sketched Subspace Clustering

Necessary and Sufficient Conditions for Sketched Subspace Clustering Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This

More information

15.1 Upper bound via Sudakov minorization

15.1 Upper bound via Sudakov minorization ECE598: Information-theoretic methos in high-imensional statistics Spring 206 Lecture 5: Suakov, Maurey, an uality of metric entropy Lecturer: Yihong Wu Scribe: Aolin Xu, Mar 7, 206 [E. Mar 24] In this

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

Math 115 Section 018 Course Note

Math 115 Section 018 Course Note Course Note 1 General Functions Definition 1.1. A function is a rule that takes certain numbers as inputs an assigns to each a efinite output number. The set of all input numbers is calle the omain of

More information

Sharp Thresholds. Zachary Hamaker. March 15, 2010

Sharp Thresholds. Zachary Hamaker. March 15, 2010 Sharp Threshols Zachary Hamaker March 15, 2010 Abstract The Kolmogorov Zero-One law states that for tail events on infinite-imensional probability spaces, the probability must be either zero or one. Behavior

More information

Computing the Longest Common Subsequence of Multiple RLE Strings

Computing the Longest Common Subsequence of Multiple RLE Strings The 29th Workshop on Combinatorial Mathematics an Computation Theory Computing the Longest Common Subsequence of Multiple RLE Strings Ling-Chih Yao an Kuan-Yu Chen Grauate Institute of Networking an Multimeia

More information

Differentiability, Computing Derivatives, Trig Review

Differentiability, Computing Derivatives, Trig Review Unit #3 : Differentiability, Computing Derivatives, Trig Review Goals: Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an original function Compute

More information

Linear and quadratic approximation

Linear and quadratic approximation Linear an quaratic approximation November 11, 2013 Definition: Suppose f is a function that is ifferentiable on an interval I containing the point a. The linear approximation to f at a is the linear function

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

Differentiation ( , 9.5)

Differentiation ( , 9.5) Chapter 2 Differentiation (8.1 8.3, 9.5) 2.1 Rate of Change (8.2.1 5) Recall that the equation of a straight line can be written as y = mx + c, where m is the slope or graient of the line, an c is the

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

UNDERSTANDING INTEGRATION

UNDERSTANDING INTEGRATION UNDERSTANDING INTEGRATION Dear Reaer The concept of Integration, mathematically speaking, is the "Inverse" of the concept of result, the integration of, woul give us back the function f(). This, in a way,

More information

Section 7.2. The Calculus of Complex Functions

Section 7.2. The Calculus of Complex Functions Section 7.2 The Calculus of Complex Functions In this section we will iscuss limits, continuity, ifferentiation, Taylor series in the context of functions which take on complex values. Moreover, we will

More information

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything

More information

The Ehrenfest Theorems

The Ehrenfest Theorems The Ehrenfest Theorems Robert Gilmore Classical Preliminaries A classical system with n egrees of freeom is escribe by n secon orer orinary ifferential equations on the configuration space (n inepenent

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

Inverse Theory Course: LTU Kiruna. Day 1

Inverse Theory Course: LTU Kiruna. Day 1 Inverse Theory Course: LTU Kiruna. Day Hugh Pumphrey March 6, 0 Preamble These are the notes for the course Inverse Theory to be taught at LuleåTekniska Universitet, Kiruna in February 00. They are not

More information

New Bounds for Distributed Storage Systems with Secure Repair

New Bounds for Distributed Storage Systems with Secure Repair New Bouns for Distribute Storage Systems with Secure Repair Ravi Tanon 1 an Soheil Mohajer 1 Discovery Analytics Center & Department of Computer Science, Virginia Tech, Blacksburg, VA Department of Electrical

More information

Second order differentiation formula on RCD(K, N) spaces

Second order differentiation formula on RCD(K, N) spaces Secon orer ifferentiation formula on RCD(K, N) spaces Nicola Gigli Luca Tamanini February 8, 018 Abstract We prove the secon orer ifferentiation formula along geoesics in finite-imensional RCD(K, N) spaces.

More information

Calculus of Variations

Calculus of Variations Calculus of Variations Lagrangian formalism is the main tool of theoretical classical mechanics. Calculus of Variations is a part of Mathematics which Lagrangian formalism is base on. In this section,

More information

On a limit theorem for non-stationary branching processes.

On a limit theorem for non-stationary branching processes. On a limit theorem for non-stationary branching processes. TETSUYA HATTORI an HIROSHI WATANABE 0. Introuction. The purpose of this paper is to give a limit theorem for a certain class of iscrete-time multi-type

More information

Range and speed of rotor walks on trees

Range and speed of rotor walks on trees Range an spee of rotor wals on trees Wilfrie Huss an Ecaterina Sava-Huss May 15, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial configuration on regular trees

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

Witt#5: Around the integrality criterion 9.93 [version 1.1 (21 April 2013), not completed, not proofread]

Witt#5: Around the integrality criterion 9.93 [version 1.1 (21 April 2013), not completed, not proofread] Witt vectors. Part 1 Michiel Hazewinkel Sienotes by Darij Grinberg Witt#5: Aroun the integrality criterion 9.93 [version 1.1 21 April 2013, not complete, not proofrea In [1, section 9.93, Hazewinkel states

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

arxiv: v1 [math.mg] 10 Apr 2018

arxiv: v1 [math.mg] 10 Apr 2018 ON THE VOLUME BOUND IN THE DVORETZKY ROGERS LEMMA FERENC FODOR, MÁRTON NASZÓDI, AND TAMÁS ZARNÓCZ arxiv:1804.03444v1 [math.mg] 10 Apr 2018 Abstract. The classical Dvoretzky Rogers lemma provies a eterministic

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

arxiv: v2 [math.pr] 27 Nov 2018

arxiv: v2 [math.pr] 27 Nov 2018 Range an spee of rotor wals on trees arxiv:15.57v [math.pr] 7 Nov 1 Wilfrie Huss an Ecaterina Sava-Huss November, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial

More information

Assignment 1. g i (x 1,..., x n ) dx i = 0. i=1

Assignment 1. g i (x 1,..., x n ) dx i = 0. i=1 Assignment 1 Golstein 1.4 The equations of motion for the rolling isk are special cases of general linear ifferential equations of constraint of the form g i (x 1,..., x n x i = 0. i=1 A constraint conition

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

A new proof of the sharpness of the phase transition for Bernoulli percolation on Z d

A new proof of the sharpness of the phase transition for Bernoulli percolation on Z d A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition

More information