Hash Property and Fixed-rate Universal Coding Theorems

Jun Muramatsu, Member, IEEE, and Shigeki Miyake, Member, IEEE

arXiv:0804.1183v1 [cs.IT] 8 Apr 2008

J. Muramatsu is with NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan (E-mail: pure@cslab.kecl.ntt.co.jp). S. Miyake is with NTT Network Innovation Laboratories, NTT Corporation, 1-1, Hikarinooka, Yokosuka-shi, Kanagawa 239-0847, Japan (E-mail: miyake.shigeki@lab.ntt.co.jp).

Abstract—The aim of this paper is to prove the achievability of fixed-rate universal coding problems by using our previously introduced notion of hash property. These problems are the fixed-rate lossless universal source coding problem and the fixed-rate universal channel coding problem. Since an ensemble of sparse matrices satisfies the hash property requirement, it is proved that we can construct universal codes by using sparse matrices.

Index Terms—channel coding, fixed-rate universal codes, hash functions, linear codes, lossless source coding, minimum-divergence encoding, minimum-entropy decoding, Shannon theory, sparse matrix

I. INTRODUCTION

The notion of hash property was introduced in [12]. It is a sufficient condition for the achievability of coding theorems including lossless and lossy source coding, channel coding, the Slepian-Wolf problem, the Wyner-Ziv problem, the Gel'fand-Pinsker problem, and the problem of source coding with partial side information at the decoder. Since an ensemble of sparse matrices satisfies the hash property requirement, it is proved that we can construct codes by using sparse matrices and maximum-likelihood coding. However, it is assumed in [12] that the source and channel distributions are used when designing a code.

The aim of this paper is to prove fixed-rate universal coding theorems based on the hash property, where no specific probability distribution is assumed in the design of a code and the error probability of a code vanishes for all sources specified by the encoding rate. We prove theorems of fixed-rate lossless universal source coding (see Fig. 1) and fixed-rate universal channel coding (see Fig. 2). In the construction of the codes, the maximum-likelihood coding used in [12] is replaced by a minimum-divergence encoder and a minimum-entropy decoder. A practical algorithm for the minimum-entropy decoder has been obtained by using linear programming [2]. It should be noted that a practical algorithm for the minimum-divergence encoder can also be obtained by using linear programming, as shown in Section V.

The fixed-rate lossless universal source coding theorem is proved in [3] for the ensemble of all linear matrices in the context of the Slepian-Wolf source coding problem, in [7] for the class of universal hash functions, and in [11] implicitly for an ensemble of sparse matrices in the context of secret key agreement from correlated source outputs. The universal channel coding theorem that employs sparse matrices is proved in [8] for an additive noise channel and in [9] for an arbitrary channel. It should be noted here that the linearity of the ensemble members is not assumed in our proof. Our proof assumes that ensembles of sparse matrices have a hash property, and so it is simpler than the previously reported proofs [11], [8], [9].

Fig. 1. Lossless source coding: the source output $X$ is encoded by $\varphi$ at rate $R > H(X)$ and reconstructed by $\varphi^{-1}$.

Fig. 2. Channel coding: the message $M$ is encoded by $\varphi$ into $X$, transmitted over the channel $\mu_{Y|X}$, and decoded from $Y$ by $\varphi^{-1}$ at rate $R < I(X;Y)$.

II. DEFINITIONS AND NOTATIONS

Throughout this paper, we use the following definitions and notations. Column vectors and sequences are denoted in boldface. Let $A\mathbf{u}$ denote the value taken by a function $A : \mathcal{U}^n \to \mathcal{U}_A$ at $\mathbf{u} \in \mathcal{U}^n$, where $\mathcal{U}^n$ is the domain of the function. It should be noted that $A$ may be non-linear. For a function $A$ and a set $\mathcal{A}$ of functions, let $\mathrm{Im}A$ and $\mathrm{Im}\mathcal{A}$ be defined as

$$\mathrm{Im}A \equiv \{A\mathbf{u} : \mathbf{u} \in \mathcal{U}^n\}$$
$$\mathrm{Im}\mathcal{A} \equiv \bigcup_{A \in \mathcal{A}} \mathrm{Im}A.$$

The cardinality of a set $\mathcal{U}$ is denoted by $|\mathcal{U}|$, and $\mathcal{U} \setminus \{u\}$ denotes a set difference. We define the sets $C_A(\mathbf{c})$ and $C_{AB}(\mathbf{c}, \mathbf{m})$ as

$$C_A(\mathbf{c}) \equiv \{\mathbf{u} : A\mathbf{u} = \mathbf{c}\}$$
$$C_{AB}(\mathbf{c}, \mathbf{m}) \equiv \{\mathbf{u} : A\mathbf{u} = \mathbf{c},\ B\mathbf{u} = \mathbf{m}\}.$$

In the context of linear codes, $C_A(\mathbf{c})$ is called a coset determined by $\mathbf{c}$.

Let $p$ and $p'$ be probability distributions and let $q$ and $q'$ be conditional probability distributions. Then the entropy $H(p)$, the conditional entropy $H(q|p)$, the divergence $D(p\|p')$, and the conditional divergence $D(q\|q'|p)$ are defined as

$$H(p) \equiv \sum_u p(u) \log \frac{1}{p(u)}$$
$$H(q|p) \equiv \sum_{u,v} q(u|v)\, p(v) \log \frac{1}{q(u|v)}$$
$$D(p\|p') \equiv \sum_u p(u) \log \frac{p(u)}{p'(u)}$$
$$D(q\|q'|p) \equiv \sum_v p(v) \sum_u q(u|v) \log \frac{q(u|v)}{q'(u|v)},$$

where we assume base 2 for the logarithm.
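For concreteness, the sets and quantities just defined can be computed directly for small parameters. The following sketch is our own illustration, not part of the paper: it fixes GF(2), a toy $2 \times 3$ matrix, and exhaustive enumeration to evaluate a coset $C_A(\mathbf{c})$ together with $H(p)$ and $D(p\|p')$.

```python
import itertools
import math

import numpy as np

def coset(A, c):
    """C_A(c) = {u in GF(2)^n : Au = c (mod 2)}, by exhaustive enumeration."""
    n = A.shape[1]
    return [u for u in itertools.product((0, 1), repeat=n)
            if np.array_equal(A.dot(u) % 2, c)]

def entropy(p):
    """H(p) = sum_u p(u) log2(1/p(u))."""
    return sum(-t * math.log2(t) for t in p if t > 0)

def divergence(p, q):
    """D(p || q) = sum_u p(u) log2(p(u)/q(u))."""
    return sum(t * math.log2(t / s) for t, s in zip(p, q) if t > 0)

A = np.array([[1, 0, 1], [0, 1, 1]])      # a toy 2x3 parity matrix
print(coset(A, np.array([1, 0])))         # all u with Au = (1, 0)
print(entropy([0.5, 0.5]), divergence([0.5, 0.5], [0.9, 0.1]))
```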

Let $\mu_{UV}$ be the joint probability distribution of random variables $U$ and $V$. Let $\mu_U$ and $\mu_V$ be the respective marginal distributions and $\mu_{U|V}$ be the conditional probability distribution. Then the entropy $H(U)$, the conditional entropy $H(U|V)$, and the mutual information $I(U;V)$ of the random variables are defined as

$$H(U) \equiv H(\mu_U)$$
$$H(U|V) \equiv H(\mu_{U|V}|\mu_V)$$
$$I(U;V) \equiv H(\mu_U) + H(\mu_V) - H(\mu_{UV}).$$

Let $\nu_{\mathbf{u}}$ and $\nu_{\mathbf{u}|\mathbf{v}}$ be defined as

$$\nu_{\mathbf{u}}(u) \equiv \frac{|\{1 \le i \le n : u_i = u\}|}{n}$$
$$\nu_{\mathbf{u}|\mathbf{v}}(u|v) \equiv \frac{\nu_{\mathbf{u}\mathbf{v}}(u,v)}{\nu_{\mathbf{v}}(v)}.$$

We call $\nu_{\mathbf{u}}$ the type¹ of $\mathbf{u} \in \mathcal{U}^n$ and $\nu_{\mathbf{u}|\mathbf{v}}$ a conditional type. Let $U$ denote a type $\nu_U$ of a sequence and $U|V$ a conditional type $\nu_{U|V}$ of a sequence given a sequence of type $V$. Then the set of typical sequences $T_U$ and the set of conditionally typical sequences $T_{U|V}(\mathbf{v})$ are defined as

$$T_U \equiv \{\mathbf{u} : \nu_{\mathbf{u}} = \nu_U\}$$
$$T_{U|V}(\mathbf{v}) \equiv \{\mathbf{u} : \nu_{\mathbf{u}|\mathbf{v}} = \nu_{U|V}\},$$

respectively. The empirical entropy, the empirical conditional entropy, and the empirical mutual information are defined as

$$H(\mathbf{u}) \equiv H(\nu_{\mathbf{u}})$$
$$H(\mathbf{u}|\mathbf{v}) \equiv H(\nu_{\mathbf{u}|\mathbf{v}}|\nu_{\mathbf{v}})$$
$$I(\mathbf{u};\mathbf{v}) \equiv H(\nu_{\mathbf{u}}) + H(\nu_{\mathbf{v}}) - H(\nu_{\mathbf{u}\mathbf{v}}).$$

In the construction of a universal source code, we use the minimum-entropy decoder

$$g_A(\mathbf{c}) \equiv \arg\min_{\mathbf{x}' \in C_A(\mathbf{c})} H(\mathbf{x}').$$

It should be noted that the linear programming technique introduced in [2] can be applied to the minimum-entropy decoder $g_A$. In the construction of a universal channel code, we use a minimum-divergence encoder and a minimum-entropy decoder

$$g_{AB}(\mathbf{c},\mathbf{m}) \equiv \arg\min_{\mathbf{x}' \in C_{AB}(\mathbf{c},\mathbf{m})} D(\nu_{\mathbf{x}'}\|\mu_X)$$
$$g_A(\mathbf{c},\mathbf{y}) \equiv \arg\min_{\mathbf{x}' \in C_A(\mathbf{c})} H(\mathbf{x}'|\mathbf{y}).$$

¹In [12], the type of a sequence is defined as the histogram $\{n\nu_{\mathbf{u}}(u)\}_{u \in \mathcal{U}}$.

It should be noted that we have

$$g_{AB}(\mathbf{c},\mathbf{m}) = \arg\max_{\mathbf{x}' \in C_{AB}(\mathbf{c},\mathbf{m})} \left[\log \mu_X(\mathbf{x}') + nH(\nu_{\mathbf{x}'})\right]$$

from Lemma 7, and this maximization can be carried out type by type as

$$\arg\max_{U'} \left[ nH(U') + \max_{\mathbf{x}' \in C_{AB}(\mathbf{c},\mathbf{m}) \cap T_{U'}} \log \mu_X(\mathbf{x}') \right].$$

When the functions $A$ and $B$ are linear, the linear programming technique introduced in [6] can be applied to the maximization $\max_{\mathbf{x}'} \mu_X(\mathbf{x}')$, because $U'$ is fixed and the constraint condition $\mathbf{x}' \in C_{AB}(\mathbf{c},\mathbf{m}) \cap T_{U'}$ is represented by linear functions.
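These empirical quantities are simple to compute from symbol counts, and the equivalence just stated rests on the identity $-\frac{1}{n}\log\mu_X(\mathbf{x}) = H(\nu_{\mathbf{x}}) + D(\nu_{\mathbf{x}}\|\mu_X)$ of Lemma 7 (Appendix), so that minimizing the divergence is the same as maximizing $\log\mu_X(\mathbf{x}') + nH(\nu_{\mathbf{x}'})$. A small numeric check (our sketch, not the paper's; binary alphabet, arbitrary test sequences):

```python
import math
from collections import Counter

def emp_dist(x):
    """Type nu_x of a binary sequence x."""
    n, cnt = len(x), Counter(x)
    return [cnt[0] / n, cnt[1] / n]

def entropy(p):
    return sum(-t * math.log2(t) for t in p if t > 0)

def divergence(p, q):
    return sum(t * math.log2(t / s) for t, s in zip(p, q) if t > 0)

def emp_cond_entropy(x, y):
    """Empirical conditional entropy H(x|y)."""
    n = len(x)
    return sum((k / n) * entropy(emp_dist([a for a, b in zip(x, y) if b == v]))
               for v, k in Counter(y).items())

mu = [0.8, 0.2]                       # a memoryless source mu_X on {0, 1}
x = (0, 1, 0, 0, 1, 0, 0, 0)
n, nu = len(x), emp_dist(x)
log_mu = sum(math.log2(mu[s]) for s in x)   # log mu_X(x)
# Lemma 7: -(1/n) log mu_X(x) = H(nu_x) + D(nu_x || mu_X), hence minimizing
# D(nu_x || mu_X) is the same as maximizing log mu_X(x) + n H(nu_x).
assert math.isclose(-log_mu / n, entropy(nu) + divergence(nu, mu))
print(entropy(nu), emp_cond_entropy(x, (0, 1, 0, 0, 1, 1, 0, 0)))
```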

Finally, we define $\chi(\cdot)$ as

$$\chi(a = b) \equiv \begin{cases} 1, & \text{if } a = b \\ 0, & \text{if } a \neq b \end{cases} \qquad \chi(a \neq b) \equiv \begin{cases} 1, & \text{if } a \neq b \\ 0, & \text{if } a = b. \end{cases}$$

We define the sequence $\{\lambda_{\mathcal{U}}(n)\}_{n=1}^{\infty}$ as

$$\lambda_{\mathcal{U}}(n) \equiv \frac{|\mathcal{U}| \log[n+1]}{n}. \tag{1}$$

It should be noted here that the product set $\mathcal{U} \times \mathcal{V}$ is denoted by $\mathcal{U}\mathcal{V}$ when it appears in the subscript of this function, and we omit the argument $n$ of $\lambda_{\mathcal{U}}$ when $n$ is clear from the context. We define $[\theta]^+$ as

$$[\theta]^+ \equiv \begin{cases} \theta, & \text{if } \theta > 0 \\ 0, & \text{if } \theta \le 0. \end{cases} \tag{2}$$

III. $(\alpha, \beta)$-HASH PROPERTY

In this section, we review the notion of the $(\alpha,\beta)$-hash property introduced in [12]. This is a sufficient condition for coding theorems, where the linearity of the functions is not assumed. By using this notion, we prove a fixed-rate universal source coding theorem and a fixed-rate universal channel coding theorem.

Throughout the paper, $A\mathbf{u}$ denotes the value taken by a function $A$ at $\mathbf{u} \in \mathcal{U}^n$, where $\mathcal{U}^n$ is the domain of the function. It should again be noted here that $A$ may be non-linear. We define the $(\alpha,\beta)$-hash property in the following.

Definition 1: Let $\mathcal{A}$ be a set of functions $A : \mathcal{U}^n \to \mathcal{U}_A$, where we assume that $\mathrm{Im}A = \mathrm{Im}\mathcal{A}$ for all $A \in \mathcal{A}$ and

$$\lim_{n\to\infty} \frac{1}{n}\log\frac{|\mathcal{U}_A|}{|\mathrm{Im}\mathcal{A}|} = 0. \tag{H1}$$

Let $p_A$ be a probability distribution on $\mathcal{A}$. We call the pair $(\mathcal{A}, p_A)$ an ensemble. Then $(\mathcal{A}, p_A)$ has an $(\alpha,\beta)$-hash property if $\alpha \equiv \{\alpha(n)\}_{n=1}^{\infty}$ and $\beta \equiv \{\beta(n)\}_{n=1}^{\infty}$ satisfy

$$\lim_{n\to\infty} \alpha(n) = 1 \tag{H2}$$
$$\lim_{n\to\infty} \beta(n) = 0 \tag{H3}$$

and

$$\sum_{\mathbf{u}\in T}\ \sum_{\mathbf{u}'\in T' :\, \mathbf{u}'\neq\mathbf{u}} p_A(\{A : A\mathbf{u} = A\mathbf{u}'\}) \le \frac{|T||T'|\,\alpha(n)}{|\mathrm{Im}\mathcal{A}|} + \min\{|T|,|T'|\}\,\beta(n) \tag{H4}$$

for any $T, T' \subseteq \mathcal{U}^n$. Throughout this paper, we omit the argument $n$ of $\alpha$ and $\beta$ when $n$ is fixed.
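As a quick numerical illustration of (H4) (our own sketch, anticipating Example 1 below): for the uniform ensemble of all $l \times n$ binary matrices and a fixed pair $\mathbf{u} \neq \mathbf{u}'$, the collision probability $p_A(\{A : A\mathbf{u} = A\mathbf{u}'\})$ equals $2^{-l} = 1/|\mathrm{Im}\mathcal{A}|$ exactly, matching the bound with $\alpha(n) = 1$ and $\beta(n) = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, trials = 10, 4, 50000

u = rng.integers(2, size=n)
up = u.copy()
up[0] ^= 1                                  # make u' differ from u

hits = 0
for _ in range(trials):
    A = rng.integers(2, size=(l, n))        # uniform over all l x n binary matrices
    hits += np.array_equal(A.dot(u) % 2, A.dot(up) % 2)
print(hits / trials, 2.0 ** (-l))           # empirical collision rate vs 1/|Im A|
```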

In the following, we present two examples of ensembles that have a hash property.

Example 1: In this example, we consider a universal class of hash functions introduced in [5]. A set $\mathcal{A}$ of functions $A : \mathcal{U}^n \to \mathcal{U}_A$ is called a universal class of hash functions if

$$|\{A : A\mathbf{u} = A\mathbf{u}'\}| \le \frac{|\mathcal{A}|}{|\mathcal{U}_A|}$$

for any $\mathbf{u} \neq \mathbf{u}'$. For example, the set of all functions on $\mathcal{U}^n$ and the set of all linear functions $A : \mathcal{U}^n \to \mathcal{U}^{l_A}$ are universal classes of hash functions (see [5]). It should be noted that each of the examples above satisfies $\mathrm{Im}\mathcal{A} = \mathcal{U}_A$. When $\mathcal{A}$ is a universal class of hash functions and $p_A$ is the uniform probability distribution on $\mathcal{A}$, we have

$$\sum_{\mathbf{u}\in T}\ \sum_{\mathbf{u}'\in T' :\, \mathbf{u}'\neq\mathbf{u}} p_A(\{A : A\mathbf{u} = A\mathbf{u}'\}) \le \frac{|T||T'|}{|\mathrm{Im}\mathcal{A}|}.$$

This implies that $(\mathcal{A}, p_A)$ has a $(1,0)$-hash property, where $\alpha(n) \equiv 1$ and $\beta(n) \equiv 0$ for every $n$.

Example 2: In this example, we review the ensemble of $q$-ary sparse matrices introduced in [12]. In the following, let $\mathcal{U} \equiv \mathrm{GF}(q)$ and $l_A \equiv nR$. We generate an $l_A \times n$ matrix $A$ with the following procedure:

1) Start from an all-zero matrix.
2) For each $i \in \{1,\ldots,n\}$, repeat the following procedure $\tau$ times:
   a) Choose $(j, a) \in \{1,\ldots,l_A\} \times [\mathrm{GF}(q)\setminus\{0\}]$ uniformly at random.
   b) Add $a$ to the $(j,i)$ component of $A$.

Let $(\mathcal{A}, p_A)$ be the ensemble corresponding to the above procedure. Then

$$\mathrm{Im}A = \begin{cases} \{\mathbf{u} \in \mathcal{U}^{l_A} : \mathbf{u} \text{ has an even number of non-zero elements}\}, & \text{if } q = 2 \\ \mathcal{U}^{l_A}, & \text{if } q > 2 \end{cases}$$

for all $A \in \mathcal{A}$, and there is $(\alpha_A, \beta_A)$ such that $(\mathcal{A}, p_A)$ has an $(\alpha_A,\beta_A)$-hash property (see [12, Theorem 2]).
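The generation procedure of Example 2 is direct to implement. The sketch below (ours; tiny illustrative parameters, $q = 2$ and an even $\tau$, which is what makes every column, and hence every value $A\mathbf{u}$, have even weight) draws one matrix and checks by enumeration that the reachable values are consistent with the stated form of $\mathrm{Im}A$:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

def sparse_matrix(l, n, tau, q=2):
    """Example 2: start from zero; for each column i, repeat tau times:
    pick (row j, non-zero a) uniformly at random and add a to entry (j, i)."""
    A = np.zeros((l, n), dtype=int)
    for i in range(n):
        for _ in range(tau):
            j = rng.integers(l)
            a = rng.integers(1, q)          # uniform on GF(q) \ {0}
            A[j, i] = (A[j, i] + a) % q
    return A

# With q = 2 and tau even, every column has even weight, so every value Au
# has an even number of non-zero elements.
A = sparse_matrix(l=4, n=8, tau=2)
image = {tuple(A.dot(u) % 2) for u in itertools.product((0, 1), repeat=8)}
assert all(sum(c) % 2 == 0 for c in image)
print(len(image), "of", 2 ** 3, "even-weight vectors reached")
```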

In the following, let $\mathcal{A}$ (resp. $\mathcal{B}$) be a set of functions $A : \mathcal{U}^n \to \mathcal{U}_A$ (resp. $B : \mathcal{U}^n \to \mathcal{U}_B$). We assume that the ensemble $(\mathcal{A}, p_A)$ has an $(\alpha_A,\beta_A)$-hash property and the ensemble $(\mathcal{A}\times\mathcal{B}, p_A \times p_B)$ has an $(\alpha_{AB},\beta_{AB})$-hash property. We also assume that $p_C$ and $p_M$ are the uniform distributions on $\mathrm{Im}\mathcal{A}$ and $\mathrm{Im}\mathcal{B}$, respectively, and that the random variables $A$, $B$, $C$, and $M$ are mutually independent, that is,

$$p_C(\mathbf{c}) = \begin{cases} \frac{1}{|\mathrm{Im}\mathcal{A}|}, & \text{if } \mathbf{c} \in \mathrm{Im}\mathcal{A} \\ 0, & \text{if } \mathbf{c} \in \mathcal{U}_A \setminus \mathrm{Im}\mathcal{A} \end{cases}$$
$$p_M(\mathbf{m}) = \begin{cases} \frac{1}{|\mathrm{Im}\mathcal{B}|}, & \text{if } \mathbf{m} \in \mathrm{Im}\mathcal{B} \\ 0, & \text{if } \mathbf{m} \in \mathcal{U}_B \setminus \mathrm{Im}\mathcal{B} \end{cases}$$

and

$$p_{ABCM}(A, B, \mathbf{c}, \mathbf{m}) = p_A(A)\, p_B(B)\, p_C(\mathbf{c})\, p_M(\mathbf{m})$$

for any $A$, $B$, $\mathbf{c}$, and $\mathbf{m}$.

Fig. 3. Construction of the fixed-rate source code: the encoder maps $\mathbf{x}$ to $A\mathbf{x}$, and the decoder maps $A\mathbf{x}$ to $g_A(A\mathbf{x})$.

We use the following lemmas, which are shown in [12].

Lemma 1 ([12, Lemma 9]): For any $A$ and $\mathbf{u} \in \mathcal{U}^n$,

$$p_C(\{\mathbf{c} : A\mathbf{u} = \mathbf{c}\}) = \sum_{\mathbf{c}} p_C(\mathbf{c})\, \chi(A\mathbf{u} = \mathbf{c}) = \frac{1}{|\mathrm{Im}\mathcal{A}|},$$

and for any $\mathbf{u} \in \mathcal{U}^n$,

$$E_{AC}[\chi(A\mathbf{u} = \mathbf{c})] = \sum_{A,\mathbf{c}} p_{AC}(A,\mathbf{c})\, \chi(A\mathbf{u} = \mathbf{c}) = \frac{1}{|\mathrm{Im}\mathcal{A}|}.$$

Lemma 2 ([12, Lemma 2]): If $\mathcal{G} \subseteq \mathcal{U}^n$ and $\mathbf{u} \notin \mathcal{G}$, then

$$p_A(\{A : \mathcal{G} \cap C_A(A\mathbf{u}) \neq \emptyset\}) \le \frac{|\mathcal{G}|\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A.$$

Lemma 3 ([12, Lemma 5]): If $T \neq \emptyset$, then

$$p_{ABCM}(\{(A,B,\mathbf{c},\mathbf{m}) : T \cap C_{AB}(\mathbf{c},\mathbf{m}) = \emptyset\}) \le \alpha_{AB} - 1 + \frac{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\,[\beta_{AB}+1]}{|T|}.$$

When $(\mathcal{A}, p_A)$ and $(\mathcal{B}, p_B)$ are the ensembles of $l_A \times n$ and $l_B \times n$ linear matrices, respectively, we have the following lemma.

Lemma 4 ([12, Lemma 7]): The joint ensemble $(\mathcal{A}\times\mathcal{B}, p_{AB})$ has an $(\alpha_{AB},\beta_{AB})$-hash property for the set of functions $AB : \mathcal{U}^n \to \mathcal{U}^{l_A+l_B}$ defined as

$$AB(\mathbf{u}) \equiv (A\mathbf{u}, B\mathbf{u}),$$

where

$$\alpha_{AB}(n) = \alpha_A(n)\,\alpha_B(n) \tag{3}$$
$$\beta_{AB}(n) = \min\{\beta_A(n), \beta_B(n)\}. \tag{4}$$

IV. FIXED-RATE LOSSLESS UNIVERSAL SOURCE CODING

In this section, we consider the fixed-rate lossless universal source coding problem illustrated in Fig. 1. For a given encoding rate $R$, $l_A$ is given by

$$l_A \equiv \frac{nR}{\log|\mathcal{X}|}.$$

We fix a function

$$A : \mathcal{X}^n \to \mathcal{X}^{l_A},$$

which is available for the construction of an encoder and a decoder. We define the encoder and the decoder (illustrated in Fig. 3) as

$$\varphi : \mathcal{X}^n \to \mathcal{X}^{l_A}, \qquad \varphi(\mathbf{x}) \equiv A\mathbf{x}$$
$$\varphi^{-1} : \mathcal{X}^{l_A} \to \mathcal{X}^n, \qquad \varphi^{-1}(\mathbf{c}) \equiv g_A(\mathbf{c}),$$

where

$$g_A(\mathbf{c}) \equiv \arg\min_{\mathbf{x}' \in C_A(\mathbf{c})} H(\mathbf{x}').$$

The error probability $\mathrm{Error}_X(A)$ is given by

$$\mathrm{Error}_X(A) \equiv \mu_X(\{\mathbf{x} : \varphi^{-1}(\varphi(\mathbf{x})) \neq \mathbf{x}\}).$$

We have the following theorem. It should be noted that the alphabet $\mathcal{X}$ may not be binary.

Theorem 1: Assume that an ensemble $(\mathcal{A}, p_A)$ has an $(\alpha_A,\beta_A)$-hash property. For a fixed rate $R$, $\delta > 0$, and a sufficiently large $n$, there is a function (matrix) $A \in \mathcal{A}$ such that

$$\mathrm{Error}_X(A) \le \max\left\{\frac{\alpha_A |\mathcal{X}|^{l_A}}{|\mathrm{Im}\mathcal{A}|},\, 1\right\} 2^{-n[\inf F_X(R) - 2\lambda_{\mathcal{X}}]} + \beta_A \tag{5}$$

for any stationary memoryless source $X$ satisfying

$$H(X) < R, \tag{6}$$

where

$$F_X(R) \equiv \min_U \left[ D(\nu_U\|\mu_X) + [R - H(U)]^+ \right]$$

and the infimum is taken over all $X$ satisfying (6). Since

$$\inf_{X : H(X) < R} F_X(R) > 0,$$

the error probability goes to zero as $n \to \infty$ for all $X$ satisfying (6).
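An end-to-end toy run of this construction may help fix ideas (our illustration, not the paper's; a hand-picked $3 \times 5$ binary matrix and exhaustive minimum-entropy decoding, so only the definitions, not the asymptotics, are exercised):

```python
import itertools
import math
from collections import Counter

import numpy as np

def emp_entropy(x):
    """Empirical entropy H(x) = H(nu_x) in bits."""
    n = len(x)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(x).values())

def encode(A, x):
    """phi(x) = Ax over GF(2): the syndrome is the codeword."""
    return tuple(A.dot(x) % 2)

def decode(A, c):
    """phi^{-1}(c) = g_A(c): the minimum-empirical-entropy member of C_A(c)."""
    n = A.shape[1]
    coset = (x for x in itertools.product((0, 1), repeat=n)
             if encode(A, x) == c)
    return min(coset, key=emp_entropy)

# l_A = 3, n = 5, so the rate is R = 3/5 bit per source symbol.
A = np.array([[1, 0, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 0, 1]])
x = (0, 0, 0, 0, 1)                      # a low-empirical-entropy source output
assert decode(A, encode(A, x)) == x      # x is the entropy minimizer of its coset
```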

We can prove a coding theorem for a channel $\mu_{Y|X}$ with additive noise $Z \equiv Y - X$ by letting $A$ and $C_A(\mathbf{0}) \equiv \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}$ be a parity check matrix and the set of codewords (channel inputs), respectively. Then the encoding rate of this channel code is given by

$$\frac{\log|C_A(\mathbf{0})|}{n} = \log|\mathcal{X}| - R$$

and the error probability is given as

$$\mathrm{Error}_{Y|X}(A) \equiv \frac{1}{|C_A(\mathbf{0})|} \sum_{\mathbf{x} \in C_A(\mathbf{0})} \mu_{Y|X}(\{\mathbf{y} : g_A(A\mathbf{y}) \neq \mathbf{y} - \mathbf{x}\} \mid \mathbf{x}).$$

Since $\mathbf{z} = \mathbf{y} - \mathbf{x}$ implies

$$A\mathbf{z} = A\mathbf{y} - A\mathbf{x} = A\mathbf{y},$$

the decoding of the channel input $\mathbf{x}$ from the syndrome $A\mathbf{y}$ is equivalent to the decoding of the source output $\mathbf{z}$ from its codeword $A\mathbf{z}$ by using $g_A$.
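This equivalence can be checked mechanically. In the sketch below (ours; binary alphabet, the same toy matrix as in the previous sketch, one low-weight noise vector), the receiver computes the syndrome $A\mathbf{y} = A\mathbf{z}$, reconstructs $\mathbf{z}$ with $g_A$, and subtracts it:

```python
import itertools
import math
from collections import Counter

import numpy as np

def emp_entropy(x):
    n = len(x)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(x).values())

def g_A(A, c):
    """Minimum-entropy reconstruction of a sequence with syndrome c."""
    n = A.shape[1]
    return min((v for v in itertools.product((0, 1), repeat=n)
                if tuple(A.dot(v) % 2) == tuple(c)),
               key=emp_entropy)

A = np.array([[1, 0, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 0, 1]])
x = np.array([0, 1, 1, 0, 0])            # a codeword: Ax = 0
assert not (A.dot(x) % 2).any()
z = np.array([0, 0, 0, 0, 1])            # low-weight additive noise
y = (x + z) % 2                          # channel output
z_hat = g_A(A, A.dot(y) % 2)             # Ay = Az, so decode the noise
x_hat = (y - z_hat) % 2                  # then strip it off
assert np.array_equal(x_hat, x)
```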

We have the following corollary.

Corollary 2: Assume that an ensemble $(\mathcal{A}, p_A)$ of linear functions has an $(\alpha_A,\beta_A)$-hash property. For a fixed rate $R$, $\delta > 0$, and a sufficiently large $n$, there is a (sparse) matrix $A \in \mathcal{A}$ such that

$$\mathrm{Error}_{Y|X}(A) \le \max\left\{\frac{\alpha_A |\mathcal{X}|^{l_A}}{|\mathrm{Im}\mathcal{A}|},\, 1\right\} 2^{-n[\inf F_Z(R) - 2\lambda_{\mathcal{X}}]} + \beta_A$$

for any stationary memoryless channel with additive noise $Z$ satisfying

$$\log|\mathcal{X}| - R < I(X;Y) = \log|\mathcal{X}| - H(Z), \tag{7}$$

where the infimum is taken over all $Z$ satisfying (7). The error probability goes to zero as $n \to \infty$ for all channels satisfying (7).

Remark 1: It should be noted here that the condition (H2) can be replaced by

$$\lim_{n\to\infty} \frac{\log \alpha_A(n)}{n} = 0. \tag{8}$$

By using the expurgation technique described in [1], we obtain an ensemble of sparse matrices that has an $(\alpha_A, 0)$-hash property, where (H2) is replaced by (8). This implies that we can omit the term $\beta_A$ from the upper bound on the error probability.

Remark 2: Since a class of universal hash functions with a uniform distribution and an ensemble of all linear functions have a $(1,0)$-hash property, we obtain the same results as those reported in [7] and [3], respectively, where $F_X$ represents the error exponent function. When $(\mathcal{A}, p_A)$ is an ensemble of sparse matrices and $(\alpha_A, \beta_A)$ is defined properly, we obtain the same result as that found in [8].

V. FIXED-RATE UNIVERSAL CHANNEL CODING

The code for the channel coding problem (illustrated in Fig. 2) is given in the following (illustrated in Fig. 4). The idea for the construction is drawn from [10], [12], [9]. We give an explicit construction of the encoder by using minimum-divergence encoding, which is not described in [10], [12], [9].

Fig. 4. Construction of the channel code: the encoder maps $\mathbf{m}$ to $\mathbf{x} = g_{AB}(\mathbf{c},\mathbf{m})$; the decoder computes $\hat{\mathbf{x}} = g_A(\mathbf{c},\mathbf{y})$ from $\mathbf{y}$ and outputs $\hat{\mathbf{m}} = B\hat{\mathbf{x}}$.

For given $R_A, R_B > 0$, let

$$A : \mathcal{X}^n \to \mathcal{X}^{l_A}$$
$$B : \mathcal{X}^n \to \mathcal{X}^{l_B}$$

be functions satisfying

$$R_A = \frac{\log|\mathrm{Im}\mathcal{A}|}{n}$$
$$R_B = \frac{\log|\mathrm{Im}\mathcal{B}|}{n},$$

respectively. We fix functions $A$, $B$ and a vector $\mathbf{c} \in \mathcal{X}^{l_A}$, which are available for the construction of an encoder and a decoder. We define the encoder and the decoder as

$$\varphi : \mathcal{X}^{l_B} \to \mathcal{X}^n, \qquad \varphi(\mathbf{m}) \equiv g_{AB}(\mathbf{c},\mathbf{m})$$
$$\varphi^{-1} : \mathcal{Y}^n \to \mathcal{X}^{l_B}, \qquad \varphi^{-1}(\mathbf{y}) \equiv B g_A(\mathbf{c},\mathbf{y}),$$

where

$$g_{AB}(\mathbf{c},\mathbf{m}) \equiv \arg\min_{\mathbf{x}' \in C_{AB}(\mathbf{c},\mathbf{m})} D(\nu_{\mathbf{x}'}\|\mu_X)$$
$$g_A(\mathbf{c},\mathbf{y}) \equiv \arg\min_{\mathbf{x}' \in C_A(\mathbf{c})} H(\mathbf{x}'|\mathbf{y}).$$
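A brute-force rendering of this encoder/decoder pair may make the construction concrete (our sketch; tiny hypothetical parameters, a uniform $\mu_X$, and a noiseless pass in place of a real channel $\mu_{Y|X}$ — exhaustive search stands in for the linear-programming implementations mentioned above):

```python
import itertools
import math
from collections import Counter

import numpy as np

def emp_entropy(x):
    n = len(x)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(x).values())

def emp_cond_entropy(x, y):
    n = len(x)
    return sum((k / n) * emp_entropy([a for a, b in zip(x, y) if b == v])
               for v, k in Counter(y).items())

def emp_dist(x):
    n, cnt = len(x), Counter(x)
    return [cnt[0] / n, cnt[1] / n]

def divergence(p, q):
    return sum(t * math.log2(t / s) for t, s in zip(p, q) if t > 0)

n = 6
A = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0]])        # A : X^n -> X^{l_A}, l_A = 2
B = np.array([[0, 0, 0, 0, 1, 1]])        # B : X^n -> X^{l_B}, l_B = 1
c = (1, 0)                                # the fixed vector c
mu_X = [0.5, 0.5]                         # input distribution mu_X

def g_AB(m):
    """Encoder: the minimum-divergence x with Ax = c and Bx = m."""
    cands = (x for x in itertools.product((0, 1), repeat=n)
             if tuple(A.dot(x) % 2) == c and tuple(B.dot(x) % 2) == m)
    return min(cands, key=lambda x: divergence(emp_dist(x), mu_X))

def g_A(y):
    """Decoder front end: the minimum-H(x|y) sequence with Ax = c."""
    cands = (x for x in itertools.product((0, 1), repeat=n)
             if tuple(A.dot(x) % 2) == c)
    return min(cands, key=lambda x: emp_cond_entropy(x, y))

m = (1,)
x = g_AB(m)                               # channel input carrying message m
y = x                                     # noiseless channel, as a sanity check
m_hat = tuple(B.dot(g_A(y)) % 2)
assert m_hat == m
```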

The error probability $\mathrm{Error}_{Y|X}(A,B,\mathbf{c})$ is given by

$$\mathrm{Error}_{Y|X}(A,B,\mathbf{c}) \equiv \sum_{\mathbf{m},\mathbf{y}} p_M(\mathbf{m})\,\mu_{Y|X}(\mathbf{y}|\varphi(\mathbf{m}))\,\chi(\varphi^{-1}(\mathbf{y}) \neq \mathbf{m}),$$

where

$$p_M(\mathbf{m}) \equiv \begin{cases} \frac{1}{|\mathrm{Im}\mathcal{B}|}, & \text{if } \mathbf{m} \in \mathrm{Im}\mathcal{B} \\ 0, & \text{if } \mathbf{m} \notin \mathrm{Im}\mathcal{B}. \end{cases}$$

It should be noted that $\mathrm{Im}\mathcal{B}$ represents the set of all messages and $R_B$ represents the encoding rate of the channel code. We have the following theorem.

Theorem 3: Assume that an ensemble $(\mathcal{A}, p_A)$ (resp. $(\mathcal{A}\times\mathcal{B}, p_{AB})$) has an $(\alpha_A,\beta_A)$-hash (resp. $(\alpha_{AB},\beta_{AB})$-hash) property. For fixed rates $R_A, R_B > 0$, a given input distribution $\mu_X$ satisfying

$$H(X) > R_A + R_B, \tag{9}$$

$\delta > 0$, and a sufficiently large $n$, there are functions (matrices) $A \in \mathcal{A}$, $B \in \mathcal{B}$, and a vector $\mathbf{c} \in \mathrm{Im}\mathcal{A}$ such that

$$\mathrm{Error}_{Y|X}(A,B,\mathbf{c}) \le \alpha_{AB} - 1 + \frac{\beta_{AB}+1}{\kappa} + 2\kappa\left[\max\{\alpha_A,1\}\,2^{-n[\inf F_{Y|X}(R_A) - 2\lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A\right] \tag{10}$$

for all $\mu_{Y|X}$ satisfying

$$H(X|Y) < R_A, \tag{11}$$

where

$$F_{Y|X}(R) \equiv \min_{V|U} \left[ D(\nu_{V|U}\|\mu_{Y|X}|\nu_U) + [R - H(U|V)]^+ \right],$$

the infimum is taken over all $\mu_{Y|X}$ satisfying (11), and $\kappa \equiv \{\kappa(n)\}_{n=1}^{\infty}$ is an arbitrary sequence satisfying

$$\lim_{n\to\infty} \kappa(n) = \infty \tag{12}$$
$$\lim_{n\to\infty} \kappa(n)\,\beta_A(n) = 0 \tag{13}$$
$$\lim_{n\to\infty} \frac{\log \kappa(n)}{n} = 0, \tag{14}$$

and $\kappa$ denotes $\kappa(n)$. Since

$$\inf_{\mu_{Y|X} : H(X|Y) < R_A} F_{Y|X}(R_A) > 0,$$

the right hand side of (10) goes to zero as $n \to \infty$ for all $\mu_{Y|X}$ satisfying (11).

Remark 3: It should be noted here that we have

$$I(X;Y) > R_B \tag{15}$$

from (11) and (9). However, (11) and (15) do not imply (9) even when $R_A < H(X)$.

Remark 4: For $\beta_A$ satisfying (H3), there is a $\kappa$ satisfying (12)-(14); it suffices to let

$$\kappa(n) \equiv \begin{cases} n^{\xi}, & \text{if } \beta_A(n) = o(n^{-\xi}) \\ \frac{1}{\sqrt{\beta_A(n)}}, & \text{otherwise} \end{cases} \tag{16}$$

for every $n$.

If $\beta_A(n)$ is not $o(n^{-\xi})$, then there is a constant $\kappa' > 0$ such that $\beta_A(n)\, n^{\xi} > \kappa'$ and

$$\frac{\log\kappa(n)}{n} = \frac{\log\frac{1}{\beta_A(n)}}{2n} \le \frac{\xi\log n - \log\kappa'}{2n}$$

for all sufficiently large $n$. This implies that $\kappa$ satisfies (14). It should be noted that we can let $\xi$ be arbitrarily large in (16) when $\beta_A(n)$ vanishes exponentially fast. This parameter $\xi$ affects the upper bound in (10).

Remark 5: From Lemma 4, we have the fact that the condition (H3) for $\beta_B$ is not necessary for the ensembles $(\mathcal{A}, p_A)$ and $(\mathcal{B}, p_B)$ of linear functions.

VI. PROOF OF THEOREMS

In this section, we prove the theorems.

A. Proof of Theorem 1

Let

$$\mathcal{G}_U \equiv \{\mathbf{x}' : H(\mathbf{x}') \le H(U)\}.$$

If $\mathbf{x} \in T_U$ and $g_A(A\mathbf{x}) \neq \mathbf{x}$, then there is $\mathbf{x}' \in C_A(A\mathbf{x})$ such that $\mathbf{x}' \neq \mathbf{x}$ and

$$H(\mathbf{x}') \le H(\mathbf{x}) = H(U),$$

which implies that

$$[\mathcal{G}_U \setminus \{\mathbf{x}\}] \cap C_A(A\mathbf{x}) \neq \emptyset.$$

Then we have

$$E_A[\mathrm{Error}_X(A)] = E_A\left[\sum_{\mathbf{x}} \mu_X(\mathbf{x})\,\chi(g_A(A\mathbf{x}) \neq \mathbf{x})\right]$$
$$\le \sum_U \sum_{\mathbf{x}\in T_U} \mu_X(\mathbf{x})\, p_A(\{A : [\mathcal{G}_U\setminus\{\mathbf{x}\}] \cap C_A(A\mathbf{x}) \neq \emptyset\})$$
$$\le \sum_U \sum_{\mathbf{x}\in T_U} \mu_X(\mathbf{x}) \min\left\{\frac{|\mathcal{G}_U|\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A,\, 1\right\}$$
$$\le \sum_U \sum_{\mathbf{x}\in T_U} \mu_X(\mathbf{x}) \left[\max\left\{\frac{\alpha_A|\mathcal{X}|^{l_A}}{|\mathrm{Im}\mathcal{A}|},\,1\right\} 2^{-n[[R-H(U)]^+ - \lambda_{\mathcal{X}}]} + \beta_A\right]$$
$$\le \sum_U \max\left\{\frac{\alpha_A|\mathcal{X}|^{l_A}}{|\mathrm{Im}\mathcal{A}|},\,1\right\} 2^{-n[D(\nu_U\|\mu_X) + [R-H(U)]^+ - \lambda_{\mathcal{X}}]} + \beta_A$$
$$\le \max\left\{\frac{\alpha_A|\mathcal{X}|^{l_A}}{|\mathrm{Im}\mathcal{A}|},\,1\right\} 2^{-n[F_X(R) - 2\lambda_{\mathcal{X}}]} + \beta_A,$$

where the second inequality comes from Lemma 2, the third inequality comes from Lemma 8, the fourth inequality comes from Lemmas 6 and 7, and the last inequality comes from the definition of $F_X$ and Lemma 5. Then we have the fact that there is a function (matrix) $A \in \mathcal{A}$ satisfying (5).
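For intuition about the exponent in (5), $F_X(R)$ can be evaluated exactly for a finite $n$ by enumerating all types, since a binary type is determined by a single count. A small sketch (ours; the source distribution and rates are illustrative choices):

```python
import math

def H(p):
    return sum(-t * math.log2(t) for t in p if t > 0)

def D(p, q):
    return sum(t * math.log2(t / s) for t, s in zip(p, q) if t > 0)

def F_X(R, mu, n):
    """min over binary types U = (k/n, 1 - k/n) of D(nu_U || mu) + [R - H(U)]^+."""
    return min(D((k / n, 1 - k / n), mu) + max(R - H((k / n, 1 - k / n)), 0.0)
               for k in range(n + 1))

mu_X = (0.9, 0.1)                  # H(X) ~ 0.469 bit
for R in (0.6, 0.8):               # rates above the source entropy
    print(R, F_X(R, mu_X, n=24))   # a positive exponent for each rate
```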

B. Proof of Theorem 3

Let $UV$ be the joint type of a sequence $(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$, where the marginal type $U$ is defined as

$$U \equiv \arg\min_{U'} D(\nu_{U'}\|\mu_X) \tag{17}$$

and the conditional type given the type $U$ is denoted by $V|U$. Since $R_A + R_B < H(X)$ and $H(U)$ approaches $H(X)$ as $n$ goes to infinity because of the law of large numbers and the continuity of the entropy function, we have

$$H(U) - \lambda_{\mathcal{X}} > R_A + R_B + \frac{\log\kappa}{n}$$

for all sufficiently large $n$. Then we have

$$|T_U| \ge 2^{n[H(U) - \lambda_{\mathcal{X}}]} \ge \kappa\, 2^{n[R_A+R_B]} = \kappa\,|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|$$

for all sufficiently large $n$, where the first inequality comes from Lemma 6. This implies that there is $T \subseteq T_U$ such that

$$\kappa \le \frac{|T|}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|} \le 2\kappa \tag{18}$$

for all sufficiently large $n$. Consider the conditions

$$g_{AB}(\mathbf{c},\mathbf{m}) \in T \tag{UC1}$$
$$g_A(\mathbf{c},\mathbf{y}) = g_{AB}(\mathbf{c},\mathbf{m}). \tag{UC2}$$

Then we have

$$\mathrm{Error}_{Y|X}(A,B,\mathbf{c}) \le p_{MY}(\mathcal{S}_1^c) + p_{MY}(\mathcal{S}_1 \cap \mathcal{S}_2^c), \tag{19}$$

where

$$\mathcal{S}_i \equiv \{(\mathbf{m},\mathbf{y}) : \text{(UC}i\text{) holds}\}.$$

First, we evaluate $E_{ABC}[p_{MY}(\mathcal{S}_1^c)]$. We have

$$E_{ABC}[p_{MY}(\mathcal{S}_1^c)] = p_{ABCM}(\{(A,B,\mathbf{c},\mathbf{m}) : T \cap C_{AB}(\mathbf{c},\mathbf{m}) = \emptyset\})$$
$$\le \alpha_{AB} - 1 + \frac{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|\,[\beta_{AB}+1]}{|T|}$$
$$\le \alpha_{AB} - 1 + \frac{\beta_{AB}+1}{\kappa},$$

where the equality comes from the property of $T$, the first inequality comes from Lemma 3, and the second inequality comes from (18). Next, we evaluate $E_{ABC}[p_{MY}(\mathcal{S}_1 \cap \mathcal{S}_2^c)]$. Let

$$\mathcal{G}(\mathbf{y}) \equiv \{\mathbf{x}' : H(\mathbf{x}'|\mathbf{y}) \le H(U|V)\} \tag{20}$$

and assume that $(\mathbf{x},\mathbf{y}) \in T_{UV}$.

Then we have

$$E_{AC}[\chi(A\mathbf{x} = \mathbf{c})\,\chi(g_A(\mathbf{c},\mathbf{y}) \neq \mathbf{x})]$$
$$= \sum_{A,\mathbf{c}} p_{AC}(A,\mathbf{c})\,\chi(A\mathbf{x} = \mathbf{c})\,\chi\big(\exists\mathbf{x}'\neq\mathbf{x} \text{ s.t. } H(\mathbf{x}'|\mathbf{y}) \le H(\mathbf{x}|\mathbf{y}) \text{ and } A\mathbf{x}' = \mathbf{c}\big)$$
$$= \sum_{A} p_A(A)\, p_C(\{\mathbf{c} : A\mathbf{x} = \mathbf{c}\})\,\chi\big(\exists\mathbf{x}'\neq\mathbf{x} \text{ s.t. } H(\mathbf{x}'|\mathbf{y}) \le H(\mathbf{x}|\mathbf{y}) \text{ and } A\mathbf{x}' = A\mathbf{x}\big)$$
$$= \frac{1}{|\mathrm{Im}\mathcal{A}|}\, p_A\big(\big\{A : \exists\mathbf{x}'\neq\mathbf{x} \text{ s.t. } H(\mathbf{x}'|\mathbf{y}) \le H(U|V) \text{ and } A\mathbf{x}' = A\mathbf{x}\big\}\big)$$
$$\le \frac{1}{|\mathrm{Im}\mathcal{A}|} \min\Bigg\{\sum_{\mathbf{x}'\in\mathcal{G}(\mathbf{y})\setminus\{\mathbf{x}\}} p_A(\{A : A\mathbf{x}' = A\mathbf{x}\}),\; 1\Bigg\}$$
$$\le \frac{1}{|\mathrm{Im}\mathcal{A}|} \min\left\{\frac{2^{n[H(U|V)+\lambda_{\mathcal{X}\mathcal{Y}}]}\,\alpha_A}{|\mathrm{Im}\mathcal{A}|} + \beta_A,\; 1\right\}$$
$$= \frac{1}{|\mathrm{Im}\mathcal{A}|} \min\left\{2^{-n[R_A - H(U|V) - \lambda_{\mathcal{X}\mathcal{Y}}]}\,\alpha_A + \beta_A,\; 1\right\}$$
$$\le \frac{1}{|\mathrm{Im}\mathcal{A}|} \left[\max\{\alpha_A,1\}\, 2^{-n[[R_A - H(U|V)]^+ - \lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A\right], \tag{21}$$

where $[\cdot]^+$ is defined by (2), the third equality comes from Lemma 1, and the second inequality comes from Lemma 8 and (H4) for the ensemble $(\mathcal{A}, p_A)$. Then we have

$$E_{ABC}[p_{MY}(\mathcal{S}_1 \cap \mathcal{S}_2^c)]$$
$$= E_{ABCM}\Bigg[\sum_{\mathbf{x}\in T}\sum_{V'|U}\sum_{\mathbf{y}\in T_{V'|U}(\mathbf{x})} \mu_{Y|X}(\mathbf{y}|\mathbf{x})\,\chi(g_{AB}(\mathbf{c},\mathbf{m}) = \mathbf{x})\,\chi(g_A(\mathbf{c},\mathbf{y}) \neq \mathbf{x})\Bigg]$$
$$\le E_{ABCM}\Bigg[\sum_{\mathbf{x}\in T}\sum_{V'|U}\sum_{\mathbf{y}\in T_{V'|U}(\mathbf{x})} \mu_{Y|X}(\mathbf{y}|\mathbf{x})\,\chi(A\mathbf{x} = \mathbf{c})\,\chi(B\mathbf{x} = \mathbf{m})\,\chi(g_A(\mathbf{c},\mathbf{y}) \neq \mathbf{x})\Bigg]$$
$$= \sum_{\mathbf{x}\in T}\sum_{V'|U}\sum_{\mathbf{y}\in T_{V'|U}(\mathbf{x})} \mu_{Y|X}(\mathbf{y}|\mathbf{x})\, E_{AC}[\chi(A\mathbf{x} = \mathbf{c})\,\chi(g_A(\mathbf{c},\mathbf{y}) \neq \mathbf{x})]\, E_{BM}[\chi(B\mathbf{x} = \mathbf{m})]$$
$$\le \sum_{\mathbf{x}\in T}\sum_{V'|U}\sum_{\mathbf{y}\in T_{V'|U}(\mathbf{x})} \mu_{Y|X}(\mathbf{y}|\mathbf{x})\, \frac{\max\{\alpha_A,1\}\,2^{-n[[R_A-H(U|V')]^+ - \lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}$$
$$\le \sum_{\mathbf{x}\in T} \frac{1}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\Bigg[\sum_{V'|U} \max\{\alpha_A,1\}\,2^{-n[D(\nu_{V'|U}\|\mu_{Y|X}|\nu_U) + [R_A-H(U|V')]^+ - \lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A\Bigg]$$
$$\le \frac{|T|}{|\mathrm{Im}\mathcal{A}||\mathrm{Im}\mathcal{B}|}\left[\max\{\alpha_A,1\}\,2^{-n[F_{Y|X}(R_A) - 2\lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A\right]$$
$$\le 2\kappa\left[\max\{\alpha_A,1\}\,2^{-n[F_{Y|X}(R_A) - 2\lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A\right], \tag{22}$$

where the second inequality comes from Lemma 1 and (21), the third inequality comes from Lemmas 7 and 6, the fourth inequality comes from the definition of $F_{Y|X}$ and Lemma 5, and the last inequality comes from (18).

From (19), (20), and (22), we have

$$E_{ABC}[\mathrm{Error}_{Y|X}(A,B,\mathbf{c})] \le \alpha_{AB} - 1 + \frac{\beta_{AB}+1}{\kappa} + 2\kappa\left[\max\{\alpha_A,1\}\,2^{-n[F_{Y|X}(R_A) - 2\lambda_{\mathcal{X}\mathcal{Y}}]} + \beta_A\right].$$

Applying the above argument to all $\mu_{Y|X}$ satisfying (11) and (9), we have the fact that there are $A \in \mathcal{A}$, $B \in \mathcal{B}$, and $\mathbf{c} \in \mathrm{Im}\mathcal{A}$ that satisfy (10).

VII. CONCLUSION

Fixed-rate universal coding theorems were proved by using the notion of the hash property. We proved theorems of fixed-rate lossless universal source coding and fixed-rate universal channel coding. Since an ensemble of sparse matrices satisfies the hash property requirement, it is proved that we can construct universal codes by using sparse matrices.

APPENDIX

We introduce the following lemmas, which are used in the proofs of the theorems.

Lemma 5 ([4, Lemma 2.2]): The number of different types of sequences in $\mathcal{X}^n$ is fewer than $[n+1]^{|\mathcal{X}|}$. The number of different conditional types of sequences in $\mathcal{X}^n$ given a sequence in $\mathcal{Y}^n$ is fewer than $[n+1]^{|\mathcal{X}||\mathcal{Y}|}$.

Lemma 6 ([4, Lemma 2.3]): For a type $U$ of sequences in $\mathcal{X}^n$,

$$2^{n[H(U)-\lambda_{\mathcal{X}}]} \le |T_U| \le 2^{nH(U)},$$

where $\lambda_{\mathcal{X}}$ is defined in (1).

Lemma 7 ([4, Lemma 2.6]):

$$\frac{1}{n}\log\frac{1}{\mu_X(\mathbf{x})} = H(\nu_{\mathbf{x}}) + D(\nu_{\mathbf{x}}\|\mu_X)$$
$$\frac{1}{n}\log\frac{1}{\mu_{Y|X}(\mathbf{y}|\mathbf{x})} = H(\nu_{\mathbf{y}|\mathbf{x}}|\nu_{\mathbf{x}}) + D(\nu_{\mathbf{y}|\mathbf{x}}\|\mu_{Y|X}|\nu_{\mathbf{x}}).$$

Lemma 8 ([11, Lemma 2]): For $\mathbf{y} \in T_V$,

$$|\{\mathbf{x}' : H(\mathbf{x}') \le H(U)\}| \le 2^{n[H(U)+\lambda_{\mathcal{X}}]}$$
$$|\{\mathbf{x}' : H(\mathbf{x}'|\mathbf{y}) \le H(U|V)\}| \le 2^{n[H(U|V)+\lambda_{\mathcal{X}\mathcal{Y}}]},$$

where $\lambda_{\mathcal{X}}$ and $\lambda_{\mathcal{X}\mathcal{Y}}$ are defined by (1).

Proof: The first inequality of this lemma is shown by the second inequality (let $\mathbf{y}$ be a constant sequence over a one-point alphabet $\mathcal{Y}$). The second inequality is shown by

$$|\{\mathbf{x}' : H(\mathbf{x}'|\mathbf{y}) \le H(U|V)\}| = \sum_{U' : H(U'|V) \le H(U|V)} |T_{U'|V}(\mathbf{y})|$$
$$\le \sum_{U' : H(U'|V) \le H(U|V)} 2^{nH(U'|V)}$$
$$\le \sum_{U' : H(U'|V) \le H(U|V)} 2^{nH(U|V)}$$
$$\le [n+1]^{|\mathcal{X}||\mathcal{Y}|}\, 2^{nH(U|V)}$$
$$= 2^{n[H(U|V)+\lambda_{\mathcal{X}\mathcal{Y}}]},$$

where the first inequality comes from Lemma 6 and the third inequality comes from Lemma 5.

ACKNOWLEDGEMENTS

This paper was written while one of the authors, J. M., was a visiting researcher at ETH, Zürich. He wishes to thank Prof. Maurer for arranging his stay.

REFERENCES

[1] A. Bennatan and D. Burshtein, "On the application of LDPC codes to arbitrary discrete-memoryless channels," IEEE Trans. Inform. Theory, vol. IT-50, no. 3, pp. 417-438, Mar. 2004.
[2] T. P. Coleman, M. Médard, and M. Effros, "Towards practical minimum-entropy universal decoding," in Proc. IEEE Data Compression Conference, Mar. 29-31, 2005, pp. 33-42.
[3] I. Csiszár, "Linear codes for sources and source networks: Error exponents, universal coding," IEEE Trans. Inform. Theory, vol. IT-28, no. 4, pp. 585-592, Jul. 1982.
[4] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, 1981.
[5] J. L. Carter and M. N. Wegman, "Universal classes of hash functions," J. Comput. Syst. Sci., vol. 18, pp. 143-154, 1979.
[6] J. Feldman, M. J. Wainwright, and D. R. Karger, "Using linear programming to decode binary linear codes," IEEE Trans. Inform. Theory, vol. IT-51, no. 3, pp. 954-972, Mar. 2005.
[7] H. Koga, "Source coding using families of universal hash functions," IEEE Trans. Inform. Theory, vol. IT-53, no. 9, pp. 3226-3233, Sept. 2007.
[8] S. Miyake and M. Maruyama, "Construction of universal codes using LDPC matrices and their error exponents," IEICE Trans. Fundamentals, vol. E90-A, no. 9, pp. 1830-1839, Sept. 2007.
[9] S. Miyake and J. Muramatsu, "A construction of channel code, JSCC and universal code for discrete memoryless channels using sparse matrices," to appear in Proc. 2008 IEEE Int. Symp. Inform. Theory, Toronto, Canada, Jul. 6-11, 2008.
[10] J. Muramatsu, T. Uyematsu, and T. Wadayama, "Low density parity check matrices for coding of correlated sources," IEEE Trans. Inform. Theory, vol. IT-51, no. 10, pp. 3645-3653, Oct. 2005.
[11] J. Muramatsu, "Secret key agreement from correlated source outputs using low density parity check matrices," IEICE Trans. Fundamentals, vol. E89-A, no. 7, pp. 2036-2046, Jul. 2006.
[12] J. Muramatsu and S. Miyake, "Hash property and coding theorems for sparse matrices and maximum-likelihood coding," submitted to IEEE Trans. Inform. Theory, available at arXiv:0801.3878 [cs.IT], 2007.