Nonstochastic Information Theory for Feedback Control

Nonstochastic Information Theory for Feedback Control
Girish Nair
Department of Electrical and Electronic Engineering, University of Melbourne, Australia
Dutch Institute of Systems and Control (DISC) Summer School, Zandvoort aan Zee, The Netherlands, June 2015

Outline
1. Basics of Shannon Information Theory
2. Overview of Capacity-Limited State Estimation/Control
3. Motivation for Nonstochastic Control
4. Uncertain Variables, Unrelatedness and Markovness
5. Nonstochastic Information
6. Channels and Coding Theorems
7. State Estimation and Control via Noisy Channels

What is Information?

Wikipedia: "Information is that which informs", i.e. that from which data and knowledge can be derived (as data represents values attributed to parameters, while knowledge is acquired through understanding of real things or abstract concepts, in any particular field of study).

In [Shannon BSTJ 1948], information was defined somewhat more concretely, within a probability space.

Shannon Entropy

Prior uncertainty or entropy of a discrete random variable (rv) $X \sim p_X$:
$$H[X] := \mathbf{E}\left[\log_2 \frac{1}{p_X(X)}\right] = -\sum_x p_X(x)\log_2 p_X(x) \;\ge\; 0.$$
Interpretation: the minimum expected number of yes/no questions sufficient to determine $X$.

Joint (discrete) entropy $H[X,Y]$ is defined by replacing $p_X$ with $p_{X,Y}$.

Conditional entropy of $X$ given $Y$ is the average uncertainty in $X$ given $Y$:
$$H[X|Y] := \mathbf{E}\left[\log_2 \frac{1}{p_{X|Y}(X|Y)}\right] = H[X,Y] - H[Y] \;(\ge 0).$$
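As a quick numerical illustration of these definitions (not part of the original slides), the Python sketch below computes $H[X]$, $H[Y]$, $H[X,Y]$ and $H[X|Y]$ for a small, arbitrarily chosen joint pmf.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as a flat array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf p_{X,Y} on a 2x3 alphabet (rows: x, columns: y).
p_xy = np.array([[0.25, 0.15, 0.10],
                 [0.05, 0.25, 0.20]])

H_XY = entropy(p_xy.ravel())       # joint entropy H[X,Y]
H_X = entropy(p_xy.sum(axis=1))    # marginal entropy H[X]
H_Y = entropy(p_xy.sum(axis=0))    # marginal entropy H[Y]
H_X_given_Y = H_XY - H_Y           # conditional entropy H[X|Y] = H[X,Y] - H[Y]

print(H_X, H_Y, H_XY, H_X_given_Y)
```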

Shannon Information

Information gained about $X$ from $Y$ := reduction in uncertainty:
$$I[X;Y] := H[X] - H[X|Y].$$
Called mutual information since it is symmetric:
$$I[X;Y] = \sum_{x,y} p_{X,Y}(x,y)\log_2 \frac{p_{X,Y}(x,y)}{p_X(x)\,p_Y(y)} = H[X] + H[Y] - H[X,Y].$$
For continuous $X, Y$, replace the pmf's $p$ with the corresponding pdf's $f$.
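Continuing the illustrative pmf from the previous sketch, this snippet checks numerically that the double-sum expression for $I[X;Y]$ agrees with $H[X]+H[Y]-H[X,Y]$ (the example numbers are my own, not from the slides).

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Illustrative joint pmf (same hypothetical example as above).
p_xy = np.array([[0.25, 0.15, 0.10],
                 [0.05, 0.25, 0.20]])
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

mask = p_xy > 0
I_sum = np.sum(p_xy[mask] * np.log2((p_xy / (p_x * p_y))[mask]))   # double-sum formula
I_ent = entropy(p_x.ravel()) + entropy(p_y.ravel()) - entropy(p_xy.ravel())

assert abs(I_sum - I_ent) < 1e-12
print(I_sum)
```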

Codes for Stationary Memoryless Random Channels

[Block diagram: Message $M$ → Coder → $X_k$ → Channel → $Y_k$ → Decoder → $\hat{M}$, where the channel is stationary and memoryless: $p_{Y_{0:n}|X_{0:n}}(y_{0:n}|x_{0:n}) = \prod_{k=0}^{n} q(y_k|x_k)$.]

A block code is defined by
- an error tolerance $\epsilon > 0$, block length $n \in \mathbb{N}$ and message-set cardinality $\mu \ge 1$;
- an encoder mapping $g$ s.t. for any independent, uniformly distributed message $M \in \{m_1,\dots,m_\mu\}$, $X_{0:n} = g(i)$ if $M = m_i$; and
- a decoder $\hat{M} = d(Y_{0:n})$ s.t. $\Pr[\hat{M} \ne M] \le \epsilon$.
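To make the block-code definition concrete, here is a minimal sketch (my own example, not from the slides) of a two-message repetition code over a binary symmetric channel, with the error probability estimated by Monte Carlo; the crossover probability, block length and trial count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.1          # BSC crossover probability (assumed)
n = 20           # block length, i.e. n + 1 channel uses
mu = 2           # two messages, encoded by repetition

def g(i):        # encoder: message index -> codeword X_{0:n}
    return np.full(n + 1, i, dtype=int)

def d(y):        # decoder: majority vote over the received block
    return int(y.sum() > (n + 1) / 2)

trials, errors = 100_000, 0
for _ in range(trials):
    m = rng.integers(mu)                 # uniformly distributed message
    y = g(m) ^ (rng.random(n + 1) < p)   # pass the codeword through the BSC
    errors += (d(y) != m)

print("rate =", np.log2(mu) / (n + 1), " empirical Pr[M_hat != M] =", errors / trials)
```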

Capacity and Information

Define (ordinary) capacity $C$ operationally as the highest block-coding rate that yields vanishing error probability:
$$C := \lim_{\epsilon\to 0}\; \sup_{n,\mu\in\mathbb{N},\,g,\,d} \frac{\log_2 \mu}{n+1} = \lim_{\epsilon\to 0}\,\lim_{n\to\infty}\; \sup_{\mu\in\mathbb{N},\,g,\,d} \frac{\log_2 \mu}{n+1}.$$
Shannon showed that capacity can also be thought of intrinsically, as the maximum information rate across the channel:

Theorem (Shannon BSTJ 1948)
$$C = \sup_{n\ge 0,\; p_{X_{0:n}}} \frac{I[X_{0:n};Y_{0:n}]}{n+1} = \sup_{p_X} I[X;Y].$$
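A hedged sketch of the single-letter expression $\sup_{p_X} I[X;Y]$: for a binary symmetric channel with an assumed crossover probability, a brute-force search over input distributions should recover the known value $1 - H_2(p)$.

```python
import numpy as np

def mutual_info(p_x, Q):
    """I[X;Y] in bits for input pmf p_x and transition matrix Q[x, y] = q(y|x)."""
    p_xy = p_x[:, None] * Q
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2((p_xy / (p_x[:, None] * p_y[None, :]))[mask]))

p = 0.1                                      # BSC crossover probability (assumed)
Q = np.array([[1 - p, p], [p, 1 - p]])

# Brute-force sup over binary input distributions p_X = (a, 1 - a).
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
C = max(mutual_info(np.array([a, 1 - a]), Q) for a in grid)

h2 = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print(C, 1 - h2)                             # should agree: C_BSC = 1 - H_2(p)
```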

Networked State Estimation/Control

Classical assumption: controllers and estimators know plant outputs perfectly.

Since the 1960s this assumption has been challenged by:
- delays, due to latency and intermittent channel access, in control area networks;
- quantisation errors in digital control;
- finite communication capacity per sensor in long-range radar surveillance networks.

Focus here on limited quantiser resolution and capacity.

Estimation/Control over Communication Channels

[Block diagrams: linear plant $X_{k+1} = AX_k + BU_k + V_k$, $Y_k = GX_k + W_k$, with noise $V_k, W_k$. In the estimation configuration, the output $Y_k$ enters a quantiser/coder producing symbols $S_k$, which cross a channel as $Q_k$ to a decoder/estimator yielding $\hat{X}_k$. In the control configuration, the channel output instead feeds a decoder/controller producing the input $U_k$ applied to the plant.]

Additive Noise Model

Early work considered errorless digital channels and static quantisers, with uniform quantiser errors modelled as additive, uncorrelated noise [e.g. Curry 1970] with variance $\propto 2^{-2R}$ ($R$ = bit rate).

This is a good approximation for stable plants and high $R$, and allows linear stochastic estimation/control theory to be applied. However, for unstable plants it leads to conclusions that are wrong, e.g.
- if the plant is noiseless and unstable, then states/estimation errors cannot converge to 0; and
- if the plant is unstable, then mean-square-bounded states/estimation errors can always be achieved.
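The $\propto 2^{-2R}$ scaling comes from the usual $\Delta^2/12$ variance of a uniform quantisation error; the sketch below (an illustration with an assumed signal range of $[-1,1]$) checks it by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=1_000_000)    # signal amplitudes in [-1, 1] (assumed)

for R in (4, 6, 8):
    delta = 2.0 / 2**R                        # step of a uniform quantiser with 2^R levels
    xq = delta * (np.floor(x / delta) + 0.5)  # quantise to cell midpoints
    print(R, np.var(x - xq), delta**2 / 12)   # empirical variance vs Delta^2/12 ~ 2^{-2R}
```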

Errorless Channels - Data Rate Theorem

In fact, coding-based analyses reveal that stable state estimation/control is possible iff
$$R > \sum_{|\lambda_i| \ge 1} \log_2 |\lambda_i|,$$
where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of the plant matrix $A$.

This holds under various assumptions and stability notions:
- random initial state, noiseless plant; mean $r$th-power convergence to 0 [N.-Evans, Auto. 03]
- bounded initial state, noiseless plant; uniform convergence to 0 [Tatikonda-Mitter, TAC 04]
- random plant noise; mean-square boundedness [N.-Evans, SICON 04]
- bounded plant noise; uniform boundedness [Tatikonda-Mitter, TAC 04]

Additive uncorrelated noise models of quantisation fail to capture the existence of such a threshold. Necessity is typically proved using differential entropy power, quantisation theory or volume-partitioning bounds; sufficiency via explicit construction.
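A minimal sketch of the right-hand side of the data rate theorem: given a plant matrix $A$ (the matrix below is made up for illustration), sum $\log_2|\lambda_i|$ over the unstable eigenvalues to get the threshold rate in bits per sample.

```python
import numpy as np

def min_data_rate(A):
    """Sum of log2|lambda_i| over eigenvalues of A with |lambda_i| >= 1 (bits/sample)."""
    eigs = np.linalg.eigvals(np.asarray(A, dtype=float))
    return sum(np.log2(abs(lam)) for lam in eigs if abs(lam) >= 1)

# Illustrative plant with eigenvalues 2 and 0.5: the threshold is log2(2) = 1 bit/sample.
A = np.array([[2.0, 0.0],
              [1.0, 0.5]])
print(min_data_rate(A))
```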

Noisy Channels

Stable states/estimation errors are possible iff a suitable channel figure-of-merit (FoM) satisfies
$$\mathrm{FoM} > \sum_{|\lambda_i| \ge 1} \log_2 |\lambda_i|,$$
where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of the plant matrix $A$.

Unlike the noiseless-channel case, the FoM depends critically on the stability notion and noise model:
- FoM = $C$: states/est. errors → 0 almost surely (a.s.) [Matveev-Savkin SIAM 07], or mean-square-bounded (MSB) states over an AWGN channel [Braslavsky et al. TAC 07]
- FoM = $C_{\mathrm{any}}$: MSB states over random discrete memoryless channels [Sahai-Mitter TIT 06]
- FoM = $C_{0f}$ for control, or $C_0$ for state estimation, with a.s. bounded states/est. errors [Matveev-Savkin IJC 07]

As $C \ge C_{\mathrm{any}} \ge C_{0f} \ge C_0$, these criteria generally do not coincide.

Missing Information

If the goal is MSB or a.s. convergence to 0 of the states/estimation errors, then information theory is crucial for finding lower bounds.

However, when the goal is a.s. bounded states/errors, classical information theory has so far played no role in networked estimation/control. Yet information in some sense must be flowing across the channel, even without a probabilistic model/objective.

Questions

- Is there a meaningful theory of information for nonrandom variables?
- Can we construct an information-theoretic basis for networked estimation/control with nonrandom noise?
- Are there intrinsic, information-theoretic interpretations of $C_0$ and $C_{0f}$?

Why Nonstochastic Anyway?

There is a long tradition in control of treating noise as a nonrandom perturbation with bounded magnitude, energy or power.

Control systems usually have mechanical/chemical components, as well as electrical:
- Dominant disturbances may not be governed by known probability distributions.
- E.g. in mechanical systems, the main disturbance may be vibrations at resonant frequencies determined by machine dimensions and material properties.

In contrast, communication systems are mainly electrical/electromagnetic/optical:
- Dominant disturbances - thermal noise, shot noise, fading etc. - are well modelled by probability distributions derived from statistical/quantum physics.

Why Nonstochastic Anyway? (cont.)

Related to the previous points:

In most digital communication systems, bit periods $T_b$ are of the order of microseconds or shorter, so thermal and shot noise ($\sigma \propto \sqrt{T_b}$) is noticeable compared to detected signal amplitudes ($\propto T_b$).

Control systems typically operate with longer sample or bit periods, $10^{-2}$ or $10^{-3}$ s, so thermal/shot noise is negligible compared to signal amplitudes.

Why Nonstochastic Anyway? (cont.)

For safety- or mission-critical reasons, stability and performance guarantees are often required every time a control system is used, provided disturbances stay within rated bounds - especially if the plant is unstable or marginally stable, or if we wish to interconnect several control systems and still be sure of performance.

In contrast, most consumer-oriented communication requires good performance only on average, or with high probability. Occasional violations of specifications are permitted, and cannot be prevented within a probabilistic framework.

Probability in Practice

"If there's a fifty-fifty chance that something can go wrong, nine out of ten times, it will." (attrib. L. "Yogi" Berra, former US baseball player)

Uncertain Variable Formalism

Define an uncertain variable (uv) $X$ to be a mapping from an underlying sample space $\Omega$ to a space $\mathbb{X}$.

Each $\omega \in \Omega$ may represent a specific combination of noise/input signals into a system, and $X$ may represent a state/output variable. For a given $\omega$, $x = X(\omega)$ is the realisation of $X$.

Unlike probability theory, no $\sigma$-algebra ($\subseteq 2^{\Omega}$) or measure on $\Omega$ is imposed. Assume $\Omega$ is uncountable, to accommodate continuous $X$.

UV Formalism - Ranges and Conditioning

Marginal range: $[\![X]\!] := \{X(\omega) : \omega \in \Omega\} \subseteq \mathbb{X}$.
Joint range: $[\![X,Y]\!] := \{(X(\omega),Y(\omega)) : \omega \in \Omega\} \subseteq \mathbb{X}\times\mathbb{Y}$.
Conditional range: $[\![X|y]\!] := \{X(\omega) : Y(\omega)=y,\ \omega \in \Omega\}$.

In the absence of statistical structure, the joint range fully characterises the relationship between $X$ and $Y$. Note that
$$[\![X,Y]\!] = \bigcup_{y \in [\![Y]\!]} [\![X|y]\!] \times \{y\},$$
i.e. the joint range is given by the conditional and marginal ranges, analogously to probability theory.

Independence Without Probability

Definition: The uv's $X, Y$ are called (mutually) unrelated, denoted $X \perp Y$, if
$$[\![X,Y]\!] = [\![X]\!] \times [\![Y]\!]. \qquad (1)$$
Otherwise they are called related.

Equivalent characterisation:

Proposition: The uv's $X, Y$ are unrelated iff
$$[\![X|y]\!] = [\![X]\!], \quad \forall y \in [\![Y]\!]. \qquad (2)$$

Unrelatedness is equivalent to $X$ and $Y$ inducing qualitatively independent [Rényi 70] partitions of $\Omega$, when $\Omega$ is finite.
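The ranges and the unrelatedness test are easy to compute when $\Omega$ is finite. The sketch below uses a small made-up sample space and uv's; both characterisations (1) and (2) are checked.

```python
from itertools import product

# Hypothetical finite sample space: each omega is a pair of "noise" values,
# and the uv's X, Y are deterministic functions of omega.
Omega = list(product(range(3), range(2)))
X = {w: w[0] % 2 for w in Omega}     # uv X: Omega -> {0, 1}
Y = {w: w[1] for w in Omega}         # uv Y: Omega -> {0, 1}

marginal = lambda U: {U[w] for w in Omega}                  # [[U]]
joint = {(X[w], Y[w]) for w in Omega}                       # [[X,Y]]
conditional = lambda y: {X[w] for w in Omega if Y[w] == y}  # [[X|y]]

# (1): joint range equals the product of marginal ranges.
print(joint == set(product(marginal(X), marginal(Y))))
# (2): [[X|y]] == [[X]] for every y in [[Y]].
print(all(conditional(y) == marginal(X) for y in marginal(Y)))
```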

Examples of Relatedness and Unrelatedness

[Figure: two joint ranges plotted in the $(x,y)$ plane. a) $[\![X,Y]\!]$ is a proper subset of $[\![X]\!]\times[\![Y]\!]$, so $X, Y$ are related; b) $[\![X,Y]\!] = [\![X]\!]\times[\![Y]\!]$ (a full rectangle), so $X, Y$ are unrelated.]

Markovness without Probability

Definition: $X, Y, Z$ are said to form a Markov uncertainty chain $X - Y - Z$ if
$$[\![X|y,z]\!] = [\![X|y]\!], \quad \forall (y,z) \in [\![Y,Z]\!]. \qquad (3)$$

This is equivalent to $[\![X,Z|y]\!] = [\![X|y]\!] \times [\![Z|y]\!]$ for all $y \in [\![Y]\!]$, i.e. $X, Z$ conditionally unrelated given $Y$; in other words, $X \perp Z \mid Y$.

$X, Y, Z$ are said to form a conditional Markov uncertainty chain given $W$ if $X - (Y,W) - Z$. This is also written $X - Y - Z \mid W$ or $X \perp Z \mid (Y,W)$.
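Condition (3) can again be checked directly on a finite sample space. In the made-up example below, $Z$ depends on $\omega$ only through $Y$, so the chain $X - Y - Z$ should hold.

```python
from itertools import product

Omega = list(product(range(2), range(3)))
X = {w: w[0] for w in Omega}
Y = {w: (w[0] + w[1]) % 3 for w in Omega}
Z = {w: Y[w] % 2 for w in Omega}       # Z is a function of Y

def crange(U, cond):
    """Conditional range of uv U given every (uv, value) pair in cond."""
    return {U[w] for w in Omega if all(V[w] == v for V, v in cond)}

YZ_range = {(Y[w], Z[w]) for w in Omega}
is_markov = all(crange(X, [(Y, y), (Z, z)]) == crange(X, [(Y, y)])
                for (y, z) in YZ_range)
print(is_markov)                       # True: condition (3) holds
```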

Information without Probability

Definition: Two points $(x,y), (x',y') \in [\![X,Y]\!]$ are called taxicab connected, written $(x,y) \leftrightarrow (x',y')$, if there exists a sequence
$$(x,y) = (x_1,y_1), (x_2,y_2), \dots, (x_{n-1},y_{n-1}), (x_n,y_n) = (x',y')$$
of points in $[\![X,Y]\!]$ such that each point differs in only one coordinate from its predecessor.

It is not hard to see that $\leftrightarrow$ is an equivalence relation on $[\![X,Y]\!]$. Call the set of its equivalence classes the taxicab partition $\mathcal{T}[X;Y]$ of $[\![X,Y]\!]$.

Define the nonstochastic information index
$$I_*[X;Y] := \log_2 |\mathcal{T}[X;Y]| \in [0,\infty]. \qquad (4)$$
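For finite ranges the taxicab partition can be computed by counting connected components of a graph in which two points of the joint range are adjacent when they share a coordinate. A sketch using a made-up joint range (union-find is just one convenient way to do this):

```python
from itertools import product
from math import log2

def taxicab_partition(joint_range):
    """Partition a finite joint range [[X,Y]] into taxicab-connected components."""
    pts = list(joint_range)
    parent = {p: p for p in pts}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in product(pts, pts):
        if p[0] == q[0] or p[1] == q[1]:   # differ in at most one coordinate
            parent[find(p)] = find(q)      # merge their components

    classes = {}
    for p in pts:
        classes.setdefault(find(p), set()).add(p)
    return list(classes.values())

# Illustrative joint range with two taxicab-connected components.
JXY = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 2), (3, 3)}
T = taxicab_partition(JXY)
print(len(T), log2(len(T)))                # |T[X;Y]| = 2, so I_*[X;Y] = 1 bit
```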

Connection to Common Random Variables

$\mathcal{T}[X;Y]$ is also called the ergodic decomposition [Gács-Körner PCIT 72]. For discrete $X, Y$, the elements of $\mathcal{T}[X;Y]$ are the connected components of [Wolf-Wullschleger ITW 04], which were shown there to determine the maximal common rv $Z_*$, i.e.
- $Z_* = f(X) = g(Y)$ under suitable mappings $f, g$ (since points in distinct sets of $\mathcal{T}[X;Y]$ are not taxicab-connected);
- if another rv $Z \equiv f(X) \equiv g(Y)$, then $Z \equiv k(Z_*)$ (since all points in the same set of $\mathcal{T}[X;Y]$ are taxicab-connected).

It is not hard to see that $Z_*$ also has the largest number of distinct values of any common rv $Z \equiv f(X) \equiv g(Y)$, and that $I_*[X;Y]$ = the Hartley entropy of $Z_*$.

Maximal common rv's were first described in the brief paper "The lattice theory of information" [Shannon TIT 53].

Examples

[Figure: two joint ranges with points labelled by a common variable $z$. In the first, the range splits into two taxicab-connected components (labelled $z = 0$ and $z = 1$), so $|\mathcal{T}| = 2$ = the maximum number of distinct values that can always be agreed on from separate observations of $X$ and $Y$. In the second, the range forms a single component, so $|\mathcal{T}| = 1$ = the maximum number of distinct values that can always be agreed on from separate observations of $X$ and $Y$.]

Equivalent View via Overlap Partitions

As in probability, it is often easier to work with conditional rather than joint ranges. Let
$$[\![X|Y]\!] := \{[\![X|y]\!] : y \in [\![Y]\!]\}$$
be the conditional range family.

Definition: Two points $x, x'$ are called $[\![X|Y]\!]$-overlap-connected if there exists a sequence of sets $B_1,\dots,B_n \in [\![X|Y]\!]$ s.t. $x \in B_1$, $x' \in B_n$ and $B_i \cap B_{i+1} \ne \emptyset$ for all $i \in [1:n-1]$.

Overlap connectedness is an equivalence relation on $[\![X]\!]$, induced by $[\![X|Y]\!]$. Let the overlap partition $[\![X|Y]\!]_*$ of $[\![X]\!]$ denote the set of its equivalence classes.

Equivalent View via Overlap Partitions (cont.)

Proposition: For any uv's $X, Y$,
$$I_*[X;Y] = \log_2 |[\![X|Y]\!]_*|. \qquad (5)$$

Proof sketch: For any two points $(x,y), (x',y') \in [\![X,Y]\!]$, $(x,y) \leftrightarrow (x',y')$ iff $x$ and $x'$ are $[\![X|Y]\!]$-overlap-connected. This allows us to set up a bijection between the partitions $\mathcal{T}[X;Y]$ and $[\![X|Y]\!]_*$, so they must have the same cardinality.
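A sketch of the overlap-partition view for finite uv's (again with a made-up sample space): points of $[\![X]\!]$ lying in a common conditional range are merged, and the number of resulting cells gives $I_*[X;Y]$ via (5). For this example the overlap partition has two cells, matching the taxicab count one would get from the joint range.

```python
from itertools import product
from math import log2

# Hypothetical finite uv's on a common sample space.
Omega = list(product(range(4), range(2)))
X = {w: w[0] for w in Omega}
Y = {w: (0 if w[0] < 2 else 1, w[1]) for w in Omega}

def overlap_partition(X, Y):
    """Overlap partition of [[X]] induced by the conditional range family [[X|Y]]."""
    Xrange = {X[w] for w in Omega}
    family = [frozenset(X[w] for w in Omega if Y[w] == y)
              for y in {Y[w] for w in Omega}]          # [[X|Y]]
    parent = {x: x for x in Xrange}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for B in family:                    # all points within one conditional range
        B = list(B)                     # are overlap-connected with each other
        for x in B[1:]:
            parent[find(x)] = find(B[0])

    classes = {}
    for x in Xrange:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

P = overlap_partition(X, Y)
print(len(P), log2(len(P)))             # |[[X|Y]]_*| = 2, so I_*[X;Y] = 1 bit
```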

Properties of $I_*$

(Nonnegativity) $I_*[X;Y] \ge 0$ (obvious).

(Symmetry) $I_*[X;Y] = I_*[Y;X]$. This follows from the fact that
$$(x,y) \leftrightarrow (x',y') \text{ in } [\![X,Y]\!] \iff (y,x) \leftrightarrow (y',x') \text{ in } [\![Y,X]\!]. \qquad (6)$$

From this property and (5), knowing just one of the conditional range families $[\![X|Y]\!]$ or $[\![Y|X]\!]$ is enough to determine $I_*[X;Y]$ - unlike ordinary mutual information.

Properties of $I_*$ (cont.)

Proposition (Monotonicity): For any uv's $X, Y, Z$,
$$I_*[X;Y,Z] \ge I_*[X;Y]. \qquad (7)$$

Proof: The idea is to find a surjection from $[\![X|Y,Z]\!]_*$ onto $[\![X|Y]\!]_*$; this automatically implies that the latter cannot have greater cardinality.

Pick any set $B \in [\![X|Y,Z]\!]_*$ and choose a $B' \in [\![X|Y]\!]_*$ s.t. $B \cap B' \ne \emptyset$. At least one such $B'$ exists, since $[\![X|Y]\!]_*$ covers $[\![X]\!] \supseteq B$.

Proof of Monotonic Property (cont.)

Furthermore, exactly one such intersecting $B' \in [\![X|Y]\!]_*$ exists for each $B \in [\![X|Y,Z]\!]_*$, since $B \subseteq B'$:
- By definition, any $x \in B$ and $x' \in B \cap B'$ are connected by a sequence of successively overlapping sets in $[\![X|Y,Z]\!]$.
- As $[\![X|y,z]\!] \subseteq [\![X|y]\!]$, $x, x'$ are also connected by a sequence of successively overlapping sets in $[\![X|Y]\!]$.
- But $B'$ consists of all points that are $[\![X|Y]\!]$-overlap-connected with the representative point $x' \in B'$, so $x \in B'$.
- As $x$ was arbitrary, $B \subseteq B'$.

Thus $B \mapsto B'$ is a well-defined map from $[\![X|Y,Z]\!]_*$ to $[\![X|Y]\!]_*$. It is also onto since, as noted before, every set $B' \in [\![X|Y]\!]_*$ intersects some $B \in [\![X|Y,Z]\!]_*$, which covers $[\![X]\!]$. So $B \mapsto B'$ is the required surjection from $[\![X|Y,Z]\!]_*$ onto $[\![X|Y]\!]_*$.

Properties of $I_*$ (cont.)

Proposition (Data Processing): For Markov uncertainty chains $X - Y - Z$ (3),
$$I_*[X;Z] \le I_*[X;Y].$$

Proof: By monotonicity and the overlap-partition characterisation of $I_*$,
$$I_*[X;Z] \overset{(7)}{\le} I_*[X;Y,Z] \overset{(5)}{=} \log_2 |[\![X|Y,Z]\!]_*|. \qquad (8)$$
By Markovness (3), $[\![X|y,z]\!] = [\![X|y]\!]$ for all $y \in [\![Y]\!]$ and $z \in [\![Z|y]\!]$.
$\Rightarrow [\![X|Y,Z]\!] = [\![X|Y]\!]$.
$\Rightarrow [\![X|Y,Z]\!]_* = [\![X|Y]\!]_*$.
Substituting into the right-hand side of (8) completes the proof.
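As a numerical sanity check of the data processing inequality (my own example, reusing the component-counting idea from the earlier sketch), the uv's below form a Markov uncertainty chain $X - Y - Z$ with $Z$ a function of $Y$, and the computed indices satisfy $I_*[X;Z] \le I_*[X;Y]$.

```python
from itertools import product
from math import log2

# Hypothetical Markov uncertainty chain X - Y - Z (Z is a function of Y).
Omega = list(product(range(4), range(2)))
X = {w: w[0] for w in Omega}
Y = {w: (w[0] // 2, w[1]) for w in Omega}
Z = {w: Y[w][1] for w in Omega}

def I_star(U, V):
    """log2 of the number of taxicab-connected components of the joint range [[U,V]]."""
    pts = list({(U[w], V[w]) for w in Omega})
    parent = {p: p for p in pts}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in product(pts, pts):
        if p[0] == q[0] or p[1] == q[1]:
            parent[find(p)] = find(q)
    return log2(len({find(p) for p in pts}))

print(I_star(X, Z), "<=", I_star(X, Y))   # 0.0 <= 1.0 for this example
```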


More information

Communication Theory II

Communication Theory II Communication Theory II Lecture 15: Information Theory (cont d) Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt March 29 th, 2015 1 Example: Channel Capacity of BSC o Let then: o For

More information

Information measures in simple coding problems

Information measures in simple coding problems Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random

More information

LECTURE 3. Last time:

LECTURE 3. Last time: LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate

More information

Control Over Noisy Channels

Control Over Noisy Channels IEEE RANSACIONS ON AUOMAIC CONROL, VOL??, NO??, MONH?? 004 Control Over Noisy Channels Sekhar atikonda, Member, IEEE, and Sanjoy Mitter, Fellow, IEEE, Abstract Communication is an important component of

More information

Estimating a linear process using phone calls

Estimating a linear process using phone calls Estimating a linear process using phone calls Mohammad Javad Khojasteh, Massimo Franceschetti, Gireeja Ranade Abstract We consider the problem of estimating an undisturbed, scalar, linear process over

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin Shanghai Jiao Tong University Chapter 10: Information Theory Textbook: Chapter 12 Communication Systems Engineering: Ch 6.1, Ch 9.1~ 9. 92 2009/2010 Meixia Tao @

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory

Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory Vincent Y. F. Tan (Joint work with Silas L. Fong) National University of Singapore (NUS) 2016 International Zurich Seminar on

More information

Coding into a source: an inverse rate-distortion theorem

Coding into a source: an inverse rate-distortion theorem Coding into a source: an inverse rate-distortion theorem Anant Sahai joint work with: Mukul Agarwal Sanjoy K. Mitter Wireless Foundations Department of Electrical Engineering and Computer Sciences University

More information

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5.

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Channel capacity Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Exercices Exercise session 11 : Channel capacity 1 1. Source entropy Given X a memoryless

More information

Noisy-Channel Coding

Noisy-Channel Coding Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/05264298 Part II Noisy-Channel Coding Copyright Cambridge University Press 2003.

More information

Lecture 11: Continuous-valued signals and differential entropy

Lecture 11: Continuous-valued signals and differential entropy Lecture 11: Continuous-valued signals and differential entropy Biology 429 Carl Bergstrom September 20, 2008 Sources: Parts of today s lecture follow Chapter 8 from Cover and Thomas (2007). Some components

More information

Shannon s A Mathematical Theory of Communication

Shannon s A Mathematical Theory of Communication Shannon s A Mathematical Theory of Communication Emre Telatar EPFL Kanpur October 19, 2016 First published in two parts in the July and October 1948 issues of BSTJ. First published in two parts in the

More information

RECENT advances in technology have led to increased activity

RECENT advances in technology have led to increased activity IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 49, NO 9, SEPTEMBER 2004 1549 Stochastic Linear Control Over a Communication Channel Sekhar Tatikonda, Member, IEEE, Anant Sahai, Member, IEEE, and Sanjoy Mitter,

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

Chapter 4: Continuous channel and its capacity

Chapter 4: Continuous channel and its capacity meghdadi@ensil.unilim.fr Reference : Elements of Information Theory by Cover and Thomas Continuous random variable Gaussian multivariate random variable AWGN Band limited channel Parallel channels Flat

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Lecture 3: Channel Capacity

Lecture 3: Channel Capacity Lecture 3: Channel Capacity 1 Definitions Channel capacity is a measure of maximum information per channel usage one can get through a channel. This one of the fundamental concepts in information theory.

More information

MODULATION AND CODING FOR QUANTIZED CHANNELS. Xiaoying Shao and Harm S. Cronie

MODULATION AND CODING FOR QUANTIZED CHANNELS. Xiaoying Shao and Harm S. Cronie MODULATION AND CODING FOR QUANTIZED CHANNELS Xiaoying Shao and Harm S. Cronie x.shao@ewi.utwente.nl, h.s.cronie@ewi.utwente.nl University of Twente, Faculty of EEMCS, Signals and Systems Group P.O. box

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 13, 2015 Lucas Slot, Sebastian Zur Shannon s Noisy-Channel Coding Theorem February 13, 2015 1 / 29 Outline 1 Definitions and Terminology

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels IEEE TRANSACTIONS ON AUTOMATIC CONTROL 1 Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels Lei Bao, Member, IEEE, Mikael Skoglund, Senior Member, IEEE, and Karl Henrik Johansson,

More information

Introduction to Information Theory. B. Škorić, Physical Aspects of Digital Security, Chapter 2

Introduction to Information Theory. B. Škorić, Physical Aspects of Digital Security, Chapter 2 Introduction to Information Theory B. Škorić, Physical Aspects of Digital Security, Chapter 2 1 Information theory What is it? - formal way of counting information bits Why do we need it? - often used

More information

Information Theory - Entropy. Figure 3

Information Theory - Entropy. Figure 3 Concept of Information Information Theory - Entropy Figure 3 A typical binary coded digital communication system is shown in Figure 3. What is involved in the transmission of information? - The system

More information

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006) MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)

More information

Cut-Set Bound and Dependence Balance Bound

Cut-Set Bound and Dependence Balance Bound Cut-Set Bound and Dependence Balance Bound Lei Xiao lxiao@nd.edu 1 Date: 4 October, 2006 Reading: Elements of information theory by Cover and Thomas [1, Section 14.10], and the paper by Hekstra and Willems

More information

3F1 Information Theory, Lecture 1

3F1 Information Theory, Lecture 1 3F1 Information Theory, Lecture 1 Jossy Sayir Department of Engineering Michaelmas 2013, 22 November 2013 Organisation History Entropy Mutual Information 2 / 18 Course Organisation 4 lectures Course material:

More information

Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel

Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel Lei Bao, Mikael Skoglund and Karl Henrik Johansson School of Electrical Engineering, Royal Institute of Technology, Stockholm,

More information

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14 Math 325 Intro. Probability & Statistics Summer Homework 5: Due 7/3/. Let X and Y be continuous random variables with joint/marginal p.d.f. s f(x, y) 2, x y, f (x) 2( x), x, f 2 (y) 2y, y. Find the conditional

More information

Also, in recent years, Tsallis proposed another entropy measure which in the case of a discrete random variable is given by

Also, in recent years, Tsallis proposed another entropy measure which in the case of a discrete random variable is given by Gibbs-Shannon Entropy and Related Measures: Tsallis Entropy Garimella Rama Murthy, Associate Professor, IIIT---Hyderabad, Gachibowli, HYDERABAD-32, AP, INDIA ABSTRACT In this research paper, it is proved

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada

More information

Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime

Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University st669@drexel.edu Advisor:

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy

More information

Submodularity in Machine Learning

Submodularity in Machine Learning Saifuddin Syed MLRG Summer 2016 1 / 39 What are submodular functions Outline 1 What are submodular functions Motivation Submodularity and Concavity Examples 2 Properties of submodular functions Submodularity

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Stochastic Stability and Ergodicity of Linear Gaussian Systems Controlled over Discrete Channels

Stochastic Stability and Ergodicity of Linear Gaussian Systems Controlled over Discrete Channels Stochastic Stability and Ergodicity of Linear Gaussian Systems Controlled over Discrete Channels Serdar Yüksel Abstract We present necessary and sufficient conditions for stochastic stabilizability of

More information

A Mathematical Framework for Robust Control Over Uncertain Communication Channels

A Mathematical Framework for Robust Control Over Uncertain Communication Channels A Mathematical Framework for Robust Control Over Uncertain Communication Channels Charalambos D. Charalambous and Alireza Farhadi Abstract In this paper, a mathematical framework for studying robust control

More information

ECE Information theory Final

ECE Information theory Final ECE 776 - Information theory Final Q1 (1 point) We would like to compress a Gaussian source with zero mean and variance 1 We consider two strategies In the first, we quantize with a step size so that the

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance

More information

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 8: Differential entropy Chapter 8 outline Motivation Definitions Relation to discrete entropy Joint and conditional differential entropy Relative entropy and mutual information Properties AEP for

More information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture

More information

Variable Length Codes for Degraded Broadcast Channels

Variable Length Codes for Degraded Broadcast Channels Variable Length Codes for Degraded Broadcast Channels Stéphane Musy School of Computer and Communication Sciences, EPFL CH-1015 Lausanne, Switzerland Email: stephane.musy@ep.ch Abstract This paper investigates

More information

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Fabio Grazioso... July 3, 2006 1 2 Contents 1 Lecture 1, Entropy 4 1.1 Random variable...............................

More information

Universal Anytime Codes: An approach to uncertain channels in control

Universal Anytime Codes: An approach to uncertain channels in control Universal Anytime Codes: An approach to uncertain channels in control paper by Stark Draper and Anant Sahai presented by Sekhar Tatikonda Wireless Foundations Department of Electrical Engineering and Computer

More information

Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information

Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information Jialing Liu liujl@iastate.edu Sekhar Tatikonda sekhar.tatikonda@yale.edu Nicola Elia nelia@iastate.edu Dept. of

More information

A General Formula for Compound Channel Capacity

A General Formula for Compound Channel Capacity A General Formula for Compound Channel Capacity Sergey Loyka, Charalambos D. Charalambous University of Ottawa, University of Cyprus ETH Zurich (May 2015), ISIT-15 1/32 Outline 1 Introduction 2 Channel

More information

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

Variable selection and feature construction using methods related to information theory

Variable selection and feature construction using methods related to information theory Outline Variable selection and feature construction using methods related to information theory Kari 1 1 Intelligent Systems Lab, Motorola, Tempe, AZ IJCNN 2007 Outline Outline 1 Information Theory and

More information

CS6304 / Analog and Digital Communication UNIT IV - SOURCE AND ERROR CONTROL CODING PART A 1. What is the use of error control coding? The main use of error control coding is to reduce the overall probability

More information

Feedback Stabilization over Signal-to-Noise Ratio Constrained Channels

Feedback Stabilization over Signal-to-Noise Ratio Constrained Channels Feedback Stabilization over Signal-to-Noise Ratio Constrained Channels Julio H. Braslavsky, Rick H. Middleton and Jim S. Freudenberg Abstract There has recently been significant interest in feedback stabilization

More information

Arimoto Channel Coding Converse and Rényi Divergence

Arimoto Channel Coding Converse and Rényi Divergence Arimoto Channel Coding Converse and Rényi Divergence Yury Polyanskiy and Sergio Verdú Abstract Arimoto proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear Systems

Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear Systems 6332 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 58, NO 10, OCTOBER 2012 Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear

More information

Second-Order Asymptotics in Information Theory

Second-Order Asymptotics in Information Theory Second-Order Asymptotics in Information Theory Vincent Y. F. Tan (vtan@nus.edu.sg) Dept. of ECE and Dept. of Mathematics National University of Singapore (NUS) National Taiwan University November 2015

More information

ECE 650 Lecture 4. Intro to Estimation Theory Random Vectors. ECE 650 D. Van Alphen 1

ECE 650 Lecture 4. Intro to Estimation Theory Random Vectors. ECE 650 D. Van Alphen 1 EE 650 Lecture 4 Intro to Estimation Theory Random Vectors EE 650 D. Van Alphen 1 Lecture Overview: Random Variables & Estimation Theory Functions of RV s (5.9) Introduction to Estimation Theory MMSE Estimation

More information

Communication constraints and latency in Networked Control Systems

Communication constraints and latency in Networked Control Systems Communication constraints and latency in Networked Control Systems João P. Hespanha Center for Control Engineering and Computation University of California Santa Barbara In collaboration with Antonio Ortega

More information

Optimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood

Optimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood Optimality of Walrand-Varaiya Type Policies and Approximation Results for Zero-Delay Coding of Markov Sources by Richard G. Wood A thesis submitted to the Department of Mathematics & Statistics in conformity

More information

PROOF OF ZADOR-GERSHO THEOREM

PROOF OF ZADOR-GERSHO THEOREM ZADOR-GERSHO THEOREM FOR VARIABLE-RATE VQ For a stationary source and large R, the least distortion of k-dim'l VQ with nth-order entropy coding and rate R or less is δ(k,n,r) m k * σ 2 η kn 2-2R = Z(k,n,R)

More information

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels Parastoo Sadeghi National ICT Australia (NICTA) Sydney NSW 252 Australia Email: parastoo@student.unsw.edu.au Predrag

More information

Feedback Stabilization over a First Order Moving Average Gaussian Noise Channel

Feedback Stabilization over a First Order Moving Average Gaussian Noise Channel Feedback Stabiliation over a First Order Moving Average Gaussian Noise Richard H. Middleton Alejandro J. Rojas James S. Freudenberg Julio H. Braslavsky Abstract Recent developments in information theory

More information

Limits on classical communication from quantum entropy power inequalities

Limits on classical communication from quantum entropy power inequalities Limits on classical communication from quantum entropy power inequalities Graeme Smith, IBM Research (joint work with Robert Koenig) QIP 2013 Beijing Channel Capacity X N Y p(y x) Capacity: bits per channel

More information