Nonstochastic Information Theory for Feedback Control

Nonstochastic Information Theory for Feedback Control
Girish Nair
Department of Electrical and Electronic Engineering, University of Melbourne, Australia
Dutch Institute of Systems and Control (DISC) Summer School, Zandvoort aan Zee, The Netherlands, June 2015

Outline
1. Basics of Shannon Information Theory
2. Overview of Capacity-Limited State Estimation/Control
3. Motivation for Nonstochastic Control
4. Uncertain Variables, Unrelatedness and Markovness
5. Nonstochastic Information
6. Channels and Coding Theorems
7. State Estimation and Control via Noisy Channels

What is Information?

Wikipedia: "Information is that which informs", i.e. that from which data and knowledge can be derived (as data represents values attributed to parameters, while knowledge is acquired through understanding of real things or abstract concepts, in any particular field of study).

In [Shannon BSTJ 1948], information was defined somewhat more concretely, within a probability space.

Shannon Entropy

Prior uncertainty or entropy of a discrete random variable (rv) $X \sim p_X$:
$$H[X] := \mathbf{E}\left[\log_2 \frac{1}{p_X(X)}\right] = -\sum_x p_X(x)\log_2 p_X(x) \;\ge\; 0.$$
Interpretation: the minimum expected number of yes/no questions sufficient to determine $X$.

Joint (discrete) entropy $H[X,Y]$ is defined by replacing $p_X$ with $p_{X,Y}$.

Conditional entropy of $X$ given $Y$ is the average uncertainty in $X$ given $Y$:
$$H[X|Y] := \mathbf{E}\left[\log_2 \frac{1}{p_{X|Y}(X|Y)}\right] = H[X,Y] - H[Y] \;(\ge 0).$$
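As a quick numerical illustration of these definitions (not part of the original slides), the Python sketch below computes $H[X]$, $H[Y]$, $H[X,Y]$ and $H[X|Y]$ for a small, arbitrarily chosen joint pmf.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as a flat array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf p_{X,Y} on a 2x3 alphabet (rows: x, columns: y).
p_xy = np.array([[0.25, 0.15, 0.10],
                 [0.05, 0.25, 0.20]])

H_XY = entropy(p_xy.ravel())       # joint entropy H[X,Y]
H_X = entropy(p_xy.sum(axis=1))    # marginal entropy H[X]
H_Y = entropy(p_xy.sum(axis=0))    # marginal entropy H[Y]
H_X_given_Y = H_XY - H_Y           # conditional entropy H[X|Y] = H[X,Y] - H[Y]

print(H_X, H_Y, H_XY, H_X_given_Y)
```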

Shannon Information

Information gained about $X$ from $Y$ := reduction in uncertainty:
$$I[X;Y] := H[X] - H[X|Y].$$
Called mutual information since it is symmetric:
$$I[X;Y] = \sum_{x,y} p_{X,Y}(x,y)\log_2 \frac{p_{X,Y}(x,y)}{p_X(x)\,p_Y(y)} = H[X] + H[Y] - H[X,Y].$$
For continuous $X, Y$, replace the pmf's $p$ with the corresponding pdf's $f$.
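Continuing the illustrative pmf from the previous sketch, this snippet checks numerically that the double-sum expression for $I[X;Y]$ agrees with $H[X]+H[Y]-H[X,Y]$ (the example numbers are my own, not from the slides).

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Illustrative joint pmf (same hypothetical example as above).
p_xy = np.array([[0.25, 0.15, 0.10],
                 [0.05, 0.25, 0.20]])
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

mask = p_xy > 0
I_sum = np.sum(p_xy[mask] * np.log2((p_xy / (p_x * p_y))[mask]))   # double-sum formula
I_ent = entropy(p_x.ravel()) + entropy(p_y.ravel()) - entropy(p_xy.ravel())

assert abs(I_sum - I_ent) < 1e-12
print(I_sum)
```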

Codes for Stationary Memoryless Random Channels

[Block diagram: Message $M$ → Coder → $X_k$ → Channel → $Y_k$ → Decoder → $\hat{M}$, where the channel is stationary and memoryless: $p_{Y_{0:n}|X_{0:n}}(y_{0:n}|x_{0:n}) = \prod_{k=0}^{n} q(y_k|x_k)$.]

A block code is defined by
- an error tolerance $\epsilon > 0$, block length $n \in \mathbb{N}$ and message-set cardinality $\mu \ge 1$;
- an encoder mapping $g$ s.t. for any independent, uniformly distributed message $M \in \{m_1,\dots,m_\mu\}$, $X_{0:n} = g(i)$ if $M = m_i$; and
- a decoder $\hat{M} = d(Y_{0:n})$ s.t. $\Pr[\hat{M} \ne M] \le \epsilon$.
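To make the block-code definition concrete, here is a minimal sketch (my own example, not from the slides) of a two-message repetition code over a binary symmetric channel, with the error probability estimated by Monte Carlo; the crossover probability, block length and trial count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.1          # BSC crossover probability (assumed)
n = 20           # block length, i.e. n + 1 channel uses
mu = 2           # two messages, encoded by repetition

def g(i):        # encoder: message index -> codeword X_{0:n}
    return np.full(n + 1, i, dtype=int)

def d(y):        # decoder: majority vote over the received block
    return int(y.sum() > (n + 1) / 2)

trials, errors = 100_000, 0
for _ in range(trials):
    m = rng.integers(mu)                 # uniformly distributed message
    y = g(m) ^ (rng.random(n + 1) < p)   # pass the codeword through the BSC
    errors += (d(y) != m)

print("rate =", np.log2(mu) / (n + 1), " empirical Pr[M_hat != M] =", errors / trials)
```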

Capacity and Information

Define (ordinary) capacity $C$ operationally as the highest block-coding rate that yields vanishing error probability:
$$C := \lim_{\epsilon\to 0}\; \sup_{n,\mu\in\mathbb{N},\,g,\,d} \frac{\log_2 \mu}{n+1} = \lim_{\epsilon\to 0}\,\lim_{n\to\infty}\; \sup_{\mu\in\mathbb{N},\,g,\,d} \frac{\log_2 \mu}{n+1}.$$
Shannon showed that capacity can also be thought of intrinsically, as the maximum information rate across the channel:

Theorem (Shannon BSTJ 1948)
$$C = \sup_{n\ge 0,\; p_{X_{0:n}}} \frac{I[X_{0:n};Y_{0:n}]}{n+1} = \sup_{p_X} I[X;Y].$$
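A hedged sketch of the single-letter expression $\sup_{p_X} I[X;Y]$: for a binary symmetric channel with an assumed crossover probability, a brute-force search over input distributions should recover the known value $1 - H_2(p)$.

```python
import numpy as np

def mutual_info(p_x, Q):
    """I[X;Y] in bits for input pmf p_x and transition matrix Q[x, y] = q(y|x)."""
    p_xy = p_x[:, None] * Q
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2((p_xy / (p_x[:, None] * p_y[None, :]))[mask]))

p = 0.1                                      # BSC crossover probability (assumed)
Q = np.array([[1 - p, p], [p, 1 - p]])

# Brute-force sup over binary input distributions p_X = (a, 1 - a).
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
C = max(mutual_info(np.array([a, 1 - a]), Q) for a in grid)

h2 = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print(C, 1 - h2)                             # should agree: C_BSC = 1 - H_2(p)
```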

Networked State Estimation/Control

Classical assumption: controllers and estimators know plant outputs perfectly.

Since the 1960s this assumption has been challenged by:
- delays, due to latency and intermittent channel access, in control area networks;
- quantisation errors in digital control;
- finite communication capacity per sensor in long-range radar surveillance networks.

Focus here on limited quantiser resolution and capacity.

Estimation/Control over Communication Channels

[Block diagrams: linear plant $X_{k+1} = AX_k + BU_k + V_k$, $Y_k = GX_k + W_k$, with noise $V_k, W_k$. In the estimation configuration, the output $Y_k$ enters a quantiser/coder producing symbols $S_k$, which cross a channel as $Q_k$ to a decoder/estimator yielding $\hat{X}_k$. In the control configuration, the channel output instead feeds a decoder/controller producing the input $U_k$ applied to the plant.]

Additive Noise Model

Early work considered errorless digital channels and static quantisers, with uniform quantiser errors modelled as additive, uncorrelated noise [e.g. Curry 1970] with variance $\propto 2^{-2R}$ ($R$ = bit rate).

This is a good approximation for stable plants and high $R$, and allows linear stochastic estimation/control theory to be applied. However, for unstable plants it leads to conclusions that are wrong, e.g.
- if the plant is noiseless and unstable, then states/estimation errors cannot converge to 0; and
- if the plant is unstable, then mean-square-bounded states/estimation errors can always be achieved.
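The $\propto 2^{-2R}$ scaling comes from the usual $\Delta^2/12$ variance of a uniform quantisation error; the sketch below (an illustration with an assumed signal range of $[-1,1]$) checks it by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=1_000_000)    # signal amplitudes in [-1, 1] (assumed)

for R in (4, 6, 8):
    delta = 2.0 / 2**R                        # step of a uniform quantiser with 2^R levels
    xq = delta * (np.floor(x / delta) + 0.5)  # quantise to cell midpoints
    print(R, np.var(x - xq), delta**2 / 12)   # empirical variance vs Delta^2/12 ~ 2^{-2R}
```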

Errorless Channels - Data Rate Theorem

In fact, coding-based analyses reveal that stable state estimation/control is possible iff
$$R > \sum_{|\lambda_i| \ge 1} \log_2 |\lambda_i|,$$
where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of the plant matrix $A$.

This holds under various assumptions and stability notions:
- random initial state, noiseless plant; mean $r$th-power convergence to 0 [N.-Evans, Auto. 03]
- bounded initial state, noiseless plant; uniform convergence to 0 [Tatikonda-Mitter, TAC 04]
- random plant noise; mean-square boundedness [N.-Evans, SICON 04]
- bounded plant noise; uniform boundedness [Tatikonda-Mitter, TAC 04]

Additive uncorrelated noise models of quantisation fail to capture the existence of such a threshold. Necessity is typically proved using differential entropy power, quantisation theory or volume-partitioning bounds; sufficiency via explicit construction.
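A minimal sketch of the right-hand side of the data rate theorem: given a plant matrix $A$ (the matrix below is made up for illustration), sum $\log_2|\lambda_i|$ over the unstable eigenvalues to get the threshold rate in bits per sample.

```python
import numpy as np

def min_data_rate(A):
    """Sum of log2|lambda_i| over eigenvalues of A with |lambda_i| >= 1 (bits/sample)."""
    eigs = np.linalg.eigvals(np.asarray(A, dtype=float))
    return sum(np.log2(abs(lam)) for lam in eigs if abs(lam) >= 1)

# Illustrative plant with eigenvalues 2 and 0.5: the threshold is log2(2) = 1 bit/sample.
A = np.array([[2.0, 0.0],
              [1.0, 0.5]])
print(min_data_rate(A))
```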

Noisy Channels

Stable states/estimation errors are possible iff a suitable channel figure-of-merit (FoM) satisfies
$$\mathrm{FoM} > \sum_{|\lambda_i| \ge 1} \log_2 |\lambda_i|,$$
where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of the plant matrix $A$.

Unlike the noiseless-channel case, the FoM depends critically on the stability notion and noise model:
- FoM = $C$: states/est. errors → 0 almost surely (a.s.) [Matveev-Savkin SIAM 07], or mean-square-bounded (MSB) states over an AWGN channel [Braslavsky et al. TAC 07]
- FoM = $C_{\mathrm{any}}$: MSB states over random discrete memoryless channels [Sahai-Mitter TIT 06]
- FoM = $C_{0f}$ for control, or $C_0$ for state estimation, with a.s. bounded states/est. errors [Matveev-Savkin IJC 07]

As $C \ge C_{\mathrm{any}} \ge C_{0f} \ge C_0$, these criteria generally do not coincide.

Missing Information

If the goal is MSB or a.s. convergence to 0 of the states/estimation errors, then information theory is crucial for finding lower bounds.

However, when the goal is a.s. bounded states/errors, classical information theory has so far played no role in networked estimation/control. Yet information in some sense must be flowing across the channel, even without a probabilistic model/objective.

Questions

- Is there a meaningful theory of information for nonrandom variables?
- Can we construct an information-theoretic basis for networked estimation/control with nonrandom noise?
- Are there intrinsic, information-theoretic interpretations of $C_0$ and $C_{0f}$?

Why Nonstochastic Anyway?

There is a long tradition in control of treating noise as a nonrandom perturbation with bounded magnitude, energy or power.

Control systems usually have mechanical/chemical components, as well as electrical:
- Dominant disturbances may not be governed by known probability distributions.
- E.g. in mechanical systems, the main disturbance may be vibrations at resonant frequencies determined by machine dimensions and material properties.

In contrast, communication systems are mainly electrical/electromagnetic/optical:
- Dominant disturbances - thermal noise, shot noise, fading etc. - are well modelled by probability distributions derived from statistical/quantum physics.

Why Nonstochastic Anyway? (cont.)

Related to the previous points:

In most digital communication systems, bit periods $T_b$ are of the order of microseconds or shorter, so thermal and shot noise ($\sigma \propto \sqrt{T_b}$) is noticeable compared to detected signal amplitudes ($\propto T_b$).

Control systems typically operate with longer sample or bit periods, $10^{-2}$ or $10^{-3}$ s, so thermal/shot noise is negligible compared to signal amplitudes.

Why Nonstochastic Anyway? (cont.)

For safety- or mission-critical reasons, stability and performance guarantees are often required every time a control system is used, provided disturbances stay within rated bounds - especially if the plant is unstable or marginally stable, or if we wish to interconnect several control systems and still be sure of performance.

In contrast, most consumer-oriented communication requires good performance only on average, or with high probability. Occasional violations of specifications are permitted, and cannot be prevented within a probabilistic framework.

Probability in Practice

"If there's a fifty-fifty chance that something can go wrong, nine out of ten times, it will." (attrib. L. "Yogi" Berra, former US baseball player)

Uncertain Variable Formalism

Define an uncertain variable (uv) $X$ to be a mapping from an underlying sample space $\Omega$ to a space $\mathbb{X}$.

Each $\omega \in \Omega$ may represent a specific combination of noise/input signals into a system, and $X$ may represent a state/output variable. For a given $\omega$, $x = X(\omega)$ is the realisation of $X$.

Unlike probability theory, no $\sigma$-algebra ($\subseteq 2^{\Omega}$) or measure on $\Omega$ is imposed. Assume $\Omega$ is uncountable, to accommodate continuous $X$.

UV Formalism - Ranges and Conditioning

Marginal range: $[\![X]\!] := \{X(\omega) : \omega \in \Omega\} \subseteq \mathbb{X}$.
Joint range: $[\![X,Y]\!] := \{(X(\omega),Y(\omega)) : \omega \in \Omega\} \subseteq \mathbb{X}\times\mathbb{Y}$.
Conditional range: $[\![X|y]\!] := \{X(\omega) : Y(\omega)=y,\ \omega \in \Omega\}$.

In the absence of statistical structure, the joint range fully characterises the relationship between $X$ and $Y$. Note that
$$[\![X,Y]\!] = \bigcup_{y \in [\![Y]\!]} [\![X|y]\!] \times \{y\},$$
i.e. the joint range is given by the conditional and marginal ranges, analogously to probability theory.

Independence Without Probability

Definition: The uv's $X, Y$ are called (mutually) unrelated, denoted $X \perp Y$, if
$$[\![X,Y]\!] = [\![X]\!] \times [\![Y]\!]. \qquad (1)$$
Otherwise they are called related.

Equivalent characterisation:

Proposition: The uv's $X, Y$ are unrelated iff
$$[\![X|y]\!] = [\![X]\!], \quad \forall y \in [\![Y]\!]. \qquad (2)$$

Unrelatedness is equivalent to $X$ and $Y$ inducing qualitatively independent [Rényi 70] partitions of $\Omega$, when $\Omega$ is finite.
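The ranges and the unrelatedness test are easy to compute when $\Omega$ is finite. The sketch below uses a small made-up sample space and uv's; both characterisations (1) and (2) are checked.

```python
from itertools import product

# Hypothetical finite sample space: each omega is a pair of "noise" values,
# and the uv's X, Y are deterministic functions of omega.
Omega = list(product(range(3), range(2)))
X = {w: w[0] % 2 for w in Omega}     # uv X: Omega -> {0, 1}
Y = {w: w[1] for w in Omega}         # uv Y: Omega -> {0, 1}

marginal = lambda U: {U[w] for w in Omega}                  # [[U]]
joint = {(X[w], Y[w]) for w in Omega}                       # [[X,Y]]
conditional = lambda y: {X[w] for w in Omega if Y[w] == y}  # [[X|y]]

# (1): joint range equals the product of marginal ranges.
print(joint == set(product(marginal(X), marginal(Y))))
# (2): [[X|y]] == [[X]] for every y in [[Y]].
print(all(conditional(y) == marginal(X) for y in marginal(Y)))
```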

Examples of Relatedness and Unrelatedness

[Figure: two joint ranges plotted in the $(x,y)$ plane. a) $[\![X,Y]\!]$ is a proper subset of $[\![X]\!]\times[\![Y]\!]$, so $X, Y$ are related; b) $[\![X,Y]\!] = [\![X]\!]\times[\![Y]\!]$ (a full rectangle), so $X, Y$ are unrelated.]

Markovness without Probability

Definition: $X, Y, Z$ are said to form a Markov uncertainty chain $X - Y - Z$ if
$$[\![X|y,z]\!] = [\![X|y]\!], \quad \forall (y,z) \in [\![Y,Z]\!]. \qquad (3)$$

This is equivalent to $[\![X,Z|y]\!] = [\![X|y]\!] \times [\![Z|y]\!]$ for all $y \in [\![Y]\!]$, i.e. $X, Z$ conditionally unrelated given $Y$; in other words, $X \perp Z \mid Y$.

$X, Y, Z$ are said to form a conditional Markov uncertainty chain given $W$ if $X - (Y,W) - Z$. This is also written $X - Y - Z \mid W$ or $X \perp Z \mid (Y,W)$.
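Condition (3) can again be checked directly on a finite sample space. In the made-up example below, $Z$ depends on $\omega$ only through $Y$, so the chain $X - Y - Z$ should hold.

```python
from itertools import product

Omega = list(product(range(2), range(3)))
X = {w: w[0] for w in Omega}
Y = {w: (w[0] + w[1]) % 3 for w in Omega}
Z = {w: Y[w] % 2 for w in Omega}       # Z is a function of Y

def crange(U, cond):
    """Conditional range of uv U given every (uv, value) pair in cond."""
    return {U[w] for w in Omega if all(V[w] == v for V, v in cond)}

YZ_range = {(Y[w], Z[w]) for w in Omega}
is_markov = all(crange(X, [(Y, y), (Z, z)]) == crange(X, [(Y, y)])
                for (y, z) in YZ_range)
print(is_markov)                       # True: condition (3) holds
```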

Information without Probability

Definition: Two points $(x,y), (x',y') \in [\![X,Y]\!]$ are called taxicab connected, written $(x,y) \leftrightarrow (x',y')$, if there exists a sequence
$$(x,y) = (x_1,y_1), (x_2,y_2), \dots, (x_{n-1},y_{n-1}), (x_n,y_n) = (x',y')$$
of points in $[\![X,Y]\!]$ such that each point differs in only one coordinate from its predecessor.

It is not hard to see that $\leftrightarrow$ is an equivalence relation on $[\![X,Y]\!]$. Call the set of its equivalence classes the taxicab partition $\mathcal{T}[X;Y]$ of $[\![X,Y]\!]$.

Define the nonstochastic information index
$$I_*[X;Y] := \log_2 |\mathcal{T}[X;Y]| \in [0,\infty]. \qquad (4)$$
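For finite ranges the taxicab partition can be computed by counting connected components of a graph in which two points of the joint range are adjacent when they share a coordinate. A sketch using a made-up joint range (union-find is just one convenient way to do this):

```python
from itertools import product
from math import log2

def taxicab_partition(joint_range):
    """Partition a finite joint range [[X,Y]] into taxicab-connected components."""
    pts = list(joint_range)
    parent = {p: p for p in pts}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in product(pts, pts):
        if p[0] == q[0] or p[1] == q[1]:   # differ in at most one coordinate
            parent[find(p)] = find(q)      # merge their components

    classes = {}
    for p in pts:
        classes.setdefault(find(p), set()).add(p)
    return list(classes.values())

# Illustrative joint range with two taxicab-connected components.
JXY = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 2), (3, 3)}
T = taxicab_partition(JXY)
print(len(T), log2(len(T)))                # |T[X;Y]| = 2, so I_*[X;Y] = 1 bit
```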

Connection to Common Random Variables

$\mathcal{T}[X;Y]$ is also called the ergodic decomposition [Gács-Körner PCIT 72]. For discrete $X, Y$, the elements of $\mathcal{T}[X;Y]$ are the connected components of [Wolf-Wullschleger ITW 04], which were shown there to determine the maximal common rv $Z_*$, i.e.
- $Z_* = f(X) = g(Y)$ under suitable mappings $f, g$ (since points in distinct sets of $\mathcal{T}[X;Y]$ are not taxicab-connected);
- if another rv $Z \equiv f(X) \equiv g(Y)$, then $Z \equiv k(Z_*)$ (since all points in the same set of $\mathcal{T}[X;Y]$ are taxicab-connected).

It is not hard to see that $Z_*$ also has the largest number of distinct values of any common rv $Z \equiv f(X) \equiv g(Y)$, and that $I_*[X;Y]$ = the Hartley entropy of $Z_*$.

Maximal common rv's were first described in the brief paper "The lattice theory of information" [Shannon TIT 53].

Examples

[Figure: two joint ranges with points labelled by a common variable $z$. In the first, the range splits into two taxicab-connected components (labelled $z = 0$ and $z = 1$), so $|\mathcal{T}| = 2$ = the maximum number of distinct values that can always be agreed on from separate observations of $X$ and $Y$. In the second, the range forms a single component, so $|\mathcal{T}| = 1$ = the maximum number of distinct values that can always be agreed on from separate observations of $X$ and $Y$.]

Equivalent View via Overlap Partitions

As in probability, it is often easier to work with conditional rather than joint ranges. Let
$$[\![X|Y]\!] := \{[\![X|y]\!] : y \in [\![Y]\!]\}$$
be the conditional range family.

Definition: Two points $x, x'$ are called $[\![X|Y]\!]$-overlap-connected if there exists a sequence of sets $B_1,\dots,B_n \in [\![X|Y]\!]$ s.t. $x \in B_1$, $x' \in B_n$ and $B_i \cap B_{i+1} \ne \emptyset$ for all $i \in [1:n-1]$.

Overlap connectedness is an equivalence relation on $[\![X]\!]$, induced by $[\![X|Y]\!]$. Let the overlap partition $[\![X|Y]\!]_*$ of $[\![X]\!]$ denote the set of its equivalence classes.

Equivalent View via Overlap Partitions (cont.)

Proposition: For any uv's $X, Y$,
$$I_*[X;Y] = \log_2 |[\![X|Y]\!]_*|. \qquad (5)$$

Proof sketch: For any two points $(x,y), (x',y') \in [\![X,Y]\!]$, $(x,y) \leftrightarrow (x',y')$ iff $x$ and $x'$ are $[\![X|Y]\!]$-overlap-connected. This allows us to set up a bijection between the partitions $\mathcal{T}[X;Y]$ and $[\![X|Y]\!]_*$, so they must have the same cardinality.
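A sketch of the overlap-partition view for finite uv's (again with a made-up sample space): points of $[\![X]\!]$ lying in a common conditional range are merged, and the number of resulting cells gives $I_*[X;Y]$ via (5). For this example the overlap partition has two cells, matching the taxicab count one would get from the joint range.

```python
from itertools import product
from math import log2

# Hypothetical finite uv's on a common sample space.
Omega = list(product(range(4), range(2)))
X = {w: w[0] for w in Omega}
Y = {w: (0 if w[0] < 2 else 1, w[1]) for w in Omega}

def overlap_partition(X, Y):
    """Overlap partition of [[X]] induced by the conditional range family [[X|Y]]."""
    Xrange = {X[w] for w in Omega}
    family = [frozenset(X[w] for w in Omega if Y[w] == y)
              for y in {Y[w] for w in Omega}]          # [[X|Y]]
    parent = {x: x for x in Xrange}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for B in family:                    # all points within one conditional range
        B = list(B)                     # are overlap-connected with each other
        for x in B[1:]:
            parent[find(x)] = find(B[0])

    classes = {}
    for x in Xrange:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

P = overlap_partition(X, Y)
print(len(P), log2(len(P)))             # |[[X|Y]]_*| = 2, so I_*[X;Y] = 1 bit
```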

Properties of $I_*$

(Nonnegativity) $I_*[X;Y] \ge 0$ (obvious).

(Symmetry) $I_*[X;Y] = I_*[Y;X]$. This follows from the fact that
$$(x,y) \leftrightarrow (x',y') \text{ in } [\![X,Y]\!] \iff (y,x) \leftrightarrow (y',x') \text{ in } [\![Y,X]\!]. \qquad (6)$$

From this property and (5), knowing just one of the conditional range families $[\![X|Y]\!]$ or $[\![Y|X]\!]$ is enough to determine $I_*[X;Y]$ - unlike ordinary mutual information.

Properties of $I_*$ (cont.)

Proposition (Monotonicity): For any uv's $X, Y, Z$,
$$I_*[X;Y,Z] \ge I_*[X;Y]. \qquad (7)$$

Proof: The idea is to find a surjection from $[\![X|Y,Z]\!]_*$ onto $[\![X|Y]\!]_*$; this automatically implies that the latter cannot have greater cardinality.

Pick any set $B \in [\![X|Y,Z]\!]_*$ and choose a $B' \in [\![X|Y]\!]_*$ s.t. $B \cap B' \ne \emptyset$. At least one such $B'$ exists, since $[\![X|Y]\!]_*$ covers $[\![X]\!] \supseteq B$.

Proof of Monotonic Property (cont.)

Furthermore, exactly one such intersecting $B' \in [\![X|Y]\!]_*$ exists for each $B \in [\![X|Y,Z]\!]_*$, since $B \subseteq B'$:
- By definition, any $x \in B$ and $x' \in B \cap B'$ are connected by a sequence of successively overlapping sets in $[\![X|Y,Z]\!]$.
- As $[\![X|y,z]\!] \subseteq [\![X|y]\!]$, $x, x'$ are also connected by a sequence of successively overlapping sets in $[\![X|Y]\!]$.
- But $B'$ consists of all points that are $[\![X|Y]\!]$-overlap-connected with the representative point $x' \in B'$, so $x \in B'$.
- As $x$ was arbitrary, $B \subseteq B'$.

Thus $B \mapsto B'$ is a well-defined map from $[\![X|Y,Z]\!]_*$ to $[\![X|Y]\!]_*$. It is also onto since, as noted before, every set $B' \in [\![X|Y]\!]_*$ intersects some $B \in [\![X|Y,Z]\!]_*$, which covers $[\![X]\!]$. So $B \mapsto B'$ is the required surjection from $[\![X|Y,Z]\!]_*$ onto $[\![X|Y]\!]_*$.

Properties of $I_*$ (cont.)

Proposition (Data Processing): For Markov uncertainty chains $X - Y - Z$ (3),
$$I_*[X;Z] \le I_*[X;Y].$$

Proof: By monotonicity and the overlap-partition characterisation of $I_*$,
$$I_*[X;Z] \overset{(7)}{\le} I_*[X;Y,Z] \overset{(5)}{=} \log_2 |[\![X|Y,Z]\!]_*|. \qquad (8)$$
By Markovness (3), $[\![X|y,z]\!] = [\![X|y]\!]$ for all $y \in [\![Y]\!]$ and $z \in [\![Z|y]\!]$.
$\Rightarrow [\![X|Y,Z]\!] = [\![X|Y]\!]$.
$\Rightarrow [\![X|Y,Z]\!]_* = [\![X|Y]\!]_*$.
Substituting into the right-hand side of (8) completes the proof.
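As a numerical sanity check of the data processing inequality (my own example, reusing the component-counting idea from the earlier sketch), the uv's below form a Markov uncertainty chain $X - Y - Z$ with $Z$ a function of $Y$, and the computed indices satisfy $I_*[X;Z] \le I_*[X;Y]$.

```python
from itertools import product
from math import log2

# Hypothetical Markov uncertainty chain X - Y - Z (Z is a function of Y).
Omega = list(product(range(4), range(2)))
X = {w: w[0] for w in Omega}
Y = {w: (w[0] // 2, w[1]) for w in Omega}
Z = {w: Y[w][1] for w in Omega}

def I_star(U, V):
    """log2 of the number of taxicab-connected components of the joint range [[U,V]]."""
    pts = list({(U[w], V[w]) for w in Omega})
    parent = {p: p for p in pts}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in product(pts, pts):
        if p[0] == q[0] or p[1] == q[1]:
            parent[find(p)] = find(q)
    return log2(len({find(p) for p in pts}))

print(I_star(X, Z), "<=", I_star(X, Y))   # 0.0 <= 1.0 for this example
```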


More information

Communication Theory II

Communication Theory II Communication Theory II Lecture 15: Information Theory (cont d) Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt March 29 th, 2015 1 Example: Channel Capacity of BSC o Let then: o For

More information

Information measures in simple coding problems

Information measures in simple coding problems Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random

More information

LECTURE 3. Last time:

LECTURE 3. Last time: LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate

More information

Control Over Noisy Channels

Control Over Noisy Channels IEEE RANSACIONS ON AUOMAIC CONROL, VOL??, NO??, MONH?? 004 Control Over Noisy Channels Sekhar atikonda, Member, IEEE, and Sanjoy Mitter, Fellow, IEEE, Abstract Communication is an important component of

More information

Estimating a linear process using phone calls

Estimating a linear process using phone calls Estimating a linear process using phone calls Mohammad Javad Khojasteh, Massimo Franceschetti, Gireeja Ranade Abstract We consider the problem of estimating an undisturbed, scalar, linear process over

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin Shanghai Jiao Tong University Chapter 10: Information Theory Textbook: Chapter 12 Communication Systems Engineering: Ch 6.1, Ch 9.1~ 9. 92 2009/2010 Meixia Tao @

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory

Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory Two Applications of the Gaussian Poincaré Inequality in the Shannon Theory Vincent Y. F. Tan (Joint work with Silas L. Fong) National University of Singapore (NUS) 2016 International Zurich Seminar on

More information

Coding into a source: an inverse rate-distortion theorem

Coding into a source: an inverse rate-distortion theorem Coding into a source: an inverse rate-distortion theorem Anant Sahai joint work with: Mukul Agarwal Sanjoy K. Mitter Wireless Foundations Department of Electrical Engineering and Computer Sciences University

More information

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5.

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Channel capacity Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Exercices Exercise session 11 : Channel capacity 1 1. Source entropy Given X a memoryless

More information

Noisy-Channel Coding

Noisy-Channel Coding Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/05264298 Part II Noisy-Channel Coding Copyright Cambridge University Press 2003.

More information

Lecture 11: Continuous-valued signals and differential entropy

Lecture 11: Continuous-valued signals and differential entropy Lecture 11: Continuous-valued signals and differential entropy Biology 429 Carl Bergstrom September 20, 2008 Sources: Parts of today s lecture follow Chapter 8 from Cover and Thomas (2007). Some components

More information

Shannon s A Mathematical Theory of Communication

Shannon s A Mathematical Theory of Communication Shannon s A Mathematical Theory of Communication Emre Telatar EPFL Kanpur October 19, 2016 First published in two parts in the July and October 1948 issues of BSTJ. First published in two parts in the

More information

RECENT advances in technology have led to increased activity

RECENT advances in technology have led to increased activity IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 49, NO 9, SEPTEMBER 2004 1549 Stochastic Linear Control Over a Communication Channel Sekhar Tatikonda, Member, IEEE, Anant Sahai, Member, IEEE, and Sanjoy Mitter,

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

Chapter 4: Continuous channel and its capacity

Chapter 4: Continuous channel and its capacity meghdadi@ensil.unilim.fr Reference : Elements of Information Theory by Cover and Thomas Continuous random variable Gaussian multivariate random variable AWGN Band limited channel Parallel channels Flat

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Lecture 3: Channel Capacity

Lecture 3: Channel Capacity Lecture 3: Channel Capacity 1 Definitions Channel capacity is a measure of maximum information per channel usage one can get through a channel. This one of the fundamental concepts in information theory.

More information

MODULATION AND CODING FOR QUANTIZED CHANNELS. Xiaoying Shao and Harm S. Cronie

MODULATION AND CODING FOR QUANTIZED CHANNELS. Xiaoying Shao and Harm S. Cronie MODULATION AND CODING FOR QUANTIZED CHANNELS Xiaoying Shao and Harm S. Cronie x.shao@ewi.utwente.nl, h.s.cronie@ewi.utwente.nl University of Twente, Faculty of EEMCS, Signals and Systems Group P.O. box

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 13, 2015 Lucas Slot, Sebastian Zur Shannon s Noisy-Channel Coding Theorem February 13, 2015 1 / 29 Outline 1 Definitions and Terminology

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels IEEE TRANSACTIONS ON AUTOMATIC CONTROL 1 Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels Lei Bao, Member, IEEE, Mikael Skoglund, Senior Member, IEEE, and Karl Henrik Johansson,

More information

Introduction to Information Theory. B. Škorić, Physical Aspects of Digital Security, Chapter 2

Introduction to Information Theory. B. Škorić, Physical Aspects of Digital Security, Chapter 2 Introduction to Information Theory B. Škorić, Physical Aspects of Digital Security, Chapter 2 1 Information theory What is it? - formal way of counting information bits Why do we need it? - often used

More information

Information Theory - Entropy. Figure 3

Information Theory - Entropy. Figure 3 Concept of Information Information Theory - Entropy Figure 3 A typical binary coded digital communication system is shown in Figure 3. What is involved in the transmission of information? - The system

More information

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006) MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)

More information

Cut-Set Bound and Dependence Balance Bound

Cut-Set Bound and Dependence Balance Bound Cut-Set Bound and Dependence Balance Bound Lei Xiao lxiao@nd.edu 1 Date: 4 October, 2006 Reading: Elements of information theory by Cover and Thomas [1, Section 14.10], and the paper by Hekstra and Willems

More information

3F1 Information Theory, Lecture 1

3F1 Information Theory, Lecture 1 3F1 Information Theory, Lecture 1 Jossy Sayir Department of Engineering Michaelmas 2013, 22 November 2013 Organisation History Entropy Mutual Information 2 / 18 Course Organisation 4 lectures Course material:

More information

Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel

Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel Lei Bao, Mikael Skoglund and Karl Henrik Johansson School of Electrical Engineering, Royal Institute of Technology, Stockholm,

More information

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14

Math 3215 Intro. Probability & Statistics Summer 14. Homework 5: Due 7/3/14 Math 325 Intro. Probability & Statistics Summer Homework 5: Due 7/3/. Let X and Y be continuous random variables with joint/marginal p.d.f. s f(x, y) 2, x y, f (x) 2( x), x, f 2 (y) 2y, y. Find the conditional

More information

Also, in recent years, Tsallis proposed another entropy measure which in the case of a discrete random variable is given by

Also, in recent years, Tsallis proposed another entropy measure which in the case of a discrete random variable is given by Gibbs-Shannon Entropy and Related Measures: Tsallis Entropy Garimella Rama Murthy, Associate Professor, IIIT---Hyderabad, Gachibowli, HYDERABAD-32, AP, INDIA ABSTRACT In this research paper, it is proved

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada

More information

Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime

Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University st669@drexel.edu Advisor:

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy

More information

Submodularity in Machine Learning

Submodularity in Machine Learning Saifuddin Syed MLRG Summer 2016 1 / 39 What are submodular functions Outline 1 What are submodular functions Motivation Submodularity and Concavity Examples 2 Properties of submodular functions Submodularity

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Stochastic Stability and Ergodicity of Linear Gaussian Systems Controlled over Discrete Channels

Stochastic Stability and Ergodicity of Linear Gaussian Systems Controlled over Discrete Channels Stochastic Stability and Ergodicity of Linear Gaussian Systems Controlled over Discrete Channels Serdar Yüksel Abstract We present necessary and sufficient conditions for stochastic stabilizability of

More information

A Mathematical Framework for Robust Control Over Uncertain Communication Channels

A Mathematical Framework for Robust Control Over Uncertain Communication Channels A Mathematical Framework for Robust Control Over Uncertain Communication Channels Charalambos D. Charalambous and Alireza Farhadi Abstract In this paper, a mathematical framework for studying robust control

More information

ECE Information theory Final

ECE Information theory Final ECE 776 - Information theory Final Q1 (1 point) We would like to compress a Gaussian source with zero mean and variance 1 We consider two strategies In the first, we quantize with a step size so that the

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance

More information

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 8: Differential entropy Chapter 8 outline Motivation Definitions Relation to discrete entropy Joint and conditional differential entropy Relative entropy and mutual information Properties AEP for

More information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture

More information

Variable Length Codes for Degraded Broadcast Channels

Variable Length Codes for Degraded Broadcast Channels Variable Length Codes for Degraded Broadcast Channels Stéphane Musy School of Computer and Communication Sciences, EPFL CH-1015 Lausanne, Switzerland Email: stephane.musy@ep.ch Abstract This paper investigates

More information

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Fabio Grazioso... July 3, 2006 1 2 Contents 1 Lecture 1, Entropy 4 1.1 Random variable...............................

More information

Universal Anytime Codes: An approach to uncertain channels in control

Universal Anytime Codes: An approach to uncertain channels in control Universal Anytime Codes: An approach to uncertain channels in control paper by Stark Draper and Anant Sahai presented by Sekhar Tatikonda Wireless Foundations Department of Electrical Engineering and Computer

More information

Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information

Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information Capacity-achieving Feedback Scheme for Flat Fading Channels with Channel State Information Jialing Liu liujl@iastate.edu Sekhar Tatikonda sekhar.tatikonda@yale.edu Nicola Elia nelia@iastate.edu Dept. of

More information

A General Formula for Compound Channel Capacity

A General Formula for Compound Channel Capacity A General Formula for Compound Channel Capacity Sergey Loyka, Charalambos D. Charalambous University of Ottawa, University of Cyprus ETH Zurich (May 2015), ISIT-15 1/32 Outline 1 Introduction 2 Channel

More information

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

Variable selection and feature construction using methods related to information theory

Variable selection and feature construction using methods related to information theory Outline Variable selection and feature construction using methods related to information theory Kari 1 1 Intelligent Systems Lab, Motorola, Tempe, AZ IJCNN 2007 Outline Outline 1 Information Theory and

More information

CS6304 / Analog and Digital Communication UNIT IV - SOURCE AND ERROR CONTROL CODING PART A 1. What is the use of error control coding? The main use of error control coding is to reduce the overall probability

More information

Feedback Stabilization over Signal-to-Noise Ratio Constrained Channels

Feedback Stabilization over Signal-to-Noise Ratio Constrained Channels Feedback Stabilization over Signal-to-Noise Ratio Constrained Channels Julio H. Braslavsky, Rick H. Middleton and Jim S. Freudenberg Abstract There has recently been significant interest in feedback stabilization

More information

Arimoto Channel Coding Converse and Rényi Divergence

Arimoto Channel Coding Converse and Rényi Divergence Arimoto Channel Coding Converse and Rényi Divergence Yury Polyanskiy and Sergio Verdú Abstract Arimoto proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear Systems

Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear Systems 6332 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 58, NO 10, OCTOBER 2012 Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear

More information

Second-Order Asymptotics in Information Theory

Second-Order Asymptotics in Information Theory Second-Order Asymptotics in Information Theory Vincent Y. F. Tan (vtan@nus.edu.sg) Dept. of ECE and Dept. of Mathematics National University of Singapore (NUS) National Taiwan University November 2015

More information

ECE 650 Lecture 4. Intro to Estimation Theory Random Vectors. ECE 650 D. Van Alphen 1

ECE 650 Lecture 4. Intro to Estimation Theory Random Vectors. ECE 650 D. Van Alphen 1 EE 650 Lecture 4 Intro to Estimation Theory Random Vectors EE 650 D. Van Alphen 1 Lecture Overview: Random Variables & Estimation Theory Functions of RV s (5.9) Introduction to Estimation Theory MMSE Estimation

More information

Communication constraints and latency in Networked Control Systems

Communication constraints and latency in Networked Control Systems Communication constraints and latency in Networked Control Systems João P. Hespanha Center for Control Engineering and Computation University of California Santa Barbara In collaboration with Antonio Ortega

More information

Optimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood

Optimality of Walrand-Varaiya Type Policies and. Approximation Results for Zero-Delay Coding of. Markov Sources. Richard G. Wood Optimality of Walrand-Varaiya Type Policies and Approximation Results for Zero-Delay Coding of Markov Sources by Richard G. Wood A thesis submitted to the Department of Mathematics & Statistics in conformity

More information

PROOF OF ZADOR-GERSHO THEOREM

PROOF OF ZADOR-GERSHO THEOREM ZADOR-GERSHO THEOREM FOR VARIABLE-RATE VQ For a stationary source and large R, the least distortion of k-dim'l VQ with nth-order entropy coding and rate R or less is δ(k,n,r) m k * σ 2 η kn 2-2R = Z(k,n,R)

More information

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels Parastoo Sadeghi National ICT Australia (NICTA) Sydney NSW 252 Australia Email: parastoo@student.unsw.edu.au Predrag

More information

Feedback Stabilization over a First Order Moving Average Gaussian Noise Channel

Feedback Stabilization over a First Order Moving Average Gaussian Noise Channel Feedback Stabiliation over a First Order Moving Average Gaussian Noise Richard H. Middleton Alejandro J. Rojas James S. Freudenberg Julio H. Braslavsky Abstract Recent developments in information theory

More information

Limits on classical communication from quantum entropy power inequalities

Limits on classical communication from quantum entropy power inequalities Limits on classical communication from quantum entropy power inequalities Graeme Smith, IBM Research (joint work with Robert Koenig) QIP 2013 Beijing Channel Capacity X N Y p(y x) Capacity: bits per channel

More information