Imprecise Probabilities in Stochastic Processes and Graphical Models: New Developments


Imprecise Probabilities in Stochastic Processes and Graphical Models: New Developments. Gert de Cooman, Ghent University, SYSTeMS research group. NLMUA 2011, 7 September 2011.

IMPRECISE PROBABILITY MODELS

Imprecise probability models: sets of probability models. An imprecise probability model is a set M of precise probability models P; more precisely, M is a closed convex set of probabilities. Conditioning: to condition M on an event B, we condition each precise model in M: M|B := {P(·|B) : P ∈ M}.

Imprecise probability models: lower and upper previsions. Consider, for a function f,

P̲(f) := min{P(f) : P ∈ M}   and   P̄(f) := max{P(f) : P ∈ M}.

The nonlinear functionals P̲ and P̄ determine M completely. Conditioning and the Generalised Bayes Rule: when P̲(B) > 0, P̲(f|B) is the unique solution of the following equation in x: P̲(I_B[f − x]) = 0.
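
To make these two formulas concrete, here is a minimal Python sketch (not from the slides). It assumes the credal set M is given by finitely many extreme points, so that lower and upper previsions are minima and maxima of expectations, and it uses the fact that, when the lower probability of B is positive, the Generalised Bayes Rule solution coincides with the minimum of the conditional expectations over M. All numbers are illustrative.

    import numpy as np

    # A toy credal set M on Omega = {0, 1, 2}, given by its extreme points
    # (each row is a probability mass function).  Illustrative numbers only.
    extreme_points = np.array([
        [0.5, 0.3, 0.2],
        [0.2, 0.5, 0.3],
        [0.3, 0.2, 0.5],
    ])

    def lower_prevision(f, M=extreme_points):
        """Lower prevision: minimum of E_P(f) over P in M."""
        return (M @ f).min()

    def upper_prevision(f, M=extreme_points):
        """Upper prevision: maximum of E_P(f) over P in M."""
        return (M @ f).max()

    def conditional_lower_prevision(f, B, M=extreme_points):
        """Conditional lower prevision of f given the event B.

        When the lower probability of B is positive, the solution of the
        Generalised Bayes Rule equation coincides with the minimum of the
        conditional expectations E_P(f|B) over P in M.
        """
        B = np.asarray(B, dtype=float)       # indicator gamble of the event B
        pB = M @ B
        assert pB.min() > 0, "GBR solution not unique: lower probability of B is 0"
        return ((M @ (f * B)) / pB).min()

    f = np.array([1.0, -2.0, 4.0])           # a gamble on Omega
    B = np.array([1.0, 1.0, 0.0])            # the event {0, 1}
    print(lower_prevision(f), upper_prevision(f))
    print(conditional_lower_prevision(f, B))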

@ARTICLE{walley2000, author = {Walley, Peter}, title = {Towards a unified theory of imprecise probability}, journal = {International Journal of Approximate Reasoning}, year = 2000, volume = 24, pages = {125--148} }

Imprecise probability models: accepting gambles. Consider an exhaustive set Ω of mutually exclusive alternatives ω, exactly one of which obtains. Subject is uncertain about which alternative obtains. A gamble f: Ω → R is interpreted as an uncertain reward: if the alternative that obtains is ω, then the reward for Subject is f(ω). Let G(Ω) be the set of all gambles on Ω.

Imprecise probability models: accepting gambles. Subject accepts a gamble f if he agrees to engage in the following transaction: 1. we determine which alternative ω obtains; 2. Subject receives f(ω). We try to model Subject's uncertainty by looking at which gambles in G(Ω) he accepts.

Imprecise probability models: coherent sets of desirable gambles. Subject specifies a set D of gambles he accepts, his set of desirable gambles. D is called coherent if it satisfies the following rationality requirements:
D1. if f ≤ 0 then f ∉ D [avoiding partial loss];
D2. if f > 0 then f ∈ D [accepting partial gain];
D3. if f1 ∈ D and f2 ∈ D then f1 + f2 ∈ D [combination];
D4. if f ∈ D then λf ∈ D for all positive real numbers λ [scaling].
Here f > 0 means f ≥ 0 and not f = 0. Walley has also argued that sets of desirable gambles should satisfy an additional axiom:
D5. D is ℬ-conglomerable for any partition ℬ of Ω: if I_B f ∈ D for all B ∈ ℬ, then also f ∈ D [full conglomerability].

Imprecise probability models: natural extension as a form of inference. Since D1–D4 are preserved under arbitrary non-empty intersections:
Theorem. Let A be any set of gambles. Then there is a coherent set of desirable gambles that includes A if and only if f ≤ 0 holds for no f ∈ posi(A) [A avoids partial loss]. In that case, the natural extension E(A) of A is the smallest such coherent set, and it is given by E(A) := posi(A ∪ G⁺(Ω)), where posi denotes the set of all positive linear combinations and G⁺(Ω) the set of all non-zero non-negative gambles.
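
For finitely many assessed gambles on a finite space, the lower prevision induced by natural extension can be computed with a small linear program. Below is a sketch using scipy, with illustrative assessments of my own: the LP maximises α subject to g − α dominating a non-negative combination of the assessed gambles, and it is unbounded exactly when the assessments incur sure loss.

    import numpy as np
    from scipy.optimize import linprog

    # Assessed desirable gambles on a 3-element possibility space (one per row).
    # Purely illustrative numbers.
    A = np.array([
        [-1.0,  2.0,  0.0],
        [ 0.5, -1.0,  1.0],
    ])

    def natural_extension(g, assessments=A):
        """Lower prevision of g induced by the assessments via natural extension:
        maximise alpha subject to
            g(w) - alpha >= sum_k lambda_k * f_k(w)  for all w,  lambda_k >= 0.
        Returns None if the assessments incur sure loss (the LP is unbounded)."""
        m, n = assessments.shape
        # Decision variables: x = (alpha, lambda_1, ..., lambda_m)
        c = np.zeros(m + 1)
        c[0] = -1.0                                  # maximise alpha
        # Constraints: alpha + sum_k lambda_k f_k(w) <= g(w), for every w
        A_ub = np.hstack([np.ones((n, 1)), assessments.T])
        b_ub = np.asarray(g, dtype=float)
        bounds = [(None, None)] + [(0, None)] * m    # alpha free, lambdas >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return -res.fun if res.status == 0 else None

    g = np.array([1.0, 0.0, 2.0])
    print(natural_extension(g))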

Imprecise probability models: lower and upper previsions. Given Subject's coherent set D, we can define his upper and lower previsions:

P̄(f) := inf{α : α − f ∈ D}   and   P̲(f) := sup{α : f − α ∈ D},

so P̄(f) = −P̲(−f). P̲(f) is the supremum price α for which Subject will buy the gamble f, i.e., accept the gamble f − α. The lower probability P̲(A) := P̲(I_A) is Subject's supremum rate for betting on the event A.

Imprecise probability models: conditional lower and upper previsions. We can also define Subject's conditional lower and upper previsions: for any gamble f and any non-empty subset B of Ω, with indicator I_B,

P̄(f|B) := inf{α : I_B(α − f) ∈ D}   and   P̲(f|B) := sup{α : I_B(f − α) ∈ D},

so P̄(f|B) = −P̲(−f|B) and P̲(f) = P̲(f|Ω). P̲(f|B) is the supremum price α for which Subject will buy the gamble f, i.e., accept the gamble f − α, contingent on the occurrence of B. For any partition ℬ of Ω, define the gamble P̲(f|ℬ) by P̲(f|ℬ)(ω) := P̲(f|B) for B ∈ ℬ and ω ∈ B.

Imprecise probability models: coherence of conditional lower and upper previsions. Suppose you have a number of functionals P̲(·|ℬ_1), ..., P̲(·|ℬ_n). These are called coherent if there is some coherent set of desirable gambles D that is ℬ_k-conglomerable for all k, such that

P̲(f|B_k) = sup{α ∈ R : I_{B_k}(f − α) ∈ D}   for all B_k ∈ ℬ_k, k = 1, ..., n.

Imprecise probability models: properties of conditional lower and upper previsions.
Theorem ([10]). Consider a coherent set of desirable gambles D, let B be any non-empty subset of Ω, and let f, f1 and f2 be gambles on Ω. Then:
1. inf_{ω∈B} f(ω) ≤ P̲(f|B) ≤ P̄(f|B) ≤ sup_{ω∈B} f(ω) [positivity];
2. P̲(f1 + f2|B) ≥ P̲(f1|B) + P̲(f2|B) [super-additivity];
3. P̲(λf|B) = λP̲(f|B) for all real λ ≥ 0 [non-negative homogeneity];
4. if ℬ is a partition of Ω that refines the partition {B, B^c} and D is ℬ-conglomerable, then P̲(f|B) ≥ P̲(P̲(f|ℬ)|B) [conglomerative property].

Imprecise probability models: conditional previsions. If P̲(f|B) = P̄(f|B) =: P(f|B), then P(f|B) is Subject's fair price or prevision for f, conditional on B. It is the fixed amount of utility that I am willing to exchange the uncertain reward f for, conditional on the occurrence of B. Related to de Finetti's fair prices [6], and to Huygens's [7] "dit is my so veel weerdt als" ("this is worth as much to me as").
Corollary.
1. inf_{ω∈B} f(ω) ≤ P(f|B);
2. P(λf + µg|B) = λP(f|B) + µP(g|B);
3. if ℬ is a partition of Ω that refines the partition {B, B^c} and if there is ℬ-conglomerability, then P(f|B) = P(P(f|ℬ)|B).

IMPRECISE STOCHASTIC PROCESSES

@ARTICLE{cooman2008, author = {{d}e Cooman, Gert and Hermans, Filip}, title = {Imprecise probability trees: Bridging two theories of imprecise probability}, journal = {Artificial Intelligence}, year = 2008, volume = 172, pages = {1400--1427}, doi = {10.1016/j.artint.2008.03.001} }

Reality's event tree and move spaces. Reality can make a number of moves, where the possible next moves may depend only on the previous moves he has made. We can represent Reality's moves by an event tree. In each non-terminal situation t, Reality has a set of possible next moves

W_t := {w : tw ∈ Ω◊},

called Reality's move space in situation t, where Ω◊ denotes the set of all situations. W_t may be infinite, but has at least two elements. We assume the event tree has a finite horizon.

An event tree and its situations. Situations are nodes in the event tree. [Figure: an event tree with its initial situation, a non-terminal situation t, and a terminal situation ω.]

An event tree and its situations. The sample space Ω is the set of all terminal situations.

An event tree and its situations. The partial order ⊑ on the set Ω◊ of all situations: s ⊑ t means that s precedes t, and s ⊏ t means that s strictly precedes t.

An event tree and its situations. An event A is a subset of the sample space Ω. For any situation s, E(s) := {ω ∈ Ω : s ⊑ ω} is the set of all terminal situations that go through s.

An event tree and its situations. Cuts of the initial situation. [Figure: a cut U = {u1, u2, u3} of the initial situation.]

Reality's move space W_t in a non-terminal situation t. [Figure: in situation s the move space is W_s = {w1, w2}, leading to sw1 and sw2; in a later situation t the move space is W_t = {w3, w4, w5}, leading to tw3, tw4 and tw5.]

Coherent immediate prediction: immediate prediction models. In each non-terminal situation t, Forecaster has beliefs about which move w_t ∈ W_t Reality will make immediately afterwards. Forecaster specifies those local predictive beliefs in the form of a coherent set of desirable gambles D_t ⊆ G(W_t). This leads to an immediate prediction model D_t, t ∈ Ω◊ \ Ω. Coherence here means D1–D4.

Coherent immediate prediction: immediate prediction models. [Figure: the local models attached to the non-terminal situations s and t: a set of desirable gambles in G({w1, w2}) for s and one in G({w3, w4, w5}) for t.]

Coherent immediate prediction: from a local to a global model. How do we combine the local pieces of information into a global model, i.e., which gambles f on the entire sample space Ω does Forecaster accept? For each non-terminal situation t and each h_t ∈ D_t, Forecaster accepts the gamble I_E(t) h_t on Ω, where

I_E(t) h_t(ω) := 0 if ω does not go through t, and I_E(t) h_t(ω) := h_t(w) if tw ⊑ ω with w ∈ W_t.

I_E(t) h_t represents the gamble on Ω that is called off unless Reality ends up in situation t, and then depends only on Reality's move immediately after t: it gives the same value h_t(w) to all paths ω that go through tw.
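
As a small illustration of this lifting (my own sketch, not from the slides), with situations represented as tuples of Reality's moves and paths ω as tuples of full length:

    def lift(t, h_t):
        """Return the global gamble I_E(t) h_t on the sample space:
        zero on paths that do not go through situation t, and h_t(w) on paths
        that continue from t with move w (the value only depends on that move)."""
        def gamble(omega):                  # omega: a terminal path, a tuple of moves
            if omega[:len(t)] != t:         # path does not pass through t
                return 0.0
            return h_t[omega[len(t)]]       # reward set by the move right after t
        return gamble

    # Example: t = (0,) and a local gamble h_t on W_t = {0, 1}
    g = lift((0,), {0: 2.0, 1: -1.0})
    print(g((0, 1)), g((1, 0)))             # -1.0 (goes through t), 0.0 (called off)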

Coherent immediate prediction: from a local to a global model. So Forecaster accepts all gambles in the set

D := {I_E(t) h_t : h_t ∈ D_t, t ∈ Ω◊ \ Ω}.

Find the natural extension E(D) of D: the smallest subset of G(Ω) that includes D and is coherent, i.e., satisfies D1–D4 and cut conglomerability.

Coherent immediate prediction: cut conglomerability. We want predictive models, so we will condition on the events E(t), i.e., on the event that we get to situation t. The E(t) are the only events that we can legitimately condition on. The events E(t) form a partition ℬ_U of the sample space Ω iff the situations t belong to a cut U.
Definition. A set of desirable gambles D on Ω is cut-conglomerable (D5') if it is ℬ_U-conglomerable for every cut U: (∀u ∈ U)(I_E(u) f ∈ D) ⇒ f ∈ D.

Coherent immediate prediction: selections and gamble processes. A t-selection S is a process, defined on all non-terminal situations s that follow t, such that S(s) ∈ D_s. It selects, in advance, a desirable gamble S(s) from the available desirable gambles in each non-terminal s ⊒ t. With a t-selection S, we can associate a real-valued t-gamble process I^S, which is a t-process such that I^S(t) = 0 and, for all s ⊒ t and w ∈ W_s,

I^S(sw) = I^S(s) + S(s)(w).

Coherent immediate prediction: selections and gamble processes. [Figure: worked example with W_t = {w1, w2} and the selected gamble S(t) given by S(t)(w1) = +3 and S(t)(w2) = −4. If I^S(t) = 2, then I^S(tw1) = 2 + 3 = 5 and I^S(tw2) = 2 − 4 = −2.]

Coherent immediate prediction: Marginal Extension Theorem.
Theorem (Marginal Extension Theorem). There is a smallest set of gambles that satisfies D1–D4 and D5' and includes D. This natural extension of D is given by

E(D) := {g ∈ G(Ω) : g ≥ I^S_Ω for some □-selection S},

where □ is the initial situation and I^S_Ω denotes the restriction of the gamble process I^S to the sample space Ω. Moreover, for any non-terminal situation t and any t-gamble g, it holds that I_E(t) g ∈ E(D) if and only if there is some t-selection S_t such that g ≥ I^{S_t}_Ω.

Coherent immediate prediction: predictive lower and upper previsions. Use the coherent set of desirable gambles E(D) to define special lower (and upper) previsions P̲(·|t) := P̲(·|E(t)), conditional on an event E(t). For any gamble f on Ω and for any non-terminal situation t,

P̲(f|t) := sup{α : I_E(t)(f − α) ∈ E(D)} = sup{α : f − α ≥ I^S_Ω for some t-selection S}.

We call such conditional lower previsions predictive lower previsions for Forecaster. For a cut U of t, define the U-measurable t-gamble P̲(f|U) by P̲(f|U)(ω) := P̲(f|u) for u ∈ U and u ⊑ ω.

Properties of predictive previsions: Concatenation Theorem.
Theorem (Concatenation Theorem). Consider any two cuts U and V of a situation t such that U precedes V. Then for all t-gambles f on Ω:
1. P̲(f|t) = P̲(P̲(f|U)|t);
2. P̲(f|U) = P̲(P̲(f|V)|U).
We can calculate P̲(f|t) by backwards recursion, starting with P̲(f|ω) = f(ω) and using only the local models P̲_s(g) := sup{α : g − α ∈ D_s}, where s ⊒ t is non-terminal and g is a gamble on W_s.

Properties of predictive previsions: Concatenation Theorem. [Figure: a tree with a cut U = {u1, u2} of the situation t; the recursion starts from P̲(f|ω_i) = f(ω_i) at the terminal situations, yields P̲(f|u1) and P̲(f|u2) at the cut, and finally P̲(f|t) = P̲(P̲(f|U)|t).]
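
The backwards recursion is easy to implement. Below is an illustrative Python sketch (not the slides' code) for a two-step binary event tree, assuming every local model is given by the extreme points of its set of mass functions, so that each local lower prevision is a minimum of expectations; all numbers are made up.

    import numpy as np

    # A two-step binary event tree: situations are tuples of moves in {0, 1}.
    # Each non-terminal situation carries a local credal set, given here by the
    # extreme points of its set of mass functions (illustrative numbers only).
    local_models = {
        ():   np.array([[0.4, 0.6], [0.6, 0.4]]),
        (0,): np.array([[0.3, 0.7], [0.5, 0.5]]),
        (1,): np.array([[0.2, 0.8], [0.4, 0.6]]),
    }
    HORIZON = 2

    def predictive_lower(f, t=()):
        """Backward recursion of the Concatenation Theorem: at a terminal
        situation return f, otherwise apply the local lower prevision to the
        gamble w -> predictive_lower(f, tw) on the local move space."""
        if len(t) == HORIZON:                       # terminal situation
            return f[t]
        local = np.array([predictive_lower(f, t + (w,)) for w in (0, 1)])
        return (local_models[t] @ local).min()      # local lower prevision at t

    # A gamble on the sample space Omega = {0,1}^2
    f = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): 0.0, (1, 1): 2.0}
    print(predictive_lower(f))                      # lower prevision in the initial situation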

Properties of predictive previsions: envelope theorems. Consider in each non-terminal situation s a compatible precise model P_s on G(W_s): P_s ∈ M_s, i.e., (∀g ∈ G(W_s))(P_s(g) ≥ P̲_s(g)). This leads to a collection of compatible probability trees in the sense of Huygens (and Shafer). Use the Concatenation Theorem to find the corresponding precise predictive previsions P(f|t) for each compatible probability tree.
Theorem (Lower Envelope Theorem). For all situations t and t-gambles f, P̲(f|t) is the infimum (in fact the minimum) of the P(f|t) over all compatible probability trees.

Considering unbounded time. What if Reality's event tree no longer has a finite time horizon: how do we calculate the lower prices/previsions P̲(f|t)? The Shafer–Vovk–Ville approach:

sup{α : f − α ≥ limsup I^S for some t-selection S}.

Open question(s): what does natural extension yield in this case, must coherence be strengthened to yield the Shafer–Vovk–Ville approach, and if so, how?

@ARTICLE{cooman2009, author = {{d}e Cooman, Gert and Hermans, Filip and Quaeghebeur, Erik}, title = {Imprecise {M}arkov chains and their limit behaviour}, journal = {Probability in the Engineering and Informational Sciences}, year = 2009, volume = 23, pages = {597--635}, doi = {10.1017/S0269964809990039} }

Imprecise Markov chains: discrete-time, finite-state uncertain processes. Consider an uncertain process with variables X1, X2, ..., Xn, ... Each X_k assumes values in a finite set of states 𝒳. This leads to a standard event tree with situations s = (x1, x2, ..., xn), with x_k ∈ 𝒳 and n ≥ 0. In each situation s there is a local imprecise belief model M_s: a closed convex set of probability mass functions p on 𝒳. The associated local lower prevision P̲_s is

P̲_s(f) := min{E_p(f) : p ∈ M_s},   where E_p(f) := Σ_{x∈𝒳} f(x) p(x).

Imprecise Markov chains: example of a standard event tree. [Figure: the event tree for a process with states 𝒳 = {a, b}; the first level corresponds to X1, the second to X2, giving situations (a,a), (a,b), (b,a), (b,b), and so on for X3.]

Imprecise Markov chains: precise Markov chains. The uncertain process is a (stationary) precise Markov chain when all M_s are singletons (precise), the initial model is M_□ = {m1}, and the Markov condition holds: M_(x1,...,xn) = {q(·|xn)}.

Imprecise Markov chains: probability tree for a precise Markov chain. [Figure: the event tree of the previous example, with the initial mass function m1 and the transition mass functions q(·|a) and q(·|b) attached to the corresponding situations.]

Imprecise Markov chains: definition of an imprecise Markov chain. The uncertain process is a (stationary) imprecise Markov chain when the Markov condition is satisfied: M_(x1,...,xn) = Q(·|xn). An imprecise Markov chain can be seen as an infinity of probability trees.

Imprecise Markov chains: probability tree for an imprecise Markov chain. [Figure: the same event tree, now with the initial credal set M1 and the credal transition models Q(·|a) and Q(·|b) attached to the corresponding situations.]

Imprecise Markov chains: lower and upper transition operators. Define T̲: G(𝒳) → G(𝒳) and T̄: G(𝒳) → G(𝒳), where for any gamble f on 𝒳:

T̲f(x) := min{E_p(f) : p ∈ Q(·|x)}   and   T̄f(x) := max{E_p(f) : p ∈ Q(·|x)}.

Then the Concatenation Formula yields:

P̲_n(f) = P̲_1(T̲^{n−1} f)   and   P̄_n(f) = P̄_1(T̄^{n−1} f).

Complexity is linear in the number of time steps!
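
A minimal Python sketch of these operators (illustrative numbers, not the example on the next slide), with each credal set Q(·|x) described by its extreme points; the n-step lower prevision then follows by applying T̲ repeatedly and finishing with the initial lower prevision, which makes the cost linear in n.

    import numpy as np

    # An imprecise Markov chain on X = {0, 1}: for each current state x, the
    # set Q(.|x) of transition mass functions is given by its extreme points.
    Q = {
        0: np.array([[0.8, 0.2], [0.6, 0.4]]),
        1: np.array([[0.3, 0.7], [0.5, 0.5]]),
    }
    M1 = np.array([[0.5, 0.5], [0.7, 0.3]])     # extreme points of the initial model

    def lower_T(f):
        """Lower transition operator: (Tf)(x) = min over p in Q(.|x) of E_p(f)."""
        return np.array([(Q[x] @ f).min() for x in (0, 1)])

    def lower_prevision_at_time_n(f, n):
        """P_n(f) = P_1(T^(n-1) f): apply the lower transition operator n-1
        times, then take the lower expectation under the initial model."""
        for _ in range(n - 1):
            f = lower_T(f)
        return (M1 @ f).min()

    f = np.array([1.0, -1.0])                   # a gamble on the state at time n
    print(lower_prevision_at_time_n(f, n=5))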

Imprecise Markov chains: an example with lower and upper mass functions, for states 𝒳 = {a, b, c} (rows indexed by the current state, columns by the next state):

    [q̲(a|a) q̲(b|a) q̲(c|a)]             [  9    9  162]
    [q̲(a|b) q̲(b|b) q̲(c|b)]  =  1/200 ·  [144   18   18]
    [q̲(a|c) q̲(b|c) q̲(c|c)]             [  9  162    9]

    [q̄(a|a) q̄(b|a) q̄(c|a)]             [ 19   19  172]
    [q̄(a|b) q̄(b|b) q̄(c|b)]  =  1/200 ·  [154   28   28]
    [q̄(a|c) q̄(b|c) q̄(c|c)]             [ 19  172   19]

The columns of these matrices are the gambles T̲I_{a}, T̲I_{b}, T̲I_{c} and T̄I_{a}, T̄I_{b}, T̄I_{c}, respectively.

Imprecise Markov chains: an example with lower and upper mass functions. [Figure: evolution of the lower and upper mass functions for n = 1, 2, ..., 10, and for n = 22 and n = 1000, illustrating convergence.]

Imprecise Markov chains: a Perron–Frobenius theorem.
Theorem ([4]). Consider a stationary imprecise Markov chain with finite state set 𝒳 and an upper transition operator T̄. Suppose that T̄ is regular, meaning that there is some n > 0 such that min T̄^n I_{x} > 0 for all x ∈ 𝒳. Then for every initial upper prevision P̄_1, the upper prevision P̄_n = P̄_1 ∘ T̄^{n−1} for the state at time n converges point-wise to the same upper prevision P̄_∞:

lim_{n→∞} P̄_n(h) = lim_{n→∞} P̄_1(T̄^{n−1} h) =: P̄_∞(h)   for all h ∈ G(𝒳).

Moreover, the limit upper prevision P̄_∞ is the only T̄-invariant upper prevision on G(𝒳), meaning that P̄_∞ = P̄_∞ ∘ T̄.
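
A quick numerical check of regularity and of the convergence claimed by the theorem, again with a made-up two-state upper transition operator: under regularity the iterates T̄^n h flatten out to a constant gamble, so the limit value no longer depends on the initial model.

    import numpy as np

    # Upper transition operator of a 2-state imprecise Markov chain, with
    # credal sets given by extreme points (illustrative numbers).
    Q = {
        0: np.array([[0.8, 0.2], [0.6, 0.4]]),
        1: np.array([[0.3, 0.7], [0.5, 0.5]]),
    }

    def upper_T(f):
        """(Tf)(x) = max over p in Q(.|x) of E_p(f)."""
        return np.array([(Q[x] @ f).max() for x in (0, 1)])

    # Regularity check with n = 1: min of T applied to each indicator is > 0.
    for x in (0, 1):
        print("regular through state", x, ":", upper_T(np.eye(2)[x]).min() > 0)

    # Convergence: iterate T on a gamble; the result becomes (nearly) constant,
    # and that constant is the limit upper probability of state 0, whatever the
    # initial upper prevision.
    h = np.array([1.0, 0.0])        # indicator of state 0
    for _ in range(60):
        h = upper_T(h)
    print("iterated gamble after 60 steps:", h)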

CREDAL NETWORKS

@ARTICLE{cooman2010, author = {{d}e Cooman, Gert and Hermans, Filip and Antonucci, Alessandro and Zaffalon, Marco}, title = {Epistemic irrelevance in credal nets: the case of imprecise {M}arkov trees}, journal = {International Journal of Approximate Reasoning}, year = 2010, volume = 51, pages = {1029--1052}, doi = {10.1016/j.ijar.2010.08.011} }

Credal trees. [Figure: a tree-shaped network of variables X1, ..., X11.]

Credal trees: local uncertainty models. [Figure: a node X_i with mother variable X_{m(i)} and children X_j, ..., X_k.] The variable X_i may assume a value in the finite set 𝒳_i; for each possible value x_{m(i)} ∈ 𝒳_{m(i)} of the mother variable X_{m(i)}, we have a conditional lower prevision P̲_i(·|x_{m(i)}): G(𝒳_i) → R, with P̲_i(f|x_{m(i)}) the lower prevision of f(X_i), given that X_{m(i)} = x_{m(i)}. The local model P̲_i(·|X_{m(i)}) is a conditional lower prevision operator.

Credal trees under epistemic irrelevance: definition. Interpretation of the graphical structure: conditional on the mother variable, the non-parent non-descendants of each node variable are epistemically irrelevant to it and its descendants.

Credal networks under epistemic irrelevance: as an expert system. When the credal network is a (Markov) tree we can find the joint model from the local models recursively, from leaves to root. Exact message-passing algorithm: the credal tree is treated as an expert system; linear complexity in the number of nodes; Python code written by Filip Hermans; testing in cooperation with Marco Zaffalon and Alessandro Antonucci. Current (toy) applications in HMMs: character recognition [3], air traffic trajectory tracking and identification [1].

Example of application, HMMs: character recognition for Dante's Divina Commedia. [Figure: the characters of the original text (... V I T A ...) are the hidden states S1, S2, S3, ...; the characters in the OCR output are the corresponding observations O1, O2, O3, ...]

Example of application, HMMs: character recognition for Dante's Divina Commedia.

    Accuracy                                93.96% (7275/7743)
    Accuracy (if imprecise indeterminate)   64.97% (243/374)
    Determinacy                             95.17% (7369/7743)
    Set-accuracy                            93.58% (350/374)
    Single accuracy                         95.43% (7032/7369)
    Indeterminate output size               2.97 over 21

Table: Precise vs. imprecise HMMs. Test results obtained by twofold cross-validation on the first two chants of Dante's Divina Commedia, with n = 2. Quantification is achieved by the IDM with s = 2 and a modified Perks prior. The single character output by the precise model is then guaranteed to be included in the set of characters the imprecise HMM identifies.

@INPROCEEDINGS{cooman2011, author = {De Bock, Jasper and {d}e Cooman, Gert}, title = {State sequence prediction in imprecise hidden {M}arkov models}, booktitle = {ISIPTA '11 -- Proceedings of the Seventh International Symposium on Imprecise Probability: Theories and Applications}, year = 2011, editor = {Coolen, Frank P. A. and {d}e Cooman, Gert and Fetz, Thomas and Oberguggenberger, Michael}, address = {Innsbruck, Austria}, publisher = {SIPTA} }

Imprecise hidden Markov model (iHMM). [Figure: a chain of hidden states X1, X2, ..., Xk, ..., Xn, each emitting an output O1, O2, ..., Ok, ..., On.] Marginal model: Q1; transition models: Qk(·|X_{k−1}); emission models: Sk(·|Xk).

Imprecise hidden Markov model (iHMM): interpretation of the graphical structure, traditional credal networks. [Figure: the same state-output graph.] Conditional on its parent, the non-parent non-descendants of each node are strongly independent of the node and its descendants.

Imprecise hidden Markov model (iHMM): interpretation of the graphical structure, recent developments. [Figure: the same state-output graph.] Conditional on its parent, the non-parent non-descendants of each node are epistemically irrelevant to the node and its descendants.

The notion of optimality we use: maximality. We define a partial order ≻ on state sequences:

x̂_{1:n} ≻ x_{1:n} ⇔ P̲_1(I_{x̂_{1:n}} − I_{x_{1:n}} | o_{1:n}) > 0.

A state sequence is optimal if it is undominated, or maximal:

x̂_{1:n} ∈ opt(𝒳_{1:n} | o_{1:n})
⇔ (∀x_{1:n} ∈ 𝒳_{1:n}) x_{1:n} ⊁ x̂_{1:n}
⇔ (∀x_{1:n} ∈ 𝒳_{1:n}) P̲_1(I_{x_{1:n}} − I_{x̂_{1:n}} | o_{1:n}) ≤ 0
⇔ (∀x_{1:n} ∈ 𝒳_{1:n}) P̲_1(I_{o_{1:n}} [I_{x_{1:n}} − I_{x̂_{1:n}}]) ≤ 0,

and P̲_1(I_{o_{1:n}} [I_{x_{1:n}} − I_{x̂_{1:n}}]) can be easily determined recursively! Aim of our EstiHMM algorithm: determine opt(𝒳_{1:n} | o_{1:n}) exactly and efficiently.
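
EstiHMM itself exploits the recursive structure of the chain; purely as an illustration of the maximality criterion (not of EstiHMM), here is a brute-force Python filter that keeps the undominated sequences. It assumes some function lower_dominance(x, y) returning the lower prevision P̲_1(I_{o_{1:n}}[I_x − I_y]); the toy numbers are invented.

    from itertools import product

    def maximal_sequences(states, n, lower_dominance):
        """Brute-force illustration of maximality (not the EstiHMM algorithm).

        lower_dominance(x, y) should return the lower prevision used in the
        dominance test: x dominates y whenever this value is strictly positive.
        A sequence is maximal if no sequence dominates it."""
        candidates = list(product(states, repeat=n))
        return [y for y in candidates
                if not any(lower_dominance(x, y) > 0 for x in candidates)]

    # Toy dominance values for two sequences of length 1 (illustrative only).
    toy = {(('a',), ('b',)): -0.1, (('b',), ('a',)): 0.2}
    print(maximal_sequences(['a', 'b'], 1, lambda x, y: toy.get((x, y), 0.0)))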

Complexity of EstiHMM: linear in the length of the chain n, cubic in the number of states s, and linear in the number of maximal elements; comparable to its Viterbi counterpart for precise HMMs. Important conclusion: the complexity depends crucially on how the number of maximal elements increases with the length of the chain n.

Number of maximal elements. [Figures: number of maximal state sequences for the output sequence (0, 1, 1, 1, 0, 0) at imprecision levels 1%, 5%, 10%, 25% and 40%.]

Literature I
Enrique Miranda. A survey of the theory of coherent lower previsions. International Journal of Approximate Reasoning, 48:628–658, 2008.
Matthias Troffaes. Decision making under uncertainty using imprecise probabilities. International Journal of Approximate Reasoning, 45:17–29, 2007.
Alessandro Antonucci, Alessio Benavoli, Marco Zaffalon, Gert de Cooman, and Filip Hermans. Multiple model tracking by imprecise Markov trees. In Proceedings of the 12th International Conference on Information Fusion (Seattle, WA, USA, July 6–9, 2009), pages 1767–1774, 2009.
Gert de Cooman and Filip Hermans. Imprecise probability trees: Bridging two theories of imprecise probability. Artificial Intelligence, 172(11):1400–1427, 2008.

Literature II
Gert de Cooman, Filip Hermans, Alessandro Antonucci, and Marco Zaffalon. Epistemic irrelevance in credal networks: the case of imprecise Markov trees. International Journal of Approximate Reasoning, 51:1029–1052, 2010.
Gert de Cooman, Filip Hermans, and Erik Quaeghebeur. Imprecise Markov chains and their limit behaviour. Probability in the Engineering and Informational Sciences, 23(4):597–635, 2009. arXiv:0801.0980.
B. de Finetti. Teoria delle Probabilità. Einaudi, Turin, 1970.

Literature III
B. de Finetti. Theory of Probability: A Critical Introductory Treatment. John Wiley & Sons, Chichester, 1974–1975. English translation of [5], two volumes.
Christiaan Huygens. Van Rekeningh in Spelen van Geluck. 1656–1657. Reprinted in Volume XIV of [8].
Christiaan Huygens. Œuvres complètes de Christiaan Huygens. Martinus Nijhoff, Den Haag, 1888–1950. Twenty-two volumes. Available in digitised form from the Bibliothèque nationale de France (http://gallica.bnf.fr).
G. Shafer and V. Vovk. Probability and Finance: It's Only a Game! Wiley, New York, 2001.

Literature IV
Peter Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.
Thomas Augustin, Frank Coolen, Gert de Cooman, and Matthias Troffaes (eds.). Introduction to Imprecise Probabilities. Wiley, Chichester, mid 2012.
Also look at www.sipta.org.