Active Elicitation of Imprecise Probability Models


Active Elicitation of Imprecise Probability Models

Natan T'Joens

Supervisors: Prof. dr. ir. Gert De Cooman, Dr. ir. Jasper De Bock
Counsellor: Ir. Arthur Van Camp

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Electromechanical Engineering

Department of Electronics and Information Systems
Chair: Prof. dr. ir. Rik Van de Walle
Faculty of Engineering and Architecture
Academic year 2016-2017

Preface

When looking back on the past year, the main thought that comes to my mind is how fast it went by. Too fast, honestly. I have enjoyed the process of making this master's thesis from start to finish. The depth and variety of problems encountered during this project were a real challenge for my mathematical and scientific competences. In my opinion, nothing ever comes close to the satisfaction of solving a non-trivial mathematical problem. So, you can imagine how much fun I had the past year. On top of that, it convinced me of what I want to do in the future. However, none of this would have been the case if it were not for the three people who guided me intensively: Gert De Cooman, Jasper De Bock and Arthur Van Camp. It was a real pleasure to work with all three of them. The passion they share for this subject is simply stunning. It still inspires me every time. They convinced me to take this subject for my master's thesis and I have not regretted it for a second. What perhaps boggles my mind the most about these people is how they are capable of combining pure professionalism with a very friendly and informal atmosphere.

In particular, I want to thank Gert for spending his precious time with me to think about the directions we could take and the problems we could tackle in this thesis. He continuously served me with challenging, but still feasible, problems to think about. These problems were not always equally relevant for the purpose of this thesis, but sometimes purely interesting from an intellectual point of view. In this way, he kept the balance between targeted guidance and the freedom to discover the field by myself. I want to thank Jasper for the enormous amount of feedback and advice he gave me during the writing of this thesis. However, sharing knowledge is one thing; motivating a person to apply this knowledge is another. Jasper succeeded in this without any problem. For the most part, he brought me my passion for this subject. I also want to thank Arthur for all the effort he has put into my guidance. Apart from the support, I am also thankful that he was extremely kind and approachable during the whole year. It was a pleasure.

To conclude, I want to thank some people who did not have a direct impact on this thesis, but who certainly were a positive influence on the work I have delivered during the past year. It seems like a good moment to thank these people for the support and help they gave me, not only during the past year, but during the whole of my studies. A lot of thanks goes to my parents, Steve and Marina, and my brother, Elias. I can understand that talking about scientific subjects is not always as interesting for them as it is for me. Nonetheless, they always remained committed to my interests and gave me every opportunity to pursue my passion. I would also like to thank all my friends, and especially my love, Lisa, for the indispensable entertainment and love they have brought me.

Permission to loan

The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the copyright terms have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.

June 2, 2017

Abstract

We consider the problem of eliciting an imprecise probability model from an expert in the field. When doing this, one can distinguish between two different, yet related goals. The first is to construct an imprecise probability model that captures the expert's beliefs as completely as possible. The second is to gather information that is aimed specifically at answering a given question or solving a given decision problem. We here focus on the latter. The mathematical framework in which we study this problem is that of sets of desirable gambles. Questions with regard to probabilities can be translated to the desirability of gambles. For instance, a positive answer to the question "Do you think A is more likely to occur than B?" is equivalent to a specific gamble being desirable. We consider the problem where consecutive questions are chosen from a limited set of possible questions in an optimal way, such that the decision problem can be solved as quickly as possible. The decision problem will typically be formulated as making a choice between two gambles, or more generally, choosing one gamble out of a set of gambles. First, we make an abstraction of the problem by simplifying the kinds of answers an expert can give. The resulting problems are closely related to the study of closed convex cones, and provide theoretical insight that is useful in tackling more general cases. For these more complex problems, we provide a few preliminary ideas and insights. As these problems are not very effectively approached theoretically, we propose heuristic and semi-heuristic methods to solve them. Finally, we give results of simulations using these methods and draw conclusions. We also discuss possible approaches for similar problems, together with some ideas for future research.

Active Elicitation of Imprecise Probability Models

Natan T'Joens

Supervisors: Prof. dr. ir. Gert De Cooman, Dr. ir. Jasper De Bock. Counsellor: Ir. Arthur Van Camp

Abstract: We deal with the problem of eliciting an imprecise probability model in order to solve a decision problem. This is done by asking targeted questions to an expert in the field. The problem is considered in the framework of sets of desirable gambles. An abstract simplification of the problem is approached theoretically. The general problem is dealt with by heuristic and semi-heuristic methods.

Keywords: elicitation, desirable gambles, heuristics, decision making under uncertainty

I. Introduction

When faced with the problem of making a decision whose utility depends on the uncertain outcome of an experiment, we often lack sufficient knowledge for making such a decision. We will then be obliged to gather more information about the outcome of the experiment. Concretely, we consider the problem of asking targeted questions to an expert on the subject, so that we can subsequently update our imprecise probability model in order to solve the given decision problem. We call this targeted updating of an imprecise probability model the active elicitation of the imprecise probability model. One could also elicit an imprecise probability model with the goal of capturing the expert's beliefs as completely as possible; this is a different problem that we will not consider here.

To tackle our problem, we will use the mathematical framework of sets of desirable gambles to model imprecise probabilities. This framework is chosen because it is elegant to work with when continuous changes are made to the probability model. However, two other frameworks for modelling imprecise probabilities, credal sets and lower previsions, are more closely related to classical probability theory. Therefore, and because we want to convince the reader of the relevance of imprecise probabilities, the first chapter will discuss these two frameworks and how the transition from precise to imprecise probabilities can be made. The second chapter will discuss sets of desirable gambles in a broad sense. In the third chapter we will consider an abstract simplification of the general problem. Chapter 4 will consider the general problem and the use of heuristic and semi-heuristic methods to solve it.

II. Introduction to Imprecise Probabilities

The theory of imprecise probabilities allows for a broader and more complete modelling of a person's beliefs. In this theory, beliefs about the uncertain outcome of an experiment are not modelled by one unique probability measure, as is done in classical probability theory, but by a set of them. When asking a person's belief about rolling a six with a fair die, he will probably say the probability is 1/6. This is because it is generally assumed that a fair die falls on each of its sides approximately 1/6 of the time. However, in most cases experiments are more complex than just rolling a die. Then it is not so obvious anymore to state one specific probability measure. For instance, assume we ask a person's belief about the weather next week. He estimates the probability that it is going to rain to be 50%. If we were then to ask him whether he would also agree with the statement that the probability is 49%, he would probably say yes. We could repeat this argument. We then end up with a set of possible probabilities according to the person's beliefs about the event that it will rain next week. So why did he state 50% in the beginning? Usually, this will be some kind of averaged-out value of all the probabilities he deems possible. This statement is however too precise and does not represent his actual beliefs. It could lead to premature decisions that are not justified.

A set of possible mass functions p, or linear previsions P (these are the corresponding expectation operators), is called a credal set and is denoted by M. Although credal sets are intuitive to work with, they are not very practical, as they are sets of functionals. A lower prevision associated with a closed convex credal set M captures all information available from the credal set in one functional. For any gamble f (a real-valued map on the possibility space Ω), the lower prevision is defined as:

$\underline{P}_{\mathcal{M}}(f) := \inf\{P(f) : P \in \mathcal{M}\}$

A lower prevision can also be defined on its own, without a credal set. When it is derived from a credal set, it is always coherent. Coherence is the mathematical translation of our common sense about probabilities. It imposes three criteria that a coherent lower prevision is bound to. It allows us to work with these lower previsions efficiently and will always be assumed.

III. Sets of Desirable Gambles

Imprecise probabilities can also be modelled in the framework of sets of desirable gambles. We collect all gambles in L(Ω) and call a gamble desirable when we strictly prefer it to the status quo [2]. An assessment of desirable gambles is denoted by A ⊆ L(Ω). Coherence can also be defined for sets of desirable gambles, and follows from the same rationality criteria as for lower previsions. By taking the so-called natural extension E(A) of an assessment A, the assessment A can be extended to a coherent set. Generally, there exist multiple coherent extensions, but the natural extension is used because it is the least committal.
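As a minimal illustration of these concepts (a sketch of ours, not part of the thesis text): for a finite possibility space and a finite assessment A, checking whether a gamble belongs to the natural extension reduces, ignoring borderline cases, to a linear feasibility problem. The sketch below assumes numpy and scipy are available; all function names are ours.

    # Minimal sketch: ignoring borderline cases, f lies in the natural
    # extension E(A) of a finite assessment A if f dominates some
    # non-negative combination of the assessed gambles, i.e. if there
    # exist lambda_i >= 0 with sum_i lambda_i * g_i <= f pointwise.
    import numpy as np
    from scipy.optimize import linprog

    def in_natural_extension(f, assessment):
        """Feasibility check: is f >= sum_i lambda_i g_i for some lambda >= 0?"""
        G = np.column_stack(assessment)  # one column per assessed gamble g_i
        res = linprog(np.zeros(G.shape[1]),   # any feasible point will do
                      A_ub=G, b_ub=f,         # (G @ lam)(omega) <= f(omega)
                      bounds=[(0, None)] * G.shape[1])
        return res.status == 0               # status 0: a feasible lambda exists

    # Toy example on Omega = {H, T}: the expert states (1, -1) is desirable.
    A = [np.array([1.0, -1.0])]
    print(in_natural_extension(np.array([2.0, -1.0]), A))  # True: dominates g
    print(in_natural_extension(np.array([-1.0, 0.5]), A))  # False

The same feasibility check, applied to the differences f_1 - f_2 and f_2 - f_1, is the basic operation behind the decision problems discussed next.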

The lower prevision can also be equivalently defined starting from the framework of desirability. For any gamble f ∈ L(Ω), the lower prevision associated with a coherent set of desirable gambles D is defined as:

$\underline{P}_{\mathcal{D}}(f) := \sup\{\alpha \in \mathbb{R} : f - \alpha \in \mathcal{D}\}$

As questions with regard to lower previsions or credal sets can equivalently be translated in terms of the desirability of gambles, the framework of sets of desirable gambles does not impose any limits on the practical relevance of the problem we consider. Furthermore, the framework of sets of desirable gambles is appealing to work with when eliciting a probability model, because it reduces to the geometrical study of convex cones. For this, inspiration was found in the CONEstrip algorithm [1]. We also discuss some methods for decision making when dealing with imprecise probabilities. More precisely, we will limit ourselves to decision problems that can be reformulated as choosing one or more optimal gambles from a given set F.

IV. An Abstract Simplification of Active Elicitation of Imprecise Probabilities

We consider the problem where the expert can only make a positive statement about the desirability of a gamble. This abstract assumption allows a theoretical approach to the problem, which proves useful for gaining insight into the general problem. We search for the minimal subsets V′ of a set of gambles V (which represents the set of possible questions) such that, when the gambles in V′ are stated desirable, the set of desirable gambles remains coherent and is expanded enough to allow us to make a decision on choosing an optimal gamble in F.

First, we consider the case where F has two elements f_1 and f_2. The decision problem then comes down to checking whether f_1 − f_2 or f_2 − f_1 lies in the set of desirable gambles, which is a geometrical problem, closely related to the theory of convex cones. When both are included, the set of desirable gambles is incoherent. An efficient algorithm was developed for this simplified problem, where the initial exponential complexity in |V| could be reduced to a polynomial complexity in |V| by using Carathéodory's theorem [4, Section 17]. However, in the general case, where F consists of more than two elements, the complexity remains exponential in |V|. In realistic situations, V will usually be large, so this complexity will be a deciding factor for the practical feasibility of the algorithm.

V. Simulations, Heuristics and Semi-heuristics

To be practically relevant, we have to allow statements about the desirability of a gamble other than considering it desirable. To this end, multiple generalisations with different kinds of statements can be considered. We choose to work in a framework where only a positive, a negative and a neutral statement can be made. A positive statement corresponds to the desirability of the considered gamble, a negative statement corresponds to the desirability of the negative of the considered gamble, and a neutral statement does not tell us anything. To limit the complexity of the problem, we only consider the case where F consists of two elements. Contrary to the previous chapter, we now have to interact with the expert: we ask questions depending on the previous answers given, so uncertainty must be introduced into the algorithm. Because of this, it is difficult to compute some kind of globally optimal solution, as it would be necessary to consider all possible combinations of answers. Since this is most likely an unmanageable task, we consider heuristic methods. These methods are computationally very efficient, but will in most cases result in a suboptimal solution. In order to improve optimality, a preliminary semi-heuristic algorithm is proposed, based on our knowledge from Chapter 4. However, a limited number of simulations indicate that its performance is not really improved with respect to the heuristic methods.

VI. Conclusions

The problem of actively eliciting an imprecise probability model in a somewhat realistic context appears to be very complex. The practical feasibility of finding an optimal solution is heavily dependent on the statements an expert can make. Variants of the problem that incorporate a different kind of statement-based model, for example the accept-reject statement-based model [3], were left unconsidered in this work. These could be interesting to study in the future. However, our intuition is strong that they will share the same issues. Further work on the heuristic methods could prove useful as well, as we will be confined to them when dealing with realistic problems. The proposed semi-heuristic method is not completely written off, however, as more simulations would be necessary to make such a statement. We believe that this idea still has much potential and that many improvements remain possible.

Acknowledgements

The author would like to acknowledge the help and exceptional guidance that Gert De Cooman, Jasper De Bock and Arthur Van Camp have given him during his research.

References
[1] Erik Quaeghebeur. The CONEstrip algorithm. In Synergies of Soft Computing and Statistics for Intelligent Data Analysis. Springer, Berlin, 2013.
[2] Erik Quaeghebeur. Desirability. In Thomas Augustin, Frank P. A. Coolen, Gert de Cooman, and Matthias C. M. Troffaes, editors, Introduction to Imprecise Probabilities. John Wiley & Sons, 2014.
[3] Erik Quaeghebeur, Gert de Cooman, and Filip Hermans. Accept & reject statement-based uncertainty models. International Journal of Approximate Reasoning, 57:69-102, 2015.
[4] Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.

Contents

1 Introduction

2 Introduction to Imprecise Probabilities
  2.1 Orientation and why we use imprecise probabilities
  2.2 Linear previsions
    2.2.1 Equivalence with mass functions
    2.2.2 A geometric interpretation
  2.3 Credal sets & lower previsions
  2.4 Coherence

3 Sets of Desirable Gambles
  3.1 Introduction
  3.2 Coherence for sets of desirable gambles
    3.2.1 Rationality criteria
    3.2.2 Natural extension
  3.3 Correspondence with lower previsions and credal sets
    3.3.1 Back to linear previsions
    3.3.2 Credal sets
    3.3.3 To lower previsions
    3.3.4 From lower previsions
  3.4 Preference orders and decision making under uncertainty
    3.4.1 Preference orders based on sets of desirable gambles
    3.4.2 Why we need decision making
    3.4.3 Decision making for imprecise probability models
    3.4.4 Accept-reject statements
  3.5 Convex cones and the CONEstrip algorithm
    3.5.1 Introduction to convex cones
    3.5.2 Computational aspects of working with convex cones

4 Abstract Simplification of Active Elicitation of Imprecise Probability Models
  4.1 Introduction
  4.2 The basic problem
    4.2.1 Formulation of the problem
    4.2.2 Simplification using Carathéodory's theorem
    4.2.3 Complexity
  4.3 An extended problem
    4.3.1 Formulation of the problem
    4.3.2 Idea behind the algorithm
    4.3.3 Complexity

5 Simulations, Heuristics and Semi-heuristics
  5.1 To a broader framework
  5.2 A benchmark & Heuristics
    5.2.1 Computational aspects of the simulations
    5.2.2 A benchmark
    5.2.3 Heuristics
  5.3 Towards a more optimal approach
    5.3.1 Considerations regarding previous approaches
    5.3.2 A semi-heuristic

6 Conclusions

A Results simulations
  A.1 Heuristics
  A.2 Stability of the results

B Python code
  B.1 Create a list of V's and G's
  B.2 Simulation of the max( )-heuristic
  B.3 Simulation of the semi-heuristic

Chapter 1

Introduction

Probability theory has been a well-established theory for a long time and is currently applied in almost every state-of-the-art technology. More precisely, it is precise probability theory that has been studied exhaustively, with successful results. Another variant of probability theory, known as imprecise probability theory, is less well known, but has a lot of potential for future use in practice. In particular, imprecise probabilities have the potential to represent the beliefs of a person in a more complete and versatile way than classical precise probabilities can. In fact, precise probabilities can be seen as a special case of the imprecise variant. In precise probability theory we assume to work with only one probability measure, whereas the imprecise theory deals with a set of possible probability measures. In that way, when asked about his beliefs about the outcome of an experiment, a person does not have to state one specific probability measure. By taking this into account, we prevent making premature decisions with regard to the experiment's outcome.

Although a considerable amount of work has been done on imprecise probabilities, little is known about how such an imprecise model can be elicited efficiently. We here consider the problem where the model is built up by asking targeted questions to an expert in the field. In this way, we gain information about his beliefs on the subject and the model becomes more precise. This targeted questioning of an expert in order to model his beliefs is called active elicitation of the imprecise probability model.

As the theory of imprecise probabilities is not common knowledge, the second chapter (counting this introduction as the first chapter) gives a short explanation of how the gap between precise and imprecise probabilities can be bridged. In the third chapter, we elaborate on how the mathematical framework of sets of desirable gambles can be used to model imprecise probabilities, as we will consider our problems in this framework. The fourth chapter considers an abstract simplification of the more general elicitation problem that we aim to solve. The fifth chapter then proposes different methods to approach the general problem, and presents and discusses some simulations and results. Both chapters four and five consider the elicitation of an imprecise probability model in order to solve a given target question. This differs from the problem where elicitation is done to model a person's beliefs as completely and efficiently as possible. There is, however, a strong link between both problems, and the principal ideas from this work could inspire the reader on how to approach this other problem.

Chapter 2

Introduction to Imprecise Probabilities

2.1 Orientation and why we use imprecise probabilities

Uncertainty can be modelled in many ways. The most common approach is to assign a probability to uncertain things. This so-called probability is a way to represent our belief or knowledge about something which is uncertain. This something can be the outcome of an experiment which has yet to take place, such as rolling a die, or some kind of information which we do not exactly know because of a lack of resources. An example of the second case is a die that has already been rolled, but whose outcome we cannot see yet. We will refer to this whole of actions or occurrences that has an uncertain outcome as the experiment. The experiment has multiple possible outcomes (otherwise we would be certain of the outcome), which we will refer to as elementary events. The occurrence of an event is theoretically equivalent to the case where all other events are deemed impossible. Indeed, when all possibilities but one are deemed impossible, we are certain of the outcome of the experiment. Certainty corresponds to a probability of 1; impossibility corresponds to a probability of 0. The probability we associate with an event is then defined as the quantitative representation of our belief towards the occurrence of this event, scaled linearly between 0 and 1, where 0 corresponds to impossibility and 1 corresponds to certainty. The sum of the probabilities of the possible outcomes should be equal to 1. This ensures that, when all possibilities but one are deemed impossible, the one remaining is certain to occur.

Notationally, the uncertain outcome of an experiment will be denoted as X, and is called the state variable. The set of all elementary events of an experiment is called the possibility space, denoted as Ω = {ω₁, ω₂, ...}. The state variable X takes a value in Ω. An event is denoted as A, and is a subset of Ω. Elementary events are mutually exclusive: they are singletons in Ω, whereas two events A₁ and A₂ can in general overlap. Events are in general combinations of elementary events; for instance, an event A can be the occurrence of ω₁ or ω₂. When the experiment is executed with an outcome ω, or when we are certain of the elementary event {ω}, the state variable becomes equal to that value: X = ω. The probability of an event is denoted as p(A). A = ∅ is the (trivially) impossible event, with probability p(∅) = 0, whereas A = Ω is the certain event, with probability p(Ω) = 1.

To illustrate this: when tossing a coin, the possibility space is Ω = {heads, tails}. X becomes equal to either heads or tails when the coin is tossed. When a fair coin is considered, we can say that both outcomes are equally likely, which would mean p(X = heads) = p(X = tails), and as they need to sum to one, we have that p(X = heads) = 1/2 and p(X = tails) = 1/2.

The function that maps every elementary event to its probability of occurrence will be called the mass function p. As follows from the above, it has the following properties:

A1. $p(\omega) \geq 0$ for all $\omega \in \Omega$
A2. $\sum_{\omega \in \Omega} p(\omega) = 1$

From this mass function p, the probability of any event A can be calculated by summing the probabilities of the composing elementary events. For the case where the possibility space has an uncountably infinite number of elements, for example when Ω is continuous, the mass function is replaced by a probability density function. We can then no longer associate probabilities larger than zero with the occurrence of elementary events, and the sum is replaced by an integral. Countably infinite possibility spaces are also somewhat special, and will not be considered either. In the remainder of this thesis we will only consider finite possibility spaces.

The concept of probability is open to interpretation. The consequence of this is that there exist many approaches with different nuances. We can make a distinction between two major approaches: the subjective approach and the frequentist approach. The subjective approach, as considered above, considers the probability of an event as the representation of a person's belief towards how likely it is to occur. Indeed, this has nothing to do with a physically or mathematically defined value. It depends on the person's individual experience, knowledge, character, and so on, with respect to the subject. The frequentist approach, on the other hand, says that the relative frequency of the occurrence of an event converges to the probability of occurrence of that event when the experiment is repeated in a similar way. You could say that this definition is much more strict and mathematically thorough, but there are multiple problems with this approach. In many cases the nature of the experiment does not allow it to be repeated in a similar fashion. For instance: what is the probability of that bridge collapsing under the current weather conditions? This experiment can't be repeated multiple times, unless you would always wait for the same weather conditions and would build a new identical bridge when the previous one collapsed, which would be quite an impractical approach. The measurements will thus be too few to give an accurate representation of the probability. Then again, from which point on is it representative? Another question that comes to mind is: what does "repeated in a similar way" exactly mean? The experiments can't be exactly the same, otherwise they would always have the same outcome. They also can't be too different, otherwise they are not a good representation of the proposed experiment you want to model. In the example, considering another bridge is in many ways irrelevant with regard to our proposed bridge. So, where the frequentist approach may seem theoretically more appealing to some people, it has a lot of technical issues to be dealt with. For this reason we choose to adopt the subjective approach in this thesis. We will therefore always assume that the subjective probabilities are obtained from a relevant source, such as an expert in the field. We don't say that the probabilities given by this source are the correct ones (as these don't really exist); we just say they are relevant and should be considered when they are available.
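Before moving on, a small concrete sketch of the mass-function properties A1 and A2 above (ours, not part of the thesis text; written in Python, the language of the code in Appendix B, with function names of our own choosing):

    # Minimal sketch: a finite mass function as a dictionary, with the
    # two defining properties A1 (non-negativity) and A2 (normalisation),
    # and the probability of an event obtained by summing over its elements.

    def is_mass_function(p, tol=1e-9):
        """Check properties A1 and A2 for a mass function on a finite space."""
        nonnegative = all(prob >= 0 for prob in p.values())   # A1
        normalised = abs(sum(p.values()) - 1.0) <= tol        # A2
        return nonnegative and normalised

    def probability(p, event):
        """p(A): sum the masses of the elementary events composing A."""
        return sum(p[omega] for omega in event)

    # The fair-coin example from the text.
    p = {"heads": 0.5, "tails": 0.5}
    assert is_mass_function(p)
    print(probability(p, {"heads"}))            # 0.5
    print(probability(p, {"heads", "tails"}))   # 1.0: the certain event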

An obvious thought which comes with the subjective approach is the following. We want to translate a person's beliefs into a quantitative measure. So if you were to ask for a person's belief towards the occurrence of an event, he would give you a percentage which he deems representative of his belief about the probability of that event. But our beliefs about a probability do not work in this strict way. For example, suppose a sports journalist has to give the probability of Tom Boonen winning the Tour of Flanders next week. Based on his experience over the past ten years, the current state of the competitors and the weather forecast for next week, the journalist gives it a probability of 10%. Based on this opinion, you bet an amount of money on the glory of Tom Boonen next week. But what if you could ask whether the journalist is sure that the probability is 10% and not, for instance, 9%? He will probably also agree with this 9%. This may change your bet to some lower amount, or it may even result in no bet at all. Stating that the probability is equal to 10% is probably a good average, but it is too specific as a representation of the journalist's beliefs. In most cases, except for simple experiments, giving a precise probability for events is not justified, as it is just some kind of intermediate value of all the probabilities we deem possible. It is not a precise, unique measure of the person's beliefs.

Imprecise probabilities take this imprecision into account: they do not work with one probability measure, but with a set of possible probabilities. It is thus a more general and complex approach, but it provides a more robust and safe way of handling and deciding. The theory of imprecise probabilities is an extension of classical probability theory. In classical probability theory we characterise an experiment by one unique mass function (or, in the continuous case, a probability density function). In the theory of imprecise probabilities this characterisation is replaced by a set of possible mass functions. This allows for a lot more flexibility, and is almost always more representative of reality. For instance, suppose that the only knowledge you have is that tomorrow it is more likely to be sunny than it is to be rainy. The set of all mass functions for which the probability of a sunny day is larger than the probability of a rainy day is then appropriate to model this knowledge. Stating just one of these mass functions would be too determinative: it could lead to a decision based on information that is not really available. The set of all these possible mass functions will be defined as the credal set M.

The meaning of this credal set is even broader than might be concluded from the previous examples. It can well be that a certain event cannot be more accurately specified than by an interval of probabilities, because of the very nature of the physical experiment. Although very interesting, this is more of a philosophical discussion, which we won't dive into in this thesis.

Although the use of credal sets is an intuitive way to model imprecise probability, there are also other approaches, which are sometimes practically more appealing to work with. In this thesis we will mainly focus on the framework where imprecise probabilities are modelled with sets of desirable gambles. In this approach we define functions, called gambles, on the possibility space and say something about their desirability. Yet another approach uses a functional on these gambles, called the lower/upper prevision, to describe the degree to which they are desirable or not. In the next two sections, the transition will be made from classical probability theory to the theory of imprecise probabilities by explaining the frameworks of credal sets and lower/upper previsions, as these are more intuitive. In the next chapter we will focus on the framework of desirable gambles and its similarities to and differences from the other frameworks.
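To make the sunny/rainy example above concrete, here is a toy sketch of ours (not from the thesis) that approximates the credal set on Ω = {sunny, rainy} by gridding the mass functions and keeping those compatible with the assessment:

    # Minimal sketch: approximate the credal set for the statement
    # "sunny is more likely than rainy" on Omega = {sunny, rainy}.
    # A mass function here is the pair (p_sunny, 1 - p_sunny), so the
    # credal set is {(p, 1 - p) : p > 1/2}, a subset of the simplex.

    n = 10  # grid resolution (an assumption of this sketch)
    credal_set = [
        (p, 1 - p)
        for p in (i / n for i in range(n + 1))
        if p > 1 - p  # the assessment: p(sunny) > p(rainy)
    ]
    print(credal_set)  # five mass functions, p(sunny) in {0.6, 0.7, 0.8, 0.9, 1.0}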

2.2 Linear previsions

2.2.1 Equivalence with mass functions

The most intuitive way to change our mindset from traditional probability theory to imprecise probability theory is by means of credal sets. These define, as already introduced, a set of possible mass functions on the possibility space. This arises when no precise mass function can be elicited, or when different mass functions are possible according to different sources. To bridge the gap with precise probability, we first use linear previsions to describe these precise probabilistic models.

Before explaining linear previsions, however, we should introduce the concept of gambles. Gambles are real-valued functions on the possibility space. We interpret a gamble f as an uncertain payoff that depends on the uncertain outcome of the experiment. It associates with every possible outcome of the experiment a (possibly negative) quantity or payoff. Here, as you would expect, positive payoffs are desirable, and negative ones are undesirable. In this way you could think of it as an amount of money you receive (or have to pay). The set of all gambles on Ω is denoted by L(Ω). An indicator I_A, corresponding to an event A, is defined as the gamble:

$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise} \end{cases}$ for all $\omega$ in $\Omega$

We will focus more on gambles in the third chapter, about desirable gambles. For now, we define the linear prevision P on L(Ω) as the linear, normed functional associated with the mass function p:

$P(f) = \sum_{\omega \in \Omega} f(\omega)p(\omega)$ for all $f$ in $\mathcal{L}(\Omega)$ (2.1)

It gives a so-called fair price for a gamble f, which is called its prevision P(f), or its expected value according to the precise probability model p. Indeed, every payoff is scaled with the probability of occurrence of the corresponding elementary event, such that the result is a fair price for the gamble f. The definition is determined by the corresponding mass function that is used. In this way, there is a one-to-one correspondence between mass functions and linear previsions. When the mass function on every elementary event {ω} is known, we can construct the linear prevision for every gamble f in L(Ω). Conversely, when the linear prevision on every indicator I_ω := I_{ω} for every elementary event {ω} is known, we can compute the mass function p, because P(I_ω) = p(ω). Indeed, the linear prevision P(I_A) of an indicator I_A is the probability of the event A:

$P(I_A) = \sum_{\omega \in \Omega} I_A(\omega)p(\omega) = \sum_{\omega \in A} p(\omega) = p(A)$ (2.2)

In fact, we don't need the linear previsions of the indicators specifically, but just the linear previsions of a set of |Ω| linearly independent gambles. The set of all linear previsions on L(Ω) is denoted as P(Ω).

2.2.2 A geometric interpretation

Although gambles are functions, and linear previsions are functionals, both can be represented by vectors of size |Ω|. For gambles, the elements of the vector representation are the respective payoffs

for every elementary event {ω}. For linear previsions (and also mass functions), the elements are the probabilities of the respective elementary events. Explicitly,

$f = (f(\omega_1), f(\omega_2), \ldots, f(\omega_n))$
$P = (P(I_{\omega_1}), P(I_{\omega_2}), \ldots, P(I_{\omega_n}))$

with n = |Ω|. The prevision P(f) is then the scalar product of the vector representations of the gamble f and the prevision P.

At this point, it is useful to also give a visual representation of gambles and linear previsions. This will only be possible in the case where Ω has 2 or 3 elements. For the two-dimensional case, we can consider the experiment of tossing a coin. Thus, Ω = {H, T} and L(Ω) consists of all two-dimensional vectors, where the first element represents the payoff when the experiment's outcome is heads, and the second element the payoff corresponding with tails. So, L(Ω) is isomorphic with R². Gambles in the first quadrant always give a nonnegative payoff, no matter the outcome. This makes them desirable to have or receive, because they will never diminish your wealth, and will in some cases increase it. Gambles in the third quadrant always give a nonpositive payoff, which makes them not desirable to have, as wealth is never increased and in some cases decreased (see Figure 2.1). This rational sense about the desirability of gambles is the basis for the framework of sets of desirable gambles in Chapter 3.

Figure 2.1: The gamble (3/2, 3/10) gives a payoff of 3/2 when the outcome is heads, and 3/10 when it is tails. For the gamble (-6/5, -3/2), we have to pay 6/5 on heads, and 3/2 on tails.

The linear previsions of this experiment can also be represented in the R²-plane, but with the additional constraint that the elements of the vector should sum to one. Linear prevision representations are thus always located on the sum-one plane in R^|Ω|. Because of this, the representation of linear previsions can be done with one dimension less compared to the representation of gambles. In two dimensions, all possible linear previsions lie on the straight line connecting (1, 0) and (0, 1). (1, 0) is the linear prevision which corresponds to the certainty of always tossing heads; (0, 1) corresponds to always tossing tails. Every point (x, y) in between is a convex combination of these two extrema, meaning (x, y) = θ(1, 0) + (1 - θ)(0, 1) for some θ in [0, 1]. These are all possible linear previsions because:

$x = \theta \geq 0$
$y = (1 - \theta) \geq 0$
$x + y = 1$

In other words, P(Ω) can be represented by the line segment between (1, 0) and (0, 1) (see Figure 2.2). In general, P(Ω) is a closed convex set, whose extreme points are the degenerate previsions. There is one degenerate prevision associated with every elementary event ω, denoted as P_ω.

Figure 2.2: P({H, T}) can be represented in one dimension. The probability of one elementary event automatically tells you the probability of the other, because $\sum_{\omega \in \Omega} p(\omega) = 1$.

P_ω is the prevision which ensures the occurrence of the event ω. Mathematically, it is defined as P_ω(f) = f(ω) for every gamble f. This corresponds with the mass function for which p(ω) = 1, and zero for all other elementary events. We can write P(Ω), with n = |Ω|, as

$\mathcal{P}(\Omega) := \operatorname{conv}\{P_\omega : \omega \in \Omega\} := \left\{\theta_1 P_{\omega_1} + \theta_2 P_{\omega_2} + \cdots + \theta_n P_{\omega_n} : \sum_{i=1}^{n} \theta_i = 1 \text{ and } \theta_i \geq 0 \text{ for all } i\right\}$

For a three-dimensional case, we could take the example of drawing a coloured ball from an urn. There are three kinds of balls in the urn: red ones (R), green ones (G), and blue ones (B). The possibility space is now Ω = {R, G, B}. A gamble is then represented as a 3-dimensional vector, whose first element is the payoff when drawing a red ball, the second element the payoff when drawing a green one, and the third the payoff when drawing a blue one. The set of all possible gambles L({R, G, B}) is thus isomorphic with R³. In this case we can't represent the gambles anymore, but we can represent the linear previsions. Again, P({R, G, B}) can be represented as the closed convex set spanned by (1, 0, 0), (0, 1, 0) and (0, 0, 1). This is an equilateral triangle (see Figure 2.3). For a general dimension n = |Ω|, P(Ω) can be represented by a regular (n - 1)-simplex, known as the probability simplex. A k-simplex is defined as the convex hull of its k + 1 vertices. So the probability simplex can be defined as the set

$\{p \in \mathbb{R}^{\Omega} : p(\omega) \geq 0 \text{ for all } \omega \in \Omega \text{ and } \sum_{\omega \in \Omega} p(\omega) = 1\}$

In one dimension, this is a line segment; in two dimensions, it is a triangle; and in three dimensions, it is a tetrahedron. It is regular because the extreme points of the simplex are the unit vectors, so all edges have length √2. So in the 4-dimensional case, where we can't visually present or even imagine gambles anymore, the set of all linear previsions will be a regular tetrahedron.

Figure 2.3: Geometric representation of P({R, G, B}). The corresponding probabilities are represented by the distances of the considered prevision to the edges of the simplex.

We emphasize this geometrical representation because we will be using it in the next sections

as a helpful tool to grasp the ideas behind imprecise probabilities. We will often be using the 3-dimensional representation instead of the 2-dimensional one, because it allows for some more complex and interesting situations. It will also prove useful for Chapter 4, where the considered problem will be closely related to the study of closed convex cones and simplices.

2.3 Credal sets & lower previsions

As explained earlier, in the case of imprecise probabilities we consider a credal set M that consists of multiple mass functions. As every mass function p corresponds to a linear prevision P_p, the credal set M will from now on be represented by a set of linear previsions instead of mass functions. Generally, we will assume that the credal set M can be any set of linear previsions. Most conventions, however, assume credal sets to be closed and convex. The assumption of convexity is the mathematical interpretation of the fact that when multiple mass functions are considered possible, the intermediate mass functions should also be possible mass functions. For example, suppose we have two possible mass functions for a two-dimensional experiment: (p(A) = 40%, p(B) = 60%) and (p(A) = 30%, p(B) = 70%). Then it seems rational to assume that every mass function with 30% ≤ p(A) ≤ 40% (or equivalently 60% ≤ p(B) ≤ 70%) is also possible. So, when defining a convex credal set M, giving the boundaries should be enough.

However, we should allow for the possibility of non-convex sets. Take, for example, a three-dimensional experiment about which the following statement is made: "The probability of A is the largest or the probability of C is the largest." From the first part of the statement it follows that the mass function p₁ with (p₁(A) = 60%, p₁(B) = 40%, p₁(C) = 0%) should be possible, and from the second part it follows that the mass function p₂ with (p₂(A) = 0%, p₂(B) = 40%, p₂(C) = 60%) should be possible. Because of convexity, p₃ = 1/2 p₁ + 1/2 p₂ with (p₃(A) = 30%, p₃(B) = 40%, p₃(C) = 30%) should then also be a possible mass function. As a result, when stating that A or C has the largest probability, adopting convexity implicitly implies that it is also possible that B has the largest probability. We should ask ourselves whether we can indeed do this or not. Convexity is thus not as trivial as it seems. The assumption of closedness is more an assumption of convenience. In practice, it matters little whether the boundaries of the credal set are closed or open. So, for ease, we could assume closedness of the credal set.

Although credal sets have an intuitive feel, they are not very practical to work with mathematically. To draw conclusions or make decisions based on credal sets, we would be obliged to look at every single prevision in the credal set M. This is impractical, because M will generally include an infinite number of previsions. We need a more efficient approach which captures the information about the imprecise probability model more elegantly. One way is by using lower or upper previsions. The lower prevision $\underline{P}_{\mathcal{M}}$, associated with a credal set M, is defined as:

$\underline{P}_{\mathcal{M}}(f) := \inf\{P(f) : P \in \mathcal{M}\}$ for all $f$ in $\mathcal{L}(\Omega)$ (2.3)

We can interpret this lower prevision as the infimum expected value of a gamble f among the expected values of f according to the imprecise probability model. Geometrically, it is the value of the scalar product of the gamble f with the linear prevision that minimises this scalar product. Analogously, we can define the associated upper prevision $\overline{P}_{\mathcal{M}}$ as:

$\overline{P}_{\mathcal{M}}(f) := \sup\{P(f) : P \in \mathcal{M}\}$ for all $f$ in $\mathcal{L}(\Omega)$ (2.4)

In fact, one of the two suffices to describe the probability model, because they can be equivalently transformed into one another:

$\overline{P}_{\mathcal{M}}(f) = \sup\{P(f) : P \in \mathcal{M}\} = -\inf\{-P(f) : P \in \mathcal{M}\} = -\inf\{P(-f) : P \in \mathcal{M}\} = -\underline{P}_{\mathcal{M}}(-f)$ (2.5)

This property is called conjugacy. In the following, only lower previsions will be used. Similarly to linear previsions, the upper and lower previsions of an indicator gamble I_A can be regarded as upper or lower probabilities of the event A. We will use the following notation:

$\underline{P}_{\mathcal{M}}(A) := \underline{P}_{\mathcal{M}}(I_A)$ (2.6)
$\overline{P}_{\mathcal{M}}(A) := \overline{P}_{\mathcal{M}}(I_A) = -\underline{P}_{\mathcal{M}}(-I_A) = -\underline{P}_{\mathcal{M}}(I_{A^c} - 1) = 1 - \underline{P}_{\mathcal{M}}(A^c)$ (2.7)

where A^c is the relative complement of A in Ω: A^c := Ω \ A.

For closed convex credal sets M, the associated lower prevision $\underline{P}_{\mathcal{M}}$ will carry all the information included in M. This follows from the convexity and closedness of M. When M is not closed, the border structure will be lost when converting to lower previsions. For non-convex credal sets, information will also be lost. We can also define a lower prevision $\underline{P}$ on its own. From this, we can trivially derive the associated credal set $\mathcal{M}(\underline{P})$ as:

$\mathcal{M}(\underline{P}) := \{P \in \mathcal{P}(\Omega) : P(f) \geq \underline{P}(f) \text{ for all } f \in \mathcal{L}(\Omega)\}$ (2.8)

It is easy to see that $\mathcal{M} \subseteq \mathcal{M}(\underline{P}_{\mathcal{M}})$. If M is closed and convex, we also have that $\mathcal{M} \supseteq \mathcal{M}(\underline{P}_{\mathcal{M}})$; see [2, Theorem 4.4] for a proof. A credal set $\mathcal{M}(\underline{P})$ constructed out of a lower prevision will always be a closed convex set.

2.4 Coherence

Lower previsions allow the introduction of coherence as a clear concept. Coherence is the mathematical translation of our common sense about probabilities. It ensures that we handle these probabilities in a rational way. It will always be assumed as a property of the imprecise probability model, unless mentioned otherwise. The concept may become even clearer when explained in the context of sets of desirable gambles, in the next chapter. We say that a lower prevision $\underline{P}$ is coherent if and only if it has the following three properties:

C1. $\underline{P}(f) \geq \min f$ for all $f \in \mathcal{L}(\Omega)$ [boundedness]

C2. $\underline{P}(f + g) \geq \underline{P}(f) + \underline{P}(g)$ for all $f, g \in \mathcal{L}(\Omega)$ [super-linearity]

C3. $\underline{P}(\lambda f) = \lambda \underline{P}(f)$ for all $f \in \mathcal{L}(\Omega)$ and all real $\lambda > 0$ [positive homogeneity]

These properties can be interpreted and justified (based on M) as follows:

(C1) As the prevision (or expected value) of a gamble f should be larger than or equal to the minimum payoff of that gamble, this should also hold for the minimum prevision $\underline{P}(f)$. This means that the lowest fair price for a gamble f should be at least as large as its minimum value.

(C2) Consider the lower previsions of two gambles f and g. These are the expected values of these gambles for their respective worst-case mass functions in the credal set M. Therefore, the expected value P(f + g) for every mass function in the credal set M will be equal to or higher than $\underline{P}(f) + \underline{P}(g)$, since these are the minimal values.

(C3) As the utility scale of payoffs is assumed linear, in the sense that twice the payoff is twice as valuable, previsions should scale linearly with a common factor in the gamble's payoffs. So the prevision of a positively scaled gamble should be the scaled prevision. As the gamble λf is a positively scaled version of the gamble f, their minimal expected values correspond to the same linear prevision. As a result, their lower previsions will just be scaled with a factor λ.

These statements immediately explain why a lower prevision deduced from a credal set M is always coherent. A mathematical proof for this can be found in [2, Theorem 4.1]. For a coherent lower prevision, we have the following additional properties:

Proposition. For any coherent lower prevision $\underline{P}$ and $\alpha \in \mathbb{R}$ representing a constant gamble, it holds that

P1. $\min f \leq \underline{P}(f) \leq \overline{P}(f) \leq \max f$
P2. $\underline{P}(\alpha) = \alpha$
P3. $\underline{P}(f + \alpha) = \underline{P}(f) + \alpha$

Proof. (P1) The first inequality follows immediately from coherence (C1). The third inequality follows from the first one applied to $-f$: $\min(-f) \leq \underline{P}(-f)$, so that $\overline{P}(f) = -\underline{P}(-f) \leq -\min(-f) = \max f$. The second follows from the fact that $\underline{P}(0) = 0$: by super-linearity, $0 = \underline{P}(f - f) \geq \underline{P}(f) + \underline{P}(-f) = \underline{P}(f) - \overline{P}(f)$, so $\underline{P}(f) \leq \overline{P}(f)$.

(P2) By (P1), $\alpha = \min \alpha \leq \underline{P}(\alpha) \leq \max \alpha = \alpha$.

(P3) Because of super-linearity and (P2), we have

$\underline{P}(f) = \underline{P}(f + \alpha - \alpha) \geq \underline{P}(f + \alpha) + \underline{P}(-\alpha) = \underline{P}(f + \alpha) - \alpha$

so $\underline{P}(f) + \alpha \geq \underline{P}(f + \alpha)$. Again by super-linearity, we also have $\underline{P}(f + \alpha) \geq \underline{P}(f) + \alpha$, so $\underline{P}(f) + \alpha = \underline{P}(f + \alpha)$.

These properties will be used extensively in the next chapter. Further attention will also be given to coherence in the context of sets of desirable gambles. More on lower previsions can be found in [1, Chapter 2]. To conclude this section, we give an example in the 3-dimensional case.

Example. Consider our previous example of an urn with three kinds of balls in it. Initially, we don't have any information with regard to the balls inside the urn, so M = P({R, G, B}). The game host knows exactly how many balls of each kind there are inside the urn. He tells you the following: "It is at least as likely to draw a red ball as it is to draw a blue ball" and "The probability of drawing a green ball is equal to or more than 40%." How can we translate this information into an imprecise probability model?

Both statements can easily be translated in terms of credal sets. The first statement is equivalent to p(R) ≥ p(B). This halves our credal set: the extreme points become (1, 0, 0), (0, 1, 0) and (1/2, 0, 1/2). The second statement is equivalent to p(G) ≥ 4/10. The resulting extreme points are now (6/10, 4/10, 0), (0, 1, 0) and (3/10, 4/10, 3/10); see Figure 2.4. The credal set is completely defined by these extreme points.

To translate this into lower previsions, we use equation (2.8). Every statement about the credal set M can be translated into a statement about the lower prevision $\underline{P}_{\mathcal{M}}$ on a specific gamble f, of the form

$P(f) = f(R)p(R) + f(G)p(G) + f(B)p(B) \geq \underline{P}_{\mathcal{M}}(f)$

For the first statement, this becomes f₁ = (1, 0, -1) and $\underline{P}_{\mathcal{M}}(f_1) = 0$. For the second, we have f₂ = (0, 1, 0) and $\underline{P}_{\mathcal{M}}(f_2) = 4/10$. In the same way we can convert a statement about the lower prevision $\underline{P}_{\mathcal{M}}$ into a statement about the credal set M. In three dimensions, a lower prevision for a gamble gives rise to the criterion that the credal set lies on one side of a given line. The gamble determines the angle of this line; the value of the lower prevision determines the position or shift of this line.

Figure 2.4: The credal set M with p(R) ≥ p(B) and p(G) ≥ 4/10.
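As an illustrative aside (ours, not part of the thesis text): for a finite possibility space, the lower prevision (2.3) over a credal set described by linear constraints can be computed with off-the-shelf linear programming. The sketch below redoes the urn example numerically, assuming numpy and scipy are available:

    # Minimal sketch: compute lower previsions for the urn example by
    # minimising the expectation P(f) = f . p over the credal set
    #   M = { p in the simplex : p(R) >= p(B), p(G) >= 4/10 }.
    import numpy as np
    from scipy.optimize import linprog

    def lower_prevision(f):
        """inf { f . p : p in M }, computed as a linear programme."""
        # Inequality constraints in the form A_ub @ p <= b_ub, with p = (R, G, B):
        #   p(B) - p(R) <= 0   (at least as likely to draw red as blue)
        #   -p(G) <= -4/10     (probability of green is at least 40%)
        A_ub = np.array([[-1.0, 0.0, 1.0],
                         [0.0, -1.0, 0.0]])
        b_ub = np.array([0.0, -0.4])
        # Equality constraint: the masses sum to one; the bounds give p >= 0.
        A_eq = np.array([[1.0, 1.0, 1.0]])
        b_eq = np.array([1.0])
        res = linprog(f, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * 3)
        return res.fun

    f1 = np.array([1.0, 0.0, -1.0])   # statement 1: p(R) - p(B) >= 0
    f2 = np.array([0.0, 1.0, 0.0])    # statement 2: p(G) >= 4/10
    print(lower_prevision(f1))        # 0.0
    print(lower_prevision(f2))        # 0.4
    # By conjugacy (2.5), the upper prevision is -lower_prevision(-f):
    print(-lower_prevision(-f2))      # 1.0: the upper probability of green

The optima land on the extreme points listed in the example, which is exactly the geometric picture of Figure 2.4: the lower prevision is attained at a vertex of the credal set.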


Probability Year 10. Terminology Probability Year 10 Terminology Probability measures the chance something happens. Formally, we say it measures how likely is the outcome of an event. We write P(result) as a shorthand. An event is some

More information

Countability. 1 Motivation. 2 Counting

Countability. 1 Motivation. 2 Counting Countability 1 Motivation In topology as well as other areas of mathematics, we deal with a lot of infinite sets. However, as we will gradually discover, some infinite sets are bigger than others. Countably

More information

Introduction to Probability. Ariel Yadin. Lecture 1. We begin with an example [this is known as Bertrand s paradox]. *** Nov.

Introduction to Probability. Ariel Yadin. Lecture 1. We begin with an example [this is known as Bertrand s paradox]. *** Nov. Introduction to Probability Ariel Yadin Lecture 1 1. Example: Bertrand s Paradox We begin with an example [this is known as Bertrand s paradox]. *** Nov. 1 *** Question 1.1. Consider a circle of radius

More information

Practical implementation of possibilistic probability mass functions

Practical implementation of possibilistic probability mass functions Practical implementation of possibilistic probability mass functions Leen Gilbert Gert de Cooman Etienne E. Kerre October 12, 2000 Abstract Probability assessments of events are often linguistic in nature.

More information

2. Probability. Chris Piech and Mehran Sahami. Oct 2017

2. Probability. Chris Piech and Mehran Sahami. Oct 2017 2. Probability Chris Piech and Mehran Sahami Oct 2017 1 Introduction It is that time in the quarter (it is still week one) when we get to talk about probability. Again we are going to build up from first

More information

V. Probability. by David M. Lane and Dan Osherson

V. Probability. by David M. Lane and Dan Osherson V. Probability by David M. Lane and Dan Osherson Prerequisites none F.Introduction G.Basic Concepts I.Gamblers Fallacy Simulation K.Binomial Distribution L.Binomial Demonstration M.Base Rates Probability

More information

Probabilistic models

Probabilistic models Probabilistic models Kolmogorov (Andrei Nikolaevich, 1903 1987) put forward an axiomatic system for probability theory. Foundations of the Calculus of Probabilities, published in 1933, immediately became

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten

More information

2. AXIOMATIC PROBABILITY

2. AXIOMATIC PROBABILITY IA Probability Lent Term 2. AXIOMATIC PROBABILITY 2. The axioms The formulation for classical probability in which all outcomes or points in the sample space are equally likely is too restrictive to develop

More information

18.175: Lecture 2 Extension theorems, random variables, distributions

18.175: Lecture 2 Extension theorems, random variables, distributions 18.175: Lecture 2 Extension theorems, random variables, distributions Scott Sheffield MIT Outline Extension theorems Characterizing measures on R d Random variables Outline Extension theorems Characterizing

More information

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ). Connectedness 1 Motivation Connectedness is the sort of topological property that students love. Its definition is intuitive and easy to understand, and it is a powerful tool in proofs of well-known results.

More information

Calculating Bounds on Expected Return and First Passage Times in Finite-State Imprecise Birth-Death Chains

Calculating Bounds on Expected Return and First Passage Times in Finite-State Imprecise Birth-Death Chains 9th International Symposium on Imprecise Probability: Theories Applications, Pescara, Italy, 2015 Calculating Bounds on Expected Return First Passage Times in Finite-State Imprecise Birth-Death Chains

More information

INDEPENDENT NATURAL EXTENSION

INDEPENDENT NATURAL EXTENSION INDEPENDENT NATURAL EXTENSION GERT DE COOMAN, ENRIQUE MIRANDA, AND MARCO ZAFFALON ABSTRACT. There is no unique extension of the standard notion of probabilistic independence to the case where probabilities

More information

Sequence convergence, the weak T-axioms, and first countability

Sequence convergence, the weak T-axioms, and first countability Sequence convergence, the weak T-axioms, and first countability 1 Motivation Up to now we have been mentioning the notion of sequence convergence without actually defining it. So in this section we will

More information

arxiv: v1 [math.pr] 9 Jan 2016

arxiv: v1 [math.pr] 9 Jan 2016 SKLAR S THEOREM IN AN IMPRECISE SETTING IGNACIO MONTES, ENRIQUE MIRANDA, RENATO PELESSONI, AND PAOLO VICIG arxiv:1601.02121v1 [math.pr] 9 Jan 2016 Abstract. Sklar s theorem is an important tool that connects

More information

Probability 1 (MATH 11300) lecture slides

Probability 1 (MATH 11300) lecture slides Probability 1 (MATH 11300) lecture slides Márton Balázs School of Mathematics University of Bristol Autumn, 2015 December 16, 2015 To know... http://www.maths.bris.ac.uk/ mb13434/prob1/ m.balazs@bristol.ac.uk

More information

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur Lecture 1 Real Numbers In these lectures, we are going to study a branch of mathematics called

More information

A VERY BRIEF REVIEW OF MEASURE THEORY

A VERY BRIEF REVIEW OF MEASURE THEORY A VERY BRIEF REVIEW OF MEASURE THEORY A brief philosophical discussion. Measure theory, as much as any branch of mathematics, is an area where it is important to be acquainted with the basic notions and

More information

Dominating countably many forecasts.* T. Seidenfeld, M.J.Schervish, and J.B.Kadane. Carnegie Mellon University. May 5, 2011

Dominating countably many forecasts.* T. Seidenfeld, M.J.Schervish, and J.B.Kadane. Carnegie Mellon University. May 5, 2011 Dominating countably many forecasts.* T. Seidenfeld, M.J.Schervish, and J.B.Kadane. Carnegie Mellon University May 5, 2011 Abstract We contrast de Finetti s two criteria for coherence in settings where

More information

Probabilistic Reasoning

Probabilistic Reasoning Course 16 :198 :520 : Introduction To Artificial Intelligence Lecture 7 Probabilistic Reasoning Abdeslam Boularias Monday, September 28, 2015 1 / 17 Outline We show how to reason and act under uncertainty.

More information

Probability: from intuition to mathematics

Probability: from intuition to mathematics 1 / 28 Probability: from intuition to mathematics Ting-Kam Leonard Wong University of Southern California EPYMT 2017 2 / 28 Probability has a right and a left hand. On the right is the rigorous foundational

More information

Normed and Banach spaces

Normed and Banach spaces Normed and Banach spaces László Erdős Nov 11, 2006 1 Norms We recall that the norm is a function on a vectorspace V, : V R +, satisfying the following properties x + y x + y cx = c x x = 0 x = 0 We always

More information

Sklar s theorem in an imprecise setting

Sklar s theorem in an imprecise setting Sklar s theorem in an imprecise setting Ignacio Montes a,, Enrique Miranda a, Renato Pelessoni b, Paolo Vicig b a University of Oviedo (Spain), Dept. of Statistics and O.R. b University of Trieste (Italy),

More information

Continuum Probability and Sets of Measure Zero

Continuum Probability and Sets of Measure Zero Chapter 3 Continuum Probability and Sets of Measure Zero In this chapter, we provide a motivation for using measure theory as a foundation for probability. It uses the example of random coin tossing to

More information

Chapter 3 : Conditional Probability and Independence

Chapter 3 : Conditional Probability and Independence STAT/MATH 394 A - PROBABILITY I UW Autumn Quarter 2016 Néhémy Lim Chapter 3 : Conditional Probability and Independence 1 Conditional Probabilities How should we modify the probability of an event when

More information

Chapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 14 From Randomness to Probability Copyright 2012, 2008, 2005 Pearson Education, Inc. Dealing with Random Phenomena A random phenomenon is a situation in which we know what outcomes could happen,

More information

Fitting a Straight Line to Data

Fitting a Straight Line to Data Fitting a Straight Line to Data Thanks for your patience. Finally we ll take a shot at real data! The data set in question is baryonic Tully-Fisher data from http://astroweb.cwru.edu/sparc/btfr Lelli2016a.mrt,

More information

Important Concepts Read Chapter 2. Experiments. Phenomena. Probability Models. Unpredictable in detail. Examples

Important Concepts Read Chapter 2. Experiments. Phenomena. Probability Models. Unpredictable in detail. Examples Probability Models Important Concepts Read Chapter 2 Probability Models Examples - The Classical Model - Discrete Spaces Elementary Consequences of the Axioms The Inclusion Exclusion Formulas Some Indiscrete

More information

P (E) = P (A 1 )P (A 2 )... P (A n ).

P (E) = P (A 1 )P (A 2 )... P (A n ). Lecture 9: Conditional probability II: breaking complex events into smaller events, methods to solve probability problems, Bayes rule, law of total probability, Bayes theorem Discrete Structures II (Summer

More information

Review of Basic Probability

Review of Basic Probability Review of Basic Probability Erik G. Learned-Miller Department of Computer Science University of Massachusetts, Amherst Amherst, MA 01003 September 16, 2009 Abstract This document reviews basic discrete

More information

Probabilistic models

Probabilistic models Kolmogorov (Andrei Nikolaevich, 1903 1987) put forward an axiomatic system for probability theory. Foundations of the Calculus of Probabilities, published in 1933, immediately became the definitive formulation

More information

ECE 340 Probabilistic Methods in Engineering M/W 3-4:15. Lecture 2: Random Experiments. Prof. Vince Calhoun

ECE 340 Probabilistic Methods in Engineering M/W 3-4:15. Lecture 2: Random Experiments. Prof. Vince Calhoun ECE 340 Probabilistic Methods in Engineering M/W 3-4:15 Lecture 2: Random Experiments Prof. Vince Calhoun Reading This class: Section 2.1-2.2 Next class: Section 2.3-2.4 Homework: Assignment 1: From the

More information

Expected Utility Framework

Expected Utility Framework Expected Utility Framework Preferences We want to examine the behavior of an individual, called a player, who must choose from among a set of outcomes. Let X be the (finite) set of outcomes with common

More information

P (A B) P ((B C) A) P (B A) = P (B A) + P (C A) P (A) = P (B A) + P (C A) = Q(A) + Q(B).

P (A B) P ((B C) A) P (B A) = P (B A) + P (C A) P (A) = P (B A) + P (C A) = Q(A) + Q(B). Lectures 7-8 jacques@ucsdedu 41 Conditional Probability Let (Ω, F, P ) be a probability space Suppose that we have prior information which leads us to conclude that an event A F occurs Based on this information,

More information

Probability Experiments, Trials, Outcomes, Sample Spaces Example 1 Example 2

Probability Experiments, Trials, Outcomes, Sample Spaces Example 1 Example 2 Probability Probability is the study of uncertain events or outcomes. Games of chance that involve rolling dice or dealing cards are one obvious area of application. However, probability models underlie

More information

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Maximum Entropy Models I Welcome back for the 3rd module

More information

10.3 Random variables and their expectation 301

10.3 Random variables and their expectation 301 10.3 Random variables and their expectation 301 9. For simplicity, assume that the probabilities of the birth of a boy and of a girl are the same (which is not quite so in reality). For a certain family,

More information

Chapter One. The Real Number System

Chapter One. The Real Number System Chapter One. The Real Number System We shall give a quick introduction to the real number system. It is imperative that we know how the set of real numbers behaves in the way that its completeness and

More information

Review of Basic Probability Theory

Review of Basic Probability Theory Review of Basic Probability Theory James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 35 Review of Basic Probability Theory

More information

JUSTIN HARTMANN. F n Σ.

JUSTIN HARTMANN. F n Σ. BROWNIAN MOTION JUSTIN HARTMANN Abstract. This paper begins to explore a rigorous introduction to probability theory using ideas from algebra, measure theory, and other areas. We start with a basic explanation

More information

BIVARIATE P-BOXES AND MAXITIVE FUNCTIONS. Keywords: Uni- and bivariate p-boxes, maxitive functions, focal sets, comonotonicity,

BIVARIATE P-BOXES AND MAXITIVE FUNCTIONS. Keywords: Uni- and bivariate p-boxes, maxitive functions, focal sets, comonotonicity, BIVARIATE P-BOXES AND MAXITIVE FUNCTIONS IGNACIO MONTES AND ENRIQUE MIRANDA Abstract. We give necessary and sufficient conditions for a maxitive function to be the upper probability of a bivariate p-box,

More information

Metric spaces and metrizability

Metric spaces and metrizability 1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively

More information

A Event has occurred

A Event has occurred Statistics and probability: 1-1 1. Probability Event: a possible outcome or set of possible outcomes of an experiment or observation. Typically denoted by a capital letter: A, B etc. E.g. The result of

More information

TEACHING INDEPENDENCE AND EXCHANGEABILITY

TEACHING INDEPENDENCE AND EXCHANGEABILITY TEACHING INDEPENDENCE AND EXCHANGEABILITY Lisbeth K. Cordani Instituto Mauá de Tecnologia, Brasil Sergio Wechsler Universidade de São Paulo, Brasil lisbeth@maua.br Most part of statistical literature,

More information

MATH 521, WEEK 2: Rational and Real Numbers, Ordered Sets, Countable Sets

MATH 521, WEEK 2: Rational and Real Numbers, Ordered Sets, Countable Sets MATH 521, WEEK 2: Rational and Real Numbers, Ordered Sets, Countable Sets 1 Rational and Real Numbers Recall that a number is rational if it can be written in the form a/b where a, b Z and b 0, and a number

More information

A Note On Comparative Probability

A Note On Comparative Probability A Note On Comparative Probability Nick Haverkamp and Moritz Schulz Penultimate draft. Please quote from the published version (Erkenntnis 2012). Abstract A possible event always seems to be more probable

More information

Introduction to Probability

Introduction to Probability Introduction to Probability Gambling at its core 16th century Cardano: Books on Games of Chance First systematic treatment of probability 17th century Chevalier de Mere posed a problem to his friend Pascal.

More information

Probability (Devore Chapter Two)

Probability (Devore Chapter Two) Probability (Devore Chapter Two) 1016-345-01: Probability and Statistics for Engineers Fall 2012 Contents 0 Administrata 2 0.1 Outline....................................... 3 1 Axiomatic Probability 3

More information

CS 361: Probability & Statistics

CS 361: Probability & Statistics February 19, 2018 CS 361: Probability & Statistics Random variables Markov s inequality This theorem says that for any random variable X and any value a, we have A random variable is unlikely to have an

More information

Conservative Inference Rule for Uncertain Reasoning under Incompleteness

Conservative Inference Rule for Uncertain Reasoning under Incompleteness Journal of Artificial Intelligence Research 34 (2009) 757 821 Submitted 11/08; published 04/09 Conservative Inference Rule for Uncertain Reasoning under Incompleteness Marco Zaffalon Galleria 2 IDSIA CH-6928

More information

SPIRITUAL GIFTS. ( ) ( ) 1. Would you describe yourself as an effective public speaker?

SPIRITUAL GIFTS. ( ) ( ) 1. Would you describe yourself as an effective public speaker? SPIRITUAL GIFTS QUESTIONNAIRE: SPIRITUAL GIFTS ( ) ( ) 1. Would you describe yourself as an effective public speaker? ( ) ( ) 2. Do you find it easy and enjoyable to spend time in intense study and research

More information

STA Module 4 Probability Concepts. Rev.F08 1

STA Module 4 Probability Concepts. Rev.F08 1 STA 2023 Module 4 Probability Concepts Rev.F08 1 Learning Objectives Upon completing this module, you should be able to: 1. Compute probabilities for experiments having equally likely outcomes. 2. Interpret

More information

Chapter 2. Conditional Probability and Independence. 2.1 Conditional Probability

Chapter 2. Conditional Probability and Independence. 2.1 Conditional Probability Chapter 2 Conditional Probability and Independence 2.1 Conditional Probability Probability assigns a likelihood to results of experiments that have not yet been conducted. Suppose that the experiment has

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 3 Probability Contents 1. Events, Sample Spaces, and Probability 2. Unions and Intersections 3. Complementary Events 4. The Additive Rule and Mutually Exclusive

More information

Lecture 4 : Conditional Probability and Bayes Theorem 0/ 26

Lecture 4 : Conditional Probability and Bayes Theorem 0/ 26 0/ 26 The conditional sample space Motivating examples 1. Roll a fair die once 1 2 3 S = 4 5 6 Let A = 6 appears B = an even number appears So P(A) = 1 6 P(B) = 1 2 1/ 26 Now what about P ( 6 appears given

More information

Journal of Mathematical Analysis and Applications

Journal of Mathematical Analysis and Applications J. Math. Anal. Appl. 421 (2015) 1042 1080 Contents lists available at ScienceDirect Journal of Mathematical Analysis and Applications www.elsevier.com/locate/jmaa Extreme lower previsions Jasper De Bock,

More information

Imprecise Probabilities in Stochastic Processes and Graphical Models: New Developments

Imprecise Probabilities in Stochastic Processes and Graphical Models: New Developments Imprecise Probabilities in Stochastic Processes and Graphical Models: New Developments Gert de Cooman Ghent University SYSTeMS NLMUA2011, 7th September 2011 IMPRECISE PROBABILITY MODELS Imprecise probability

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 5 Spring 2006

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 5 Spring 2006 Review problems UC Berkeley Department of Electrical Engineering and Computer Science EE 6: Probablity and Random Processes Solutions 5 Spring 006 Problem 5. On any given day your golf score is any integer

More information

RVs and their probability distributions

RVs and their probability distributions RVs and their probability distributions RVs and their probability distributions In these notes, I will use the following notation: The probability distribution (function) on a sample space will be denoted

More information

Bayesian Estimation An Informal Introduction

Bayesian Estimation An Informal Introduction Mary Parker, Bayesian Estimation An Informal Introduction page 1 of 8 Bayesian Estimation An Informal Introduction Example: I take a coin out of my pocket and I want to estimate the probability of heads

More information

n How to represent uncertainty in knowledge? n Which action to choose under uncertainty? q Assume the car does not have a flat tire

n How to represent uncertainty in knowledge? n Which action to choose under uncertainty? q Assume the car does not have a flat tire Uncertainty Uncertainty Russell & Norvig Chapter 13 Let A t be the action of leaving for the airport t minutes before your flight Will A t get you there on time? A purely logical approach either 1. risks

More information

Lecture 5: Introduction to Markov Chains

Lecture 5: Introduction to Markov Chains Lecture 5: Introduction to Markov Chains Winfried Just Department of Mathematics, Ohio University January 24 26, 2018 weather.com light The weather is a stochastic process. For now we can assume that this

More information

Fuzzy Systems. Introduction

Fuzzy Systems. Introduction Fuzzy Systems Introduction Prof. Dr. Rudolf Kruse Christoph Doell {kruse,doell}@iws.cs.uni-magdeburg.de Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge Processing

More information

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER 2 2017/2018 DR. ANTHONY BROWN 5.1. Introduction to Probability. 5. Probability You are probably familiar with the elementary

More information

Formalizing Probability. Choosing the Sample Space. Probability Measures

Formalizing Probability. Choosing the Sample Space. Probability Measures Formalizing Probability Choosing the Sample Space What do we assign probability to? Intuitively, we assign them to possible events (things that might happen, outcomes of an experiment) Formally, we take

More information

Introduction to Probability with MATLAB Spring 2014

Introduction to Probability with MATLAB Spring 2014 Introduction to Probability with MATLAB Spring 2014 Lecture 1 / 12 Jukka Kohonen Department of Mathematics and Statistics University of Helsinki About the course https://wiki.helsinki.fi/display/mathstatkurssit/introduction+to+probability%2c+fall+2013

More information

What is a random variable

What is a random variable OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr

More information

Probability and the Second Law of Thermodynamics

Probability and the Second Law of Thermodynamics Probability and the Second Law of Thermodynamics Stephen R. Addison January 24, 200 Introduction Over the next several class periods we will be reviewing the basic results of probability and relating probability

More information