Decision Making under Uncertainty
Prof. Ahti Salo, Helsinki University of Technology
http://www.dm.hut.fi

Contents
- Meanings of uncertainty
- Interpretations of probability
- Biases in probability elicitation
- Calibration of experts
- Updating of probabilities (discrete and continuous)

Meanings of uncertainty

We frequently make statements about uncertainty:
- "It will rain tomorrow." (subjective probability)
- "The 100,000th decimal of pi is 6." (a fact; the uncertainty lies in the available information)
- "I win the lottery with probability 0.00005." (frequentist or classical probability interpretation)

Uncertainty = an event with an unknown outcome.
Probability = a number for measuring uncertainty.

Interpretations of probability: Classical interpretation

P.S. Laplace (1825): Probability is the ratio of the number of possible outcomes favourable to the event to the total number of possible outcomes, each assumed to be equally likely:

  P(A) = #(A) / #(S)

where #(A) is the number of possible outcomes favourable to A and #(S) is the total number of possible outcomes.

- Circular definition: probability is defined in terms of "equally likely".
- Principle of indifference: events are equally likely if there is no known reason for predicting the occurrence of one event rather than another.
- Each event is defined as a collection of outcomes.
- Example: the probability of getting a 6 when tossing a die is 1/6.
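As a minimal sketch (not from the slides), the classical definition above is just a ratio of counts; the `classical_probability` helper name is illustrative:

```python
from fractions import Fraction

def classical_probability(favourable, sample_space):
    """Classical interpretation: P(A) = #(A) / #(S),
    all outcomes assumed equally likely."""
    return Fraction(len(list(favourable)), len(list(sample_space)))

die = [1, 2, 3, 4, 5, 6]
print(classical_probability([6], die))        # 1/6
print(classical_probability([2, 4, 6], die))  # 1/2
```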
Interpretations of probability: Frequentist interpretation

Leslie Ellis, mid-19th century: Probability is the relative frequency of trials in which the favourable event occurs, as the number of trials approaches infinity:

  P(A) = lim_{n→∞} n(A) / n

where n(A) is the number of times that A occurs and n is the total number of trials.

- Example: you may determine the probability of getting heads by tossing a coin a very large number of times.

Interpretations of probability: Subjective (Bayesian) interpretation

De Finetti (1937): Probability represents an individual's degree of belief in the occurrence of a particular outcome.
- The probability may change, e.g. when additional information is received.
- The event may already have occurred.
- Examples: "I believe there is a 50 % chance that it will rain tomorrow." "I am 15 % sure that Martin Luther King was 34 years old when he died."

Elicitation of subjective probabilities

Prerequisite: the events must be well defined.
Goal: a probability number for every outcome of interest.
One can use:
1) Past evidence
2) Causal models
3) Expert judgement

Elicitation of discrete subjective probabilities
1) Direct assessment: ask the respondent to assign numerical values to events.
2) Betting approach: find a specific amount to win or lose such that the DM is indifferent about which side of the bet to take.
3) Reference lottery: compare the uncertain event to events with known probabilities.
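As a sketch of the betting approach (method 2 above), assuming a risk-neutral respondent, indifference between betting for and against A implies P(A) = Y/(X + Y); the function names below are illustrative:

```python
from math import isclose

def implied_probability(X, Y):
    """Probability of A implied by indifference between betting
    for A (win X if A, lose Y if not A) and against A (lose X if A,
    win Y if not A), assuming a risk-neutral respondent."""
    return Y / (X + Y)

def ev_for(X, Y, p):
    """Expected monetary value of betting for A."""
    return X * p - Y * (1 - p)

def ev_against(X, Y, p):
    """Expected monetary value of betting against A."""
    return -X * p + Y * (1 - p)

p = implied_probability(30, 70)
print(p)  # 0.7
# At the implied probability both bets have equal expected value.
print(isclose(ev_for(30, 70, p), ev_against(30, 70, p), abs_tol=1e-12))  # True
```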
Direct assessment

The respondent is asked to assess the desired probabilities directly. A wheel of fortune (probability wheel) can be used to support the elicitation process. Note that the mode of questioning may affect the results.

Betting approach

Bet 1 (bet for A): win X if A happens, lose Y if A does not happen.
Bet 2 (bet against A): lose X if A happens, win Y if A does not happen.

Adjust X and Y until the respondent is indifferent about which bet to take. At indifference the expected monetary values of the bets must be equal*:

  X P(A) - Y (1 - P(A)) = -X P(A) + Y (1 - P(A))

which gives

  P(A) = Y / (X + Y)

* The respondent is assumed to be concerned with expected monetary value only (i.e. to be risk neutral).

Reference lottery

Lottery: win X if A happens, win Y if A does not happen (X > Y).
Reference lottery: win X with probability p, win Y with probability (1 - p).

The reference lottery can be visualised e.g. with a wheel of fortune, or with a box of red and blue balls from which you want to draw a red one. The probability p is changed by adjusting the wheel or the number of balls. Once the respondent is indifferent about which lottery to participate in, we have

  P(A) = p

The respondent's risk attitude does not affect the result.

The Ellsberg paradox (1961) (1/2)

Balls in the urn: 30 red, 60 blue or yellow in unknown proportion.

Which game would you choose, 1 or 2?
- Game 1: win 1000 if you pick a red ball.
- Game 2: win 1000 if you pick a blue ball.

What about 3 or 4?
- Game 3: win 1000 if you pick a red or yellow ball.
- Game 4: win 1000 if you pick a blue or yellow ball.
The Ellsberg paradox (1961) (2/2)

Most people choose games 1 and 4. But the yellow balls should not matter!

Wins:    Red   Blue  Yellow
Game 1   1000  0     0
Game 2   0     1000  0
Game 3   1000  0     1000
Game 4   0     1000  1000

Ambiguity aversion: people tend to prefer events with known probabilities.

Assessment of continuous subjective probabilities (1/3)

Fractile method. The expert is asked to give
- the feasible range (min, max),
- the median f_50% (i.e. P(X < f_50%) = 0.5),
- other fractiles (e.g. 5 %, 25 %, 75 %, 95 %).

Typical elicitation questions:
- min: "What is the smallest value that you could imagine the variable X obtaining?"
- Median (f_50%): "Give a number f_50% such that, in your opinion, X < f_50% is just as probable as X > f_50%." (i.e. P(X < f_50%) = P(X > f_50%))
- 5 % fractile (f_5%): "Give a number f_5% such that, in your opinion, X < f_5% is just as probable as picking a red ball from an urn with 1 red and 19 blue balls." (i.e. P(X < f_5%) = 0.05)

Assessment of continuous subjective probabilities (2/3)

A cumulative distribution function is obtained e.g. by interpolating the fractiles or by fitting a curve from a specific class of functions. [Figure: elicited fractiles and a fitted cumulative distribution.]

Assessment of continuous subjective probabilities (3/3)

Histogram method. The possible values of the variable are divided into intervals, and the probability of the event corresponding to each interval is assessed. The result is a histogram, i.e. an estimate of the density function. An anchoring bias may occur (see the bias section).
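The interpolation step mentioned above can be sketched as a piecewise-linear CDF through the elicited fractiles; the fractile values below are made up for illustration:

```python
def interpolated_cdf(fractiles):
    """Piecewise-linear cumulative distribution through elicited
    (value, cumulative probability) points: min, fractiles, max."""
    pts = sorted(fractiles)
    def F(x):
        if x <= pts[0][0]:
            return 0.0
        if x >= pts[-1][0]:
            return 1.0
        # Linear interpolation on the segment containing x.
        for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return p0 + (p1 - p0) * (x - x0) / (x1 - x0)
    return F

# Hypothetical elicited fractiles: (value, P(X < value)).
F = interpolated_cdf([(0, 0.0), (2, 0.05), (5, 0.5), (8, 0.95), (10, 1.0)])
print(F(5.0))  # 0.5, the median
print(F(6.5))  # halfway between the 50 % and 95 % fractiles
```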
Example: Histogram method

How much will the GDP (volume) grow in Finland in 2004? (Average growth in 1997-2001 was 4.5 %.)

Growth %   Elicited probability
-5-0       0.10
0-1        0.15
1-2        0.30
2-3        0.20
3-4        0.12
4-5        0.08
5-10       0.05

Scoring rules

With scoring rules, e.g. the compensation of the respondent can be tied to the accuracy of his reply in such a way that he benefits from reporting his true estimate. A scoring rule S(r) is strictly proper if the true opinion of the respondent maximises his expected score:

  E[S(p)] > E[S(r)] for all r ≠ p,

where p = (p_1, p_2, ..., p_n) is the true opinion and r = (r_1, r_2, ..., r_n) is the reported distribution. The three most commonly encountered proper scoring rules are (S_j is the score if event j occurs):

  Quadratic:    S_j(r_1, ..., r_n) = 2 r_j - Σ_{i=1}^{n} r_i²
  Logarithmic:  S_j(r_1, ..., r_n) = log(r_j)
  Spherical:    S_j(r_1, ..., r_n) = r_j / (Σ_{i=1}^{n} r_i²)^{1/2}

Example: Scoring rules

A weather forecaster believes that there is a p = 0.7 probability of rain the next day. Which probability r should he report, when he gets S_1(r) = 2r - r² - (1-r)² if it does rain and S_2(r) = 2(1-r) - r² - (1-r)² if it does not? His expected compensation is

  E[S(r)] = 0.7 S_1(r) + 0.3 S_2(r),

which is maximised by reporting r = 0.7, his true estimate. [Figure: expected compensation as a function of r, peaking at r = 0.7.]

Biases in probability elicitation

People use rules of thumb, i.e. heuristics, in assessing probabilities. These are often useful, but they may lead to systematically incorrect estimates, i.e. biases.
- Gamblers believe their luck will change.
- Independence: you toss HHHHHHH (H = heads); what comes next?
- Is P(HHHHTTTT) < P(HTTHHTHT)? No: with a fair coin the two sequences are equally likely.
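The weather-forecaster example above can be checked numerically with the quadratic scoring rule; a grid search (a sketch, not part of the slides) confirms that reporting the true belief maximises the expected score:

```python
def quadratic_score(r, j):
    """Quadratic scoring rule: S_j(r) = 2*r_j - sum_i r_i^2,
    the score received when event j occurs."""
    return 2 * r[j] - sum(x * x for x in r)

def expected_score(p, r):
    """Expected score E[S(r)] under the true belief p."""
    return sum(p[j] * quadratic_score(r, j) for j in range(len(p)))

p = (0.7, 0.3)  # true belief: rain with probability 0.7
best_report = max((k / 100 for k in range(101)),
                  key=lambda q: expected_score(p, (q, 1 - q)))
print(best_report)  # 0.7 -- honesty maximises the expected score
```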
Biases in probability elicitation: Representativeness

If x fits the description of a set A well, then P(x ∈ A) is assumed to be large; the relative proportions (prior probabilities) are not taken into consideration.

Example: You see a very good-looking young woman in a bar. Is she more probably a professional model or a nurse? Many people are tempted to answer "a model". Yet professional models are rare, while there are far more nurses: she is therefore more likely to be a nurse.

Biases in probability elicitation: Availability

People assess the probability of an event by the ease with which instances or occurrences can be brought to mind.

Example: In a typical sample of English text, is it more likely that a word starts with the letter K or that K is the third letter? About 70 % of people think that words starting with K are more common. In truth, there are approximately twice as many words with K in the third position as words that begin with it.

Biases in probability elicitation: Anchoring and adjustment

In many situations people assess probabilities by adjusting a given starting value. Often the adjustment is not large enough, and the final assessment stays too close to the starting value.

Example: Is the percentage of African countries in the UN
a) greater or less than 65? What is the exact percentage? Average answer: less, 45 %.
b) greater or less than 10? What is the exact percentage? Average answer: greater, 25 %.

How to reduce: avoid giving starting values.

Biases in probability elicitation: Conservatism

People tend to revise previous probability estimates more slowly than warranted by new data (according to Bayes' theorem).

Example: Suppose we have two bags. One contains 30 white balls and 10 black balls; the other contains 30 black balls and 10 white. We choose one of these bags at random, and from it we draw five balls at random, replacing each ball after it has been drawn.
The result is that we find 4 white balls and one black. What is the probability that we were using the bag with mainly white balls? A typical subject answers between 0.7 and 0.8. The answer is in fact 0.96, which can be calculated using Bayes' theorem.
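The bag example above can be verified with Bayes' theorem; a minimal sketch (the helper name is illustrative, and the equal 0.5 priors cancel out of the ratio):

```python
def posterior_white_bag(whites_drawn=4, blacks_drawn=1):
    """P(mostly-white bag | sample). Each bag has prior 0.5;
    the white bag yields a white ball with probability 30/40 = 0.75,
    the black bag with probability 10/40 = 0.25 (draws with replacement)."""
    like_white_bag = 0.75 ** whites_drawn * 0.25 ** blacks_drawn
    like_black_bag = 0.25 ** whites_drawn * 0.75 ** blacks_drawn
    return like_white_bag / (like_white_bag + like_black_bag)

print(round(posterior_white_bag(), 2))  # 0.96
```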
Biases in probability elicitation: Hindsight bias

People falsely believe they would have predicted the outcome of an event. Once the outcomes are observed, the DM may assume that they are the only ones that could have happened, and so underestimate the uncertainty. This prevents us from learning from the past, and merely warning people of the bias has little effect.

How to reduce: argue against the inevitability of the reported outcome and convince the assessor that it might have turned out otherwise.

Biases in probability elicitation: Overconfidence

People tend to be overconfident in their assessments. According to research:

Claimed confidence   Relative frequency of correct answers
100 %                80 %
90 %                 75 %
80 %                 65 %

How to reduce: give feedback based on the quality of earlier assessments.

Calibration of experts

Goal: to know that if the DM says the probability is p, the real probability is f(p).
- Define a calibration curve p → f(p).
- Calibration can be done e.g. by eliciting known probabilities.

Example: Calibration of experts (1/2)

An expert assesses a cumulative subjective probability distribution function F_X(x) for a variable X prior to its realisation. When the outcome x* becomes known, define

  ζ = F_X(x*).

Now ζ should follow the uniform distribution, ζ ~ U(0, 1).

On 10 recent occasions, a technical expert assessed normal probability distributions for the performance index x_i of a series of research experiments. The values ζ_i are calculated from the cumulative predictive distributions:

  ζ_i = F_N(x_i | µ_i, σ_i).

Under perfect calibration the sorted ζ_i should be approximately 0.1, 0.2, ..., 0.9, 1.0.
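The ζ_i values above can be computed from the normal CDF, which `math.erf` provides without external libraries; the assessment numbers below are made up for illustration:

```python
from math import erf, sqrt

def zeta(x_star, mu, sigma):
    """zeta = F_N(x* | mu, sigma): the normal CDF at the realised outcome."""
    return 0.5 * (1.0 + erf((x_star - mu) / (sigma * sqrt(2.0))))

# Illustrative expert assessments (mu_i, sigma_i) and realised outcomes x_i.
assessments = [(10.0, 2.0, 9.0), (5.0, 1.0, 6.5), (0.0, 1.0, 0.0)]
zetas = sorted(zeta(x, m, s) for m, s, x in assessments)
# For a well-calibrated expert the sorted zetas should look uniform on (0, 1).
print([round(z, 2) for z in zetas])
```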
Example: Calibration of experts (2/2)

From the ζ_i values a calibration function can be drawn. If the expert assessed a distribution for an eleventh experiment, it could be corrected using the calibration function.

Updating of discrete probabilities

Assessed probability distributions can be improved, i.e. updated, when new information is gained:
1. We have a probability estimate for event H: the prior probability P(H).
2. New information D is gained.
3. We update the estimate using Bayes' theorem: the posterior probability P(H|D).

The Bayes theorem

The updating is done using Bayes' theorem:

  P(H|D) = P(D|H) P(H) / P(D)
Example: Using Bayes' theorem

1.5 % of the population suffer from schizophrenia: P(S) = 0.015 (prior probability). Brain atrophy is found in 30 % of schizophrenics, P(A|S) = 0.3, and in 2 % of normal people, P(A|¬S) = 0.02. If a person has brain atrophy, the probability that he is schizophrenic (the posterior probability) is:

  P(S|A) = P(A|S) P(S) / [P(A|S) P(S) + P(A|¬S) P(¬S)]
         = (0.3 × 0.015) / (0.3 × 0.015 + 0.02 × 0.985) = 0.186

[Figure: posterior probability with different prior probabilities; see Clemen, p. 250.]

Updating of continuous distributions (1/3)

1. Choose a theoretical distribution, P(X = x | θ), for the physical process of interest.
2. Assess the uncertainty about the parameter θ: the prior distribution f(θ).
3. Observe data x_1.
4. Update using Bayes' theorem: the posterior distribution of θ, f(θ|x_1).

Note: uncertainty about X has two parts:
1. Uncertainty due to the process itself, P(X = x | θ).
2. Uncertainty about θ, f(θ), later updated to f(θ|x_1).

Updating of continuous distributions (2/3)

Bayes' theorem for continuous θ:

  f(θ|x_1) = f(x_1|θ) f(θ) / ∫ f(x_1|θ) f(θ) dθ

f(x_1|θ) is called the likelihood function of θ given the observed data x_1. In most cases the posterior distribution cannot be calculated analytically but must be solved numerically.

Updating of continuous distributions (3/3)

When we have natural conjugate distributions, the posterior distribution can be solved analytically. If the distribution of the physical process of interest and the prior distribution of its parameter are suitably chosen, the prior and posterior distributions of the parameter have the same form:

  Physical process     Prior                 Posterior
  X ~ Exp(λ)           λ ~ Gamma(α, β)       λ ~ Gamma(α*, β*)
  X ~ Bin(n, p)        p ~ Beta(r_0, n_0)    p ~ Beta(r*, n*)
  X ~ N(µ, σ²)         µ ~ N(m_0, σ_0²)      µ ~ N(m*, σ*²)
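The brain-atrophy example above can be verified numerically; a minimal sketch of the discrete Bayes update:

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' theorem for a binary hypothesis:
    P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|~H)P(~H)]."""
    num = p_data_given_h * prior
    return num / (num + p_data_given_not_h * (1.0 - prior))

# P(S) = 0.015, P(A|S) = 0.3, P(A|not S) = 0.02.
print(round(posterior(0.015, 0.3, 0.02), 3))  # 0.186
```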
Binomial distribution

1. The physical process of interest is binomially distributed: X ~ Bin(n, p).
2. Prior distribution for p: p ~ Beta(r_0, n_0), with density

  f_β(p) = Γ(n_0) / [Γ(r_0) Γ(n_0 - r_0)] · p^(r_0 - 1) (1 - p)^(n_0 - r_0 - 1),  0 < p < 1,  n_0 > r_0 > 0.

3. Observe a sample of the physical process: sample size n_1, favourable cases r_1.
4. The posterior distribution, calculated using Bayes' theorem, reduces to

  p ~ Beta(r_0 + r_1, n_0 + n_1).

About the beta distribution

The beta distribution is very flexible. [Figure: example beta densities.]

Normal distribution

1. The physical process of interest is normally distributed: X ~ N(µ, σ²), where σ is assumed to be known.
2. Prior distribution for µ: µ ~ N(m_0, σ_0²) (notation: σ_0² = σ²/n_0).
3. Observe a sample of the physical process: sample size n_1, sample mean x̄_1.
4. The posterior distribution, calculated using Bayes' theorem, reduces to µ ~ N(m*, σ*²), where

  m* = (m_0 σ² + n_1 x̄_1 σ_0²) / (σ² + n_1 σ_0²) = (n_0 m_0 + n_1 x̄_1) / (n_0 + n_1)

  σ*² = σ_0² σ² / (σ² + n_1 σ_0²) = σ² / (n_0 + n_1)
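With conjugate priors, the updates above reduce to simple parameter arithmetic; a sketch in the slides' parametrisation (function names are illustrative):

```python
def update_beta(r0, n0, r1, n1):
    """Beta(r0, n0) prior + binomial sample with r1 favourable cases
    out of n1 trials -> Beta(r0 + r1, n0 + n1) posterior."""
    return (r0 + r1, n0 + n1)

def update_normal_mean(m0, n0, xbar1, n1):
    """N(m0, sigma^2/n0) prior for mu + sample mean xbar1 of size n1
    -> posterior mean m* = (n0*m0 + n1*xbar1) / (n0 + n1);
    the posterior variance is sigma^2 / (n0 + n1)."""
    return (n0 * m0 + n1 * xbar1) / (n0 + n1)

print(update_beta(2, 10, 3, 5))              # (5, 15)
print(update_normal_mean(10.0, 4, 13.0, 8))  # 12.0
```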